Назад към всички

feishu-voice

// Send and receive voice messages on Feishu (Lark) using ElevenLabs TTS and STT. Activate when user asks to send a voice message on Feishu, or when receiving a Feishu audio message (media attachment with .ogg/.opus file) that needs transcription. Supports smart reply mode — auto voice-reply to voice m

$ git log --oneline --stat
stars:1,933
forks:367
updated:March 4, 2026
SKILL.mdreadonly
SKILL.md Frontmatter
namefeishu-voice
descriptionSend and receive voice messages on Feishu (Lark) using ElevenLabs TTS and STT. Activate when user asks to send a voice message on Feishu, or when receiving a Feishu audio message (media attachment with .ogg/.opus file) that needs transcription. Supports smart reply mode — auto voice-reply to voice messages, text-reply to text messages.

Feishu Voice (TTS + STT)

Send voice messages and transcribe received voice messages on Feishu using ElevenLabs.

Prerequisites

  • sag CLI (ElevenLabs TTS): npm i -g sag or go install
  • ffmpeg / ffprobe: brew install ffmpeg
  • ElevenLabs paid plan (required for library voices)
  • Feishu app with im:message:send_as_bot and im:file permissions

Environment Variables

VariableRequiredDescription
ELEVENLABS_API_KEYElevenLabs API key
FEISHU_APP_ID✅ (TTS)Feishu app ID
FEISHU_APP_SECRET✅ (TTS)Feishu app secret
ELEVENLABS_VOICE_ID✅ (TTS)Voice ID (browse at elevenlabs.io/voice-library)
ELEVENLABS_MODEL_ID✅ (TTS)Model ID (e.g. eleven_multilingual_v2, eleven_v3)
ELEVENLABS_SPEEDSpeech speed 0.5-2.0 (default: 1.0)

If FEISHU_APP_ID / FEISHU_APP_SECRET are not in env, extract from openclaw config:

export FEISHU_APP_ID=$(python3 -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['channels']['feishu']['appId'])")
export FEISHU_APP_SECRET=$(python3 -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['channels']['feishu']['appSecret'])")

Voice Selection

See config/voice-config.example.json for a curated voice list. Browse all voices at https://elevenlabs.io/voice-library or run sag voices.

Recommended models:

  • eleven_multilingual_v2 — best for Chinese and multilingual content
  • eleven_v3 — latest English-optimized model

Sending Voice Messages (TTS)

scripts/feishu-voice-send.sh <text> <receive_id> [receive_id_type] [speed]
  • receive_id: target user open_id or chat_id
  • receive_id_type: open_id (default) or chat_id
  • speed: speech speed multiplier, 0.5-2.0 (default: 1.0)

Receiving Voice Messages (STT)

When OpenClaw delivers a Feishu voice message, it arrives as a media attachment (.ogg file). Transcribe with:

scripts/feishu-voice-stt.sh /path/to/audio.ogg

Returns recognized text to stdout. Uses ElevenLabs scribe_v1 model with automatic language detection.

Fallback: Download via Feishu API

If the audio file is not delivered as an attachment (only file_key available):

  1. List recent messages: GET /im/v1/messages?container_id_type=chat&container_id=CHAT_ID&page_size=5&sort_type=ByCreateTimeDesc
  2. Download audio: GET /im/v1/messages/{message_id}/resources/{file_key}?type=file
  3. Run STT script on the downloaded file

Smart Reply Mode

When receiving messages, follow this pattern for natural conversation:

  • Voice message received → transcribe with STT → understand → reply with voice (TTS)
  • Text message received → understand → reply with text
  • Override: user can request voice/text reply explicitly

Important Notes

  • Feishu msg_type must be "audio" — not "media" or "file"
  • OpenClaw's message tool asVoice does not work correctly for Feishu — use this script instead
  • STT uses ElevenLabs scribe_v1 model, supports Chinese, English, and 90+ languages
  • For free ElevenLabs accounts, only premade voices work; library voices require a paid plan