telegram-voice-to-voice-macos
This skill implements a macOS Apple Silicon Telegram voice-to-voice workflow that transcribes .ogg notes locally with yap and generates replies via macOS TTS (say + ffmpeg), storing per-user state in voice_state/telegram.json. It includes local shell scripts (scripts/transcribe_telegram_ogg.sh, scripts/tts_telegram_voice.sh) that execute say, ffmpeg, and yap.
Telegram voice-to-voice (macOS)
Requirements
- macOS Tahoe on an Apple Silicon Mac.
- yap CLI available in PATH (Speech.framework transcription).
  - Project: https://github.com/finnvoor/yap (by finnvoor)
- ffmpeg available in PATH.
Persistent reply mode (voice vs text)
Store a small per-user preference file in the workspace:
- State file: voice_state/telegram.json
- Key: Telegram sender user id (string)
- Values:
  - "voice" (default): reply with a Telegram voice note
  - "text": reply with a single text message
If the file does not exist or the sender id is missing: assume "voice".
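The lookup rule above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the helper name load_reply_mode is hypothetical, and treating a malformed file or an unknown value the same as a missing entry is an assumption.

```python
import json
from pathlib import Path

STATE_FILE = Path("voice_state/telegram.json")  # state file named in this section
DEFAULT_MODE = "voice"


def load_reply_mode(user_id, state_file=STATE_FILE):
    """Return "voice" or "text" for a Telegram sender id (hypothetical helper)."""
    try:
        state = json.loads(Path(state_file).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing file: assume voice. Malformed JSON is handled the same
        # way here, which is an assumption rather than documented behavior.
        return DEFAULT_MODE
    if not isinstance(state, dict):
        return DEFAULT_MODE
    mode = state.get(str(user_id))  # keys are sender ids stored as strings
    return mode if mode in ("voice", "text") else DEFAULT_MODE
```

For example, load_reply_mode(123456) returns "voice" until that sender has been switched to "text".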
Toggle commands
If an inbound text message is exactly:
- /audio off → set state to "text" and confirm with a short text reply.
- /audio on → set state to "voice" and confirm with a short text reply.
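One way to handle the toggle commands is sketched below. The function name handle_toggle and the confirmation wording are assumptions made for illustration; only the command strings, the state values, and the state file path come from this document.

```python
import json
from pathlib import Path

# Exact-match commands from this section, mapped to the mode they select.
TOGGLE_COMMANDS = {"/audio off": "text", "/audio on": "voice"}


def handle_toggle(text, user_id, state_file="voice_state/telegram.json"):
    """Apply a toggle command. Return a confirmation, or None if text is not a command."""
    mode = TOGGLE_COMMANDS.get(text)  # exact match only: no trimming, no case folding
    if mode is None:
        return None
    path = Path(state_file)
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}  # start fresh if the file is missing or unreadable (assumption)
    if not isinstance(state, dict):
        state = {}
    state[str(user_id)] = mode  # key is the sender id as a string
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return f"Reply mode set to {mode}."  # wording is an assumption
```

A return value of None signals that the message should go through the normal reply path instead.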
Getting the inbound audio (.ogg)
Telegram voice notes often show up as <media:audio> in message text.
OpenClaw saves the attachment to disk (typically .ogg) under:
~/.openclaw/media/inbound/
Recommended approach:
- If the inbound message context includes an attachment path, use it.
- Otherwise, take the most recent *.ogg from ~/.openclaw/media/inbound/.
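The recommended approach might look like the following sketch. The helper name resolve_inbound_ogg is hypothetical, and "most recent" is interpreted here as the latest modification time, which is an assumption.

```python
from pathlib import Path

INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def resolve_inbound_ogg(attachment_path=None, inbound_dir=INBOUND_DIR):
    """Prefer the attachment path from the message context, else the newest *.ogg."""
    if attachment_path:
        return Path(attachment_path)
    candidates = list(Path(inbound_dir).glob("*.ogg"))
    if not candidates:
        return None  # nothing to transcribe
    # "Most recent" is taken to mean latest modification time (assumption).
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The fallback can pick the wrong file if several voice notes arrive close together, which is why an explicit attachment path is preferred when it is available.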
Transcription
Default locale: en-US.
Preferred:
yap transcribe --locale "${YAP_LOCALE:-en-US}" <path.ogg>
If transcription fails or is empty: ask the user to repeat or send text.
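A minimal Python wrapper around the command above, including the failure and empty-result check, could look like this. It assumes yap writes the transcript to stdout and exits non-zero on failure. The function names and the injectable run parameter are illustrative and not part of the skill.

```python
import os
import subprocess


def build_yap_command(ogg_path, locale=None):
    """Mirror the shell default ${YAP_LOCALE:-en-US}: an unset or empty value means en-US."""
    locale = locale or os.environ.get("YAP_LOCALE") or "en-US"
    return ["yap", "transcribe", "--locale", locale, str(ogg_path)]


def transcribe(ogg_path, locale=None, run=subprocess.run):
    """Return the transcript, or None when yap fails or prints nothing."""
    try:
        result = run(
            build_yap_command(ogg_path, locale),
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # for example, yap is not on PATH
    if result.returncode != 0:
        return None
    # Assumes the transcript is written to stdout.
    return result.stdout.strip() or None
```

When transcribe returns None, the caller asks the user to repeat the message or send text.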
Helper script:
scripts/transcribe_telegram_ogg.sh [path.ogg]
Reply behavior
Mode: voice (default)
- Generate the reply text.
- Convert reply text to an OGG/Opus voice note using:
scripts/tts_telegram_voice.sh "<reply text>" [SYSTEM|VoiceName]
The script prints the generated .ogg path to stdout.
- Send the .ogg back to Telegram as a voice note (not a generic audio file):
  - use the message tool with asVoice: true and media: <path.ogg>
  - optionally set replyTo to thread the response
Notes:
- Use SYSTEM to rely on the current macOS system voice (recommended).
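For orientation, the say + ffmpeg conversion can be sketched in Python as below. This is not the content of scripts/tts_telegram_voice.sh. The intermediate AIFF file, the 32k mono Opus settings, and the function names are assumptions, and the sketch requires an ffmpeg build that includes libopus.

```python
import subprocess
import tempfile
from pathlib import Path


def build_tts_commands(text, aiff_path, ogg_path, voice="SYSTEM"):
    """Build the say and ffmpeg command lines for one voice note."""
    say_cmd = ["say", "-o", str(aiff_path)]
    if voice != "SYSTEM":
        say_cmd += ["-v", voice]  # SYSTEM means: keep the current system voice
    say_cmd.append(text)
    ffmpeg_cmd = [
        "ffmpeg", "-y", "-i", str(aiff_path),
        "-c:a", "libopus",  # OGG/Opus output
        "-b:a", "32k",      # bitrate is an assumption
        "-ac", "1",         # mono is an assumption
        str(ogg_path),
    ]
    return [say_cmd, ffmpeg_cmd]


def synthesize_voice_note(text, voice="SYSTEM", out_dir=None, run=subprocess.run):
    """Run both steps and return the path of the generated .ogg file."""
    out_dir = Path(out_dir or tempfile.mkdtemp(prefix="tts_"))
    out_dir.mkdir(parents=True, exist_ok=True)
    aiff_path = out_dir / "reply.aiff"
    ogg_path = out_dir / "reply.ogg"
    for cmd in build_tts_commands(text, aiff_path, ogg_path, voice):
        run(cmd, check=True)  # raises CalledProcessError if a step fails
    return ogg_path
```

Passing the commands as argument lists rather than a shell string avoids quoting problems when the reply text contains quotes or other shell metacharacters.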
Mode: text
Reply with a single text message:
Transcription: <...>
Reply: <...>
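The two reply modes can be brought together in one small dispatch function. In this sketch the parameter names asVoice, media, and replyTo come from this document, while the function name build_reply, the "message" key, and the line break between the transcription and the reply are assumptions.

```python
def build_reply(mode, transcription, reply_text, ogg_path=None, reply_to=None):
    """Build the arguments for the message tool (hypothetical payload shape)."""
    if mode == "text":
        # Single text message; the line break between the parts is an assumption.
        payload = {"message": f"Transcription: {transcription}\nReply: {reply_text}"}
    else:
        if ogg_path is None:
            raise ValueError("voice mode needs the generated .ogg path")
        # asVoice marks the file as a voice note rather than a generic audio file.
        payload = {"asVoice": True, "media": str(ogg_path)}
    if reply_to is not None:
        payload["replyTo"] = reply_to  # optional threading
    return payload
```

For instance, build_reply("text", "hello", "hi") yields a single message whose first line is "Transcription: hello" and whose second line is "Reply: hi".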