High Risk: This skill has significant security concerns. Review the findings below before installing.

voice-reply

Caution · Scanned 2/18/2026

This skill provides local text-to-speech (TTS) using Piper voices and sherpa-onnx, with an installer that fetches models and configures ffmpeg and environment variables. Installation runs scripts/install.sh, which uses curl to download archives from https://github.com/k2-fsa/... and executes the downloaded bin/sherpa-onnx-offline-tts binary. This pattern carries download-and-execute risk.

from clawhub.ai · v1.0.0 · 10.2 KB · 0 installs
Scanned from 1.0.0 at 124d6a8
$ vett add clawhub.ai/stolot0mt0m/voice-reply
Review security findings before installing

Local Text-to-Speech for OpenClaw using Piper voices via sherpa-onnx

Generate voice audio replies that work as Telegram voice notes - 100% offline, no API keys required.

Features

  • 100% Local - No internet connection required after setup
  • No API Keys - Completely free, no accounts needed
  • Multi-language - German and English voices included (more available)
  • Telegram Ready - Outputs as voice bubbles, not file attachments
  • Auto-detect Language - Automatically selects the right voice
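
The listing does not say how language auto-detection works. As a rough illustration only, a detector for the two bundled languages could count common words and treat umlauts as a strong German signal. The word lists and the function name `detect_language` below are hypothetical and are not taken from the skill.

```python
import re

# Hypothetical hint words; a real detector would use a much larger vocabulary.
GERMAN_HINTS = {"der", "die", "das", "und", "ist", "nicht", "ich", "wie", "geht", "ein", "eine", "mit"}
ENGLISH_HINTS = {"the", "and", "is", "not", "you", "how", "are", "this", "with", "a", "of", "to"}


def detect_language(text: str, default: str = "en") -> str:
    """Return "de" or "en" using a simple word-count heuristic."""
    lowered = text.lower()
    words = re.findall(r"[a-zäöüß]+", lowered)
    german = sum(word in GERMAN_HINTS for word in words)
    english = sum(word in ENGLISH_HINTS for word in words)
    # Umlauts and ß do not occur in ordinary English text.
    if re.search(r"[äöüß]", lowered):
        german += 2
    if german == english:
        return default  # no evidence either way
    return "de" if german > english else "en"
```

A heuristic like this misfires on short or mixed-language input, which is one reason to keep an explicit language argument available.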

Quick Start

1. Install Dependencies

cd scripts
sudo ./install.sh

This installs:

  • sherpa-onnx runtime (~28 MB)
  • German voice "thorsten" (~64 MB)
  • English voice "ryan" (~110 MB)
  • ffmpeg (if not present)
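
Because these components are downloaded and then executed, one cautious option is to check each archive against a SHA-256 digest that you recorded yourself from a copy you trust, before unpacking it. The sketch below assumes such a digest is available; the listing does not state that the upstream project publishes checksums, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file matches the digest you pinned beforehand."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A matching digest only shows that the file is the one you pinned. It says nothing about whether that file is safe to run.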

2. Add to OpenClaw

Copy the skill to your OpenClaw skills directory:

cp -r . ~/.openclaw/skills/voice-reply

3. Use It

Ask your OpenClaw agent:

  • "Reply with a voice message"
  • "Say that as audio"
  • "Read this aloud: Hello world"

Or call directly:

/voice_reply "Hello, how are you?" en
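
To make the shape of that call concrete: it consists of the command name, a quoted text argument, and a language code. The sketch below splits it with shell-style quoting rules. It assumes the language code is optional (suggested by the auto-detect feature, but not confirmed here), and `parse_voice_reply` is a hypothetical helper, not part of OpenClaw.

```python
import shlex
from typing import Optional, Tuple


def parse_voice_reply(command: str) -> Tuple[str, Optional[str]]:
    """Split '/voice_reply "text" [lang]' into (text, lang or None)."""
    parts = shlex.split(command)  # honours the double quotes around the text
    if not parts or parts[0] != "/voice_reply":
        raise ValueError("not a /voice_reply command")
    if len(parts) < 2:
        raise ValueError("missing text to speak")
    text = parts[1]
    language = parts[2] if len(parts) > 2 else None
    return text, language
```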

Voices

Language   Voice      Quality   Size
German     thorsten   medium    64 MB
English    ryan       high      110 MB

More voices are available at Piper Samples.

Requirements

  • Linux (Ubuntu 22.04+ recommended)
  • ~200 MB disk space
  • ~500 MB RAM during synthesis
  • ffmpeg
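
These requirements can be checked before running the installer. The following is a minimal sketch using only the Python standard library; the 200 MB threshold mirrors the approximate figure above, RAM is not checked, and the function names are illustrative.

```python
import platform
import shutil
from typing import List

MIN_FREE_BYTES = 200 * 1024 * 1024  # the "~200 MB disk space" figure, treated as a floor


def check_requirements(system: str, free_bytes: int, has_ffmpeg: bool) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system}")
    if free_bytes < MIN_FREE_BYTES:
        problems.append("less than ~200 MB of free disk space")
    if not has_ffmpeg:
        problems.append("ffmpeg not found on PATH")
    return problems


def check_this_machine(install_dir: str = ".") -> List[str]:
    """Gather the real values for this host and run the checks."""
    return check_requirements(
        system=platform.system(),
        free_bytes=shutil.disk_usage(install_dir).free,
        has_ffmpeg=shutil.which("ffmpeg") is not None,
    )
```

Keeping the checks in a pure function, separate from the code that inspects the machine, makes them easy to exercise with made-up values.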

How It Works

  1. Text is converted to speech using sherpa-onnx with Piper VITS models
  2. WAV output is converted to OGG Opus (Telegram voice format)
  3. Output includes [[audio_as_voice]] tag for Telegram voice bubbles
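
The three steps above can be sketched as two command lines plus a tagged reply. This is not the skill's actual script. The `--vits-*` and `--output-filename` flags follow the sherpa-onnx command-line examples for Piper models, while the model directory layout, the 32k bitrate, and the reply format (tag followed by the file path) are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

VOICE_TAG = "[[audio_as_voice]]"


def tts_command(model_dir: Path, model_file: str, text: str, wav_path: Path) -> List[str]:
    """Step 1: synthesize text to a WAV file with a Piper VITS model."""
    return [
        "sherpa-onnx-offline-tts",
        f"--vits-model={model_dir / model_file}",
        f"--vits-tokens={model_dir / 'tokens.txt'}",
        f"--vits-data-dir={model_dir / 'espeak-ng-data'}",
        f"--output-filename={wav_path}",
        text,
    ]


def opus_command(wav_path: Path, ogg_path: Path) -> List[str]:
    """Step 2: convert the WAV file to OGG Opus with ffmpeg (bitrate is an assumption)."""
    return ["ffmpeg", "-y", "-i", str(wav_path), "-c:a", "libopus", "-b:a", "32k", str(ogg_path)]


def voice_reply_text(ogg_path: Path) -> str:
    """Step 3: prefix the reply with the voice tag (exact reply format is assumed)."""
    return f"{VOICE_TAG}\n{ogg_path}"
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Building the commands as lists rather than shell strings means the spoken text never passes through a shell and needs no quoting.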

Credits

License

MIT License