elevenlabs-stt
✓Verified·Scanned 2/17/2026
Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
Scanned from 1.0.0 at fb94c35
$ vett add clawhub.ai/clawdbotborges/elevenlabs-stt
🎙️ ElevenLabs Speech-to-Text Skill
A Clawdbot skill for transcribing audio files using ElevenLabs' Scribe v2 model.
Features
- 🌍 90+ languages supported with automatic detection
- 👥 Speaker diarization — identify different speakers
- 🎵 Audio event tagging — detect laughter, music, applause, etc.
- 📝 Word-level timestamps — precise timing in JSON output
- 🎧 All major formats — mp3, m4a, wav, ogg, webm, mp4, and more
Installation
For Clawdbot
Add to your clawdbot.json:
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        source: "github:clawdbotborges/elevenlabs-stt",
        apiKey: "sk_your_api_key_here"
      }
    }
  }
}
Standalone
git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"
Usage
# Basic transcription
./scripts/transcribe.sh audio.mp3
# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize
# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en
# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json
# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events
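To run the basic command over many files, a small wrapper loop is usually enough. The sketch below is not part of the skill: `transcribe_dir` and the `TRANSCRIBE_CMD` override are hypothetical names introduced for illustration, and the loop assumes the script prints the plain transcript to standard output, as the basic usage above suggests.

```shell
# Hypothetical helper (not shipped with the skill): transcribe every
# .mp3, .ogg, and .wav file in a directory, writing NAME.txt next to it.
# TRANSCRIBE_CMD is an assumed override so the loop can be tried with a stub.
transcribe_dir() {
  dir="$1"
  cmd="${TRANSCRIBE_CMD:-./scripts/transcribe.sh}"
  for f in "$dir"/*.mp3 "$dir"/*.ogg "$dir"/*.wav; do
    [ -e "$f" ] || continue            # skip glob patterns that matched nothing
    "$cmd" "$f" > "${f%.*}.txt" || echo "failed: $f" >&2
  done
}
```

Calling `transcribe_dir ~/Downloads` would then leave one `.txt` file beside each audio file it could process, and report failures on standard error without stopping the loop.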
Options
| Flag | Description |
|---|---|
| --diarize | Enable speaker diarization |
| --lang CODE | ISO language code (e.g., en, pt, es, fr) |
| --json | Output full JSON response with word timestamps |
| --events | Tag audio events like laughter, music, applause |
| -h, --help | Show help message |
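The internals of `transcribe.sh` are not shown here, so the following is only a guess at how these flags could map onto a request. It assumes the ElevenLabs speech-to-text endpoint takes multipart form fields named `model_id`, `file`, `language_code`, `diarize`, and `tag_audio_events`, and that the model id is `scribe_v2`; check all of these against the API reference before relying on them. The helper name `stt_form_fields` is hypothetical, and it only prints fields without sending anything.

```shell
# Hypothetical helper: print one form field per line for the given flags.
# Field names and the model id are assumptions, not read from transcribe.sh.
stt_form_fields() {
  file="$1"; shift
  echo "model_id=scribe_v2"
  echo "file=@$file"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --diarize) echo "diarize=true" ;;
      --events)  echo "tag_audio_events=true" ;;
      --lang)
        [ "$#" -ge 2 ] || { echo "--lang needs a code" >&2; return 1; }
        echo "language_code=$2"
        shift ;;
      --json)    ;;  # assumed to change output formatting only
      *)         echo "unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
```

Each printed line could then be passed to `curl` as a separate `-F` argument, together with the API key in a request header.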
Examples
Transcribe a voice message
./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."
Meeting with multiple speakers
./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json
{
  "text": "Welcome everyone. Let's start with updates.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
    {"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
  ]
}
Process with jq
# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'
# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
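Diarized output can be folded into one line per speaker turn in the same way. This sketch assumes the JSON has the shape shown in the meeting example, with a `words` array whose entries carry `text` and `speaker` keys; the real response may use different key names or include extra entries, so treat the filter as a starting point. The function name `speaker_lines` is hypothetical.

```shell
# Hypothetical helper: read --json output on stdin and print
# "speaker: words..." for each run of consecutive words by one speaker.
# Assumes .words[] entries have "text" and "speaker" keys, as in the sample.
speaker_lines() {
  jq -r '
    reduce .words[] as $w ([];
      if length > 0 and .[-1].speaker == $w.speaker
      then .[:-1] + [ (.[-1] | .text += " " + $w.text) ]   # extend current turn
      else . + [ {speaker: $w.speaker, text: $w.text} ]    # start a new turn
      end)
    | .[] | "\(.speaker): \(.text)"
  '
}
```

It would be used as `./scripts/transcribe.sh meeting.mp3 --diarize --json | speaker_lines`.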
Requirements
- curl: for API requests
- jq: for JSON parsing (optional, but recommended)
- ElevenLabs API key with Speech-to-Text access
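A quick preflight check can confirm these requirements before the first run. This is a minimal sketch, not something the skill provides: `check_requirements` is a hypothetical name, and it assumes the key is supplied through the `ELEVENLABS_API_KEY` environment variable as in the standalone setup.

```shell
# Hypothetical preflight check: returns non-zero if a hard requirement
# is missing. jq is treated as optional and only produces a warning.
check_requirements() {
  missing=0
  command -v curl >/dev/null 2>&1 || { echo "curl is required" >&2; missing=1; }
  command -v jq >/dev/null 2>&1 || echo "jq not found (optional)" >&2
  [ -n "${ELEVENLABS_API_KEY:-}" ] || { echo "ELEVENLABS_API_KEY is not set" >&2; missing=1; }
  return "$missing"
}
```

Running `check_requirements && ./scripts/transcribe.sh audio.mp3` would skip the transcription when something essential is missing.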
API Key
Get your API key from ElevenLabs:
- Sign up or log in
- Go to Profile → API Keys
- Create a new key or copy an existing one
License
MIT