elevenlabs-stt

✓ Verified · Scanned 2/17/2026

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

From clawhub.ai · vfb94c35 · 6.8 KB · 0 installs

Scanned from 1.0.0 at fb94c35

$ vett add clawhub.ai/clawdbotborges/elevenlabs-stt

🎙️ ElevenLabs Speech-to-Text Skill

A Clawdbot skill for transcribing audio files using ElevenLabs' Scribe v2 model.

Features

🌍 90+ languages supported with automatic detection
👥 Speaker diarization — identify different speakers
🎵 Audio event tagging — detect laughter, music, applause, etc.
📝 Word-level timestamps — precise timing in JSON output
🎧 All major formats — mp3, m4a, wav, ogg, webm, mp4, and more

Installation

For Clawdbot

Add to your clawdbot.json:

{
  skills: {
    entries: {
      "elevenlabs-stt": {
        source: "github:clawdbotborges/elevenlabs-stt",
        apiKey: "sk_your_api_key_here"
      }
    }
  }
}

Standalone

git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"

Usage

# Basic transcription
./scripts/transcribe.sh audio.mp3

# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize

# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en

# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json

# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events

Options

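| Flag | Description |
|------|-------------|
| `--diarize` | Enable speaker diarization (label who is speaking) |
| `--lang CODE` | ISO language code (e.g., `en`, `pt`, `es`, `fr`); omit it to let the language be detected automatically |
| `--json` | Output the full JSON response, including word-level timestamps |
| `--events` | Tag audio events such as laughter, music, and applause |
| `-h, --help` | Show the help message |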
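The skill's script is not reproduced here, so the exact request it sends is unknown. The sketch below shows one way the flags could translate into an API request.

It assumes:
- the request goes to the ElevenLabs `POST /v1/speech-to-text` endpoint as a multipart form, with the key sent in an `xi-api-key` header;
- the form fields are `file`, `model_id`, `language_code`, `diarize` and `tag_audio_events`.

The function names `build_form_fields` and `transcribe_raw` are hypothetical. The `model_id` value `scribe_v2` is also an assumption, so check the ElevenLabs API reference before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the skill's transcribe.sh: map the documented flags
# onto ElevenLabs speech-to-text form fields, printed one "key=value" per line.
# Field names and the model_id value are assumptions.
build_form_fields() {
  local lang="" diarize=false events=false
  while [ $# -gt 0 ]; do
    case "$1" in
      --diarize) diarize=true ;;
      --events) events=true ;;
      --json) ;; # assumed to affect only how the response is printed
      --lang)
        if [ $# -lt 2 ]; then
          echo "--lang needs a value" >&2
          return 1
        fi
        lang="$2"
        shift
        ;;
      *)
        echo "unknown option: $1" >&2
        return 1
        ;;
    esac
    shift
  done
  echo "model_id=scribe_v2"
  echo "diarize=$diarize"
  echo "tag_audio_events=$events"
  # Without --lang, send no language_code and let the API auto-detect.
  if [ -n "$lang" ]; then
    echo "language_code=$lang"
  fi
}

# Hypothetical raw request built from those fields (needs network access).
transcribe_raw() {
  local file="$1" kv fields
  shift
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
  fields=$(build_form_fields "$@") || return 1
  local -a args=(-F "file=@${file}")
  # Turn each "key=value" line into a curl -F argument.
  while IFS= read -r kv; do
    args+=(-F "$kv")
  done <<< "$fields"
  curl -sS -X POST "https://api.elevenlabs.io/v1/speech-to-text" \
    -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
    "${args[@]}"
}
```

For example, `transcribe_raw meeting.mp3 --diarize --lang en` would send `diarize=true` and `language_code=en` alongside the file. Keeping the flag parsing separate from the `curl` call makes the mapping easy to inspect without spending API credits.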

Examples

Transcribe a voice message

./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."
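To transcribe a whole folder of voice notes, a small loop around the script is enough. The sketch below makes a few assumptions:
- the script prints the plain transcript to stdout, as the example above suggests;
- it is run from the repository root, since the default path is `./scripts/transcribe.sh`.

It writes a `.txt` file next to each audio file. `transcribe_dir` and the `TRANSCRIBE` override variable are hypothetical names, not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transcribe every .ogg/.mp3/.m4a file in a directory,
# writing "<name>.txt" next to each one. Assumes transcribe.sh prints the
# plain transcript to stdout. Set TRANSCRIBE to use a different command.
TRANSCRIBE="${TRANSCRIBE:-./scripts/transcribe.sh}"

transcribe_dir() {
  local dir="$1" f out
  for f in "$dir"/*.ogg "$dir"/*.mp3 "$dir"/*.m4a; do
    # An unmatched glob stays literal, so skip names that don't exist.
    [ -e "$f" ] || continue
    out="${f%.*}.txt"
    # Don't redo files that already have a transcript.
    [ -e "$out" ] && continue
    if "$TRANSCRIBE" "$f" > "$out"; then
      echo "ok: $f"
    else
      echo "failed: $f" >&2
      # Remove the partial transcript so a rerun retries this file.
      rm -f "$out"
    fi
  done
}
```

Usage: `transcribe_dir ~/Downloads`. Because existing `.txt` files are skipped, rerunning the loop only retries files that failed or are new.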

Meeting with multiple speakers

./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json

{
  "text": "Welcome everyone. Let's start with updates.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
    {"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
  ]
}
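Word-level diarization output is easier to read when consecutive words from the same speaker are merged into one line. Below is a minimal jq sketch. It assumes the field names from the sample above (`words`, `text`, `speaker`).

As a precaution, in case your actual output differs, it also:
- accepts `speaker_id` as the speaker field;
- skips entries whose `type` is something other than `word`.

The function name `speaker_lines` is hypothetical, and jq must be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read diarized --json output on stdin and print one
# "speaker: text" line per run of consecutive words from the same speaker.
# Field names follow the sample output; "speaker_id" and "type" are
# handled as a precaution in case the real response uses them.
speaker_lines() {
  jq -r '
    [ .words[]
      | select((.type // "word") == "word")
      | {speaker: (.speaker // .speaker_id // "unknown"), text} ]
    | reduce .[] as $w ([];
        if length > 0 and .[length - 1].speaker == $w.speaker
        then .[length - 1].text += " " + $w.text
        else . + [$w]
        end)
    | .[]
    | "\(.speaker): \(.text)"'
}
```

Usage: `./scripts/transcribe.sh meeting.mp3 --diarize --json | speaker_lines`.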

Process with jq

# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'

# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
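Two more filters in the same spirit, as a sketch:
- `word_timings` prints one tab-separated start/end/text line per word;
- `last_word_end` reports when the last word ends, which approximates the length of the speech.

Both assume the `words[].start` and `words[].end` fields (in seconds) shown in the JSON sample. The function names are hypothetical. If your output also contains non-word entries marked with a `type` field, the `length` count above would include them. These helpers skip them with `select((.type // "word") == "word")`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for --json output read from stdin. Entries whose
# "type" is not "word" (if that field exists) are skipped.

# One line per word: start<TAB>end<TAB>text
word_timings() {
  jq -r '.words[] | select((.type // "word") == "word") | [.start, .end, .text] | @tsv'
}

# End time in seconds of the last word, or 0 when there are no words.
last_word_end() {
  jq '[.words[] | select((.type // "word") == "word")] | if length == 0 then 0 else .[-1].end end'
}
```

Usage: `./scripts/transcribe.sh podcast.mp3 --json | word_timings > words.tsv`. The TSV output opens directly in a spreadsheet.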

Requirements

curl — for API requests
jq — for JSON parsing (optional, but recommended)
ElevenLabs API key with Speech-to-Text access
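A quick pre-flight check can catch a missing requirement before a long upload. This is a hypothetical helper, not part of the skill. It treats jq as optional, matching the list above, and only checks that `ELEVENLABS_API_KEY` is set, not that the key is valid or has Speech-to-Text access.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of the skill: confirm the listed
# requirements before running transcribe.sh. It only checks that a key is
# set, not that the key is valid.
check_requirements() {
  local status=0
  if ! command -v curl >/dev/null 2>&1; then
    echo "error: curl is required" >&2
    status=1
  fi
  if ! command -v jq >/dev/null 2>&1; then
    # jq is optional, so this is only a warning.
    echo "warning: jq not found; --json output cannot be filtered with jq" >&2
  fi
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "error: ELEVENLABS_API_KEY is not set" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_requirements && ./scripts/transcribe.sh audio.mp3`.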

API Key

Get your API key from ElevenLabs:

Sign up or log in
Go to Profile → API Keys
Create a new key or copy an existing one

License

MIT