elevenlabs-stt

Verified · Scanned 2/17/2026

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

from clawhub.ai · vfb94c35 · 6.8 KB · 0 installs
Scanned from 1.0.0 at fb94c35
$ vett add clawhub.ai/clawdbotborges/elevenlabs-stt

🎙️ ElevenLabs Speech-to-Text Skill

A Clawdbot skill for transcribing audio files using ElevenLabs' Scribe v2 model.

Features

  • 🌍 90+ languages supported with automatic detection
  • 👥 Speaker diarization — identify different speakers
  • 🎵 Audio event tagging — detect laughter, music, applause, etc.
  • 📝 Word-level timestamps — precise timing in JSON output
  • 🎧 All major formats — mp3, m4a, wav, ogg, webm, mp4, and more

Installation

For Clawdbot

Add to your clawdbot.json:

{
  skills: {
    entries: {
      "elevenlabs-stt": {
        source: "github:clawdbotborges/elevenlabs-stt",
        apiKey: "sk_your_api_key_here"
      }
    }
  }
}

Standalone

git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"

Usage

# Basic transcription
./scripts/transcribe.sh audio.mp3

# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize

# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en

# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json

# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events

Options

Flag          Description
--diarize     Enable speaker diarization
--lang CODE   ISO language code (e.g., en, pt, es, fr)
--json        Output full JSON response with word timestamps
--events      Tag audio events like laughter, music, applause
-h, --help    Show help message
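
The flags above can be passed together, which makes it easy to run the same options over a whole folder. The following is a minimal sketch, not part of the skill itself: `transcribe_dir` and the `TRANSCRIBE` variable are hypothetical names, and it assumes the script prints the transcript to stdout (as the jq examples in this README suggest) and exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the skill.
# TRANSCRIBE points at the script from this repo; override it if your checkout lives elsewhere.
TRANSCRIBE="${TRANSCRIBE:-./scripts/transcribe.sh}"

# transcribe_dir DIR EXT [flags...]
# Writes DIR/<name>.txt next to each DIR/<name>.EXT, passing any extra flags through.
# Assumption: the script writes the transcript to stdout and returns non-zero on error.
transcribe_dir() {
  local dir="$1" ext="$2"
  shift 2
  local f out
  for f in "$dir"/*."$ext"; do
    [ -e "$f" ] || continue        # no matching files: the glob stays literal, so skip it
    out="${f%.*}.txt"
    if "$TRANSCRIBE" "$f" "$@" > "$out"; then
      echo "ok: $f -> $out"
    else
      echo "failed: $f" >&2
      rm -f "$out"                 # do not leave an empty or partial transcript behind
    fi
  done
}
```

For example, `transcribe_dir ~/Downloads ogg --lang en` would transcribe every .ogg voice note in that folder with the language fixed to English.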

Examples

Transcribe a voice message

./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."

Meeting with multiple speakers

./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json
# Output (abridged):
{
  "text": "Welcome everyone. Let's start with updates.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
    {"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
  ]
}

Process with jq

# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'

# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
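
With --diarize and --json, the per-word speaker labels can also be collapsed into a readable transcript. The sketch below is one possible way to do that with jq; `speaker_turns` is a hypothetical helper name, and it assumes the `words`, `text`, and `speaker` fields look like the sample output in the "Meeting with multiple speakers" example. If the real response contains other entries in `words` (such as spacing or event tokens), the filter would need adjusting.

```shell
# Hypothetical helper: read the JSON response on stdin and print one line per speaker turn.
# Assumption: each entry in .words has "text" and "speaker" fields, as in the sample output.
speaker_turns() {
  jq -r '
    reduce .words[] as $w ([];
      if length > 0 and .[-1].speaker == $w.speaker
      then .[:-1] + [ .[-1] | .text += " " + $w.text ]   # same speaker: extend the current turn
      else . + [{speaker: $w.speaker, text: $w.text}]    # new speaker: start a new turn
      end)
    | .[] | "\(.speaker): \(.text)"
  '
}
```

Usage would look like `./scripts/transcribe.sh meeting.mp3 --diarize --json | speaker_turns`.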

Requirements

  • curl — for API requests
  • jq — for JSON parsing (optional, but recommended)
  • ElevenLabs API key with Speech-to-Text access
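
For standalone use, these requirements can be checked before the first run. This is a small sketch under stated assumptions: `check_requirements` is a hypothetical function, not something the skill provides, and it only looks at the `ELEVENLABS_API_KEY` environment variable from the Standalone section (a key set in clawdbot.json is not inspected).

```shell
# Hypothetical preflight check for standalone use; not part of the skill.
# Returns non-zero if curl or the API key is missing; jq is only reported as a note.
check_requirements() {
  local status=0
  if ! command -v curl >/dev/null 2>&1; then
    echo "missing: curl (needed for API requests)" >&2
    status=1
  fi
  if ! command -v jq >/dev/null 2>&1; then
    echo "note: jq not found (optional, used for JSON parsing)" >&2
  fi
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "missing: ELEVENLABS_API_KEY is not set" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_requirements && ./scripts/transcribe.sh audio.mp3` runs the transcription only when the hard requirements are met.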

API Key

Get your API key from ElevenLabs:

  1. Sign up or log in
  2. Go to Profile → API Keys
  3. Create a new key or copy an existing one

License

MIT
