doubao-open-tts

✓Verified·Scanned 2/19/2026

This skill provides Text-to-Speech via Doubao (Volcano Engine) with interactive voice selection and CLI/Python APIs. It reads and writes .env using VOLCANO_TTS_APPID, VOLCANO_TTS_ACCESS_TOKEN, VOLCANO_TTS_SECRET_KEY and calls https://openspeech.bytedance.com/api/v1/tts to synthesize audio.

from clawhub.ai·v53b14a7·74.7 KB·0 installs

Scanned from 1.0.2 at 53b14a7

$ vett add clawhub.ai/xdrshjr/doubao-open-tts

Doubao Open TTS SKILL

Overview

This SKILL provides text-to-speech (TTS) capabilities for AI agents using the Doubao (Volcano Engine) API. Agents can convert text into natural-sounding speech and choose from 200+ voices.

Skill Information

Name: doubao-open-tts
Type: Text-to-Speech Synthesis
Provider: Volcano Engine (Doubao)
Version: 1.0.0
Developer: xdrshjr

Capabilities

Core Functions

Text-to-Speech Synthesis
- Convert any text to natural speech
- Support for multiple audio formats (mp3, pcm, wav)
- Adjustable speech speed and volume

Voice Selection
- 200+ voices across multiple categories
- Interactive voice selection workflow
- Support for Chinese and multilingual voices

Voice Categories
- General (Normal & Multilingual with emotions)
- Roleplay (characters, personalities)
- Video Dubbing (cartoon characters)
- Audiobook (storytelling voices)
- Customer Service (professional voices)
- Fun Accents (regional dialects)

Agent Integration

```
# Example: Agent using this SKILL
from skills.doubao_open_tts import VolcanoTTS, get_voice_selection_prompt, find_voice_by_name

# Step 1: Show voice options to user
prompt = get_voice_selection_prompt()
# Agent displays: "Please select a voice..."

# Step 2: User selects voice (e.g., "Shiny")
user_choice = "Shiny"
voice_type, voice_name = find_voice_by_name(user_choice)

# Step 3: Agent synthesizes speech
tts = VolcanoTTS(app_id="...", access_token="...", secret_key="...")
audio_path = tts.synthesize(
    text="Hello, I'm your AI assistant",
    voice_type=voice_type,
    output_file="response.mp3"
)
```

Configuration

Required Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| `app_id` | Volcano Engine App ID | `your_app_id` |
| `access_token` | API Access Token | `your_token` |
| `secret_key` | API Secret Key | `your_secret` |

Optional Parameters

| Parameter | Description | Default | Range |
| --- | --- | --- | --- |
| `voice_type` | Voice identifier | `zh_female_cancan_mars_bigtts` | See voice list |
| `encoding` | Audio format | `mp3` | mp3, pcm, wav |
| `speed` | Speech speed | `1.0` | 0.5 - 2.0 |
| `volume` | Volume level | `1.0` | 0.5 - 2.0 |
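
The defaults and ranges in the table can be checked on the client before a request is sent. The sketch below is not part of the SKILL: `SynthesisOptions` and `_check_range` are hypothetical names, and how `VolcanoTTS` itself handles out-of-range values is not documented here.

```python
from dataclasses import dataclass

# Allowed values copied from the Optional Parameters table.
ALLOWED_ENCODINGS = ("mp3", "pcm", "wav")
SPEED_RANGE = (0.5, 2.0)
VOLUME_RANGE = (0.5, 2.0)


def _check_range(name, value, bounds):
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class SynthesisOptions:
    """Hypothetical container for the optional parameters (an assumption)."""

    voice_type: str = "zh_female_cancan_mars_bigtts"
    encoding: str = "mp3"
    speed: float = 1.0
    volume: float = 1.0

    def __post_init__(self):
        # Validate once at construction so bad values fail early.
        if self.encoding not in ALLOWED_ENCODINGS:
            raise ValueError(
                f"encoding must be one of {ALLOWED_ENCODINGS}, got {self.encoding!r}"
            )
        _check_range("speed", self.speed, SPEED_RANGE)
        _check_range("volume", self.volume, VOLUME_RANGE)
```

Validated values could then be forwarded as keyword arguments, for example `tts.synthesize(text, encoding=opts.encoding, speed=opts.speed, volume=opts.volume)`.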

Setup

Install Dependencies
```
pip install -r requirements.txt
```

Configure Credentials

```
cp .env.example .env
# Edit .env with your Volcano Engine credentials
```
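
Once `.env` is filled in, the three variables can be mapped onto the constructor's required parameters. The following is a minimal sketch using only the standard library; `load_credentials` is a hypothetical helper, and it assumes the values are already in the process environment (for instance after python-dotenv's `load_dotenv()` has run). The variable names are the ones the skill description lists.

```python
import os

# Constructor parameter -> environment variable name.
REQUIRED_VARS = {
    "app_id": "VOLCANO_TTS_APPID",
    "access_token": "VOLCANO_TTS_ACCESS_TOKEN",
    "secret_key": "VOLCANO_TTS_SECRET_KEY",
}


def load_credentials(env=None):
    """Return constructor keyword arguments, or raise if any variable is unset.

    env defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {param: env[var] for param, var in REQUIRED_VARS.items()}
```

The result could be unpacked directly, as in `VolcanoTTS(**load_credentials())`. Failing with the names of the missing variables tends to be easier to debug than an authentication error returned by the API.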

Get API Credentials
- Visit the Volcano Engine Console
- Enable the "Doubao Voice" service
- Create an application to get the AppID, Access Token, and Secret Key

Usage Examples

Basic Synthesis

```
from skills.doubao_open_tts import VolcanoTTS

tts = VolcanoTTS()
audio = tts.synthesize("Hello world", output_file="output.mp3")
```

Interactive Voice Selection

```
from skills.doubao_open_tts import get_voice_selection_prompt, find_voice_by_name

# Get prompt for user
prompt = get_voice_selection_prompt()
print(prompt)

# Parse user selection
voice_type, name = find_voice_by_name("Shiny")
```
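
To illustrate how a name could resolve to a `voice_type`, here is a rough sketch of an alias lookup. It is not the SKILL's `find_voice_by_name`: `VOICE_CATALOG` and `lookup_voice` are hypothetical, the catalog holds only the two identifiers that appear in this README, and exact case-insensitive matching is an assumption about the behavior.

```python
# Hypothetical two-entry catalog; the real SKILL advertises 200+ voices.
VOICE_CATALOG = [
    {"voice_type": "zh_female_cancan_mars_bigtts", "names": ("灿灿", "Shiny")},
    {"voice_type": "zh_male_sunwukong_mars_bigtts", "names": ("猴哥", "Monkey King")},
]


def lookup_voice(query):
    """Return (voice_type, display_name) for an exact alias match, else None.

    Matching ignores case and surrounding whitespace; the display name is the
    first alias listed for the voice.
    """
    needle = query.strip().casefold()
    if not needle:
        return None
    for voice in VOICE_CATALOG:
        for alias in voice["names"]:
            if needle == alias.casefold():
                return voice["voice_type"], voice["names"][0]
    return None
```

With this catalog, `lookup_voice("shiny")` and `lookup_voice("灿灿")` resolve to the same default voice, while an unknown name returns `None` so the caller can ask the user again.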

With Custom Parameters

```
# Reuses the tts instance created in Basic Synthesis
audio = tts.synthesize(
    text="Custom speech",
    voice_type="zh_male_sunwukong_mars_bigtts",  # Monkey King voice
    speed=1.2,
    volume=0.8,
    encoding="wav"
)
```

Voice Categories

Popular Voices

灿灿/Shiny (Default) - General purpose, Chinese/English
猴哥 - Monkey King character voice
快乐小东 - Cheerful male voice
霸道总裁 - Domineering CEO character voice

Category Overview

General-Multilingual: 20+ voices with emotion support
Roleplay: 20+ character voices
Video Dubbing: Cartoon and character voices
Audiobook: Storytelling-optimized voices
Customer Service: Professional service voices
Fun Accents: Regional dialects (Cantonese, Sichuan, etc.)

File Structure

```
doubao-open-tts/
├── README.md              # This file
├── doubao-open-tts.md     # Detailed documentation
├── requirements.txt       # Python dependencies
├── .env.example           # Configuration template
└── scripts/
    ├── tts.py             # Main SKILL implementation
    └── test_tts.py        # Test and example scripts
```

Agent Workflow Integration

Typical Use Case

User Request: "Read this article aloud"
Agent Action:
- Call get_voice_selection_prompt()
- Present voice options to user
User Response: "Use the cheerful female voice"
Agent Action:
- Call find_voice_by_name("cheerful female")
- Get voice_type
- Call tts.synthesize(article_text, voice_type=voice_type)
Result: Return audio file to user
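
The steps above can be sketched as a single function. This is an illustration under stated assumptions: `read_aloud`, `fake_find`, and `fake_synthesize` are hypothetical names; the voice lookup and synthesis calls are injected so the flow runs without network access; and falling back to the default voice when nothing matches is a design choice of this sketch, since the behavior of `find_voice_by_name` on a miss is not documented here.

```python
DEFAULT_VOICE = "zh_female_cancan_mars_bigtts"  # default voice_type from the parameter table


def read_aloud(article_text, user_reply, find_voice, synthesize):
    """Resolve the user's reply to a voice, then synthesize the article.

    In an agent, find_voice and synthesize would be the SKILL's
    find_voice_by_name and VolcanoTTS.synthesize.
    """
    match = find_voice(user_reply)
    # Assumption: a miss is signalled by None or by an empty voice_type.
    voice_type = match[0] if match and match[0] else DEFAULT_VOICE
    return synthesize(article_text, voice_type=voice_type)


# Stand-ins so the sketch can run offline.
def fake_find(name):
    if name == "猴哥":
        return ("zh_male_sunwukong_mars_bigtts", "猴哥")
    return None


def fake_synthesize(text, voice_type):
    # Pretend the audio was written and return its path.
    return f"{voice_type}.mp3"


audio_path = read_aloud("Article text", "猴哥", fake_find, fake_synthesize)
```

Swapping the stand-ins for the real functions keeps the control flow identical, which makes the selection logic easy to test separately from the API.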

Dependencies

volcengine-python-sdk - Volcano Engine SDK
python-dotenv - Environment variable management

License

MIT License

Support

Issues: GitHub Issues
Developer: @xdrshjr
Documentation: See doubao-open-tts.md for complete API reference