omnicaptions-convert

✓Verified·Scanned 2/18/2026

This skill converts between caption formats and provides CLI and Python usage for omnicaptions (e.g., omnicaptions convert, omnicaptions transcribe). It includes shell commands (pip install omni-captions-skills --extra-index-url https://lattifai.github.io/pypi/simple/, omnicaptions transcribe ...) and network calls to https://youtube.com/watch?v=VIDEO_ID.

by lattifai·v22fc757·6.9 KB·70 installs

Scanned from main at 22fc757 · Transparency log ↗

$ vett add lattifai/omni-captions-skills/omnicaptions-convert

Caption Format Conversion

Convert between 30+ caption/caption formats using lattifai-captions.

⚡ YouTube Workflow

# 1. Transcribe YouTube video directly
omnicaptions transcribe "https://youtube.com/watch?v=VIDEO_ID" -o transcript.md

# 2. Convert to any format
omnicaptions convert transcript.md -o output.srt
omnicaptions convert transcript.md -o output.ass
omnicaptions convert transcript.md -o output.vtt

When to Use

Converting SRT to VTT, ASS, TTML, etc.
Converting Gemini markdown transcript to standard caption formats
Converting YouTube VTT (with word-level timestamps) to other formats
Batch format conversion

When NOT to Use

Need transcription (use /omnicaptions:transcribe)
Need translation (use /omnicaptions:translate)

Setup

pip install omni-captions-skills --extra-index-url https://lattifai.github.io/pypi/simple/

Quick Reference

Format	Extension	Read	Write
SRT	`.srt`	✓	✓
VTT	`.vtt`	✓	✓
ASS/SSA	`.ass`	✓	✓
TTML	`.ttml`	✓	✓
Gemini MD	`.md`	✓	✓
JSON	`.json`	✓	✓
TXT	`.txt`	✓	✓

Full list: SRT, VTT, ASS, SSA, TTML, DFXP, SBV, SUB, LRC, JSON, TXT, TSV, Audacity, Audition, FCPXML, EDL, and more.

CLI Usage

# Convert (auto-output to same directory, only changes extension)
omnicaptions convert input.srt -t vtt           # → ./input.vtt
omnicaptions convert transcript.md              # → ./transcript.srt

# Specify output file or directory
omnicaptions convert input.srt -o output/       # → output/input.srt
omnicaptions convert input.srt -o output.vtt    # → output.vtt

# Specify format explicitly
omnicaptions convert input.txt -o out.srt -f txt -t srt

ASS Style Presets

When converting to ASS format, use --style to apply preset styles:

omnicaptions convert input.srt -o output.ass --style default    # White text, bottom
omnicaptions convert input.srt -o output.ass --style top        # White text, top
omnicaptions convert input.srt -o output.ass --style bilingual  # White + Yellow (for bilingual)
omnicaptions convert input.srt -o output.ass --style yellow     # Yellow text, bottom

Preset	Position	Line 1	Line 2	Use Case
`default`	Bottom	White	White	Standard captions
`top`	Top	White	White	When bottom is occupied
`bilingual`	Bottom	White	Yellow	Bilingual captions (原文 + 译文)
`yellow`	Bottom	Yellow	Yellow	High visibility

Bilingual Example

If your SRT has two-line captions like:

1
00:00:01,000 --> 00:00:03,000
Hello World
你好世界

Use --style bilingual or custom colors:

# Preset: white + yellow
omnicaptions convert bilingual.srt -o output.ass --style bilingual

# Custom colors: green English + yellow Chinese
omnicaptions convert bilingual.srt -o output.ass --line1-color "#00FF00" --line2-color "#FFFF00"

# Mix preset with custom line2 color
omnicaptions convert bilingual.srt -o output.ass --style default --line2-color "#FF6600"

Custom Color Options

Option	Description
`--line1-color "#RRGGBB"`	First line (original) color
`--line2-color "#RRGGBB"`	Second line (translation) color

Common colors: #FFFFFF (white), #FFFF00 (yellow), #00FF00 (green), #00FFFF (cyan), #FF6600 (orange)

Font Size and Resolution

Font size is auto-calculated based on video resolution. Resolution is detected from (priority order):

--resolution argument (e.g., 1080p, 4k, 1920x1080)
--video argument (uses ffprobe to detect)
.meta.json file (saved by omnicaptions download)
Default: 1080p

# Auto-detect from .meta.json (saved by download command)
omnicaptions convert abc123.en.srt -o abc123.en.ass --karaoke

# Specify resolution directly
omnicaptions convert input.srt -o output.ass --resolution 4k
omnicaptions convert input.srt -o output.ass --resolution 720p
omnicaptions convert input.srt -o output.ass --resolution 1920x1080

# Detect from video file (uses ffprobe)
omnicaptions convert input.srt -o output.ass --video video.mp4

# Override auto-calculated fontsize
omnicaptions convert input.srt -o output.ass --resolution 4k --fontsize 80

Resolution	PlayRes	Auto FontSize
480p	854×480	24
720p	1280×720	32
1080p	1920×1080	48 (default)
2K	2560×1440	64
4K	3840×2160	96

Karaoke Mode

Generate karaoke subtitles with word-level highlighting. Requires word-level timing (use LaiCut alignment first).

# Basic karaoke (sweep effect - gradual fill)
omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke

# Different effects
omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke sweep    # Gradual fill (default)
omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke instant  # Instant highlight
omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke outline  # Outline then fill

# LRC karaoke (enhanced word timestamps)
omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.lrc --karaoke

Effect	ASS Tag	Description
`sweep`	`\kf`	Gradual fill from left to right (default)
`instant`	`\k`	Instant word highlight
`outline`	`\ko`	Outline fills, then text fills

Karaoke Workflow

# 1. Align with LaiCut (get word-level timing in JSON)
omnicaptions LaiCut audio.mp3 lyrics.txt

# 2. Convert to karaoke ASS
omnicaptions convert lyrics_LaiCut.json -o karaoke.ass --karaoke

# Or combine with style
omnicaptions convert lyrics_LaiCut.json -o karaoke.ass --karaoke --style yellow

Python Usage

from omnicaptions import Caption

# Load any format
cap = Caption.read("input.srt")

# Write to any format
cap.write("output.vtt")
cap.write("output.ass")
cap.write("output.ttml")

Common Mistakes

Mistake	Fix
Format not detected	Use `--from` / `--to` flags
Missing timestamps	Source format must have timing info
Encoding error	Specify `encoding="utf-8"`

Related Skills

Skill	Use When
`/omnicaptions:transcribe`	Need transcript from audio/video
`/omnicaptions:translate`	Translate with Gemini API
`/omnicaptions:translate`	Translate with Claude (no API key)
`/omnicaptions:download`	Download video/captions first

Workflow Examples

# Transcribe → Convert → Translate (with Claude)
/omnicaptions:transcribe video.mp4
/omnicaptions:convert video_GeminiUnd.md -o video.srt
/omnicaptions:translate video.srt -l zh --bilingual