ai-podcast-pipeline
This skill builds end-to-end Korean AI podcast packages from Trend/QuickView-* sources, producing scripts, dual-voice TTS, subtitles, thumbnails, and YouTube metadata. It runs shell commands (e.g., python3 ..., ffmpeg), requires GEMINI_API_KEY/NANO_BANANA_KEY, and calls https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent.
AI Podcast Pipeline
Build end-to-end podcast assets from Trend/QuickView-* content.
Core Workflow
- Select source QuickView file.
- Generate script (full or compressed mode).
- Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
- Generate full-text Korean subtitles (no ellipsis truncation).
- Render subtitle MP4 with tuned font/size/timing shift.
- Build thumbnail + YouTube metadata.
- Deliver final package.
Step 1) Select Source
Prefer weekly file:
/home/tw2/Documents/n8n/data/shared/syn/8.quartz/Trend/QuickView-YYMM-주차주.md
If user gives wk.aiee.app URL, map to local 8.quartz markdown first.
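As a rough illustration of source selection, the sketch below picks the most recently modified QuickView file in a directory. Choosing by modification time is an assumption (the skill only says to prefer the weekly file), and `pick_latest_quickview` is a hypothetical helper, not part of the skill's scripts.

```python
from pathlib import Path
from typing import Optional


def pick_latest_quickview(trend_dir: Path) -> Optional[Path]:
    """Return the most recently modified QuickView-*.md file, or None if there is none."""
    # Assumption: "latest" means newest modification time, not newest week in the name.
    candidates = [p for p in trend_dir.glob("QuickView-*.md") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

If the weekly naming scheme sorts reliably, sorting by file name instead of modification time may be the safer choice.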
Step 2) Generate Script
Read and apply:
references/podcast_prompt_template_ko.md
Modes:
- Full mode: 15~20 minutes
- Compressed mode: 5~7 minutes (core tips only)
Rules:
- no system/meta text in spoken lines
- host intro once at opening only
- conversational Korean, short sentences, actionable
- save script in archive/
Step 3) Build Audio (Gemini Multi-Speaker, Reliable)
Preferred: chunked builder (timeout-safe)
# set once per shell (do not print/share your key)
export GEMINI_API_KEY="<YOUR_GEMINI_API_KEY>"
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/build_dualvoice_audio.py \
--input <script.txt> \
--outdir <outdir> \
--basename podcast_full_dualvoice \
--chunk-lines 6
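The internals of build_dualvoice_audio.py are not shown here. A minimal sketch of what line-based chunking with `--chunk-lines 6` could look like, assuming one spoken line per text line and that blank lines are skipped (`chunk_dialogue` is a hypothetical name):

```python
from typing import Iterator, List


def chunk_dialogue(lines: List[str], chunk_lines: int = 6) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_lines non-empty dialogue lines."""
    if chunk_lines < 1:
        raise ValueError("chunk_lines must be >= 1")
    # Assumption: blank lines carry no speech and can be dropped before chunking.
    spoken = [ln.strip() for ln in lines if ln.strip()]
    for start in range(0, len(spoken), chunk_lines):
        yield spoken[start:start + chunk_lines]
```

Each chunk would then be sent as its own TTS request, which keeps any single request short enough to avoid timeouts.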
Single-pass (short scripts)
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/gemini_multispeaker_tts.py \
--input-file <dialogue.txt> \
--outdir <outdir> \
--basename podcast_dualvoice \
--retries 3 \
--timeout-seconds 120
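How gemini_multispeaker_tts.py implements `--retries` is not documented here. One common pattern is a bounded retry loop with exponential backoff, sketched below. Treating `retries` as the total number of attempts and the backoff schedule are both assumptions, and `call_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `retries` times, doubling the wait after each failure."""
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.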
Default voice mapping (2026-02-10 fixed):
- Callie (female) → Kore
- Nick (male) → Puck
Output: MP3 (default delivery format)
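For orientation, here is a sketch of a request body for the generateContent endpoint using the voice mapping above. The field names follow the public Gemini speech-generation REST documentation as best understood here and should be verified against the current API reference. The model name is left as the `{model}` placeholder, and `build_tts_request` is a hypothetical helper. Sending the request, decoding the audio, and MP3 encoding are omitted.

```python
from typing import Dict, Optional

API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"

# Speaker label in the script -> prebuilt voice name (mapping taken from this skill).
DEFAULT_VOICES = {"Callie": "Kore", "Nick": "Puck"}


def build_tts_request(dialogue: str, voices: Optional[Dict[str, str]] = None) -> dict:
    """Build a multi-speaker TTS request body (assumed schema, verify before use)."""
    voices = dict(DEFAULT_VOICES if voices is None else voices)
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": dialogue}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }
```

Speaker labels in the dialogue text (for example `Callie:` and `Nick:`) would need to match the `speaker` values in the payload.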
Step 4) Build Korean Subtitles (Full Text)
Use full-text subtitle builder (no ... truncation):
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/build_korean_srt.py \
--script <script.txt> \
--audio <final.mp3> \
--output <outdir>/podcast.srt \
--max-chars 22
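To illustrate what "full text, no truncation" means with `--max-chars 22`, the sketch below splits a line into caption pieces of at most 22 characters and keeps every character instead of cutting with "...". Breaking on spaces and hard-splitting overlong words are assumptions. The real build_korean_srt.py may segment differently, and `split_caption` is a hypothetical name.

```python
from typing import List


def split_caption(text: str, max_chars: int = 22) -> List[str]:
    """Split text into pieces of at most max_chars without dropping any content."""
    if max_chars < 1:
        raise ValueError("max_chars must be >= 1")
    pieces: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split rather than truncated.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```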
Step 5) Render Subtitled MP4 (Font + Timing)
Use renderer with adjustable font and timing shift:
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/render_subtitled_video.py \
--image <thumbnail.png> \
--audio <final.mp3> \
--srt <podcast.srt> \
--output <outdir>/final.mp4 \
--font-name "Do Hyeon" \
--font-size 27 \
--shift-ms -250
Notes:
- shift-ms negative = subtitle earlier (for lag fixes)
- If text clipping occurs, lower font-size (e.g., 25~27)
- keep text inside safe area; avoid overlap with character/object
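The effect of a timing shift can be sketched directly on SRT text: every timestamp moves by the same offset, and a negative value makes subtitles appear earlier. Whether render_subtitled_video.py applies the shift this way is not stated. `shift_srt` is a hypothetical helper, and clamping at zero is an assumption.

```python
import re

_TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift_timestamp(match: "re.Match[str]", shift_ms: int) -> str:
    h, m, s, ms = (int(g) for g in match.groups())
    # Clamp at zero so early cues never get a negative time.
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + shift_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text: str, shift_ms: int) -> str:
    """Shift every cue timestamp in srt_text by shift_ms milliseconds."""
    out = []
    for line in srt_text.splitlines():
        if "-->" in line:  # only timing lines, never caption text
            line = _TIMESTAMP.sub(lambda m: _shift_timestamp(m, shift_ms), line)
        out.append(line)
    return "\n".join(out)
```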
Step 6) Build Thumbnail + YouTube Metadata
# set once per shell (do not print/share your key)
export GEMINI_API_KEY="<YOUR_GEMINI_API_KEY>"
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/build_podcast_assets.py \
--source "<QuickView path or URL>"
Reference (layout/copy guardrails):
references/thumbnail_guidelines_ko.md
Step 7) Final Delivery Checklist
Always include:
- source used
- final MP3 path
- subtitle MP4 path + size
- thumbnail path
- YouTube title options (3)
- YouTube description
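The checklist can be assembled mechanically. The sketch below builds a plain-text delivery summary, reading the MP4 size from disk. The output layout and the helper names (`human_size`, `build_delivery_summary`) are hypothetical.

```python
from pathlib import Path
from typing import List


def human_size(num_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def build_delivery_summary(
    source: str,
    mp3: Path,
    mp4: Path,
    thumbnail: Path,
    titles: List[str],
    description: str,
) -> str:
    """Return a text summary covering every item on the delivery checklist."""
    if len(titles) != 3:
        raise ValueError("expected exactly 3 YouTube title options")
    lines = [
        f"Source: {source}",
        f"MP3: {mp3}",
        f"MP4: {mp4} ({human_size(mp4.stat().st_size)})",
        f"Thumbnail: {thumbnail}",
        "YouTube title options:",
    ]
    lines += [f"  {i}. {title}" for i, title in enumerate(titles, 1)]
    lines += ["YouTube description:", description]
    return "\n".join(lines)
```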
Reliability Rules
- Gemini timeout on long input: use chunked builder (build_dualvoice_audio.py)
- Subtitle clipping: reduce font size and increase bottom margin
- Subtitle lag: adjust --shift-ms (usually -150 to -300)
- Keep generated assets under Telegram practical limits
Security Notes
- API keys must be passed via environment variables (GEMINI_API_KEY), not hardcoded.
- Never paste raw keys into prompts, logs, screenshots, or public posts.
- Recent hardening: thumbnail generation now passes keys via env (not CLI args).
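A minimal sketch of the env-not-argv pattern described above, assuming the child script reads the key from its own environment. `run_with_key` is a hypothetical wrapper, not the skill's actual code.

```python
import os
import subprocess
from typing import List


def run_with_key(cmd: List[str], key_name: str = "GEMINI_API_KEY") -> subprocess.CompletedProcess:
    """Run cmd with the API key available only through the child's environment."""
    key = os.environ.get(key_name)
    if not key:
        raise RuntimeError(f"{key_name} is not set")
    # Command-line arguments are visible in process listings, so refuse to leak the key there.
    if any(key in arg for arg in cmd):
        raise ValueError("API key must not appear in command-line arguments")
    return subprocess.run(
        cmd,
        env={**os.environ, key_name: key},
        check=True,
        capture_output=True,
        text=True,
    )
```

Because the key never enters `cmd`, it stays out of shell history and process listings. Captured output should still be reviewed before it is logged.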
References
- references/podcast_prompt_template_ko.md
- references/workflow_runbook.md
- references/thumbnail_guidelines_ko.md