ai-podcast-pipeline

Review·Scanned 2/18/2026

This skill builds end-to-end Korean AI podcast packages from Trend/QuickView-* sources, producing scripts, dual-voice TTS, subtitles, thumbnails, and YouTube metadata. It runs shell commands (e.g., python3 ..., ffmpeg), requires GEMINI_API_KEY/NANO_BANANA_KEY, and calls https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent.

from clawhub.ai · v2427962 · 41.1 KB · 0 installs
Scanned from 0.1.1 at 2427962
$ vett add clawhub.ai/jeong-wooseok/ai-podcast-pipeline

AI Podcast Pipeline

Build end-to-end podcast assets from Trend/QuickView-* content.

Core Workflow

  1. Select source QuickView file.
  2. Generate script (full or compressed mode).
  3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
  4. Generate full-text Korean subtitles (no ellipsis truncation).
  5. Render subtitle MP4 with tuned font/size/timing shift.
  6. Build thumbnail + YouTube metadata.
  7. Deliver final package.

Step 1) Select Source

Prefer the weekly file:

  • /home/tw2/Documents/n8n/data/shared/syn/8.quartz/Trend/QuickView-YYMM-주차주.md

If the user gives a wk.aiee.app URL, map it to the local 8.quartz markdown file first.
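
A minimal sketch of source selection, assuming that "the weekly file" can be approximated by the most recently modified QuickView-*.md in the Trend directory. The function name pick_latest_quickview and the use of modification time are illustrative assumptions, not part of the skill's scripts; the URL-to-path mapping is not shown because the source does not describe the URL layout.

```python
from pathlib import Path
from typing import Optional


def pick_latest_quickview(trend_dir: str) -> Optional[Path]:
    """Return the most recently modified QuickView-*.md file, or None if absent.

    Hypothetical helper: modification time stands in for "this week's file".
    """
    candidates = list(Path(trend_dir).glob("QuickView-*.md"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

If the newest file is not the one wanted, passing the exact path to the later steps is the safer route.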

Step 2) Generate Script

Read and apply:

  • references/podcast_prompt_template_ko.md

Modes:

  • Full mode: 15 to 20 minutes
  • Compressed mode: 5 to 7 minutes (core tips only)

Rules:

  • no system/meta text in spoken lines
  • host intro once at opening only
  • conversational Korean, short sentences, actionable
  • save script in archive/

Step 3) Build Audio (Gemini Multi-Speaker, Reliable)

Preferred: chunked builder (timeout-safe)

# set once per shell (do not print/share your key)
export GEMINI_API_KEY="<YOUR_GEMINI_API_KEY>"
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/build_dualvoice_audio.py \
  --input <script.txt> \
  --outdir <outdir> \
  --basename podcast_full_dualvoice \
  --chunk-lines 6
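
To illustrate what --chunk-lines 6 implies, here is a rough sketch of splitting a dialogue script into groups of at most six spoken lines so that each TTS request stays short. The function chunk_dialogue is hypothetical; how build_dualvoice_audio.py actually chunks its input is not shown in the source.

```python
from typing import List


def chunk_dialogue(lines: List[str], chunk_lines: int = 6) -> List[List[str]]:
    """Group non-empty dialogue lines into chunks of at most chunk_lines each."""
    if chunk_lines < 1:
        raise ValueError("chunk_lines must be at least 1")
    # Blank lines carry no speech, so they are dropped before grouping.
    spoken = [ln.strip() for ln in lines if ln.strip()]
    return [spoken[i:i + chunk_lines] for i in range(0, len(spoken), chunk_lines)]
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated in order.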

Single-pass (short scripts)

python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/gemini_multispeaker_tts.py \
  --input-file <dialogue.txt> \
  --outdir <outdir> \
  --basename podcast_dualvoice \
  --retries 3 \
  --timeout-seconds 120
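
The --retries flag suggests a retry loop around the TTS request. A possible shape, assuming one initial attempt plus up to `retries` further attempts with exponential backoff (the script's real retry policy is not documented in the source, and call_with_retries is a hypothetical name):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:  # a real caller would narrow this to timeouts
            last_error = exc
            if attempt < retries:
                # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * (2 ** attempt))
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable only so the loop can be exercised without waiting.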

Default voice mapping (fixed as of 2026-02-10):

  • Callie (female) → Kore
  • Nick (male) → Puck
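
For orientation, the mapping above could be expressed as a generateContent request body roughly like this. The field names follow the publicly documented Gemini multi-speaker speech configuration as best understood here; whether gemini_multispeaker_tts.py builds its request this way is an assumption, and build_tts_request is a hypothetical helper.

```python
from typing import Dict

# Speaker name in the script -> Gemini prebuilt voice name (from the mapping above).
VOICE_MAP: Dict[str, str] = {"Callie": "Kore", "Nick": "Puck"}


def build_tts_request(dialogue: str, voice_map: Dict[str, str] = VOICE_MAP) -> dict:
    """Build a JSON-serializable request body for multi-speaker audio output."""
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voice_map.items()
    ]
    return {
        "contents": [{"parts": [{"text": dialogue}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }
```

The speaker labels in the dialogue text would need to match the keys of the mapping for the voices to be applied.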

Output: MP3 (default delivery format)

Step 4) Build Korean Subtitles (Full Text)

Use the full-text subtitle builder (lines are never truncated with "..."):

python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/build_korean_srt.py \
  --script <script.txt> \
  --audio <final.mp3> \
  --output <outdir>/podcast.srt \
  --max-chars 22
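
One way to honor --max-chars 22 without dropping text is to wrap each spoken line into several captions instead of cutting it off. The sketch below is an illustration under that assumption; split_caption is a hypothetical function and the real splitting logic in build_korean_srt.py may differ.

```python
from typing import List


def split_caption(text: str, max_chars: int = 22) -> List[str]:
    """Split text into pieces of at most max_chars, breaking at spaces when possible."""
    pieces: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split so nothing is lost.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Every character of the input ends up in some caption, which is the property the "full text" requirement asks for.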

Step 5) Render Subtitled MP4 (Font + Timing)

Use the renderer, which supports an adjustable font and timing shift:

python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/render_subtitled_video.py \
  --image <thumbnail.png> \
  --audio <final.mp3> \
  --srt <podcast.srt> \
  --output <outdir>/final.mp4 \
  --font-name "Do Hyeon" \
  --font-size 27 \
  --shift-ms -250

Notes:

  • A negative --shift-ms value shows subtitles earlier (use it to fix lag)
  • If text is clipped, lower --font-size (e.g., 25 to 27)
  • Keep text inside the safe area; avoid overlap with the character/object
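
The effect of a timing shift can be sketched as a pure text transformation on the SRT file. This is an illustration only, assuming standard HH:MM:SS,mmm timestamps; shift_srt is a hypothetical function and is not taken from render_subtitled_video.py.

```python
import re

_TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _format_ms(total_ms: int) -> str:
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def _shift_match(match: "re.Match[str]", shift_ms: int) -> str:
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + shift_ms
    # Timestamps cannot go below zero, so early cues are clamped.
    return _format_ms(max(0, total))


def shift_srt(srt_text: str, shift_ms: int) -> str:
    """Shift every cue time by shift_ms; negative values show subtitles earlier."""
    out = []
    for line in srt_text.split("\n"):
        # Only timing lines are rewritten, so timestamps quoted in captions survive.
        if "-->" in line:
            line = _TIMESTAMP.sub(lambda mt: _shift_match(mt, shift_ms), line)
        out.append(line)
    return "\n".join(out)
```

With shift_ms set to -250, a cue that started at one second would start at 750 ms instead.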

Step 6) Build Thumbnail + YouTube Metadata

# set once per shell (do not print/share your key)
export GEMINI_API_KEY="<YOUR_GEMINI_API_KEY>"
python3 /home/tw2/.openclaw/workspace/skills/ai-podcast-pipeline/scripts/build_podcast_assets.py \
  --source "<QuickView path or URL>"

Reference (layout/copy guardrails):

  • references/thumbnail_guidelines_ko.md

Step 7) Final Delivery Checklist

Always include:

  1. source used
  2. final MP3 path
  3. subtitle MP4 path + size
  4. thumbnail path
  5. YouTube title options (3)
  6. YouTube description
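
The checklist could be assembled programmatically, for example as a small dictionary that also records the MP4 size. This is a sketch under assumptions: build_delivery_summary and its field names are hypothetical, and the skill itself does not prescribe a summary format.

```python
import os
from typing import Dict, List


def build_delivery_summary(
    source: str,
    mp3_path: str,
    mp4_path: str,
    thumbnail_path: str,
    titles: List[str],
    description: str,
) -> Dict[str, object]:
    """Collect the six checklist items, including the MP4 size in MB."""
    if len(titles) != 3:
        raise ValueError("the checklist asks for exactly 3 title options")
    size_mb = round(os.path.getsize(mp4_path) / (1024 * 1024), 2)
    return {
        "source": source,
        "mp3": mp3_path,
        "mp4": mp4_path,
        "mp4_size_mb": size_mb,
        "thumbnail": thumbnail_path,
        "titles": list(titles),
        "description": description,
    }
```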

Reliability Rules

  • Gemini timeout on long input: use chunked builder (build_dualvoice_audio.py)
  • Subtitle clipping: reduce font size and increase bottom margin
  • Subtitle lag: adjust --shift-ms (usually -150 to -300)
  • Keep generated assets within Telegram's practical file-size limits
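
The clipping rule above (smaller font, larger bottom margin) maps naturally onto ffmpeg's subtitles filter, whose force_style option accepts ASS style fields such as FontName, FontSize, and MarginV. The sketch below only builds a command line and assumes simple file paths without characters that need filter escaping; build_ffmpeg_command and the margin_v default of 40 are hypothetical, and the actual renderer script may invoke ffmpeg differently.

```python
from typing import List


def build_ffmpeg_command(
    image: str,
    audio: str,
    srt: str,
    output: str,
    font_name: str = "Do Hyeon",
    font_size: int = 27,
    margin_v: int = 40,  # hypothetical default; raise it to lift subtitles
) -> List[str]:
    """Build an ffmpeg argv that burns subtitles over a still image with audio."""
    style = f"FontName={font_name},FontSize={font_size},MarginV={margin_v}"
    subtitle_filter = f"subtitles={srt}:force_style='{style}'"
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # repeat the still image as video
        "-i", audio,
        "-vf", subtitle_filter,
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",
        "-shortest",                 # stop when the audio ends
        output,
    ]
```

Passing the result to subprocess.run as a list avoids shell quoting problems with the font name.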

Security Notes

  • API keys must be passed via environment variables (GEMINI_API_KEY), not hardcoded.
  • Never paste raw keys into prompts, logs, screenshots, or public posts.
  • Recent hardening: thumbnail generation now passes keys via env (not CLI args).
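
The difference between passing a key via the environment and via CLI arguments can be sketched as follows. This is an illustration, not the skill's code: require_key and run_with_key are hypothetical helpers, and the point is only that the key never appears in the argument list, where process listings could expose it.

```python
import os
import subprocess
from typing import List


def require_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the key from the environment, failing fast if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        # The message names the variable but never echoes a key value.
        raise RuntimeError(f"{name} is not set; export it in the shell first")
    return key


def run_with_key(cmd: List[str], name: str = "GEMINI_API_KEY") -> subprocess.CompletedProcess:
    """Run cmd with the key in the child's environment, not in its arguments."""
    env = dict(os.environ)
    env[name] = require_key(name)
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
```

The child process would read the key with os.environ, the same way the parent does.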

References

  • references/podcast_prompt_template_ko.md
  • references/workflow_runbook.md
  • references/thumbnail_guidelines_ko.md