yt-to-blog
This skill automates turning a YouTube URL into a blog, Substack draft, X thread, and HeyGen vertical videos. It reads/writes skills/yt-content-engine/config.json, runs shell commands like bash skills/yt-content-engine/setup.sh, and makes network calls to https://api.heygen.com.
YT-to-Blog Content Engine
YouTube URL → blog post + Substack + tweets + vertical video clips. The whole content machine.
Pipeline Overview
YouTube URL
↓
① Transcript (summarize CLI)
↓
② Blog Draft (AI-written in your voice)
↓
③ Substack Publish (browser automation)
↓
④ X/Twitter Post (bird CLI)
↓
④b Facebook Group (optional reminder)
↓
⑤ Script Splitter (extract hook moments)
↓
⑥ HeyGen Videos (AI avatar vertical clips)
↓
⑦ Post-Processing (ffmpeg crop/scale)
↓
📁 Output Folder (blog.md, videos, tweet.txt, URLs)
One URL in → Five platforms out. Run the whole thing or any step individually.
First-Time Setup Wizard
Walk the user through this on first use. It takes ~10 minutes once, then never again.
Step 1: Check Dependencies
Run the setup script to check what's installed:
bash skills/yt-content-engine/setup.sh
Required CLIs:
| Tool | Purpose | Install |
|---|---|---|
| summarize | YouTube transcript extraction | brew install steipete/tap/summarize |
| bird | X/Twitter posting | brew install steipete/tap/bird |
| ffmpeg | Video post-processing | brew install ffmpeg |
| curl | API calls to HeyGen | Usually pre-installed on macOS |
| python3 | Helper scripts | Usually pre-installed on macOS |
If anything is missing, tell the user what to install and wait for confirmation.
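For a quick manual probe (a sketch; setup.sh remains the authoritative check):

```bash
# Report any tool from the table above that isn't on PATH
for tool in summarize bird ffmpeg curl python3; do
  command -v "$tool" >/dev/null || echo "❌ missing: $tool"
done
```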
Step 2: HeyGen API Key
- Tell the user: "Go to https://app.heygen.com/settings — grab your API key from the API section."
- If they don't have a HeyGen account: "Sign up at https://heygen.com — the free tier gives you a few credits to test with."
- Save the key to config.json (see config schema below).
- Test it:
curl -s -H "X-Api-Key: API_KEY_HERE" https://api.heygen.com/v2/avatars | python3 -c "import sys,json; d=json.load(sys.stdin); print('✅ API key works!' if 'data' in d else '❌ Invalid key')"
Step 3: HeyGen Avatar Setup
Tell the user:
"For vertical video clips, you need a HeyGen avatar. Here's what matters:
Record in PORTRAIT mode (hold your phone vertically). This is critical — if you record landscape, the avatar will be a small strip in the center of a 9:16 frame and we'll need to crop/scale it (which works but loses quality).
Go to https://app.heygen.com/avatars → Create Instant Avatar → follow their recording guide. Stand in good lighting, look at camera, speak naturally for 2+ minutes.
Once created, grab your Avatar ID from the avatar details page."
List their existing avatars to help them pick. Note: the avatars endpoint returns both custom and stock avatars — filter for the user's custom ones (they typically appear first and have personal names):
curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/avatars | python3 -c "
import sys, json
data = json.load(sys.stdin)
for a in data.get('data', {}).get('avatars', []):
print(f\" {a['avatar_id']} — {a.get('avatar_name', 'unnamed')}\")
"
Step 4: HeyGen Voice Clone
Tell the user:
"Go to https://app.heygen.com/voice-clone → Clone your voice. Upload a clean audio sample (1-2 min of you speaking naturally). HeyGen will create a voice ID.
Once done, grab your Voice ID from the voice settings."
List their voices. User's cloned voices typically appear first; stock voices come after:
curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/voices | python3 -c "
import sys, json
data = json.load(sys.stdin)
for v in data.get('data', {}).get('voices', []):
print(f\" {v['voice_id']} — {v.get('name', 'unnamed')} ({v.get('language', '?')})\")
"
⚠️ IMPORTANT: Use the FULL voice_id (e.g., 69da9c9bca78499b98fdac698d2a20cd), not a truncated version. The API will return "Voice validation failed" if you use a shortened ID.
Step 5: Substack Login
Substack has no API — posting requires browser automation.
- Open the OpenClaw managed browser: use the browser tool with profile="openclaw"
- Navigate to https://substack.com/sign-in
- Help the user log in with their credentials
- Verify access by navigating to their publication dashboard
- Save the publication URL to config.json
The browser session persists across restarts. One-time setup.
Step 6: Save Config
Create skills/yt-content-engine/config.json (relative to your workspace):
{
"heygen": {
"apiKey": "YOUR_API_KEY",
"avatarId": "YOUR_AVATAR_ID",
"voiceId": "YOUR_VOICE_ID"
},
"substack": {
"publication": "yourblog.substack.com"
},
"twitter": {
"handle": "@yourhandle"
},
"author": {
"voice": "Description of your writing voice and style",
"name": "Your Name"
},
"video": {
"clipCount": 5,
"maxClipSeconds": 60,
"cropMode": "auto"
}
}
Tip: If the user already has a voice guide from the yt-to-blog skill, read it from skills/yt-to-blog/references/voice-guide.md and use it for the author.voice field.
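Before verifying, a quick sanity check on the required keys can catch typos (a sketch; key names follow the schema above):

```bash
python3 - <<'EOF'
import json
cfg = json.load(open('skills/yt-content-engine/config.json'))
required = [('heygen', 'apiKey'), ('heygen', 'avatarId'), ('heygen', 'voiceId'),
            ('substack', 'publication'), ('twitter', 'handle')]
missing = [f'{a}.{b}' for a, b in required if not cfg.get(a, {}).get(b)]
print('✅ config complete' if not missing else '❌ missing: ' + ', '.join(missing))
EOF
```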
Step 7: Verify Everything
Run the setup script with the config in place:
bash skills/yt-content-engine/setup.sh
It will test each component and report status.
How to Invoke
Full Pipeline
"Turn this into a full content suite: https://youtu.be/XXXXX"
"Content engine this video: [URL]"
"Run the full pipeline on [URL]"
Individual Steps
"Just get me the transcript from [URL]"
"Write a blog post from [URL]" (steps 1-2)
"Post this to Substack" (step 3, after blog exists)
"Tweet about this blog post" (step 4)
"Generate video clips from this blog" (steps 5-7)
"Just split this into scripts" (step 5 only)
Pipeline Steps
Step ①: Transcript
Create the output directory for this run, then fetch the YouTube transcript:
mkdir -p /tmp/yt-content-engine/output-$(date +%Y-%m-%d)/scripts
mkdir -p /tmp/yt-content-engine/output-$(date +%Y-%m-%d)/videos
summarize "YOUTUBE_URL" --extract > /tmp/yt-content-engine/transcript.txt
The --extract flag prints the raw transcript without LLM summarization. Read the output. If it fails (no captions available), try with --youtube yt-dlp for auto-generated captions, or tell the user and suggest they provide a manual transcript.
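A fetch-with-fallback sketch tying those two attempts together (the empty-file guard is an assumption, not part of the original flow):

```bash
URL="https://youtu.be/XXXXX"
OUT=/tmp/yt-content-engine/transcript.txt
# Try normal captions first; fall back to yt-dlp auto-generated captions
if ! summarize "$URL" --extract > "$OUT" || [ ! -s "$OUT" ]; then
  summarize "$URL" --extract --youtube yt-dlp > "$OUT"
fi
```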
Step ②: Blog Draft
Transform the transcript into a polished long-form blog post.
Load the author voice from config.json → author.voice. If a more detailed voice guide exists at skills/yt-to-blog/references/voice-guide.md, read and use that too.
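To pull both sources in one go (a sketch; the 2>/dev/null just silences a missing guide file):

```bash
python3 -c "import json; print(json.load(open('skills/yt-content-engine/config.json'))['author'].get('voice', ''))"
cat skills/yt-to-blog/references/voice-guide.md 2>/dev/null
```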
Analysis phase — before writing, extract from the transcript:
- Core thesis — the single strongest argument or revelation
- Key data points — statistics, quotes, dates, names
- Narrative moments — anecdotes, examples, scenes
- Source links — URLs, studies, references mentioned
- Missing context — what does the reader need that the video assumed?
Writing structure:
- Cold open (1-3 paragraphs): Scene-setting. Specific, sensory, emotional hook before data.
- Thesis pivot (1 paragraph): Connect scene to the bigger story.
- Data body (5-15 paragraphs): Alternate data and editorial. Each stat gets a punch line. Subheadings for major breaks only.
- Callback (1-2 paragraphs): Return to opening scene/metaphor.
- Closing (3-6 short paragraphs): Escalating fragments. Final hammer line.
Writing rules:
- Vary sentence length dramatically — long data sentences, then short punches
- Em dashes for asides, not parentheses
- Sentence fragments for emphasis
- No bullet lists in the body — narrative flow
- Inline source links, no footnotes
- No "in conclusion" or "to summarize"
- Credit video source naturally: "As [Name] put it..." with link
- Target: 1,500-3,000 words
Generate 3-5 headline options with distinct strategies (contrast/irony, revelation, moral framing, callback). Each with a subtitle. Let the user pick.
Save the final draft to the output folder as blog.md.
Step ③: Substack Publish
Post the blog to Substack via browser automation.
- Read config.json → substack.publication
- Open the managed browser (profile="openclaw")
- Navigate to https://PUBLICATION.substack.com/publish/post
- Click the title field, type the title
- Click the subtitle area, type the subtitle
- Click the body area
- Write the markdown to a temp file, copy it to the clipboard (pbcopy < /tmp/post.md), paste into the editor (Meta+v)
- Substack auto-saves as draft
Known issues:
- Em dashes (—) may garble as ,Äî during clipboard paste → find/replace after paste (a pre-paste workaround is sketched below)
- Large posts: pause briefly between paste and verification
- Verify the draft at https://PUBLICATION.substack.com/publish
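One pre-paste workaround (an assumption, not verified against every Substack editor build): downgrade em dashes to a paste-safe "--" before copying, then restore them with the editor's find/replace:

```bash
# Swap U+2014 for "--" before the text hits the clipboard
python3 -c "import sys; sys.stdout.write(sys.stdin.read().replace('\u2014', '--'))" < /tmp/post.md | pbcopy
```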
Default: save as draft. Only publish if the user explicitly says "publish it" — always confirm first.
Save the Substack URL to output/substack-url.txt.
Step ④: X/Twitter Post
Compose and post using the bird CLI.
Compose the tweet/thread:
- If the blog has a single killer hook → single tweet with link
- If there are multiple strong points → thread (3-5 tweets)
- Include the Substack URL
- Match the author's voice but punchier — tweets are hooks, not summaries
- Use the handle from
config.json→twitter.handle
Post with bird:
# Single tweet
bird tweet "Your tweet text here"
# Thread (post first tweet, then reply to it)
bird tweet "Tweet 1 text here"
# Note the returned tweet ID, then:
bird reply TWEET_ID "Tweet 2 text here"
# And chain:
bird reply TWEET_2_ID "Tweet 3 text here"
Always show the user the tweet text before posting and get confirmation.
Save tweet text to output/tweet.txt.
Step ④b: Facebook Group (Optional)
If config.json includes a facebook.group URL, remind the user to post to their Facebook Group.
Note: Facebook Group API posting is heavily restricted. Browser automation is unreliable due to Facebook's anti-bot measures. Best approach:
- Draft a Facebook post version of the content (shorter, more casual than tweet)
- Save to output/facebook-post.txt
- Remind the user: "Don't forget to post to [Group Name] — here's your draft"
- User posts manually
This keeps Facebook distribution in the workflow without fighting their API restrictions.
Step ⑤: Script Splitter
Extract 3-5 "hook moments" from the blog post and rewrite each as a spoken-word script for vertical video.
What to look for (scan the blog for these patterns):
- Hook/Controversy — the most provocative claim, the thing that makes people stop scrolling
- Data Bomb — a surprising statistic or fact that reframes understanding
- Counterintuitive Take — something that contradicts conventional wisdom
- Emotional Moment — a story, anecdote, or human element that creates connection
- Call-to-Action Closer — a rallying cry, challenge, or "what you should do now"
Not every blog will have all five. Extract what's there. Minimum 3 clips.
Rewrite rules for spoken delivery:
- Hook first — open with the most attention-grabbing line. No preamble.
- Conversational — write for speaking, not reading. Contractions, natural rhythm.
- 30-60 seconds each — roughly 75-150 words per clip
- Self-contained — each clip must work on its own, no "as I mentioned earlier"
- End with punch — close on the strongest line, not a trailing thought
- No stage directions — just the words to speak, nothing else
Format each script:
CLIP 1: [descriptive title]
---
[Script text here, 75-150 words]
Use config.json → video.clipCount for the target number of clips (default: 5).
Use config.json → video.maxClipSeconds for max duration (default: 60).
Save scripts to output/scripts/clip-1.txt, clip-2.txt, etc.
Step ⑥: HeyGen Video Generation
Submit each script to HeyGen API v2 to generate AI avatar videos.
Read config:
# Parse config.json (path is relative to the workspace root)
CONFIG=skills/yt-content-engine/config.json
API_KEY=$(python3 -c "import json; c=json.load(open('$CONFIG')); print(c['heygen']['apiKey'])")
AVATAR_ID=$(python3 -c "import json; c=json.load(open('$CONFIG')); print(c['heygen']['avatarId'])")
VOICE_ID=$(python3 -c "import json; c=json.load(open('$CONFIG')); print(c['heygen']['voiceId'])")
For each script, submit a video generation request:
curl -s -X POST "https://api.heygen.com/v2/video/generate" \
-H "X-Api-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"video_inputs": [{
"character": {
"type": "avatar",
"avatar_id": "'"$AVATAR_ID"'",
"avatar_style": "normal"
},
"voice": {
"type": "text",
"input_text": "'"$(cat output/scripts/clip-1.txt)"'",
"voice_id": "'"$VOICE_ID"'"
}
}],
"dimension": {
"width": 1080,
"height": 1920
}
}'
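Note: splicing $(cat …) straight into the JSON breaks as soon as a script contains double quotes or newlines, which spoken-word scripts usually do. A safer sketch, building the payload with python3's json module before posting:

```bash
# Build the request body safely, then POST it
PAYLOAD=$(python3 - "$AVATAR_ID" "$VOICE_ID" output/scripts/clip-1.txt <<'EOF'
import json, sys
avatar_id, voice_id, script_path = sys.argv[1], sys.argv[2], sys.argv[3]
print(json.dumps({
    "video_inputs": [{
        "character": {"type": "avatar", "avatar_id": avatar_id, "avatar_style": "normal"},
        "voice": {"type": "text", "input_text": open(script_path).read().strip(), "voice_id": voice_id},
    }],
    "dimension": {"width": 1080, "height": 1920},
}))
EOF
)
curl -s -X POST "https://api.heygen.com/v2/video/generate" \
  -H "X-Api-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```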
Parse the response to get video_id:
import json
response = json.loads(response_text)  # response_text = the JSON body returned by the curl above
video_id = response["data"]["video_id"]
Submit ALL clips before polling. HeyGen renders in parallel — submit all scripts first, collect all video_ids, then poll them all. This cuts total render time from N×3min to ~3min.
Poll for completion (every 15 seconds, timeout after 10 minutes):
curl -s -H "X-Api-Key: $API_KEY" \
"https://api.heygen.com/v1/video_status.get?video_id=$VIDEO_ID" \
| python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['status'], d.get('video_url',''))"
Statuses: pending → processing → completed (with video_url) or failed (with error).
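A minimal polling-loop sketch (15 s interval, 40 attempts ≈ 10 min timeout; assumes $API_KEY and $VIDEO_ID are already set):

```bash
for attempt in $(seq 1 40); do
  STATUS=$(curl -s -H "X-Api-Key: $API_KEY" \
    "https://api.heygen.com/v1/video_status.get?video_id=$VIDEO_ID" \
    | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['status'])")
  echo "poll $attempt: $STATUS"
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then break; fi
  sleep 15
done
```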
Download completed videos:
curl -L -o "output/videos/clip-1-raw.mp4" "$VIDEO_URL"
Credit note: ~1 credit per minute of video, so a typical 5-clip run of 30-60 second clips lands around 3-5 credits. Warn the user about credit usage before submitting.
Step ⑦: Video Post-Processing
If the avatar was recorded in landscape (common), the 9:16 video will show a small avatar strip centered in a large frame with background fill. Fix this with ffmpeg.
Check config.json → video.cropMode:
"auto"— detect and crop automatically"portrait"— skip cropping (avatar was recorded in portrait)"manual"— ask user for crop coordinates
Auto-crop pipeline:
# 1. Extract a single frame to inspect the layout
ffmpeg -i input.mp4 -vframes 1 -y /tmp/frame.png
# 2. Use ffmpeg cropdetect to find content bounds
ffmpeg -i input.mp4 -vf "cropdetect=24:16:0" -frames:v 30 -f null - 2>&1 | grep cropdetect
# Parse the crop values from output: crop=W:H:X:Y
# 3. Crop content strip, scale up, center-crop to 1080x1920
ffmpeg -i input.mp4 \
-vf "crop=DETECTED_W:DETECTED_H:DETECTED_X:DETECTED_Y,scale=1080:-1,crop=1080:1920:0:(ih-1920)/2" \
-c:a copy \
-y output.mp4
Alternative manual detection (preferred — cropdetect often fails when background is white/light):
HeyGen typically renders landscape avatars centered on a white/light background in the 9:16 frame. Scan the center column for non-white pixels to find the actual content strip:
# Extract a frame, then scan center column for content bounds
ffmpeg -y -ss 5 -i input.mp4 -frames:v 1 /tmp/frame.png 2>/dev/null
ffmpeg -y -i /tmp/frame.png -vf "crop=1:ih:iw/2:0,format=gray" -f rawvideo -pix_fmt gray - 2>/dev/null | \
python3 -c "
import sys
data = sys.stdin.buffer.read()
first = last = None
for i, b in enumerate(data):
if b < 240: # Non-white pixel = actual content
if first is None: first = i
last = i
if first is not None:
print(f'CONTENT_Y={first}')
print(f'CONTENT_HEIGHT={last - first}')
print(f'CENTER={( first + last) // 2}')
else:
print('No content bounds detected — avatar may already fill the frame')
"
Then crop the content strip, scale proportionally to fill width, and center-crop to 9:16:
ffmpeg -y -i input.mp4 \
-vf "crop=iw:CONTENT_HEIGHT:0:CONTENT_Y,scale=-1:1920,crop=1080:1920:(ow-1080)/2:0" \
-c:v libx264 -crf 23 -preset fast -c:a aac \
output.mp4
Proven crop values for common HeyGen landscape avatars (1080x1920 canvas):
- Content strip typically at y≈656, height≈607px
- Example: crop=1080:607:0:656,scale=3413:1920,crop=1080:1920:1166:0
- Always detect per-video — avatar placement can shift (see the batch sketch below)
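A batch sketch that applies the detection and crop above to every raw clip (the eval wiring and loop are ours, not from the original steps; detection can fail on edge cases, so spot-check the output):

```bash
# For each raw clip: grab a frame, detect the content strip, crop to 9:16
for raw in output/videos/clip-*-raw.mp4; do
  out="${raw/-raw/}"
  ffmpeg -y -ss 5 -i "$raw" -frames:v 1 /tmp/frame.png 2>/dev/null
  # Sets CONTENT_Y and CONTENT_HEIGHT via the center-column scan from above
  eval "$(ffmpeg -y -i /tmp/frame.png -vf 'crop=1:ih:iw/2:0,format=gray' -f rawvideo -pix_fmt gray - 2>/dev/null \
    | python3 -c "
import sys
rows = [i for i, b in enumerate(sys.stdin.buffer.read()) if b < 240]
if rows: print(f'CONTENT_Y={rows[0]} CONTENT_HEIGHT={rows[-1] - rows[0]}')
")"
  ffmpeg -y -i "$raw" \
    -vf "crop=iw:${CONTENT_HEIGHT}:0:${CONTENT_Y},scale=-1:1920,crop=1080:1920:(ow-1080)/2:0" \
    -c:v libx264 -crf 23 -preset fast -c:a aac "$out"
done
```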
Save processed videos to output/videos/clip-1.mp4, clip-2.mp4, etc.
If crop mode is portrait, just copy the raw files:
cp output/videos/clip-1-raw.mp4 output/videos/clip-1.mp4
Step ⑧: Output
Organize everything in a dated output folder:
output-YYYY-MM-DD/
├── blog.md # Final blog post
├── tweet.txt # Tweet text (posted or ready to post)
├── substack-url.txt # URL of Substack draft/post
├── scripts/
│ ├── clip-1.txt # Spoken word scripts
│ ├── clip-2.txt
│ └── ...
├── videos/
│ ├── clip-1.mp4 # Final processed vertical videos
│ ├── clip-2.mp4
│ └── ...
└── manifest.json # Run metadata
manifest.json:
{
"source": "https://youtu.be/XXXXX",
"date": "2026-02-03",
"blog": "blog.md",
"substackUrl": "https://...",
"tweetUrl": "https://...",
"clips": ["clip-1.mp4", "clip-2.mp4", "..."],
"heygenCreditsUsed": 3
}
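A sketch for writing it (field values mirror the example above; the source and tweet URLs are placeholders to fill in per run):

```bash
OUT="output-$(date +%Y-%m-%d)"
python3 - "$OUT" <<'EOF'
import glob, json, os, sys
out = sys.argv[1]
url_file = f"{out}/substack-url.txt"
manifest = {
    "source": "https://youtu.be/XXXXX",   # the run's input URL
    "date": out.replace("output-", ""),
    "blog": "blog.md",
    "substackUrl": open(url_file).read().strip() if os.path.exists(url_file) else "",
    "tweetUrl": "https://...",            # fill in after posting
    "clips": sorted(os.path.basename(p) for p in glob.glob(f"{out}/videos/clip-*.mp4")),
    "heygenCreditsUsed": 3,
}
json.dump(manifest, open(f"{out}/manifest.json", "w"), indent=2)
EOF
```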
Report the summary to the user:
- ✅ Blog post: X words
- ✅ Substack: [URL] (draft/published)
- ✅ Tweet: posted / ready to post
- ✅ X video clips generated and processed
- 💰 HeyGen credits used: ~X
Config Reference
Config file: skills/yt-content-engine/config.json (relative to workspace root)
| Key | Description | Default |
|---|---|---|
| heygen.apiKey | HeyGen API key | Required |
| heygen.avatarId | Your HeyGen avatar ID | Required |
| heygen.voiceId | Your cloned voice ID | Required |
| substack.publication | Substack subdomain | Required |
| twitter.handle | X/Twitter handle | Required |
| author.voice | Writing style description | Recommended |
| author.name | Author name for attribution | Recommended |
| video.clipCount | Number of clips to generate | 5 |
| video.maxClipSeconds | Max seconds per clip | 60 |
| video.cropMode | auto, portrait, or manual | auto |
Tips & Troubleshooting
- HeyGen rendering takes 2-3 min per clip. Submit all clips in parallel (Step ⑥) and a 5-clip run renders in roughly 3-5 minutes; submitted one at a time, it can take 10-15.
- Portrait avatars save time. No cropping needed. Worth re-recording if you use this regularly.
- Substack session expires? Re-run the browser login step (Step 5 of setup).
- bird CLI not posting? Run bird auth to re-authenticate.
- Bad crop detection? Switch cropMode to manual and eyeball the content bounds from a frame export.
- HeyGen quota errors? Check credits at https://app.heygen.com/settings — upgrade plan or reduce clip count.
- Transcript unavailable? Some videos don't have captions. Try summarize "URL" --extract --youtube yt-dlp for auto-generated captions, or ask the user for a manual transcript.