YouTube Video Transcript
// Fetch, summarize, and save YouTube transcripts with timestamp navigation, chapter detection, and searchable content.
Most YouTube transcript tools either require paid APIs, use suspicious proxies, or just dump raw text without structure. This skill extracts transcripts locally using yt-dlp, preserves timestamps for navigation, detects chapters automatically, and exports to any format you need.
When to Use
User shares a YouTube link and wants to read instead of watch. User asks what someone says about a topic at a specific moment. User needs to extract quotes with timestamps for research or content creation. User wants to summarize a video or search within its content.
How It Works
┌──────────────────────────────────────────────┐
│ YOUTUBE TRANSCRIPT FLOW │
└──────────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌─────────┐
│ VIDEO │ │ METADATA │ │SUBTITLES│
│ URL │ │ FETCH │ │ CHECK │
└────┬────┘ └────┬─────┘ └────┬────┘
│ │ │
│ youtube.com/ │ Title, duration, │ Manual first,
│ watch?v=... │ chapters, lang │ auto fallback
│ │ │
└───────────────────┴────────────────────┘
│
▼
┌─────────────────┐
│ EXTRACT + CLEAN │
│ VTT → Markdown │
│ with timestamps │
└────────┬────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌───────────┐ ┌──────────┐
│ CHAPTERS │ │ SEARCH │ │ EXPORT │
│ detected │ │ by topic │ │ MD/SRT/ │
│ or smart │ │ timestamp │ │ TXT/JSON │
└──────────┘ └───────────┘ └──────────┘
The Extraction Process
1. 📋 Get Metadata First
Always fetch video info before extracting subtitles:
yt-dlp -j "VIDEO_URL"
This gives you title, duration, official chapters, and available languages. Use it to confirm the right video and check what subtitles exist.
2. 📝 Prefer Manual Subtitles
Manual (uploaded) subtitles are higher quality than auto-generated:
# Try manual first
yt-dlp --write-sub --sub-lang en --skip-download "VIDEO_URL"
# Fall back to auto-generated if manual unavailable
yt-dlp --write-auto-sub --sub-lang en --skip-download "VIDEO_URL"
Auto-generated transcripts often have errors, missing punctuation, and wrong word boundaries. Manual subtitles are human-verified.
3. 🕐 Preserve Timestamps Always
Every segment must include timestamps. Format: [HH:MM:SS] or [MM:SS] for videos under 1 hour.
Why this matters: Users need to jump to specific moments. "Take me to where they discuss pricing" requires knowing the timestamp.
Output format:
[00:00] Welcome to this video about machine learning
[00:15] Today we'll cover three main topics
[00:30] First, let's talk about neural networks
Chapter Detection
From Video Markers
Many videos have chapter markers embedded. Extract from metadata:
yt-dlp -j "VIDEO_URL" | jq '.chapters'
Smart Detection (No Markers)
When video lacks chapters, detect natural breaks from transcript:
- Topic changes (semantic shift in content)
- Speaker changes (different voice patterns)
- Explicit transitions ("Now let's talk about...", "Moving on...")
- Long pauses between segments
Search Within Transcripts
When user asks "where do they talk about X":
- Search transcript for keywords and semantic matches
- Return segments with timestamps
- Include surrounding context (10-15 seconds before/after)
Response format:
Found 3 mentions of "machine learning":
[05:23] "...this is where machine learning really shines..."
Context: Discussing data processing approaches
[12:45] "...traditional methods vs machine learning..."
Context: Comparison section
Generate clickable links: https://youtube.com/watch?v=VIDEO_ID&t=323
Architecture
Memory lives in ~/youtube-video-transcript/. See memory-template.md for structure.
~/youtube-video-transcript/
├── memory.md # Preferences + recent videos
├── videos/ # Cached transcripts (with consent)
│ └── {video_id}.md # Individual video data
└── exports/ # Exported files
Quick Reference
| Topic | File |
|---|---|
| Setup process | setup.md |
| Memory template | memory-template.md |
| Advanced patterns | patterns.md |
Core Rules
1. Metadata Before Extraction
Always run yt-dlp -j URL first. This confirms the video, shows available languages, and reveals official chapters. Never extract blind.
2. Manual Over Auto
| Subtitle Type | Quality | When to Use |
|---|---|---|
| Manual | High | Always try first |
| Auto-generated | Medium | Fallback only |
Check with yt-dlp --list-subs URL for unfamiliar channels.
3. Timestamps Are Sacred
Never strip timestamps during any operation. They enable navigation, citation, and deep linking into the video.
4. Cache With Consent
| User Response | Action |
|---|---|
| "Yes, save it" | Cache to ~/youtube-video-transcript/videos/ |
| "No thanks" | Don't cache, show once |
| Not asked yet | Ask after first extraction |
Always tell user where files are saved and offer to show or delete them.
5. Handle Multiple Languages
If user doesn't specify:
- Check available languages
- Prefer manual over auto
- Default to English
- Report which language was used
yt-dlp --list-subs "VIDEO_URL"
6. Quote Extraction Includes Context
When extracting quotes for research:
- 10-15 seconds before/after for context
- Exact timestamp for the quote start
- Speaker identification if multiple speakers
7. Transparency on Quality
| Subtitle Type | Tell User |
|---|---|
| Manual | "Using official subtitles" |
| Auto-generated | "Using auto-generated (may have errors)" |
| None available | "No subtitles found for this video" |
Export Formats
| Format | Use Case | Command |
|---|---|---|
| Markdown | Reading, notes | Default |
| SRT | Video editors | --sub-format srt |
| Plain text | Search, grep | Strip timestamps |
| JSON | Programmatic | --write-info-json |
Common Traps
| Trap | Consequence | Prevention |
|---|---|---|
| Not checking subtitles first | Wasted time on unavailable video | Always --list-subs first |
| Ignoring auto-generated quality | Garbage text with errors | Prefer manual, warn about auto |
| Losing timestamps | Can't navigate video | Never strip in any operation |
| Extracting without metadata | Missing title, chapters | Always fetch -j first |
| Caching without consent | Privacy violation | Ask before saving |
Quick Commands
| User Says | Action |
|---|---|
| "Transcribe this video" | Extract + display |
| "What do they say about X?" | Search + timestamps |
| "Save this transcript" | Cache with confirmation |
| "Export as SRT" | Convert format |
| "Show saved videos" | List ~/youtube-video-transcript/videos/ |
| "Delete video X" | Remove from cache |
Security & Privacy
Data that stays local (with your consent):
- Transcripts cached in ~/youtube-video-transcript/ (only if you agree)
- Preferences stored locally (only after confirmation)
- No external API calls beyond YouTube's public subtitle endpoints
Transparency guarantees:
- Always asks before saving transcripts locally
- Tells you where files are saved
- Offers to show or delete saved data anytime
This skill does NOT:
- Use proxy services or third-party APIs
- Send your queries to external services
- Store credentials or authentication
- Save anything without your explicit consent
Related Skills
Install with clawhub install <slug> if user confirms:
summarizer— create summaries from any contentvideo-captions— generate and edit video subtitlesffmpeg— advanced video and audio processing
Feedback
- If useful:
clawhub star youtube-video-transcript - Stay updated:
clawhub sync