Built for Claude Code & OpenAI Codex. Read open-course videos as one merged notebook — textbook + concept encyclopedia in a single static site.
Point it at a few YouTube or Bilibili playlists on the same topic. Your coding agent does the heavy lifting — crawls the videos, tags every transcript chunk against your ontology, clusters them into a clean concept graph, designs a pedagogical chapter order, and writes each chapter + concept page. No separate Anthropic API key required: every LLM stage runs in-session through your agent's existing Claude Code or Codex subscription. Pagefind search and bilingual (中文 / English) output ship at the flip of a flag.
Live demo: Diffusion Models 教材 · Quickstart · How it works · Use with Claude Code / Codex / any agent · Roadmap
English · 中文
🚧 Early days — expect rough edges.
`video-to-notebook` is in active development; v2.0.0 is the first public release after the `course-merger` → `video-to-notebook` rename. If you hit a bug, a confusing prompt, an off-target chapter, or a crawler that refuses to play nicely with a real playlist, please open an issue with the failing command + a few lines of log; that's by far the fastest way to get it fixed. Feature requests, ontology files for new domains, and crawler adapters for platforms beyond YouTube + Bilibili (Coursera, edX, MIT-OCW, …) are all welcome too.
What separates video-to-notebook from "feed playlists to ChatGPT, ask for a summary":
- Source fidelity first. Every chapter and concept page must faithfully transmit how the lecturer actually taught the idea: the exact analogies, the whiteboard derivation steps, the verbatim phrasings, the named citations. Your own framing is layered on top with an explicit `🟡 教材外补充` ("supplement beyond the source material") flag. If two courses give different metaphors for the same concept, both get preserved and labelled. If a reader who watched the lectures doesn't recognise the chapter, the system over-generalised.
- No fabrication: debug the pipeline instead. When the source chunks for a chapter come up thin (e.g. all 20 chunks are course-logistics chatter, or one alphabetically-early course dominates the LIMIT), the agent is required to stop and diagnose rather than paper over the gap with training-data knowledge. Common bugs to check: chunk-selection SQL with `LIMIT 20 ORDER BY course_slug` (now fixed via depth-first allocation), tagging that matched by lecture-title keyword without the concept being discussed, `--max-source-chunks` too low.
- Textbook-note depth, not magazine-summary depth. Each chapter targets ~5,000–8,000 Chinese characters with: a TL;DR callout block, 8–14 numbered sections (一二三四 …), step-by-step math derivations where every line has a `**Why**:` annotation, all distinctive lecturer analogies preserved, 3–5 colour-coded callouts (info/note/warning/tip/quote), engineering details embedded as inline callouts (not deferred), a complete runnable PyTorch skeleton when a model is introduced, and 5–7 takeaways anchored to specific lecturer-given examples. Under 4,000 characters = under-developed.
These three rules live inside the synthesize and explain style guides, so any agent driving the pipeline inherits them automatically.
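To make the chunk-selection bug concrete, here is a minimal sketch of why a plain `LIMIT` after ordering by course slug starves later courses, next to one possible fix. The round-robin strategy and the `course_slug` / `text` field names are illustrative assumptions; this is not the project's actual depth-first allocation, which may work differently.

```python
from collections import defaultdict
from itertools import zip_longest


def alphabetical_limit(chunks, limit):
    """The buggy pattern: sort by course slug, then cut off at `limit`."""
    return sorted(chunks, key=lambda c: c["course_slug"])[:limit]


def round_robin_limit(chunks, limit):
    """Take chunks from each course in turn so no single course fills the budget."""
    by_course = defaultdict(list)
    for chunk in chunks:
        by_course[chunk["course_slug"]].append(chunk)
    picked = []
    # Walk "first chunk of every course, then second chunk of every course, ..."
    for layer in zip_longest(*(by_course[slug] for slug in sorted(by_course))):
        picked.extend(chunk for chunk in layer if chunk is not None)
    return picked[:limit]


# Hypothetical corpus: two courses with 30 chunks each.
chunks = (
    [{"course_slug": "aaa-course", "text": f"a{i}"} for i in range(30)]
    + [{"course_slug": "zzz-course", "text": f"z{i}"} for i in range(30)]
)
buggy = alphabetical_limit(chunks, 20)
fair = round_robin_limit(chunks, 20)
print(sorted({c["course_slug"] for c in buggy}))  # ['aaa-course']
print(sorted({c["course_slug"] for c in fair}))   # ['aaa-course', 'zzz-course']
```

With the alphabetical cut, every one of the 20 slots goes to the first course; the interleaved version splits them evenly.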
Chapters laid out in pedagogical order: your beginner reads top-to-bottom and gets a complete arc, not a mosaic of fragments.

Every important concept gets its own rich page, for the reader who wants depth on one idea.

Auto-applied via

No React, no Vue. Astro + 200 lines of vanilla JS. The whole concept page weighs ~30 KB gzipped (including SVG + interaction).
Every LLM stage (tag, cluster, curriculum, synthesize, explain) runs in-session by default. The CLI writes a prompts envelope to <state_dir>/prompts/<step>.json and exits; you (the agent) write decisions to the sibling .decisions.json and re-invoke with --apply. Works from Claude Code, OpenAI Codex, Cursor, Continue, or your own script. See § Drive it from your AI coding agent below.
# Install the CLI (Python 3.12+)
pip install video-to-notebook # or: uv tool install video-to-notebook
brew install node yt-dlp # Node 20+ for the build, yt-dlp for crawling
# Run the pipeline
export ANTHROPIC_API_KEY=sk-ant-... # only needed for the --use-api commands below
mkdir my-study-site && cd my-study-site
video-to-notebook init --language en # or `--language zh` (default)
# YouTube playlist
video-to-notebook crawl "https://www.youtube.com/playlist?list=PLxxx" --name cs336
# Bilibili — single video, season list, or series. Cookies required (see § Bilibili below).
video-to-notebook crawl "https://www.bilibili.com/video/BVxxx/" --name vizuara-llm --cookies-from chrome
video-to-notebook tag --ontology examples/ontology-llm.yaml --use-api # ~$0.10/course
video-to-notebook cluster --ontology examples/ontology-llm.yaml --use-api # ~$0.30/run
video-to-notebook build
video-to-notebook serve # http://localhost:4321

The `--use-api` flag opts in to the Anthropic SDK path. Without it, the CLI defaults to writing in-session prompts (Option A).
Total cost for a 5-course corpus: ~$2-4 first run, $0 on re-runs (idempotent).
Everything orbits around a single SQLite file under .video-to-notebook/ that every stage reads from and writes to. Stages are independent CLI commands, so you can re-run any one of them without touching the others.
Each subcommand is idempotent and resumable:
- Add a new course → only that course gets crawled / tagged.
- Re-run `cluster` → picks up new proposed tags, doesn't re-process settled ones.
- `build --incremental` → re-renders only what changed.
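The "idempotent and resumable" behaviour can be pictured with a small sketch: a stage selects only the rows it has not handled yet, so a re-run does no work and a newly added course is the only thing processed. The `chunks` table, its columns, and the `tagger` callback below are hypothetical; the project's real SQLite schema is not shown in this document.

```python
import sqlite3


def tag_pending(conn, tagger):
    """Tag only chunks that have no tag yet; return how many were processed."""
    rows = conn.execute("SELECT id, text FROM chunks WHERE tag IS NULL").fetchall()
    for chunk_id, text in rows:
        conn.execute("UPDATE chunks SET tag = ? WHERE id = ?", (tagger(text), chunk_id))
    conn.commit()
    return len(rows)


def first_word(text):
    """Stand-in for the real tagging step (hypothetical)."""
    return text.split()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, course TEXT, text TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO chunks (course, text) VALUES (?, ?)",
    [("cs336", "attention scores"), ("cs336", "kv cache")],
)

first_run = tag_pending(conn, first_word)    # both chunks are new
second_run = tag_pending(conn, first_word)   # nothing left to do
conn.execute("INSERT INTO chunks (course, text) VALUES (?, ?)", ("new-course", "diffusion loss"))
third_run = tag_pending(conn, first_word)    # only the new course's chunk
print(first_run, second_run, third_run)  # 2 0 1
```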
Output is a static site — deploy anywhere (GitHub Pages, S3, Vercel, Netlify, your own nginx).
# 1. Python CLI (3.12+)
pip install video-to-notebook
# or: uv tool install video-to-notebook
# 2. External requirements
brew install node yt-dlp # Node 20+ for the HTML build; yt-dlp for crawling
playwright install chromium # only if running e2e tests
# 3. (Optional) Whisper fallback for videos without published captions
pip install 'video-to-notebook[whisper]' # mlx-whisper on macOS + faster-whisper everywhere

When a video has no published captions, pass --whisper to crawl and the pipeline downloads the audio, transcribes it locally (mlx-whisper on Apple Silicon, faster-whisper on Linux/Windows), and feeds the synthesised VTT into the same chunk pipeline. No API key, no network round-trip after the audio download.
video-to-notebook crawl "https://www.bilibili.com/video/BV..." \
--name my-course \
--cookies-from edge \
--whisper \
--whisper-lang zh # optional: skip auto-detect, hint Chinese
# done: 12 ok, 8 via whisper, 0 no-subs, 0 errors

--whisper-model overrides the default. On mlx the default is mlx-community/whisper-large-v3-turbo (large-v3 distilled, ~800 MB weights, ~2× real-time on M-series). Empirically this is the sweet spot for bilingual lectures: it produces Simplified Chinese with natural punctuation (vs. small, which tends to emit Traditional Chinese and drop all punctuation). For absolute best accuracy on English technical terms, pass --whisper-model mlx-community/whisper-large-v3-mlx (~3 GB, ~1× real-time). On faster-whisper the default is small; pass large-v3 for the same quality bump.
The Bilibili crawler accepts three URL shapes, all of them properly enumerated by crawl:
| URL shape | Example | Behaviour |
|---|---|---|
| Single video (multi-part) | https://www.bilibili.com/video/BVxxx/ | Each ?p=N becomes one lecture |
| Space season list | https://space.bilibili.com/<uid>/lists/<id>?type=season | Each entry is a separate BV; resolves to its own canonical video page |
| Series / fav list | https://space.bilibili.com/<uid>/channel/seriesdetail?sid=<id> | Same as season |
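As a rough illustration of how the three URL shapes in the table can be told apart, here is a small classifier. It is a sketch only: the function name and return labels are made up for this example, it assumes `<uid>` and `<id>` are numeric, and it says nothing about how the project's crawler or yt-dlp actually enumerates the entries.

```python
import re
from urllib.parse import parse_qs, urlparse


def classify_bilibili_url(url):
    """Return 'video', 'season', 'series', or None for an unrecognised URL."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.netloc == "www.bilibili.com" and re.match(r"^/video/BV\w+/?$", parsed.path):
        return "video"  # multi-part videos carry ?p=N in the query string
    if parsed.netloc == "space.bilibili.com":
        if re.match(r"^/\d+/lists/\d+/?$", parsed.path) and query.get("type") == ["season"]:
            return "season"
        if re.match(r"^/\d+/channel/seriesdetail/?$", parsed.path) and "sid" in query:
            return "series"
    return None


print(classify_bilibili_url("https://www.bilibili.com/video/BV1xx411c7mD/?p=2"))
print(classify_bilibili_url("https://space.bilibili.com/123/lists/456?type=season"))
print(classify_bilibili_url("https://space.bilibili.com/123/channel/seriesdetail?sid=789"))
```

Classifying up front makes it easy to fail fast with a clear message when a URL matches none of the supported shapes.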
Authentication: Bilibili requires logged-in cookies for both playlist enumeration and audio download. Two options:
# A) Read cookies from a browser session (simplest if it works)
video-to-notebook crawl "<bilibili-url>" --name <slug> --cookies-from chrome --whisper
# B) Use a manually-exported cookies.txt (most portable — works around
# macOS Keychain blocking yt-dlp from decrypting Chrome v10 cookies,
# and survives bilibili's aggressive anti-scraping that breaks
# unauthenticated requests with HTTP 412)
video-to-notebook crawl "<bilibili-url>" --name <slug> \
--cookies-file ~/path/to/bilibili-cookies.txt --whisper

For option B, install a Chrome extension like "Get cookies.txt LOCALLY", navigate to bilibili.com while logged in, and export Netscape-format cookies. Save the file outside ~/Downloads, ~/Desktop, and ~/Documents (those are TCC-protected on macOS and yt-dlp can't read them); ~/note/, ~/code/, or any subfolder of ~/Documents/ (not ~/Documents/ itself) all work.
In v2.0.0 the project was renamed from course-merger to video-to-notebook. Existing projects migrate in three commands — no re-crawl, no re-tag, the SQLite schema is unchanged:
# inside the project directory
mv .course-merger .video-to-notebook # rename the marker
uv tool upgrade video-to-notebook # or: pip install -U video-to-notebook
video-to-notebook build # back to work

The course-merger binary still exists as a back-compat shim: it prints a one-line deprecation notice and forwards to video-to-notebook. Scheduled for removal in v3.0.0. See CHANGELOG.md for the full rationale.
Every LLM stage defaults to a two-phase in-session flow (v2.3+): the CLI writes a JSON envelope of pending work to <state_dir>/prompts/<step>.json and exits; the agent reads it, reasons, writes the decisions JSON to the sibling .decisions.json path; the CLI applies that to SQLite when you re-invoke with --apply. The protocol is agent-agnostic — schemas + conventions live in docs/AGENT_PROTOCOL.md.
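The file handoff in that flow can be sketched in a few lines of Python. The sibling-path convention (`tag.json` → `tag.decisions.json`) comes from this README; everything else is an assumption for illustration: the helper names are hypothetical, the envelope is treated as opaque JSON, and the real schemas live in docs/AGENT_PROTOCOL.md.

```python
import json
from pathlib import Path


def decisions_path_for(prompts_path):
    """prompts/tag.json -> prompts/tag.decisions.json (the sibling file)."""
    prompts_path = Path(prompts_path)
    return prompts_path.with_name(prompts_path.stem + ".decisions.json")


def answer_envelope(prompts_path, decide):
    """Read the pending-work envelope, let `decide` produce decisions, write them out."""
    prompts_path = Path(prompts_path)
    envelope = json.loads(prompts_path.read_text(encoding="utf-8"))
    decisions = decide(envelope)  # the agent's reasoning step; output schema is project-defined
    out_path = decisions_path_for(prompts_path)
    out_path.write_text(json.dumps(decisions, ensure_ascii=False, indent=2), encoding="utf-8")
    return out_path


print(decisions_path_for("prompts/tag.json").as_posix())
print(decisions_path_for("prompts/explain/linear-algebra.json").as_posix())
```

After the decisions file is written, re-invoking the same subcommand with `--apply` is what commits the result to SQLite.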
git clone https://github.com/LinZhuoChen/video-to-notebook.git
bash video-to-notebook/skills/video-to-notebook/scripts/install-locally.sh

Then in Claude Code:
Full skill manifest at |
git clone https://github.com/LinZhuoChen/video-to-notebook.git
cd my-study-site
bash video-to-notebook/skills/video-to-notebook/scripts/install-codex.sh
codex # Codex reads AGENTS.md

Or install globally so Codex knows about video-to-notebook from anywhere:

bash video-to-notebook/skills/video-to-notebook/scripts/install-codex.sh --global

Codex reads |
|
Any agent that reads JSON, reasons, writes JSON can drive the pipeline. The When writing the results envelope, set the agent-id ( |
If you don't use an agent, set The textbook-generation stages ( |
agent says "Crawl this playlist and tag using examples/ontology-llm.yaml."
CLI loop:
video-to-notebook init && video-to-notebook crawl <url>
for batch in chunks_of(20):
video-to-notebook tag --limit 20 # writes prompts/tag.json
agent reads prompts/tag.json,
writes prompts/tag.decisions.json
video-to-notebook tag --apply
video-to-notebook cluster # writes prompts/cluster.json
agent reads + writes prompts/cluster.decisions.json
video-to-notebook cluster --apply
same for curriculum / synthesize (per chapter) / explain (per concept)
video-to-notebook build
(For an end-to-end worked example, see examples/frontier-notebook/RUNBOOK.md.)
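For agents driven from a script, the loop above might be orchestrated roughly as follows. This is a sketch under assumptions: `run_stage`, `run_pipeline`, the `decide` callback, and the `tag_batches` count are hypothetical names introduced here; only the CLI invocations (`tag --limit 20`, `tag --apply`, `cluster`, `cluster --apply`) are taken from this README. The `run` parameter is injectable so the flow can be dry-run without the CLI installed.

```python
import subprocess


def run_stage(step, decide, extra_args=(), run=subprocess.run):
    """Run one two-phase stage: emit prompts, let the agent decide, then --apply."""
    base = ["video-to-notebook", step]
    run(base + list(extra_args), check=True)  # phase 1: CLI writes the prompts envelope
    decide(step)                              # phase 2: agent writes the decisions JSON
    run(base + ["--apply"], check=True)       # phase 3: CLI applies decisions to SQLite


def run_pipeline(decide, tag_batches=1, run=subprocess.run):
    """Tag in batches of 20, then cluster, mirroring the loop described above."""
    for _ in range(tag_batches):
        run_stage("tag", decide, extra_args=["--limit", "20"], run=run)
    run_stage("cluster", decide, run=run)


# Dry run: record the commands instead of executing them.
recorded = []
run_pipeline(
    decide=lambda step: None,
    tag_batches=2,
    run=lambda cmd, check: recorded.append(cmd),
)
for cmd in recorded:
    print(" ".join(cmd))
```

The curriculum, synthesize, and explain stages follow the same emit / decide / apply shape and could reuse `run_stage` with their own arguments.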
| | API key mode | In-session mode |
|---|---|---|
| API key required | ✅ yes (--use-api) | ❌ no |
| Cost for demo corpus | ~$2-4 | $0 extra (covered by your agent's subscription) |
| Speed for 1000 chunks | ~5-10 min | ~1-2 hours |
| Speed for 100 chunks | ~30 sec | ~5-10 min |
| Curriculum / synthesize / explain | ❌ not available | ✅ this is the only mode |
| Best for | Large corpora, one-shot batch | Small/medium corpora + textbook generation |
After tag + cluster, synthesize the corpus into a merged textbook:
# 1. Design the chapter sequence (in-session in Claude Code)
video-to-notebook curriculum # writes prompts/curriculum.json
# Claude reads it, designs the chapter order, writes prompts/curriculum.decisions.json
video-to-notebook curriculum --apply
# 2. The agent asks you: batch mode or chapter-by-chapter?
# - Batch the whole book (整本批量做): loops through all N chapters, applies as it goes, builds once.
# Best for re-runs once you trust the style.
# - One chapter at a time (一章一章来): synthesizes chapter 1, builds, hands control back so you can
# inspect /textbook/1/ and give feedback before chapter 2.
# Best for the first run on a new corpus.
# 3. For each chapter (whichever mode):
video-to-notebook synthesize --chapter N # writes prompts/synthesize/chapter-N.json
# Agent reads + writes /tmp/chN.html following the v3 style guide:
# - TL;DR callout at the top + 8–14 numbered sections (一二三四 …)
# - Step-by-step derivations with **Why**: annotations per line
# - All distinctive lecturer analogies preserved (don't collapse to one)
# - 3–5 callout boxes (info/note/warning/tip/quote) inline
# - Engineering details embedded as callouts, not deferred
# - Complete PyTorch skeleton when a model is introduced
# - 5–7 takeaways anchored to lecturer-given examples
# - Target body length: 5,000–8,000 Chinese characters per chapter
video-to-notebook synthesize --chapter N --apply
# 4. Build & view
video-to-notebook build
video-to-notebook serve # http://localhost:4321/textbook/

Each chapter is a self-contained HTML fragment with inline SVG, CSS animations, embedded source-video iframes (timestamp-deep-linked), LaTeX math (KaTeX-rendered), and colour-coded callout boxes. The v3 style guide (src/video_to_notebook/synthesize/prompts.py) enforces the source-fidelity + textbook-note-depth principles automatically, so any agent driving the pipeline inherits them.
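Before applying a chapter, an agent could sanity-check its length against the targets quoted in the style guide (5,000–8,000 Chinese characters, with under 4,000 counted as under-developed). The sketch below is an assumption about how one might measure that: it strips tags with a crude regex and counts only CJK Unified Ideographs, which may not match how the project itself counts, and it applies to Chinese output only.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs, basic block only
TAG = re.compile(r"<[^>]+>")          # crude tag stripper; fine for a rough count


def count_chinese_chars(html_fragment):
    """Count Chinese characters in the visible text of an HTML fragment."""
    return len(CJK.findall(TAG.sub("", html_fragment)))


def depth_verdict(html_fragment, floor=4000, target=(5000, 8000)):
    """Classify a chapter draft by length against the style-guide thresholds."""
    n = count_chinese_chars(html_fragment)
    if n < floor:
        return "under-developed"
    if target[0] <= n <= target[1]:
        return "on target"
    return "outside target"


print(count_chinese_chars("<p>扩散模型 diffusion</p>"))  # 4
print(depth_verdict("<p>" + "字" * 3999 + "</p>"))       # under-developed
```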
Linear textbooks are great for first-time readers. The concept encyclopedia is for the reader looking up one idea in depth:
video-to-notebook explain --concept linear-algebra # writes prompts/explain/linear-algebra.json
# Claude writes prompts/explain/linear-algebra.decisions.json following the v2 style guide:
# - per-concept CSS namespace prefix (la-)
# - CSS-variable-only colors (works in light + dark + per-module accent)
# - 9 fixed sections in order
# - one of 3 interactive widget templates
video-to-notebook explain --concept linear-algebra --apply
video-to-notebook build # /concepts/<slug>/ now serves the rich explainer

The v2 style guide in src/video_to_notebook/explain/prompts.py enforces:
- Anti-bias opener: every entry must begin by naming a common misunderstanding and correcting it
- One-invariant rule: every animation/interaction must make exactly ONE invariant visible
- Equation-chain rule: every formula shows the substitution chain (no bare "it follows that X" / "可以推出 X")
- Counter-example pitfalls: each of 3 misconceptions must include a specific numerical or visual counter-example
- See-also constraint: all linked slugs must exist in the `related_concepts` envelope
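The see-also constraint lends itself to a mechanical check before `--apply`. A minimal sketch, assuming concept links use the `/concepts/<slug>/` URL form mentioned above and that `related_concepts` can be read as a flat list of slugs (its actual structure in the envelope is not documented here); `dangling_see_also` is a hypothetical helper name.

```python
import re

# Matches href="/concepts/<slug>/" with lowercase, digit, and hyphen slugs.
CONCEPT_LINK = re.compile(r'href="/concepts/([a-z0-9-]+)/?"')


def dangling_see_also(html_fragment, related_concepts):
    """Return linked concept slugs that are missing from related_concepts."""
    allowed = set(related_concepts)
    linked = CONCEPT_LINK.findall(html_fragment)
    return sorted(set(linked) - allowed)


page = (
    '<a href="/concepts/eigenvalues/">Eigenvalues</a> '
    '<a href="/concepts/tensor-cores/">Tensor cores</a>'
)
print(dangling_see_also(page, ["eigenvalues"]))  # ['tensor-cores']
```

An empty result means every see-also link points at a concept the envelope actually offered.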
The built site (video-to-notebook build && video-to-notebook serve) ships with:
| Feature | What it does |
|---|---|
| 🌓 Dark mode | html.dark class + prefers-color-scheme fallback; toggle in header; localStorage persistence |
| 🎨 Per-module accent | Green / blue / purple / amber / rose, scoped via data-module-idx; cards, sidebars, drop caps all inherit |
| 📊 Chapter mini-map | Right rail tracks h2/h3 headings with IntersectionObserver; click to scroll-anchor |
| ⌨️ Keyboard nav | ← / → step through chapters; ignored in inputs; floating hint chip |
| 📱 Mobile drawer | Hamburger button <900 px opens slide-in panel; mirrors textbook sidebar; Escape/backdrop closes |
| 🔍 Search | Client-side via Pagefind |
| 🧮 LaTeX math | KaTeX auto-renders $...$ inline / $$...$$ block |
| 🎬 Video deep links | Every concept page lists source clips with timestamp-deep-linked iframes |
Per course (50-100 lectures, ~1500 chunks):
| Stage | Model | Cost |
|---|---|---|
| Crawl | n/a (yt-dlp) | $0 |
| Tag | Claude Haiku (prompt caching) | ~$0.10-0.30 |
| Cluster | Claude Sonnet | ~$0.20-0.50 |
| Curriculum | in-session Claude | $0 extra |
| Synthesize (per chapter) | in-session Claude | $0 extra |
| Explain (per concept) | in-session Claude | $0 extra |
| Build | n/a (Astro) | $0 |
| Total per course | | ~$0.30-0.80 |
For a 5-course corpus expect ~$2-4 first run. Re-runs are free thanks to per-chunk idempotency.
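The totals follow from adding the two paid stages in the table. A quick check of the arithmetic, assuming cost scales linearly with the number of courses (an assumption: clustering is priced per run elsewhere in this README, so real totals may differ):

```python
# (low, high) cost per course in USD, taken from the table above.
STAGE_COST_USD = {
    "tag": (0.10, 0.30),
    "cluster": (0.20, 0.50),
}


def corpus_cost(n_courses):
    """Sum the paid stages and scale by the number of courses."""
    low = sum(lo for lo, _ in STAGE_COST_USD.values()) * n_courses
    high = sum(hi for _, hi in STAGE_COST_USD.values()) * n_courses
    return round(low, 2), round(high, 2)


print(corpus_cost(1))  # (0.3, 0.8)
print(corpus_cost(5))  # (1.5, 4.0)
```

The five-course range of roughly $1.50 to $4.00 is in line with the ~$2-4 estimate quoted above.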
The examples/frontier-notebook/ directory is the recommended starting point:
cp -r examples/frontier-notebook examples/my-corpus
# Edit examples/my-corpus/courses.toml and examples/my-corpus/ontology.yaml
bash examples/my-corpus/build.sh

The build script chains crawl/tag/cluster/build, reads courses.toml, and lands a working site at examples/my-corpus/.video-to-notebook-project/site/dist/.
Shipped:
- v1.0: foundation
- v1.1: in-session mode
- v1.2: textbook generator
- v1.3: concept encyclopedia + design-system polish
- v1.4: multi-agent support (Codex + Cursor + Continue alongside Claude Code)
- v2.0: project renamed to video-to-notebook, source-fidelity + textbook-depth quality discipline, chapter chunk-selection regression fixed, full zh/en i18n on prompts + Astro UI, automatic template overlay-sync on every build
- v2.1: Whisper fallback. Videos without published captions transcribe locally via mlx-whisper (Apple Silicon) or faster-whisper (cross-platform), feeding the synthesised VTT back into the same chunk pipeline
- v2.1.1: robust Bilibili. Season / list / series URLs now resolve to per-BV canonical pages (fixed silent dedup bug where every entry got the same audio); new --cookies-file <path> flag for Netscape-format cookies.txt, working around macOS Keychain blocking and bilibili's 412 anti-scraping; tested end-to-end on a real 14-video, 8-hour Bilibili course (see CHANGELOG.md)
Deferred:
- Live filter on compare view: client-side `?courses=cs336,gpu-mode` selection
- `review` CLI: human-in-the-loop dispatch for `ambiguous` cluster decisions
- Multi-language concept aliasing: dedicated Chinese ↔ English concept name pairs (current i18n covers UI + generated prose, not slug-level aliasing)
- Design spec: `docs/specs/2026-05-09-video-to-notebook-skill-design.md`
- Implementation plans (TDD-decomposed):
- Plan 1: Foundation + Crawl
- Plan 2: Tag + Cluster
- Plan 3: Build + HTML
- Plan 4: Demo + Deploy + Skill
- Plan 6: Textbook generator
- Plan 7: Bilingual demo
video-to-notebook is a tool. The user is responsible for ensuring they have the right to crawl, process, and redistribute any content fed through this pipeline. This includes YouTube / Bilibili Terms of Service governing programmatic content access, the original creator's license on the lecture content, and fair use / transformative use considerations in the user's jurisdiction.
The tool's authors disclaim responsibility for content generated by users. Personal study use is generally low risk. Public redistribution or commercial use of synthesized content may not be. Consult the source materials' licenses before going beyond personal use.
See CONTRIBUTING.md. PRs welcome — particularly new crawler adapters and ontology files for non-AI/CS domains.
MIT — see file for details.
Made with 🤖 + ☕ by chenlinzhuo. Built atop Claude Code, Astro, yt-dlp, Pagefind, KaTeX.
If video-to-notebook saved you a weekend of YouTube / Bilibili binging, give it a ⭐ on GitHub.