feat(transcribe): add local Whisper transcription helper #60

Open
abman4444 wants to merge 1 commit into browser-use:main from abman4444:add-local-whisper-transcriber

@abman4444 abman4444 commented Jun 8, 2026

What

Adds helpers/transcribe_whisper.py — a drop-in alternative to the ElevenLabs Scribe transcribe.py helper for users who don't have an API key or prefer a free, fully offline option.

Why

The existing pipeline requires an ElevenLabs API key for transcription. Users on free-tier accounts or those who want a zero-cost setup have no fallback. Local Whisper fills that gap.

How it works

  • Uses openai-whisper with word_timestamps=True to produce word-level timestamps
  • Output JSON matches the Scribe envelope shape (words, text, language_code, alignment) so pack_transcripts.py, render.py, and the rest of the pipeline work unchanged
  • Caches transcripts per source file — skips re-transcription on repeat runs (same behaviour as Scribe helper)
  • Extracts mono 16kHz WAV via ffmpeg before passing to Whisper (same as the Scribe path)
  • Supports all Whisper model sizes (tiny → large); defaults to medium (good balance of speed and accuracy for English)
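The steps above can be sketched roughly as follows. This is an illustrative sketch, not the code in helpers/transcribe_whisper.py: the function names (ffmpeg_extract_cmd, to_scribe_envelope, transcribe) are hypothetical, the per-word fields (text, start, end, type) are an assumption about what the downstream scripts read, and alignment is left as None because the PR does not describe its contents. The whisper.load_model and model.transcribe(..., word_timestamps=True) calls are the standard openai-whisper API.

```python
from pathlib import Path
from typing import List, Optional


def ffmpeg_extract_cmd(src: Path, wav: Path) -> List[str]:
    """Build an ffmpeg command that writes a mono 16 kHz PCM WAV file."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-vn",                 # drop any video stream
        "-ac", "1",            # one audio channel (mono)
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(wav),
    ]


def to_scribe_envelope(result: dict) -> dict:
    """Flatten a Whisper result dict into the four keys named in the PR.

    ASSUMPTION: the per-word field names below are illustrative; the real
    helper may emit a different layout.
    """
    words = []
    for segment in result.get("segments", []):
        # With word_timestamps=True each segment carries a "words" list.
        for w in segment.get("words", []):
            words.append({
                "text": w["word"].strip(),   # Whisper prefixes words with a space
                "start": round(float(w["start"]), 3),
                "end": round(float(w["end"]), 3),
                "type": "word",
            })
    return {
        "language_code": result.get("language"),
        "text": result.get("text", "").strip(),
        "words": words,
        "alignment": None,  # contents not described in the PR; placeholder only
    }


def transcribe(wav: Path, model_name: str = "medium",
               language: Optional[str] = None) -> dict:
    """Run Whisper on an extracted WAV and return the envelope."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(str(wav), word_timestamps=True, language=language)
    return to_scribe_envelope(result)
```

Keeping the conversion in a pure function means the envelope shape can be checked against a hand-written Whisper-style dict without loading a model.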

Usage

# Basic
python helpers/transcribe_whisper.py my_video.mp4

# Custom model and language
python helpers/transcribe_whisper.py my_video.mp4 --model large --language en

# Custom output directory
python helpers/transcribe_whisper.py my_video.mp4 --edit-dir /path/to/edit
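For reference, the command lines above imply a CLI along these lines. This is a sketch under assumptions: the flag names and the medium default come from this PR, but build_parser, transcript_path, and needs_transcription are hypothetical names, and the cache layout (<edit-dir>/transcripts/<stem>.json, with an edit folder next to the video as the default) is a guess used only to illustrate the per-source caching.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Transcribe a video locally with Whisper."
    )
    parser.add_argument("video", type=Path, help="source video file")
    parser.add_argument("--model", default="medium",
                        help="Whisper model size (tiny ... large)")
    parser.add_argument("--language", default=None,
                        help="language code; omitted means auto-detect")
    parser.add_argument("--edit-dir", type=Path, default=None,
                        help="output directory for transcripts")
    return parser


def transcript_path(video: Path, edit_dir: Optional[Path]) -> Path:
    """Where the cached transcript for this source would live.

    ASSUMPTION: this directory layout is hypothetical, not taken from the PR.
    """
    base = edit_dir if edit_dir is not None else video.parent / "edit"
    return base / "transcripts" / f"{video.stem}.json"


def needs_transcription(video: Path, edit_dir: Optional[Path]) -> bool:
    """Skip work when a transcript for this source already exists."""
    return not transcript_path(video, edit_dir).exists()
```

Keying the cache on the source file's stem keeps repeat runs cheap, at the cost of not noticing when the same filename is re-exported with different audio.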

Reviewer notes

  • No changes to existing files — purely additive
  • Requires openai-whisper (pip install openai-whisper) and ffmpeg on PATH, both already listed as setup dependencies
  • Word-level timestamp accuracy is slightly lower than Scribe (Whisper drifts ~50–100ms) but within the cut-padding working window defined in SKILL.md

Summary by cubic

Add a local Whisper transcription helper that outputs Scribe-compatible JSON, enabling offline, zero-API-key transcription without changing the rest of the pipeline. Adds caching and model selection for better control and speed.

  • New Features

    • Added helpers/transcribe_whisper.py using openai-whisper with word_timestamps=True.
    • Emits Scribe-compatible JSON (words, text, language_code, alignment) so pack_transcripts.py and render.py work unchanged.
    • Extracts mono 16 kHz WAV via ffmpeg before transcription.
    • Caches transcripts per source file to skip repeat runs.
    • Supports all Whisper models (tiny → large); defaults to medium. Optional --language.
  • Dependencies

    • Requires openai-whisper and ffmpeg available on PATH.

Written for commit ff6884c. Summary will update on new commits.

Adds helpers/transcribe_whisper.py — a drop-in alternative to the
ElevenLabs Scribe transcribe.py helper for users who don't have an
API key or prefer a free, offline option.

- Uses openai-whisper with word_timestamps=True to produce word-level
  timestamps matching the Scribe JSON envelope shape, so pack_transcripts.py,
  render.py, and the rest of the pipeline work unchanged
- Caches transcripts per source (skips re-transcription on repeat runs)
- Supports all Whisper model sizes (tiny → large); defaults to medium
- Extracts mono 16kHz WAV via ffmpeg before transcription (same as Scribe path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

@guivinicius

That's a pretty useful PR. 👍
