Live Captioning with Whisper
This feature provides real-time live captioning for presentations using Whisper.cpp.
⚠️ Local Use Only: This feature is designed for local development and presentations running on your own machine. It will not work on deployed static sites or remote servers.
Overview
The caption system is polling-based. A Node script (run-whisper.js) runs whisper.cpp and writes the live transcript to a JSON file as you speak. The slides repeatedly fetch that file and display the text as it is generated.
Prerequisites
- Node.js and npm installed
- A working microphone
- whisper.cpp compiled on your system (see step 1 below; the build needs CMake and SDL2)
Setup Instructions
1. Build Whisper.cpp
Whisper.cpp is included as a git submodule. After cloning this repository, initialize and build it:
```bash
# Initialize and fetch the whisper.cpp submodule
git submodule update --init --recursive
# Navigate to the submodule directory
cd presentations/whisper.cpp
# Install SDL2 (required for microphone capture)
# macOS (Homebrew); on Debian/Ubuntu use: sudo apt-get install libsdl2-dev
brew install sdl2
# Build with CMake (WHISPER_SDL2=ON enables the whisper-stream example)
cmake -B build -DWHISPER_SDL2=ON
cmake --build build --config Release
```
This creates the whisper-stream binary at presentations/whisper.cpp/build/bin/whisper-stream.
2. Download a Model
Download a Whisper model. The multilingual models support 99 languages including French, Spanish, German, Japanese, and many more.
Available Models
| Model | Parameters | English-only | Multilingual | Memory | Speed vs large |
|---|---|---|---|---|---|
| tiny | 39M | tiny.en | tiny | ~1 GB | ~10x faster |
| base | 74M | base.en | base | ~1 GB | ~7x faster |
| small | 244M | small.en | small | ~2 GB | ~4x faster |
| medium | 769M | medium.en | medium | ~5 GB | ~2x faster |
| large-v3 | 1550M | - | large-v3 | ~10 GB | baseline |
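| large-v3-turbo | 809M | - | large-v3-turbo | ~6 GB | ~8x faster |

Memory and speed figures are approximate. whisper.cpp generally needs less memory than these estimates, and actual speed depends on your hardware.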
English-only models (.en suffix) perform better for English transcription, especially tiny.en and base.en.
Multilingual models (without .en) support 99 languages and can also translate speech into English. The exception is large-v3-turbo, which was not trained for translation.
Quantized models (with -q5_0, -q5_1, or -q8_0 suffix) are smaller and faster but slightly less accurate.
Recommended Models
- For English presentations: `base.en` (good balance of speed and accuracy)
- For French or other languages: `base` (multilingual)
- For best accuracy: `medium` or `large-v3-turbo`
- For fastest speed: `tiny` or `tiny.en`
Download Commands
```bash
# From the whisper.cpp directory
# English-only (recommended for English)
bash ./models/download-ggml-model.sh base.en
# Multilingual (supports French, Spanish, German, Japanese, etc.)
bash ./models/download-ggml-model.sh base
# For better accuracy with multilingual
bash ./models/download-ggml-model.sh medium
```
Supported Languages
The multilingual models support 99 languages including:
European: French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Greek, Swedish, Norwegian, Danish, Finnish, Czech, Romanian, Hungarian, Bulgarian, Croatian, Serbian, Slovak, Slovenian, Lithuanian, Latvian, Estonian, Icelandic, Welsh, Catalan, Basque, Galician, and more.
Asian: Chinese (Mandarin & Cantonese), Japanese, Korean, Hindi, Tamil, Telugu, Bengali, Urdu, Thai, Vietnamese, Indonesian, Malay, Tagalog, and more.
Middle Eastern & African: Arabic, Hebrew, Persian, Turkish, Swahili, Yoruba, Hausa, Amharic, and more.
Other: Haitian Creole, Maori, Hawaiian, and more.
See the full list of supported languages
The download script saves each model to `presentations/whisper.cpp/models/` as `ggml-<model>.bin` (e.g., `ggml-base.en.bin` or `ggml-base.bin`).
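Because the file name follows directly from the model name, a script can derive one from the other. The following is a minimal sketch under that assumption. `modelPath` and `isEnglishOnly` are hypothetical helpers, not functions from this project's scripts. The sketch also assumes that quantized downloads such as `base.en-q5_1` use the same `ggml-<name>.bin` pattern.

```javascript
const path = require("node:path");

// Hypothetical helpers, not part of this project's scripts.
// Per this guide, download-ggml-model.sh saves a model as ggml-<name>.bin
// in the models/ directory; assume quantized names follow the same pattern.
function modelPath(name, modelsDir = "presentations/whisper.cpp/models") {
  // Reject anything that is not a plain model name (e.g. "../x").
  if (!/^[a-z0-9._-]+$/i.test(name)) {
    throw new Error(`Unexpected model name: ${name}`);
  }
  // posix.join keeps forward slashes on every platform.
  return path.posix.join(modelsDir, `ggml-${name}.bin`);
}

// English-only models carry a ".en" suffix, optionally followed by a
// quantization suffix such as "-q5_1".
function isEnglishOnly(name) {
  return /\.en(-|$)/.test(name);
}

console.log(modelPath("base.en")); // presentations/whisper.cpp/models/ggml-base.en.bin
console.log(isEnglishOnly("base")); // false
```

For example, `modelPath("base")` gives the multilingual file you would point `WHISPER_MODEL` at.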
3. Configure Environment Variables (Optional)
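If you built whisper.cpp in the presentations directory (as in step 1), the scripts auto-detect the binary and the default `ggml-base.en.bin` model. Set these environment variables if you installed it elsewhere or want a different model, such as the multilingual `ggml-base.bin`: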
```bash
export WHISPER_BIN="./presentations/whisper.cpp/build/bin/whisper-stream"
export WHISPER_MODEL="./presentations/whisper.cpp/models/ggml-base.en.bin"
```
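To make them permanent, add them to your `~/.zshrc` or `~/.bashrc`. Use absolute paths there, because the relative paths above only work when you run commands from the project root.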
4. Start the Caption Listener
From the ox.ca project root:
```bash
npm run dev:whisper
```
This script:
- Launches `whisper-stream` with your microphone
- Writes the live transcript to `presentations/whisper-demo/transcript.json`
- Updates the file every 500 ms as new words are recognized
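The following is a minimal Node sketch of the "rewrite a JSON file every 500 ms" pattern. It is not the actual run-whisper.js. The `{ text, updatedAt }` JSON shape and the function names are hypothetical. A real script would also need to clean whisper-stream's terminal output, which may contain line-rewriting control sequences. The sketch writes to a temporary file and then renames it over the target, so a browser polling the file never reads a half-written document.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Write the transcript atomically: write a temp file in the same directory,
// then rename it over the target so readers never see a partial file.
// The { text, updatedAt } shape is an assumption, not the project's format.
function writeTranscript(file, text, now = Date.now()) {
  const tmp = path.join(path.dirname(file), `.${path.basename(file)}.tmp`);
  fs.writeFileSync(tmp, JSON.stringify({ text, updatedAt: now }));
  fs.renameSync(tmp, file);
}

// Hypothetical writer: collect recognized text as it arrives and flush it
// at most once per interval, skipping the write when nothing changed.
function createTranscriptWriter(file, intervalMs = 500) {
  let text = "";
  let written = null; // last text actually written to disk
  const flush = () => {
    if (text !== written) {
      writeTranscript(file, text);
      written = text;
    }
  };
  const timer = setInterval(flush, intervalMs);
  return {
    append(chunk) { text += chunk; },
    flush,
    // Stop the timer and write any remaining text.
    stop() { clearInterval(timer); flush(); },
  };
}
```

Throttling keeps disk writes bounded no matter how often new words arrive. The atomic rename matters because the browser may fetch the file at any moment.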
5. View the Slides
Open your presentation in a browser:
```bash
# If not already running, start a local server
npm run serve
# or
python3 -m http.server 5500
```
Navigate to your slides. The caption button in the menu bar will turn green (🟢 Captions On) when the transcript file is being updated.
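One way to decide whether the transcript file "is being updated" is to compare each polled response with the previous one. The following sketch uses that approach. `createCaptionStatus` is hypothetical, the 10-second threshold is an assumption, and the real button logic in whisper-transcript.js may work differently. The first body seen is treated only as a baseline, so a stale file left over from an earlier session reads as "off".

```javascript
// Hypothetical helper: report "on" only while the transcript keeps changing.
// Call update() after every poll with the raw response body, or null if the
// fetch failed. The first body seen is a baseline, not a change.
function createCaptionStatus(staleMs = 10000) {
  let lastBody = null;
  let lastChange = -Infinity; // no change observed yet
  return {
    update(body, now = Date.now()) {
      if (body === null) return "off"; // fetch failed or file missing
      if (lastBody !== null && body !== lastBody) lastChange = now;
      lastBody = body;
      return now - lastChange <= staleMs ? "on" : "off";
    },
  };
}
```

Comparing response bodies needs no server support. A `Last-Modified` header would also work, but only if the local server sends one.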
How It Works
- whisper-stream: Captures audio from your microphone and performs speech-to-text conversion in real-time
- run-whisper.js: Node script that runs whisper-stream and writes output to JSON
- whisper-transcript.js: Client-side script that polls the JSON file every 2 seconds
- Transcript boxes: HTML elements with `data-transcript-src` attributes display the live text
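The client side of this pipeline can be sketched as a timer that fetches each transcript source and copies its text into the matching boxes. This is an illustrative sketch, not the code of whisper-transcript.js. It assumes the JSON file has a `text` field, and `startTranscriptPolling` is a hypothetical name. Boxes that share a source URL are fetched once per tick. The optional `fetchFn` parameter only makes the function testable outside a browser.

```javascript
// Sketch of a client-side transcript poller. Assumes each box exposes
// dataset.transcriptSrc (from data-transcript-src) and that the JSON file
// has a `text` field; both the name and the format are assumptions.
function startTranscriptPolling(boxes, { intervalMs = 2000, fetchFn = fetch } = {}) {
  // Group boxes by source URL so each file is requested once per tick.
  const bySrc = new Map();
  for (const box of boxes) {
    const src = box.dataset.transcriptSrc;
    if (!src) continue;
    if (!bySrc.has(src)) bySrc.set(src, []);
    bySrc.get(src).push(box);
  }

  async function tick() {
    await Promise.all([...bySrc].map(async ([src, targets]) => {
      try {
        // cache: "no-store" keeps the browser from reusing a stale copy.
        const res = await fetchFn(src, { cache: "no-store" });
        if (!res.ok) return;
        const { text = "" } = await res.json();
        for (const box of targets) box.textContent = text;
      } catch (err) {
        console.warn(`Transcript fetch failed for ${src}:`, err);
      }
    }));
  }

  const timer = setInterval(tick, intervalMs);
  const ready = tick(); // first update without waiting a full interval
  return { ready, stop: () => clearInterval(timer) };
}
```

In a page, this could be started with `startTranscriptPolling(document.querySelectorAll(".live-transcript"))`.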
Default Paths
The scripts will attempt to auto-detect whisper.cpp in these locations (in order):
1. `$WHISPER_BIN` environment variable (if set)
2. `./presentations/whisper.cpp/build/bin/whisper-stream` (relative to project root)
3. `./whisper.cpp/build/bin/whisper-stream` (if running from the presentations directory)
4. `/usr/local/bin/whisper-stream`
For models:
1. `$WHISPER_MODEL` environment variable (if set)
2. `./presentations/whisper.cpp/models/ggml-base.en.bin` (relative to project root)
3. `./whisper.cpp/models/ggml-base.en.bin` (if running from the presentations directory)
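The lookup order above amounts to "environment variable if set, otherwise the first candidate that exists". The following is a minimal Node sketch of that logic. It mirrors the documented order but is not the scripts' actual code, and `resolveFirst` is a hypothetical name.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical resolver mirroring the documented order: an explicit
// environment variable wins; otherwise use the first candidate on disk.
function resolveFirst(envValue, candidates, cwd = process.cwd()) {
  if (envValue) return path.resolve(cwd, envValue);
  for (const candidate of candidates) {
    const full = path.resolve(cwd, candidate);
    if (fs.existsSync(full)) return full;
  }
  return null; // caller should report a "path not found" error
}

const whisperBin = resolveFirst(process.env.WHISPER_BIN, [
  "./presentations/whisper.cpp/build/bin/whisper-stream",
  "./whisper.cpp/build/bin/whisper-stream",
  "/usr/local/bin/whisper-stream",
]);

const whisperModel = resolveFirst(process.env.WHISPER_MODEL, [
  "./presentations/whisper.cpp/models/ggml-base.en.bin",
  "./whisper.cpp/models/ggml-base.en.bin",
]);
```

Relative candidates resolve against the current working directory. That is why the list covers both the project root and the presentations directory.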
Adding Captions to Your Slides
To add live captions to any presentation:
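- Load the `whisper-transcript.js` script:
  ```html
  <script src="ca-slides/whisper-transcript.js"></script>
  ```
- Add a transcript box to any slide:
  ```html
  <div class="live-transcript" data-transcript-src="../whisper-demo/transcript.json"></div>
  ```
  Both paths are relative URLs, so adjust them to resolve from your presentation's HTML file.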
The caption button in the menu bar will automatically show status and provide setup instructions.
Troubleshooting
Caption button stays grey
- Check that `npm run dev:whisper` is running
- Verify that `transcript.json` is being created and updated:
  ```bash
  ls -l presentations/whisper-demo/transcript.json
  ```
- Check the browser console for fetch errors
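To see at a glance whether the file is still changing, a small Node script can report how long ago it was last modified. This is a sketch, not part of the project. You could save it under a name of your choosing, for example `check-transcript.js`, and run it with `node check-transcript.js` from the project root. While you are speaking and `npm run dev:whisper` is running, the reported age should stay small.

```javascript
const fs = require("node:fs");

// Hypothetical check: milliseconds since transcript.json was last modified,
// or null if the file does not exist yet.
function transcriptAgeMs(file, now = Date.now()) {
  try {
    return now - fs.statSync(file).mtimeMs;
  } catch (err) {
    if (err.code === "ENOENT") return null; // not created yet
    throw err;
  }
}

const file = process.argv[2] || "presentations/whisper-demo/transcript.json";
const age = transcriptAgeMs(file);
console.log(age === null
  ? `${file} does not exist: is npm run dev:whisper running?`
  : `${file} was last modified ${(age / 1000).toFixed(1)} s ago`);
```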
No audio captured
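- Check microphone permissions in System Settings (System Preferences on older macOS), under Privacy & Security > Microphone. The terminal app that runs `npm run dev:whisper` needs access.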
- Verify the correct microphone is selected (whisper-stream uses the default input device)
- Test `whisper-stream` on its own from `presentations/whisper.cpp`. It should print a live transcript in the terminal:
  ```bash
  ./build/bin/whisper-stream -m ./models/ggml-base.en.bin
  ```
Transcript is delayed or incorrect
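- If the transcript is inaccurate, try a larger model (e.g., `small.en` or `medium.en`). If it lags behind your speech, try a smaller one (e.g., `tiny.en` or `base.en`).
- Speak clearly and at a moderate pace
- Reduce background noise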
Path not found errors
- Set explicit paths with the `WHISPER_BIN` and `WHISPER_MODEL` environment variables
- Check that whisper.cpp is compiled:
  ```bash
  ls presentations/whisper.cpp/build/bin/whisper-stream
  ```
- Verify that the model file exists:
  ```bash
  ls presentations/whisper.cpp/models/ggml-base.en.bin
  ```
Privacy Note
All processing happens locally on your machine. No audio or transcript data is sent to external servers.
