This feature provides real-time live captioning for presentations using Whisper.cpp.
⚠️ Local Use Only: This feature is designed for local development and presentations running on your own machine. It will not work on deployed static sites or remote servers.
The caption system polls a JSON file that Whisper.cpp updates in real time as you speak. The slides display the live transcript as it's generated.
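The browser side of this can be sketched in a few lines of JavaScript. This is a minimal sketch, assuming the transcript JSON is either a plain string or an object with a `text` field — the actual schema used by `whisper-transcript.js` may differ:

```javascript
// Sketch of the polling loop (assumed payload shape; not the real script).
function extractText(payload) {
  // Tolerate either a plain string or an object with a `text` field.
  return typeof payload === "string" ? payload : (payload.text || "");
}

function startTranscriptPolling(url, onText, intervalMs = 500) {
  // Re-fetch the JSON file that dev:whisper keeps rewriting.
  const timer = setInterval(async () => {
    try {
      const res = await fetch(url, { cache: "no-store" });
      if (res.ok) onText(extractText(await res.json()));
    } catch (_) {
      // File not there yet -- captions simply stay off.
    }
  }, intervalMs);
  return () => clearInterval(timer); // call to stop polling
}
```

The `cache: "no-store"` option matters here: without it, the browser may serve a stale copy of the transcript file instead of the freshly written one.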
Whisper.cpp is included as a git submodule. After cloning this repository, initialize and build it:
```bash
# Initialize and fetch the whisper.cpp submodule
git submodule update --init --recursive

# Navigate to the submodule directory
cd presentations/whisper.cpp

# Install SDL2 (required for microphone capture)
brew install sdl2

# Build with CMake
cmake -B build -DWHISPER_SDL2=ON
cmake --build build --config Release
```
This creates the `whisper-stream` binary at `presentations/whisper.cpp/build/bin/whisper-stream`.
Download a Whisper model. The multilingual models support 99 languages including French, Spanish, German, Japanese, and many more.
| Model | Parameters | English-only | Multilingual | Memory | Speed vs large |
|---|---|---|---|---|---|
| tiny | 39M | tiny.en | tiny | ~1 GB | ~10x faster |
| base | 74M | base.en | base | ~1 GB | ~7x faster |
| small | 244M | small.en | small | ~2 GB | ~4x faster |
| medium | 769M | medium.en | medium | ~5 GB | ~2x faster |
| large-v3 | 1550M | - | large-v3 | ~10 GB | baseline |
| large-v3-turbo | 809M | - | large-v3-turbo | ~6 GB | ~8x faster |
English-only models (`.en` suffix) perform better for English transcription, especially `tiny.en` and `base.en`.
Multilingual models (without `.en`) support 99 languages and can also translate to English.
Quantized models (with `-q5_0`, `-q5_1`, or `-q8_0` suffix) are smaller and faster but slightly less accurate.
Recommended models:

- `base.en` — good balance of speed and accuracy (English)
- `base` — multilingual
- `medium` or `large-v3-turbo` — better accuracy
- `tiny` or `tiny.en` — fastest

```bash
# From the whisper.cpp directory

# English-only (recommended for English)
bash ./models/download-ggml-model.sh base.en

# Multilingual (supports French, Spanish, German, Japanese, etc.)
bash ./models/download-ggml-model.sh base

# For better accuracy with multilingual
bash ./models/download-ggml-model.sh medium
```
The multilingual models support 99 languages including:
European: French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Greek, Swedish, Norwegian, Danish, Finnish, Czech, Romanian, Hungarian, Bulgarian, Croatian, Serbian, Slovak, Slovenian, Lithuanian, Latvian, Estonian, Icelandic, Irish, Welsh, Catalan, Basque, Galician, and more.
Asian: Chinese (Mandarin & Cantonese), Japanese, Korean, Hindi, Tamil, Telugu, Bengali, Urdu, Thai, Vietnamese, Indonesian, Malay, Tagalog, and more.
Middle Eastern & African: Arabic, Hebrew, Persian, Turkish, Swahili, Yoruba, Hausa, Amharic, and more.
Other: Haitian Creole, Maori, Hawaiian, and more.
See the full list of supported languages
This downloads the model to `presentations/whisper.cpp/models/` (e.g., `ggml-base.en.bin` or `ggml-base.bin`).
If you installed whisper.cpp in the presentations directory, the scripts will auto-detect the paths. Otherwise, set these environment variables:
```bash
export WHISPER_BIN="./presentations/whisper.cpp/build/bin/whisper-stream"
export WHISPER_MODEL="./presentations/whisper.cpp/models/ggml-base.en.bin"
```

Add these to your `~/.zshrc` or `~/.bashrc` to make them permanent.
From the ox.ca project root:
```bash
npm run dev:whisper
```
This script:

- Starts `whisper-stream` with your microphone
- Writes the live transcript to `presentations/whisper-demo/transcript.json`

Open your presentation in a browser:
```bash
# If not already running, start a local server
npm run serve
# or
python3 -m http.server 5500
```
Navigate to your slides. The caption button in the menu bar will turn green (🟢 Captions On) when the transcript file is being updated.
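One way the green/off indicator could be derived is by treating captions as live only while the transcript payload keeps changing. This is a sketch under that assumption; `makeLivenessTracker` is a hypothetical helper, not the real implementation:

```javascript
// Captions count as live when the transcript changed within the last few
// seconds; otherwise the button should fall back to the "off" state.
function makeLivenessTracker(staleAfterMs = 3000, now = Date.now) {
  let lastPayload = null;
  let lastChange = 0;
  return function update(payload) {
    if (payload !== lastPayload) {
      lastPayload = payload;
      lastChange = now();
    }
    return now() - lastChange < staleAfterMs; // true => show 🟢 Captions On
  };
}
```

Feeding each polled payload through `update()` gives the button state without needing any extra signaling channel between the scripts and the page.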
Slides with `data-transcript-src` attributes display the live text.

The scripts will attempt to auto-detect whisper.cpp in these locations (in order):

1. `$WHISPER_BIN` environment variable (if set)
2. `./presentations/whisper.cpp/build/bin/whisper-stream` (relative to project root)
3. `./whisper.cpp/build/bin/whisper-stream` (if running from the presentations dir)
4. `/usr/local/bin/whisper-stream`

For models:

1. `$WHISPER_MODEL` environment variable (if set)
2. `./presentations/whisper.cpp/models/ggml-base.en.bin` (relative to project root)
3. `./whisper.cpp/models/ggml-base.en.bin` (if running from the presentations dir)

To add live captions to any presentation:
```html
<script src="ca-slides/whisper-transcript.js"></script>

<div class="live-transcript" data-transcript-src="../whisper-demo/transcript.json"></div>
```
The caption button in the menu bar will automatically show status and provide setup instructions.
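The wiring between the polled text and the slide markup can be sketched as follows. This assumes the `data-transcript-src` convention shown above; `updateTranscriptElements` is a hypothetical helper, not the real script:

```javascript
// Push the latest text into every caption container on the page.
// textBySrc maps a transcript URL to its most recently fetched text.
function updateTranscriptElements(root, textBySrc) {
  const nodes = root.querySelectorAll("[data-transcript-src]");
  for (const el of nodes) {
    const src = el.getAttribute("data-transcript-src");
    if (src in textBySrc) el.textContent = textBySrc[src];
  }
  return nodes.length; // how many caption targets were found
}
```

Because the attribute carries the source path, several presentations can share one script while pointing at different transcript files.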
If captions don't appear, check that:

- `npm run dev:whisper` is running
- the transcript file is being updated: `ls -l presentations/whisper-demo/transcript.json`
- `whisper-stream` works on its own: `whisper-stream -m /path/to/model.bin` (should show live transcript in terminal)

For poor accuracy, try a larger model (`small.en` or `medium.en`) for better accuracy.

If the binary or model isn't found, check your `WHISPER_BIN` and `WHISPER_MODEL` environment variables and verify the files exist:

- `ls presentations/whisper.cpp/build/bin/whisper-stream`
- `ls presentations/whisper.cpp/models/ggml-base.en.bin`

All processing happens locally on your machine. No audio or transcript data is sent to external servers.