whisper.cpp can produce word-level timestamps, but it can’t export them in the Podcast Namespace JSON transcript format. This is a quick guide on how to get there.
- Set up whisper.cpp:

      # Clone whisper.cpp and navigate into its directory:
      git clone https://github.com/ggerganov/whisper.cpp.git && cd whisper.cpp/
      # Download a language model:
      bash ./models/download-ggml-model.sh base.en
      # Build the project:
      make
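- Transcribe an audio file with the `--output-json` (`-oj`) and `--max-len` (`-ml`) arguments. `--max-len` sets the maximum segment length in characters, so a value of 1 splits the output into roughly one word per segment, which is what yields word-level timestamps. The JSON is written next to the input file as `<input.wav>.json`:

      ./main -m ./models/ggml-base.en.bin -f <input.wav> -oj -ml 1

  **Warning:** the main example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg like this:

      ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le input.wav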
- Set up transcripts-to-json:

      # Clone transcripts-to-json and navigate into its directory:
      git clone https://github.com/nathangathright/transcripts-to-json.git && cd transcripts-to-json/
- Convert the transcript to Podcast Namespace JSON:

      # SRT to JSON
      python srt-to-json.py <input.srt>
      # whisper.cpp JSON to Podcast Namespace JSON
      python json-to-json.py <input.json>
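Because the transcription step only accepts 16-bit WAV input, it can help to confirm a file already matches what the ffmpeg command in that step produces (16-bit PCM, 16 kHz, mono) before running whisper.cpp. The following is a minimal Python sketch using only the standard-library `wave` module; the function name `check_whisper_wav` is hypothetical and not part of whisper.cpp, and the expected format is an assumption taken from the ffmpeg flags.

```python
import wave


def check_whisper_wav(path: str) -> list[str]:
    """Return the ways a WAV file differs from the format produced by
    `ffmpeg -ar 16000 -ac 1 -c:a pcm_s16le` (assumed target format).

    An empty list means the file already matches.
    """
    problems = []
    # wave.open raises wave.Error for files that are not PCM WAV (e.g. an MP3).
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()  # sample width in bytes; 2 bytes = 16-bit
        if width != 2:
            problems.append(f"sample width is {8 * width}-bit, expected 16-bit")
        rate = wav.getframerate()
        if rate != 16000:
            problems.append(f"sample rate is {rate} Hz, expected 16000 Hz")
        channels = wav.getnchannels()
        if channels != 1:
            problems.append(f"{channels} channels, expected mono")
    return problems
```

For example, `check_whisper_wav("input.wav")` should return an empty list for a file produced by the ffmpeg command; a non-empty list suggests converting the file first.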
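A conversion of the kind `json-to-json.py` and `srt-to-json.py` perform can be sketched in Python as follows. This is a minimal illustration, not the repository's code. It assumes whisper.cpp's JSON output has a `transcription` list whose entries carry `offsets` (`from`/`to`, in milliseconds) and `text`, and it targets the Podcast Namespace transcript JSON shape: a `version` of `"1.0.0"` and a `segments` list with `startTime` and `endTime` in seconds plus `body`. The optional `speaker` field is left out because this workflow has no speaker labels. All function names are hypothetical.

```python
import re


def whisper_json_to_podcast(whisper: dict) -> dict:
    """Map parsed whisper.cpp JSON output to a Podcast Namespace transcript.

    Assumes entries in whisper["transcription"] have "offsets" {"from", "to"}
    in milliseconds and a "text" string (often with a leading space).
    """
    segments = []
    for item in whisper["transcription"]:
        body = item["text"].strip()
        if not body:  # skip empty segments
            continue
        segments.append({
            "startTime": item["offsets"]["from"] / 1000,  # ms -> seconds
            "endTime": item["offsets"]["to"] / 1000,
            "body": body,
        })
    return {"version": "1.0.0", "segments": segments}


# Matches SRT timestamps such as 00:01:02,345 (hours:minutes:seconds,millis).
_SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _srt_seconds(stamp: str) -> float:
    """Convert one SRT timestamp to seconds."""
    h, m, s, ms = map(int, _SRT_TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def srt_to_podcast(srt_text: str) -> dict:
    """Convert SRT cues (index line, "start --> end" line, text lines)."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the timing line; the numeric index line before it is ignored.
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = line.split("-->")
                segments.append({
                    "startTime": _srt_seconds(start),
                    "endTime": _srt_seconds(end),
                    # Multi-line cue text is joined with single spaces.
                    "body": " ".join(t.strip() for t in lines[i + 1:]),
                })
                break
    return {"version": "1.0.0", "segments": segments}
```

To use it, pass in the parsed whisper.cpp output, e.g. `whisper_json_to_podcast(json.load(f))`, and write the returned dictionary with `json.dump`.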