Podcast Transcription Setup
Ultra Quick Start
- Install uv (if you don’t have it):
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Run the self-contained script:
  ./transcribe_episodes.py
That’s it! The script automatically manages its own dependencies with uv.
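For context, uv supports self-contained scripts through inline script metadata (PEP 723): a comment block at the top of the file declares the dependencies, and `uv run` installs them into a cached environment before executing the file. The actual header of transcribe_episodes.py is not shown here, so the following is only a sketch of what such a header typically looks like. The Python version bound, the dependency list, and the placeholder `main()` are all assumptions.

```python
#!/usr/bin/env -S uv run --script
# Assumed values below: the real script may pin different versions or packages.
# /// script
# requires-python = ">=3.10"
# dependencies = ["faster-whisper"]
# ///


def main() -> int:
    # Placeholder body: the real script would load the model and loop over MP3s/.
    print("Dependencies resolved by uv; transcription would start here.")
    return 0


if __name__ == "__main__":
    main()
```

With a header like this and the executable bit set, `./transcribe_episodes.py` and `uv run transcribe_episodes.py` behave the same way.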
What It Does
- Batch processes all 9 episodes from your MP3s/ directory
- Uses faster-whisper with large-v3 model - 4-5x faster than original Whisper
- Lower memory usage - ~2GB vs ~6GB with original whisper
- Voice Activity Detection - Better handling of silence and music
- Creates multiple output formats:
  - .txt - Clean plain text transcript
  - .srt - Subtitle file with timestamps (for video)
  - .vtt - Web-compatible subtitles
  - .json - Detailed data with word-level timing
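The subtitle formats differ mainly in how timestamps are written: SRT uses a comma before the milliseconds, VTT uses a period. Here is a minimal sketch of that conversion, assuming each segment is a dict with `start`, `end` (seconds) and `text` keys. The function names and the segment shape are illustrative, not taken from the script.

```python
def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Render seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text: a 1-based index, a time range, then the cue text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

A VTT writer would follow the same pattern with `decimal_marker="."`, a leading `WEBVTT` line, and no cue index required.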
Output Structure
transcripts/
├── 01-rizzoli-matt_rodbard.txt
├── 01-rizzoli-matt_rodbard.srt
├── 01-rizzoli-matt_rodbard.json
├── 02-notion-rob_giampietro.txt
└── ...etc
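The naming pattern above (same file stem as the MP3, one file per format) can be sketched with `pathlib`. The helper name `output_paths` and its defaults are hypothetical; the script may build its paths differently.

```python
from pathlib import Path


def output_paths(mp3_path, out_dir="transcripts", formats=("txt", "srt", "vtt", "json")):
    """Map each output format to a path that reuses the MP3's file stem."""
    stem = Path(mp3_path).stem  # "01-rizzoli-matt_rodbard.mp3" -> "01-rizzoli-matt_rodbard"
    return {fmt: Path(out_dir) / f"{stem}.{fmt}" for fmt in formats}
```

Passing a shorter `formats` tuple, for example `("txt", "srt")`, limits which files the mapping covers.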
Performance Notes
- First run: Downloads faster-whisper model (~3GB for large-v3)
- Processing time: ~2-5% of audio length (1-3 minutes per 60-minute episode)
- Memory usage: ~2GB (much lower than original whisper)
- Accuracy: Same as original Whisper, with better VAD (Voice Activity Detection)
- Language: Set to English with confidence scoring
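To make the processing-time figure concrete, this small sketch applies the 2-5% ratio quoted above to an episode length. The function name is illustrative, and actual speed depends on hardware.

```python
def estimate_processing_minutes(audio_minutes, low_ratio=0.02, high_ratio=0.05):
    """Return a (fastest, slowest) estimate in minutes from the 2-5% rule of thumb."""
    return (audio_minutes * low_ratio, audio_minutes * high_ratio)


low, high = estimate_processing_minutes(60)
print(f"60-minute episode: about {low:.1f} to {high:.1f} minutes")
```

Assuming all nine episodes run about 60 minutes each, the same arithmetic gives roughly 11 to 27 minutes for the whole batch, not counting the one-time model download.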
Customization Options
Use a faster model (lower quality):
model = whisper.load_model("base")  # Much faster
Add speaker context (a prompt hint; it does not label who is speaking):
# In transcribe_episode function, add:
result = model.transcribe(
    mp3_path,
    language='en',
    word_timestamps=True,
    verbose=False,
    initial_prompt="This is a podcast conversation between Craig Mod and a guest.",
)
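The snippets in this section use the openai-whisper call style (`whisper.load_model`). If the script runs on faster-whisper, as described under What It Does, the equivalent call goes through `WhisperModel` instead. It returns a lazy generator of segments plus an info object, and has no `verbose` argument. The sketch below assumes that setup. The function name `transcribe_with_prompt` and the returned dict shape are hypothetical.

```python
def transcribe_with_prompt(mp3_path, prompt, model_size="large-v3"):
    """Transcribe one file with faster-whisper, passing a context prompt."""
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    segments, info = model.transcribe(
        str(mp3_path),
        language="en",
        word_timestamps=True,
        vad_filter=True,        # skip silence via voice activity detection
        initial_prompt=prompt,  # context hint only, not speaker labeling
    )
    # `segments` is a generator: the work happens as it is consumed.
    results = [{"start": s.start, "end": s.end, "text": s.text} for s in segments]
    return results, info.language_probability
```

Switching to a smaller model in this style means passing `model_size="base"` rather than calling `whisper.load_model("base")`.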
Output only specific formats:
Comment out the formats you don’t want in the save_transcript() function.
Troubleshooting
“No module named ‘whisper’”:
pip install openai-whisper
Memory errors: Use a smaller model:
model = whisper.load_model("base")
FFmpeg errors:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
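Before reinstalling, it can help to confirm whether ffmpeg is actually visible on your PATH. A minimal check using only the standard library (the function name is illustrative):

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("ffmpeg not found on PATH; install it with one of the commands above.")
```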
Very slow performance:
Consider using faster-whisper:
pip install faster-whisper
Integration Ideas
Once you have transcripts, you could:
- Add to episode pages - Include transcript sections
- Enable search - Make episodes searchable by content
- Generate show notes - Extract key topics automatically
- Create clips - Find quotable moments with timestamps
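As one illustration of the search and clip ideas, the sketch below scans a transcript's segments for a keyword and returns the matching timestamps. It assumes the .json output holds a top-level `segments` list whose items have `start` and `text` fields. That schema is a guess, so adjust the keys to match what the script actually writes.

```python
import json
from pathlib import Path


def load_transcript(path):
    """Read one transcript JSON file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_mentions(transcript, keyword):
    """Return (start_seconds, text) for every segment containing the keyword."""
    needle = keyword.lower()
    return [
        (seg["start"], seg["text"].strip())
        for seg in transcript["segments"]  # assumed schema, see note above
        if needle in seg["text"].lower()
    ]
```

Looping `find_mentions` over every file in transcripts/ would give a basic cross-episode search, and the returned start times point to where a clip could begin.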
