Local Speech-to-Text with the OpenAI Whisper Skill: No API Key Required
Ever needed to transcribe a voice memo, podcast, or meeting recording? The OpenAI Whisper skill for Clawdbot lets your AI assistant transcribe audio files locally—no API keys, no cloud uploads, no subscriptions. Just fast, accurate speech-to-text right on your machine.
Who Needs This?
If you've ever:
- Wanted to search through old voice memos
- Needed transcripts from meeting recordings
- Wanted subtitles for videos
- Had to translate audio from another language
- Wanted to keep sensitive audio data local
...then this skill is for you.
Installation
Installing the skill is straightforward:
```bash
npx clawdhub@latest install openai-whisper
```

The skill requires the Whisper CLI. On macOS:

```bash
brew install openai-whisper
```

On Linux, you'll need Python and pip:

```bash
pip install openai-whisper
```

The first time you run Whisper, it downloads the model to ~/.cache/whisper. The default "turbo" model is about 3GB but offers an excellent speed/accuracy balance.
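Since models land in ~/.cache/whisper, you can check what's already downloaded before kicking off a long transcription. Here's a minimal sketch; the `.pt` filename pattern and the `cached_models` helper are assumptions for illustration, not part of the skill:

```python
from pathlib import Path

def cached_models(cache_dir=None):
    """Return the names of Whisper models already in the local cache."""
    cache = cache_dir or Path.home() / ".cache" / "whisper"
    if not cache.is_dir():
        return []  # nothing downloaded yet
    # Whisper stores each model as a single checkpoint file, e.g. turbo.pt
    return sorted(p.stem for p in cache.glob("*.pt"))

print(cached_models())
```

If the model you want isn't listed, expect the first `whisper` run to pause while it downloads.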
Basic Usage
Once installed, your Clawdbot can transcribe audio files. Here are the core commands:
Simple transcription to text file:

```bash
whisper /path/to/audio.mp3 --model medium --output_format txt --output_dir .
```

Generate subtitles (SRT format):

```bash
whisper /path/to/audio.m4a --output_format srt
```

Translate non-English audio to English:

```bash
whisper /path/to/spanish_audio.mp3 --task translate --output_format txt
```

Choosing the Right Model
Whisper offers multiple model sizes. Pick based on your needs:
- tiny / base — Fast, good for quick drafts or when speed matters
- small / medium — Great balance of speed and accuracy for most use cases
- large / turbo — Best accuracy, but slower and needs more RAM
The skill defaults to turbo on recent installs, but you can override:
```bash
whisper recording.m4a --model small
```

Practical Examples
1. Transcribe a voice memo and save as text:
```bash
whisper ~/Downloads/voice_memo.m4a --output_format txt --output_dir ~/transcripts
```

2. Create subtitles for a video:

```bash
whisper video.mp4 --output_format srt --output_dir .
```

3. Batch process multiple files:

```bash
whisper *.mp3 --output_format txt
```

4. Transcribe with timestamps:

```bash
whisper interview.mp3 --output_format vtt
```

Tips & Best Practices
- Audio quality matters — Clean audio transcribes better. Reduce background noise when possible.
- Use the right format — Whisper handles mp3, m4a, wav, mp4, and more. No conversion needed.
- Language detection is automatic — Whisper detects the spoken language, but you can force it with `--language en`.
- Translation is one-way — The `--task translate` option converts audio into English only; it can't translate out of English into other languages.
- Output formats — Choose `txt` (plain text), `srt` (subtitles), `vtt` (web subtitles), `tsv` (timestamps), or `json` (detailed).
- GPU acceleration — If you have a CUDA-compatible GPU, Whisper uses it automatically for faster processing.
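The `json` output is the most detailed: alongside the full text, Whisper writes a `segments` list where each segment carries `start` and `end` times in seconds plus its `text`. Whisper can already emit SRT directly, but if you want custom post-processing, here's a sketch of turning those segments into SRT entries yourself (the `demo` segment is hand-written, not real Whisper output):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render Whisper JSON segments as an SRT subtitle block."""
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(str(i))
        lines.append(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line separates subtitle entries
    return "\n".join(lines)

# Hypothetical segment for demonstration:
demo = [{"start": 0.0, "end": 2.5, "text": " Hello there."}]
print(segments_to_srt(demo))
```

The same segment data works for building word clouds, chapter markers, or search indexes over your recordings.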
Why Local Matters
Using Whisper locally means:
- Privacy — Sensitive recordings never leave your machine
- No costs — No per-minute API charges
- No limits — Transcribe as much as you want
- Offline capable — Works without internet (after model download)
Conclusion
The OpenAI Whisper skill transforms Clawdbot into a powerful transcription assistant. Whether you're processing voice memos, creating video subtitles, or translating foreign-language audio, it's all done locally with zero API hassle.