WhisperPen Design Document

Project Overview

WhisperPen is a command-line tool that leverages speech recognition and AI to convert spoken words into enhanced text. It combines OpenAI's Whisper model for accurate speech recognition with the Qwen 2.5 32B model, served locally through Ollama, for text enhancement.
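
A minimal end-to-end sketch of that pipeline, assuming the openai-whisper, ollama, and pyperclip packages (the file name and prompt wording are illustrative):

    import whisper, ollama, pyperclip

    # Recognize Chinese speech from a recorded clip
    text = whisper.load_model("base").transcribe("input.wav", language="zh")["text"]
    # Translate and enhance via the local Ollama deployment of Qwen
    reply = ollama.chat(model="qwen2.5:32b", messages=[
        {"role": "user", "content": f"Translate to English and polish: {text}"}])
    pyperclip.copy(reply["message"]["content"])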

User Requirements

Primary Requirements

  1. Speech Recognition

    • Accept voice input from users
    • Convert speech to text accurately
    • Support Chinese language input
    • Translate to English
    • Support wake word detection
      • Wake word: "小王小王"
      • Background listening
      • Low resource usage
      • Quick response time
  2. AI Enhancement

    • Use local Ollama platform
    • Utilize Qwen 2.5 32B model
    • Enhance text quality
    • Maintain professional tone
  3. Output Management

    • Save to whisperpen.md
    • Auto-copy to clipboard
    • Support multiple outputs

Enhanced Requirements

  1. Improved Recognition

    • Offline processing
    • Noise reduction
    • Better accuracy
    • Fast response time
  2. Performance Optimization

    • Configuration caching
    • Quick environment check
    • Efficient resource usage
    • Temporary file management
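
    A configuration-caching sketch for the quick environment check (the cache location and keys are assumptions):

    import json
    from pathlib import Path

    CACHE_PATH = Path.home() / ".whisperpen" / "config.json"

    def load_config() -> dict:
        # Reuse cached settings to skip repeated environment checks
        if CACHE_PATH.exists():
            return json.loads(CACHE_PATH.read_text(encoding="utf-8"))
        config = {"model_size": "base", "sample_rate": 44100}
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        CACHE_PATH.write_text(json.dumps(config, indent=2), encoding="utf-8")
        return config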

Technical Implementation

Component Architecture

  1. Speech Handler (speech_handler.py)

    import whisper
    from scipy import signal

    class SpeechHandler:
        def __init__(self):
            # Load the Whisper speech-recognition model
            self.model = whisper.load_model("base")
            # Audio settings and Butterworth noise-reduction filter
            self.sample_rate = 44100
            self.filter_b, self.filter_a = signal.butter(4, 0.2)
  2. Text Processor (text_processor.py)

    class TextProcessor:
        def __init__(self):
            # Qwen model served by the local Ollama instance
            self.model = "qwen2.5:32b"
            # Processing parameters: low temperature for stable output
            self.options = {"temperature": 0.3}
  3. File Handler (file_handler.py)

    import pyperclip
    from pathlib import Path

    class FileHandler:
        def __init__(self):
            # Markdown output file and clipboard access (pyperclip)
            self.output_path = Path("whisperpen.md")
            self.clipboard_copy = pyperclip.copy
  4. Wake Word Detector (wake_detector.py)

    from pocketsphinx import LiveSpeech

    class WakeDetector:
        def __init__(self):
            # PocketSphinx keyword spotting; needs a Mandarin acoustic model
            self.wake_word = "小王小王"
            # Background listening with a permissive spotting threshold
            self.speech = LiveSpeech(keyphrase=self.wake_word,
                                     kws_threshold=1e-20)

Processing Pipeline

  1. Audio Capture

    • Sample rate: 44100 Hz
    • Bit depth: 16-bit
    • Channel: Mono
    • Noise reduction: Butterworth filter
    • Preprocessing: scipy signal processing
    • Volume normalization: Required
    • Signal-to-noise ratio: Needs improvement
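
    A minimal preprocessing sketch for this stage, assuming scipy and numpy (the helper name preprocess_audio and the 4 kHz cutoff are illustrative):

    import numpy as np
    from scipy import signal

    def preprocess_audio(samples: np.ndarray, sample_rate: int = 44100) -> np.ndarray:
        # 4th-order low-pass Butterworth filter over the speech band
        b, a = signal.butter(4, 4000, btype="low", fs=sample_rate)
        filtered = signal.filtfilt(b, a, samples.astype(np.float64))
        # Volume normalization: scale peak amplitude to [-1, 1]
        peak = np.max(np.abs(filtered))
        return filtered / peak if peak > 0 else filtered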
  2. Wake Word Detection

    • Engine: PocketSphinx
    • Wake word: "小王小王"
    • Mode: Background listening
    • Resource usage: Minimal
    • Response time: < 0.5s
    • States:
      • Sleeping (waiting for wake word)
      • Waking (transitioning)
      • Active (listening for commands)
      • Processing (handling input)
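
    A background-listening sketch using PocketSphinx keyword spotting (the threshold is a tunable assumption; a Mandarin acoustic model is required for this wake word):

    from pocketsphinx import LiveSpeech

    def wait_for_wake_word():
        # Keyword spotting keeps resource usage minimal while sleeping
        speech = LiveSpeech(keyphrase="小王小王", kws_threshold=1e-20)
        for phrase in speech:
            # Detection moves the state machine: Sleeping -> Waking -> Active
            return str(phrase)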
  3. Speech Recognition

    • Model: OpenAI Whisper base
    • Model size: Upgrade to medium/large for better accuracy
    • Language: Chinese
    • Format: WAV
    • Mode: Offline processing
    • Initial prompt: Add language context
    • Temperature: Lower values for more deterministic output
    • Model Loading:
      • Cache model to disk
      • Lazy loading strategy
      • Optimize memory usage
      • Support model quantization
    • Performance Optimization:
      • Model quantization (int8)
      • Batch processing
      • Smaller model for initial pass
      • Parallel processing
      • GPU acceleration if available
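
    A recognition sketch with openai-whisper reflecting the options above (the prompt text is an illustrative language hint):

    import whisper

    _model = None  # cached at module level; loaded lazily on first use

    def transcribe(audio_path: str) -> str:
        global _model
        if _model is None:
            _model = whisper.load_model("base")  # "medium"/"large" for better accuracy
        result = _model.transcribe(
            audio_path,
            language="zh",                        # Chinese input
            initial_prompt="以下是普通话的句子。",    # adds language context
            temperature=0.0,                      # lower temperature, more stable output
        )
        return result["text"]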
  4. Text Enhancement

    • Model: qwen2.5:32b
    • Task: Translation + Enhancement
    • Context: Professional
    • API: Ollama local deployment
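
    An enhancement sketch against the local Ollama deployment (the system prompt wording is an assumption):

    import ollama

    def enhance(text: str) -> str:
        # Qwen translates the Chinese transcript and polishes the English
        response = ollama.chat(
            model="qwen2.5:32b",
            messages=[
                {"role": "system",
                 "content": "Translate to English and rewrite in a professional tone."},
                {"role": "user", "content": text},
            ],
        )
        return response["message"]["content"]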
  5. Output Management

    • Format: Markdown
    • Location: whisperpen.md
    • Clipboard: Automatic
    • Cache: Configuration persistence
    • Display Format:
      • Show original recognition
      • Show enhanced version
      • Use rich formatting
      • Support comparison view
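
    An output sketch using pyperclip for the clipboard and rich for the comparison view (the markdown layout is an assumption):

    import pyperclip
    from rich.console import Console
    from rich.table import Table

    def write_output(original: str, enhanced: str, path: str = "whisperpen.md") -> None:
        # Append both versions to the markdown log
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"## Original\n{original}\n\n## Enhanced\n{enhanced}\n\n")
        # Auto-copy the enhanced text
        pyperclip.copy(enhanced)
        # Side-by-side comparison view in the terminal
        table = Table("Original", "Enhanced")
        table.add_row(original, enhanced)
        Console().print(table)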

Quality Assurance

  1. Performance Metrics

    • Recognition accuracy > 95%
    • Processing time < 5s
    • Memory usage < 4GB
  2. Error Handling

    • Audio capture failures
    • Recognition errors
    • Model loading issues
    • File system errors
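
    A sketch of how these failure classes map onto the pipeline, reusing the illustrative helpers from the sketches above:

    def run_pipeline(audio_path: str) -> None:
        try:
            text = transcribe(audio_path)    # recognition / model-loading errors
            enhanced = enhance(text)         # Ollama API failures
            write_output(text, enhanced)     # file system errors
        except OSError as err:
            print(f"Audio or file error: {err}")
        except Exception as err:
            # Helpful message instead of a raw stack trace
            print(f"WhisperPen failed: {err}")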
  3. User Experience

    • Clear progress indicators
    • Helpful error messages
    • Intuitive interface

Future Enhancements

  1. Planned Features

    • Multiple language support
    • Custom model selection
    • Batch processing
    • Configuration UI
  2. Technical Debt

    • Code optimization
    • Test coverage
    • Documentation
    • Performance monitoring

Version Control

All changes must be documented in:

  1. changelog.md - Feature and requirement changes
  2. README.md - User-facing documentation
  3. This design document - Technical specifications