Author: Marco V. Torresi
Version: 20240822-1920
Audio to Text Transcriber is a Python application that transcribes audio files into text using OpenAI's Whisper API. It supports batch processing of audio files and saves transcriptions in Markdown (.md) and SubRip Subtitle (.srt) formats. The application features a graphical user interface (GUI) with folder path caching, real-time debugging output, and an "Always on Top" option for ease of use.
- Batch Processing: Transcribe multiple audio files in one go by selecting entire folders.
- Multiple Output Formats: Save transcriptions as Markdown and SRT files.
- Real-Time Debugging: View detailed logs of the transcription process in the GUI.
- Folder Path Caching: Automatically remember the last used folders for input and output.
- Always on Top: Keep the application window on top of other windows for better accessibility.
First, clone the repository to your local machine:
git clone https://github.com/your-username/audio-to-text-transcriber.git
cd audio-to-text-transcriber
Install the required Python packages using pip:
pip install -r requirements.txt
You'll need an OpenAI API key to use Whisper for transcription. Create a .env file in the root directory of the project and add your API key:
OPENAI_API_KEY=your_openai_api_key_here
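For readers curious how the key in the .env file might reach the program, here is a minimal sketch using only the standard library. The project lists python-dotenv as a dependency, whose load_dotenv function normally does this job; the helper names read_env_file and get_api_key below are hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_path=".env"):
    """Return OPENAI_API_KEY from the environment, falling back to the file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and Path(env_path).is_file():
        key = read_env_file(env_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message when the key is missing is a design choice of this sketch; the application itself may report the problem differently.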
To run the Audio to Text Transcriber, execute the following command from the root directory of the project:
python main.py
Once the application is running, you will be presented with a graphical interface.
- Target Folder: Click "Browse Target" to select the folder containing your audio files.
- Output Folder: Click "Browse Output" to select the folder where the transcriptions will be saved.
- RUN: Click the "RUN" button to start the transcription process. Progress will be displayed in the debug area.
- CANCEL: If a transcription process is running, you can click "CANCEL" to stop it.
- EXIT: Close the application by clicking the "EXIT" button.
- Always on Top: Enable the "Always on Top" checkbox to keep the window on top of other applications.
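The RUN and CANCEL controls described above can be understood through a common pattern: a worker thread processes the files while a shared flag signals cancellation. The sketch below illustrates that pattern with threading.Event. It is an assumption about how such a feature is typically built, not the code in gui_app.py, and run_batch and transcribe_one are hypothetical names.

```python
import threading


def run_batch(files, transcribe_one, cancel_event, log=print):
    """Process files in order, stopping early once cancel_event is set."""
    done = []
    for path in files:
        # Check the flag between files so CANCEL takes effect promptly.
        if cancel_event.is_set():
            log("Cancelled by user.")
            break
        log(f"Transcribing {path} ...")
        transcribe_one(path)
        done.append(path)
    return done


# Usage: RUN would start the worker thread; CANCEL would call cancel_event.set().
cancel_event = threading.Event()
worker = threading.Thread(
    target=run_batch,
    args=(["a.mp3", "b.mp3"], lambda path: None, cancel_event),
    daemon=True,
)
worker.start()
worker.join()
```

Running the batch on a separate thread keeps a Tkinter window responsive while files are processed. With this approach a file that is already being transcribed finishes before the cancellation is noticed.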
The application supports the following audio formats:
- .mp3
- .wav
- .flac
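As an illustration of how a folder could be filtered down to these formats, here is a short sketch using pathlib. The function name list_audio_files is hypothetical, and the case-insensitive, non-recursive matching is an assumption; the actual behavior of utils_file.py may differ.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac"}


def list_audio_files(folder):
    """Return supported audio files in folder (non-recursive), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```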
- Markdown (.md): Contains the plain text transcription of the audio.
- SubRip Subtitle (.srt): Includes timestamps and segmenting for subtitle-style output.
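To make the SRT layout concrete, the sketch below turns a list of timed segments into SRT cues: a running index, a start --> end line in HH:MM:SS,mmm form, and the text. The (start, end, text) tuple shape and the function names are assumptions for illustration. The Whisper API can also return SRT directly, and this README does not say which route the application takes.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```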
- main.py: Entry point for the application. Initializes the GUI and starts the Tkinter main loop.
- gui_app.py: Contains the main logic for the graphical user interface (GUI), handling user input, folder selection, and process control.
- transcription.py: Manages the interaction with the Whisper API for transcribing audio files. Handles the saving of transcription results in both Markdown and SRT formats.
- utils_file.py: Utility module for file handling, including retrieving lists of audio files from the selected directories.
- target_dir.txt / output_dir.txt: Cache files that store the last used directories for input and output folders. These files help the application remember previous folder selections.
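The folder caching described for target_dir.txt and output_dir.txt can be pictured as a pair of small read and write helpers. The following is a minimal sketch under the assumption that each cache file holds a single path as plain text; load_cached_dir and save_cached_dir are hypothetical names, not functions confirmed to exist in the repository.

```python
from pathlib import Path


def load_cached_dir(cache_file):
    """Return the cached folder path, or "" if missing or no longer a folder."""
    cache = Path(cache_file)
    if not cache.is_file():
        return ""
    folder = cache.read_text(encoding="utf-8").strip()
    return folder if folder and Path(folder).is_dir() else ""


def save_cached_dir(cache_file, folder):
    """Remember the chosen folder for the next launch."""
    Path(cache_file).write_text(str(folder), encoding="utf-8")
```

Checking that the cached folder still exists avoids pre-filling the GUI with a path that was moved or deleted since the last run.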
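This project relies on the following dependencies:
- requests
- python-dotenv
- tkinter (part of the Python standard library, so it is not installed with pip; some Linux distributions package it separately)
- Other dependencies are listed in the requirements.txt file.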
- Multi-Language Support: Add an option to specify the input language for more accurate transcriptions of non-English audio.
- Batch Progress Indicators: Display progress bars or other visual indicators to show the progress of batch transcription jobs.
- Error Handling: Improve error handling to manage network issues or unsupported file formats more gracefully.
- Custom Output Formats: Allow users to customize the transcription output formats (e.g., plain text, JSON, VTT).
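Two of the ideas above, language selection and additional output formats, map onto optional fields of the OpenAI transcription endpoint: language (an ISO-639-1 code) and response_format. The sketch below only assembles the request pieces and sends nothing. The function name build_request is hypothetical, and how transcription.py actually builds its requests is not shown in this README.

```python
API_URL = "https://api.openai.com/v1/audio/transcriptions"
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_request(api_key, response_format="text", language=None):
    """Return the URL, headers, and form fields for one transcription call."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"Unsupported response format: {response_format}")
    data = {"model": "whisper-1", "response_format": response_format}
    if language:
        data["language"] = language  # ISO-639-1 code such as "it" or "de"
    headers = {"Authorization": f"Bearer {api_key}"}
    return {"url": API_URL, "headers": headers, "data": data}
```

With the requests library, the returned dictionary could be unpacked into requests.post together with a files argument holding the opened audio file. Validating the format before any network call lets a GUI report a bad option immediately.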
This project is licensed under the MIT License. See the LICENSE file for details.
This project was developed with the assistance of AI tools, including OpenAI's ChatGPT. Some of the code and content within this repository were generated or enhanced with the help of these AI models. I value transparency in development and want to credit the AI assistance that contributed to the completion of this project.