This project explores automating podcast interview editing with the help of AI models. Editing a podcast interview usually comes down to:
- Split audio from video
- Generate accurate transcripts with correct speaker names
- Analyze the interview (video/audio) and edit it into multiple clips, each covering a specific topic or highlight
- Generate chapters for each clip
- Write an article about the interview to publish on platforms such as Medium
- Write intriguing social media posts for each clip to promote it
This project aims to automate most of the editing work with multiple AI models, including Whisper from OpenAI, WhisperX from Oxford University, and LLMs such as Claude, GPT-3.5, and GPT-4 (via API), along with other useful tools such as LangChain and vector databases.
Below is a rough draft of the system design of this project:
Environment Setup: Python 3.10 and PyTorch 2.0 are suggested, to match WhisperX.
Refer to the Audio_Transcript folder (the current code is a little sloppy and will be improved over time).
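A quick way to confirm the suggested versions is a small check like the one below. This is only a sketch: the helper names `installed_version` and `matches` are made up for illustration, and it assumes PyTorch was installed under the distribution name `torch`.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(version, wanted_prefix):
    """True if `version` equals `wanted_prefix` or is a patch/build release of it."""
    if version is None:
        return False
    return version == wanted_prefix or version.startswith(wanted_prefix + ".")


# Suggested versions come from the setup note above.
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print("Python", python_version, "OK" if python_version == "3.10" else "(3.10 suggested)")

torch_version = installed_version("torch")
print("PyTorch", torch_version, "OK" if matches(torch_version, "2.0") else "(2.0 suggested)")
```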
- You can extract audio from the video with `video_to_audio.py`.
- Audio transcription and diarization (speaker assignment) are done with `video_transcription.py` (Whisper and WhisperX need to be properly installed and imported).
- Customize the input and output file paths for the transcript in the Python script. You can also choose whether to run the transcription task on CPU or GPU (in our case we used CPU).
- You will also need a Hugging Face access token, and you must accept the user agreements for a couple of models that WhisperX relies on. Check out the WhisperX GitHub repo for more info.
- After running the code and getting your transcript, you still need to replace the speaker labels with the speakers' real names or any names you choose. This can be done with `speaker_correction.py`, which prompts you to enter the speaker mappings as a dict, then saves the corrected transcript to the output path you defined.
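
The first and last steps in the list above can be sketched roughly as follows. This is not the code in the Audio_Transcript folder, just a minimal illustration under a few assumptions: `ffmpeg` is installed and on your PATH, the transcript is plain text with diarization labels of the form `SPEAKER_00`, and the function names (`build_ffmpeg_command`, `extract_audio`, `apply_speaker_names`) are hypothetical.

```python
import re
import subprocess


def build_ffmpeg_command(video_path, audio_path):
    """Build an ffmpeg command that drops the video stream and writes mono 16 kHz WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", video_path,     # input video
        "-vn",                # discard the video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # 16 kHz, the sample rate Whisper works with
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


def apply_speaker_names(transcript, mapping):
    """Replace each diarization label in `transcript` with its mapped name."""
    if not mapping:
        return transcript
    labels = "|".join(re.escape(label) for label in mapping)
    # Word boundaries keep a label like SPEAKER_1 from matching inside SPEAKER_10.
    pattern = re.compile(r"\b(" + labels + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], transcript)
```

For example, `apply_speaker_names(text, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"})` returns the transcript with both labels replaced, and labels missing from the mapping are left untouched.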
For simplicity, using a third-party framework is a quick way for us to handle long texts and complicated tasks and to build an MVP of this automation tool.
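
To illustrate why long texts need special handling, here is a plain-Python sketch of splitting a transcript into overlapping chunks so that each piece fits into one LLM request. It does not use any framework's splitter and is not how this project does it; `chunk_transcript`, the character budget `max_chars` (a rough stand-in for a token limit), and `overlap_lines` are all assumptions made for the example.

```python
def chunk_transcript(lines, max_chars=4000, overlap_lines=2):
    """Group transcript lines into chunks of roughly `max_chars` characters.

    The last `overlap_lines` lines of each chunk are repeated at the start of
    the next one, so context is not lost at chunk boundaries.
    """
    chunks = []
    current = []
    size = 0
    for line in lines:
        # Flush the current chunk before it would exceed the budget.
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current = current[-overlap_lines:] if overlap_lines > 0 else []
            size = sum(len(kept) + 1 for kept in current)
        current.append(line)
        size += len(line) + 1  # +1 accounts for the joining newline
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk can then be summarized or analyzed separately and the results combined, which is the kind of bookkeeping a framework takes care of.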