Podcast Content Transcription and Editing Automation With LLMs

This project explores automating the editing of podcast interviews with the help of AI models. Editing a podcast interview usually comes down to the following steps:

Split the audio from the video
Generate accurate transcripts with correct speaker names
Analyze the interview (video/audio) and cut it into multiple clips, each covering a specific topic or highlight
Generate chapters for each clip
Write an article about the interview to publish on platforms such as Medium
Write engaging social media posts for each clip to promote it

This project aims to automate most of the editing work with multiple AI models, including Whisper from OpenAI, WhisperX from Oxford University, and LLMs such as Claude, GPT-3.5, and GPT-4 (via API), along with supporting tools such as LangChain and vector databases. A rough draft of the system design is in SysDesign.png.

Environment Setup: Python 3.10 and PyTorch 2.0 are suggested, to match WhisperX.
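
As a quick sanity check before installing anything heavy, a small script along these lines can report whether the interpreter matches the suggested version and which package versions are present. This is only a sketch: the helper names (python_matches, installed_version) are made up for illustration, and the package names passed in are assumptions about how the dependencies are installed in your environment.

```python
import sys
from importlib import metadata

# Suggested interpreter version from the setup note above.
SUGGESTED_PYTHON = (3, 10)


def python_matches(version_info=sys.version_info, suggested=SUGGESTED_PYTHON):
    """Return True if the major.minor version equals the suggested one."""
    return tuple(version_info[:2]) == suggested


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python matches suggestion:", python_matches())
    # Package names below are assumptions; adjust to your environment.
    for name in ("torch", "whisperx"):
        print(name, "->", installed_version(name))
```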

Part I: Video Transcription and Speaker Diarization with Whisper/WhisperX

Refer to the Audio_Transcript folder (the current code is a little rough and will be improved over time).

Extract the audio from the video with video_to_audio.py.
Transcribe the audio and run diarization (speaker assignment) with video_transcription.py (Whisper and WhisperX must be properly installed and importable).
- Set the input and output file paths for the transcript in the Python script. You can also choose whether to run the transcription on CPU or GPU (we used CPU).
- You will also need a Hugging Face access token, and you must accept the user agreements for a couple of models that WhisperX depends on. See the WhisperX GitHub repo for more info.
After running the code and getting your transcript, you still need to replace the speaker labels with the speakers' real names or any custom names. This can be done with speaker_correction.py, which prompts you to enter the speaker mappings as a dict, applies them, and saves the corrected transcript to the output path you defined.
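
The speaker-correction step can be sketched roughly as follows. This is not the code in speaker_correction.py; it is a minimal illustration that assumes the transcript is plain text and that diarization labels look like SPEAKER_00, SPEAKER_01, and so on. The function name and the sample transcript are hypothetical.

```python
import re


def replace_speaker_labels(transcript, mapping):
    """Replace diarization labels in a transcript with real names.

    `mapping` is a dict such as {"SPEAKER_00": "Host"}. Labels are matched
    as whole tokens, so "SPEAKER_01" is not replaced inside "SPEAKER_010".
    """
    if not mapping:
        return transcript
    # Longest labels first so overlapping labels resolve predictably.
    labels = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(label) for label in labels) + r")(?!\w)"
    )
    return pattern.sub(lambda match: mapping[match.group(0)], transcript)


if __name__ == "__main__":
    # Hypothetical transcript format, for illustration only.
    sample = "[SPEAKER_00]: Welcome to the show.\n[SPEAKER_01]: Thanks for having me."
    print(replace_speaker_labels(sample, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"}))
```

Matching whole tokens with a single compiled pattern avoids the partial-replacement problems that chained str.replace calls can cause when one label is a prefix of another.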

Part II: Working with LLMs and LangChain

For simplicity, we use a third-party framework here as a quick way to handle long texts and complicated tasks and to build an MVP of this automation tool.
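
To illustrate what handling long texts involves, here is a minimal plain-Python sketch of splitting a transcript into overlapping chunks that fit within a size budget before sending each one to an LLM. It is not this project's implementation and does not use LangChain, which ships its own text splitters. The function name, the character-based budget, and the default values are assumptions chosen for illustration.

```python
def chunk_transcript(lines, max_chars=2000, overlap_lines=1):
    """Group transcript lines into chunks of roughly max_chars characters.

    The last `overlap_lines` lines of each chunk are repeated at the start
    of the next one, so a speaker turn is not cut off from its context.
    A single line longer than max_chars still becomes its own chunk.
    """
    chunks, current, size, fresh = [], [], 0, 0
    for line in lines:
        cost = len(line) + 1  # +1 for the joining newline
        # Only close a chunk if it holds at least one line not yet emitted.
        if fresh and size + cost > max_chars:
            chunks.append("\n".join(current))
            current = current[-overlap_lines:] if overlap_lines > 0 else []
            size = sum(len(kept) + 1 for kept in current)
            fresh = 0
        current.append(line)
        size += cost
        fresh += 1
    if fresh:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    demo = ["Host: aaaa", "Guest: bbbb", "Host: cccc"]
    for chunk in chunk_transcript(demo, max_chars=25):
        print(repr(chunk))
```

Splitting on whole lines rather than raw character offsets keeps each speaker turn intact, which matters when the model is later asked to pick out topics or highlights.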

The-Craft-Podcast/Podcast_Content_Automation
