A powerful Streamlit web application that leverages OpenAI's language models and LangChain to extract insights directly from YouTube video transcripts. This chatbot allows users to ask questions about the content of any YouTube video, providing detailed, context-aware responses based on the video's transcript.
Check out the live app here: YouTube Transcript Chatbot 🚀
- Intelligent Transcript Analysis: Uses advanced AI to extract and understand video content
- Interactive Web Interface: Built with Streamlit for a user-friendly experience
- OpenAI Integration: Utilizes GPT-3.5-turbo for generating accurate responses
- Flexible Querying: Ask any question about the video's content
- Robust Error Handling: Provides clear feedback and error messages
- Transcript Chunking: Implements intelligent text splitting for comprehensive analysis
- Python 3.8+
- OpenAI API Key
-
Clone the repository:
git clone https://github.com/yourusername/youtube-transcript-chatbot.git cd youtube-transcript-chatbot
-
Install dependencies:
pip install -r requirements.txt
-
Set up your OpenAI API Key:
- Create a
.env
file in the project root - Add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
- Create a
-
Launch the Streamlit app:
streamlit run app.py
-
In the web interface:
- Enter your OpenAI API Key
- Paste a YouTube video URL
- Ask a question about the video's content
- Click "Get Answer" to receive a detailed response
app.py
: Streamlit user interface handlingyoutube_retriever.py
: Core logic for transcript retrieval and AI processing- Utilizes LangChain for document loading, text splitting, and retrieval
def load_split_transcript(url):
"""Loads YouTube transcript and splits into manageable chunks"""
loader = YoutubeLoader.from_youtube_url(url)
transcript = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=100
)
split_transcript = text_splitter.split_documents(transcript)
return split_transcript
- Support for multiple video sources
- Caching mechanism for faster repeated queries
- Enhanced error handling for various URL formats
- Option to download or export conversation
- Multilingual transcript support
- Improved context retention across multiple questions
- Streamlit
- LangChain
- OpenAI
- YouTube Transcript API
- python-dotenv
- Requires a valid OpenAI API key
- Be mindful of API usage and associated costs
- Responses are generated based on transcript content
- Not all videos may have accessible transcripts
Khaled Soudy
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
This project is open-source and available under the MIT License.
🎬 Unlock the insights hidden in YouTube videos, one question at a time! 🤖