A set of AI tools to generate and convert thematic content
We're deep into the era of big data being processed and reprocessed over and over, only to be fed back into processing infrastructure again. This happens so often that the data the infrastructure sees has probably already been processed and mutated several times: from social media autocorrection and filters, to social bots dumping copies of copies back onto the internet, to LLMs learning from content that was already generated by their predecessors.
This repository contains various AI scripts that process automatically generated information into formats that can then be more easily fed back into AI for big data processing and the like. Everything below will be written by an AI. Goodbye.
The `youtube.py` script provides functionality to:
- Download and normalize YouTube video transcripts
- Cache processed transcripts for efficiency
- Integrate with OpenRouter AI for transcript normalization
- Handle errors with Telegram notifications
- Track progress of transcription tasks
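A minimal sketch of how the fetch-and-normalize flow might look is below. The library choice (`youtube-transcript-api`), the model id, and the prompt are assumptions for illustration; only the OpenRouter chat completions endpoint and the `OPENROUTER_API_KEY` variable come from this README, and the actual `youtube.py` may differ.

```python
# Hypothetical sketch: fetch a transcript and ask OpenRouter to normalize it.
# The real youtube.py may use different libraries, models, and prompts.
import os
import requests
from youtube_transcript_api import YouTubeTranscriptApi  # assumed transcript source

def fetch_transcript(video_id: str) -> str:
    # Concatenate the raw caption segments into one block of text
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(seg["text"] for seg in segments)

def normalize_with_openrouter(raw_text: str) -> str:
    # OpenRouter exposes an OpenAI-compatible chat completions endpoint
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "openai/gpt-4o-mini",  # placeholder model id
            "messages": [
                {"role": "system", "content": "Clean up punctuation and casing of this transcript."},
                {"role": "user", "content": raw_text},
            ],
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```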
- Copy `.env.example` to `.env`
- Configure your environment variables:
  - `OPENROUTER_API_KEY`: Your OpenRouter API key
  - `TELEGRAM_BOT_TOKEN`: Your Telegram bot token for notifications
  - `TELEGRAM_CHAT_ID`: Your Telegram chat ID for receiving notifications
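As a rough sketch, the script could read these values like this; whether it actually uses `python-dotenv` is an assumption, only the variable names come from this README.

```python
# Hypothetical sketch of loading the .env values; youtube.py may do this differently.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]
TELEGRAM_BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
TELEGRAM_CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]
```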
The system includes comprehensive error handling with Telegram notifications:
- ❌ Error notifications for failed operations
- ✅ Success notifications for completed transcriptions
- Detailed error traces for debugging
- Thread-safe error handling for concurrent processing
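A notification helper along these lines would cover the ✅/❌ messages; the function name and usage are illustrative, not the script's actual code, though the `sendMessage` endpoint is the standard Telegram Bot API call.

```python
# Hypothetical sketch of the Telegram notification helper; names are illustrative.
import os
import requests

def notify_telegram(message: str) -> None:
    # Telegram's Bot API sendMessage endpoint delivers the text to the configured chat
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    requests.post(
        f"https://api.telegram.org/bot{token}/sendMessage",
        json={"chat_id": chat_id, "text": message},
        timeout=30,
    )

# Example usage inside a worker:
# try:
#     process_video(url)
#     notify_telegram(f"✅ Transcription finished for {url}")
# except Exception as exc:
#     notify_telegram(f"❌ Transcription failed for {url}: {exc}")
```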
Run the YouTube transcript processor:
```bash
python youtube.py
```
You'll be prompted for:
- Path to glossary/docs file
- YouTube video URLs (enter a blank line to finish)
The script will process videos concurrently and send status updates via Telegram.
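The concurrent loop might be structured roughly as follows; `process_video` and `notify_telegram` are placeholders standing in for whatever the script actually calls, and the worker count is an assumption.

```python
# Hypothetical sketch of the concurrent processing loop; function names are placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(urls: list[str]) -> None:
    # Each video is handled in its own worker thread
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(process_video, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                notify_telegram(f"✅ Finished {url}")
            except Exception as exc:
                notify_telegram(f"❌ Failed {url}: {exc}")
```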