A comprehensive Python application that performs real-time sentiment analysis on news headlines, storing the results in a SQLite database and generating interactive visualizations. The system employs multiple sentiment analysis models, including VADER, FinBERT, and RoBERTa, to provide nuanced sentiment scoring.
- Multi-Model Sentiment Analysis: Combines VADER, FinBERT, and RoBERTa models for robust sentiment scoring
- Real-time RSS Feed Processing: Automatically fetches and analyzes news headlines
- Interactive Visualizations: Comprehensive dashboards using Plotly
- Efficient Data Storage: SQLite database with optimized indexing
- Duplicate Detection: Intelligent similarity-based duplicate removal
- Comprehensive Analysis: Including timeline views, sentiment distributions, and statistical breakdowns
from news_analysis import DatabaseManager, SentimentAnalyzer, DataVisualizer
# Initialize components
db = DatabaseManager('custom_database.db')
analyzer = SentimentAnalyzer()
visualizer = DataVisualizer()
# Run specific analyses
visualizer.create_visualizations('custom_database.db')
- Daily Entry Counts
- Hourly Distribution
- Sentiment Timeline
- Summary Length Distribution
- Sentiment Distribution
- Weekly Patterns
- Sentiment Moving Average
- Headline Length vs Sentiment
- Time of Day Sentiment
- Recent Headlines Table
- Most Positive Headlines
- Most Negative Headlines
- Statistical Summaries
CREATE TABLE sentiment_scores (
date TEXT,
time TEXT,
title TEXT,
summary TEXT,
score REAL
)
idx_date
: Optimizes date-based queriesidx_title
: Facilitates headline searchesidx_score
: Improves sentiment-based filtering
The project includes functions to:
- Eliminate duplicate or near-duplicate entries based on a similarity threshold.
- Provide analysis and cleanup of the dataset for better performance and accuracy.
# Adjust similarity threshold (default: 0.85)
remove_duplicates(db_path='news_sentiment.db', similarity_threshold=0.90)
- Write-Ahead Logging (WAL) mode
- Optimized cache settings
- Efficient indexing strategy
- Regular VACUUM operations
- Thread pooling for parallel sentiment analysis
- LRU caching for frequently accessed data
- Batch processing capabilities
- GPU acceleration when available
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature
) - Commit your changes (
git commit -m 'Add some AmazingFeature'
) - Push to the branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- NLTK team for VADER sentiment analysis
- Hugging Face for transformer models
- Plotly team for visualization capabilities
- Contributors and maintainers of all dependent libraries