abhy-kumar/NLPulse: This program assigns a sentiment score to each headline featured in the day's top stories. Its longer-term aim is to gather insights about the top headlines being reported.

📊 Overview

NLPulse is a Python application that performs real-time sentiment analysis on news headlines, stores the results in a SQLite database, and generates interactive visualizations. It scores each headline with multiple sentiment analysis models (VADER, FinBERT, and RoBERTa) to provide more nuanced results than any single model.

🌟 Key Features

Multi-Model Sentiment Analysis: Combines VADER, FinBERT, and RoBERTa models for robust sentiment scoring
Real-time RSS Feed Processing: Automatically fetches and analyzes news headlines
Interactive Visualizations: Comprehensive dashboards using Plotly
Efficient Data Storage: SQLite database with optimized indexing
Duplicate Detection: Intelligent similarity-based duplicate removal
Comprehensive Analysis: Timeline views, sentiment distributions, and statistical breakdowns
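
This README does not say how the VADER, FinBERT, and RoBERTa outputs are merged into a single score. As a rough illustration only, one common approach is a weighted average of per-model scores that have each been normalized to the range [-1, 1]. The function name `combine_scores` and the weights below are hypothetical and are not taken from the project.

```python
from typing import Mapping

# Hypothetical weights; the project's actual weighting (if any) is not documented here.
DEFAULT_WEIGHTS = {"vader": 0.2, "finbert": 0.4, "roberta": 0.4}


def combine_scores(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted average of per-model scores, each assumed to lie in [-1, 1].

    Models missing from `scores` are skipped and the remaining weights
    are renormalized, so a partial result still yields a usable score.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no overlapping models between scores and weights")
    combined = sum(scores[name] * w for name, w in used.items()) / total
    # Clamp to guard against inputs slightly outside the expected range.
    return max(-1.0, min(1.0, combined))
```

Renormalizing over the models that actually returned a score means one failed model does not drag the combined value toward zero.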

Custom Configuration

from news_analysis import DatabaseManager, SentimentAnalyzer, DataVisualizer

# Initialize components
db = DatabaseManager('custom_database.db')
analyzer = SentimentAnalyzer()
visualizer = DataVisualizer()

# Run specific analyses
visualizer.create_visualizations('custom_database.db')

📊 Visualization Types

Main Dashboard

Daily Entry Counts
Hourly Distribution
Sentiment Timeline
Summary Length Distribution
Sentiment Distribution
Weekly Patterns
Sentiment Moving Average
Headline Length vs Sentiment
Time of Day Sentiment

Headlines Analysis

Recent Headlines Table
Most Positive Headlines
Most Negative Headlines
Statistical Summaries

🗄️ Database Schema

sentiment_scores Table

CREATE TABLE sentiment_scores (
    date TEXT,
    time TEXT,
    title TEXT,
    summary TEXT,
    score REAL
)

Indexes

idx_date: Optimizes date-based queries
idx_title: Facilitates headline searches
idx_score: Improves sentiment-based filtering
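
The schema and indexes above could be created with Python's built-in `sqlite3` module roughly as follows. The helper name `init_db` is hypothetical, and the indexed columns are inferred from the index names (`idx_date` on `date`, and so on) rather than confirmed by the project source.

```python
import sqlite3


def init_db(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the sentiment_scores table and its three indexes if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sentiment_scores (
            date TEXT,
            time TEXT,
            title TEXT,
            summary TEXT,
            score REAL
        )
        """
    )
    # Column choices are assumptions based on the index names.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_date ON sentiment_scores(date)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_title ON sentiment_scores(title)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_score ON sentiment_scores(score)")
    conn.commit()
    return conn


# Example: the five most negative headlines, a query that idx_score can serve.
conn = init_db()
rows = conn.execute(
    "SELECT title, score FROM sentiment_scores ORDER BY score ASC LIMIT 5"
).fetchall()
```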

🔍 Duplicate Detection

The project includes functions to:

Eliminate duplicate or near-duplicate entries based on a similarity threshold.
Provide analysis and cleanup of the dataset for better performance and accuracy.

Configuration

# Adjust similarity threshold (default: 0.85)
remove_duplicates(db_path='news_sentiment.db', similarity_threshold=0.90)
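
The similarity measure behind `remove_duplicates` is not described in this README. The sketch below shows one way a similarity threshold can work, using `difflib.SequenceMatcher` from the standard library. The function `find_near_duplicates` is hypothetical and is not the project's implementation.

```python
from difflib import SequenceMatcher


def find_near_duplicates(titles, similarity_threshold=0.85):
    """Return indices of titles that closely match an earlier kept title.

    Each title is compared against every title kept so far, so the cost
    grows quadratically; this is fine for a sketch but slow for large tables.
    """
    kept = []        # normalized titles that survived so far
    duplicates = []  # indices flagged as near-duplicates
    for index, title in enumerate(titles):
        normalized = title.lower().strip()
        is_duplicate = any(
            SequenceMatcher(None, normalized, earlier).ratio() >= similarity_threshold
            for earlier in kept
        )
        if is_duplicate:
            duplicates.append(index)
        else:
            kept.append(normalized)
    return duplicates
```

Raising the threshold (for example from 0.85 to 0.90) flags fewer pairs as duplicates, which keeps more headlines at the risk of leaving some near-copies in the table.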

📈 Performance Optimization

Database Optimization

Write-Ahead Logging (WAL) mode
Optimized cache settings
Efficient indexing strategy
Regular VACUUM operations
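
As a minimal sketch of the database settings listed above, the snippet below enables WAL mode, enlarges the page cache, and compacts the file with `VACUUM`. The helper names and the cache size are assumptions; the README does not state the values the project uses.

```python
import sqlite3


def open_tuned_connection(db_path: str) -> sqlite3.Connection:
    """Open a SQLite connection with WAL journaling and a larger page cache."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative cache_size is interpreted in KiB; roughly 64 MB here (assumed value).
    conn.execute("PRAGMA cache_size=-64000")
    return conn


def compact(conn: sqlite3.Connection) -> None:
    """Rebuild the database file to reclaim space left by deleted rows."""
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")
```

`VACUUM` rewrites the whole file, so it is better suited to occasional maintenance (for example after a duplicate cleanup) than to every write.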

Processing Optimization

Thread pooling for parallel sentiment analysis
LRU caching for frequently accessed data
Batch processing capabilities
GPU acceleration when available
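
The first three techniques above can be combined as in the following sketch, which uses `concurrent.futures.ThreadPoolExecutor` and `functools.lru_cache` from the standard library. The keyword-based `score_headline` is a stand-in for a real model call, and every name and default value here is hypothetical. GPU acceleration depends on the model library and is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=4096)
def score_headline(headline: str) -> float:
    """Placeholder scorer: positive keyword count minus negative keyword count.

    A real pipeline would call a sentiment model here; the cache means a
    headline that appears again is not scored twice.
    """
    positive = {"gain", "win", "growth"}
    negative = {"loss", "crash", "crisis"}
    words = headline.lower().split()
    return float(sum(w in positive for w in words) - sum(w in negative for w in words))


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_all(headlines, batch_size=32, max_workers=4):
    """Score headlines in batches across a thread pool, preserving input order."""

    def score_batch(batch):
        return [score_headline(h) for h in batch]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order the batches were submitted.
        results = pool.map(score_batch, batched(headlines, batch_size))
        return [score for batch in results for score in batch]
```

Threads help most when the scoring call releases the GIL or waits on I/O, as model inference libraries and network requests typically do; for pure-Python scoring like the placeholder above, the benefit is small.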

📝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Guidelines

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

NLTK team for VADER sentiment analysis
Hugging Face for transformer models
Plotly team for visualization capabilities
Contributors and maintainers of all dependent libraries
