An advanced web application for efficiently summarizing Bengali news articles.
- 🔍 Overview
- 🎯 Problem Statement
- ✨ Features
- 🧠 Model Architecture
- 📁 Project Structure
- 🚀 Installation
- 🖥 Usage
- 📄 License
Bengali Text Summarizer is a web application developed for CSE499B Senior Design Project II, the final-year project of the B.Sc. program. Users paste in a Bengali news article and get back a concise summary, which makes it easier to keep up with the large volume of Bengali news published online.
- Web Crawler: Initially developed to fetch articles from various Bengali newspaper portals.
- Data Collection: Gathered articles and summaries from multiple sources.
- Model Training: Fine-tuned the pre-trained google/mt5-small model, which has a Seq2Seq (encoder-decoder) architecture.
- Web Interface: Built using Next.js 15 and React 19 for a responsive and user-friendly experience.
The proliferation of Bangla news portals makes it hard to keep up with so many articles in limited time, and the inherent complexity of the language compounds the problem.
- 📈 Overwhelming volume of Bengali news content online
- ⏳ Limited time for comprehensive reading
- 🔤 Inherent complexity of the Bengali language
- 🧠 Difficulty in quickly grasping essential information from articles
- 📝 Input Bengali news articles
- 🤖 Generate concise summaries
- 📊 Responsive design for various devices
- 🌓 Dark mode support
Our summarization model is based on the Seq2Seq architecture, using the pre-trained google/mt5-small model and MT5Tokenizer.
Set | Metric | Text (token count) | Summary (token count) |
---|---|---|---|
Training | Mean length | 1576.52 | 61.15 |
Training | Max length | 9645 | 316 |
Training | Min length | 23 | 5 |
Training | Std length | 943.45 | 25.43 |
Validation | Mean length | 1266.48 | 56.78 |
Validation | Max length | 2559 | 105 |
Validation | Min length | 153 | 22 |
Validation | Std length | 540.35 | 17.93 |
Test | Mean length | 1302.62 | 57.51 |
Test | Max length | 2548 | 105 |
Test | Min length | 182 | 21 |
Test | Std length | 542.46 | 17.75 |
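
Statistics like those in the table can be derived from per-article token counts. The following is a minimal sketch, not the project's actual script: the function name `length_stats` and the sample counts are hypothetical, and it assumes the sample standard deviation, since this README does not say which variant was used. In practice the counts would come from the tokenizer, for example `len(tokenizer(text)["input_ids"])`.

```python
from statistics import mean, stdev


def length_stats(token_counts):
    """Return mean, max, min, and sample standard deviation of token counts."""
    return {
        "mean": round(mean(token_counts), 2),
        "max": max(token_counts),
        "min": min(token_counts),
        "std": round(stdev(token_counts), 2),
    }


# Hypothetical token counts for four articles (not taken from the dataset)
example_counts = [120, 480, 950, 1600]
stats = length_stats(example_counts)
print(stats)
```

Running the same function once per split and once per column (article text and summary) would produce the rows of the table.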
import torch
from transformers import MT5ForConditionalGeneration, MT5Tokenizer, Seq2SeqTrainingArguments

model_name = "google/mt5-small"
tokenizer = MT5Tokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)
train_inputs = tokenize_data(df_4_train, max_length=512, max_target_length=100)
val_inputs = tokenize_data(df_4_val, max_length=512, max_target_length=100)
test_inputs = tokenize_data(df_4_test, max_length=512, max_target_length=100)
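
The `tokenize_data` helper itself is not shown in this README. One plausible shape for it is sketched below under stated assumptions: the name `tokenize_split`, the column names `text` and `summary`, and the explicit `tokenizer` argument are all hypothetical, and the real helper may differ. The sketch relies only on the standard Hugging Face tokenizer call, which accepts `max_length`, `truncation`, and `text_target`.

```python
def tokenize_split(df, tokenizer, max_length=512, max_target_length=100):
    """Tokenize articles and reference summaries for a Seq2Seq model.

    Assumes df has "text" and "summary" columns (hypothetical names).
    """
    # Encode the articles, cutting anything beyond max_length tokens
    model_inputs = tokenizer(
        list(df["text"]),
        max_length=max_length,
        truncation=True,
    )
    # Encode the reference summaries as decoder targets
    labels = tokenizer(
        text_target=list(df["summary"]),
        max_length=max_target_length,
        truncation=True,
    )
    # The model reads the target token ids from the "labels" key
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

Under this sketch, an article of the mean training length (1576.52 tokens) keeps roughly the first third of its tokens at `max_length=512`, and summaries longer than 100 tokens (the training maximum is 316) are cut as well. Padding is left out on purpose; a collator such as `DataCollatorForSeq2Seq` can pad each batch and mask the padded label positions with -100 so they are ignored by the loss.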
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # evaluate at the end of every epoch
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
    save_total_limit=2,  # keep only the two most recent checkpoints
    predict_with_generate=True,  # call generate() during evaluation
    save_safetensors=False,
)
Epoch | Training Loss | Validation Loss |
---|---|---|
1 | 1.046200 | 0.692979 |
2 | 1.015400 | 0.683604 |
3 | 1.027700 | 0.676918 |
4 | 0.988000 | 0.672858 |
5 | 0.994400 | 0.671896 |
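
To make the trend in the table easier to read, the short calculation below computes the epoch-to-epoch change in validation loss and the total relative improvement. It uses only the values from the table; the variable names are illustrative.

```python
# Validation loss per epoch, copied from the table above
val_loss = [0.692979, 0.683604, 0.676918, 0.672858, 0.671896]

# Change between consecutive epochs (negative means the loss went down)
deltas = [round(b - a, 6) for a, b in zip(val_loss, val_loss[1:])]

# Relative improvement from the first to the last epoch, in percent
relative_gain = (val_loss[0] - val_loss[-1]) / val_loss[0] * 100

print(deltas)
print(f"{relative_gain:.2f}%")
```

The per-epoch improvement shrinks every epoch (from about 0.0094 to about 0.0010), and the total gain over five epochs is about 3%, which suggests the validation loss is close to a plateau under these settings.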
src/
├── app/
│   ├── api/
│   │   └── bts-summarize // BTS summarization API endpoint
│   ├── fonts // Custom fonts directory
│   ├── globals.css // Global CSS styles
│   ├── layout.tsx // Root layout component
│   └── page.tsx // Home page component
├── component/
│   ├── layout/
│   │   ├── footer.tsx // Footer component
│   │   └── NavigationBar.tsx // Navigation bar component
│   ├── page-contents/
│   │   ├── AdditionalContents/
│   │   │   ├── AuthDialog.tsx // User authentication dialog
│   │   │   ├── FacultyAdvisor.tsx // Faculty advisor details
│   │   │   ├── ProjectMetadata.tsx // Brings the project metadata together
│   │   │   ├── ProjectOverview.tsx // Project overview section
│   │   │   ├── StatsCard.tsx // Statistical metrics card
│   │   │   ├── TeamMembers.tsx // Team members list
│   │   │   └── TrainingChart.tsx // Training data chart
│   │   └── SummaryGenerator/
│   │       ├── ArticleInput.tsx // Input for articles to summarize
│   │       ├── ArticleList.tsx // List of articles
│   │       ├── ArticleSummary.tsx // Summarized article display
│   │       ├── CategoryList.tsx // Article category list
│   │       ├── Header.tsx // Header for Summary Generator
│   │       ├── MainContent.tsx // Main content area
│   │       ├── Sidebar.tsx // Sidebar navigation
│   │       └── SummaryGenerator.tsx // Main Summary Generator component
│   └── ui // Shadcn UI components
├── context/
│   └── ThemeContext.tsx // Theme context for app theming
├── hooks/
│   └── useSummaryGenerator.tsx // Custom hook for Summary Generator
└── lib/
    ├── constants.ts // Application-wide constants
    ├── errors.ts // Error handling utilities
    ├── huggingface.ts // Hugging Face API utilities
    ├── types.ts // TypeScript types and interfaces
    ├── utils.ts // Utility functions
    └── validation.ts // Data validation functions
- Clone the repository: `git clone https://github.com/your-username/bengali-text-summarizer.git`
- Navigate to the project directory: `cd bengali-text-summarizer`
- Install dependencies: `npm install`
- Start the development server: `npm run dev`
- Open your browser and navigate to http://localhost:3000
- Input a Bengali news article in the provided text area
- Click the "Summarize" button
- View the generated summary
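
The same summarization can in principle be requested without the browser, since the project structure lists an API route under `src/app/api/bts-summarize`. The sketch below is an assumption-heavy illustration: the path `/api/bts-summarize` follows Next.js App Router conventions, but the HTTP method, the JSON field name `text`, and the response format are hypothetical and should be checked against the route's source.

```python
import json
import urllib.request

# Assumed endpoint, inferred from src/app/api/bts-summarize
API_URL = "http://localhost:3000/api/bts-summarize"


def build_request(article, url=API_URL):
    """Build a JSON POST request; the "text" field name is a guess."""
    payload = json.dumps({"text": article}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(article, url=API_URL):
    """Send the article to the dev server and return the raw response body."""
    with urllib.request.urlopen(build_request(article, url)) as response:
        return response.read().decode("utf-8")
```

With the development server running, `summarize(article_text)` would return whatever body the route sends back; parsing it depends on the actual response shape.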
This project is licensed under the MIT License. See the LICENSE file for details.