Bengali Text Summarizer

An advanced web application for efficiently summarizing Bengali news articles.


Python, PyTorch, NumPy, Pandas, Hugging Face

Next.js, React, Node.js, NPM, TypeScript, Tailwind CSS, Framer Motion, Lucide React, ESLint Next, License


🔍 Project Overview

Bengali Text Summarizer is a web application developed for CSE499B (Senior Design Project II), the final-year project of the B.Sc. program. Users paste in a Bengali news article and receive a concise summary, which makes it easier to keep up with the large volume of Bengali news published online.

🌐 Key Components

  1. Web Crawler: Initially developed to fetch articles from various Bengali newspaper portals.
  2. Data Collection: Gathered articles and summaries from multiple sources.
  3. Model Training: Fine-tuned the pre-trained google/mt5-small model, which uses a Seq2Seq (encoder-decoder) architecture.
  4. Web Interface: Built using Next.js 15 and React 19 for a responsive and user-friendly experience.
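The crawler itself is not shown in this README. As a rough illustration of the text-extraction step such a crawler needs, here is a minimal sketch that pulls paragraph text out of an already-downloaded HTML page using only the Python standard library. The names `ParagraphExtractor` and `extract_article_text` are hypothetical, and the assumption that the article body lives in `<p>` elements will not hold for every portal.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._chunks = []       # text pieces of the paragraph being read
        self.paragraphs = []    # finished paragraphs, in document order

    def flush_paragraph(self):
        # Join the collected pieces and normalize runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> also ends any paragraph that was left unclosed.
            self.flush_paragraph()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush_paragraph()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._chunks.append(data)


def extract_article_text(html):
    """Return the paragraphs of an HTML page joined by newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()  # keep a trailing, unclosed paragraph
    return "\n".join(parser.paragraphs)
```

A real crawler would add fetching, rate limiting, and per-site rules for locating the headline, body, and summary.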

🎯 Problem Statement

The rapid growth of Bengali news portals makes it hard to get through so many articles in limited time, and the complexity of the language compounds the problem:

  • 📈 Overwhelming volume of Bengali news content online
  • ⏳ Limited time for comprehensive reading
  • 🔤 Inherent complexity of the Bengali language
  • 🧠 Difficulty in quickly grasping essential information from articles

✨ Features

  • 📝 Input Bengali news articles
  • 🤖 Generate concise summaries
  • 📊 Responsive design for various devices
  • 🌓 Dark mode support

🧠 Model Architecture

The summarization model is built on the pre-trained google/mt5-small checkpoint, a Seq2Seq (encoder-decoder) model, and uses MT5Tokenizer for tokenization.
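To make the encode, generate, decode flow concrete, here is a minimal inference sketch. The helper names `load_model` and `summarize`, the beam width, and the length limits are assumptions for illustration, not the project's actual inference code. Note that the base google/mt5-small checkpoint is not trained for summarization, so in practice `load_model` would be given the path of the fine-tuned checkpoint.

```python
def load_model(model_name="google/mt5-small"):
    """Load the tokenizer and model (downloads weights on first use)."""
    # Imported lazily so summarize() can be used with any compatible pair.
    from transformers import MT5ForConditionalGeneration, MT5Tokenizer

    tokenizer = MT5Tokenizer.from_pretrained(model_name)
    model = MT5ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def summarize(article, tokenizer, model, max_input_length=512,
              max_summary_length=100, num_beams=4):
    """Encode an article, generate summary token ids, and decode them."""
    # Truncate long articles so they fit the encoder's input budget.
    inputs = tokenizer(article, return_tensors="pt", truncation=True,
                       max_length=max_input_length)
    # Beam search over the decoder; num_beams=4 is an assumed setting.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_summary_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Typical use would be `tokenizer, model = load_model("path/to/fine-tuned-checkpoint")` followed by `summarize(article_text, tokenizer, model)`, where the checkpoint path is a placeholder.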


📊 Dataset Statistics

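| Set | Metric | Text (token count) | Summary (token count) |
|---|---|---|---|
| Training | Mean length | 1576.52 | 61.15 |
| Training | Max length | 9645 | 316 |
| Training | Min length | 23 | 5 |
| Training | Std length | 943.45 | 25.43 |
| Validation | Mean length | 1266.48 | 56.78 |
| Validation | Max length | 2559 | 105 |
| Validation | Min length | 153 | 22 |
| Validation | Std length | 540.35 | 17.93 |
| Test | Mean length | 1302.62 | 57.51 |
| Test | Max length | 2548 | 105 |
| Test | Min length | 182 | 21 |
| Test | Std length | 542.46 | 17.75 |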
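Statistics of this kind can be reproduced from a list of per-document token counts. The sketch below is one way to do it with the standard library; the function name `length_stats` is hypothetical, and it assumes the sample standard deviation (dividing by n - 1, as pandas does by default), since the README does not say which variant was used.

```python
import statistics


def length_stats(token_counts):
    """Summarize a collection of per-document token counts."""
    counts = list(token_counts)
    return {
        "mean": statistics.mean(counts),
        "max": max(counts),
        "min": min(counts),
        # Sample standard deviation (n - 1); needs at least two values.
        "std": statistics.stdev(counts),
    }
```

The counts themselves could come from something like `len(tokenizer(text)["input_ids"])` for each article, though which tokenizer produced the numbers in the table is not stated.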

💻 Model Code Snippet


import torch
from transformers import (
    MT5ForConditionalGeneration,
    MT5Tokenizer,
    Seq2SeqTrainingArguments,
)

# Load the pre-trained checkpoint and its tokenizer
model_name = "google/mt5-small"
tokenizer = MT5Tokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

# Tokenize the datasets
# (tokenize_data and the df_4_* DataFrames are not shown in this snippet)
train_inputs = tokenize_data(df_4_train, max_length=512, max_target_length=100)
val_inputs = tokenize_data(df_4_val, max_length=512, max_target_length=100)
test_inputs = tokenize_data(df_4_test, max_length=512, max_target_length=100)

# Training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",           # evaluate at the end of every epoch
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
    save_total_limit=2,              # keep only the two most recent checkpoints
    predict_with_generate=True,      # use generate() during evaluation
    save_safetensors=False,
)
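The snippet above calls `tokenize_data` without showing it. The following is a hypothetical sketch of what such a helper might look like, not the project's actual implementation. It assumes the data has `text` and `summary` columns and takes the tokenizer as an explicit argument (the original call relies on a global). Padding positions in the labels are replaced with -100, a common convention so that the loss skips them.

```python
def tokenize_data(df, tokenizer, max_length=512, max_target_length=100,
                  text_col="text", summary_col="summary"):
    """Tokenize articles and reference summaries into model features.

    `df` can be a pandas DataFrame or any mapping of column name to a
    sequence of strings. Column names are assumptions.
    """
    texts = [str(t) for t in df[text_col]]
    summaries = [str(s) for s in df[summary_col]]

    # Truncate long articles to max_length tokens and pad shorter ones.
    model_inputs = tokenizer(texts, max_length=max_length,
                             truncation=True, padding="max_length")
    # text_target= tokenizes the summaries as decoder targets.
    targets = tokenizer(text_target=summaries, max_length=max_target_length,
                        truncation=True, padding="max_length")

    # Replace padding ids with -100 so the loss ignores padded positions.
    pad_id = tokenizer.pad_token_id
    labels = [[tok if tok != pad_id else -100 for tok in seq]
              for seq in targets["input_ids"]]

    return {
        "input_ids": model_inputs["input_ids"],
        "attention_mask": model_inputs["attention_mask"],
        "labels": labels,
    }
```

With `max_length=512`, any article longer than 512 tokens is cut off. If the dataset statistics were measured with the same tokenizer, a typical training article (mean length about 1,577 tokens) loses a large part of its text, which is worth keeping in mind when judging summary quality.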


📈 Training Results

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.046200 | 0.692979 |
| 2 | 1.015400 | 0.683604 |
| 3 | 1.027700 | 0.676918 |
| 4 | 0.988000 | 0.672858 |
| 5 | 0.994400 | 0.671896 |
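The validation loss falls in every epoch, but by a smaller amount each time. A short sketch that makes this explicit from the values in the table (the helper names are illustrative):

```python
# Validation loss per epoch, copied from the table above.
val_loss = [0.692979, 0.683604, 0.676918, 0.672858, 0.671896]


def epoch_improvements(losses):
    """Return the drop in loss between each pair of consecutive epochs."""
    return [prev - cur for prev, cur in zip(losses, losses[1:])]


def relative_improvement(losses):
    """Return the overall drop as a fraction of the first epoch's loss."""
    return (losses[0] - losses[-1]) / losses[0]


if __name__ == "__main__":
    for epoch, drop in enumerate(epoch_improvements(val_loss), start=2):
        print(f"epoch {epoch}: validation loss fell by {drop:.6f}")
    print(f"overall: {relative_improvement(val_loss):.2%}")
```

Over the five epochs the validation loss improves by roughly 3% relative to epoch 1, and the last epoch contributes the smallest drop, which suggests that further epochs at this learning rate would bring limited gains.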

📁 Project Structure


src/
├── app/
│   ├── api/
│   │   └── bts-summarize               // BTS summarization API endpoint
│   ├── fonts                           // Custom fonts directory
│   ├── globals.css                     // Global CSS styles
│   ├── layout.tsx                      // Root layout component
│   └── page.tsx                        // Home page component
├── component/
│   ├── layout/
│   │   ├── footer.tsx                  // Footer component
│   │   └── NavigationBar.tsx           // Navigation bar component
│   ├── page-contents/
│   │   ├── AdditionalContents/
│   │   │   ├── AuthDialog.tsx          // User authentication dialog
│   │   │   ├── FacultyAdvisor.tsx      // Faculty advisor details
│   │   │   ├── ProjectMetadata.tsx     // Brings project metadata together
│   │   │   ├── ProjectOverview.tsx     // Project overview section
│   │   │   ├── StatsCard.tsx           // Statistical metrics card
│   │   │   ├── TeamMembers.tsx         // Team members list
│   │   │   └── TrainingChart.tsx       // Training data chart
│   │   └── SummaryGenerator/
│   │       ├── ArticleInput.tsx        // Input for articles to summarize
│   │       ├── ArticleList.tsx         // List of articles
│   │       ├── ArticleSummary.tsx      // Summarized article display
│   │       ├── CategoryList.tsx        // Article category list
│   │       ├── Header.tsx              // Header for Summary Generator
│   │       ├── MainContent.tsx         // Main content area
│   │       ├── Sidebar.tsx             // Sidebar navigation
│   │       └── SummaryGenerator.tsx    // Main Summary Generator component
│   └── ui                              // Shadcn UI components
├── context/
│   └── ThemeContext.tsx                // Theme context for app theming
├── hooks/
│   └── useSummaryGenerator.tsx         // Custom hook for Summary Generator
└── lib/
    ├── constants.ts                    // Application-wide constants
    ├── errors.ts                       // Error handling utilities
    ├── huggingface.ts                  // Hugging Face API utilities
    ├── types.ts                        // TypeScript types and interfaces
    ├── utils.ts                        // Utility functions
    └── validation.ts                   // Data validation functions


🚀 Installation

  1. Clone the repository:
     git clone https://github.com/your-username/bengali-text-summarizer.git
  2. Navigate to the project directory:
     cd bengali-text-summarizer
  3. Install dependencies:
     npm install
  4. Start the development server:
     npm run dev

🖥 Usage

  1. Open your browser and navigate to http://localhost:3000
  2. Input a Bengali news article in the provided text area
  3. Click the "Summarize" button
  4. View the generated summary

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.