Text-Summarizer is a powerful tool that leverages PyTorch and Transformer models to automatically generate concise and informative summaries from text. Whether you're dealing with lengthy articles, research papers, or any other form of written content, our tool will help you extract the essence and main points in a fraction of the time.
We've harnessed the capabilities of the T5 Transformer model (with its T5Tokenizer) in PyTorch to create a robust summarization pipeline. Here's how it works:
- Input sequences and their corresponding target sequences are used for training (Supervised Learning).
- The input sequence is encoded, and the decoder attends to the encoded hidden states through cross-attention layers.
- The decoder generates the summary output in an auto-regressive manner.
- The generated summaries are stored in a .tsv file for easy access.
Precision Score: 0.961
Recall Score: 0.125
Our NLTK-based summarization process involves these steps:
- Data clean-up, including the removal of special characters, stop words, and punctuation.
- Creation of word tokens and sentence tokens using the Natural Language Toolkit (NLTK) library.
- Calculation of word frequencies, normalization into weighted frequencies, and scoring of each sentence by the sum of its words' weights.
- Generation of a summary by selecting the top 30% of sentences based on their weighted importance.
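The steps above can be sketched in plain Python. To keep the example self-contained and runnable without corpus downloads, simple regex tokenization and a tiny stop-word list stand in for NLTK's `sent_tokenize`, `word_tokenize`, and `stopwords` corpus; the scoring logic mirrors the pipeline described:

```python
import re
from collections import Counter

# Minimal stand-in; the real pipeline uses NLTK's stopwords corpus.
STOP_WORDS = {
    "the", "a", "an", "is", "are", "of", "and", "to",
    "in", "it", "that", "for", "on", "with",
}


def summarize(text, ratio=0.3):
    """Extractive summary: keep the top `ratio` of sentences by weighted frequency."""
    # Sentence tokenization (NLTK's sent_tokenize in the real pipeline).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Clean-up: lowercase, keep word characters only, drop stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    max_freq = max(freq.values())
    # Weighted frequency: normalize each count by the most frequent word.
    weighted = {w: c / max_freq for w, c in freq.items()}
    # Score each sentence as the sum of its words' weighted frequencies.
    scores = {
        s: sum(weighted.get(w, 0.0) for w in re.findall(r"[a-z']+", s.lower()))
        for s in sentences
    }
    # Select the top `ratio` of sentences, preserving their original order.
    k = max(1, int(len(sentences) * ratio))
    top = set(sorted(sentences, key=scores.get, reverse=True)[:k])
    return " ".join(s for s in sentences if s in top)
```

Selecting by score but emitting in document order keeps the summary readable; the 30% cutoff corresponds to `ratio=0.3`.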
Precision Score: 0.666
Recall Score: 0.356
- Clone the repository.
- Install the necessary dependencies.
- Run the Text-Summarizer script on your desired input text.
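In shell form, the setup steps look roughly like this; the repository URL, requirements file, script name, and flags are placeholders (assumptions), so substitute the actual values from the repo:

```shell
# Repository URL is a placeholder; use the project's actual URL.
git clone https://github.com/<user>/Text-Summarizer.git
cd Text-Summarizer

# Assumes a requirements.txt listing torch, transformers, and nltk.
pip install -r requirements.txt

# Script name and flag are illustrative; check the repo for the real entry point.
python text_summarizer.py --input article.txt
```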
This project was brought to you by the collaborative efforts of:
- Sainik Khaddar
- Saptarshi Pani
- Arindam Saha
For inquiries or feedback, please reach out to us:
Email: [email protected]
This project is licensed under the MIT License.