Basic Named Entity Recognition and Word Frequency without the use of NLP libraries

cristinamatacuta/ManualTextProcessing

ManualTextProcessing

This project analyzes the text of a book or a single book chapter and reports the longest sentence, the longest word, the named entities, and the 10 most used words.

Table of Contents

  • Installation
  • Usage
  • Features
  • Contributing
  • License

Installation

  1. Clone the repository:
git clone https://github.com/cristinamatacuta/ManualTextProcessing

Usage

python processing.py input_file_path

Replace input_file_path with the path to the text file you want to analyze.
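
As an illustration of the command above, here is a minimal sketch of how a script might read the path given on the command line. The names `read_text` and `main`, and the UTF-8 encoding, are assumptions made for this example and are not taken from processing.py.

```python
import sys
from pathlib import Path


def read_text(path):
    """Return the contents of a text file as one string (UTF-8 is assumed)."""
    return Path(path).read_text(encoding="utf-8")


def main(argv):
    """Hypothetical entry point: argv is the script name plus one file path."""
    if len(argv) != 2:
        print("usage: python processing.py input_file_path", file=sys.stderr)
        return 2
    text = read_text(argv[1])
    print(f"Read {len(text)} characters from {argv[1]}")
    return 0

# In a script, the last line would typically be: sys.exit(main(sys.argv))
```

Returning an exit code from `main` instead of calling `sys.exit` inside it keeps the function easy to call from tests.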

Features

  1. 📚 Longest Sentence: Find and display the longest sentence in the text.
  2. 📗 Longest Word: Identify and display the longest word in the text.
  3. 🌆 Named Entities A-L and M-Z: Extract named entities and list them in two alphabetical groups, A to L and M to Z, excluding stop words.
  4. 📊 Top 10 Most Used Words: Display the top 10 most frequently used words in the text.

📌 Note: The file containing stop words is included in the repository.
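
The four features can be approximated with the standard library alone. The sketch below is one possible approach, not the code in this repository: every function name is hypothetical, `STOP_WORDS` is a tiny stand-in for the stop-word file shipped with the project, sentence length is measured in words, a "named entity" is assumed to be any capitalized word that does not start a sentence, and stop words are assumed to be filtered from the frequency count as well.

```python
import re
from collections import Counter

# Stand-in stop-word set (assumption); the repository provides its own file.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is", "was", "i"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Return ASCII alphabetic words, keeping internal apostrophes."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)


def longest_sentence(text):
    """Return the sentence with the most words, or '' for empty text."""
    return max(split_sentences(text), key=lambda s: len(tokenize(s)), default="")


def longest_word(text):
    """Return the word with the most characters, or '' for empty text."""
    return max(tokenize(text), key=len, default="")


def named_entities(text, stop_words=STOP_WORDS):
    """Return (A-L, M-Z) sorted lists of capitalized, non-initial, non-stop words."""
    found = set()
    for sentence in split_sentences(text):
        # Skip the first word: it is capitalized whether or not it is a name.
        for word in tokenize(sentence)[1:]:
            if word[0].isupper() and word.lower() not in stop_words:
                found.add(word)
    a_to_l = sorted(w for w in found if w[0] <= "L")
    m_to_z = sorted(w for w in found if w[0] >= "M")
    return a_to_l, m_to_z


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent lowercase words that are not stop words."""
    words = (w.lower() for w in tokenize(text))
    return Counter(w for w in words if w not in stop_words).most_common(n)
```

Skipping the first word of each sentence avoids treating every sentence opener as an entity, at the cost of missing names that begin a sentence. A fuller version would need a rule for that case.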

Contributing

If you'd like to contribute to this project or have suggestions for improvement, please don't hesitate to reach out! I'm learning as I go and appreciate your input.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2023 Cristina Matacuta
