cristinamatacuta/NLP-and-the-Syrian-war-literature

Text Analysis Toolkit

Note: This is my first project! 😊

This repository contains a Python toolkit for text analysis. It provides functions for processing and analyzing text data, including:

  • Tokenization and sentence segmentation
  • Text cleaning (lowercasing, stop word removal, punctuation removal)
  • Lemmatization
  • Named Entity Recognition (NER)
  • Sentiment Analysis
  • Frequency Analysis
  • Topic Modeling
  • Dispersion Plots
  • Word Cloud Generation
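
As a rough illustration of the first few items in the list above (sentence segmentation, tokenization, cleaning, and frequency analysis), here is a minimal sketch that uses only the Python standard library. It is not the code in text_analysis.py: the function names, the regular expressions, and the tiny stop word list are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clean_tokens(text):
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent cleaned tokens as (word, count) pairs."""
    return Counter(clean_tokens(text)).most_common(top_n)
```

A dedicated NLP library would handle abbreviations, contractions, and lemmatization far better than these regular expressions; the sketch only shows the shape of the pipeline.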

Getting Started

To use this toolkit, follow these steps:

  1. Clone the repository to your local machine:

    git clone https://github.com/your-username/text-analysis-toolkit.git

  2. Install the required Python packages:

    pip install -r requirements.txt

  3. Place your text files (e.g., "Syrian.txt" and "MyCountry.txt") in the root directory of the project.

  4. Modify the main() function in text_analysis.py to suit your text data and analysis tasks.

  5. Run the script with the input file paths as arguments:

    python text_analysis.py Syrian.txt MyCountry.txt

  6. Explore the results, including the charts, dispersion plots, and word clouds generated by the toolkit.
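
If you are adapting main() to your own files, the sketch below shows one way a script could accept file paths as command-line arguments, as in the steps above. It is an assumption about the general shape, not the actual main() in text_analysis.py: parse_args, the returned word-count dictionary, and the "file not found" message are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse one or more input file paths from the command line."""
    parser = argparse.ArgumentParser(description="Run text analysis on one or more files.")
    parser.add_argument("files", nargs="+", type=Path, help="paths to plain-text input files")
    return parser.parse_args(argv)


def main(argv=None):
    """Read each input file and return a {file name: word count} mapping."""
    args = parse_args(argv)
    word_counts = {}
    for path in args.files:
        if not path.is_file():
            print(f"Skipping {path}: file not found")
            continue
        text = path.read_text(encoding="utf-8")
        word_counts[path.name] = len(text.split())
        # The analysis calls for this file would go here.
    return word_counts
```

In a script, main() would typically be called at the bottom of the file under the usual `if __name__ == "__main__":` guard, so that `python text_analysis.py Syrian.txt MyCountry.txt` passes both paths through.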

Usage

Here's a brief overview of how to use the functions provided by the toolkit:

  1. 📕 read_book(file_path): Read and load your text data from the specified file.

  2. 📈 perform_sentiment_analysis(): Analyze the sentiment of your text.

  3. 👪 find_most_common_names(): Find the most common person names in the text.

  4. 📊 create_name_frequency_chart(): Create bar charts of the most common names.

  5. 📈 create_dispersion_plot(): Generate dispersion plots for specific words.

  6. 📄 perform_topic_modeling_on_real_words(): Perform topic modeling on your text data.

  7. ☁️ create_word_cloud(): Create word clouds to visualize word frequency.
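
To give a feel for what sits behind two of the functions listed above, here is a hedged sketch of a file reader and of the data a dispersion plot is drawn from: the token positions at which each target word occurs. The read_book body is a stand-in that assumes UTF-8 plain text, and word_offsets is a hypothetical helper that is not part of the toolkit; the real implementations may differ.

```python
import re
from pathlib import Path


def read_book(file_path):
    """Stand-in for the toolkit's read_book: return the file's text as one string."""
    return Path(file_path).read_text(encoding="utf-8")


def word_offsets(text, targets):
    """Map each target word to the token positions where it occurs in the text."""
    tokens = re.findall(r"\w+", text.lower())
    wanted = {t.lower() for t in targets}
    offsets = {t: [] for t in wanted}
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[token].append(position)
    return offsets
```

Plotting each word's positions along a horizontal axis, one row per word, gives the familiar dispersion plot: clusters show where in the book a word is concentrated.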

Contributing

If you'd like to contribute to this project or have suggestions for improvement, please don't hesitate to reach out! I'm learning as I go and appreciate your input.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2023 Cristina Matacuta