Note: This is my first project! 😊
This repository contains a Python toolkit for text analysis. It provides various functions and tools for processing and analyzing text data, including:
- Tokenization and sentence segmentation
- Text cleaning (lowercasing, stop word removal, punctuation removal)
- Lemmatization
- Named Entity Recognition (NER)
- Sentiment Analysis
- Frequency Analysis
- Topic Modeling
- Dispersion Plots
- Word Cloud Generation
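The cleaning and frequency-analysis steps above can be sketched in plain Python. This is a minimal, dependency-free illustration (the tiny stop-word list here is just for the example; the toolkit itself would use a fuller list, e.g. from NLTK):

```python
import re
from collections import Counter

# Tiny illustrative stop-word list, not the one the toolkit uses.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "over"}

def clean_and_tokenize(text):
    """Lowercase, strip punctuation, split into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def frequency_analysis(tokens, top_n=3):
    """Return the top_n most common tokens with their counts."""
    return Counter(tokens).most_common(top_n)

sample = "The quick brown fox jumps over the lazy dog. The dog sleeps."
tokens = clean_and_tokenize(sample)
print(frequency_analysis(tokens))  # → [('dog', 2), ('quick', 1), ('brown', 1)]
```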
To use this toolkit, follow these steps:
1. Clone the repository to your local machine:
   git clone https://github.com/your-username/text-analysis-toolkit.git
2. Install the required Python packages by running:
   pip install -r requirements.txt
3. Place your text files (e.g., "Syrian.txt" and "MyCountry.txt") in the root directory of the project.
4. Modify the main() function in text_analysis.py to process your specific text data and analysis tasks.
5. Run the script with the input file paths as arguments:
   python text_analysis.py Syrian.txt MyCountry.txt
6. Explore the results, including charts, dispersion plots, and word clouds generated by the toolkit.
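If you are adapting main() for your own files, a minimal sketch of what the command-line entry point might look like is shown below (the actual analysis steps in text_analysis.py will differ; only read_book is a real toolkit function, and its exact signature is an assumption here):

```python
import sys

def read_book(file_path):
    """Load the full text of a file (UTF-8 assumed)."""
    with open(file_path, encoding="utf-8") as f:
        return f.read()

def main(file_paths):
    """Process each input file in turn; real analysis calls go here."""
    for path in file_paths:
        text = read_book(path)
        # Placeholder: print a word count instead of the full analysis.
        print(f"{path}: {len(text.split())} words")

if __name__ == "__main__":
    main(sys.argv[1:])  # e.g. python text_analysis.py Syrian.txt MyCountry.txt
```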
Here's a brief overview of how to use the functions provided by the toolkit:
1. 📕 read_book(file_path): Read and load your text data from the specified file.
2. 📈 perform_sentiment_analysis(): Analyze the sentiment of your text.
3. 👪 find_most_common_names(): Find the most common person names in the text.
4. 📊 create_name_frequency_chart(): Create bar charts of the most common names.
5. 📈 create_dispersion_plot(): Generate dispersion plots for specific words.
6. 📄 perform_topic_modeling_on_real_words(): Perform topic modeling on your text data.
7. ☁️ create_word_cloud(): Create word clouds to visualize word frequency.
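To give a feel for how name-finding can work, here is a deliberately naive sketch in the spirit of find_most_common_names(). It counts capitalized, non-sentence-initial words as candidate names; the toolkit's real NER step would use a proper model (e.g. from NLTK or spaCy) instead:

```python
import re
from collections import Counter

def find_most_common_names(text, top_n=3):
    """Naive stand-in for NER: treat capitalized words that do not start
    a sentence as candidate names. A real implementation would use an
    NER model rather than this heuristic."""
    names = []
    for sentence in re.split(r"[.!?]+\s*", text):
        words = sentence.split()
        for word in words[1:]:  # skip the sentence-initial word
            if word and word[0].isupper() and word[1:].islower():
                names.append(word.strip(",;:"))
    return Counter(names).most_common(top_n)

sample = "Alice met Bob in Paris. Later Alice wrote to Bob and Carol."
print(find_most_common_names(sample))  # → [('Bob', 2), ('Paris', 1), ('Alice', 1)]
```

Note the heuristic also picks up place names like "Paris", which is exactly the kind of false positive a real NER model avoids.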
If you'd like to contribute to this project or have suggestions for improvement, please don't hesitate to reach out! I'm learning as I go and appreciate your input.
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2023 Cristina Matacuta