
ProText-Analyzer v1.0.0

Pre-release
@rubydamodar rubydamodar released this 09 Oct 11:42
· 17 commits to main since this release
f631288


ProText Analyzer

📖 Overview

ProText-Analyzer extracts text from online articles and analyzes it. The project focuses on sentiment analysis and readability assessment, reporting scores that describe the tone and complexity of each article.

🎯 Objectives

  • Extract article content from a list of URLs.
  • Perform sentiment and readability analysis.
  • Present results in a structured format for easy interpretation.

🚀 Features

  • Data Extraction:

    • Retrieves article titles and bodies from specified URLs.
    • Saves content in organized text files for further analysis.
  • Text Analysis:

    • Sentiment Analysis: Calculates positive, negative, polarity, and subjectivity scores using TextBlob.
    • Readability Metrics: Computes average sentence length, the percentage of complex words, and the Gunning Fog (FOG) index.
    • Word-Level Metrics: Measures total word count, syllable count per word, and average word length, and counts personal pronouns.
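
The readability and word-level metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: `count_syllables` is a hypothetical vowel-group heuristic standing in for syllapy, the pronoun list is an assumed example set, a word counts as "complex" when it has three or more syllables, and the index uses the standard Gunning Fog formula, 0.4 × (average sentence length + percentage of complex words).

```python
import re

# Assumed example set of personal pronouns; the project's real list may differ.
PERSONAL_PRONOUNS = {"i", "we", "my", "ours", "us"}


def count_syllables(word: str) -> int:
    """Rough syllable estimate (hypothetical stand-in for syllapy)."""
    word = word.lower()
    # Each run of consecutive vowels is treated as one syllable.
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" ("made"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability_metrics(text: str) -> dict:
    """Compute sentence-, word-, and syllable-level metrics for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    complex_word_count = sum(1 for s in syllables if s >= 3)
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100 * complex_word_count / len(words)

    return {
        "word_count": len(words),
        "avg_sentence_length": avg_sentence_length,
        "complex_word_count": complex_word_count,
        "pct_complex_words": pct_complex,
        # Standard Gunning Fog formula.
        "fog_index": 0.4 * (avg_sentence_length + pct_complex),
        "syllables_per_word": sum(syllables) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "personal_pronouns": sum(1 for w in words if w.lower() in PERSONAL_PRONOUNS),
    }


if __name__ == "__main__":
    sample = "The cat sat. The beautiful animal was sleeping quietly."
    for name, value in readability_metrics(sample).items():
        print(f"{name}: {value}")
```

The vowel-group heuristic miscounts some words, which is one reason to prefer a dedicated syllable library for real analysis. Matching pronouns case-insensitively also counts the abbreviation "US" as "us", so a production version would likely need a case-aware rule.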

🛠️ Technologies Used

  • Programming Language: Python
  • Libraries:
    • requests - for fetching HTML content
    • beautifulsoup4 - for parsing HTML
    • textblob - for sentiment analysis
    • spacy - for advanced text processing
    • syllapy - for counting syllables
    • pandas - for data manipulation and analysis

📥 Installation

To get started with ProText-Analyzer, follow these steps:

  1. Clone the repository:

    git clone https://github.com/rubydamodar/ProText-Analyzer.git
    cd ProText-Analyzer
  2. Install the required libraries:

    pip install requests beautifulsoup4 textblob spacy syllapy pandas

📄 Usage

  1. Prepare your URLs in the Input.xlsx file.
  2. Run the data extraction script:
    python dataextraction.py
  3. Analyze the extracted text using the provided analysis functions.
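
To make the extraction step concrete, here is a minimal sketch of pulling a title and body text out of an HTML page. It is not the contents of `dataextraction.py`: it uses the standard library's `html.parser` as a stand-in for beautifulsoup4 so that it runs without extra dependencies, and `ArticleParser` and `extract_article` are hypothetical names. It also assumes that the article body lives in `<p>` elements, which will not hold for every site.

```python
from html.parser import HTMLParser
from typing import Tuple


class ArticleParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Join the collected fragments and normalize whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            # Text inside inline tags such as <b> or <a> also lands here.
            self._buffer.append(data)


def extract_article(html: str) -> Tuple[str, str]:
    """Return (title, body) for an HTML document given as a string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), "\n".join(parser.paragraphs)


if __name__ == "__main__":
    page = (
        "<html><head><title>Sample Title</title></head>"
        "<body><p>First <b>bold</b> paragraph.</p><p>Second one.</p></body></html>"
    )
    title, body = extract_article(page)
    print(title)
    print(body)
```

In a real run, the HTML string would come from a fetch such as `requests.get(url).text` for each URL listed in Input.xlsx. Note that reading `.xlsx` files with pandas typically also requires the openpyxl package.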

📝 Output Structure

The results are saved in a structured format (CSV or Excel) containing the following variables:

  • Positive Score
  • Negative Score
  • Polarity Score
  • Subjectivity Score
  • Average Sentence Length
  • Percentage of Complex Words
  • FOG Index
  • Average Number of Words Per Sentence
  • Complex Word Count
  • Word Count
  • Syllable Count Per Word
  • Personal Pronouns
  • Average Word Length
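
As an illustration of the output layout, the sketch below writes one row per article using the column names listed above. It uses the standard library's `csv` module instead of pandas to stay dependency-free; `results_to_csv` is a hypothetical helper, and the sample values are placeholders rather than real results.

```python
import csv
import io

# Column names taken from the list above, in the same order.
OUTPUT_COLUMNS = [
    "Positive Score",
    "Negative Score",
    "Polarity Score",
    "Subjectivity Score",
    "Average Sentence Length",
    "Percentage of Complex Words",
    "FOG Index",
    "Average Number of Words Per Sentence",
    "Complex Word Count",
    "Word Count",
    "Syllable Count Per Word",
    "Personal Pronouns",
    "Average Word Length",
]


def results_to_csv(rows) -> str:
    """Serialize a list of per-article metric dicts to CSV text.

    Missing metrics are written as empty cells; unknown keys raise
    ValueError, which helps catch misspelled column names early.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=OUTPUT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(results_to_csv([{"Word Count": 9, "Complex Word Count": 2}]))
```

With pandas, the same layout could be produced by building a `DataFrame` with these columns and calling `to_csv` or `to_excel`.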

💡 Future Enhancements

  • Implement advanced NLP techniques for improved sentiment analysis.
  • Extend support for multiple languages.
  • Enhance user interface and error handling.

🤝 Acknowledgments

  • Thank you to all contributors and libraries that made this project possible.

📧 Contact

For any inquiries or collaborations, feel free to reach out at [email protected].