TextClassification: A Python Library for simple text classification

What is it?

The purpose of this library is to make text classification easy to use. It relies on three components that work together:

Preprocessor: Reads in data and stores feature vectors and predictions
Featurizer: Extracts features out of text data
Classifier: Uses extracted features to train a classification model and do inference on unseen instances
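To illustrate the Preprocessor's role, here is a self-contained sketch of a data container. It reads tab-separated lines and keeps empty slots for feature vectors and predictions. The class name `ToyPreprocessor`, the assumed column order (label, then text) and the attribute names are hypothetical. This is not the library's actual API.

```python
import csv
import io
from typing import List, Optional


class ToyPreprocessor:
    """Hypothetical stand-in for a Preprocessor (not the library's real class).

    Stores texts and labels, plus slots for feature vectors and predictions."""

    def __init__(self, tsv_text: str) -> None:
        # Assumed format: one instance per line, "label<TAB>text".
        # QUOTE_NONE keeps quotation marks inside tweets as plain characters.
        reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]
        self.labels: List[str] = [row[0] for row in rows]
        self.texts: List[str] = [row[1] for row in rows]
        # To be filled in later by a featurizer and a classifier, respectively.
        self.features: Optional[List[List[float]]] = None
        self.predictions: Optional[List[str]] = None


pre = ToyPreprocessor("clinton\tfirst example text\ntrump\tsecond example text\n")
print(pre.labels)  # ['clinton', 'trump']
```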

Documentation explaining the functionality of each component can be found in the repository's docs folder.

Installation

To install this library, execute the following commands in your terminal:

git clone https://github.com/bogdankostic/TextClassification.git
cd TextClassification
pip install -r requirements.txt
pip install --editable .

Usage

Training a new classifier requires only three steps:

Read data using a Preprocessor
Extract features using a Featurizer
Pass the data with the extracted features to a Classifier
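The feature extraction step turns each text into a numeric vector. This README does not document which features TweetFeaturizer computes, so the sketch below is only a guess at the kind of surface features a tweet featurizer might use: counts of hashtags, mentions and URLs, plus the text length. The function name `tweet_features` and the chosen features are assumptions.

```python
import re
from typing import List

# Simple patterns for tweet-specific tokens (illustrative, not exhaustive).
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")


def tweet_features(text: str) -> List[float]:
    """Hypothetical feature vector:
    [hashtag count, mention count, URL count, character count]."""
    return [
        float(len(HASHTAG_RE.findall(text))),
        float(len(MENTION_RE.findall(text))),
        float(len(URL_RE.findall(text))),
        float(len(text)),
    ]


print(tweet_features("Thanks @user! #vote https://t.co/abc"))  # [1.0, 1.0, 1.0, 36.0]
```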

Code example:

from text_classification.preprocessor.csv_preprocessor import CSVPreprocessor
from text_classification.featurizer.tweet_featurizer import TweetFeaturizer
from text_classification.classifier.class_average import ClassAverageClassifier

preprocessor = CSVPreprocessor(train_filename="train.tsv")

featurizer = TweetFeaturizer()
featurizer.extract_features(preprocessor)

classifier = ClassAverageClassifier()
classifier.train(preprocessor)
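The name ClassAverageClassifier suggests, although this README does not confirm it, that training averages the feature vectors of each class and that prediction picks the class whose average is closest, i.e. a nearest-centroid classifier. Here is a minimal sketch of that idea under this assumption. The class name `NearestCentroid` and the use of Euclidean distance are also assumptions.

```python
import math
from typing import Dict, List, Sequence


class NearestCentroid:
    """Hypothetical nearest-centroid classifier: one mean vector per class."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def train(self, features: Sequence[Sequence[float]], labels: Sequence[str]) -> None:
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for vec, label in zip(features, labels):
            if label not in sums:
                sums[label] = [0.0] * len(vec)
                counts[label] = 0
            for i, value in enumerate(vec):
                sums[label][i] += value
            counts[label] += 1
        # Divide each class's summed vector by its instance count.
        self.centroids = {
            label: [s / counts[label] for s in total] for label, total in sums.items()
        }

    def predict(self, vec: Sequence[float]) -> str:
        # Choose the class whose centroid is nearest in Euclidean distance.
        return min(self.centroids, key=lambda label: math.dist(vec, self.centroids[label]))


clf = NearestCentroid()
clf.train([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]], ["a", "a", "b"])
print(clf.centroids)            # {'a': [0.0, 1.0], 'b': [10.0, 10.0]}
print(clf.predict([1.0, 1.0]))  # a
```

Averaging keeps training to a single pass over the data, and prediction costs one distance computation per class.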

Example

This library has been built around the Hillary Clinton and Donald Trump Tweets dataset, which can be downloaded from Kaggle: https://www.kaggle.com/benhamner/clinton-trump-tweets

The example script that trains and evaluates a model on this dataset can be found in the repository's examples folder. It reaches both an accuracy and a macro-averaged F1-score of 0.57.
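Accuracy is the fraction of correct predictions. Macro-averaged F1 computes an F1-score for each class and takes the unweighted mean, so every class counts equally regardless of its size. The pure-Python sketch below computes both metrics. The function names are illustrative, and the example script's own implementation is not shown in this README.

```python
from typing import List, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1-scores over all labels in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
print(accuracy(gold, pred))  # 0.75
print(macro_f1(gold, pred))  # mean of F1(a) = 2/3 and F1(b) = 4/5, about 0.733
```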

Tests

To run the tests, execute the command pytest from within the test folder, TextClassification/test.
