AndraeRay/text-classification-ml-starter

Text Classification Machine Learning Starter Pack

I have come across many machine learning tutorials that use the 20,000 articles dataset, but not many that show how to use your own data.

This starter pack gets you started with using your own data. You can delete the subfolders in the categories folder and add your own training data in their place.

This starter kit uses scikit-learn (install link) and Jupyter Notebook (install link).

The current sample data set is a collection of verses from the American Standard Bible (public domain text).

Steps to get started

  1. Clone or download this repo.
  2. Run jupyter notebook from the main folder.
  3. In the Jupyter page that opens, select New from the top right to create a notebook, then copy and paste the code into it.
  4. When ready, replace the subfolders in categories with your own and adjust the categories definition in ml_starter.py to match.
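The steps above can be sketched in code as follows. This is a minimal sketch, not the contents of ml_starter.py: it assumes the categories folder holds one subfolder per category with plain-text files inside, and it assumes a TF-IDF plus Naive Bayes model. The function names `train_from_folder` and `predict_category` are hypothetical helpers introduced for illustration.

```python
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_from_folder(container_path, categories=None):
    """Train a text classifier from a folder with one subfolder per category."""
    # load_files uses each subfolder name as a label.
    # categories=None loads every subfolder; pass a list to load a subset.
    data = load_files(
        container_path,
        categories=categories,
        encoding="utf-8",
        decode_error="replace",
        random_state=0,
    )
    # TF-IDF features feeding a multinomial Naive Bayes classifier.
    # This model choice is an assumption, not necessarily what ml_starter.py uses.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(data.data, data.target)
    return model, data.target_names


def predict_category(model, target_names, text):
    """Return the category name predicted for a single piece of text."""
    return target_names[model.predict([text])[0]]
```

Run from the main folder, a call such as `train_from_folder("categories")` would pick up whatever subfolders are present, so swapping in your own data should only require changing the folder contents and, if you pass one, the `categories` list.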

Because the sample data set is small (7 categories with 10 samples each), the results are not very accurate.

A larger sample size will increase accuracy.
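To see how accuracy changes as you add samples, one option is to measure it with cross-validation. The sketch below is an assumption-laden example rather than part of this repo: `estimate_accuracy` is a hypothetical helper, and the TF-IDF plus Naive Bayes pipeline is an assumed model choice.

```python
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def estimate_accuracy(container_path, n_splits=5):
    """Return the mean cross-validated accuracy for the data in container_path."""
    data = load_files(
        container_path,
        encoding="utf-8",
        decode_error="replace",
        random_state=0,
    )
    # Assumed model: TF-IDF features with multinomial Naive Bayes.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    # Stratified folds keep every category represented in each split.
    # n_splits must not exceed the number of samples in the smallest category.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(
        model, data.data, data.target, cv=folds, scoring="accuracy"
    )
    return scores.mean()
```

With only 10 samples per category, each fold tests on very few documents, so the estimate will be noisy; it should become steadier as the folders grow.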
