I have come across many machine learning tutorials that use the 20,000-article dataset, but not many that show how to use your own data.
This starter kit helps you get started with your own data. You can delete the subfolders in categories and add your own training data in their place.
This starter kit uses scikit-learn (install link) and Jupyter Notebook (install link).
The current sample dataset is a collection of verses from the American Standard Bible (public domain text).
- Clone or download this repo.
- Run `jupyter notebook` from the main folder.
- From the top right of your Jupyter notebook, select `New`. Copy and paste the code.
- When ready, replace the subfolders in categories and adjust the categories definition in ml_starter.py.
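To illustrate the folder-per-category layout described above, here is a minimal sketch of loading your own data with scikit-learn. It is not the contents of ml_starter.py. The function names `train_text_classifier` and `predict_category` are hypothetical, and the choice of `TfidfVectorizer` with `MultinomialNB` is an assumption made for illustration.

```python
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_text_classifier(categories_dir, categories=None):
    """Train a text classifier from a folder of per-category subfolders.

    Hypothetical helper: each subfolder name is a label and each file in it
    is one training sample.
    """
    dataset = load_files(
        categories_dir,
        categories=categories,   # None means "use every subfolder"
        encoding="utf-8",        # decode files to str instead of bytes
        decode_error="replace",  # tolerate stray bytes in your own data
        random_state=0,
    )
    # Assumed model: TF-IDF features fed into a naive Bayes classifier.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(dataset.data, dataset.target)
    return model, dataset.target_names


def predict_category(model, target_names, text):
    """Return the predicted category name for a single piece of text."""
    index = model.predict([text])[0]
    return target_names[index]
```

Assuming your data lives in the categories folder, `model, names = train_text_classifier("categories")` followed by `predict_category(model, names, "some new text")` would return one of your subfolder names. Passing a list to `categories` restricts training to those subfolders.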
Because the dataset is small (7 categories with 10 samples each), the results are not very accurate.
A larger sample size should improve accuracy.
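One way to see how accuracy changes as you add samples is to estimate it with cross-validation. The sketch below is an assumption about how that could be done, not part of the original starter code. `estimate_accuracy` is a hypothetical helper, and the model choice is again illustrative.

```python
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def estimate_accuracy(categories_dir, folds=5):
    """Return (mean, std) of cross-validated accuracy for a category folder.

    Hypothetical helper: every category needs at least `folds` samples so
    that each fold can contain an example of it.
    """
    dataset = load_files(
        categories_dir,
        encoding="utf-8",
        decode_error="replace",
        random_state=0,
    )
    # Assumed model: TF-IDF features fed into a naive Bayes classifier.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    # For a classifier, an integer cv uses stratified folds.
    scores = cross_val_score(
        model, dataset.data, dataset.target, cv=folds, scoring="accuracy"
    )
    return scores.mean(), scores.std()
```

Running `estimate_accuracy("categories")` before and after adding training files gives a rough comparison. With only a handful of samples per category, the estimate itself will be noisy.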