Example baseline source code authorship attribution classifier

Trained on Python 3 sources from https://github.com/Jur1cek/gcj-dataset (year 2020).

The method used is language agnostic, so it is pretty easy to train it on some other programming language. You can actually use it with natural-language text too.
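To illustrate what "language agnostic" can mean in practice, here is a minimal sketch. It assumes character n-gram TF-IDF features and a logistic regression classifier from sklearn. This README does not say which features or model the repository actually uses, so treat both as a stand-in. The function name `build_attribution_model` and the toy samples are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_attribution_model():
    """Hypothetical baseline: character n-gram TF-IDF + linear classifier.

    Character n-grams need no tokenizer or parser, which is why the same
    pipeline works for any programming language or for plain text.
    """
    return make_pipeline(
        # lowercase=False keeps casing habits, which can be an authorship signal.
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
        LogisticRegression(max_iter=1000),
    )


# Made-up toy samples (NOT taken from the GCJ dataset).
sources = [
    "for i in range(n):\n    print(i)\n",
    "for j in range(m):\n    print(j)\n",
    "i=0\nwhile i<n :\n\tprint( i );i+=1\n",
    "k=0\nwhile k<m :\n\tprint( k );k+=1\n",
]
authors = ["alice", "alice", "bob", "bob"]

model = build_attribution_model()
model.fit(sources, authors)

# Predict the author of an unseen snippet.
prediction = model.predict(["x=0\nwhile x<t :\n\tprint( x );x+=1\n"])[0]
print(prediction)
```

Switching to another language, or to natural text, only means passing different strings to `fit`. Nothing in the pipeline depends on Python syntax.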

I was too lazy to write a requirements.txt, but you should be fine with numpy, pandas and sklearn (installed as the scikit-learn package). Also, if you are going to train, do not forget to download the input file.
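Since there is no requirements.txt, a quick check like the following sketch can tell you whether the three libraries are importable before you start training. The helper name `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Import names of the libraries this README mentions.
REQUIRED_MODULES = ["numpy", "pandas", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```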