tomoschuk/DuolingoSharedTask

DuolingoSharedTask

Brendan Tomoschuk and Jarrett Lovelett's entry to the Duolingo Shared Task on Second Language Acquisition Modeling. See the conference paper in Documents/. The task: given basic user data and a limited feature set, predict the probability of a translation error at the token level.

Setup & Use

  • Clone this repository and then download the provided datasets from here. Save data to Data/, preserving the following directory structure (keep existing filenames):
    /DuolingoSharedTask
        /Data
            /data_en_es
            /data_es_en
            /data_fr_en

  • makeDataFrame.ipynb: a notebook that reads in the data, processes it, generates new features, and saves the datasets using pickle (one for each target language).

  • buildModel-forest.ipynb: a notebook that reads in the pickled data, builds a separate random forest classifier for each language, and generates predictions for the test set. Options let you predict the dev set from the training data instead, or include the dev set in the training data.

  • See Data/starter_code for baseline.py, a baseline model provided by the shared task organizers (it reads the raw data directly, so do not run makeDataFrame.ipynb first), and eval.py, a script that evaluates the predictions generated by the baseline model and/or buildModel-forest.ipynb and reports several metrics.
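
As a quick sanity check before running the notebooks, the directory layout and the per-language pickling described above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the helper names (`missing_data_dirs`, `save_features`, `load_features`) and the `<folder>.pkl` file naming are assumptions made for this example. Only the three folder names come from the layout listed above.

```python
import pickle
from pathlib import Path

# Dataset folder names taken from the directory layout listed above.
LANGUAGE_DIRS = ("data_en_es", "data_es_en", "data_fr_en")


def missing_data_dirs(repo_root):
    """Return the expected Data/ subfolders that are not present."""
    data_root = Path(repo_root) / "Data"
    return [name for name in LANGUAGE_DIRS if not (data_root / name).is_dir()]


def save_features(features, out_dir, language_dir):
    """Pickle one language's processed features.

    The '<folder>.pkl' naming is hypothetical; the notebooks may use
    different file names.
    """
    out_path = Path(out_dir) / f"{language_dir}.pkl"
    with open(out_path, "wb") as handle:
        pickle.dump(features, handle)
    return out_path


def load_features(path):
    """Read back a pickled feature set."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Calling `missing_data_dirs(".")` from the repository root returns an empty list when all three dataset folders are in place. As with any pickle file, only load files you produced yourself.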
