DuolingoSharedTask

Brendan Tomoschuk's and Jarrett Lovelett's entry to the Duolingo Shared Task on Second Language Acquisition Modeling. See the conference paper in Documents/. Given basic user data and a limited feature set, the task is to predict the probability of translation errors at the token level.

Setup & Use

Clone this repository and then download the provided datasets from here. Save data to Data/, preserving the following directory structure (keep existing filenames):
/DuolingoSharedTask
    /Data
        /data_en_es
        /data_es_en
        /data_fr_en
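Before running the notebooks, the layout above can be checked with a short script. This is a minimal sketch and not part of the repository: the function name `missing_data_dirs` is hypothetical, and only the three directory names come from the tree above.

```python
from pathlib import Path

# Directory names taken from the layout shown above.
EXPECTED_DIRS = ("data_en_es", "data_es_en", "data_fr_en")


def missing_data_dirs(repo_root="."):
    """Return the expected Data/ subdirectories that are not present."""
    data_dir = Path(repo_root) / "Data"
    return [name for name in EXPECTED_DIRS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing under Data/:", ", ".join(missing))
    else:
        print("All expected data directories found.")
```

Run it from the repository root; an empty result means all three dataset folders are in place.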
makeDataFrame.ipynb: a notebook that reads in the data, processes it, generates new features, and saves the datasets using pickle (one for each target language).
buildModel-forest.ipynb: a notebook that reads in the pickled data, builds a separate random forest classifier for each language, and generates predictions for the test set. Options allow predicting the dev set from the training data instead of the test set, or including the dev set in the training data.
See Data/starter_code for baseline.py, a baseline model provided by the shared task organizers (it reads the raw data directly, so makeDataFrame.ipynb is not needed), and eval.py, a script that evaluates the predictions generated by the baseline model and/or buildModel-forest.ipynb and reports several metrics.
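The per-language pickling and the train/dev/test options described above can be sketched in plain Python. This is an illustration under stated assumptions, not the notebooks' actual code: the helpers `choose_splits`, `save_language`, and `load_language`, the flags `predict_dev` and `include_dev`, and the list-of-rows data representation are all hypothetical. The sketch also assumes the two options are mutually exclusive, since fitting on the dev set while predicting it would leak labels.

```python
import pickle


def choose_splits(train, dev, test, predict_dev=False, include_dev=False):
    """Return (rows_to_fit_on, rows_to_predict) for one language.

    Hypothetical helper; the flag names are illustrative only.
    """
    if predict_dev and include_dev:
        # Assumption: training on dev while predicting dev would leak labels.
        raise ValueError("predict_dev and include_dev cannot both be set")
    if predict_dev:
        return list(train), list(dev)
    if include_dev:
        return list(train) + list(dev), list(test)
    return list(train), list(test)


def save_language(path, splits):
    """Pickle one language's processed data to its own file."""
    with open(path, "wb") as f:
        pickle.dump(splits, f)


def load_language(path):
    """Read back a file written by save_language."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With the defaults, the model is fit on the training rows and predicts the test rows; `predict_dev=True` swaps the prediction target to the dev rows, and `include_dev=True` appends the dev rows to the training rows.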
