The implementation was developed in a conda environment with Python 3.8 (the latest 3.8.x release at the time). After installing Anaconda or Miniconda, create and activate a conda environment:
conda create -n env python=3.8
conda activate env
Most required packages can be installed from the requirements.txt file by passing it to pip with the -r (requirements file) option:
pip install -r requirements.txt
Unzip the clts zip folder from the OSF project, or download version 2.2.0 of CLTS:
wget https://github.com/cldf-clts/clts/archive/refs/tags/v2.2.0.zip
unzip v2.2.0.zip
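If wget or unzip is not available (for example on Windows), the extraction step can be sketched in Python with the standard library alone. The helper name extract_archive is hypothetical and not part of this package, and the archive is assumed to have been downloaded already.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest="."):
    """Extract a zip archive into dest and return its top-level entries."""
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Collect the first path component of every member; a GitHub
        # release archive normally yields a single top-level folder.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return [dest / name for name in top_level]


# Usage (assumes v2.2.0.zip is in the current directory):
# extract_archive("v2.2.0.zip")
```

The returned path is the folder whose absolute location is needed later for the CLTS configuration.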
The Lexibank edition of the NorthEuraLex dataset (version 0.9) must also be installed. A copy with some adjustments to the Manchu data is provided as well; the same changes are available when downloading the git tree rather than the release version. To obtain the release from GitHub:
wget https://github.com/lexibank/northeuralex/archive/refs/tags/v4.0.zip
unzip v4.0.zip
cd northeuralex-4.0
pip install .
Once the dataset is installed, configure the path to the CLTS data by running cldfbench catconfig and then entering the absolute path in the config file at /home/$USER/.config/cldf/catalog.ini:
[clones]
clts = /path/to/clts
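The manual edit of catalog.ini can also be scripted. The following is a minimal sketch using Python's standard configparser module; the function name set_clts_path is hypothetical, and the config location is assumed to be the one given above, which may differ on other platforms.

```python
import configparser
from pathlib import Path


def set_clts_path(config_file, clts_dir):
    """Write the absolute CLTS path into the [clones] section of catalog.ini."""
    config_file = Path(config_file)
    # Disable interpolation so that a "%" in a path is stored literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_file)  # a missing file is skipped without error
    if not parser.has_section("clones"):
        parser.add_section("clones")
    parser.set("clones", "clts", str(Path(clts_dir).resolve()))
    config_file.parent.mkdir(parents=True, exist_ok=True)
    with config_file.open("w") as handle:
        parser.write(handle)
    return parser.get("clones", "clts")


# Usage (paths are examples only):
# set_clts_path(Path.home() / ".config" / "cldf" / "catalog.ini", "clts-2.2.0")
```

Note that configparser rewrites the whole file, so any comments in an existing catalog.ini are lost; other sections and keys are kept.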
Then, cd back to the location of this package and install it:
cd path/to/repo
pip install .
And you're set!
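To confirm that the environment is in the expected state, a quick check along the following lines may help. This is only a sketch: both function names are hypothetical, and the distribution names passed in the usage comment are examples that should be replaced with the entries from requirements.txt.

```python
import sys
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def python_matches(major=3, minor=8):
    """Check whether the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)


# Usage (example names only):
# print(python_matches(), missing_distributions(["cldfbench"]))
```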
The models can be trained by running the notebook train_nelex.ipynb. The models, along with the data used for training and testing, are saved automatically in notebooks/out/. Alternatively, you can use the pretrained models by downloading the nelex_unique zip folder and extracting it into the out folder. Only models for NELEX10 are provided.
In most cases, one notebook performs the analysis for a single language. The results are output as LaTeX tables and plots (used with minimal changes in the thesis document itself). The table below maps each notebook to its type of analysis:
Notebook | Experiment | Description |
---|---|---|
all.ipynb | Masking | Compares mean surprisal in the vowel-only and consonant-only conditions for all languages in NorthEuraLex |
nelex10.ipynb | Masking | Evaluates surprisal in the masking experiments for the languages in NELEX10 |
finnish.ipynb | Harmony | Feature surprisal for the Finnish ±BACK feature |
hungarian.ipynb | Harmony | Feature surprisal for the Hungarian ±BACK feature |
turkish.ipynb | Harmony | Feature surprisal for the Turkish ±BACK and ±ROUND features |
manchu.ipynb | Harmony | Feature surprisal for the Manchu ±BACK feature |
khalkha_mongolian.ipynb | Harmony | Feature surprisal for the Khalkha Mongolian ±ATR and ±ROUND features |
non_vh_langs.ipynb | Harmony | Feature surprisal for languages without vowel harmony, for the ±BACK and ±ROUND features |
surprisal_reduction.ipynb | Harmony | Plots the difference between the harmonic and disharmonic distributions for all feature-language combinations in NELEX10 |
vh_vs_non_vh.ipynb | Harmony | Plots mean differences for the 5 vowel harmony languages for ±BACK and ±ROUND, by feature |
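
For readers unfamiliar with the quantity reported in these notebooks: surprisal is the negative log probability that a model assigns to an observed symbol. The sketch below illustrates only that definition and a simple mean. It is not the code used in the notebooks; the function names are hypothetical, the base-2 logarithm is an assumption, and the probabilities are made up.

```python
import math


def surprisal(probability):
    """Surprisal in bits: -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def mean_surprisal(probabilities):
    """Average surprisal over the probabilities assigned to each symbol."""
    values = [surprisal(p) for p in probabilities]
    return sum(values) / len(values)


# Made-up probabilities, for illustration only.
print(surprisal(0.5))                      # 1.0
print(mean_surprisal([0.5, 0.25, 0.125]))  # 2.0
```

A lower mean surprisal means the model found the observed symbols more predictable under the given condition.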