Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists

Requirements

The implementation was developed in a conda environment with the latest release of Python 3.8. After installing Anaconda or Miniconda, create and activate a conda environment:

conda create -n env python=3.8
conda activate env
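Since the setup targets Python 3.8, a quick check inside the activated environment can catch a wrong interpreter early. This is a minimal sketch; `check_python_version` is a hypothetical helper, not part of the repository.

```python
import sys


def check_python_version(required=(3, 8)):
    """Return True if the running interpreter matches the required (major, minor) version."""
    # Hypothetical helper: compares only major.minor, so any 3.8.x patch release passes
    return tuple(sys.version_info[:2]) == tuple(required)
```

Calling `check_python_version()` from the `env` environment should return `True`; if it returns `False`, the environment was probably not activated.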

Most of the required packages can be installed from the requirements.txt file:

pip install -r requirements.txt
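To see which packages from requirements.txt are still missing (for example, the ones installed separately below), the file can be compared against the installed distributions. This is a minimal sketch with hypothetical helper names. It handles only simple `name`, `name==version` or `name[extra]>=version` lines and skips pip options such as `-r` or `-e`; URL or VCS requirements are not handled.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from requirements.txt lines (simple name/specifier lines only)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like -r or -e
            continue
        # Cut the name at the first version specifier, extra, environment marker, or space
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def missing_requirements(path="requirements.txt"):
    """Return the requirement names in the file that have no installed distribution."""
    with open(path, encoding="utf-8") as fh:
        names = requirement_names(fh)
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `missing_requirements()` from the repository root should return an empty list after a successful install.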

Either unzip the clts archive from the OSF project, or download version 2.2.0 of CLTS from GitHub and unzip it:

wget https://github.com/cldf-clts/clts/archive/refs/tags/v2.2.0.zip
unzip v2.2.0.zip
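The same step can be done from Python if `wget`/`unzip` are unavailable. This is a minimal sketch with hypothetical helper names. It assumes the archive has a single top-level folder, as GitHub tag archives do (compare the `cd northeuralex-4.0` step below), and it returns that folder's absolute path, which is the value needed later for the CLTS configuration.

```python
import zipfile
from pathlib import Path


def top_level_folder(names):
    """Return the single top-level folder shared by all archive member names."""
    tops = {name.split("/", 1)[0] for name in names if name}
    if len(tops) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(tops)}")
    return tops.pop()


def extract_archive(zip_path, dest="."):
    """Extract a release zip into dest and return the absolute path of its top-level folder."""
    with zipfile.ZipFile(zip_path) as zf:
        folder = top_level_folder(zf.namelist())
        zf.extractall(dest)
    return (Path(dest) / folder).resolve()
```

For example, `extract_archive("v2.2.0.zip")` run next to the downloaded file should yield a path ending in `clts-2.2.0`.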

Furthermore, version 0.9 of NorthEuraLex has to be installed in its Lexibank version (release v4.0 of lexibank/northeuralex). A copy with some adjustments to the Manchu data is also provided (these changes are also available when downloading the git tree rather than the release version). The release version can be obtained from GitHub as follows:

wget https://github.com/lexibank/northeuralex/archive/refs/tags/v4.0.zip
unzip v4.0.zip
cd northeuralex-4.0
pip install .
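To confirm that the dataset package was installed, its version can be looked up with `importlib.metadata`. This is a minimal sketch: `installed_version` is a hypothetical helper, and the distribution name `lexibank_northeuralex` is an assumption (check the dataset's setup.py if the lookup returns `None`).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

After `pip install .`, `installed_version("lexibank_northeuralex")` (assumed name) should return a version string rather than `None`.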

Once the dataset is installed, configure the path to the CLTS data: run cldfbench catconfig, then enter the absolute path of the unzipped CLTS folder into the config file /home/$USER/.config/cldf/catalog.ini:

[clones]
clts = /path/to/clts

Then, cd back to the location of this package and install it:

cd path/to/repo
pip install .

And you're set!

Training

The models can be trained by running the notebook train_nelex.ipynb. The models, along with the data used for training and testing, are saved automatically in notebooks/out/. Alternatively, you can use the pretrained models: download the nelex_unique zip archive and extract it into the out folder. Pretrained models are provided only for NELEX10.
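For long training runs it may be convenient to execute the notebook without a browser. This is a minimal sketch using the standard `jupyter nbconvert --execute` command. It assumes that train_nelex.ipynb lives in notebooks/ and uses paths relative to that folder, so the sketch runs it from the notebook's own directory. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build a command that executes a notebook headlessly and saves its outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        "--ExecutePreprocessor.timeout=-1",  # training can run long; disable the per-cell timeout
        str(notebook),
    ]


def run_notebook(notebook):
    """Execute the notebook from its own folder so relative output paths resolve there."""
    notebook = Path(notebook)
    subprocess.run(nbconvert_command(notebook.name), cwd=notebook.parent, check=True)
```

For example: `run_notebook("notebooks/train_nelex.ipynb")` (assumed location).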

Analysis

Each notebook performs one analysis, in most cases for a single language. The results are output as LaTeX tables and plots (used with minimal changes in the thesis itself). The table below maps each notebook to its analysis:

Notebook	Experiment	Description
all.ipynb	Masking	Compares mean surprisal in the vowel-only and consonant-only conditions for all languages in NorthEuraLex
nelex10.ipynb	Masking	Evaluates surprisal in the masking experiments for the languages in NELEX10
finnish.ipynb	Harmony	Feature surprisal for the Finnish ±BACK feature
hungarian.ipynb	Harmony	Feature surprisal for the Hungarian ±BACK feature
turkish.ipynb	Harmony	Feature surprisal for the Turkish ±BACK and ±ROUND features
manchu.ipynb	Harmony	Feature surprisal for the Manchu ±BACK feature
khalkha_mongolian.ipynb	Harmony	Feature surprisal for the Khalkha Mongolian ±ATR and ±ROUND features
non_vh_langs.ipynb	Harmony	Feature surprisal for the ±BACK and ±ROUND features in languages without vowel harmony
surprisal_reduction.ipynb		Plots the difference between the harmonic and disharmonic distributions for all feature-language combinations in NELEX10
vh_vs_non_vh.ipynb		Plots the mean differences for the 5 vowel harmony languages for ±BACK and ±ROUND, by feature
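The notebooks above all compare surprisal values. To make that quantity concrete, here is a minimal sketch, assuming a model that assigns a probability p to each target segment or feature value, with surprisal defined as -log2 p (in bits). The function names are hypothetical, the base-2 logarithm is an assumption, and `surprisal_difference` is one plausible way to express a harmonic vs. disharmonic comparison, not necessarily the exact quantity plotted in surprisal_reduction.ipynb.

```python
import math
from statistics import mean


def surprisal(prob):
    """Surprisal in bits of an event the model assigns probability prob: -log2(prob)."""
    if not 0 < prob <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def mean_surprisal(probs):
    """Mean surprisal over the probabilities a model assigned to the target items."""
    return mean(surprisal(p) for p in probs)


def surprisal_difference(disharmonic_probs, harmonic_probs):
    """Mean disharmonic minus mean harmonic surprisal; positive if disharmony is more surprising."""
    return mean_surprisal(disharmonic_probs) - mean_surprisal(harmonic_probs)
```

In the same spirit, a masking comparison would compute `mean_surprisal` separately for the probabilities obtained in the vowel-only and consonant-only conditions and compare the two means.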
