Proto-word reconstruction with NNs

Get the data

Our self-compiled dataset is included in the repository at data/romance_swadesh*.csv. For the data derived from the Ciobanu 2014 dataset, please email us.

How to train our models

Python requirements

We used Python 3.7.9 in a conda environment. All required packages are listed in requirements.txt. To install them run:

pip install -r requirements.txt

in your local environment.

We expect you to execute the scripts in the source folder.

Command line parameters

This list can be queried in the command line by running a script with the -h flag.

Parameter	Function	Default
`--aligned`	Use the rows containing the manual alignments	`False`
`--ancestor`	Header of the column containing the ancestor of the cognate set	`latin`
`--data`	Path to the file containing the cognate sets	`../data/romance_swadesh_ipa.csv`
`--epochs`	Number of epochs	`10`
`--model`	One of [ipa, asjp, latin]	`ipa`
`--n_hidden`	Number of hidden layers in the feedforward model	`2`
`--ortho`	Use one-hot character embeddings	`False`
`--out_tag`	Flag for the output folder	`swadesh`

Scripts

All Python scripts can be found in scource/. The scripts ending in _cv.py Perform cross-validation on the dataset with 5 cross-validation folds.

Name	Description	Cross-validation
`feedforward.py`	Trains the feedforward model	No
`many2one_lstm.py`	Trains the LSTM model	No
`feedforward_cv.py`	Trains the feedforward model	Yes
`many2one_lstm_cv.py`	Trains the LSTM model	Yes
`ciobanu_rnn.py`	Trains the RNN model	Yes

Sample script calls

To train the feedforward model on ASJP feature encodings and the aligned data:

python feedforward.py --data=../data/romance_asjp_full.csv --model=asjp --aligned --out_tag=swadesh

The results will be saved at out/plots_swadesh_feedforward.

To train the LSTM model on Latin character embeddings on dataset A:

python many2one_lstm.py --data../data/romance_ciobanu_latin.csv --model=latin --ortho

The --ortho flag is required here since we don't have feature encodings for the Latin characters.

To train the feedforward model on IPA character embeddings on dataset B and run cross-validation:

python feedforward_cv.py --data=../data/romance_swadesh_ipa.csv --model=latin --ortho

To train the RNN model on ASJP feature encodings and the aligned data:

python ciobanu_rnn.py --data=../data/romance_asjp_full.csv --model=asjp --ortho --out_tag=swadesh

Name		Name	Last commit message	Last commit date
Latest commit History 129 Commits
.old		.old
data		data
example		example
latex		latex
papers		papers
source		source
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Proto-word reconstruction with NNs

Get the data

How to train our models

Python requirements

Command line parameters

Scripts

Sample script calls

About

Releases

Packages

Contributors 2

Languages

justeuer/sopro-nlpwithnn

Folders and files

Latest commit

History

Repository files navigation

Proto-word reconstruction with NNs

Get the data

How to train our models

Python requirements

Command line parameters

Scripts

Sample script calls

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages