pTreeTSL

About

This repository contains code used in

Torres, C., Hanson, K., Graf, T., & Mayer, C. (2023). Modeling island effects with probabilistic tier-based strictly local grammars over trees. Proceedings of the Society for Computation in Linguistics. Vol. 4. Article 15.

This code fits a pTSL (probabilistic tier-based strictly local) grammar over trees to a data set of minimalist dependency trees paired with Likert acceptability ratings.
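For readers new to the framework, here is a minimal, self-contained sketch of the pTSL idea over trees. It is not the implementation in tree.py, and the labels, probabilities, and forbidden configuration below are invented for illustration: each node label is assigned a projection probability, and the probability that a tree is well-formed is the total probability of all projections whose tier contains no forbidden mother-daughter configuration.

from itertools import product

# Toy illustration of pTSL over trees (hypothetical labels and probabilities,
# not the repository's API). Each label has a projection probability; the
# well-formedness probability of a tree is the total probability of all
# projections whose tier avoids forbidden mother-daughter pairs.

def flatten(tree, parent=None, nodes=None):
    # Flatten a (label, [children]) tree into (label, parent_index) records.
    if nodes is None:
        nodes = []
    label, children = tree
    idx = len(nodes)
    nodes.append((label, parent))
    for child in children:
        flatten(child, idx, nodes)
    return nodes

def p_wellformed(tree, proj_prob, forbidden):
    nodes = flatten(tree)
    total = 0.0
    # Enumerate every possible projection (feasible only for small toy trees).
    for mask in product([False, True], repeat=len(nodes)):
        prob = 1.0
        for (label, _), projected in zip(nodes, mask):
            p = proj_prob.get(label, 0.0)
            prob *= p if projected else 1.0 - p
        if prob == 0.0:
            continue
        # On the tier, a projected node's mother is its closest projected ancestor.
        licit = True
        for i, (label, parent) in enumerate(nodes):
            if not mask[i]:
                continue
            anc = parent
            while anc is not None and not mask[anc]:
                anc = nodes[anc][1]
            if anc is not None and (nodes[anc][0], label) in forbidden:
                licit = False
                break
        if licit:
            total += prob
    return total

# Toy tree: a wh element under an island node, with C[wh] at the root.
# If Island projects (p = 0.7), the tier contains the forbidden Island-wh pair.
toy_tree = ("C[wh]", [("Island", [("wh", [])])])
probs = {"C[wh]": 1.0, "Island": 0.7, "wh": 1.0}
print(p_wellformed(toy_tree, probs, {("Island", "wh")}))  # ~0.3: well-formed only when Island fails to project

The fitting procedure in tree.py learns projection probabilities like these from the Likert data; the sketch only shows how a fixed set of probabilities assigns a gradient well-formedness score to a single tree.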

Structure of the repository

src/

The code used in the paper.

  • tree.py: This is the code responsible for fitting the model. Its use is described in detail below.
  • produce_training_file.R: A script to combine Likert ratings from Sprouse et al. (2016) with the corresponding dependency trees.
  • analyze_results.R: A script for visualizing the results of the model, computing correlations, etc.

data/

This folder contains the following subfolders:

  • fixed_params: Configuration files specifying which parameters should have their projection probabilities fixed to 1.
  • free_params: Configuration files specifying which parameters should be fit to the data (see the sketch of the file format after this list).
  • lexicon: Files specifying mappings between lexical labels and syntactic features.
  • sprouse_data: Likert ratings from Sprouse et al. (2016).
  • training_data: Files pairing Likert ratings with dependency trees.
  • trees: Annotated trees.
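The free_params and fixed_params files are plain lists with one parameter per line (this is stated explicitly for free_params under Running tree.py below, and fixed_params appears to follow the same format). A hypothetical example with made-up symbol names might look like the following; the actual names must match the labels used in your lexicon and trees:

Island
RC
Subj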

The files with _agg correspond to averaged Likert ratings rather than individual participants' ratings. The files with _filtered include only the island violations used in the paper, while the files without _filtered have the full set of island types from Sprouse et al. (2016). The no_wh files correspond to simulations where nodes with wh features were treated as free parameters.

The paper used the aggregated, filtered data with wh features fixed to 1.

Details about the dependency tree annotation scheme are given in data/annotation.md.

figs/

Some figures used in the paper.

results/

Results of the model.

Running tree.py

tree.py can be run from the command line. It expects the following arguments.

  • training_file (required): The path to the .csv containing the training data. See the training data sets for examples of the required format.
  • feature_file (required): The path to the .csv containing the mapping from lexical symbols to features. See the files in data/lexicon/ for examples of the required format.
  • feature_key (optional): The column name in the feature file that contains the features.
  • free_params (optional): Path to a file containing a list of parameters to be fit, one parameter per line. If this is not specified, probabilities will be learned for all symbols. Parameters not listed here will be assigned projection probabilities of 0, unless they are included in the fixed_params list below.
  • fixed_params (optional): Path to a file containing a list of parameters that will always project (probability of projection fixed to 1).
  • beta (optional): Regularization penalty. Higher values will force learned parameters to be closer to 0 or 1 (i.e. more categorical). Defaults to 0 (no regularization). Not used in the paper. You can also pass in a list of values separated by spaces, in which case one optimization will be done for each value of beta.
  • outfile (optional): Path to save model results to.
  • name (optional): Model name. Used in output files.
  • itr (optional): Number of times to re-run the optimization.

The command used to run the model in the paper was:

python src/tree.py data/training_data/training_data_agg_filtered.csv data/lexicon/ptreetsl_lexicon_phonetic.csv --feature_key features --free_params data/free_params/free_params_fixed.csv --fixed_params data/fixed_params/fixed_params_wh.csv --outfile results/results.csv
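
To experiment with the optional arguments, an invocation along the following lines should work. Note that the --beta, --itr, and --name flag spellings are inferred from the argument list above rather than taken from the paper's command, so check tree.py's argument parser if they are not accepted:

python src/tree.py data/training_data/training_data_agg_filtered.csv data/lexicon/ptreetsl_lexicon_phonetic.csv --feature_key features --free_params data/free_params/free_params_fixed.csv --fixed_params data/fixed_params/fixed_params_wh.csv --beta 0 0.1 1 --itr 5 --name beta_sweep --outfile results/results_beta.csv

This would run one optimization per beta value, with each optimization re-run five times, and save the results to results/results_beta.csv.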
