😺 ContraCAT

This repository contains the data and scripts for the paper ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation.

Setup

If you just want the data for your own models and tests, you can find it under data. Go to scripts to reproduce or tweak our data generation and evaluation method.

First install the requirements:

python3 -m venv YOUR_ENVIRONMENT_NAME
source YOUR_ENVIRONMENT_NAME/bin/activate
pip install -r requirements.txt

If you want to reproduce or modify our data generation, you also need 3 external resources:

ContraPro
GermaNet
some form of mapping from German noun to gender

(Details for each are further down)

Usage

The data and scripts directories each contain two subfolders: adversarial_ContraPro and templates. For a quick look at the commands that reproduce our data generation, see commands.sh.

Adversarial ContraPro

The data for each adversarial attack consists of a file for English (en.txt) and a file for German (de.txt). Attacks of the types possessive-extension and synonym-replacement additionally contain a modified version of the original contrapro.json. Only a subset of ContraPro was modified for these attacks, so they should be evaluated on that subset only.

The following three example commands show how to create the data for the three types of attacks.

python3 -m scripts.adversarial_attack.phrase_addition -n true_separate -d "es ist wahr" -e "it is true" -a --end_punct .
python3 -m scripts.adversarial_attack.possessive_extension -n david -d "Davids" -e "David's" --de_prepend --en_prepend
python3 -m scripts.adversarial_attack.synonym_replacement

Templates

The data for our ContraCAT templates is under data/templates and is further divided semantically into the different steps of coreference. At the lowest level, each concrete template consists of an en.txt and a de.txt file, as well as a ground-truth file called gender.txt. To generate a template, you call scripts/generate_template.py, e.g.:

python3 -m scripts.templates.generate -n 0_priors/verb -e "Wow! <PRO_NOM> <TRANS_VERB> it." -d "Wow! <PRO_NOM> <TRANS_VERB> <PRO_ACC_3_SIN>."

(for more details see commands.sh)
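
To make the placeholder notation more concrete, here is a minimal sketch of how slots such as `<PRO_NOM>` and `<TRANS_VERB>` could be expanded into sentences. It is not the repository's generator: the `expand` function and the filler lists are hypothetical and chosen only for illustration, and the real script may fill the slots differently.

```python
import itertools
import re

# Matches placeholders written like <PRO_NOM> or <PRO_ACC_3_SIN>.
PLACEHOLDER = re.compile(r"<([A-Z0-9_]+)>")


def expand(template, fillers):
    """Yield every sentence obtained by filling each placeholder in `template`.

    `fillers` maps a placeholder name to a list of candidate strings.
    This helper is hypothetical and not part of the repository.
    """
    # Unique placeholder names, in order of first appearance.
    names = list(dict.fromkeys(PLACEHOLDER.findall(template)))
    for combo in itertools.product(*(fillers[name] for name in names)):
        mapping = dict(zip(names, combo))
        yield PLACEHOLDER.sub(lambda m: mapping[m.group(1)], template)


# Hypothetical filler lists, chosen only for illustration.
fillers = {"PRO_NOM": ["I", "We"], "TRANS_VERB": ["saw", "bought"]}
sentences = list(expand("Wow! <PRO_NOM> <TRANS_VERB> it.", fillers))
for sentence in sentences:
    print(sentence)
```

Taking the Cartesian product over all slots keeps the sketch short, but the number of sentences grows quickly with the size of each filler list.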

Scoring and evaluation

To evaluate the attacks or templates, you need a scores file such as the provided example_scores. Each line in that file corresponds to a line in the en.txt and de.txt files described above. The scores are then evaluated with one of two scripts.

For adversarial attacks, we call the original ContraPro evaluation, e.g.:

python3 -m ContraPro.evaluate --reference PATH_TO_contrapro.json --scores scripts/example_scores

For possessive extension and synonym replacement, the reference JSON file has to be a subset, e.g.:

python3 -m ContraPro.evaluate --reference data/adversarial_ContraPro/synonyms/modified_contrapro_subset.json --scores scripts/example_scores

For the evaluation of templates, simply call our script scripts/templates/evaluate.py.
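
Since each line of the scores file has to correspond to a line in en.txt and de.txt, it can help to check the line counts before running either evaluation. The following is a minimal sketch of such a check. The helper names `count_lines` and `check_alignment` are hypothetical and not part of the repository, and the check makes no assumption about the format of the scores themselves.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a text file."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_alignment(scores_path, en_path, de_path):
    """Return the shared line count, or raise ValueError if the files differ.

    Hypothetical helper: it only compares line counts and does not parse
    the scores.
    """
    counts = {str(p): count_lines(p) for p in (scores_path, en_path, de_path)}
    if len(set(counts.values())) != 1:
        raise ValueError(f"Line counts differ: {counts}")
    return counts[str(scores_path)]
```

A call would look like `check_alignment("my_scores", "en.txt", "de.txt")`, with the three paths replaced by the scores file and the English and German files of the attack or template being evaluated.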

Dependencies for adversarial attack generation

ContraPro: Simply clone the ContraPro repo into the project directory.
GermaNet: this is needed to map an English synset to a German synset (find access here). After downloading it, you will have to change its path in config.py.
noun-to-gender mapping: we cannot provide this mapping, so you will have to implement get_gender_dict() in utils.py on your own.
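
As a starting point for the missing mapping, here is a minimal sketch of what get_gender_dict() could look like. It is an assumption, not the authors' implementation: the file path, the tab-separated `noun<TAB>gender` format, and the `m`/`f`/`n` tags are all hypothetical, and the return type expected by the rest of the scripts should be checked against utils.py.

```python
import csv
from pathlib import Path


def get_gender_dict(path="resources/noun_gender.tsv"):
    """Return a dict mapping German nouns to a gender tag ("m", "f" or "n").

    Assumption: `path` points to a tab-separated file with one
    `noun<TAB>gender` pair per line, e.g. `Hund<TAB>m`. Neither the path
    nor the file format comes from the repository.
    """
    gender_dict = {}
    with Path(path).open(encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or not row[0].strip():
                continue  # skip blank or malformed lines
            noun, gender = row[0].strip(), row[1].strip().lower()
            if gender in {"m", "f", "n"}:
                gender_dict[noun] = gender
    return gender_dict
```

The default argument keeps the function callable as get_gender_dict() with no arguments, which matches how the name appears above. Any lexicon that lists nouns with their grammatical gender can be converted into this format.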

Citation

Please cite the following paper:

@inproceedings{stojanovski-etal-2020-contracat,
    title = "{C}ontra{CAT}: Contrastive Coreference Analytical Templates for Machine Translation",
    author = "Stojanovski, Dario  and
      Krojer, Benno  and
      Peskov, Denis  and
      Fraser, Alexander",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.417",
    pages = "4732--4749",
}
