This is a repository for the GEST dataset used to measure gender-stereotypical reasoning in language models and machine translation systems.
- Paper: Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling
- The dataset is also available at HuggingFace Datasets.
- December 6th 2024 - `data/gest_1.1.csv` was added. This new version fixes 244 typos and other errors.
The `data` folder contains various versions of the GEST dataset and predictions made by various models used to evaluate them.
This is a simple canonical version of the GEST dataset that contains only the filtered samples. Each line contains the text of the sample and the stereotype ID.
- `sentence` - Text of the sample.
- `stereotype` - Stereotype ID [1,16].
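
A minimal sketch of reading this file with Python's standard `csv` module and counting samples per stereotype. It assumes the file has a header row with the `sentence` and `stereotype` columns described above. The in-memory rows are placeholders, not real GEST samples, and `count_by_stereotype` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def count_by_stereotype(csv_file):
    """Count samples per stereotype ID in a CSV with `sentence` and `stereotype` columns."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["stereotype"]) for row in reader)


# Placeholder rows standing in for the real file (assumption: a header row is present).
demo = io.StringIO(
    "sentence,stereotype\n"
    "placeholder sentence A,1\n"
    "placeholder sentence B,1\n"
    "placeholder sentence C,16\n"
)
counts = count_by_stereotype(demo)
print(counts[1], counts[16])  # 2 1
```

For the real data, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points at the CSV in the `data` folder.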
These are all the samples we collected from data creators before we filtered them, along with all our annotations.
- `sentence` - Sentence proposed by the data creator.
- `stereotype` - Stereotype ID [1,16] assigned to the `sentence`.
- `gender_strong_female` - The following 5 columns show the annotation given by the first annotator to the `sentence` without seeing the `stereotype`. The question was: Stereotypically, is the sentence female or male?
- `gender_female`
- `gender_neutral`
- `gender_male`
- `gender_strong_male`
- `match_strong_agree` - The following 5 columns show the annotation given by the first annotator to the `sentence` after seeing the `stereotype`. The question was: Does the stereotype match the sentence?
- `match_agree`
- `match_neutral`
- `match_disagree`
- `match_strong_disagree`
- `final_verdict` - Final decision given by the second annotator: yes, fix or no.
- `final_stereotype` - Final stereotype ID [1,16] assigned by the second annotator.
- `comment` - Comment written by the first annotator.
- `fix` - Fixed sentence proposed by either annotator.
- `first_annotator` - Initials of the first data annotator.
- `second_annotator` - Initials of the second data annotator.
- `creator` - Initials of the data creator.
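
A rough sketch of how the second annotator's decision could be turned into a final sample. This is an illustration, not the filtering code used for the dataset. It assumes that `final_verdict` holds the literal strings `yes`, `fix` or `no`, that a `fix` verdict means the text in the `fix` column replaces the original sentence, and that `final_stereotype` is filled in for every accepted row. `resolve_sample` is a hypothetical helper and the rows are placeholders.

```python
def resolve_sample(row):
    """Return (sentence, stereotype) for an accepted row, or None for a rejected one."""
    verdict = row["final_verdict"].strip().lower()
    if verdict == "yes":
        text = row["sentence"]  # sentence accepted as written
    elif verdict == "fix":
        text = row["fix"]       # assumption: the fixed sentence replaces the original
    else:
        return None             # "no" (or anything unexpected) is dropped
    return text, int(row["final_stereotype"])


# Placeholder rows, not real annotations.
rows = [
    {"sentence": "placeholder A", "fix": "", "final_verdict": "yes", "final_stereotype": "3"},
    {"sentence": "placeholder B", "fix": "placeholder B fixed", "final_verdict": "fix", "final_stereotype": "7"},
    {"sentence": "placeholder C", "fix": "", "final_verdict": "no", "final_stereotype": ""},
]
kept = [sample for sample in map(resolve_sample, rows) if sample is not None]
print(kept)  # [('placeholder A', 3), ('placeholder B fixed', 7)]
```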
Sentences where we have both male and female versions translated via the same machine translation system. See Section 4.3.2 in the paper for more details. Note that the samples here are not filtered according to the criteria mentioned in the paper (only one word difference, the same first letter).
- `translator` - Machine translation system used to translate the original English sentence.
- `language` - Target language of the `male` and `female` translations.
- `original` - Original English sentence.
- `stereotype` - Stereotype ID.
- `male` - Translation to the target `language` with the masculine gender of the first person.
- `female` - Translation to the target `language` with the feminine gender of the first person.
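
Since the rows in this file are not filtered, the criteria mentioned above (only one word difference, the same first letter) would have to be applied by the reader. The sketch below is one possible reading of those criteria, not the paper's implementation. It assumes whitespace tokenization, equal token counts, exactly one differing position, and a case-insensitive comparison of the first character of the differing words. `differs_by_one_word` is a hypothetical helper and the Czech sentences are illustrative, not taken from the dataset.

```python
def differs_by_one_word(male, female):
    """True if the two translations differ in exactly one word with the same first letter."""
    male_words = male.split()
    female_words = female.split()
    if len(male_words) != len(female_words):
        return False
    # Collect the positions where the two variants disagree.
    diffs = [(m, f) for m, f in zip(male_words, female_words) if m != f]
    if len(diffs) != 1:
        return False
    m, f = diffs[0]
    return m[0].lower() == f[0].lower()


# Illustrative pairs (assumption: not real rows from the file).
print(differs_by_one_word("Jsem unavený.", "Jsem unavená."))          # True
print(differs_by_one_word("Byl jsem unavený.", "Byla jsem unavená."))  # False
```

The second pair is rejected because two words change between the masculine and feminine variants.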
The instructions that were available to the human data creators who created the GEST dataset. We provide both the Slovak version that was used and an English version to make the guidelines more accessible.
`translation.csv` files contain translations generated by various machine translation systems.
- `from` - Original English sentence.
- `to` - Translation to the target language.
Scores computed for individual samples by various English MLMs with various templates. See Section 4.2 for more details. The particular model and template are indicated in the file names. Each line is the score for one sample; the order matches the order of samples in `gest.csv`.
Scores computed for individual samples by various multilingual MLMs. See Section 4.3 for more details. The particular models are indicated in the file names. Each line is the score for one sample; the order matches the order of samples in `gender_variants.csv`.
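
Both kinds of score files are tied to their sample file only by line order, so the two have to be read in parallel. A minimal sketch of that alignment, assuming each non-empty line of a score file parses as a single float and that the stereotype IDs have already been read from the matching CSV in the same order. `mean_score_by_stereotype` is a hypothetical helper and the numbers are made up.

```python
import io
from collections import defaultdict
from statistics import mean


def mean_score_by_stereotype(stereotypes, score_file):
    """Pair the i-th score line with the i-th sample and average scores per stereotype."""
    scores = [float(line) for line in score_file if line.strip()]
    if len(scores) != len(stereotypes):
        raise ValueError("score file and sample list differ in length")
    grouped = defaultdict(list)
    for stereotype, score in zip(stereotypes, scores):
        grouped[stereotype].append(score)
    return {stereotype: mean(values) for stereotype, values in grouped.items()}


# Made-up stereotype IDs and scores, for illustration only.
demo_stereotypes = [1, 1, 2]
demo_scores = io.StringIO("0.5\n1.5\n-1.0\n")
result = mean_score_by_stereotype(demo_stereotypes, demo_scores)
print(result)  # {1: 1.0, 2: -1.0}
```

The length check is there because a mismatch would silently pair scores with the wrong samples.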
Notebooks `_mt.ipynb`, `_english_mlm.ipynb`, and `_slavic_mlm.ipynb` were used to analyze the results and visualize data. They match Sections 4.1, 4.2, and 4.3 of the paper, respectively.
`_inference.ipynb` was used to generate translations, parses, and other predictions.
The best way to run the code is to use the included `Dockerfile` to build the environment and run the code, e.g.:
docker build . -t gest
docker run --gpus all -p 8888:8888 -v ${PWD}:/labs -it gest
`--gpus all` is optional.
The code works with various paid machine translation services. You can enable them by adding the appropriate auth files to the `config` directory.
- `aws_access_key` and `aws_secret_key` for Amazon Translate.
- `chatgpt_auth` with the OpenAI auth key.
- `deepl_auth` with the DeepL auth key.
- `gcp.json` file with the Google Cloud Platform service account key.
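
Before running inference, it can help to check which of these auth files are actually present. The sketch below is a hypothetical helper, not part of the repository. It assumes that each name listed above is a separate file placed directly in the `config` directory, and it checks only that the files exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path

# File names taken from the list above; the grouping per service is an assumption.
AUTH_FILES = {
    "Amazon Translate": ["aws_access_key", "aws_secret_key"],
    "OpenAI": ["chatgpt_auth"],
    "DeepL": ["deepl_auth"],
    "Google Cloud Platform": ["gcp.json"],
}


def available_services(config_dir):
    """List services whose auth files all exist in `config_dir`."""
    config_dir = Path(config_dir)
    return [
        name
        for name, files in AUTH_FILES.items()
        if all((config_dir / file_name).is_file() for file_name in files)
    ]


# Demo on a temporary directory with dummy files instead of real credentials.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "deepl_auth").write_text("dummy")
    (Path(tmp) / "aws_access_key").write_text("dummy")  # secret key missing on purpose
    found = available_services(tmp)
print(found)  # ['DeepL']
```

Amazon Translate is not reported in the demo because only one of its two files is present.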