GEST - A dataset for measuring gender-stereotypical reasoning in machine translation and language models.


kinit-sk/gest


GEST Dataset

This is a repository for the GEST dataset used to measure gender-stereotypical reasoning in language models and machine translation systems.

Changelog

  • December 6th 2024 - data/gest_1.1.csv was added. This is a new version that has 244 typos and other errors fixed.

Data

The data folder contains various versions of the GEST dataset, along with predictions made by the various models we evaluated.

gest.csv

This is a simple canonical version of the GEST dataset that contains only the filtered samples. Each line contains the text of the sample and its stereotype ID.

  • sentence - Text of the sample.
  • stereotype - Stereotype ID [1,16].
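As a minimal sketch of reading this file, the snippet below parses the two columns and counts samples per stereotype. It assumes the CSV is comma-separated with a header row naming the `sentence` and `stereotype` columns; the function names `read_gest` and `count_by_stereotype` are hypothetical helpers, not part of the repository.

```python
import csv
from collections import Counter


def read_gest(lines):
    """Parse GEST rows from an iterable of CSV lines (e.g. an open file).

    Assumes a header row with `sentence` and `stereotype` columns.
    Returns a list of (sentence, stereotype_id) tuples.
    """
    samples = []
    for row in csv.DictReader(lines):
        stereotype = int(row["stereotype"])
        # The README documents stereotype IDs in the range [1, 16].
        if not 1 <= stereotype <= 16:
            raise ValueError(f"unexpected stereotype ID: {stereotype}")
        samples.append((row["sentence"], stereotype))
    return samples


def count_by_stereotype(samples):
    """Count how many samples belong to each stereotype ID."""
    return Counter(stereotype for _, stereotype in samples)
```

With the repository checked out, passing `open("data/gest.csv", newline="", encoding="utf-8")` to `read_gest` should yield the full sample list, provided the file layout matches the assumptions above.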

annotations.csv

These are all the samples we collected from data creators before we filtered them, along with all our annotations.

  • sentence - Sentence proposed by the data creator.
  • stereotype - Stereotype ID [1,16] assigned to the sentence.
  • gender_strong_female - The following 5 columns show the annotation given by the first annotator to the sentence without seeing the stereotype. The question was: Stereotypically, is the sentence female or male?
  • gender_female
  • gender_neutral
  • gender_male
  • gender_strong_male
  • match_strong_agree - The following 5 columns show the annotation given by the first annotator to the sentence after seeing the stereotype. The question was: Does the stereotype match the sentence?
  • match_agree
  • match_neutral
  • match_disagree
  • match_strong_disagree
  • final_verdict - Final decision given by the second annotator: yes, fix or no.
  • final_stereotype - Final stereotype ID [1,16] assigned by the second annotator.
  • comment - Comment written by the first annotator.
  • fix - Fixed sentence proposed by either annotator.
  • first_annotator - Initials of the first data annotator.
  • second_annotator - Initials of the second data annotator.
  • creator - Initials of the data creator.
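To get a quick overview of the annotation outcomes, one could tally the `final_verdict` values and count how often the second annotator changed the stereotype ID. The sketch below assumes a comma-separated file with a header row, that an empty `final_stereotype` cell means no final ID was assigned, and that IDs are stored as numbers; `summarize_annotations` is a hypothetical helper.

```python
import csv
from collections import Counter


def summarize_annotations(lines):
    """Summarize annotations from an iterable of CSV lines.

    Returns (verdict_counts, relabelled), where `relabelled` is the number
    of rows whose final_stereotype differs from the original stereotype.
    """
    verdicts = Counter()
    relabelled = 0
    for row in csv.DictReader(lines):
        verdicts[row["final_verdict"].strip().lower()] += 1
        final = row["final_stereotype"].strip()
        # Assumption: an empty cell means no final ID was assigned.
        # Comparing as floats tolerates IDs stored as "3" or "3.0".
        if final and float(final) != float(row["stereotype"]):
            relabelled += 1
    return verdicts, relabelled
```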

gender_variants.csv

Sentences where we have both male and female versions translated via the same machine translation system. See Section 4.3.2 in the paper for more details. Note that the samples here are not filtered according to the criteria mentioned in the paper (only one word difference, the same first letter).

  • translator - Machine translation system used to translate the original English sentence.
  • language - Target language of the male and female translations.
  • original - Original English sentence.
  • stereotype - Stereotype ID.
  • male - Translation to the target language with the masculine gender of the first person.
  • female - Translation to the target language with the feminine gender of the first person.
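Since the samples in this file are unfiltered, a filter along the lines of the criteria mentioned above (only one word difference, the same first letter) could look like the sketch below. This is one interpretation of those criteria based on whitespace tokenization; the implementation used for the paper may differ, and `passes_filter` is a hypothetical name.

```python
def differing_words(male, female):
    """Return the list of (male_word, female_word) pairs that differ.

    Returns None when the two translations have a different number of
    whitespace-separated words, so they cannot be aligned position by position.
    """
    male_words, female_words = male.split(), female.split()
    if len(male_words) != len(female_words):
        return None
    return [(m, f) for m, f in zip(male_words, female_words) if m != f]


def passes_filter(male, female):
    """True if exactly one word differs and both variants of that word
    start with the same letter (compared case-insensitively)."""
    diffs = differing_words(male, female)
    if diffs is None or len(diffs) != 1:
        return False
    male_word, female_word = diffs[0]
    return male_word[:1].lower() == female_word[:1].lower()
```

For example, a pair that differs only in the verb ending, such as "Plakal som." and "Plakala som.", passes, while a pair with two differing words does not.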

data_guidelines.pdf

This file contains the instructions that were available to the human data creators who created the GEST dataset. We provide both the Slovak version that was actually used and an English version to make the guidelines more accessible.

translations/

translation.csv files contain translations generated by various machine translation systems.

  • from - Original English sentence.
  • to - Translation to the target language.

predictions/english_mlm/

Scores for individual samples, generated by various English MLMs with various templates. See Section 4.2 of the paper for more details. The particular model and template are indicated in the file names. Each line is the score for one sample; the order matches the order of samples in gest.csv.

predictions/slavic_mlm/

Scores for individual samples, generated by various multilingual MLMs. See Section 4.3 of the paper for more details. The particular models are indicated in the file names. Each line is the score for one sample; the order matches the order of samples in gender_variants.csv.
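Because the prediction files are aligned with the sample files by line order, scores can be joined to stereotype IDs positionally. The sketch below averages scores per stereotype; it assumes each line of a prediction file holds a single number parseable by `float`, which should be verified against the actual files. `mean_score_by_stereotype` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_stereotype(stereotypes, score_lines):
    """Average per-sample scores for each stereotype ID.

    `stereotypes` is a list of IDs in sample-file order; `score_lines` is an
    iterable of text lines from a prediction file, one score per line.
    """
    # Assumption: one numeric score per line; blank lines are skipped.
    scores = [float(line) for line in score_lines if line.strip()]
    if len(scores) != len(stereotypes):
        raise ValueError(
            f"{len(scores)} scores but {len(stereotypes)} samples; "
            "the files are not aligned"
        )
    grouped = defaultdict(list)
    for stereotype, score in zip(stereotypes, scores):
        grouped[stereotype].append(score)
    return {stereotype: mean(values) for stereotype, values in sorted(grouped.items())}
```

The length check matters here: since the only link between a score and its sample is the line position, a mismatch in counts would silently misattribute every score after the first missing line.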

Code

Notebooks

Notebooks _mt.ipynb, _english_mlm.ipynb, and _slavic_mlm.ipynb were used to analyze the results and visualize data. They match sections 4.1, 4.2, and 4.3 respectively.

_inference.ipynb was used to generate translations, parses, and other predictions.

Dockerfile

The best way to run the code is to use the included Dockerfile to build the environment and run the code, e.g.:

docker build . -t gest
docker run --gpus all -p 8888:8888 -v ${PWD}:/labs -it gest

--gpus all is optional.

Translators

The code works with various paid machine translation services. You can enable them by adding the appropriate auth files to the config directory.

  • aws_access_key and aws_secret_key for Amazon Translate.
  • chatgpt_auth with the OpenAI auth key.
  • deepl_auth with the DeepL auth key.
  • gcp.json file with the Google Cloud Platform service account key.
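Before running the translation code, it can help to check which of these auth files are present. The sketch below only tests for file existence using the file names listed above; the service labels and the `configured_translators` function are hypothetical, and the repository's own code may load these files differently.

```python
from pathlib import Path

# File names come from the list above; the service labels are illustrative.
REQUIRED_FILES = {
    "Amazon Translate": ["aws_access_key", "aws_secret_key"],
    "OpenAI (ChatGPT)": ["chatgpt_auth"],
    "DeepL": ["deepl_auth"],
    "Google Cloud Platform": ["gcp.json"],
}


def configured_translators(config_dir="config"):
    """Return the services whose auth files all exist in `config_dir`."""
    config = Path(config_dir)
    return [
        service
        for service, files in REQUIRED_FILES.items()
        if all((config / name).is_file() for name in files)
    ]
```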
