TrQuAD - The Turkish SQuAD

TrQuAD is the Turkish SQuAD: a Turkish translation of SQuAD version 1.1, so every question has a non-empty answer string. The dataset can be downloaded from its dedicated HuggingFace repo.

More details on the dataset preparation can be found in our research paper. For the dataset format and further information, please visit the HF repo.
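Because TrQuAD follows SQuAD v1.1, each record should carry at least one non-empty answer whose text appears in the context at its recorded offset. The sketch below shows a sanity check one might run over records. It assumes the Hugging Face SQuAD schema (`id`, `title`, `context`, `question`, `answers` with `text` and `answer_start`); please confirm the actual field names on the HF repo. The record and the `check_record` helper are hypothetical illustrations, not taken from the dataset.

```python
from typing import Any, Dict, List

# Hypothetical example record (not from TrQuAD); field names assume the HF SQuAD schema.
EXAMPLE_RECORD: Dict[str, Any] = {
    "id": "example-0",
    "title": "Ankara",
    "context": "Ankara, Türkiye'nin başkentidir.",
    "question": "Türkiye'nin başkenti neresidir?",
    "answers": {"text": ["Ankara"], "answer_start": [0]},
}


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return the problems found in one SQuAD v1.1-style record (empty list means OK)."""
    problems: List[str] = []
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    if not texts:
        problems.append("no answers")
    if len(texts) != len(starts):
        problems.append("text/answer_start length mismatch")
    for text, start in zip(texts, starts):
        if not text.strip():
            # SQuAD v1.1 has no unanswerable questions, so empty answers are errors.
            problems.append("empty answer string")
        elif record["context"][start:start + len(text)] != text:
            # The answer span must match the context at its character offset.
            problems.append(f"answer {text!r} not found at offset {start}")
    return problems


print(check_record(EXAMPLE_RECORD))  # []
```

An empty list means the record passed both checks.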

Generating the benchmark

We adopted the original SQuAD benchmarking scripts. The benchmark can be launched via run.sh. Here is the set of hyperparameters we used for benchmarking BERTurk:

python3 run_qa.py \
  --model_name_or_path dbmdz/bert-base-turkish-cased \
  --dataset_name turkish-nlp-suite/TrQuAD \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 256 \
  --learning_rate 5e-5 \
  --num_train_epochs 3 \
  --max_seq_length 128 \
  --doc_stride 32 \
  --output_dir berturk \
  --overwrite_output_dir True
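With --max_seq_length 128, the question and its context must fit into 128 tokens together, so long contexts are split into several overlapping windows ("features"). In run_qa.py the --doc_stride value is passed to the tokenizer as the number of tokens shared between consecutive windows. The sketch below approximates that splitting with plain arithmetic. The function name `context_windows` and the count of three special tokens ([CLS] plus two [SEP] for a BERT-style question/context pair) are assumptions, and the tokenizer-based logic in the real script may differ at the edges.

```python
from typing import List, Tuple


def context_windows(
    n_context_tokens: int,
    n_question_tokens: int,
    max_seq_length: int = 128,
    doc_stride: int = 32,
    n_special: int = 3,  # assumption: [CLS] question [SEP] context [SEP]
) -> List[Tuple[int, int]]:
    """Return (start, end) context-token spans, one per feature, overlapping by doc_stride."""
    budget = max_seq_length - n_question_tokens - n_special  # context tokens per window
    if budget <= doc_stride:
        raise ValueError("question too long for this max_seq_length/doc_stride")
    step = budget - doc_stride  # each new window re-reads the last doc_stride tokens
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + budget, n_context_tokens)
        spans.append((start, end))
        if end == n_context_tokens:
            break
        start += step
    return spans


# A 200-token context with a 15-token question leaves 110 context tokens per window.
print(context_windows(200, 15))  # [(0, 110), (78, 188), (156, 200)]
```

Consecutive windows share 32 tokens, so an answer cut off at the end of one window has a chance to appear whole in the next.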

The evaluation metrics are the same as in the original SQuAD: exact match (EM) and F1. BERTurk achieved 56.09 EM and 71.56 F1.
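Since the metrics follow the original SQuAD evaluation, the per-question logic can be sketched as follows. It mirrors the official SQuAD v1.1 script: answers are normalized, EM compares normalized strings, F1 measures token overlap, each prediction is scored against its best-matching gold answer, and averages are reported as percentages. This is an illustrative re-implementation, not the exact code that produced the numbers above. Two caveats: the normalization strips English articles (inherited from the original script), and Python's `str.lower()` is not Turkish-locale aware (for example with I/ı and İ/i).

```python
import collections
import re
import string
from typing import Callable, Dict, List, Tuple

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation, drop English articles, collapse whitespace."""
    s = s.lower()  # note: not Turkish-locale aware
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric: Callable[[str, str], float], pred: str, golds: List[str]) -> float:
    return max(metric(pred, gold) for gold in golds)


def squad_scores(predictions: Dict[str, str], references: Dict[str, List[str]]) -> Tuple[float, float]:
    """predictions: {id: text}; references: {id: [gold texts]}. Returns (EM, F1) in percent."""
    em_total = f1_total = 0.0
    for qid, golds in references.items():
        pred = predictions.get(qid, "")  # a missing prediction scores 0
        em_total += best_over_golds(exact_match, pred, golds)
        f1_total += best_over_golds(f1, pred, golds)
    n = len(references)
    return 100.0 * em_total / n, 100.0 * f1_total / n


print(squad_scores({"q1": "Ankara.", "q2": "Mustafa Kemal"},
                   {"q1": ["Ankara"], "q2": ["Mustafa Kemal Atatürk"]}))
```

Here "Ankara." is an exact match once punctuation is stripped, while "Mustafa Kemal" gets partial F1 credit (0.8) but no EM credit, giving roughly (50.0, 90.0).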

Research paper and citation

Coming soon!
