Multilingual Event Linking

Overview

Setup

# install torch, transformers, datasets and rank-bm25
python -m pip install -r baselines/requirements.txt -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html

Download data from 🤗 datasets

The XLEL-WD dataset can be downloaded directly from 🤗 datasets.

Example: run the following script to download the event dictionary and the mentions for multilingual Wikinews zero-shot evaluation.

# dataset is downloaded to baselines/data/wikinews-zero-shot-multilingual
python download_data.py \
    --config wikinews-zero-shot \
    --task multilingual \
    --out-dir data

Configuration options: wikipedia-zero-shot, wikinews-zero-shot, wikinews-cross-domain. Task options: multilingual, crosslingual.
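The options above can be combined programmatically. The following is a minimal sketch, not part of the repository: the helper names `download_command` and `expected_output_dir` are hypothetical, the output directory pattern `<out-dir>/<config>-<task>` is inferred only from the comment in the example above, and the loop assumes every configuration supports both tasks, which this README does not confirm.

```python
import shlex
from itertools import product
from pathlib import PurePosixPath

# Options listed in this README.
CONFIGS = ("wikipedia-zero-shot", "wikinews-zero-shot", "wikinews-cross-domain")
TASKS = ("multilingual", "crosslingual")


def download_command(config, task, out_dir="data"):
    """Build the argument list for download_data.py (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        "python", "download_data.py",
        "--config", config,
        "--task", task,
        "--out-dir", out_dir,
    ]


def expected_output_dir(config, task, out_dir="data"):
    """Assumed naming: <out-dir>/<config>-<task>, inferred from the example comment."""
    return str(PurePosixPath(out_dir) / f"{config}-{task}")


if __name__ == "__main__":
    # Print one command per (config, task) pair; nothing is executed here.
    for config, task in product(CONFIGS, TASKS):
        command = shlex.join(download_command(config, task))
        print(command, "->", expected_output_dir(config, task))
```

The script only prints the commands, so they can be reviewed before running any of them from the `baselines` directory.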

BM25

Example run on wikinews-zero-shot (config) and multilingual (task).

bash bm25_ranker/slurm-scripts/wikinews-zero-shot/bert-base-multilingual-uncased/context-16/sbatch_plus_test.sh
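For readers unfamiliar with the baseline, the idea behind BM25 ranking can be sketched as follows: a mention context is scored against each event description in the dictionary, and events are returned in decreasing order of score. This is a self-contained illustration in plain Python, not the repository's implementation (which depends on `rank-bm25`). The tokenizer, the parameter values `k1=1.5` and `b=0.75`, the IDF variant, and the toy `EVENTS` dictionary with its identifiers are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over a fixed list of documents."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.term_freqs = [Counter(tokenize(doc)) for doc in documents]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_lens) / len(documents)
        # Document frequency: number of documents containing each term.
        doc_freq = Counter()
        for tf in self.term_freqs:
            doc_freq.update(tf.keys())
        n = len(documents)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {
            term: math.log(1.0 + (n - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def score(self, query, index):
        """BM25 score of the document at `index` for `query`."""
        tf = self.term_freqs[index]
        length_norm = 1.0 - self.b + self.b * self.doc_lens[index] / self.avg_len
        total = 0.0
        for term in tokenize(query):
            freq = tf.get(term, 0)
            if freq:
                total += self.idf[term] * freq * (self.k1 + 1.0) / (freq + self.k1 * length_norm)
        return total


def rank_events(mention_context, events, top_k=3):
    """Return up to `top_k` event ids, best match first."""
    ids = list(events)
    ranker = BM25([events[event_id] for event_id in ids])
    scored = [(ranker.score(mention_context, pos), event_id) for pos, event_id in enumerate(ids)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [event_id for _, event_id in scored[:top_k]]


# Toy event dictionary with made-up ids, for illustration only.
EVENTS = {
    "E1": "2012 Summer Olympics international multi-sport event held in London",
    "E2": "2014 FIFA World Cup football tournament held in Brazil",
    "E3": "2008 Summer Olympics international multi-sport event held in Beijing",
}

if __name__ == "__main__":
    print(rank_events("the olympics held in london", EVENTS))
```

In this toy example the London event ranks first because it shares the most query terms with the mention, including the rarest one ("london").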

Multilingual BLINK

Download the pretrained model checkpoints. The checkpoints for each task (multilingual and crosslingual) are about 5 GB in size.

python multilingual-BLINK/models/download_models.py --task crosslingual --out multilingual-BLINK/models

Example run on wikipedia-zero-shot (config) and crosslingual (task) using bert-base-multilingual-uncased.

# biencoder
sbatch multilingual-BLINK/slurm-scripts/wikipedia-zero-shot-crosslingual/eval/bert-base-multilingual-uncased/sbatch_bi_in.sh
# crossencoder
sbatch multilingual-BLINK/slurm-scripts/wikipedia-zero-shot-crosslingual/eval/bert-base-multilingual-uncased/sbatch_cr_in.sh