Scripts to build RuleTaker-CWA and RuleTaker-Skip-fact from the original RuleTaker dataset (Clark et al., 2020).
First, download and unzip the RuleTaker dataset:
wget http://data.allenai.org/rule-reasoning/rule-reasoning-dataset-V2020.2.4.zip
unzip rule-reasoning-dataset-V2020.2.4.zip
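Before running the preparation scripts, it can help to confirm that the unzipped archive contains every file the commands below read. The following is a minimal Bash sketch. It assumes the archive unpacks into rule-reasoning-dataset-V2020.2.4/depth-3ext-NatLang/, as the later commands expect. The function name check_ruletaker_inputs is hypothetical and is not part of the released scripts.

```shell
# Hypothetical helper (not part of the released scripts): report any missing
# input files (split data and meta file) for the train/dev/test splits of a
# RuleTaker config directory. Returns 1 if anything is missing.
check_ruletaker_inputs() {
  local dir="${1:-rule-reasoning-dataset-V2020.2.4/depth-3ext-NatLang}"
  local missing=0
  local split f
  for split in train dev test; do
    for f in "${dir}/${split}.jsonl" "${dir}/meta-${split}.jsonl"; do
      if [ ! -f "$f" ]; then
        echo "missing: $f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Usage: check_ruletaker_inputs && echo "inputs OK"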
To prepare the RuleTaker-CWA dataset, run prepare_RuleTaker_CWA.py on each split:
mkdir ruletaker-cwa
for split in train dev test; do
  python prepare_RuleTaker_CWA.py \
    rule-reasoning-dataset-V2020.2.4/depth-3ext-NatLang/${split}.jsonl \
    rule-reasoning-dataset-V2020.2.4/depth-3ext-NatLang/meta-${split}.jsonl \
    ruletaker-cwa/${split}.jsonl
done
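After the loop finishes, a quick way to confirm that all three splits were written and are non-empty is to count their lines. This is a sketch of a hypothetical helper that assumes one JSON record per line in each output file. It only reports counts and makes no claim about how many records the CWA conversion should produce.

```shell
# Hypothetical helper: print the number of lines (JSONL records) in each split
# under an output directory; fail if any split is missing or empty.
# Note: wc -l counts newline characters, so a final record without a trailing
# newline is not counted.
summarize_splits() {
  local dir="$1"
  local status=0
  local split f n
  for split in train dev test; do
    f="${dir}/${split}.jsonl"
    if [ ! -s "$f" ]; then
      echo "${split}.jsonl: missing or empty" >&2
      status=1
      continue
    fi
    n=$(wc -l < "$f")
    # $((n)) strips the padding some wc implementations add.
    echo "${split}.jsonl: $((n))"
  done
  return "$status"
}
```

Usage: summarize_splits ruletaker-cwa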
Note: because the skip-fact generation algorithm is randomized, you may get slightly different skip-fact variants in each run. To reproduce the numbers reported in the original paper, please use the released RuleTaker-Skip-fact dataset (link).
To prepare a RuleTaker-Skip-fact dataset, run prepare_RuleTaker_Skipfact.py on each split:
mkdir ruletaker-skipfact
for split in train dev test; do
  python prepare_RuleTaker_Skipfact.py \
    rule-reasoning-dataset-V2020.2.4/depth-3ext-NatLang/${split}.jsonl \
    rule-reasoning-dataset-V2020.2.4/depth-3ext-NatLang/meta-${split}.jsonl \
    ruletaker-skipfact/${split}.jsonl
done
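Skip-fact variants can differ between runs. Before reporting numbers, you may want to check whether a freshly generated copy matches the released RuleTaker-Skip-fact data. The following is a minimal sketch that assumes the released files have been downloaded into a directory with the same train/dev/test.jsonl layout. The helper name and the directory name in the usage line are hypothetical. Files are compared byte for byte with cmp, so even a pure reordering of records counts as a difference.

```shell
# Hypothetical helper: compare each split of two dataset directories byte for
# byte with cmp. Returns 1 if any split differs or is missing in either one.
compare_splits() {
  local a="$1" b="$2"
  local status=0
  local split
  for split in train dev test; do
    # cmp -s is silent; exit status 0 means the files are identical.
    if cmp -s "${a}/${split}.jsonl" "${b}/${split}.jsonl"; then
      echo "${split}: identical"
    else
      echo "${split}: differs (or missing)"
      status=1
    fi
  done
  return "$status"
}
```

Usage: compare_splits ruletaker-skipfact ruletaker-skipfact-released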
The script for creating the anonymized data is create_data_anonymized.py. Run it as follows:
python create_data_anonymized.py ./bert_dev.json ./anon_dev.json
python create_data_anonymized.py ./bert_train.json ./anon_train.json
where the first argument is the input file and the second is the output file. Typically, the input file is the one produced by the sentence-retrieval step of the FEVER task. The ./bert_dev.json and ./bert_train.json files can be downloaded from the original KGAT drive folder.
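The two anonymization commands differ only in the split name, so they can also be written as a loop in the same style as the RuleTaker commands. This sketch assumes the KGAT files sit in the current directory under the names used above. It passes create_data_anonymized.py only the input and output paths described above, and skips any split whose input has not been downloaded yet.

```shell
# Run the anonymization script on both KGAT splits, skipping any split whose
# input file is not present in the current directory.
for split in dev train; do
  input="./bert_${split}.json"
  output="./anon_${split}.json"
  if [ ! -f "$input" ]; then
    echo "skipping ${split}: ${input} not found" >&2
    continue
  fi
  python create_data_anonymized.py "$input" "$output"
done
```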