Official repository for ACL 2024 paper "CLOMO: Counterfactual Logical Modification with Large Language Models".
For more details, please refer to the project page: https://clomo-logic.github.io/.
[Webpage] [Paper] [Dataset] [Examples]
In the Counterfactual Logical Modification (CLOMO) task, a model is given an Argument and a Premise 1 that stand in a logical relation R, together with an additional Premise 2 that perturbs R. The model is required to modify the Argument into an Argument' such that R holds between Argument' and Premise 2.
We thus introduce the CLOMO dataset with 1,000 high-quality and challenging questions spanning four logical relations. The data was collected via multi-turn human annotation and verification.
Additionally, we introduce a Self-Evaluation Score (SES) for logically consistent generation in CLOMO. SES decomposes the evaluation into several basic discrimination tasks for LLMs, and is shown to be comparable with human evaluation.
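The aggregation idea behind this decomposition can be pictured with a toy sketch (a hypothetical interface for illustration only; the actual scoring lives in SES/ses.py): each decomposed discrimination task yields a pass/fail outcome, and the score is the fraction of checks that pass.

```python
def self_evaluation_score(judgments):
    """Toy SES-style aggregation (illustrative, not the real implementation).

    `judgments` is a list of booleans, one per decomposed discrimination
    task. Returns the fraction of checks the generation passed.
    """
    if not judgments:
        raise ValueError("SES needs at least one discrimination outcome")
    return sum(1 for passed in judgments if passed) / len(judgments)
```

In the actual pipeline these judgments come from prompting an LLM on each discrimination task; see SES/ses.py for the real scoring logic.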
The overall CLOMO dataset is available under data/. We release CLOMO data with three prompting setups:
Setup | Train | Dev | Test |
---|---|---|---|
Plain CoT | cot_train.json | cot_dev.json | cot_test.json |
Few-shot | few_train.json | few_dev.json | few_test.json |
Zero-shot | zero_train.json | zero_dev.json | zero_test.json |
#Sample | 600 | 200 | 200 |
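A minimal way to load one of these splits, assuming each split file is a JSON array of sample objects (inspect one sample before relying on a schema, since the exact fields depend on the prompting setup):

```python
import json

def load_clomo_split(path):
    """Load one CLOMO split file (e.g. data/cot_test.json).

    Assumes the file contains a JSON array of samples; returns it as a list.
    """
    with open(path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    return samples
```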
We also release the exclusive and picked-out subsets used for the ablation study on unseen logical relations (Table 6 in the Paper).
- The exclusive subsets w/o R: data/type_excluded/*_xR_train.json. Each subset contains 373 samples.
- The picked-out subsets of R: data/type_picked/*_oR_test.json. Subset sizes are listed in the following table.
Setup | R=NA | R=SA | R=S | R=W |
---|---|---|---|---|
Plain CoT | | | | |
train w/o R | cot_xna_train.json | cot_xsa_train.json | cot_xs_train.json | cot_xw_train.json |
test on R | cot_ona_test.json | cot_osa_test.json | cot_os_test.json | cot_ow_test.json |
#test on R | 79 | 14 | 35 | 72 |
Few-shot | | | | |
train w/o R | few_xna_train.json | few_xsa_train.json | few_xs_train.json | few_xw_train.json |
test on R | few_ona_test.json | few_osa_test.json | few_os_test.json | few_ow_test.json |
#test on R | 79 | 14 | 35 | 72 |
Zero-shot | | | | |
train w/o R | zero_xna_train.json | zero_xsa_train.json | zero_xs_train.json | zero_xw_train.json |
test on R | zero_ona_test.json | zero_osa_test.json | zero_os_test.json | zero_ow_test.json |
#test on R | 79 | 14 | 35 | 72 |
**Remark**: NA = Necessary Assumption; SA = Sufficient Assumption; S = Strengthen; W = Weaken.
Please refer to template_pred.json for the expected format of your result JSON file for SES evaluation.
First, make sure you have installed all requirements:
```shell
pip install -r requirements.txt
```
For inference using an API key, run:
```shell
cd inference_only
python LLMer_inference.py --call_type api \
    --model_name MODEL_NAME --api_key API_KEY \
    --data_path PATH_TO_TEST_JSON_FILE \
    --save_path PATH_TO_SAVE_RESULTS
```
For inference using a local LLM, run:
```shell
cd inference_only
CUDA_VISIBLE_DEVICES=0 python LLMer_inference.py --call_type llm \
    --model_name MODEL_NAME --local_dir LOCAL_DIR \
    --data_path PATH_TO_TEST_JSON_FILE \
    --save_path PATH_TO_SAVE_RESULTS
```
Additionally, a sample script for the experiments on small LMs (Table 7 in the Paper) is provided: LLMer_inference.sh.
Coming soon.
First, prepare your result JSON file in the same format as template_pred.json.
Then, run:
```shell
python SES/ses.py \
    --model_pred_file PATH_TO_PRED_JSON_FILE \
    --api_model LLM_NAME \
    --api_key API_KEY \
    --api_org API_ORG
```
If you find CLOMO useful, please kindly cite:
```bibtex
@inproceedings{huang2023clomo,
  author    = {Huang, Yinya and Hong, Ruixin and Zhang, Hongming and Shao, Wei and Yang, Zhicheng and Yu, Dong and Zhang, Changshui and Liang, Xiaodan and Song, Linqi},
  title     = {{CLOMO}: Counterfactual Logical Modification with Large Language Models},
  booktitle = {The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)},
  year      = {2024}
}
```