# Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023)
- The Temporal dataset we use is in the folder `data/`, with the following fields (a loading sketch is shown below):
  - `Question`: the question.
  - `Gold answer`: the answer.
  - `passages`: the retrieved passages from Wikipedia.
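  The exact file layout is not documented in this README, so the following is a minimal loading sketch; the file name `temporal_dev.json` is a placeholder, and the JSON-list format and field names are assumptions based on the descriptions above.

  ```python
  import json

  # Placeholder file name; substitute the actual file under data/.
  with open("data/temporal_dev.json", encoding="utf-8") as f:
      examples = json.load(f)  # assumed: a JSON list of records

  for ex in examples[:3]:
      print("Q:", ex["Question"])                # the question
      print("A:", ex["Gold answer"])             # the answer
      print("#passages:", len(ex["passages"]))   # retrieved Wikipedia passages
  ```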
- The CoT and retrieval-augmented CoT results are given in the folder `results/`, where `chain_of_thought_gpt3` indicates the responses (a comparison sketch is shown below).
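  Comparing the plain-CoT responses against the retrieval-augmented ones is what yields the self-knowledge labels in SKR, so a hedged comparison sketch may help; the second file name and the list-of-strings format are assumptions, not the repository's documented schema.

  ```python
  import json

  # File names are illustrative; check results/ for the actual ones.
  with open("results/chain_of_thought_gpt3.json", encoding="utf-8") as f:
      cot = json.load(f)
  with open("results/chain_of_thought_retrieval_gpt3.json", encoding="utf-8") as f:
      cot_rag = json.load(f)

  # Questions whose response changes once retrieval is added are the cases
  # that separate "known" from "unknown" questions.
  changed = sum(a != b for a, b in zip(cot, cot_rag))
  print(f"{changed}/{len(cot)} responses differ once retrieval is added")
  ```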
- For SKR_prompt and SKR_icl, we use the prompts shown in the paper to elicit the self-knowledge of the dev data directly (an illustrative prompt is sketched below).
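  The exact prompt wording is given in the paper; the template below is a paraphrased illustration of direct self-knowledge elicitation, not the repository's code, and the example question is made up.

  ```python
  # Illustrative SKR_prompt-style template (paraphrased; see the paper for
  # the exact wording).
  question = "When was the James Webb Space Telescope launched?"
  prompt = (
      f"Question: {question}\n"
      "Do you need additional information to answer this question? "
      "Answer yes or no."
  )
  # A "no" from the LLM suggests it can answer from its own knowledge;
  # a "yes" routes the question to retrieval augmentation.
  print(prompt)
  ```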
- For SKR_cls, we use the training data and train a BERT classifier to elicit the self-knowledge of the dev data. We use the settings `lr=2e-5` and `epochs=10` (a minimal training sketch is shown below).
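  A minimal sketch under stated assumptions: binary self-knowledge labels (1 = the model needs retrieval, 0 = it answers correctly on its own), questions as the classifier input, and `bert-base-uncased` as the backbone; only `lr=2e-5` and `epochs=10` come from this README.

  ```python
  import torch
  from torch.utils.data import DataLoader, TensorDataset
  from transformers import BertTokenizer, BertForSequenceClassification

  tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
  model = BertForSequenceClassification.from_pretrained(
      "bert-base-uncased", num_labels=2
  )
  optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr from this README

  # Placeholder training data: questions and assumed binary labels.
  train_texts = ["When was the James Webb Space Telescope launched?"]
  train_labels = [1]

  enc = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
  dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                          torch.tensor(train_labels))
  loader = DataLoader(dataset, batch_size=16, shuffle=True)

  model.train()
  for epoch in range(10):  # epochs from this README
      for input_ids, attention_mask, labels in loader:
          loss = model(input_ids=input_ids, attention_mask=attention_mask,
                       labels=labels).loss
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()
  ```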
- For SKR_knn, the steps are as follows (a kNN sketch follows this list):
  - `cd source/` and collect the self-knowledge of the training data: run `skr.py` to get the `train_skr.json` file.
  - Run `knn.py` to apply the self-knowledge to the dev data and get the `dev_skr_knn.json` file.
  - Run `eval_skr.py` to evaluate the results.
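  A compact sketch of the SKR_knn idea: transfer self-knowledge labels from training questions to dev questions via nearest neighbors in sentence-embedding space. The encoder choice, `k=5`, and the `train_skr.json` field names are assumptions; `knn.py` is the authoritative implementation.

  ```python
  import json
  import numpy as np
  from sklearn.neighbors import NearestNeighbors
  from sentence_transformers import SentenceTransformer  # assumed encoder choice

  encoder = SentenceTransformer("all-MiniLM-L6-v2")

  # Assumed schema for train_skr.json: a list of
  # {"question": ..., "need_retrieval": 0 or 1} records.
  with open("train_skr.json", encoding="utf-8") as f:
      train = json.load(f)

  train_emb = encoder.encode([ex["question"] for ex in train])
  labels = np.array([ex["need_retrieval"] for ex in train])

  knn = NearestNeighbors(n_neighbors=5).fit(train_emb)  # k=5 is an assumption

  def needs_retrieval(question: str) -> bool:
      """Majority vote over the k nearest training questions."""
      _, idx = knn.kneighbors(encoder.encode([question]))
      return labels[idx[0]].mean() > 0.5
  ```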
## Citation
@inproceedings{wang-etal-2023-self-knowledge,
title = "Self-Knowledge Guided Retrieval Augmentation for Large Language Models",
author = "Wang, Yile and Li, Peng and Sun, Maosong and Liu, Yang",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.691",
pages = "10303--10315",
}