Hi, thanks for sharing your work! I’m having some trouble reproducing the results and would appreciate your help.
For the TruthfulQA Multiple-Choice task, I got these results when evaluating the LLM + TruthX: MC1 = 0.5018, MC2 = 0.7050, MC3 = 0.3905. These differ from the results reported in the paper (54.22%, 73.90%, 44.37%).
How can I adjust to match the results from the paper? Thanks!
For your reference, I’ve attached the relevant scripts and code snippets:
We used the model from https://huggingface.co/ICTNLP/Llama-2-7b-chat-TruthX, which does not implement two-fold validation, so we made a few minor adjustments to the script and to llm.py, as detailed below:
Original script (based on scripts/truthfulqa.mc.truthx.sh):
export CUDA_VISIBLE_DEVICES=0
ROOT=path_to_truthx_dir
EXP_ROOT=$ROOT/results
model_path=path_to_llm # e.g. Llama-2-7b-chat-hf
truthx_model1=truthx_models/Llama-2-7b-chat-hf/truthx_model.fold1.pt
truthx_model2=truthx_models/Llama-2-7b-chat-hf/truthx_model.fold2.pt
strength=4.5
layers=10
python3 $ROOT/scripts/truthfulqa_mc_truthx.py \
    --model-path $model_path \
    --truthx-model $truthx_model1 \
    --truthx-model2 $truthx_model2 \
    --two-fold True \
    --data-yaml data/truthfulqa_data_fold1.yaml \
    --edit-strength $strength --top-layers $layers \
    --fewshot-prompting True \
    --output-dir $EXP_ROOT/truthfulqa_mc_truthx/llama-2-7b-chat.truthx
Our modified script (single truthx_model.pt from the Hugging Face checkpoint, no two-fold validation):
export CUDA_VISIBLE_DEVICES=6
ROOT=.
EXP_ROOT=$ROOT/results
model_path="/app/model_download/Llama-2-7b-chat-hf"
truthx_model1=/app/baseline/TruthX/truthx_models/Llama-2-7b-chat-hf/truthx_model.pt
strength=4.5
layers=10
python3 $ROOT/scripts/truthfulqa_mc_truthx.py \
    --model-path /app/model_download/Llama-2-7b-chat-hf \
    --truthx-model $truthx_model1 \
    --edit-strength $strength --top-layers $layers \
    --fewshot-prompting True \
    --output-dir $EXP_ROOT/truthfulqa_mc_truthx/llama-2-7b-chat.truthx
Original llm.py (two-fold: the TruthX model is chosen according to which fold the current question belongs to):

outputs, past_key_values, hidden_states = self.model(
    input_ids,
    output_hidden_states=True,
    # pick the TruthX model for the fold this question belongs to
    truthx_model=(
        self.truthx if idx not in self.fold1_data else self.truthx2
    ),
).values()
Our modified llm.py (single TruthX model, no fold routing):

outputs, past_key_values, hidden_states = self.model(
    input_ids,
    output_hidden_states=True,
    truthx_model=self.truthx,
).values()
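
To make sure I'm interpreting the setup correctly, here is a minimal sketch of how I understand the two-fold evaluation behind the paper's numbers. All names in it (two_fold_mc_eval, run_with_model, fold1_ids, and the dummy demo values) are placeholders I made up for illustration, not the repo's actual API; only the model-routing rule mirrors the original llm.py snippet above.

# Minimal sketch (assumed, not the repo's code): evaluate every TruthfulQA question,
# but edit each one with the TruthX checkpoint associated with the other fold,
# then average MC1/MC2/MC3 over the full question set.
def two_fold_mc_eval(questions, fold1_ids, run_with_model, truthx_fold1, truthx_fold2):
    per_question = []
    for idx, question in enumerate(questions):
        # Same routing rule as the original llm.py snippet:
        # questions outside fold1 use the first TruthX model, the rest use the second.
        model = truthx_fold1 if idx not in fold1_ids else truthx_fold2
        per_question.append(run_with_model(question, model))
    n = len(per_question)
    return {m: sum(s[m] for s in per_question) / n for m in ("MC1", "MC2", "MC3")}

if __name__ == "__main__":
    # Dummy demo with made-up scores, just to show the routing and averaging.
    questions = ["q0", "q1", "q2", "q3"]
    fold1_ids = {0, 1}
    fake_run = lambda q, model: {"MC1": 1.0, "MC2": 0.7, "MC3": 0.4}
    print(two_fold_mc_eval(questions, fold1_ids, fake_run, "fold1-model", "fold2-model"))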