Before running, please define EXEHOME, OUTPUTHOME, and DATAHOME accordingly in the script.
- e.g.,
  EXEHOME=/home/username/SelfEval-Guided-Decoding/src
  DATAHOME=/home/username/SelfEval-Guided-Decoding/data
  OUTPUTHOME=/home/username/SelfEval-Guided-Decoding/outputs/${dtname}/${split}_outputs
We provide three types of example scripts: (1) running the baseline; (2) running our method; (3) LLM evaluation.
PS: please adjust the variables dtname and split to specify the dataset.
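As a minimal sketch of how these variables fit together, the snippet below sets dtname and split first and then derives the three paths from a single repository root. The values gsm8k and test and the helper variable REPO are assumptions made for illustration only; check the provided scripts and the data directory for the dataset and split names they actually expect.

```shell
# Hypothetical values for illustration; substitute the dataset and split you want to run.
dtname=gsm8k
split=test

# Hypothetical helper variable: the repository root, matching the example path above.
REPO=/home/username/SelfEval-Guided-Decoding

EXEHOME=${REPO}/src
DATAHOME=${REPO}/data
# dtname and split must already be set at this point,
# because the shell expands them when OUTPUTHOME is assigned.
OUTPUTHOME=${REPO}/outputs/${dtname}/${split}_outputs

echo "EXEHOME=${EXEHOME}"
echo "DATAHOME=${DATAHOME}"
echo "OUTPUTHOME=${OUTPUTHOME}"
```

Because OUTPUTHOME is expanded at assignment time, changing dtname or split afterwards does not update it; reassign OUTPUTHOME after changing either variable.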
(1) Baseline (main code: src/generate_code_baseline.py)
- arithmetic reasoning: run_baseline.sh
- symbolic reasoning: run_baseline_symbolic.sh
- commonsense reasoning: run_baseline_commonsense.sh
(2) Ours (main code: src/generate_code.py)
- arithmetic reasoning
  - GSM8K: Ours (PAL), Ours (CoT)
  - AQUA: Ours (PAL)
  - SVAMP: Ours (PAL)
  - ASDiv: Ours (PAL)
  - TabMWP: Ours (PAL)
- symbolic reasoning
  - Date Understanding: Ours (PAL)
  - Object Counting: Ours (PAL)
- commonsense reasoning
  - CSQA: Ours (CoT)
  - StrategyQA: Ours (CoT)
  - Sports Understanding: Ours (CoT)
(3) LLM evaluation (main code: src/self_evaluate_code.py)