## Setup

I used my generic conda environment for this project; you can create it from `general_environment.yml` (it contains some extraneous packages, since it is my general-purpose environment). You will also need an OpenAI API key to run experiments.

```
conda env create -f general_environment.yml
export PYTHONPATH=<root directory of this repo>
export OPENAI_API_KEY=<your key here>
```
To run experiments on OpenAI models, use `src/prompt_openai.py`; to run experiments on open-source models, use `src/prompt_open_llms.py`. The arguments to these scripts are the same, so they are abbreviated below as `src/prompt_XXX.py`. Note that `src/prompt_open_llms.py` assumes someone is running the model server on tir/babel.
After each experiment runs, the accuracy is logged and a CSV file with the prompts/answers is saved to `./logs`, by default with an autogenerated file name (you can change this with `--output`). You can also use `./slurm_scripts/compile_results.py` to compile all the results in a directory into a single CSV. If you hit CTRL+C while an experiment is running, the intermediate results will also be saved.
There are synthetic experiments on the functions and colours domains.

## Functions

You can reproduce the three general approaches like this:
Base prompt (in-context examples only):

```
python src/prompt_XXX.py --model <model_name> --dataset functions
```

With ground-truth instruction:

```
python src/prompt_XXX.py --model <model_name> --dataset functions --prompt_type full_grammar
```

With self-induced instruction:

```
python src/prompt_XXX.py --model <model_name> --dataset functions --prompt_type grammar_induction --hyp_reranking_method <method> --num_hyps 5
```
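To sweep over several reranking methods, a loop like the one below can help. It is shown as a dry run that only prints the commands (drop the `echo` to execute them), and the method names are placeholders; check the script's `--help` for the actual choices:

```shell
# Print one grammar-induction command per candidate reranking method.
# method_a / method_b are illustrative placeholders, not real option values.
for method in method_a method_b; do
  echo python src/prompt_XXX.py --model "<model_name>" --dataset functions \
    --prompt_type grammar_induction --hyp_reranking_method "$method" --num_hyps 5
done
```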
## Colours

Just replace `functions` with `colours` in the arguments, as shown below.
Base prompt (in-context examples only):

```
python src/prompt_XXX.py --model <model_name> --dataset colours --use_min_cover
```

With ground-truth instruction:

```
python src/prompt_XXX.py --model <model_name> --dataset colours --use_min_cover --prompt_type full_grammar
```

With self-induced instruction:

```
python src/prompt_XXX.py --model <model_name> --dataset colours --use_min_cover --prompt_type grammar_induction --hyp_reranking_method <method> --num_hyps 5
```
## Kalamang

The Kalamang experiments are run from the `mtob` directory. Please set up a separate environment following the instructions in `mtob`, as it is not compatible with the base environment. Hypothesis selection has now been added. The experimental settings are as follows:

```
cd mtob/baselines
```

(Note: the TGI-type model is based on an internal framework. Refer to the options in `main.py` to use Hugging Face versions of Llama models.)
Base prompt (in-context examples only):

```
python main.py --use_reference_sentences --model_name <model_name> --direction <ek|ke> --temperature 0.05 --output_dir <output_dir>
```

With ground-truth instruction:

```
python main.py --use_reference_sentences --use_reference_wordlist --use_reference_grammar_sketch --model_name <model_name> --direction <ek|ke> --temperature 0.05 --output_dir <output_dir>
```

With self-induced instruction:

```
python main.py --induce_wordlist --use_induced_grammar --grammar_sketch_path ../resources/kalamang_grammar_sketch_<model_name>.txt --num_hyps 5 --model_name <model_name> --direction <ek|ke> --temperature 0.05 --output_dir <output_dir>
```
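To run a setting in both translation directions, a small loop is convenient. This sketch is a dry run that only prints the base-prompt commands (drop the `echo` to execute); the model name and output directory are placeholders:

```shell
# Print the base-prompt command for each direction (ek and ke), writing
# each direction's results to its own subdirectory of outputs/.
for direction in ek ke; do
  echo python main.py --use_reference_sentences --model_name "<model_name>" \
    --direction "$direction" --temperature 0.05 --output_dir "outputs/$direction"
done
```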