
Updated handling of instructions #52

Closed
wants to merge 0 commits into from
Conversation

wongjingping (Collaborator) commented Nov 30, 2023

The openai and vllm runners now pass the instructions into the prompt if they are provided.
We also provide a separate prompt file for instructions as a drop-in replacement for prompt.md when dealing with instruction-based datasets.
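As a rough sketch of the intended behavior (the function name generate_prompt and the {user_question}/{instructions} placeholders below are illustrative assumptions, not the exact runner code):

# Minimal sketch, assuming the prompt template exposes {user_question} and
# {instructions} placeholders; the actual runner implementation may differ.
def generate_prompt(prompt_file: str, question: str, instructions: str = "") -> str:
    with open(prompt_file) as f:
        template = f.read()
    # When a question has no instructions, the placeholder (if present) collapses
    # to an empty string, so instruction-free templates behave exactly as before.
    return template.format(user_question=question, instructions=instructions)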

The openai runner works as before (gpt-3.5 without instructions):

$ python main.py \
  -q data/questions_gen.csv \
  -o results/my_query_generator.csv \
  -g oa \
  -f prompts/prompt_openai.md \
  -m gpt-3.5-turbo-0613 \
  -p 5
preparing questions...
Correct so far: 136/200 (68.00%): 100%|█████████████████████████| 200/200 [00:48<00:00,  4.09it/s]
                exact_match   correct
query_category                       
date_functions     0.760000  0.760000
group_by           0.800000  0.800000
order_by           0.657143  0.742857
ratio              0.257143  0.342857
table_join         0.714286  0.742857
where              0.714286  0.714286
Average correct rate: 0.68
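
For reference, a per-category summary like the one above could be reproduced from the results CSV with pandas; the column names below are taken from the printed table, and this is only a sketch, not the project's evaluation code:

import pandas as pd

# Sketch only: assumes the results CSV has a "query_category" column plus
# 0/1 "exact_match" and "correct" columns, as in the table above.
df = pd.read_csv("results/my_query_generator.csv")
summary = df.groupby("query_category")[["exact_match", "correct"]].mean()
print(summary)
print(f"Average correct rate: {df['correct'].mean():.2f}")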

gpt-3.5, now with instructions:

$ python main.py \
  -q data/questions_instruct.csv \
  -o results/my_query_generator.csv \
  -g oa \
  -f prompts/prompt_openai.md \
  -m gpt-3.5-turbo-0613 \
  -p 5
preparing questions...
Correct so far: 148/240 (61.67%): 100%|█████████████████████████| 240/240 [00:59<00:00,  4.06it/s]
                           exact_match   correct
query_category                                  
abbreviation_instructions     0.400000  0.466667
date_functions                0.760000  0.760000
date_instructions             0.200000  0.200000
group_by                      0.800000  0.800000
order_by                      0.657143  0.742857
ratio                         0.257143  0.342857
table_join                    0.714286  0.742857
where                         0.714286  0.714286
Average correct rate: 0.62

gpt-4 turbo with instructions:

$ python main.py \
  -q data/questions_instruct.csv \
  -o results/my_query_generator.csv \
  -g oa \
  -f prompts/prompt_openai.md \
  -m gpt-4-1106-preview \
  -p 5
preparing questions...
Correct so far: 180/240 (75.00%): 100%|██████████████████████████████████████████████████████████████████████████████████████| 240/240 [09:51<00:00,  2.47s/it]
                           exact_match   correct
query_category                                  
abbreviation_instructions     0.333333  0.400000
date_functions                0.800000  0.800000
date_instructions             0.360000  0.360000
group_by                      0.914286  0.942857
order_by                      0.828571  0.885714
ratio                         0.400000  0.685714
table_join                    0.800000  0.800000
where                         0.828571  0.828571
Average correct rate: 0.75

vllm with instructions:

$ python3 -W ignore main.py \
  -q data/questions_instruct.csv \
  -o "results/${model_name}_c${checkpoint_num}.csv" \
  -g vllm \
  -f "prompts/prompt_instructions.md" \
  -m "$model_path"
Preparing /models/combined/sqlcoder_7b_bf16_b16_ld005_r128_a128_ts/checkpoint-600
2023-11-30 12:55:04,457 INFO worker.py:1673 -- Started a local Ray instance.
INFO 11-30 12:55:05 llm_engine.py:72] Initializing an LLM engine with config: model='/models/combined/sqlcoder_7b_bf16_b16_ld005_r128_a128_ts/checkpoint-600', tokenizer='/models/combined/sqlcoder_7b_bf16_b16_ld005_r128_a128_ts/checkpoint-600', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=4, quantization=None, seed=0)
INFO 11-30 12:55:15 llm_engine.py:207] # GPU blocks: 8141, # CPU blocks: 2048
Using prompt file prompts/prompt_instructions.md
Prepared 240 questions from data/questions_instruct.csv
Generating completions
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 240/240 [02:39<00:00,  1.51it/s]
Time taken: 159.4s
Correct so far: 174/240 (72.50%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 240/240 [00:04<00:00, 51.86it/s]
                           exact_match   correct
query_category                                  
abbreviation_instructions     0.333333  0.333333
date_functions                0.800000  0.800000
date_instructions             0.160000  0.160000
group_by                      0.857143  0.857143
order_by                      0.800000  0.914286
ratio                         0.771429  0.828571
table_join                    0.828571  0.828571
where                         0.714286  0.714286
Average tokens generated: 55.6
