Hey all, what is the n_samples setting for instruct HumanEval about?
The docs say 200 as if it's a fixed setting that should always be used, but I can't understand why.
Also, when the turn format is structured, does this look right for ChatML?
--instruction_tokens "<|im_start|>user\n","<|im_end|>\n","<|im_start|>assistant\n"
Without quoting each string it gave an error, so I assume this is how to use this arg?
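For anyone reading along, here is a minimal sketch of how I understand the three tokens to be used. It assumes the harness splits the `--instruction_tokens` value on commas into a user token, an end token, and an assistant token, then concatenates them around the instruction. `build_prompt` is a hypothetical helper written for illustration, not the harness's actual function, so check the source before relying on it.

```python
def build_prompt(instruction_tokens: str, instruction: str, context: str = "") -> str:
    """Wrap an instruction in chat-style turn markers.

    Hypothetical helper: assumes the argument is a comma-separated string of
    exactly three tokens (user, end-of-turn, assistant).
    """
    # Raises ValueError if the string does not contain exactly three parts.
    user_token, end_token, assistant_token = instruction_tokens.split(",")
    return user_token + instruction + end_token + assistant_token + context


# ChatML-style tokens, written here with real newline characters.
CHATML_TOKENS = "<|im_start|>user\n,<|im_end|>\n,<|im_start|>assistant\n"

prompt = build_prompt(CHATML_TOKENS, "Write a function that adds two numbers.")
print(prompt)
```

On the quoting error: `<`, `>`, and `|` are shell metacharacters, so the tokens have to be quoted. Also note that in bash a double-quoted `"\n"` is passed as a literal backslash followed by `n`, not as a newline; `$'...'` quoting would pass a real newline. Whether the harness unescapes the literal form is something I have not verified.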
I ran the instruct HumanEval benchmark, and the eval results.json shows this:
"eos": "<|endoftext|>",
whereas the actual EOS should be <|im_end|>. I don't really see why this is there.
I recommend just running humanevalsynthesize from humanevalpack, which offers the same and more. Instructions for running it are here: https://github.com/bigcode-project/octopack?tab=readme-ov-file#run and here: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md#humanevalpack
You may just need to add your instruction format here:
bigcode-evaluation-harness/bigcode_eval/tasks/humanevalpack.py
Line 235 in 0f3e95f