max_length_generation parameter #207
The default is 512, which works fine for HumanEval, but some tasks need more; try setting it to 1024 for MBPP. Regarding the impact on the results: if the benchmark has long prompts, you want a higher max_length to leave room for generation, otherwise the solutions won't be complete.
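For intuition, here is a minimal sketch of the budget arithmetic behind this advice (it assumes the transformers library; the tokenizer name is just an example, not a requirement):

```python
# A minimal sketch of the max_length_generation budget: the limit covers
# prompt tokens *plus* generated tokens, so long prompts leave little room
# for the solution. The tokenizer name is only an example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

prompt = '"""Write a function to find the shared elements from the given two lists."""\n'
max_length_generation = 512

prompt_tokens = len(tokenizer(prompt)["input_ids"])
room = max_length_generation - prompt_tokens
print(f"{prompt_tokens} prompt tokens -> {room} tokens left for the solution")
# If `room` is small (or negative), generations get truncated and the
# incomplete solutions fail the tests.
```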
@loubnabnl I was facing the same issue for multiple-java and multiple-cpp while trying to reproduce the leaderboard scores for codellama-7b using the steps given in the leaderboard README here. Is it supposed to be 1024 for multiple-cpp and multiple-java as well? I was confused because of what is mentioned in the leaderboard's About section.
Hi, sorry for the confusion. If this happens, try 1024; some tokenizers generate more tokens than others for the same prompt, which takes up more of the budget. I will update the "About" section of the leaderboard.
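To make the tokenizer point concrete, here is a hedged sketch (the model names are just the ones discussed in this thread) that counts the tokens of the same prompt under two different tokenizers:

```python
# Sketch: the same prompt costs a different number of tokens under different
# tokenizers, so a max_length_generation that fits one model may truncate
# another. Model names are examples only.
from transformers import AutoTokenizer

prompt = (
    '"""Write a function to find the shared elements from the given two lists.\n'
    "assert set(similar_elements((3, 4, 5, 6), (5, 7, 4, 10))) == set((4, 5))\n"
    '"""\n'
)

for name in ["codellama/CodeLlama-7b-hf", "deepseek-ai/deepseek-coder-1.3b-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {len(tok(prompt)['input_ids'])} tokens")
```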
It seems MBPP has a prompt of about 1700 tokens with some tokenizers. After PR #244 you should be able to run the evaluation with a smaller max_length, but you might get lower scores, as the solutions to some long prompts won't be generated.
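If you want to check prompt lengths yourself, here is a rough sketch assuming the datasets library; note that the harness builds its MBPP prompts slightly differently, so treat this as an approximation:

```python
# Rough sketch: estimate the longest MBPP prompt for a given tokenizer.
# The prompt built here (description in a docstring followed by the test
# asserts) only approximates the harness's actual MBPP prompt format.
from datasets import load_dataset
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
ds = load_dataset("mbpp", split="test")

def approx_prompt(example):
    return '"""\n' + example["text"] + "\n" + "\n".join(example["test_list"]) + '\n"""\n'

longest = max(len(tok(approx_prompt(ex))["input_ids"]) for ex in ds)
print(f"longest (approximate) MBPP prompt: {longest} tokens")
```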
You mentioned in the README that max_length_generation=512 is enough for tasks like HumanEval and MBPP, but when I tested phi-1.5 and deepseek-coder-1.3b-base on the MBPP task, the following problems occurred at max_length_generation=512.
How should this parameter be set so that the results can be aligned with the reported scores? Does the setting of this parameter have a significant impact on the results?