max_length_generation parameter #207

Closed
icoderzqliu opened this issue Mar 21, 2024 · 4 comments

@icoderzqliu

You mentioned in the README that max_length_generation=512 is enough for tasks like HumanEval and MBPP, but when I tested phi-1.5 and deepseek-coder-1.3b-base on the MBPP task, the following error occurred with max_length_generation=512:

ValueError: Input length of input_ids is 512, but `max_length` is set to 512. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.

How should this parameter be set so that my results align with the reported ones? Does the value of this parameter have a significant impact on the results?
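For reference, here is a minimal toy sketch of the distinction the error message points at; the model name and prompt below are only placeholders, not what the harness passes internally:

```python
# Toy illustration (not harness code): in transformers, max_length counts the
# prompt *plus* the generated tokens, while max_new_tokens counts only the
# newly generated ones. Model name and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def add(a, b):\n    return a + b\n\n" * 60   # deliberately long prompt
inputs = tokenizer(prompt, return_tensors="pt")
print("prompt tokens:", inputs["input_ids"].shape[-1])

# If the prompt alone already reaches max_length, there is no budget left for
# generation and transformers raises the ValueError quoted above:
# model.generate(**inputs, max_length=512)

# max_new_tokens reserves room for new tokens regardless of prompt length:
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:]))
```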

icoderzqliu changed the title from "MAX_LENGTH_GEN parameter" to "max_length_generation parameter" on Mar 21, 2024
@loubnabnl
Collaborator

The default is 512, which works fine for HumanEval, but some tasks need more; try setting it to 1024 for MBPP. Regarding the impact on the results: if the benchmark has long prompts, you want a higher max_length so there is room left for generation, otherwise the solutions won't be complete.

@toptechie156 commented Apr 14, 2024

@loubnabnl I was facing the same issue for multiple-java and multiple-cpp while trying to reproduce the leaderboard score for codellama-7b using the steps given in the leaderboard README here:
https://github.com/bigcode-project/bigcode-evaluation-harness/tree/main/leaderboard#2--generation

Is it supposed to be 1024 for multiple-cpp and multiple-java as well?

I was confused because the leaderboard's About section mentions:

All models were evaluated with the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness/tree/main) with top-p=0.95, temperature=0.2, max_length_generation 512, and n_samples=50.

@loubnabnl
Collaborator

Hi, sorry for the confusion. If this happens, try 1024; some tokenizers produce more tokens than others, which takes more space. I will update the "About" section of the leaderboard.

nikita1503 pushed a commit to nikita1503/bigcode-evaluation-harness that referenced this issue Apr 17, 2024
@loubnabnl
Collaborator

It seems MBPP has a prompt of around 1700 tokens with some tokenizers. After PR #244 you should be able to run the evaluation with a smaller max_length, but you might get lower scores, as the solutions to some long prompts won't be generated.
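If it helps, here is a rough way to check prompt token lengths yourself. This is a sketch, not harness code: it tokenizes only the MBPP task text plus its tests, and the harness's actual prompt format may add more text, so treat the numbers as a lower bound. The model names are just examples.

```python
# Rough check (not harness code): how many tokens the longest MBPP task needs
# under different tokenizers. The harness's real prompts may include extra
# text, so these counts are only a lower bound.
from datasets import load_dataset
from transformers import AutoTokenizer

mbpp = load_dataset("mbpp", split="test")

for name in ["codellama/CodeLlama-7b-hf", "deepseek-ai/deepseek-coder-1.3b-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    longest = max(
        len(tok(ex["text"] + "\n" + "\n".join(ex["test_list"])).input_ids)
        for ex in mbpp
    )
    print(f"{name}: longest prompt ~ {longest} tokens")
```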
