gpt_bigcode: added internal bucketing fix #1526
Conversation
@mgonchar, thanks very much for fixing this. Please update the test to run with the True flag: https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py#L30-L36 and update the output if necessary.
Force-pushed from 21130ea to d580212
Sure, I've changed the test and ran it locally; it passed. Here is the result (I commented out all models except starcoder).

The output is fine. In my understanding, the output of the bucket vs. no-bucket case should be the same if bucket_size is equal in both cases.
```python
key = torch.cat((past_key, key), dim=-2)
value = torch.cat((past_value, value), dim=-2)
present = torch.cat((key, value), dim=-1) if use_cache else None
key = past_key.index_copy_(1, token_idx - 1, key)
```
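For context, a minimal standalone sketch contrasting the two update strategies discussed in this thread. Shapes and the decode loop are illustrative assumptions, not the actual model code; only the torch.cat/index_copy_ calls mirror the snippet above:

```python
import torch

# Out-of-place: the cache grows by concatenation at every decode step.
def update_concat(past_key, new_key):
    return torch.cat((past_key, new_key), dim=-2)

# In-place: the cache is preallocated to the bucket size, and row
# token_idx - 1 along the sequence dim is overwritten each step.
def update_inplace(past_key, new_key, token_idx):
    return past_key.index_copy_(1, token_idx - 1, new_key)

batch, bucket, dim = 2, 8, 4
grown = torch.zeros(batch, 0, dim)          # concat-style cache
bucketed = torch.zeros(batch, bucket, dim)  # preallocated bucket

for step in range(1, 4):
    new_key = torch.randn(batch, 1, dim)
    grown = update_concat(grown, new_key)
    update_inplace(bucketed, new_key, torch.tensor([step]))

# Up to the current length, both strategies hold identical contents,
# matching the bucket-vs-no-bucket equivalence noted above.
assert torch.equal(grown, bucketed[:, :3, :])
```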
Can you verify this works with tgi-gaudi? The out-of-place op was used to fix a specific issue when the tensor cache is disabled; otherwise we saw an error.
Sent you the ticket link for the empty tensor optional error with tgi-gaudi.
@vidyasiv I tried rolling back the changes from your commit #1181 and it works for me on the latest 1.18 with the command line

```bash
PT_HPU_DISABLE_TENSOR_CACHE=1 python run_generation.py --model_name_or_path bigcode/starcoder --batch_size 2 --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --bf16
```

and the output is the same as without the PT_HPU_DISABLE_TENSOR_CACHE variable. It seems the original issue was fixed.
For the bucketed case it also works fine and gives the same output:

```bash
PT_HPU_DISABLE_TENSOR_CACHE=1 python run_generation.py --model_name_or_path bigcode/starcoder --batch_size 2 --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --bf16 --bucket_size=128 --bucket_internal
```
@vidyasiv What's the TGI config that led to the error?
@regisss the issue in TGI from the original ticket:

```bash
# Start the container:
docker run -it --runtime=habana --name gaudi-tgi-scb-3b -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=True -e BATCH_BUCKET_SIZE=8 -e PREFILL_BATCH_BUCKET_SIZE=4 -e PAD_SEQUENCE_TO_MULTIPLE_OF=128 --cap-add=sys_nice --net=host --entrypoint bash tgi_gaudi

# Server, inside the container:
text-generation-launcher --model-id bigcode/starcoderbase-3b --sharded false --hostname 127.0.0.1 --max-input-length 2048 --max-batch-size 8 --dtype bfloat16
```

The HF equivalent back then was to set PT_HPU_DISABLE_TENSOR_CACHE=1 and --use_hpu_graphs.
The code quality check failed, please run `make style`.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
- update kv-cache state in place at decode phase
- slice tensors with cache_idx to reduce excessive compute (see the sketch below)
Force-pushed from d580212 to f2e494f
Rebased, style fixed.
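A hedged sketch of the cache_idx slicing mentioned in the commit message above: attention only needs the filled portion of the preallocated bucket, so slicing before the matmul keeps compute proportional to cache_idx rather than the full preallocated length. Names and shapes here are illustrative assumptions, not the model code:

```python
import torch

def attn_scores(query, key_cache, cache_idx):
    # Only the first cache_idx positions of the bucketed cache are valid;
    # slicing avoids computing attention over the unfilled tail.
    key = key_cache[:, :cache_idx, :]
    return torch.matmul(query, key.transpose(-2, -1))

batch, max_len, dim = 2, 512, 64
key_cache = torch.zeros(batch, max_len, dim)
query = torch.randn(batch, 1, dim)

# With one 128-token bucket filled, scores cover 128 positions, not 512.
scores = attn_scores(query, key_cache, cache_idx=128)
assert scores.shape == (batch, 1, 128)
```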
LGTM based on Miroslav's testing.
This PR fixes the lost-context issue for the gpt_bigcode class of models (starcoderbase/starcoder) when the bucket_internal feature is used. It unblocks the generation quality tests in https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py#L36
For example, with the command line:

without this fix:

with this fix:
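As a guess at the failure mode behind the differing outputs above (illustrative only, not the actual model code): if the decode loop keeps reusing a preallocated cache buffer, an out-of-place torch.cat returns a fresh tensor and never touches that buffer, so previously written tokens appear to be lost; the in-place index_copy_ avoids this:

```python
import torch

def broken_update(cache, new_key, token_idx):
    # Out-of-place: returns a new tensor; the preallocated buffer that the
    # rest of the decode loop still references is never modified.
    return torch.cat((cache[:, : token_idx - 1, :], new_key), dim=1)

def fixed_update(cache, new_key, token_idx):
    # In-place: writes into the shared preallocated bucket.
    return cache.index_copy_(1, torch.tensor([token_idx - 1]), new_key)

cache = torch.zeros(1, 4, 2)
key = torch.ones(1, 1, 2)

broken_update(cache, key, token_idx=1)
assert cache.abs().sum() == 0        # buffer untouched: context is lost

fixed_update(cache, key, token_idx=1)
assert cache[:, 0].abs().sum() == 2  # buffer updated in place
```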