
gpt_bigcode: added internal bucketing fix #1526

Merged
merged 1 commit into huggingface:main on Dec 9, 2024

Conversation

mgonchar
Contributor

  • update kv-cache state in place at the decode phase
  • slice tensors with cache_idx to reduce excessive compute

This PR fixes the lost-context issue for the gpt_bigcode class of models (starcoderbase/starcoder) when the bucket_internal feature is used.

It unblocks the generation quality tests in https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py#L36
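As an illustration only (not the actual diff; tensor names, shapes, and the [batch, seq, head_dim] cache layout are assumptions), a minimal sketch of the two changes described above: writing the decode-step key/value into the pre-allocated cache in place instead of concatenating, and slicing the cache to cache_idx so attention only runs over the filled part of the bucket:

import torch

def decode_step(past_key, past_value, key, value, token_idx, cache_idx):
    # past_key/past_value: pre-allocated buckets, shape [batch, bucket_len, head_dim]
    # key/value: current decode-step tensors, shape [batch, 1, head_dim]
    # In-place update: write the new entries at position token_idx - 1 instead of
    # concatenating, so the cache keeps its pre-allocated size and storage.
    past_key.index_copy_(1, token_idx - 1, key)
    past_value.index_copy_(1, token_idx - 1, value)
    # Slice with cache_idx so attention only sees the filled part of the bucket.
    return past_key[:, :cache_idx, :], past_value[:, :cache_idx, :]

batch, bucket_len, head_dim = 2, 128, 64
past_k = torch.zeros(batch, bucket_len, head_dim)
past_v = torch.zeros(batch, bucket_len, head_dim)
k = torch.randn(batch, 1, head_dim)
v = torch.randn(batch, 1, head_dim)
token_idx = torch.tensor([5])  # current decode position (1-based)
key_slice, value_slice = decode_step(past_k, past_v, k, v, token_idx, cache_idx=5)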

For example, with the command line:

python3 examples/text-generation/run_generation.py --model_name_or_path bigcode/starcoder --batch_size 2 --use_kv_cache --max_new_tokens 100 --bucket_size 128 --bucket_internal --use_hpu_graphs --bf16  --prompt 'def print_hello_world():'

without this fix:

Input/outputs:
input 1: ('def print_hello_world():',)
output 1: ('def print_hello_world():\n    print("Hello World\n\t\n    print("Hello World!\n\n\n\n\n\n\n\n\n\n#\n\n\n\n\n\n\n\n\n#\n        """\n    print_name,\n#0\n\n#\n\n\n\n\n\n\n\n\n#0\n#00\n\n\n#\n\n\n\n\n\n\n\n\n0200000000..\n\n\n0ape_get)\n       0",0.',)

input 2: ('def print_hello_world():',)
output 2: ('def print_hello_world():\n    print("Hello World\n\t\n    print("Hello World!\n\n\n\n\n\n\n\n\n\n#\n\n\n\n\n\n\n\n\n#\n        """\n    print_name,\n#0\n\n#\n\n\n\n\n\n\n\n\n#0\n#00\n\n\n#\n\n\n\n\n\n\n\n\n0200000000..\n\n\n0ape_get)\n       0",0.',)

with this fix:

Input/outputs:
input 1: ('def print_hello_world():',)
output 1: ('def print_hello_world():\n    print("Hello World")\n\ndef print_hello_world_twice():\n    print_hello_world()\n    print_hello_world()\n\ndef print_hello_world_thrice():\n    print_hello_world()\n    print_hello_world()\n    print_hello_world()\n\ndef print_hello_world_four_times():\n    print_hello_world()\n    print_hello_world()\n    print_hello_world()\n   ',)

input 2: ('def print_hello_world():',)
output 2: ('def print_hello_world():\n    print("Hello World")\n\ndef print_hello_world_twice():\n    print_hello_world()\n    print_hello_world()\n\ndef print_hello_world_thrice():\n    print_hello_world()\n    print_hello_world()\n    print_hello_world()\n\ndef print_hello_world_four_times():\n    print_hello_world()\n    print_hello_world()\n    print_hello_world()\n   ',)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mgonchar mgonchar requested a review from ZhaiFeiyue as a code owner November 26, 2024 15:47
@vidyasiv
Contributor

@mgonchar, thanks very much for fixing this. Please update the test to run with the True flag: https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py#L30-L36 and update the output if necessary.

@mgonchar mgonchar force-pushed the main_fix_bucket_internal branch from 21130ea to d580212 Compare November 27, 2024 18:19
@mgonchar mgonchar requested a review from regisss as a code owner November 27, 2024 18:19
@mgonchar
Contributor Author

mgonchar commented Nov 27, 2024

@mgonchar, thanks very much for fixing this. Please update the test to run with the True flag: https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py#L30-L36 and update the output if necessary.

Sure, I've changed the test and run it locally; it passed.

Here is the result (I commented out all models other than starcoder):

============================= test session starts =============================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0
rootdir: /var/work/optimum-habana
configfile: setup.cfg
collected 9 items

test_text_generation_example.py .sssssss.                               [100%]

=============================== warnings summary ===============================
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-bigcode/starcoder-256-True-6846.575763562658-True]
  /usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
    return isinstance(object, types.FunctionType)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================ 2 passed, 7 skipped, 1 warning in 77.88s (0:01:17) ================

@mgonchar
Contributor Author

Output is fine; in my understanding, the output of the bucket vs. no-bucket case should be the same if bucket_size is equal in both cases.
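A tiny, purely illustrative check of that intuition (standalone toy code, not the model implementation): as long as the pre-allocated bucket is sliced down to the number of valid positions before attention, the scores match the unbucketed cache exactly.

import torch

torch.manual_seed(0)
valid_len, bucket_len, dim = 5, 8, 4
q = torch.randn(1, 1, dim)
k_valid = torch.randn(1, valid_len, dim)
# Bucketed cache: valid entries followed by padding up to the bucket boundary.
k_bucket = torch.cat((k_valid, torch.zeros(1, bucket_len - valid_len, dim)), dim=1)
scores_no_bucket = q @ k_valid.transpose(-1, -2)
scores_bucket = q @ k_bucket[:, :valid_len, :].transpose(-1, -2)  # slice to the valid length
print(torch.allclose(scores_no_bucket, scores_bucket))  # True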

# out-of-place path: concatenate the new step onto the cached tensors
key = torch.cat((past_key, key), dim=-2)
value = torch.cat((past_value, value), dim=-2)
present = torch.cat((key, value), dim=-1) if use_cache else None
# in-place decode-phase update of the pre-allocated cache introduced by this PR
key = past_key.index_copy_(1, token_idx - 1, key)
Contributor

Can you verify this works with tgi-gaudi? The out-of-place op was used to fix a specific issue when the tensor cache is disabled; otherwise we saw an error.

Contributor

Sent you the ticket link for the empty-tensor optional error with tgi-gaudi.

Contributor Author

@mgonchar mgonchar Nov 27, 2024

@vidyasiv I tried rolling back the changes from your commit #1181 and it works for me on the latest 1.18 with the command line

PT_HPU_DISABLE_TENSOR_CACHE=1 python run_generation.py --model_name_or_path bigcode/starcoder --batch_size 2 --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --bf16

and the output is the same as without the PT_HPU_DISABLE_TENSOR_CACHE variable. It seems that the original issue was fixed.

Contributor Author

For the bucket case it also works fine and gives the same output:

PT_HPU_DISABLE_TENSOR_CACHE=1 python run_generation.py --model_name_or_path bigcode/starcoder --batch_size 2 --use_hpu_graphs --use_kv_cache  --max_new_tokens 100 --bf16 --bucket_size=128 --bucket_internal

Collaborator

@vidyasiv What's the TGI config that led to the error?

Contributor

@regisss the issue in TGI from the original ticket:

# server:
text-generation-launcher --model-id bigcode/starcoderbase-3b --sharded false --hostname 127.0.0.1 --max-input-length 2048  --max-batch-size 8 --dtype bfloat16

# In container: 
docker run -it --runtime=habana --name gaudi-tgi-scb-3b -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=True -e BATCH_BUCKET_SIZE=8 -e PREFILL_BATCH_BUCKET_SIZE=4 -e PAD_SEQUENCE_TO_MULTIPLE_OF=128 --cap-add=sys_nice --net=host --entrypoint bash tgi_gaudi

The HF equivalent back then was to set PT_HPU_DISABLE_TENSOR_CACHE=1 and use --use_hpu_graphs.


The code quality check failed; please run make style.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

- update kv-cache state inplace at decode phase
- slice tensors with cache_idx to reduce excessive compute
@mgonchar mgonchar force-pushed the main_fix_bucket_internal branch from d580212 to f2e494f Compare December 1, 2024 19:51
@mgonchar
Contributor Author

mgonchar commented Dec 1, 2024

Rebased, style fixed.

Contributor

@vidyasiv vidyasiv left a comment

LGTM based on Miroslav's testing.

@regisss regisss merged commit 9a4c6de into huggingface:main Dec 9, 2024
4 checks passed
zzhang37 pushed a commit to zzhang37/optimum-habana that referenced this pull request Dec 9, 2024
@mgonchar mgonchar deleted the main_fix_bucket_internal branch December 9, 2024 23:39
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Dec 10, 2024