Optimizations and WAs to support HPU execution for Detr-Resnet-50 #1334

sandeep-maddipatla · 2024-09-16T21:24:15Z

Modifications to the DETR transformer, including workarounds (WAs) and optimizations, to run the Detr-Resnet-50 model in eager and lazy modes on the HPU.

Fixes # (issue)

https://habana.atlassian.net/browse/HS-2704

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

sandeep-maddipatla · 2024-09-16T21:31:28Z

This PR builds on #1155, which targets eager mode, and adds the changes necessary for lazy mode execution. Some of the review feedback on the former PR is addressed here. More details below.

[x] pls. rebase/sync on top of main OH.
[x] run make style
[x] Pls. share the results of this test on g2 machines.
Done. Will share test result in another comment below.
[ ] we need to add README file for this in examples
Skipped. There is a README.md at https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection. We haven't changed that particular inference example. Pls let us know if that still needs modification.
[x] pls. add the appropriate CI tests for this.
Done. Extended existing ci-test to add a detr-resnet-50 test as well.

sandeep-maddipatla · 2024-09-16T21:32:43Z

make style result:

sandeep-maddipatla · 2024-09-16T21:37:05Z

Test Result:

vidyasiv

Please rebase onto the latest main; there are some changes in modeling_utils.py.

vidyasiv · 2024-09-25T17:28:08Z

optimum/habana/transformers/models/detr/modeling_detr.py

+    """
+    Copied from https://github.com/huggingface/transformers/tree/v4.40.2
+    https://github.com/huggingface/transformers/blob/4fdf58afb72b0754da30037fc800b6044e7d9c99/src/transformers/models/detr/modeling_detr.py#L2287
+    The modications are:


Suggested change

The modications are:

The modifications are:

vidyasiv · 2024-09-25T17:28:29Z

optimum/habana/transformers/models/detr/modeling_detr.py

+
+    # Compute the classification cost. Contrary to the loss, we don't use the NLL,
+    # but approximate it in 1 - proba[target class].
+    # The 1 is a constant that doesn't change the matching, it can be ommitted.


Suggested change

# The 1 is a constant that doesn't change the matching, it can be ommitted.

# The 1 is a constant that doesn't change the matching, it can be omitted.
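
To illustrate the code comment under review, here is a minimal sketch (not the code from modeling_detr.py) of why the constant 1 can be dropped from the classification cost. The function name `best_assignment` and the toy probabilities are hypothetical, and brute-force search stands in for the Hungarian algorithm. Every candidate matching assigns the same number of targets, so adding 1 to each cost entry shifts all totals by the same amount and leaves the minimizer unchanged.

```python
from itertools import permutations


def best_assignment(cost):
    """Brute-force the minimum-cost one-to-one matching of targets to queries.

    cost[q][t] is the cost of assigning query q to target t. Returns a tuple
    in which entry t is the index of the query matched to target t.
    """
    num_queries = len(cost)
    num_targets = len(cost[0])
    return min(
        permutations(range(num_queries), num_targets),
        key=lambda queries: sum(cost[q][t] for t, q in enumerate(queries)),
    )


# Toy class probabilities: 3 queries, 3 classes (each row sums to 1).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
target_classes = [1, 0]  # class IDs of two ground-truth objects

# Cost with the constant: 1 - proba[target class].
cost_with_one = [[1.0 - row[c] for c in target_classes] for row in probs]
# Cost without the constant: -proba[target class].
cost_without_one = [[-row[c] for c in target_classes] for row in probs]

# Both cost matrices select the same matching.
assert best_assignment(cost_with_one) == best_assignment(cost_without_one)
```

In this toy case target 0 (class 1) is matched to query 1 and target 1 (class 0) to query 0 under either cost.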

vidyasiv

@splotnikv , please take a look if you're covering for Sandeep

splotnikv · 2024-10-08T16:26:33Z

@splotnikv , please take a look if you're covering for Sandeep

Done. I don't have access rights to update this PR, so I created a new one. See #1404.

sandeep-maddipatla · 2024-10-24T19:18:44Z

Rebased to latest optimum-habana, addressed feedback from #1404 , and merged in changes from that PR.

Now that I'm back working on this, I will use this PR going forward to complete the review process. Sorry for the back-and-forth across the two PRs.

vidyasiv · 2024-10-29T16:31:22Z

@sandeep-maddipatla Sorry, I was not able to work the last few days, so I could not review sooner.

Instead of the Jira link, which shouldn't be pasted in a public repo, can you add a high-level summary of the changes?
Are the tests meant to cover both lazy and eager modes, or should I set the environment manually to test that?

GAUDI2_CI=1
RUN_SLOW=true
#lazy mode all 8 pass but eager 4 fail
PT_HPU_LAZY_MODE=0 pytest tests/test_object_detection.py 
FAILED tests/test_object_detection.py::GaudiDETRTester::test_inference_hpu_graphs - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
FAILED tests/test_object_detection.py::GaudiDETRTester::test_no_latency_regression_autocast - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
FAILED tests/test_object_detection.py::GaudiDetrResnet50_Tester::test_inference_hpu_graphs - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
FAILED tests/test_object_detection.py::GaudiDetrResnet50_Tester::test_no_latency_regression_autocast - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'

README check (again not sure if eager mode is supported)

export PT_HPU_LAZY_MODE=0
python3 run_example.py \
	--model_name_or_path facebook/detr-resnet-101 \
	--image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
	--use_hpu_graphs \
	--bf16 \
	--print_result

AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'

Lazy mode passes

Detected cat with confidence 0.996 at location [344.0, 25.25, 640.0, 376.0]
Detected remote with confidence 0.996 at location [328.0, 76.0, 372.0, 188.0]
Detected remote with confidence 0.996 at location [39.5, 69.5, 175.0, 119.0]
Detected cat with confidence 1.0 at location [15.62, 52.5, 316.0, 472.0]
Detected couch with confidence 0.996 at location [-1.25, 0.94, 640.0, 472.0]

Stats:
------------------------------------------------------------
Total latency (ms): 59.30161476135254 (for n_iterations=10) 
Average latency (ms): 5.930161476135254 (per iteration)

Pls clarify how the testing is to be done.

vidyasiv · 2024-11-06T18:59:48Z

@sandeep-maddipatla , could you update by EOW?

vidyasiv · 2024-11-12T17:25:59Z

@sandeep-maddipatla please resolve merge conflicts

emascarenhas · 2024-11-21T23:38:44Z

@sandeep-maddipatla , Please make changes and post results of retesting, otherwise this will be pushed to the 1.20 release.

vidyasiv · 2024-11-25T18:07:49Z

@sandeep-maddipatla If we don't see an update by Wednesday, this will need to be part of the 1.20 release.

- Add capability to ignore targets that have an out-of-range ID - This helps to pad target objects to avoid graph recompilation and yet not affect the loss computation in training.

sandeep-maddipatla · 2024-12-03T10:17:18Z

Sorry for the delayed update here. It appears that the hpu_graphs feature is no longer supported in eager mode. I adjusted the tests to skip the functions using hpu_graphs for eager mode as a WA. The results of the checks are as follows:

~/optimum-habana $ make style
...
ruff check . setup.py --fix
All checks passed!
ruff format . setup.py
401 files left unchanged

~/optimum-habana $ python setup.py install
~/optimum-habana $ pip install pytest timm sentencepiece
~/optimum-habana $ PT_HPU_LAZY_MODE=1 python -m pytest tests/test_object_detection.py
========================================================================================================================== test session starts ===========================================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0
rootdir: /root/optimum-habana
configfile: setup.cfg
plugins: typeguard-4.3.0
collected 8 items

tests/test_object_detection.py ........                                                                                                                                                                                                                            [100%]

============================================================================================================================ warnings summary ============================================================================================================================
../../usr/lib/python3.10/inspect.py:288
  /usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
    return isinstance(object, types.FunctionType)

../../usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24
  /usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================ 8 passed, 2 warnings in 60.08s (0:01:00) ================================================================================================================


~/optimum-habana $ PT_HPU_LAZY_MODE=0 python -m pytest tests/test_object_detection.py
========================================================================================================================== test session starts ===========================================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0
rootdir: /root/optimum-habana
configfile: setup.cfg
plugins: typeguard-4.3.0
collected 8 items

tests/test_object_detection.py ........                                                                                                                                                                                                                            [100%]

============================================================================================================================ warnings summary ============================================================================================================================
../../usr/lib/python3.10/inspect.py:288
  /usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
    return isinstance(object, types.FunctionType)

../../usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpex/kernels/__init__.py:18
  /usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpex/kernels/__init__.py:18: UserWarning: CustomNms, RoiAlignFunction, ScaledMaskedSoftmax from habana_frameworks.torch.hpex.kernels are no yet supported in eager mode
    warnings.warn(

../../usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24
  /usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
    warnings.warn(

tests/test_object_detection.py::GaudiDETRTester::test_inference_hpu_graphs
tests/test_object_detection.py::GaudiDetrResnet50_Tester::test_inference_hpu_graphs
  /root/optimum-habana/tests/test_object_detection.py:105: UserWarning: test_inference_hpu_graphs is supported only in lazy mode. Skipped
    warnings.warn("test_inference_hpu_graphs is supported only in lazy mode. Skipped")

tests/test_object_detection.py::GaudiDETRTester::test_no_latency_regression_autocast
tests/test_object_detection.py::GaudiDetrResnet50_Tester::test_no_latency_regression_autocast
  /root/optimum-habana/tests/test_object_detection.py:123: UserWarning: test_no_latency_regression_autocast is supported only in lazy mode. Skipped
    warnings.warn("test_no_latency_regression_autocast is supported only in lazy mode. Skipped")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================================================================================== 8 passed, 7 warnings in 16.00s =====================================================================================================================

tests/test_object_detection.py

sandeep-maddipatla · 2024-12-03T20:09:17Z

Test results using the skipIf feature:

~/optimum-habana $ PT_HPU_LAZY_MODE=1 python -m pytest tests/test_object_detection.py
========================================================================================================================== test session starts ===========================================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0
rootdir: /root/optimum-habana
configfile: setup.cfg
plugins: typeguard-4.3.0
collected 8 items

tests/test_object_detection.py ........                                                                                                                                                                                                                            [100%]

============================================================================================================================ warnings summary ============================================================================================================================
../../usr/lib/python3.10/inspect.py:288
  /usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
    return isinstance(object, types.FunctionType)

../../usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24
  /usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================ 8 passed, 2 warnings in 63.79s (0:01:03) ================================================================================================================



~/optimum-habana $ PT_HPU_LAZY_MODE=0 python -m pytest tests/test_object_detection.py
========================================================================================================================== test session starts ===========================================================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0
rootdir: /root/optimum-habana
configfile: setup.cfg
plugins: typeguard-4.3.0
collected 8 items

tests/test_object_detection.py ..ss..ss                                                                                                                                                                                                                            [100%]

============================================================================================================================ warnings summary ============================================================================================================================
../../usr/lib/python3.10/inspect.py:288
  /usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
    return isinstance(object, types.FunctionType)

../../usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpex/kernels/__init__.py:18
  /usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpex/kernels/__init__.py:18: UserWarning: CustomNms, RoiAlignFunction, ScaledMaskedSoftmax from habana_frameworks.torch.hpex.kernels are no yet supported in eager mode
    warnings.warn(

../../usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24
  /usr/local/lib/python3.10/dist-packages/transformers-4.45.2-py3.10.egg/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================================================== 4 passed, 4 skipped, 3 warnings in 18.70s ================================================================================================================
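
For readers following along, a minimal sketch of how a mode-dependent skip like the one behind the `..ss..ss` result above could be written. This is not the code from tests/test_object_detection.py: the helper `is_eager_mode`, the class `ExampleDetrTester`, and the treatment of an unset `PT_HPU_LAZY_MODE` as lazy mode are all assumptions made for illustration. It uses `unittest.skipIf`; whether the PR uses that or `pytest.mark.skipif` is not shown in this conversation.

```python
import os
import unittest


def is_eager_mode(env=None):
    """Return True when PT_HPU_LAZY_MODE is set to "0".

    Assumption for this sketch: an unset variable is treated as lazy mode.
    """
    env = os.environ if env is None else env
    return env.get("PT_HPU_LAZY_MODE", "1") == "0"


class ExampleDetrTester(unittest.TestCase):  # hypothetical test class
    def test_inference_default(self):
        # Stands in for a test that runs in both lazy and eager modes.
        self.assertTrue(True)

    @unittest.skipIf(is_eager_mode(), "HPU graphs are supported only in lazy mode")
    def test_inference_hpu_graphs(self):
        # Stands in for a test that wraps the model in an HPU graph.
        self.assertTrue(True)
```

Compared with emitting a warning and returning early, a skip decorator makes pytest report the test as skipped (`s`) instead of passed, so the summary reflects what actually ran.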

sandeep-maddipatla mentioned this pull request Sep 16, 2024

Enable eager mode training for DETR #1155

Closed

sandeep-maddipatla marked this pull request as ready for review September 17, 2024 07:14

sandeep-maddipatla requested a review from regisss as a code owner September 17, 2024 07:14

sandeep-maddipatla force-pushed the detr-hpu branch from a31ae42 to 0237edd Compare September 17, 2024 08:58

vidyasiv suggested changes Sep 25, 2024

vidyasiv suggested changes Sep 26, 2024

splotnikv mentioned this pull request Oct 8, 2024

Optimizations and WAs to support HPU execution for Detr-Resnet-50 #1404

Closed

sandeep-maddipatla force-pushed the detr-hpu branch from 0237edd to 4838a93 Compare October 24, 2024 06:55

libinta added the synapse1.20 label Dec 2, 2024

splotnikv and others added 6 commits December 2, 2024 20:00

Enable eager mode training for DETR

6f827f8

Optimizations for Detr Model to run in lazy mode

e558d25

- Add capability to ignore targets that have an out-of-range ID - This helps to pad target objects to avoid graph recompilation and yet not affect the loss computation in training.

Add CI tests for detr-resnet50 inference

fc48f32

Fix typos

e327e71

Incorporate review feedback

ba623e3

Skip hpu-graph tests in eager mode to WA test failures

d4d9fda
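
The commit note above about ignoring targets with an out-of-range ID can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the names `pad_class_ids`, `masked_nll`, `NUM_CLASSES`, and the sentinel `PAD_ID = -1` are hypothetical, and plain Python lists stand in for tensors. The idea is that padding every sample to the same number of targets keeps shapes fixed (which helps avoid graph recompilation), while the loss skips padded entries so its value is unchanged.

```python
import math

NUM_CLASSES = 2
PAD_ID = -1  # hypothetical sentinel: any ID outside [0, NUM_CLASSES)


def pad_class_ids(class_ids, max_targets, pad_id=PAD_ID):
    """Pad a list of target class IDs to a fixed length with an out-of-range ID."""
    if len(class_ids) > max_targets:
        raise ValueError("more targets than max_targets")
    return list(class_ids) + [pad_id] * (max_targets - len(class_ids))


def masked_nll(log_probs, class_ids, num_classes=NUM_CLASSES):
    """Mean negative log-likelihood over targets whose ID is within range."""
    total, count = 0.0, 0
    for row, class_id in zip(log_probs, class_ids):
        if 0 <= class_id < num_classes:  # padded (out-of-range) IDs are ignored
            total -= row[class_id]
            count += 1
    return total / count if count else 0.0


# Toy log-probabilities for 4 prediction slots over 2 classes.
log_probs = [
    [math.log(p) for p in row]
    for row in [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5], [0.5, 0.5]]
]
real_ids = [0, 1]
padded_ids = pad_class_ids(real_ids, max_targets=4)

loss_real = masked_nll(log_probs[:2], real_ids)
loss_padded = masked_nll(log_probs, padded_ids)
assert math.isclose(loss_real, loss_padded)
```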

sandeep-maddipatla force-pushed the detr-hpu branch from 4838a93 to d4d9fda Compare December 3, 2024 10:04

vidyasiv reviewed Dec 3, 2024

tests/test_object_detection.py (outdated)

Switch to using pytest skipIf feature to skip unsupported tests

dc15228

vidyasiv approved these changes Dec 3, 2024
