example: int4 weight decompression #2193

Open
wants to merge 7 commits into
base: main

@rupakroyintel rupakroyintel commented Oct 29, 2024

Description

oneDNN supports INT4 weights produced by the AutoGPTQ and AWQ quantization schemes. This PR adds an example to the oneDNN examples that demonstrates matmul INT4 weights decompression and shows how to configure the API for AutoGPTQ and AWQ quantization. The request originally came from the IPEX team: "AWQ (activation-aware quantization) is very popular in the community and we need to support it. We need the oneDNN INT4 GEMM API to support the input packing approach below. The weights are packed in the N direction, [K, N/8]; the zero points are packed in both K and N, [K/G, N/8]; the scales are in the K direction, [K/G, N]. The input data type of the weights and zero points is int32, and the scales are fp16."

Checklist

General

  • [✔ ] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [✔ ] Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements? Not yet

New features

  • Have you published an RFC for the new feature? No
  • Was the RFC approved? N/A
  • Have you added relevant tests? N/A

Bug fixes

  • Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
  • Have you added relevant regression tests?

RFC PR

  • Does RFC document follow the template?
  • Have you added a link to the rendered document?

@rupakroyintel rupakroyintel requested a review from a team as a code owner October 29, 2024 18:28
@shu1chen
Contributor

The file name of the example int4_weight_decompression_cmnts.cpp doesn't seem good. What is cmnts?

@rupakroyintel
Author

Removed int4_weight_decompression_cmnts.cpp and added int4_weight_decompression.cpp.

rupakroyintel and others added 3 commits October 30, 2024 22:57
@rupakroyintel rupakroyintel changed the title from "Add int4 decompression example" to "example: int4 weight decompression" Oct 31, 2024
@vpirogov
Member

@rupakroyintel, please make sure commits in your branch comply with contributing guidelines and do not contain merge commits.

@theComputeKid, @mgouicem, looks like PR Checks / title does not catch issue with commit history...

// - Matrices A and B
// Outputs:
// - Matrix C
void ref_compute_matmul_f32(int64_t M, int64_t N, int64_t K, int64_t G,
Member

I would suggest dropping fp32 reference comparison from this example as it does not add value when explaining int4 quantization.


// Compares the results of reference matrix multiplication and oneDNN weights
// decompression.
void compare_ref_and_weights_decompression(engine::kind engine_kind) {
Member

It would be great to follow the structure and flow of the int8 decompression example (weights_decompression_matmul) and add information about the specifics of int4 data storage. If you remember, the case that triggered the request for this example was related to feeding prepacked weights to oneDNN and dealing with groups and zero-points.

Contributor

@shu1chen shu1chen Nov 5, 2024

Previously, in the Teams PyTorch channel, Dmitry provided the following detailed advice:

...
Secondly, using this group of 8 along N is not required in the PyTorch case. The group size says how many consecutive elements of the tensor that the zero points apply to share a single zero-point value. It has nothing to do with how PyTorch packs its zero points.

Thirdly, and most importantly, what matters is HOW these zero points are stored in memory. There was a recent story where an IPEX engineer tried to enable oneDNN's int4 and failed because the weights were transposed (because of that 8xPack thing), and all that needed to be done was to transpose them again to match oneDNN's API. I would assume this story follows the same pattern: before calling the oneDNN API, those zero points most likely need to be transposed and only then passed into the library as an int4 object to get correct results.

@dzarukin What do you suggest? It seems better to pass an int4 object to oneDNN than to prepack 8 int4 values and pass an int32 object.

Contributor

oneDNN developed an API that works with int4 memory objects directly. PyTorch has not done this yet; pre-packing is a detail of its implementation. The example should probably demonstrate how to translate the "8 int4 values packed into a single int value" representation into oneDNN terms, and which memory operations are needed (transpositions and/or reorders).

@mgouicem
Contributor

mgouicem commented Nov 4, 2024

Let me see what goes wrong in the CI jobs. I checked out the branch and ran the check locally, and it properly catches the first improper message.

> git remote add rupakroy https://github.com/rupakroyintel/oneDNN.git
> git fetch rupakroy
> git co add_int4_decompression_example
Updating files: 100% (776/776), done.
branch 'add_int4_decompression_example' set up to track 'rupakroy/add_int4_decompression_example'.
Switched to a new branch 'add_int4_decompression_example'
> python3 ./.github/automation/commit-msg-check.py "1abe160095ef52c7ad879b75331dbe4b4e17be6d" "1fe8ee54b18c764d32932d21e776a86f46a6d0cf"
msg: Merge branch 'add_int4_decompression_example' of https://github.com/rupakroyintel/oneDNN into add_int4_decompression_example
Traceback (most recent call last):
  File "./.github/automation/commit-msg-check.py", line 82, in <module>
    main()
  File "./.github/automation/commit-msg-check.py", line 77, in main
    __numCharacterCheck(commit_msg)
  File "./.github/automation/commit-msg-check.py", line 58, in __numCharacterCheck
    raise ValueError(
ValueError: Please see contribution guidelines. Message summary must be less than 72. Got: 124

@rupakroyintel
Author

@vpirogov @dzarukin We tried translating packed 8 int4 values into a single int value. However, it looks like the zero-points attribute wei:per_ocic:s4:32x8 is not supported. Here is the output from benchdnn:

./tests/benchdnn/benchdnn --matmul --engine=gpu --dt=f16:s4:f16 --stag=any --wtag=abc --dtag=acb --attr-scales=wei:per_ocic:f16:32x1 --attr-zero-points=wei:per_ocic:s4:32x8 --attr-fpmath=f16:true 7x24x32:7x32x64
Error: Function 'check_dnnl_status' at (/home/intel/rroy/int4_decompression/oneDNN/tests/benchdnn/dnnl_common.hpp:327) returned 'unimplemented'
Error: Function 'create_primitive' at (/home/intel/rroy/int4_decompression/oneDNN/tests/benchdnn/dnnl_common.hpp:401) returned '1'
Error: Function 'init_prim' at (/home/intel/rroy/int4_decompression/oneDNN/tests/benchdnn/dnnl_common.hpp:471) returned '1'
Error: Function 'createit' at (/home/intel/rroy/int4_decompression/oneDNN/tests/benchdnn/matmul/matmul.cpp:881) returned '1'
Error: Function 'create' at (/home/intel/rroy/int4_decompression/oneDNN/tests/benchdnn/utils/task.hpp:49) returned '1'
0:UNIMPLEMENTED __REPRO: --matmul --engine=gpu --dt=f16:s4:f16 --wtag=abc --dtag=acb --attr-scales=wei:per_ocic:f16:32x1 --attr-zero-points=wei:per_ocic:s4:32x8 --attr-fpmath=f16:true 7x24x32:7x32x64
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:1 invalid_arguments:0 failed:1 listed:0
total: 0.05s; fill: 0.00s (0%); compute_ref: 0.00s (0%); compare: 0.00s (0%);

@dzarukin
Contributor

@rupakroyintel, oneDNN knows nothing about the packing of 8 int4 values into one word, which is an implementation detail external to it. The zero-point group API is not designed for it. From the oneDNN perspective, you need to think about each value independently and use a group along a single dimension only. The observed benchdnn output is expected.
