SYCL: SOFTMAX F16 mask support and other fixes #11261
base: master
Conversation
Force-pushed from 90e7db9 to 1e2fe41
Thanks for the PR! I haven't tried to run it yet; I am worried we are adding support for fp16 without having tests for it. I saw the warnings you mentioned. Would it be possible to first merge a PR that adds tests for softmax fp16 for all relevant backends, and make sure they are skipped where it is not supported?
It is possible, but I want an actual project collaborator to initiate this, because if I go ahead and do it, backends that do not support it may crash with an assertion, which doesn't look nice imo.
Maybe it would be acceptable to only add the test for one backend for now? If we only enable it for the SYCL backend, it could be part of this PR. At least it would make me a bit more confident when I try to run the PR.
The CUDA backend also supports this. There may have been a reason why the test was not added in the first place. I will take this issue up again tomorrow. (It's dinner time here.)
We should add F16 mask tests to `test-backend-ops`.
Yes, I agree! Is it possible to modify the UT case locally and test it? |
I have added an F16 mask test case to `test-backend-ops`, but only for the forward pass. `m_prec` is the mask precision.
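For context, here is a minimal sketch of what such a forward-pass case boils down to, using the public ggml API. This is not the actual `test-backend-ops` code; the tensor sizes and the `scale`/`max_bias` values are illustrative.

```cpp
// Sketch: build a soft-max graph whose mask is F16, so a backend's
// F16-mask path is exercised. Illustrative values, not the real test.
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // logits in F32, mask in F16 -- the combination under discussion
    struct ggml_tensor * a    = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 32);
    struct ggml_tensor * mask = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 64, 32);

    // ggml_soft_max_ext(ctx, a, mask, scale, max_bias)
    struct ggml_tensor * out = ggml_soft_max_ext(ctx, a, mask, 1.0f, 0.0f);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    // ... compute on the backend under test and compare against a
    // reference backend, as test-backend-ops does

    ggml_free(ctx);
    return 0;
}
```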
No worries - I will fix any CI failures that are caused by this after it is merged. |
Thanks for the changes, it looks good to me overall. I have tested
For the last bit of changes, I will for now remove those `GGML_SYCL_DEBUG` statements, as they don't work because of a variable that gets initialized only in ggml-sycl.cpp. I will try to address that in my next PR.
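To illustrate the kind of pitfall being described (all names below are hypothetical, not the actual ggml-sycl code): a debug macro gated on a flag that is only ever set on one translation unit's init path silently prints nothing when used elsewhere.

```cpp
// Hypothetical sketch of the pattern, not ggml-sycl's actual code
#include <cstdio>
#include <cstdlib>

// debug.hpp -- the macro every .cpp file uses
extern bool g_debug_enabled;  // defined elsewhere
#define DEBUG_LOG(...) \
    do { if (g_debug_enabled) fprintf(stderr, __VA_ARGS__); } while (0)

// ggml-sycl.cpp -- the flag is defined and set here, e.g. during init
bool g_debug_enabled = false;
void init_debug() { g_debug_enabled = getenv("DEBUG") != nullptr; }

// other.cpp -- if init_debug() never runs on this code path,
// DEBUG_LOG() here is a silent no-op even when DEBUG is set
```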
Implemented `ggml_sycl_op_soft_max()` F16 src1 (mask) support, for which a pragma deprecation warning was added in #5021. To do this, the op had to be decoupled from `ggml_sycl_op_flatten`, which always assumed src1 to be of fp32 type (many op functions depend on it). Also replaced `std::max` with `sycl::max` in the softmax kernel.

There was not a single F16-mask test in `test-backend-ops`, so I manually added such a test locally, and I can confirm that it passed on my machine. This PR did not add that test; reviewers are requested to test it thoroughly on their machines. I am not sure why this support was necessary in the first place: the models I tested do not use an F16 mask.
Also did a few cleanups.