[ARM] Support FP16 post-ops fusion into ACL kernels #2067

dmitry-gorokhov · 2024-08-30T11:22:05Z

Summary

The current ACL integration prohibits fusing the first post-op into the ACL kernel when the dst data_type is FP16. The request is to conditionally enable such fusion.

Problem statement

The oneDNN post-ops fusion mechanism provides a significant performance boost by avoiding intermediate memory movement overheads. However, within the ACL integration this fusion is disabled for FP16 execution because of the oneDNN requirement on the precision of post-op computations (they should be computed in FP32). As a consequence, attaching even a single post-op to an FP16 primitive leads to multiple FP16<->FP32 data type conversions and expensive memory access overheads. As a result, executing the corresponding operations separately (via separate oneDNN primitive calls) provides better performance than the fused version.

Preferred solution

Inside OpenVINO we simply relaxed the condition to allow FP16 post-op fusion (with FP16 internal compute) inside the ACL integration. However, that solution might not be suitable for all oneDNN users due to accuracy restrictions.
Based on that, the proposal is to adopt the dnnl::accumulation_mode attribute as a trigger for selecting the post-ops computational precision. As a result, the desired balance between accuracy and performance can be chosen at the oneDNN user level.
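
To make the proposal concrete, here is a minimal sketch of the kind of decision the ACL integration could take. It is not a patch and does not call oneDNN: the enums are local stand-ins that mirror a few `dnnl::accumulation_mode` values, and the helpers `post_op_compute_type` and `can_fuse_first_post_op` are hypothetical names. Which modes should opt in to FP16 post-op compute is also an assumption here (relaxed, any and f16), since the issue does not specify it.

```cpp
// Local stand-ins, NOT the oneDNN types: a hypothetical mirror of a few
// dnnl::accumulation_mode values and of the two data types discussed here.
enum class accumulation_mode { strict, relaxed, any, f32, f16 };
enum class data_type { f16, f32 };

// Hypothetical policy: precision in which post-ops are computed.
// Assumption: only relaxed, any and f16 opt in to FP16 post-op compute;
// strict and f32 keep the FP32 behavior.
inline data_type post_op_compute_type(data_type dst, accumulation_mode mode) {
    if (dst != data_type::f16) return data_type::f32;
    switch (mode) {
        case accumulation_mode::relaxed:
        case accumulation_mode::any:
        case accumulation_mode::f16: return data_type::f16;
        default: return data_type::f32;
    }
}

// Hypothetical check: may the first post-op be fused into the ACL kernel?
// An FP32 dst is unaffected; an FP16 dst fuses only when the selected
// mode allows FP16 post-op compute, so no FP16<->FP32 round trip is needed.
inline bool can_fuse_first_post_op(data_type dst, accumulation_mode mode) {
    return dst != data_type::f16
            || post_op_compute_type(dst, mode) == data_type::f16;
}
```

On the user side, the mode would be selected through the existing `dnnl::primitive_attr::set_accumulation_mode` call. With the default strict mode the sketch keeps today's behavior (no fusion for an FP16 dst), so relaxed accuracy would have to be requested explicitly.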

theComputeKid · 2024-09-03T15:59:05Z

It makes sense to me. Do you have any patches demonstrating the scale of changes needed to adopt the attribute?

vpirogov · 2024-09-03T16:42:56Z

Related discussion in #1689.

dmitry-gorokhov added the enhancement label (a feature or an optimization request) on Aug 30, 2024

vpirogov added the help wanted and platform:cpu-aarch64 (Codeowner: @oneapi-src/onednn-cpu-aarch64) labels on Aug 30, 2024
