doc: matmul: updated supported data types and minor edits #2217

Merged 5 commits on Nov 20, 2024
doc/primitives/matmul.md (16 changes: 9 additions & 7 deletions)
@@ -67,7 +67,7 @@ argument index as specified by the following table.
user must pass fully specified memory objects so that the primitive is able
to perform the computations. Note that the less information about shapes
or format is available at the creation stage, the less performant execution
-will be. In particular, if the shape is not known at creation stage, one
+will be. In particular, if the shape is not known at the creation stage, you
cannot use the special format tag #dnnl::memory::format_tag::any to enable an
implementation to choose the most appropriate memory format for the
corresponding input or output shapes. On the other hand, run-time specified
@@ -80,13 +80,13 @@ argument index as specified by the following table.
invalid.

3. The broadcasting shape consistency check is not done for the dimensions with
-#DNNL_RUNTIME_DIM_VAL. It is user responsibility to make sure the dimensions
+#DNNL_RUNTIME_DIM_VAL. Make sure the dimensions
for the tensors are valid.

4. Multiple batch dimensions and broadcasting of batch dimensions of `src` and
`weights` are supported for both CPU and GPU engines.

-Please check tutorials below to see #DNNL_RUNTIME_DIM_VAL support in use.
+Check the tutorials below to see #DNNL_RUNTIME_DIM_VAL support in use.

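Editor's note: the hunk above describes how run-time dimensions interact with format tags. As a hedged illustration (not part of this PR), here is a minimal C++ sketch of that flow against the oneDNN v3.x API; the sizes and variable names are invented for the example.

```cpp
#include <unordered_map>

#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    const memory::dim K = 128, N = 64;

    // M is not known at creation time, so it is declared as
    // DNNL_RUNTIME_DIM_VAL. Because the shape is incomplete, concrete
    // format tags (ab) must be used instead of format_tag::any.
    memory::desc src_md({DNNL_RUNTIME_DIM_VAL, K}, memory::data_type::f32,
            memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::f32,
            memory::format_tag::ab);
    memory::desc dst_md({DNNL_RUNTIME_DIM_VAL, N}, memory::data_type::f32,
            memory::format_tag::ab);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md);
    matmul prim(pd);

    // At execution time M is known, so fully specified memory objects are
    // created and bound to the primitive arguments.
    const memory::dim M = 1000;
    memory src({{M, K}, memory::data_type::f32, memory::format_tag::ab}, eng);
    memory wei({{K, N}, memory::data_type::f32, memory::format_tag::ab}, eng);
    memory dst({{M, N}, memory::data_type::f32, memory::format_tag::ab}, eng);

    prim.execute(strm, {{DNNL_ARG_SRC, src}, {DNNL_ARG_WEIGHTS, wei},
            {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```

As the quoted text explains, the trade-off of this flexibility is that the implementation cannot choose an optimal blocked layout for the placeholder dimensions.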
### Data Types

@@ -96,12 +96,13 @@ types for source, destination, weights, and bias tensors:

| Source | Weights | Destination | Bias |
|:-----------------|:---------------------|:---------------------------------|:----------------------------|
-| f32              | f32                  | f32                              | f32                          |
-| f16              | f16, u8, s8, u4, s4  | f16, u8, s8                      | f16, f32                     |
-| bf16             | bf16, u8, s8, u4, s4 | f32, bf16                        | bf16, f32                    |
+| f64              | f64                  | f64                              | f64, f32, f16, bf16, s8, u8  |
+| f32              | f32                  | f32                              | f32, bf16, f16, u8, s8       |
+| f16              | f16, u8, s8, u4, s4  | f16, u8, s8                      | f32                          |
+| f16              | f16, u8, s8          | f32                              | f32, f16                     |
+| bf16             | bf16, u8, s8, u4, s4 | f32, bf16                        | f32, bf16                    |
| f32, bf16, f16   | u8, s8               | f32, bf16, f16                   | f32, bf16, f16               |
-| f8_e5m2, f8_e4m3 | f8_e5m2, f8_e4m3     | f32, f16, bf16, f8_e5m2, f8_e4m3 | f32, bf16, f16               |
| f8_e5m2, f8_e4m3 | f8_e5m2, f8_e4m3     | f32, f16, bf16, f8_e5m2, f8_e4m3 | f32, bf16, f16               |
| u8, s8 | s8 | u8, s8, s32, f32, f16, bf16 | u8, s8, s32, f32, f16, bf16 |


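Editor's note: to ground the table in API terms, here is an illustrative sketch (not from the PR) of how one listed combination, bf16 source/weights with f32 bias and destination, would be expressed, assuming the oneDNN v3.x C++ API:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    const memory::dim M = 32, K = 128, N = 64;

    // bf16 src/weights with f32 bias and f32 dst: one row of the table.
    // Shapes are fully specified, so format_tag::any lets the library
    // choose the most appropriate layouts for src, weights, and dst.
    memory::desc src_md({M, K}, memory::data_type::bf16,
            memory::format_tag::any);
    memory::desc wei_md({K, N}, memory::data_type::bf16,
            memory::format_tag::any);
    memory::desc bia_md({1, N}, memory::data_type::f32,
            memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f32,
            memory::format_tag::any);

    // Creation throws if the engine does not support this combination.
    matmul::primitive_desc pd(eng, src_md, wei_md, bia_md, dst_md);
    return 0;
}
```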
@@ -189,6 +190,7 @@ memory buffer that shares its shape with the destination buffer).
* Three and higher dimensional matrices.
- The layout of dropout mask has to be exactly the same as that of dst.


3. **CPU**
- Configuration with int8 source data type, s8 weight data type and f16
destination data type isn't supported.
doc/programming_model/data_types.md (4 changes: 2 additions & 2 deletions)
@@ -43,8 +43,8 @@ oneDNN supports training and inference with the following data types:
model implementation.

@note
-f64 is only supported for convolution, reorder, layer normalization and
-pooling primitives, on the GPU engine.
+f64 is supported only for matmul, convolution, reorder, layer normalization, and
+pooling primitives on the GPU engine.
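Editor's note: a corresponding sketch (illustrative, not part of the diff) of the f64 matmul configuration this note now documents; it assumes a system with a oneDNN GPU runtime, and that creation fails where f64 is unsupported.

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    // Per the note above, f64 matmul is a GPU-only configuration.
    engine eng(engine::kind::gpu, 0);
    const memory::dim M = 32, K = 128, N = 64;

    memory::desc src_md({M, K}, memory::data_type::f64,
            memory::format_tag::any);
    memory::desc wei_md({K, N}, memory::data_type::f64,
            memory::format_tag::any);
    memory::desc dst_md({M, N}, memory::data_type::f64,
            memory::format_tag::any);

    // Throws on engines without f64 support for this primitive.
    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md);
    return 0;
}
```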

@note
Boolean is only supported by the oneDNN graph API when the graph compiler