Cherry pick documentation for ROCm 6.3 #2053
Co-authored-by: Braden Stefanuk <[email protected]>
* Tensile doc updates
* API reference updates
* Update api-reference.rst
* Update index.rst
* Update library-creation.rst
* Review comments
* added contribution
* Update README.md
@@ -1,54 +1,78 @@ | |||
******************************************************************** | |||
Contributing Guide | |||
Contribution guidelines |
Tensile contribution guidelines
Placing "Tensile" in front of every title is redundant and potentially misleading since the term is overloaded in the project.
If SEO is the goal, there are alternatives I would prefer prioritizing such as placing keywords naturally into the body of the text, getting backlinks from other sites, using strong internal linking, etc. Either way, I don't think the usage of "Tensile" is overlooked for this site, it's already mentioned >200 times in the docs/src directory.
.. _programmers-guide: | ||
|
||
******************************************************************** | ||
Programmer's guide |
Programmer's guide | |
Tensile programmer's guide |
.. _installation: | ||
|
||
******************************************************************** | ||
Installation |
Installation | |
Tensile installation |
.. _environment-variables: | ||
|
||
******************************************************************** | ||
Environment variables |
Environment variables | |
Tensile environment variables |
@bstefanuk Can we add the content from Introduction into index.rst? The introduction file shouldn't be required as a separate entity. |
@SwRaw I like having a document to introduce the concepts in Tensile, so I would prefer to keep it. We can add more content to it to make it a more meaningful standalone document, if it makes sense. |
|
||
.. _index: | ||
|
||
******************************************************************** | ||
Tensile documentation | ||
******************************************************************** | ||
|
||
Tensile is a tool for creating a benchmark-driven backend library for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPU. | ||
Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU. |
Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU. | |
Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU. | |
Tensile is written in Python for library and kernel generation and in C++ for client headers and library tests. It is a vital | |
project in the ROCm ecosystem, providing optimized kernels for downstream libraries such as :doc:`rocBLAS <rocblas:index>`. | |
The parts of Tensile that are written in Python consist of applications that are collectively responsible | |
for generating optimized kernels and library objects to access these kernels from client code. |
.. _solution-selection-catalogs: | ||
|
||
*************************** | ||
Solution selection catalogs |
Solution selection catalogs | |
Tensile's solution selection catalogs |
Tensile provides a mechanism by which only a subset of the code object files produced during a build are loaded at runtime. | ||
This is necessary to avoid the overhead associated with loading code object files including initialization time and the | ||
memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load. |
Tensile provides a mechanism by which only a subset of the code object files produced during a build are loaded at runtime. | |
This is necessary to avoid the overhead associated with loading code object files including initialization time and the | |
memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load. | |
To avoid the overhead of loading code object files, including initialization time and the | |
memory footprint of the loaded files, Tensile provides a mechanism to load, at runtime, only a subset of the code object files produced during a build. | |
To achieve this, Tensile must determine which code object file to load. |
Solution selection is the process by which the **TensileHost** library determines what kernel is preferred and, in turn, | ||
what code object file contains the selected kernel. This process uses a hierarchical structure | ||
to efficiently search for kernels based on hardware, problem size, and transpose, among others. |
Solution selection is the process by which the **TensileHost** library determines what kernel is preferred and, in turn, | |
what code object file contains the selected kernel. This process uses a hierarchical structure | |
to efficiently search for kernels based on hardware, problem size, and transpose, among others. | |
To determine the preferred kernel and the code object file that contains it, the ``TensileHost`` library uses a process named *solution selection*. This process uses a hierarchical structure to efficiently search for kernels based on criteria such as hardware, problem size, and transpose. |
This is the role of the **solution selection catalog** [1]_---a serialized file that uses a hierarchical | ||
schema to organize kernel metadata for efficient lookup at runtime. |
This is the role of the **solution selection catalog** [1]_---a serialized file that uses a hierarchical | |
schema to organize kernel metadata for efficient lookup at runtime. | |
For efficient lookup at runtime, kernel metadata is organized using a hierarchical schema in a serialized file called the *solution selection catalog* [1]_. |
type: Problem # [_C] | ||
property: {type: OperationIdentifier} | ||
type: ProblemMap # [_B] | ||
predicate: {type: TruePred} | ||
type: Hardware # [_A] |
@bstefanuk Why are [_C], [_B], [_A] needed? Remove them if not referenced.
Visual helper to show opening and closing of sections. There could be a better way to do this. They are referenced below.
rows: # [A_] | ||
- library: | ||
map: | ||
Contraction_l_Alik_Bjlk_Cijk_Dijk: # [B_] | ||
... | ||
rows: # [C_] |
@bstefanuk Why underscore after A, B, C?
Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture. | ||
Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*. | ||
Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs. |
Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture. | |
Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*. | |
Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs. | |
Note that the lines in the parent catalog are marked as A, B, and C for reference. | |
- Line [A]: Shows the top level of the parent catalog, which contains a single row for each hardware architecture. | |
- Line [B]: Shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*. | |
- Line [C]: Shows the problem type and predicates used to match against exact solutions present in the child catalogs. |
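The three levels described above can be read as a nested lookup. The following is a minimal sketch only, not the TensileHost implementation: the dictionary layout, the callable predicates, the `child` key, the child catalog names, and the `gfx90a` hardware string are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical, simplified stand-in for a parent catalog. The real catalog is a
# serialized file; plain dicts and callables are used here only for illustration.
PARENT_CATALOG = {
    "type": "Hardware",  # level [A]: one row per hardware architecture
    "rows": [
        {
            "predicate": lambda hw: hw == "gfx90a",  # hypothetical hardware check
            "library": {
                "type": "ProblemMap",  # level [B]: keyed by operation identifier
                "map": {
                    "Contraction_l_Alik_Bjlk_Cijk_Dijk": {
                        "type": "Problem",  # level [C]: predicates on the problem
                        "rows": [
                            {
                                "predicate": lambda p: p["transA"] and not p["transB"],
                                "child": "child_catalog_TN",  # hypothetical name
                            },
                            {
                                "predicate": lambda p: True,  # catch-all row
                                "child": "child_catalog_fallback",  # hypothetical name
                            },
                        ],
                    }
                },
            },
        }
    ],
}


def find_child_catalog(catalog, hardware, operation, problem):
    """Walk levels [A] -> [B] -> [C] and return the first matching child name."""
    for hw_row in catalog["rows"]:  # level [A]
        if not hw_row["predicate"](hardware):
            continue
        problem_lib = hw_row["library"]["map"].get(operation)  # level [B]
        if problem_lib is None:
            continue
        for row in problem_lib["rows"]:  # level [C]
            if row["predicate"](problem):
                return row["child"]
    return None  # nothing matched at some level


OPERATION = "Contraction_l_Alik_Bjlk_Cijk_Dijk"
chosen = find_child_catalog(
    PARENT_CATALOG, "gfx90a", OPERATION, {"transA": True, "transB": False}
)
```

Returning the first matching row keeps the sketch short; how the real library ranks or orders candidate rows is not covered by the excerpt above.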
Mode 2: Merge files | ||
------------------- | ||
|
||
.. warning:: | ||
This feature is not recommended and is in the process of being deprecated. | ||
|
@bstefanuk Why are we documenting a mode that is not recommended and is on the verge of deprecation?
Because it's still demonstrative of the purpose and history of the catalogs.
|
||
-------------------- | ||
|
||
.. [1] Previously these files were called *master solution libraries* because they contain two top level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API. |
.. [1] Previously these files were called *master solution libraries* because they contain two top level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API. | |
.. [1] Previously these files were named *master solution libraries* because they consisted of two top-level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API. |
************ | ||
Nomenclature | ||
************ | ||
|
This topic lists and describes the frequently used terms in the Tensile documentation. |
General Matrix Multiplication | ||
============================= | ||
|
||
General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the equation, |
General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the equation, | |
General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the following equation: |
.. math:: | ||
C = \alpha A B + \beta C | ||
|
||
where :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices. |
where :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices. | |
In the given formula, :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices. |
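As a reference point for the equation, here is a minimal pure-Python sketch of GEMM on nested lists. It is for illustration only and is unrelated to how Tensile generates kernels; the function name `gemm` and the `trans_a` / `trans_b` flags are assumptions made for this example.

```python
def gemm(alpha, a, b, beta, c, trans_a=False, trans_b=False):
    """Return alpha * op(A) * op(B) + beta * C for matrices stored as nested lists.

    op(X) is X or its transpose, depending on the trans_a / trans_b flags.
    """
    def transpose(matrix):
        return [list(row) for row in zip(*matrix)]

    a = transpose(a) if trans_a else a
    b = transpose(b) if trans_b else b

    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")

    return [
        [
            alpha * sum(a[i][l] * b[l][j] for l in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]

result = gemm(2, a, b, 3, c)                         # [[41, 47], [89, 103]]
result_trans = gemm(1, a, b, 0, c, trans_a=True)     # [[26, 30], [38, 44]]
```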
- :math:`C_{i,j,k} = \sum_l A_{i,l,k} B_{l,j,k}` | ||
* - 2D Summation | ||
- :math:`C_{i,j} = \sum_{k,l} A_{i,k,l} B_{j,l,k}` | ||
* - 3 Batched Indices |
* - 3 Batched Indices | |
* - 3 Batched indices |
- :math:`C_{i,j} = \sum_{k,l} A_{i,k,l} B_{j,l,k}` | ||
* - 3 Batched Indices | ||
- :math:`C_{i,j,k,l,m} = \sum_n A_{i,k,m,l,n} B_{j,k,l,n,m}` | ||
* - 4 Free Indices |
* - 4 Free Indices | |
* - 4 Free indices |
When an index shows up in multiple tensors, those tensors must be the same size along that dimension; however, they can have different strides. | |
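The size requirement can be illustrated with the batched contraction :math:`C_{i,j,k} = \sum_l A_{i,l,k} B_{l,j,k}`. The sketch below is a naive reference on nested lists, not Tensile code; the function name is hypothetical, and strides are not modeled because nested lists have none.

```python
def batched_contraction(a, b):
    """Compute C[i][j][k] = sum over l of A[i][l][k] * B[l][j][k].

    a is indexed as a[i][l][k] and b as b[l][j][k].
    """
    size_i, size_l, size_k = len(a), len(a[0]), len(a[0][0])
    size_j = len(b[0])

    # The indices l and k appear in both A and B, so their extents must agree.
    if len(b) != size_l or len(b[0][0]) != size_k:
        raise ValueError("shared indices l and k must have equal sizes in A and B")

    return [
        [
            [
                sum(a[i][l][k] * b[l][j][k] for l in range(size_l))
                for k in range(size_k)
            ]
            for j in range(size_j)
        ]
        for i in range(size_i)
    ]


a = [[[1], [2]]]        # sizes: i=1, l=2, k=1
b = [[[3]], [[4]]]      # sizes: l=2, j=1, k=1
c = batched_contraction(a, b)   # [[[11]]], since 1*3 + 2*4 = 11
```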
|
||
There are three categories of indices or dimensions used in the problems supported by Tensile: free, batch and bound. | ||
**Tensile only supports problems with at least one pair of free indices.** |
**Tensile only supports problems with at least one pair of free indices.** | |
.. note:: | |
Tensile only supports problems with at least one pair of free indices. |
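The three index categories can be derived from where each index appears. The sketch below assumes the usual tensor-contraction convention, which is not spelled out in the excerpt above: a free index appears in C and in exactly one of A or B, a batch index appears in A, B, and C, and a bound (summation) index appears in A and B but not in C. The function names are hypothetical, and `has_free_pair` encodes one reading of "a pair of free indices", namely one free index from A and one from B.

```python
def classify_indices(a_idx, b_idx, c_idx):
    """Split index names into free, batch, and bound categories.

    Each argument is an iterable of index names, for example "ilk".
    """
    a, b, c = set(a_idx), set(b_idx), set(c_idx)
    return {
        "free": sorted(c & (a ^ b)),    # in C and in exactly one of A or B
        "batch": sorted(a & b & c),     # in A, B, and C
        "bound": sorted((a & b) - c),   # summed over: in A and B, not in C
    }


def has_free_pair(a_idx, b_idx, c_idx):
    """Assumed reading: at least one free index from A and one from B."""
    a, b, c = set(a_idx), set(b_idx), set(c_idx)
    return bool((a & c) - b) and bool((b & c) - a)


# C_{i,j,k} = sum_l A_{i,l,k} B_{l,j,k}
batched = classify_indices("ilk", "ljk", "ijk")
# C_{i,j} = sum_{k,l} A_{i,k,l} B_{j,l,k}
summed_2d = classify_indices("ikl", "jlk", "ij")
```

Under this reading, a plain dot product (no index in C) has no free pair and would fall outside the supported set.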
@bstefanuk I have added the content from the introduction into index.rst and prefer to remove the introduction file. Once we have enough content, we can plan something for later. Right now, it's better to include the introduction on the index page. |
Co-authored-by: Swati Rawat <[email protected]>
Comments on this branch are captured in #2056 and are largely matters of taste. Moving forward with the merge for the 6.3 docs. |
This PR cherry picks the following commits into the docs/6.3 branch:
09ec347
7ac8e75
9b35483