Cherry pick documentation for ROCm 6.3 #2053

Merged: 5 commits, Nov 26, 2024

bstefanuk
Contributor

This PR cherry picks the following commits into the docs/6.3 branch:

SwRaw and others added 3 commits November 18, 2024 13:38
Co-authored-by: Braden Stefanuk <[email protected]>
* Tensile doc updates
* API reference updates
* Update api-reference.rst
* Update index.rst
* Update library-creation.rst
* Review comments
* added contribution
* Update README.md
@@ -1,54 +1,78 @@
  ********************************************************************
- Contributing Guide
+ Contribution guidelines
Contributor

Tensile contribution guidelines

Contributor Author

Placing "Tensile" in front of every title is redundant and potentially misleading since the term is overloaded in the project.

If SEO is the goal, there are alternatives I would prefer prioritizing such as placing keywords naturally into the body of the text, getting backlinks from other sites, using strong internal linking, etc. Either way, I don't think the usage of "Tensile" is overlooked for this site, it's already mentioned >200 times in the docs/src directory.

.. _programmers-guide:

********************************************************************
Programmer's guide
Contributor

Suggested change
- Programmer's guide
+ Tensile programmer's guide

.. _installation:

********************************************************************
Installation
Contributor

Suggested change
- Installation
+ Tensile installation

.. _environment-variables:

********************************************************************
Environment variables
Contributor

Suggested change
- Environment variables
+ Tensile environment variables

Contributor

SwRaw commented Nov 22, 2024

@bstefanuk Can we add the content from Introduction into index.rst? The introduction file shouldn't be required as a separate entity.

@bstefanuk
Contributor Author

@SwRaw I like having a document to introduce the concepts in Tensile, so I would prefer to keep it. We can add more content to it to make it a more meaningful standalone document, if it makes sense.


.. _index:

********************************************************************
Tensile documentation
********************************************************************

- Tensile is a tool for creating a benchmark-driven backend library for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPU.
+ Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU.
Contributor

Suggested change
- Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU.
+ Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU.
Tensile is written in Python for library and kernel generation and in C++ for client headers and library tests. It is a vital
project in the ROCm ecosystem, providing optimized kernels for downstream libraries such as :doc:`rocBLAS <rocblas:index>`.
The parts of Tensile that are written in Python consist of applications that are collectively responsible
for generating optimized kernels and library objects to access these kernels from client code.

.. _solution-selection-catalogs:

***************************
Solution selection catalogs
Contributor

Suggested change
- Solution selection catalogs
+ Tensile's solution selection catalogs

Comment on lines +11 to +13
Tensile provides a mechanism by which only a subset of the code object files produced during a build are loaded at runtime.
This is necessary to avoid the overhead associated with loading code object files including initialization time and the
memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load.
Contributor

Suggested change
- Tensile provides a mechanism by which only a subset of the code object files produced during a build are loaded at runtime.
- This is necessary to avoid the overhead associated with loading code object files including initialization time and the
- memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load.
+ To avoid the overhead associated with loading code object files including initialization time and the
+ memory footprint of the loaded code object files, Tensile provides a mechanism to load only a subset of the code object files produced during a build, at runtime.
+ To achieve this, it must be determined which code object file to load.
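
The idea of loading only the code object files that are actually needed can be illustrated with a small Python sketch. This is not Tensile's implementation: `LazyCodeObjectLoader`, the `load_fn` callback, `fake_load`, and the file name are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List


class LazyCodeObjectLoader:
    """Load code object files on first use and cache them (hypothetical sketch)."""

    def __init__(self, load_fn: Callable[[str], bytes]) -> None:
        self._load_fn = load_fn              # stand-in for the real, expensive load
        self._loaded: Dict[str, bytes] = {}  # files loaded so far, keyed by path

    def get(self, path: str) -> bytes:
        # Pay the initialization and memory cost only when a file is first requested.
        if path not in self._loaded:
            self._loaded[path] = self._load_fn(path)
        return self._loaded[path]

    @property
    def loaded_files(self) -> List[str]:
        return sorted(self._loaded)


def fake_load(path: str) -> bytes:
    # Placeholder for reading and initializing a code object file.
    return f"<contents of {path}>".encode()


loader = LazyCodeObjectLoader(fake_load)
loader.get("kernels_gfx90a.co")  # hypothetical file name; loaded on first request
loader.get("kernels_gfx90a.co")  # served from the cache, not loaded again
```

Files that are never requested are never loaded, which is the saving the paragraph describes. Deciding which file to request is a separate lookup problem.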

Comment on lines +14 to +16
Solution selection is the process by which the **TensileHost** library determines what kernel is preferred and, in turn,
what code object file contains the selected kernel. This process uses a hierarchical structure
to efficiently search for kernels based on hardware, problem size, and transpose, among others.
Contributor

Suggested change
- Solution selection is the process by which the **TensileHost** library determines what kernel is preferred and, in turn,
- what code object file contains the selected kernel. This process uses a hierarchical structure
- to efficiently search for kernels based on hardware, problem size, and transpose, among others.
+ To determine the preferred kernel and the code object file containing the selected kernel, the ``TensileHost`` library utilizes a process named `Solution selection`. This process uses a hierarchical structure to efficiently search for kernels based on hardware, problem size, and transpose, among others.

Comment on lines +17 to +18
This is the role of the **solution selection catalog** [1]_---a serialized file that uses a hierarchical
schema to organize kernel metadata for efficient lookup at runtime.
Contributor

Suggested change
- This is the role of the **solution selection catalog** [1]_---a serialized file that uses a hierarchical
- schema to organize kernel metadata for efficient lookup at runtime.
+ For efficient lookup at runtime, the kernel metadata must be organized in a hierarchical schema in a serialized file named `solution selection catalog` [1]_.

Comment on lines +141 to +145
type: Problem # [_C]
property: {type: OperationIdentifier}
type: ProblemMap # [_B]
predicate: {type: TruePred}
type: Hardware # [_A]
Contributor

@bstefanuk Why are [_C], [_B], [_A] needed? Remove them if not referenced.

Contributor Author

Visual helper to show opening and closing of sections. There could be a better way to do this. They are referenced below.

Comment on lines +120 to +125
rows: # [A_]
- library:
map:
Contraction_l_Alik_Bjlk_Cijk_Dijk: # [B_]
...
rows: # [C_]
Contributor

@bstefanuk Why underscore after A, B, C?

Comment on lines +148 to +150
Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture.
Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*.
Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs.
Contributor

Suggested change
- Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture.
- Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*.
- Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs.
+ Note that the lines in the parent catalog are marked as A, B, and C for reference.
+ - Line [A]: Shows the top level of the parent catalog, which contains a single row for each hardware architecture.
+ - Line [B]: Shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*.
+ - Line [C]: Shows the problem type and predicates used to match against exact solutions present in the child catalogs.
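
The three levels described for lines [A], [B], and [C] can be sketched as a nested lookup in Python. This is only an illustration under stated assumptions: the dictionary layout, the `select_code_object` function, the architecture key `gfx90a`, the size-based predicates, and the `.co` file names are hypothetical and do not reproduce the serialized catalog schema.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each problem row pairs a predicate with the code object file that holds
# the matching kernels. Predicates and file names are illustrative only.
ProblemRow = Tuple[Callable[[dict], bool], str]

catalog: Dict[str, Dict[str, List[ProblemRow]]] = {
    # [A] one entry per hardware architecture
    "gfx90a": {
        # [B] problem map keyed by operation identifier
        "Contraction_l_Alik_Bjlk_Cijk_Dijk": [
            # [C] rows matched in order against the problem
            (lambda p: p["m"] <= 64 and p["n"] <= 64, "small_tiles.co"),
            (lambda p: True, "fallback.co"),
        ],
    },
}


def select_code_object(arch: str, operation: str, problem: dict) -> Optional[str]:
    """Walk hardware -> operation -> predicate rows and return the first match."""
    problem_map = catalog.get(arch)
    if problem_map is None:
        return None  # no rows for this architecture
    for predicate, code_object in problem_map.get(operation, []):
        if predicate(problem):
            return code_object
    return None  # unknown operation or no predicate matched


choice = select_code_object(
    "gfx90a", "Contraction_l_Alik_Bjlk_Cijk_Dijk", {"m": 32, "n": 32}
)
```

Each level narrows the search before any predicate is evaluated, so predicates are only checked for rows that already match the hardware and the operation.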

Comment on lines +152 to +157
Mode 2: Merge files
-------------------

.. warning::
   This feature is not recommended and is in the process of being deprecated.

Contributor

@bstefanuk Why are we documenting a mode that is not recommended and is on the verge of deprecation?

Contributor Author

Because it's still demonstrative of the purpose and history of the catalogs.


--------------------

.. [1] Previously these files were called *master solution libraries* because they contain two top level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API.
Contributor

Suggested change
- .. [1] Previously these files were called *master solution libraries* because they contain two top level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API.
+ .. [1] Previously these files were named *master solution libraries* because they consisted of two top-level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API.

************
Nomenclature
************

Contributor

Suggested change
+ This topic lists and describes the frequently used terms in the Tensile documentation.

General Matrix Multiplication
=============================

General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the equation,
Contributor

Suggested change
- General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the equation,
+ General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the following equation:

.. math::

   C = \alpha A B + \beta C

where :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices.
Contributor

Suggested change
- where :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices.
+ In the given formula, :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices.
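
The GEMM equation above can be checked with a short pure-Python sketch. It is a reference illustration only, written for clarity rather than speed; the `gemm` and `transpose` helpers and the `trans_a`/`trans_b` flags are names chosen here, not part of Tensile or any BLAS API.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns of a row-major nested list."""
    return [list(row) for row in zip(*m)]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix,
         trans_a: bool = False, trans_b: bool = False) -> Matrix:
    """Return alpha * op(A) * op(B) + beta * C, where op optionally transposes."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of op(A) and op(B) must match")
    return [
        [
            alpha * sum(a[i][l] * b[l][j] for l in range(inner)) + beta * c[i][j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)  # [[38.5, 44.5], [86.5, 100.5]]
```

Here `A B` is `[[19, 22], [43, 50]]`, which is scaled by `alpha = 2` and then offset by `beta * C = 0.5` in every entry.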

     - :math:`C_{i,j,k} = \sum_l A_{i,l,k} B_{l,j,k}`
   * - 2D Summation
     - :math:`C_{i,j} = \sum_{k,l} A_{i,k,l} B_{j,l,k}`
   * - 3 Batched Indices
Contributor

Suggested change
- * - 3 Batched Indices
+ * - 3 Batched indices

     - :math:`C_{i,j} = \sum_{k,l} A_{i,k,l} B_{j,l,k}`
   * - 3 Batched Indices
     - :math:`C_{i,j,k,l,m} = \sum_n A_{i,k,m,l,n} B_{j,k,l,n,m}`
   * - 4 Free Indices
Contributor

Suggested change
- * - 4 Free Indices
+ * - 4 Free indices
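
To make the index notation in these table rows concrete, the batched-GEMM row quoted earlier, `C[i][j][k] = sum over l of A[i][l][k] * B[l][j][k]`, can be written as explicit loops. This is a minimal sketch on nested Python lists; the function name `batched_gemm` and the storage layout are assumptions made for the example.

```python
from typing import List

Tensor3 = List[List[List[float]]]


def batched_gemm(a: Tensor3, b: Tensor3) -> Tensor3:
    """Compute C[i][j][k] = sum over l of A[i][l][k] * B[l][j][k]."""
    size_i, size_l, size_k = len(a), len(a[0]), len(a[0][0])
    size_j = len(b[0])
    return [
        [
            [
                # l is summed away; k is carried through as the batch index.
                sum(a[i][l][k] * b[l][j][k] for l in range(size_l))
                for k in range(size_k)
            ]
            for j in range(size_j)
        ]
        for i in range(size_i)
    ]


# A has shape (i=1, l=2, k=2); B has shape (l=2, j=1, k=2).
a = [[[1, 2], [3, 4]]]
b = [[[5, 6]], [[7, 8]]]
c = batched_gemm(a, b)  # [[[26, 44]]]
```

For `k = 0` the sum is `1*5 + 3*7 = 26`, and for `k = 1` it is `2*6 + 4*8 = 44`. The other rows of the table follow the same pattern with more summed or batched indices.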

docs/src/reference/nomenclature.rst Outdated
When an index shows up in multiple tensors, those tensors must be the same size along that dimension; however, they can have different strides.

There are three categories of indices or dimensions used in the problems supported by Tensile: free, batch and bound.
**Tensile only supports problems with at least one pair of free indices.**
Contributor

Suggested change
- **Tensile only supports problems with at least one pair of free indices.**
+ .. note::
+    Tensile only supports problems with at least one pair of free indices.
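
The three index categories can be illustrated by classifying the index letters of a contraction. The excerpt above does not define the categories, so the sketch assumes the usual tensor-contraction convention: batch indices appear in A, B, and C; bound indices appear in A and B but not in C and are summed over; free indices appear in C and in exactly one of A or B. The function name `classify_indices` is hypothetical.

```python
from typing import Dict, List


def classify_indices(a: str, b: str, c: str) -> Dict[str, List[str]]:
    """Classify index letters by which tensors they appear in.

    Assumed convention (not stated in the quoted text):
      free:  in C and in exactly one of A or B
      batch: in A, B, and C
      bound: in A and B but not in C (summation index)
    """
    sa, sb, sc = set(a), set(b), set(c)
    return {
        "free": sorted((sa ^ sb) & sc),
        "batch": sorted(sa & sb & sc),
        "bound": sorted((sa & sb) - sc),
    }


# Index letters taken from the operation name Contraction_l_Alik_Bjlk_Cijk_Dijk.
indices = classify_indices("ilk", "jlk", "ijk")
# {'free': ['i', 'j'], 'batch': ['k'], 'bound': ['l']}
```

Under this convention the example has one pair of free indices (`i` from A and `j` from B), so it satisfies the requirement in the note.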

Contributor

SwRaw commented Nov 25, 2024

> @SwRaw I like having a document to introduce the concepts in Tensile, so I would prefer to keep it. We can add more content to it to make it a more meaningful standalone document, if it makes sense.

@bstefanuk I have added the content from the introduction into index.rst and would prefer to remove the introduction file. Once we have enough content, we can plan something for later. Right now, it's better to include the introduction on the index page.

@bstefanuk
Contributor Author

Comments on this branch are captured in #2056 and include largely changes of taste. Moving forward with merge for 6.3 docs.

bstefanuk merged commit 7d0d854 into ROCm:docs/6.3.0 on Nov 26, 2024
1 check passed
bstefanuk deleted the docs/6.3.0 branch on November 26, 2024 17:33
Labels
ci:docs-only Docs only changes
2 participants