diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index d49de2f6f..219207fdd 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -1,54 +1,78 @@ ******************************************************************** -Contributing Guide +Contribution guidelines ******************************************************************** +This document provides the guidelines for contributing to the Tensile source code. +.. seealso:: -Welcome to the Tensile project! If you're thinking about contributing, this document is for you. We encourage you to read this guide to understand how to contribute to the project to ensure that your contributions are accepted and merged in a timely manner. - + For information about environment setup and development processes, see :ref:`programmers-guide`. -.. seealso:: +Tensile's development practice is based on the `Gitflow workflow `_. The **develop** branch is the default branch for development, where all new features and bug fixes are merged. After a PR is merged into **develop**, it undergoes extended testing and profiling. If these checks pass, the PR might be merged into **staging** to be included in the next release. A PR is available in the upcoming release only if it is merged before the release branch is cut. - If you haven't already, please review :ref:`getting-started` for an introduction to the project. For details on environment setup and day-to-day development processes, please refer to the :ref:`developer-guide`. +================ +Raising issues +================ -Tensile's development practice is based on the `Gitflow Workflow `_. The **develop** branch is the default branch for development and is where all new features and bug fixes should be merged. After a PR is merged into **develop**, it will undergo extended testing and profiling. Pending all of these checks pass, it may be promoted to **staging** be included in the next release. 
If you would like to see the changes in the next release, please ensure that the PR is merged before the release branch is cut. +To report an issue, use the GitHub *Issues* tab. -============================ -How to submit a Pull Request -============================ +- Use your best judgment when creating issues. If your issue is already listed, upvote it and comment to provide additional details, such as how you reproduced it. +- If you aren't sure whether a listed issue is the same as yours, err on the side of caution and file your own. Then link it to the existing issue by posting your issue link and details in that issue's comments. If your issue is found to be a duplicate, it is closed. +- If your issue isn't listed yet, use the issue template to file a new one. + - When filing an issue, provide as much information as possible, including the output of the script that collects information about your configuration. This helps us reproduce the issue effectively. + - Check your issue regularly, as we might need additional information to reproduce it. +- You can also open an issue to ask the maintainers whether a proposed change meets the acceptance criteria, or to discuss an idea related to the library. + +=================== +Acceptance criteria +=================== + +Pull requests (PRs) are reviewed by the members listed in `CODEOWNERS.md `_. +Depending on the PR, the reviewers might post comments or request changes. This might require several iterations. +The PR is approved only when all the changes requested by the reviewers are marked complete. +When a pull request is submitted, it undergoes a standard suite of continuous integration tests. + +Once the pull request is approved and the tests pass, a code owner merges it. +Attribution for your commit is preserved when it is merged.
-**When making changes:** +========================== +Submitting a Pull Request +========================== -1. Create a fork of Tensile---please do not create feature branches directly in https://github.com/ROCm/Tensile. -2. Clone your fork locally and set up your :ref:`development-environment`. -3. Create a feature branch off of **develop** and make changes to the code. -4. Issue ``tox run -m precommit`` and ensure that all checks pass. -5. Commit you changes using the convention for :ref:`commit-messages`. -6. If you are updating documentation, issue ``tox run -e docs`` and verify the styling and formatting is what you expect. -7. Push the changes to your fork. +By creating a PR, you agree to the statements made in the `Code License`_ section. Your PR must target the default *develop* branch, which also serves as our integration branch. + +a. **Forking the repository and making changes:** + + 1. Create a fork of Tensile. Don't create feature branches directly in https://github.com/ROCm/Tensile. + 2. Clone your fork locally and set up your :ref:`development-environment`. + 3. Create your feature branch from **develop** and make changes to the code. + 4. Issue ``tox run -m precommit`` and ensure that all checks pass. + 5. Commit your changes using the convention for :ref:`commit-messages`. + 6. If you are updating documentation, issue ``tox run -e docs`` and verify the styling and formatting. + 7. Push the changes to your fork. .. tip:: - Keeping the scope of new PRs as narrow as possible improves the chances it will be accepted. If you are making multiple changes, consider breaking them into separate PRs. Keeping PRs small supports timely code reviews, traceability, and straightforward reversions. + Keeping the scope of new PRs as narrow as possible improves the chances of it getting accepted. If you are making multiple changes, consider breaking them into separate PRs. Keeping PRs small supports timely code reviews, traceability, and straightforward reversions. 
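As an optional local aid for step 5, the commit-message conventions can be approximated by a small checker. The following Python sketch is hypothetical and is not part of Tensile or its tox environments: the name ``check_commit_subject`` is invented here, and the set of allowed types is an assumption that mirrors the common ``@commitlint/config-conventional`` list (the Conventional Commits specification itself only defines ``feat`` and ``fix``). It checks the Conventional Commits shape and the no-trailing-period rule; the imperative-tense rule is left to reviewers. The function also follows the doc-string guidelines: type hints in the signature and Google-style sections without types.

```python
import re
from typing import List

# Assumed set of commit types, mirroring @commitlint/config-conventional.
# The Conventional Commits spec itself only defines "feat" and "fix".
ALLOWED_TYPES = {
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
}

# Shape: type[(scope)][!]: description
SUBJECT_PATTERN = re.compile(r"^(?P<type>[A-Za-z]+)(\([\w./-]+\))?!?: (?P<desc>\S.*)$")


def check_commit_subject(subject: str) -> List[str]:
    """Checks a commit subject line against the commit-message conventions.

    Only the Conventional Commits shape and the trailing-period rule are
    checked; the imperative-tense rule is left to human reviewers.

    Args:
        subject: First line of the commit message.

    Returns:
        Human-readable problems; an empty list if none were found.
    """
    problems: List[str] = []
    match = SUBJECT_PATTERN.match(subject)
    if match is None:
        problems.append("expected 'type(scope): description'")
    elif match.group("type").lower() not in ALLOWED_TYPES:
        # The spec treats types case-insensitively, so compare in lowercase.
        problems.append(f"unknown commit type '{match.group('type')}'")
    if subject.endswith("."):
        problems.append("do not end the message with a period")
    return problems
```

For example, ``check_commit_subject("docs(cli): update jobs description.")`` reports only the trailing period, while ``"Updated kernel parameters"`` is flagged for not following the ``type: description`` shape.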
-**When opening a PR:** +b. **Creating the PR:** -1. Ensure that **your develop** branch is up-to-date with the **upstream develop** branch---this may require a rebase or a merge. -2. Verify that your changes pass static analysis checks and all pre-checkin, host library, and unit tests by running ``tox run -m prepr``---then go get a coffee, this could take up to an hour. -3. Create the PRs against the https://github.com/ROCm/Tensile **develop** branch. -4. Fill in as many details as possible. Include a description, outcomes, notable changes, and environment information. This more information, the more likely the PR will be reviewed and merged in a timely manner. -5. Title the PR in present imperative tense, e.g., "*Update* kernel parameters" not "Updates" nor "Updated". + 1. Ensure that **your develop** branch is up-to-date with the **upstream develop** branch. This might require a rebase or a merge. + 2. Verify that your changes pass static analysis checks and all pre-checkin, host library, and unit tests by running ``tox run -m prepr``. + 3. Create the PR against the https://github.com/ROCm/Tensile **develop** branch. + 4. Fill in as many details as possible. Include description, outcomes, notable changes, and environment information. The availability of information makes the PR review process easier, increasing the likelihood of the PR getting merged in a timely manner. + 5. Title the PR in present imperative tense. For example, "*Update* kernel parameters", not "Updates" or "Updated". .. tip:: - If you need to merge **develop** into your feature branch after a PR is opened, use a merge instead of a rebase. + To merge **develop** into your feature branch after a PR is opened, use a merge instead of a rebase. - In general, refrain from force pushing once a feature branch is in PR as it is prone to gotchas in our CI system. Ideally, the git history is linear and clean *before* a PR is created. 
As such we encourage contributors to conduct any rebases or amends prior to opening a PR. + In general, refrain from force pushing once a feature branch has an open PR, as force pushes are prone to gotchas in our CI system. Ideally, the git history is linear and clean *before* a PR is created. Hence, we encourage contributors to conduct any rebases or amends prior to opening a PR. -**Once all checks pass and the PR is approved:** +c. **Merging the PR:** -1. Ensure the title of the PR properly describes the changes, update if necessary. -2. Squash and merge the PR---if you are not a maintainer, a maintainer will do this for you. When merging a large change, use bullet points in the commit message to break down the changes. + 1. Ensure the title of the PR properly describes the changes. + 2. Squash and merge the PR. If you aren't a maintainer, a maintainer does this for you. When merging multiple changes, use bullet points in the commit message to break down the changes. ------ Labels @@ -59,32 +83,30 @@ Labels ============= ======= Label Effect ============= ======= - ci:profiling Adds the *profiling* job to the CI pipeline. Profiling artifacts will be saved for 10 days. + ci:profiling Adds the *profiling* job to the CI pipeline. Profiling artifacts are saved for 10 days. ci:docs-only Only runs the *docs/readthedocs* job; omits all other pipeline jobs. ============= ======= - -=========================== -Conventions and style guide -=========================== +============================ +Coding style and conventions +============================ ------------------- General conventions ------------------- -1. Always use space indentation (4 spaces)---never commit a tab, e.g., ``\t``. +Always use space indentation (four spaces). Never commit a tab (``\t``). ------------------ Python doc-strings ------------------ -Tensile uses `autodoc `_ to pull in documentation from doc-strings and integrate them into this site.
Please use the following guidelines when writing Python functions and modules to maintain quality and consistency. - -1. The all parameters and returned values should be identified with type-hints. -2. All functions should have a doc-string describing the parameters, return value, and any exception; however, if the function is small and the implementation is straightforward, a one-line doc-string is sufficient. -3. Do not include types directly in the doc-string, these should be added as type-hints in the function definition. -4. For doc-string styling, use the `Google Python Style Guide `_. +Tensile uses `autodoc `_ to pull in documentation from doc-strings and integrate them into this site. Use the following guidelines when writing Python functions and modules to maintain quality and consistency. + +1. Annotate all parameters and return values with type-hints. +2. Give every function a doc-string that describes its parameters, return value, and any exceptions it raises. However, if the function is small and the implementation is straightforward, a one-line doc-string is sufficient. +3. Don't include types directly in the doc-string. Add them as type-hints in the function definition. +4. For doc-string styling, use the `Google Python style guide `_. .. _commit-messages: @@ -93,5 +115,44 @@ Commit messages --------------- 1. Use `conventional commits `_. -2. Use the present imperative tense, e.g., "add" not "adds" nor "added". -3. Don't add a period (``.``) to the end of the message. +2. Use the present imperative tense. For example, "add", not "adds" or "added". +3. Don't end the message with a period (``.``). + +============ +Code license +============ + +All code contributed to this project is licensed under the project's `LICENSE `_, and your contribution is accepted under the same license. + +For each new file, include the following licensing header: + +..
code:: cpp + + /******************************************************************************* + * Copyright (c) 20xx Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + *******************************************************************************/ + +=============== +Release cadence +=============== + +Official Tensile releases are subject to the general ROCm release cadence, which typically follows a quarterly cycle. The latest stable versions of Tensile are available in the **staging** branch. diff --git a/README.md b/README.md index 0aa351e9e..fc3532cbb 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,6 @@ Tensile is a tool for creating benchmark-driven backend libraries for GEMMs, GEMM-like problems (such as batched GEMM), and general N-dimensional tensor contractions on a GPU. -The Tensile library is mainly used as backend library to rocBLAS.
+The Tensile library is mainly used as a backend library for rocBLAS. Tensile acts as the performance backbone for a wide variety of 'compute' applications running on AMD GPUs. -See [Tensile Wiki](https://github.com/ROCm/Tensile/wiki) for documentation. +> [!NOTE] +> The published documentation is available at [Tensile](https://rocm.docs.amd.com/projects/Tensile/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `Tensile/docs/src` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). diff --git a/docs/assets/msl.svg b/docs/assets/msl.svg new file mode 100644 index 000000000..6bdca33a2 --- /dev/null +++ b/docs/assets/msl.svg @@ -0,0 +1,10 @@ [SVG diagram: HW level (gfx900, gfx90a), Op level (GEMM TT, TN, NT, NN), Type level ([h,h,s,s], [s,s,s,s], [d,d,d,d]), and lookup tables mapping [FreeA, FreeB, BoundSize] to an Index] \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index e39c4176f..3076e94cb 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -37,6 +37,8 @@ def get_semantic_version_from_file(file_path: str, search_prefix: str): version = f"{semantic_version}" release = f"{semantic_version}" +numfig_reset = True + autodoc_default_options = { "members": True, "member-order": "bysource", @@ -60,4 +62,5 @@ def get_semantic_version_from_file(file_path: str, search_prefix: str): extensions += [ "sphinx.ext.autodoc", # Automatically create API documentation from Python docstrings
"sphinx.ext.napoleon", + "sphinx.ext.mathjax", ] diff --git a/docs/license.rst b/docs/license.rst index 8170dc37b..4549165d0 100644 --- a/docs/license.rst +++ b/docs/license.rst @@ -1,6 +1,6 @@ .. meta:: - :description: rocBLAS documentation and API reference library - :keywords: rocBLAS, ROCm, API, Linear Algebra, documentation + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile license .. _license: @@ -9,5 +9,3 @@ License ***************** .. include:: ../LICENSE.md - -You can find more licensing information for on the `ROCm Licensing `_ page. diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index cc95f24e7..6c9fc2184 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -2,27 +2,37 @@ defaults: numbered: False root: src/index subtrees: - - entries: - - file: src/getting-started - - file: src/concepts - entries: - - file: src/concepts/benchmarking - - file: src/concepts/tuning - - file: src/concepts/kernels - - file: src/concepts/library-creation - - file: src/api-reference - entries: - - file: src/api-reference/Common - - file: src/api-reference/EmbeddedData - - file: src/api-reference/TensileCreateLibrary - - file: src/api-reference/Utilities - - file: src/cli-reference - entries: - - file: src/cli-reference/TensileCreateLibrary - - file: src/general-reference - entries: - - file: src/general-reference/environment-variables - - file: src/contributor-guide - - file: src/developer-guide - - file: license +- caption: Install + entries: + - file: src/install/installation + title: Installation +- caption: Conceptual + entries: + - file: src/conceptual/introduction + - file: src/conceptual/solution-selection-catalogs + +- caption: Reference + entries: + - file: src/cli-reference/cli-reference + title: CLI reference + entries: + - file: src/cli-reference/tensile-create-library-cli + - file: src/api-reference/api-reference + title: API reference + entries: + - file: 
src/api-reference/common + - file: src/api-reference/embedded-data + - file: src/api-reference/tensile-create-library-api + - file: src/api-reference/utilities + - file: src/reference/environment-variables + - file: src/reference/nomenclature + +- caption: Contribution + entries: + - file: src/how-to/programmers-guide + - file: src/how-to/contribution-guidelines + +- caption: About + entries: + - file: license.rst diff --git a/docs/src/api-reference.rst b/docs/src/api-reference.rst deleted file mode 100644 index 48fa23043..000000000 --- a/docs/src/api-reference.rst +++ /dev/null @@ -1,11 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _api-reference: - -******************************************************************** -API Reference -******************************************************************** - - diff --git a/docs/src/api-reference/Common.rst b/docs/src/api-reference/Common.rst deleted file mode 100644 index 34880a13a..000000000 --- a/docs/src/api-reference/Common.rst +++ /dev/null @@ -1,9 +0,0 @@ - -.. _common-api-reference: - -====== -Common -====== - -.. autofunction:: Tensile.Common::getArchitectureName -.. autofunction:: Tensile.Common::tPrint \ No newline at end of file diff --git a/docs/src/api-reference/EmbeddedData.rst b/docs/src/api-reference/EmbeddedData.rst deleted file mode 100644 index dd302fe85..000000000 --- a/docs/src/api-reference/EmbeddedData.rst +++ /dev/null @@ -1,8 +0,0 @@ - -.. embeddeddata-api-reference: - -============ -EmbeddedData -============ - -.. autofunction:: Tensile.EmbeddedData::generateLibrary \ No newline at end of file diff --git a/docs/src/api-reference/Utilities.rst b/docs/src/api-reference/Utilities.rst deleted file mode 100644 index 25309dc1f..000000000 --- a/docs/src/api-reference/Utilities.rst +++ /dev/null @@ -1,10 +0,0 @@ - -.. _utitilities-api-reference: - -========= -Utilities -========= - -.. 
autofunction:: Tensile.Utilities.Profile::profile -.. autofunction:: Tensile.Utilities.String::splitDelimitedString -.. autofunction:: Tensile.Utilities.toFile::toFile diff --git a/docs/src/api-reference/api-reference.rst b/docs/src/api-reference/api-reference.rst new file mode 100644 index 000000000..80f92c004 --- /dev/null +++ b/docs/src/api-reference/api-reference.rst @@ -0,0 +1,19 @@ +.. meta:: + :description: Tensile documentation and API reference + :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation + +.. _api-reference: + +********************** +Tensile API reference +********************** + +This topic lists the Tensile APIs by category. + +:ref:`common-api-reference` + +:ref:`embeddeddata-api-reference` + +:ref:`tensilecreatelibrary-api-reference` + +:ref:`utilities-api-reference` diff --git a/docs/src/api-reference/common.rst b/docs/src/api-reference/common.rst new file mode 100644 index 000000000..e49bd7b95 --- /dev/null +++ b/docs/src/api-reference/common.rst @@ -0,0 +1,12 @@ +.. meta:: + :description: Tensile documentation and API reference + :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation + +.. _common-api-reference: + +====== +Common +====== + +.. autofunction:: Tensile.Common::getArchitectureName +.. autofunction:: Tensile.Common::tPrint \ No newline at end of file diff --git a/docs/src/api-reference/embedded-data.rst b/docs/src/api-reference/embedded-data.rst new file mode 100644 index 000000000..56189f670 --- /dev/null +++ b/docs/src/api-reference/embedded-data.rst @@ -0,0 +1,11 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile, GEMM, Tensor, Tensile API documentation, Tensile embedded data + +.. _embeddeddata-api-reference: + +============== +EmbeddedData +============== + +..
autofunction:: Tensile.EmbeddedData::generateLibrary \ No newline at end of file diff --git a/docs/src/api-reference/TensileCreateLibrary.rst b/docs/src/api-reference/tensile-create-library-api.rst similarity index 74% rename from docs/src/api-reference/TensileCreateLibrary.rst rename to docs/src/api-reference/tensile-create-library-api.rst index f3d43909d..b898233a7 100644 --- a/docs/src/api-reference/TensileCreateLibrary.rst +++ b/docs/src/api-reference/tensile-create-library-api.rst @@ -1,24 +1,27 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile, GEMM, Tensor, Tensile API documentation, Tensile create library .. _tensilecreatelibrary-api-reference: -==================== +====================== TensileCreateLibrary -==================== +====================== .. autofunction:: Tensile.TensileCreateLibrary::addFallback .. autofunction:: Tensile.TensileCreateLibrary::addNewLibrary -.. autofunction:: Tensile.TensileCreateLibrary::applyNaming +.. autofunction:: Tensile.TensileCreateLibrary::applyNaming .. autofunction:: Tensile.TensileCreateLibrary::filterProcessingErrors .. autofunction:: Tensile.TensileCreateLibrary::findLogicFiles .. autofunction:: Tensile.TensileCreateLibrary::generateClientConfig -.. autofunction:: Tensile.TensileCreateLibrary::generateLogicData +.. autofunction:: Tensile.TensileCreateLibrary::generateLogicData .. autofunction:: Tensile.TensileCreateLibrary::generateLazyMasterFileList .. autofunction:: Tensile.TensileCreateLibrary::generateMasterFileList -.. autofunction:: Tensile.TensileCreateLibrary::makeMasterLibraries +.. autofunction:: Tensile.TensileCreateLibrary::makeMasterLibraries .. autofunction:: Tensile.TensileCreateLibrary::makeSolutions .. autofunction:: Tensile.TensileCreateLibrary::markDuplicateKernels .. autofunction:: Tensile.TensileCreateLibrary::parseLibraryLogicFiles -.. autofunction:: Tensile.TensileCreateLibrary::prepAsm +.. 
autofunction:: Tensile.TensileCreateLibrary::prepAsm .. autofunction:: Tensile.TensileCreateLibrary::sanityCheck .. autofunction:: Tensile.TensileCreateLibrary::verifyManifest .. autofunction:: Tensile.TensileCreateLibrary::writeMasterFile diff --git a/docs/src/api-reference/utilities.rst b/docs/src/api-reference/utilities.rst new file mode 100644 index 000000000..d01be56e6 --- /dev/null +++ b/docs/src/api-reference/utilities.rst @@ -0,0 +1,13 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile, GEMM, Tensor, Tensile API documentation, Tensile utilities + +.. _utilities-api-reference: + +========== +Utilities +========== + +.. autofunction:: Tensile.Utilities.Profile::profile +.. autofunction:: Tensile.Utilities.String::splitDelimitedString +.. autofunction:: Tensile.Utilities.toFile::toFile diff --git a/docs/src/cli-reference/TensileCreateLibrary.rst b/docs/src/cli-reference/TensileCreateLibrary.rst deleted file mode 100644 index 6e22a2c7f..000000000 --- a/docs/src/cli-reference/TensileCreateLibrary.rst +++ /dev/null @@ -1,121 +0,0 @@ - -.. _tensilecreatelibrary-cli-reference: - -==================== -TensileCreateLibrary -==================== - -Syntax ------- - -.. code-block:: - - TensileCreateLibrary [OPTIONS...] - -Required Arguments ------------------- - -When invoking *TensileCreateLibrary*, the following arguments are required. - -\ - Absolute path for logic files. These files are generally found in one of two ways: (i) they are - generated via the `Tensile` program and placed in the build directory under *3_LibraryLogic* (see :ref:`quick-start-example`). - (ii) they are found within a project that hosts pre-generated logic files, e.g., `rocBLAS `_. -\ - Absolute or relative path to the output directory where build artifacts are placed. -\ - One of: {OCL, HIP, HSA} - -Options -------- - -When invoking *TensileCreateLibrary*, one can provide zero or more options. 
- -\-\-architecture=ARCHITECTURE - Architectures to generate a library for. When specifying multiple options, use quoted, semicolon delimited - architectures, e.g., --architecture='gfx908;gfx1012'. - Supported architectures include: all gfx000 gfx803 gfx900 gfx900:xnack- gfx906 gfx906:xnack+ gfx906:xnack- gfx908 gfx908:xnack+ - gfx908:xnack- gfx90a gfx90a:xnack+ gfx90a:xnack- gfx940 gfx940:xnack+ gfx940:xnack- gfx941 gfx941:xnack+ - gfx941:xnack- gfx942 gfx942:xnack+ gfx942:xnack- gfx1010 gfx1011 gfx1012 gfx1030 gfx1031 gfx1032 gfx1034 gfx1035 - gfx1100 gfx1101 gfx1102. -\-\-build-client - Build Tensile client executable; used for stand alone benchmarking (default). -\-\-client-config - Creates best-solution.ini in the output directory for the library and code object files created (default). -\-\-code-object-version={default,V4,V5} - HSA code-object version. -\-\-cxx-compiler={amdclang++, hipcc} or on Windows {clang++, hipcc} - C++ compiler used when generating binaries. -\-\-embed-library=EMBEDLIBRARY - Embed (new) library files into static variables. Specify the name of the library. -\-\-embed-library-key=EMBEDLIBRARYKEY - Access key for embedding library files. -\-\-generate-manifest-and-exit - Similar to dry-run option for *make*, will compute the outputs - of *TensileCreateLibrary* and write the expected outputs to a - manifest file but does not exectue the commands to generate the - output. -\-\-generate-sources-and-exit - Skip building source code object files and assembly code object files. - Output source files only and exit. -\-\-ignore-asm-cap-cache - Ignore asm capability cache and derive the asm capabilities at runtime. -\-\-jobs=CPUTHREADS, \-j CPUTHREADS - Number of parallel jobs to launch. If this options is set higher than *nproc* the number of parallel - jobs will be equal to the number of cores. If the this option is set below 1 (e.g. 0 or -1), the number - of parallel jobs will be set to the number of cores, up to a maximum of 64. 
(default = -1). -\-\-lazy-library-loading - Loads Tensile libraries when needed instead of upfront. -\-\-library-format={yaml,msgpack} - Select which library format to use (default = msgpack). -\-\-no-enumerate - Do not run rocm_agent_enumerator. -\-\-no-merge-files - Store every solution and kernel in separate file. -\-\-no-short-file-names - Disables short files names. -\-\-num-merged-files=NUMMERGEDFILES - Number of files the kernels should be written into. -\-\-merge-files - Store all solutions in single file (default). -\-\-short-file-names - On Windows kernel names can get too long. - Converts solution and kernel names to serial ids (default). -\-\-separate-architectures - Separates TensileLibrary file by architecture to reduce the time to load the library file. - This option writes each architecture into a different TensileLibrary_gfxXXX.dat file. -\-\-verbose=PRINTLEVEL, \-v PRINTLEVEL - Set printout verbosity level {0, 1, 2}. -\-\-version=VERSION - Version string to embed into library file. -\-\-write-master-solution-index - Output master solution index in csv format including number - of kernels per architecture post build in csv format. - -Examples --------- - -No options -^^^^^^^^^^ - -The following command will invoke *TensileCreateLibrary* -with no options passing the Logic directory containing -logic files and creates a directory *tensile-output* -in the directory where the *TensileCreateLibrary* -command was invoked. The *tensile-output* directory -will contain the artifacts. - -.. code-block:: - - TensileCreateLibrary /home/myuser/Logic tensile-output HIP - -Adding TensileCreateLibrary options -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The following example illustrates how to add options when -invoking *TensileCreateLibrary*. In some cases, such as ``--separate-architectures`` -no arguments are required; whereas, for ``--jobs`` an argument is required. - -.. 
code-block:: - - TensileCreateLibrary --separate-architectures --jobs=32 /home/myuser/Logic tensile-output HIP diff --git a/docs/src/cli-reference.rst b/docs/src/cli-reference/cli-reference.rst similarity index 70% rename from docs/src/cli-reference.rst rename to docs/src/cli-reference/cli-reference.rst index 37feb7b08..5149f2621 100644 --- a/docs/src/cli-reference.rst +++ b/docs/src/cli-reference/cli-reference.rst @@ -4,19 +4,14 @@ .. _cli-reference: -************* -CLI Reference -************* - -The **Tensile** project provides several command line tools. Below -we enumerate these tools and provide links to the command line interface -(CLI) reference documentation for each tool. - - * :ref:`TensileCreateLibrary` +********************** +Tensile CLI reference +********************** +The Tensile project provides several command-line tools. The following table describes the notation used in their command-line documentation: .. table:: Usage syntax for Tensile's command-line documentation - + ================================= ================================== Notation Description ================================= ================================== @@ -26,4 +21,4 @@ we enumerate these tools and provide links to the command line interface ================================= ================================== .. warning:: - Consider undocumented command-line options experimental or deprecated. + Consider undocumented command-line options experimental or deprecated. diff --git a/docs/src/cli-reference/tensile-create-library-cli.rst b/docs/src/cli-reference/tensile-create-library-cli.rst new file mode 100644 index 000000000..7b8f29e0d --- /dev/null +++ b/docs/src/cli-reference/tensile-create-library-cli.rst @@ -0,0 +1,154 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile, GEMM, Tensor, Tensile CLI documentation, Tensile create library + +..
_tensilecreatelibrary-cli-reference: + +==================== +TensileCreateLibrary +==================== + +``TensileCreateLibrary`` is a command-line tool that generates libraries and code object files for a set of supplied logic files. + +Syntax +------ + +.. code-block:: + + TensileCreateLibrary [OPTIONS...] + +Required arguments +------------------- + +Here is the list of required arguments for invoking the ``TensileCreateLibrary`` command. + +.. list-table:: Required arguments + :header-rows: 1 + + * - Argument + - Description + + * - \ + - Absolute path to the logic files, which are generally found in one of the following locations: + |br| \- The build directory under ``3_LibraryLogic``, when generated by the Tensile program. |br| + |br| \- A project that hosts pregenerated logic files, such as `rocBLAS `_. |br| + + * - \ + - Absolute or relative path to the output directory where build artifacts are placed. + + * - \ + - Runtime language: OCL, HIP, or HSA. + +Options +------- + +Here is the list of optional arguments for invoking the ``TensileCreateLibrary`` command. + +.. list-table:: Options + :header-rows: 1 + + * - Option + - Description + + * - \-\-architecture=ARCHITECTURE + - Architectures to generate a library for. When specifying multiple architectures, use a quoted, semicolon-delimited + list such as \-\-architecture='gfx908;gfx1012'. + Supported architectures include: all; gfx000; gfx803; gfx900; gfx900:xnack-; gfx906; gfx906:xnack+; gfx906:xnack-; gfx908; gfx908:xnack+; + gfx908:xnack-; gfx90a; gfx90a:xnack+; gfx90a:xnack-; gfx940; gfx940:xnack+; gfx940:xnack-; gfx941; gfx941:xnack+; + gfx941:xnack-; gfx942; gfx942:xnack+; gfx942:xnack-; gfx1010; gfx1011; gfx1012; gfx1030; gfx1031; gfx1032; gfx1034; gfx1035; + gfx1100; gfx1101; gfx1102. + + * - \-\-build-client + - Builds the Tensile client executable used for standalone benchmarking. This option is set by default.
+ + * - \-\-client-config + - Creates ``best-solution.ini`` in the output directory for the library and generated code object files. This option is set by default. + + * - \-\-code-object-version={default,V4,V5} + - HSA code object version. + + * - \-\-cxx-compiler={amdclang++, hipcc} or {clang++, hipcc} for Windows + - C++ compiler used when generating binaries. + + * - \-\-embed-library=EMBEDLIBRARY + - Specifies the library to embed into static variables. + + * - \-\-embed-library-key=EMBEDLIBRARYKEY + - Access key for embedding library files. + + * - \-\-generate-manifest-and-exit + - Similar to the ``--dry-run`` option of ``make``, this option computes the outputs + of ``TensileCreateLibrary`` and writes the expected outputs to a + manifest file but doesn't execute the commands to generate the output. + + * - \-\-generate-sources-and-exit + - Skips building the source and assembly code object files. Outputs source files only and exits. + + * - \-\-ignore-asm-cap-cache + - Ignores the asm capability cache and derives the asm capabilities at runtime. + + * - \-\-jobs=CPUTHREADS or \-j CPUTHREADS + - Number of parallel jobs to launch. If this option is supplied with a value higher than ``nproc``, the number of parallel + jobs is the same as the number of cores. If this option is supplied with a value below 1 (0 or -1), the number + of parallel jobs is the same as the number of cores, up to a maximum of 64. The default value is -1. + * - \-\-lazy-library-loading + - Loads Tensile libraries only when needed. + + * - \-\-library-format={yaml,msgpack} + - Specifies the library format to use. Default value: ``msgpack``. + + * - \-\-no-enumerate + - Prevents ``rocm_agent_enumerator`` from running. + + * - \-\-no-merge-files + - Stores every solution and kernel in separate files. + + * - \-\-no-short-file-names + - Disables short file names. + + * - \-\-num-merged-files=NUMMERGEDFILES + - Number of files into which the kernels are written.
+ + * - \-\-merge-files + - Stores all solutions in a single file. This option is set by default. + + * - \-\-short-file-names + - Converts solution and kernel names to serial IDs if a kernel name is too long for Windows. This option is set by default. + + * - \-\-separate-architectures + - Separates the ``TensileLibrary`` file by architecture to reduce library file loading time. + This option writes each architecture into a different ``TensileLibrary_gfxXXX.dat`` file. + + * - \-\-verbose=PRINTLEVEL or \-v PRINTLEVEL + - Sets the printout verbosity level: 0, 1, or 2. + + * - \-\-version=VERSION + - Version string to embed into the library file. + + * - \-\-write-master-solution-index + - After the build, outputs the master solution index, including the number + of kernels per architecture, in CSV format. + +Usage without options +----------------------- + +Here is how to run the ``TensileCreateLibrary`` command without options. The specified logic directory contains the +logic files, and the generated artifacts are written to the *tensile-output* directory, which is created in the directory where the ``TensileCreateLibrary`` +command is invoked. The runtime language is HIP. + +.. code-block:: + + TensileCreateLibrary /home/myuser/Logic tensile-output HIP + +Usage with options +-------------------- + +Here is how to run the ``TensileCreateLibrary`` command with options. The following example shows an option without an argument (``--separate-architectures``) and an option with an argument (``--jobs=32``). + +.. code-block:: + + TensileCreateLibrary --separate-architectures --jobs=32 /home/myuser/Logic tensile-output HIP + +.. |br| raw:: html + +
diff --git a/docs/src/concepts.rst b/docs/src/concepts.rst deleted file mode 100644 index fbad6c370..000000000 --- a/docs/src/concepts.rst +++ /dev/null @@ -1,13 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _concepts: - -******************************************************************** -Concepts -******************************************************************** - -Tensile is written in both Python (for library/kernel generation) and C++ (for client headers and library tests)---it is a vital project to the ROCm ecosystem, providing optimized kernels for downstream libraries such as https://github.com/rocm/rocBLAS. - -The parts of Tensile that are written in Python consist of applications that, collectively, are responsible for generating optimized assembly kernels and generating library objects to access these kernels from client code. diff --git a/docs/src/concepts/benchmarking.rst b/docs/src/concepts/benchmarking.rst deleted file mode 100644 index d77d193c1..000000000 --- a/docs/src/concepts/benchmarking.rst +++ /dev/null @@ -1,11 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _benchmarking: - -******************************************************************** -Benchmarking -******************************************************************** - - diff --git a/docs/src/concepts/kernels.rst b/docs/src/concepts/kernels.rst deleted file mode 100644 index b265d7b52..000000000 --- a/docs/src/concepts/kernels.rst +++ /dev/null @@ -1,11 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. 
_kernels: - -******************************************************************** -Kernels -******************************************************************** - - diff --git a/docs/src/concepts/library-creation.rst b/docs/src/concepts/library-creation.rst deleted file mode 100644 index dfbc20f11..000000000 --- a/docs/src/concepts/library-creation.rst +++ /dev/null @@ -1,11 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _library-creation: - -******************************************************************** -Library creation -******************************************************************** - - diff --git a/docs/src/concepts/tuning.rst b/docs/src/concepts/tuning.rst deleted file mode 100644 index 780a9c4eb..000000000 --- a/docs/src/concepts/tuning.rst +++ /dev/null @@ -1,11 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _tuning: - -******************************************************************** -Tuning -******************************************************************** - - diff --git a/docs/src/conceptual/introduction.rst b/docs/src/conceptual/introduction.rst new file mode 100644 index 000000000..b292fa7c6 --- /dev/null +++ b/docs/src/conceptual/introduction.rst @@ -0,0 +1,15 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile concepts, GEMM, Tensor + +.. _introduction: + +******************************************************************** +Introduction +******************************************************************** + +Tensile is written in both Python (for library/kernel generation) and C++ (for client headers and library tests)---it is a vital +project to the ROCm ecosystem, providing optimized kernels for downstream libraries such as https://github.com/ROCm/rocBLAS. 
+ +The parts of Tensile that are written in Python consist of applications that, collectively, are responsible +for generating optimized kernels and generating library objects to access these kernels from client code. diff --git a/docs/src/conceptual/solution-selection-catalogs.rst b/docs/src/conceptual/solution-selection-catalogs.rst new file mode 100644 index 000000000..2c60e6c25 --- /dev/null +++ b/docs/src/conceptual/solution-selection-catalogs.rst @@ -0,0 +1,189 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile, GEMM, Tensor, Tensile API documentation, Tensile library creation + +.. _solution-selection-catalogs: + +*************************** +Solution selection catalogs +*************************** + +Tensile provides a mechanism by which only a subset of the code object files produced during a build is loaded at runtime. +This is necessary to avoid the overhead associated with loading code object files, including initialization time and the +memory footprint of the loaded code object files. However, this introduces the problem of knowing which code object file to load. +Solution selection is the process by which the **TensileHost** library determines which kernel is preferred and, in turn, +which code object file contains the selected kernel. This process uses a hierarchical structure +to efficiently search for kernels based on hardware, problem size, and transpose, among other properties. +This is the role of the **solution selection catalog** [1]_, a serialized file that uses a hierarchical +schema to organize kernel metadata for efficient lookup at runtime. + +.. note:: + Throughout this document, catalog files are shown with the .yaml extension. In practice, + solution selection catalogs are usually serialized with `MessagePack `_, which uses the .dat extension. + +Catalog hierarchy +================= + +.. 
figure:: ../../assets/msl.svg + :alt: Master Solution Library hierarchy + :align: center + + Solution selection catalog hierarchy for gfx900 and gfx90a + +**Level 1: Hardware** + +At runtime, only kernels compatible with the device can execute. As such, the top level of the hierarchy involves hardware comparisons using the GFX architecture. + +**Level 2: Operation** + +This layer is a mapping from a GEMM transpose setting, defined using +Einstein tensor notation (for example, *Contraction_l_Alik_Bjlk_Cijk_Dijk*), to a list of problem properties. + +**Level 3: Problem** + +This layer matches against specific problem properties such as input and output types, and features like high-precision accumulation and stochastic rounding. + +**Level 4: Exact solution** + +Finally, exact solutions contain fine-grained details about each solution that can be used during solution selection to locate the best kernel and to assert +that the requested problem predicates are satisfied. Each kernel has an index and a performance ranking. During solution selection, the highest-ranked +kernel from this pool is selected. + + +Build modes +=========== + +Tensile provides multiple build modes, which affect the way solution selection catalogs are generated. + +Mode 1: Lazy library loading +---------------------------- + +If ``--lazy-library-loading`` is enabled, a "parent" catalog is generated for each architecture, named + +.. centered:: TensileLibrary_lazy_.yaml + +This file contains a +reference to each of its "child" catalogs, but doesn't have details about the exact solutions. These details are instead +held in the "child" catalogs, which use the naming convention + +.. centered:: TensileLibrary_Type___.yaml + +Here, *precision* is the data type, *problem type* is the GEMM type, including transpose and accumulate settings, and *gfx* is the hardware GFX architecture.
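The lazy-loading naming convention can be sketched in Python as follows. This is a minimal illustration, not Tensile's implementation: the helper names are hypothetical, the patterns are reconstructed from the example file names shown in this document (including the architecture-independent ``_fallback`` child catalog), and real problem-type strings may carry additional qualifiers.

```python
# Hypothetical helpers that reproduce the lazy-loading catalog names shown in
# this document; they are not part of Tensile's API.


def parent_catalog_name(gfx: str, ext: str = "yaml") -> str:
    """Per-architecture parent catalog, e.g. TensileLibrary_lazy_gfx1030.yaml."""
    return f"TensileLibrary_lazy_{gfx}.{ext}"


def child_catalog_name(precision: str, problem_type: str, gfx: str, ext: str = "yaml") -> str:
    """Architecture-specific child catalog holding exact-solution metadata."""
    return f"TensileLibrary_Type_{precision}_{problem_type}_{gfx}.{ext}"


def fallback_catalog_name(precision: str, problem_type: str, ext: str = "yaml") -> str:
    """Fallback child catalog, which carries no architecture suffix."""
    return f"TensileLibrary_Type_{precision}_{problem_type}_fallback.{ext}"


if __name__ == "__main__":
    for gfx in ("gfx1030", "gfx900", "gfx906"):
        print(parent_catalog_name(gfx))
    # HGEMM TT child catalog for gfx900; pass ext="dat" for MessagePack-serialized catalogs.
    print(child_catalog_name("HH", "Contraction_l_Alik_Bjlk_Cijk_Dijk", "gfx900"))
    print(fallback_catalog_name("DD", "Contraction_l_Alik_Bjlk_Cijk_Dijk"))
```

In this sketch, the parent catalog name depends only on the architecture, so a device's GFX name is enough to locate its parent catalog, while child catalogs are resolved only when a matching problem type is requested.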
+ +For example, *TensileLibrary_Type_HH_Contraction_l_Alik_Bjlk_Cijk_Dijk_.yaml* identifies a child catalog for half-precision +contractions on two transposed matrices, otherwise known as HGEMM TT. +In this way, the child catalogs contain the solution metadata, while the parent catalog is responsible for organizing the child catalogs +by hardware, problem type, transpose, precision, and other predicates. +This has the benefit of reducing the memory footprint of the calling application, as code object libraries are compiled separately and loaded only when required. + +**Example: Build outputs** + +.. code-block:: bash + :caption: Lazy library loading build outputs for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk* + + build/ + └── library/ + ├── Kernels.so-000-gfx1030.hsaco + ├── Kernels.so-000-gfx900.hsaco + ├── Kernels.so-000-gfx906.hsaco + ├── TensileLibrary_lazy_gfx1030.yaml # [A] + ├── TensileLibrary_lazy_gfx900.yaml + ├── TensileLibrary_lazy_gfx906.yaml + ├... + ├── TensileLibrary_Type_..._fallback_gfx1030.hsaco + ├── TensileLibrary_Type_..._fallback_gfx900.hsaco + ├── TensileLibrary_Type_..._fallback_gfx906.hsaco + ├── TensileLibrary_Type_..._fallback.yaml # [B] + ├── TensileLibrary_Type_..._gfx900.co + ├── TensileLibrary_Type_..._gfx900.hsaco + ├── TensileLibrary_Type_..._gfx900.yaml # [C] + ├── TensileLibrary_Type_..._gfx906.co + ├── TensileLibrary_Type_..._gfx906.yaml # [D] + +Line **[A]** shows the parent catalog for gfx1030, the first of the three parent catalogs generated. +Line **[B]** shows a fallback child catalog, which references each of the architecture-specific fallback kernels +in the associated .hsaco files. +This means that at least some of the parameter/problem type combinations for *DD_Contraction_l_Alik_Bjlk_Cijk_Dijk* +haven't been explicitly tuned for these architectures. +Note that the matching .hsaco files (above **[B]**) are code object libraries for HIP source kernels. +These files are referenced by the fallback catalog.
+Line **[C]** shows a child catalog for gfx900 that references both HIP source and assembly source kernels, found in the associated .hsaco and .co files, respectively. +Line **[D]** shows a child catalog for gfx906, similar to the gfx900 catalog. However, notice that there is only one associated +.co file. This means that there are only assembly source kernels in this catalog. + +**Example: Parent solution selection catalog** + +.. code-block:: yaml + :caption: build/library/TensileLibrary_lazy_gfx900.yaml + + library: + rows: # [A_] + - library: + map: + Contraction_l_Alik_Bjlk_Cijk_Dijk: # [B_] + ... + rows: # [C_] + - library: {type: Placeholder, value: TensileLibrary_Type_SS_..._fallback} + predicate: + type: And + value: + - type: TypesEqual + value: [Float, Float, Float, Float] + - {type: HighPrecisionAccumulate, value: false} + - {type: F32XdlMathOp, value: Float} + - {type: StochasticRounding, value: false} + - ... + type: Problem + ... + Contraction_l_Alik_Bljk_Cijk_Dijk: + rows: + - ... + type: Problem # [_C] + property: {type: OperationIdentifier} + type: ProblemMap # [_B] + predicate: {type: TruePred} + type: Hardware # [_A] + solutions: [] + +Line **[A]** shows the top level of the parent catalog, which contains a single row for each hardware architecture. +Line **[B]** shows the problem map for the operation *Contraction_l_Alik_Bjlk_Cijk_Dijk*. +Line **[C]** shows the problem type and predicates used to match against exact solutions contained in the child catalogs. + +Mode 2: Merge files +------------------- + +.. warning:: + This feature is not recommended and is in the process of being deprecated. + +When ``--merge-files`` is enabled, one solution catalog is generated for each architecture, named + +.. centered:: TensileLibrary_.yaml + +The catalog contains information about supported GEMM types and +solution metadata that is used to locate the optimal kernel for a requested GEMM. 
This pattern +has the drawback that all code object libraries are loaded eagerly, +thereby increasing both the initialization time and memory footprint of the calling application. + +**Example** + +Suppose you build libraries for gfx1030, gfx900, and gfx906 with ``--merge-files``. The build output directory looks like this: + +.. code-block:: bash + + build/ + └── library/ + ├── Kernels.so-000-gfx1030.hsaco + ├── Kernels.so-000-gfx1030.hsaco + ├── Kernels.so-000-gfx1030.hsaco + ├── Kernels.so-000-gfx900.hsaco + ├── Kernels.so-000-gfx906.hsaco + ├── TensileLibrary_gfx1030.co + ├── TensileLibrary_gfx1030.yaml + ├── TensileLibrary_gfx900.co + ├── TensileLibrary_gfx900.yaml + ├── TensileLibrary_gfx906.co + └── TensileLibrary_gfx906.yaml + +-------------------- + +.. [1] Previously these files were called *master solution libraries* because they contain two top-level keys, "solutions" and "library". The term *solution selection catalog* was later adopted to clarify the purpose of this file within the larger context of the Tensile C++ API. \ No newline at end of file diff --git a/docs/src/contributor-guide.rst b/docs/src/contributor-guide.rst deleted file mode 100644 index d701f5aa2..000000000 --- a/docs/src/contributor-guide.rst +++ /dev/null @@ -1,8 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation -.. highlight:: none - -.. _contributor-guide: - -.. include:: ../../CONTRIBUTING.rst diff --git a/docs/src/developer-guide.rst b/docs/src/developer-guide.rst deleted file mode 100644 index 3bb4b99d4..000000000 --- a/docs/src/developer-guide.rst +++ /dev/null @@ -1,255 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation -.. highlight:: none - -.. 
_developer-guide: - -******************************************************************** -Developer Guide -******************************************************************** - -.. _development-environment: - -======================= -Development environment -======================= - -ROCm is a base requirement for contributing to Tensile. To begin, ensure that ROCm is supported on your platform by reviewing the installation details on the `ROCm documentation `_ site. - -.. note:: - Environment setup steps are provided for Ubuntu/Debian platforms. For other operating systems, use the appropriate package manager, or your preferred installation method. - - --------------------- -Developing in Docker --------------------- - -ROCm development images are available on `Docker Hub `_ for a variety of OS/ROCm versions. See `Docker images in the ROCm ecosystem `_ for more details. - - - ------------------------------- -Setting up Python dependencies ------------------------------- - -1. Install OS dependencies (requires elevated privileges), - - .. code-block:: - - apt-get install libyaml python3-yaml \ - libomp-dev libboost-program-options-dev libboost-filesystem-dev - -2. Install *one* of the following, depending on your preferred Tensile data format. If both are installed, *msgpack* is preferred, - - .. code-block:: - - apt-get install libmsgpack-dev # If using the msgpack backend - # OR - apt-get install libtinfo-dev # If using the YAML backend - -3. Setup a virtual environment - - .. code-block:: - - python3 -m venv .venv - source .venv/bin/activate - -4. Install Python dependencies - - .. code-block:: - - pip3 install -r requirements.txt - -5. Confirm your dependencies match the following listing with ``pip3 freeze``, - - .. literalinclude:: ../../requirements.txt - :caption: **requirements.txt**—Direct and transitive Python dependencies - :lines: 7- - -You can now run Tensile's Python applications—see `Tensile/bin `_. 
- - - ---------------------------- -Setting up C++ dependencies ---------------------------- - -1. Install ROCm for your platform (`Linux `_ or `Windows `_). - - After the installation is complete, binaries and libraries can be found at */opt/rocm*. ROCm comes packaged with compilers such as **amdclang++**, and other useful tools including **rocminfo** and **rocprofv2**. - - .. tip:: - - If using Bash, we recommend setting ``PATH=/opt/rocm/bin/:$PATH`` in your *~/.bashrc* and refreshing your shell, e.g., ``source ~/.bashrc``. Alternatively, export the path only for your current shell session with ``export PATH=/opt/rocm/bin/:$PATH``. - -2. Install build tools. Additional installation methods for the latest versions for CMake can be found `here `_. - - .. code-block:: - - apt-get install build-essential cmake - -3. Verify the versions of installed tools against the following table, - - .. table:: C++ build dependencies - :widths: grid - - ========== ======= - Dependency Version - ========== ======= - amdclang++ 17.0+ - Make 4.2+ - CMake 3.16+ - ========== ======= - -You can now run Tensile's `Host library tests`_. - -======= -Testing -======= - -Tensile uses `pytest `_ to manage library/kernel tests. In particular, the project makes use of `pytest markers `_ to filter which tests are run. Important markers include *pre_checkin*, *extended*, *integration*, and *unit*---refer to `pytest.ini `_ for all supported markers. - -In general, a test can be run via the tox **ci** environment by passing the desired test marker with ``-m ``, - -.. code-block:: - - tox run -e ci -- -m {pre_checkin|extended|integration|unit} - -Note that ``--`` is used to pass options to the underlying pytest command. - -.. note:: - - By default ``tox run -e ci`` will run pre-checkin tests. - -------------------------------- -Unit tests and coverage reports -------------------------------- - -Unit tests include all tests located under *Tensile/Tests/unit/*. 
A convenience command is included that adds coverage reporting, - -.. code-block:: - - tox run -e unittest - # OR for 32 processes - tox run -e unittest -- -n 32 - -By default, coverage results will be dumped to the terminal. To generate reports in other formats (e.g. HTML) use, - -.. code-block:: - - tox run -e unittest -- --cov-report=html - -Files and directories excluded from coverage reporting are itemized in `.coveragerc `_. - -Although it is encouraged to run unit tests through tox to support consistency, they may also be run directly with pytest for quicker feedback, for example, to debug a run a single test named *test_foo*, the following command may be useful - -.. code-block:: - :caption: From *Tensile/Tests/* - - pytest unit/test_TensileCreateLibrary.py -k "test_foo" --capture=no -v - - ------------------- -Host library tests ------------------- - -Host library tests ensure that generated libraries remain operational when being called from client code, e.g., other libraries or applications. These tests are built on `gtest `_; to run them you must first download the submodule. From Tensile's project root run, - -.. code-block:: - - git submodule update --init - -Next, you can configure and build the host library tests through tox, - -.. code-block:: - - tox run -e hostlibtest - -.. note:: - Note that this tox command wraps `invoke `_, a tool to manage CLI-invokable tasks. Since tox is, fundamentally, a Python environment manager and test runner, any reusable shell commands that fall outside its purview are managed by invoke (which are then sometimes encapsulated by tox). See `tasks.py `_ for more details. - -You also can configure, build, and run host library tests directly with `invoke `_, - -.. code-block:: - - invoke hostlibtest --configure --build --run - -An executable *TensileTests* will be generate upon build, which can be used to run the tests. - -If you wish to build and run the tests manually, checkout the commands in `tasks.py `_. 
For advanced usage, like filtering or repeating test cases, see the `gtest documentation `_. - - -=============== -Static analysis -=============== - ------- -Python ------- - -Use the top-level tox label **static** to run all static analysis, **this may reformat your code**, so be sure to commit your changes after running the command, - -.. code-block:: - - tox run -m static - - -**Linting** is evaluated with `flake8 `_, and **formatting** is conducted with `black `_ and `isort `_. To run a check in isolation refer to `tox.ini `_, or use one the following commands, - -.. code-block:: - - tox run -e lint - tox run -e format # add `-- --check` to check formatting without applying changes - tox run -e isort # add `-- --check` to check imports without applying changes - - -.. tip:: - - To ensure consistent formatting, we recommend setting up your editor to **format on save** using the same formatter settings as in `tox.ini `_. Either way, ensuring you commit changes after running static analysis will reduce wait-times caused by simple CI failures. - ---- -C++ ---- - -**Formatting** is conducted with `clang-format `_. For example, the following command will format all provided files, however, we recommend that you setup your editor to format on save. - -.. code-block:: - - clang-format -i style=file - -Styling rules are configured in `.clang-format `_. - - -========= -Profiling -========= - ------- -Python ------- - -Profiling is enabled through the ``@profile`` decorator, and can be imported from the **Tensile.Utilities.Profile** module. Under the hood, the decorator wraps the function in a `cProfile `_ context, and generates a .prof file inside the *profiling-results-* directory. - -.. note:: - Due to a current limitation with the profiling decorator, nested profiling is not supported, that is, if `func1` calls `func2` in a loop, and both are marked for profiling, the resulting .prof file for `func1` will display incorrect results. 
- -============= -Documentation -============= - -Tensile uses https://github.com/ROCm/rocm-docs-core as the documentation engine, which itself wraps Read the Docs and Sphinx. - -You can build the documentation locally with, - -.. code-block:: - - tox run -e docs - -After the documentation is built, the generated HTML files can be found at *docs/_build/html*. - -========== -Versioning -========== - -Tensile follows semantic versioning practices, e.g., **major.minor.patch**. See `server.org `_ for more details. diff --git a/docs/src/general-reference.rst b/docs/src/general-reference.rst deleted file mode 100644 index 64b340f43..000000000 --- a/docs/src/general-reference.rst +++ /dev/null @@ -1,11 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _general-reference: - -******************************************************************** -General Reference -******************************************************************** - - * :ref:`environment-variables` diff --git a/docs/src/general-reference/environment-variables.rst b/docs/src/general-reference/environment-variables.rst deleted file mode 100644 index b292551bc..000000000 --- a/docs/src/general-reference/environment-variables.rst +++ /dev/null @@ -1,23 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _environment-variables: - -******************************************************************** -Environment Variables -******************************************************************** - - -^^^^^^^^^^^^^^^ -TENSILE_PROFILE -^^^^^^^^^^^^^^^ - -Enables profiling when set to "ON", "TRUE", or "1". When enabled, all functions decorated with ``@profile`` will be profiled and results will be generated as .prof files. - -*Example* - -.. code-block:: - :name: TENSILE_PROFILE Example - - TENSILE_PROFILE=ON Tensile/bin/Tensile ... 
diff --git a/docs/src/getting-started.rst b/docs/src/getting-started.rst deleted file mode 100644 index 1bd2dcfb9..000000000 --- a/docs/src/getting-started.rst +++ /dev/null @@ -1,108 +0,0 @@ -.. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation - -.. _getting-started: - -******************************************************************** -Getting Started -******************************************************************** - -Tensile is a tool for creating a benchmark-driven backend library for GEMMs [#gemm]_, GEMM-like problems, *N*-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPUs. - -Project Overview -================ - -.. code-block:: - - Tensile/ - ├── Tensile/ Source code, tests, and utilities for the Tensile project - │ └── Tests/ Kernels and application tests - ├── HostLibraryTests/ Tests for host-side code running the Tensile library - ├── docker/ A collection of useful Dockerfiles - ├── docs/ Documentation source files - ├── requirements.txt Python dependencies for running Tensile applications - ├── pytest.ini Configuration settings for pytest - ├── tox.ini Configuration settings for the Tox environment management tool - └── setup.py Package build and installation script - - -Environment requirements -======================== - -Before working with Tensile, ensure the following dependencies are set up in your enviornment. - -Basic dependencies -"""""""""""""""""" - -1. **Python 3.8+**: Verify your Python version with ``python --version`` -2. **CMake 3.13+**: Verify your CMake version with ``cmake --version`` - -Library dependencies -"""""""""""""""""""" - -We provide instructions for Ubuntu, for other operating systems use the appropriate package manager. - -.. 
code-block:: bash - - sudo apt install -y \ - python3-yaml \ - libomp-dev \ - libboost-program-options-dev \ - libboost-filesystem-dev - -Additionally, install *one* of the following, depending on your preferred Tensile backend. - -.. code-block:: bash - - sudo apt install libmsgpack-dev # If using the msgpack backend - # OR - sudo apt install libtinfo-dev # If using the YAML backend - -.. note: Tensile uses the LLVM ObjectYAML library for YAML parsing. The LLVM library is bundled with your ROCm installation, but it requires libtinfo to be installed. Alternatively, if you already have LLVM version 6.0 or newer installed, Tensile's build process will find it and libtinfo is not needed. - -Python dependencies -""""""""""""""""""" - -.. code-block:: bash - - pip3 install joblib # Tools for pipelining and concurrency - -Installation -============ - -.. code-block:: bash - - git clone -b master https://github.com/ROCm/Tensile.git - cd Tensile - -.. _quick-start-example: - -Quick start example -=================== - -.. important: Ensure you have followed the steps in the **Environment Requirements** and **Installation** sections. - -To run a benchmark, you need to pass a tuning config to the ``Tensile`` program located in *Tensile/bin*. - -A sample tuning file has been prepared for this quick start example, it can be found in *Tensile/Configs/rocblas_sgemm_example.yaml*. Note the line at the bottom of this file ``ArchitectureName: "gfx1030"``, this line identifies the target architecture for which the benchmark will generate a library. Verify the architecture of your device by running ``rocminfo | grep gfx``. If you are running on a different architecture, for example, gfx90a, update the line to ``ArchitectureName: "gfx90a"``. - -You are now ready to run benchmarks using Tensile! From the top-level directory, - -.. 
code-block:: bash - - mkdir build && cd build - ../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./ - -After the benchmark completes, Tensile will create the following directories: - -- *0_Build* contains a client executable; use this to launch Tensile from a library viewpoint. -- *1_BenchmarkProblems* contains all the problem descriptions and executables generated during benchmarking; use the ``run.sh`` script to reproduce results. -- *2_BenchmarkData* contains the raw performance results for all kernels in CSV and YAML formats. -- *3_LibraryLogic* contains the winning (optimal) kernel configurations in YAML format. Typically, rocBLAS takes the YAML files from this folder. -- *4_LibraryClient* contains the code objects, kernels, and library code. This is the output of running ``TensileCreateLibrary`` using the *3_LibraryLogic* directory as an input - - -.. rubric:: Footnotes - -.. [#gemm] GEMM: General Matrix-Matrix Multiplication diff --git a/docs/src/how-to/contribution-guidelines.rst b/docs/src/how-to/contribution-guidelines.rst new file mode 100644 index 000000000..fe5fd07a0 --- /dev/null +++ b/docs/src/how-to/contribution-guidelines.rst @@ -0,0 +1,8 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: contributing to Tensile, GEMM, Tensor, Tensile contribution guidelines +.. highlight:: none + +.. _contribution-guidelines: + +.. include:: ../../../CONTRIBUTING.rst diff --git a/docs/src/how-to/programmers-guide.rst b/docs/src/how-to/programmers-guide.rst new file mode 100644 index 000000000..eda34355e --- /dev/null +++ b/docs/src/how-to/programmers-guide.rst @@ -0,0 +1,200 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile developers guide, Tensile contributors guide, Tensile programmers guide, GEMM, Tensor +.. highlight:: none + +.. 
_programmers-guide: + +******************************************************************** +Programmer's guide +******************************************************************** + +This topic provides the information that programmers need to contribute to the Tensile source code. + +.. _development-environment: + +======================= +Development environment +======================= + +ROCm is the base requirement for contributing to Tensile. To check whether ROCm supports your platform, see the `supported operating systems `_ list. +Then, follow the steps given in the :ref:`installation` guide. + +------------------------- +Developing in Docker +------------------------- + +ROCm development images are available on `Docker Hub `_ for a variety of OS/ROCm versions. See `Docker images in the ROCm ecosystem `_ for more details. + +================== +Project structure +================== + +Here is the project directory structure to help you find the project files available for contribution. + +.. code-block:: + + Tensile/ + ├── Tensile/ Source code, tests, and utilities for the Tensile project + │ └── Tests/ Kernel and application tests + ├── HostLibraryTests/ Tests for host-side code running the Tensile library + ├── docker/ A collection of useful Dockerfiles + ├── docs/ Documentation source files + ├── requirements.txt Python dependencies for running Tensile applications + ├── pytest.ini Configuration settings for pytest + ├── tox.ini Configuration settings for the Tox environment management tool + └── setup.py Package build and installation script + +======= +Testing +======= + +Tensile uses `pytest `_ to manage its library and kernel tests, and uses `pytest markers `_ to select which tests run. Important markers include ``pre_checkin``, ``extended``, ``integration``, and ``unit``. Refer to `pytest.ini `_ for all supported markers.
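As a minimal sketch, a test opts into a marker with a ``@pytest.mark.<name>`` decorator, where ``<name>`` is one of the markers registered in ``pytest.ini``. The two test functions below are hypothetical examples, not tests from the Tensile suite. With them, ``-m unit`` selects only the first test, and the marker expression ``-m "unit or integration"`` selects both.

.. code-block:: python

   import pytest


   # Hypothetical unit test, selected by ``pytest -m unit``
   # (or ``tox run -e ci -- -m unit``).
   @pytest.mark.unit
   def test_ceiling_division():
       # Ceiling division: 1000 elements need 8 chunks of 128.
       size, chunk = 1000, 128
       assert -(-size // chunk) == 8


   # Hypothetical integration test, skipped by ``-m unit`` but selected by
   # ``-m integration`` or ``-m "unit or integration"``.
   @pytest.mark.integration
   def test_round_trip_through_text():
       assert int(str(12345)) == 12345

Markers only label tests. The selection itself happens when pytest evaluates the ``-m`` expression at collection time.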
+ +You can run tests in the ``ci`` tox environment by passing the desired test marker using ``-m ``: + +.. code-block:: + + tox run -e ci -- -m {pre_checkin|extended|integration|unit} + +Note that ``--`` is used to pass options to the underlying pytest command. + +.. note:: + + By default, the ``tox run`` command runs pre-checkin tests when no markers are specified via ``-m``. + +------------------------------- +Unit tests and coverage reports +------------------------------- + +All unit tests are available in ``Tensile/Tests/unit/``. A convenience command is included to add coverage reporting: + +.. code-block:: + + tox run -e unittest + + # OR for 32 processes + + tox run -e unittest -- -n 32 + +By default, coverage results are dumped to the terminal. To generate reports in other formats such as HTML, use: + +.. code-block:: + + tox run -e unittest -- --cov-report=html + +Files and directories excluded from coverage reporting are itemized in `.coveragerc `_. + +Although we encourage running unit tests using ``tox`` for consistency, you can also run the tests directly using ``pytest`` for quicker feedback. For example, to run a single test named ``test_foo``, use: + +.. code-block:: + + pytest unit/test_TensileCreateLibrary.py -k "test_foo" --capture=no -v + +------------------ +Host library tests +------------------ + +Host library tests ensure that the generated libraries remain operational when called from client code such as other libraries or applications. +These tests are built on `gtest `_. To run them, first download the submodule by running the following command from the Tensile project root: + +.. code-block:: + + git submodule update --init + +Next, you can configure, build, and run the host library tests using any of the following: + +- ``tox``: + + .. code-block:: + + tox run -e hostlibtest + + .. note:: + + The ``tox`` command wraps `invoke `_, a tool for managing CLI-invokable tasks.
+ Since tox is fundamentally a Python environment manager and test runner, reusable shell commands that fall outside its purview are managed by invoke, and tox sometimes wraps those invoke tasks in turn. See `tasks.py `_ for details. + +- ``invoke``: + + .. code-block:: + + invoke hostlibtest --configure --build --run + + Running the preceding command generates an executable ``TensileTests``, which you can use to run the tests directly. + +- Manually: To build and run the tests manually, see the commands in `tasks.py `_. + For advanced usage like filtering or repeating test cases, see the `gtest documentation `_. + +=============== +Static analysis +=============== + +------ +Python +------ + +To run all static analysis, use the top-level ``tox`` label ``static``: + +.. code-block:: + + tox run -m static + +.. note:: + The preceding command might reformat your code, so make sure to commit your changes after running the command. + +**Linting** is evaluated using `flake8 `_ and **formatting** is conducted using `black `_ and `isort `_. To run a check in isolation, either refer to `tox.ini `_ or use one of the following commands: + +.. code-block:: + + tox run -e lint + tox run -e format # add `-- --check` to check formatting without applying changes + tox run -e isort # add `-- --check` to check imports without applying changes + + +.. tip:: + + To ensure consistent formatting, we recommend setting up your editor to **format on save** using the same formatter settings as in `tox.ini `_. Either way, committing your changes after running static analysis reduces wait times caused by simple CI failures. + +--- +C++ +--- + +**Formatting** is conducted using `clang-format `_. +The following command formats all the given files in place. However, we recommend setting up your editor to *format on save*. + +.. code-block:: + + clang-format -i --style=file + +Styling rules are configured in `.clang-format `_.
+ + +========= +Profiling +========= + +------ +Python +------ + +To enable profiling, use the ``@profile`` decorator, which must be imported from the ``Tensile.Utilities.Profile`` module. Under the hood, the decorator wraps the function in a `cProfile `_ context and generates a ``.prof`` file inside the ``profiling-results-`` directory. + +.. note:: + Nested profiling is NOT supported due to a limitation of the profiling decorator. For example, if ``func1`` calls ``func2`` in a loop and both are marked for profiling, the resulting ``.prof`` file for ``func1`` displays incorrect results. + +======================== +Building documentation +======================== + +To build the documentation locally, use: + +.. code-block:: + + tox run -e docs + +After the documentation is built, the HTML files are available in ``docs/_build/html``. + +===================== +Versioning practices +===================== + +Tensile follows semantic versioning (**major.minor.patch**). See `semver.org `_ for details. diff --git a/docs/src/index.rst b/docs/src/index.rst index 396912775..b78cde0d8 100644 --- a/docs/src/index.rst +++ b/docs/src/index.rst @@ -1,6 +1,6 @@ .. meta:: - :description: Tensile documentation and API reference - :keywords: Tensile, GEMM, Tensor, ROCm, API, Documentation + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile documentation, GEMM, Tensor, Tensile API .. _index: @@ -8,33 +8,35 @@ Tensile documentation ******************************************************************** -Tensile is a tool for creating a benchmark-driven backend library for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPU.
+Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU. -The code is open source and hosted at: https://github.com/ROCm/Tensile +The code is open source and hosted at https://github.com/ROCm/Tensile .. grid:: 2 :gutter: 2 - .. grid-item-card:: Getting Started + .. grid-item-card:: Install - * :ref:`getting-started` + * :ref:`Installation ` - .. grid-item-card:: Concepts + .. grid-item-card:: Conceptual - * :ref:`benchmarking` - * :ref:`tuning` - * :ref:`kernels` - * :ref:`library-creation` + * :ref:`introduction` + * :ref:`solution-selection-catalogs` .. grid-item-card:: Reference - * :ref:`cli-reference` - * :ref:`api-reference` + * :ref:`Environment variables ` + * :ref:`API reference ` + * :ref:`CLI reference ` - .. grid-item-card:: Guides - - * :ref:`developer-guide` - * :ref:`contributor-guide` + .. grid-item-card:: Contribution + * :ref:`Programmer's guide ` + * :ref:`Contribution guidelines ` +To contribute to the documentation, refer to +`Contributing to ROCm `_. +You can find licensing information on the +`Licensing `_ page. diff --git a/docs/src/install/installation.rst b/docs/src/install/installation.rst new file mode 100644 index 000000000..e7e7562c8 --- /dev/null +++ b/docs/src/install/installation.rst @@ -0,0 +1,109 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile installation, GEMM, Tensor, Build Tensile, Run benchmarks + +.. _installation: + +******************************************************************** +Installation +******************************************************************** + +Install ROCm +============ + +To begin, install ROCm for your platform. For installation instructions, refer to the `Linux `_ or `Windows `_ installation guide. 
+ +.. tip:: + + If you use Bash, we recommend setting ``PATH=/opt/rocm/bin/:$PATH`` in your ``~/.bashrc`` and refreshing your shell using ``source ~/.bashrc``. + Alternatively, to set the path for the current shell session only, run ``export PATH=/opt/rocm/bin/:$PATH``. + +Install OS dependencies +========================= + + +.. note:: + The steps below are for Ubuntu. For other distributions, use the appropriate package manager. + +1. Install dependencies: + + .. code-block:: + + apt-get install libyaml python3-yaml \ + libomp-dev libboost-program-options-dev libboost-filesystem-dev + +2. Install one of the following, depending on your preferred Tensile data format. If both are installed, ``msgpack`` is preferred: + + .. code-block:: + + apt-get install libmsgpack-dev # If using the msgpack backend + + # OR + + apt-get install libtinfo-dev # If using the YAML backend + +3. Install build tools. For additional installation methods for the latest versions of CMake, see the `CMake installation `_ page. + + .. code-block:: + + apt-get install build-essential cmake + +Install Tensile from source +============================ + +Before installing Tensile from source, we recommend creating a virtual environment: + +.. code-block:: bash + + python3 -m venv .venv + source .venv/bin/activate + +Then, you can install Tensile using pip or git. + +Option 1: Install with pip +--------------------------- + +.. code-block:: bash + + pip3 install git+https://github.com/ROCmSoftwarePlatform/Tensile.git@develop + + +Option 2: Install with git +---------------------------- + +.. code-block:: bash + + git clone git@github.com:ROCm/Tensile.git && cd Tensile + pip3 install . + +You can now run Tensile's Python applications. + +Running benchmarks +=================== + +To run a benchmark, pass a tuning config to the ``Tensile`` program located in ``Tensile/bin``.
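You can also script the architecture edit described next. The following is a minimal sketch, not a Tensile utility. ``find_gfx_targets`` and ``set_architecture`` are hypothetical helper names. The sketch makes two assumptions:

- ``rocminfo`` reports GPU targets as ``gfx``-prefixed tokens, which is what ``rocminfo | grep gfx`` relies on.
- Each ``ArchitectureName:`` value in the tuning config sits on a single line.

.. code-block:: python

   import re

   # Assumption: targets appear as tokens such as "gfx90a", either alone or
   # inside ISA names such as "amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-".
   _GFX_TOKEN = re.compile(r"\bgfx[0-9a-f]+\b")

   # Assumption: each ArchitectureName value sits on one line. The key and its
   # indentation are kept; only the value after the colon is replaced.
   _ARCH_LINE = re.compile(r"^([ \t]*ArchitectureName:[ \t]*).*$", re.MULTILINE)


   def find_gfx_targets(rocminfo_output: str) -> list:
       """Return the distinct gfx target names in order of first appearance."""
       seen = []
       for token in _GFX_TOKEN.findall(rocminfo_output):
           if token not in seen:
               seen.append(token)
       return seen


   def set_architecture(config_text: str, arch: str) -> str:
       """Return config_text with every ArchitectureName value set to arch (quoted)."""
       new_text, count = _ARCH_LINE.subn(lambda m: f'{m.group(1)}"{arch}"', config_text)
       if count == 0:
           raise ValueError("no ArchitectureName line found in the tuning config")
       return new_text

On a system with more than one GPU target, ``find_gfx_targets`` returns several names. In that case, choose the target you intend to tune for before writing the config back to disk.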
+ +For demonstration purposes, we use the sample tuning file available in ``Tensile/Configs/rocblas_sgemm_example.yaml``. +The sample tuning file allows you to specify the target architecture for which the benchmark will generate a library. +To find your device architecture, run: + +.. code-block:: bash + + rocminfo | grep gfx + +Set the ``ArchitectureName:`` line in the sample tuning file to match your device. The sample file uses ``ArchitectureName: "gfx1030"``; on a gfx90a device, for example, change it to ``ArchitectureName: "gfx90a"``. + +You can now run benchmarks using Tensile. From the top-level directory, run: + +.. code-block:: bash + + mkdir build && cd build + ../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./ + +After the benchmark completes, Tensile creates the following directories: + +- **0_Build**: Contains a client executable. Use this to launch Tensile from a library viewpoint. +- **1_BenchmarkProblems**: Contains all the problem descriptions and executables generated during benchmarking. Use the ``run.sh`` script to reproduce results. +- **2_BenchmarkData**: Contains the raw performance results of all kernels in CSV and YAML formats. +- **3_LibraryLogic**: Contains the winning (optimal) kernel configurations in YAML format. Typically, rocBLAS takes the YAML files from this folder. +- **4_LibraryClient**: Contains the code objects, kernels, and library code. This is the output of running ``TensileCreateLibrary`` using the ``3_LibraryLogic`` directory as input. diff --git a/docs/src/reference/environment-variables.rst b/docs/src/reference/environment-variables.rst new file mode 100644 index 000000000..67e4dc50e --- /dev/null +++ b/docs/src/reference/environment-variables.rst @@ -0,0 +1,69 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile environment variables, GEMM, Tensor + +.. 
_environment-variables: + +******************************************************************** +Environment variables +******************************************************************** + +This topic lists the environment variables that enable testing, debugging, and experimental features for Tensile clients and applications. + +.. list-table:: Environment variables + :header-rows: 1 + :widths: 30 30 30 + + * - Environment variable + - Description + - Values + + * - TENSILE_DB + - Enables debugging features based on the supplied value. + TENSILE_DB is a bit field, so options can be set individually or combined. To enable all debug output, set TENSILE_DB=0xFFFF. + - | 0x2 or 0x4 \- Prints extra information about the solution selection process. Indicates if a kernel was an exact match, or if a sequence of kernels is considered for a closest match. + | 0x8 \- Prints extra information about the hardware selection process. + | 0x10 \- Prints debug-level information about predicate evaluations. + | 0x20 \- Prints a list of loaded or missing code object libraries. + | 0x40 \- Prints kernel launch arguments, including the kernel name, work group size and count, and all arguments passed. + | 0x80 \- Prints size of allocated tensors. + | 0x100 \- Prints debug information about convolution reference calculations. + | 0x200 \- Prints more detailed information about convolution reference calculations. + | 0x1000 \- Prints information about the loading of embedded, YAML, or MessagePack libraries. + | 0x4000 \- Prints solution lookup efficiency. + | 0x8000 \- Prints the name of selected kernels. + | 0x80000 \- Prints the name of selected kernels and number of common kernel parameters such as Matrix Instruction, MacroTile, ThreadTile, DepthU, and so on. + + * - TENSILE_DB2 + - Enables extended debugging features based on the supplied value. 
When enabled, Tensile skips launching kernels for debug purposes but continues to perform other steps such as kernel selection, + data allocation, and initialization. + - | 1 \- Enable + | 2 \- Disable + + * - TENSILE_NAIVE_SEARCH + - Performs a naive search for matching kernels instead of the standard optimized search. + - | 1 \- Enable + | 2 \- Disable + + * - TENSILE_TAM_SELECTION_ENABLE + - Enables tile-aware solution selection. + - | 1 \- Enable + | 2 \- Disable + + * - TENSILE_SOLUTION_INDEX + - Prints the index of the selected solution. + - | 1 \- Enable + | 2 \- Disable + + * - TENSILE_METRIC + - Overrides the default distance metric for solution selection with the supplied value. + - | "Euclidean" + | "JSD" + | "Manhattan" + | "Ratio" + | "Random" + + * - TENSILE_PROFILE + - When enabled, all functions decorated with ``@profile`` are profiled and results are generated as ``.prof`` files. + - | 1, "ON", "TRUE" \- Enable + | Any other value \- Disable diff --git a/docs/src/reference/nomenclature.rst b/docs/src/reference/nomenclature.rst new file mode 100644 index 000000000..4f34fd0f3 --- /dev/null +++ b/docs/src/reference/nomenclature.rst @@ -0,0 +1,93 @@ +.. meta:: + :description: Tensile is a tool for creating a benchmark-driven backend library for GEMM + :keywords: Tensile kernel selection, Tensile solution selection, GEMM, Tensor, ROCm + +.. _nomenclature: + +************ +Nomenclature +************ + +General Matrix Multiplication +============================= + +General matrix multiplication (GEMM) is a level 3 BLAS operation that computes the product of two matrices, formalized by the equation, + +.. math:: + C = \alpha A B + \beta C + +where :math:`\alpha` and :math:`\beta` are scalars and :math:`A` and :math:`B` are optionally transposed input matrices. + +.. list-table:: GEMM data types.
+ :header-rows: 1 + :widths: 30, 50, 20 + + * - Abbreviation + - Description + - Precision + * - HGEMM + - Half precision general matrix multiplication + - 16-bit + * - SGEMM + - Single precision general matrix multiplication + - 32-bit + * - DGEMM + - Double precision general matrix multiplication + - 64-bit + * - CGEMM + - Single precision complex general matrix multiplication + - 32-bit + * - ZGEMM + - Double precision complex general matrix multiplication + - 64-bit + +.. list-table:: GEMM operations; N (non-transpose) and T (transpose) represent the transpose state of the input matrices. + :header-rows: 1 + :widths: 30, 70 + + * - Operation + - Equation + * - NN + - :math:`C_{i,j} = \sum_l A_{i,l} B_{l,j}` + * - NT + - :math:`C_{i,j} = \sum_l A_{i,l} B_{j,l}` + * - TN + - :math:`C_{i,j} = \sum_l A_{l,i} B_{l,j}` + * - TT + - :math:`C_{i,j} = \sum_l A_{l,i} B_{j,l}` + * - Batched-GEMM + - :math:`C_{i,j,k} = \sum_l A_{i,l,k} B_{l,j,k}` + * - 2D Summation + - :math:`C_{i,j} = \sum_{k,l} A_{i,k,l} B_{j,l,k}` + * - 3 Batched Indices + - :math:`C_{i,j,k,l,m} = \sum_n A_{i,k,m,l,n} B_{j,k,l,n,m}` + * - 4 Free Indices + - :math:`C_{i,j,k,l,m} = \sum_{n,o} A_{i,k,m,o,n} B_{j,m,l,n,o}` + + +Indices +======= + +The indices describe the dimensionality of the problem to be solved. A GEMM operation takes two 2-dimensional matrices as input, +four input dimensions in total, and contracts them along one dimension. This cancels out two dimensions, leading to a 2-dimensional result. +When an index shows up in multiple tensors, those tensors must be the same size along that dimension; however, they can have different strides. + +There are three categories of indices or dimensions used in the problems supported by Tensile: free, batch, and bound. +**Tensile only supports problems with at least one pair of free indices.** + +Free indices +------------ + +Free indices are the paired indices of tensor C with one pair in tensor A and another pair in tensor B.
+In the 4 Free Indices example, i, j, k, and l are the four free indices of tensor C, where indices i and k are present in tensor A while indices j and l are present in tensor B. + +Batch indices +------------- + +Batch indices are the indices of tensor C that are present in both tensor A and tensor B. +The difference between the GEMM example and the batched-GEMM example is the additional index. +In the batched-GEMM example, the index k is the batch index, which batches together multiple independent GEMMs. + +Bound indices +------------- + +The bound indices are also known as summation indices. These indices are not present in tensor C but appear in the summation symbol and in tensors A and B; for example, l is the bound index in the NN equation :math:`C_{i,j} = \sum_l A_{i,l} B_{l,j}`. The inner products (pairwise multiply then sum) are performed along these indices. diff --git a/tox.ini b/tox.ini index c689118dc..73f5e8fbc 100644 --- a/tox.ini +++ b/tox.ini @@ -71,7 +71,7 @@ commands = skip_install = true change_dir = {toxinidir}/docs deps = -r docs/sphinx/requirements.txt -commands = python3 -m sphinx -T -b html -d _build/doctrees -D language=en . _build/html +commands = python3 -m sphinx -T -b html -d _build/doctrees -D language=en -v . _build/html [testenv:isort] description = "Sorts import statements for less merge conflicts" diff --git a/tuning/README.md b/tuning/README.md index 9cc084a9b..e42a24eec 100644 --- a/tuning/README.md +++ b/tuning/README.md @@ -97,7 +97,7 @@ args: Example use case ```bash -$ ./tuning/scripts/provision_tuning.sh -w tensile_tuning -z logs/inception_rocblas-configs_unique.log -r tensile_tuning/tensile/Tensilen-rocblas-configs_unique.log -o tf_inception.yaml -y sgemm -l vega20 +$ ./tuning/scripts/provision_tuning.sh WORKING_PATH LOG_PATH OUTPUT_SUFFIX.yaml LIBRARY [options] ``` When this is run the tuning will be provisioned in the directory ./tensile_tuing. The following directories will be generated.