Rocm jaxlib v0.4.30 qa nccl maxnchannels #75

hsharsha · 2024-11-28T17:47:14Z

No description provided.

@xla-rotation

PR openxla#15311: [ROCm] GPU/CPU unified memory for rocm

Imported from GitHub PR openxla#15311

Copybara import of the project:

-- 2c4cee2 by Chao Chen <[email protected]>:
unified memory for rocm

Merging this change closes openxla#15311

COPYBARA_INTEGRATE_REVIEW=openxla#15311 from ROCm:ci_rocm_unify_mem 2c4cee2
PiperOrigin-RevId: 657168704


…on unit tests

Imported from GitHub PR openxla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment: google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats: openxla#10488

Copybara import of the project:

-- 0fc74cc by Wen Chen <[email protected]>:
[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

-- d247af5 by scxfjiang <[email protected]>:
refactor tests for collective comm ops

-- 6f8c418 by scxfjiang <[email protected]>:
rafactor collective comm e2e tests

-- 8ecb6ec by scxfjiang <[email protected]>:
update: replace str

-- 338d3af by scxfjiang <[email protected]>:
get rid of macros

Merging this change closes openxla#16938

COPYBARA_INTEGRATE_REVIEW=openxla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af
PiperOrigin-RevId: 676615012


* PR openxla#14605: [ROCm] Switch on Triton feature for ROCm.

  Imported from GitHub PR openxla#14605

  Last in series of commits to switch on Triton in XLA for ROCm. This is new version of: openxla#13003

  Changes in third_party/triton/temporary/amd_pr7.patch are already merged on: triton-lang/triton#4238

  Copybara import of the project:

  -- c2ce7e0 by Zoran Jovanovic <[email protected]>:
  [ROCm] Switch on Triton feature for ROCm.

  -- 563b303 by Zoran Jovanovic <[email protected]>:
  [ROCm] Fixed an issue with test cases from ir_emitter_triton_test.cc

  -- a4d2ad8 by Zoran Jovanovic <[email protected]>:
  [ROCm] Fixed an issue with gpu_compiler_test.cc

  -- a1b9260 by Zoran Jovanovic <[email protected]>:
  [ROCm] Applied comments from code review.

  -- c694a95 by Zoran Jovanovic <[email protected]>:
  [ROCm] Fixed failed tests because of openxla@19c11ba

  -- 7359619 by Zoran Jovanovic <[email protected]>:
  [ROCm] Fixed compilation issue with latest rebase.

  -- 82f58ce by Zoran Jovanovic <[email protected]>:
  [ROCm] Skip SplitLHSInputOutputIsFused test in ir_emitter_triton_test.cc untill issue is fixed.

  -- 57e776b by Zoran Jovanovic <[email protected]>:
  [ROCm] Triton related changes merged thus removed amd_pr7.patch

  -- 0d09d0e by Zoran Jovanovic <[email protected]>:
  [ROCm] Applied comments from code review.

  -- 7b11147 by Zoran Jovanovic <[email protected]>:
  [ROCm] Applied comments from code review.

  -- 9e7e0c7 by Zoran Jovanovic <[email protected]>:
  [ROCm] Modified TestNoAutotuner test case.

  Merging this change closes openxla#14605

  COPYBARA_INTEGRATE_REVIEW=openxla#14605 from ROCm:rocm_triton_backend_8 9e7e0c7
  PiperOrigin-RevId: 652449567

* Fixed test issues.


Rahul Batra and others added 30 commits July 11, 2024 20:06

[ROCm]: Fix LLVM path issue for ROCm 6.3

56e22fa

[ROCm]: Fix LLVM path for ROCm 6.2

dc89176

Merge pull request #33 from ROCm/rocm-jaxlib-v0.4.30-uni_mem

973f86b

PR openxla#15311: [ROCm] GPU/CPU unified memory for rocm

Let the other stream wait for the main stream before issuing memcpy d2h

b1ac447

Merge pull request #34 from ROCm/rocm-jaxlib-v0.4.30-qa-d2hmem-stream-copy

62b0e7b

Let the other stream wait for the main stream before issuing memcpy d2h

workspace fixing

49f81a7

[ROCM] Updated fp8 matmul with adjustments for updated hipBlasLt support

b61059e

Main changes include:
* Added support for fp8 matmul with output data type to be fp8 and bf16.
* Added buffer comparators for fp8e4m3fnuz and fp8e5m2fnuz

[ROCM] Addressed reviewer comment.

2e76267

[ROCM] Fix build after 21311f2

652fd38

[ROCM] Add basic scaffolding and enable MLIR fusion

e085f5d

Enable dot algorithms for AMD GPUs

b9622e2

added bias pointer workaround

d8e87a6

added precision settings for autotuner and buffer_comparator small refactoring, added verbose flag

83f1366

added precision settings for autotuner and buffer_comparator small refactoring, added verbose flag

a6b98de

small changes

63aeab4

adopted changes

3cb9699

Use deterministic ops flag in determinism test

aad6467

Fix MLIR tests specifically w.r.t number of threads and number of blocks

9560ccf

Disable FP8 rewrite pattern test on ROCm

f57f22a

Disable workspace setting

339dde0

Merge pull request #36 from ROCm/rocm-jaxlib-v0.4.30-qa-autotuning

8c73dfe

Rocm jaxlib v0.4.30 qa autotuning

Merge branch 'rocm-jaxlib-v0.4.30-qa' into rocm-jaxlib-v0.4.30-qa-cleanup

8ae1de7

Merge pull request #35 from ROCm/rocm-jaxlib-v0.4.30-qa-cleanup

5945307

Rocm jaxlib v0.4.30 qa cleanup

scrub navi

7dc2933

Merge pull request #42 from ROCm/rocm-jaxlib-v0.4.30-qa_navi_scrub

ed82401

Replace "Navi" with corresponding public product names

Merge pull request #45 from ROCm/rocm-jaxlib-v0.4.30-qa_collective_fp8

a42d9cd

Add NANOO FP8 support for collaborative communication unit tests

[ROCm] Include clang-19 and clang-20 headers

48a9d97

Merge pull request #48 from ROCm/rocm-jaxlib-v0.4.30-qa-clang20

ed5b782

[ROCm] Include clang-19 and clang-20 headers

hsharsha and others added 16 commits October 2, 2024 17:45

Reset stream function in Gemm algorithm picker (#39)

7fd3ae6

* reset blas stream used by gemm_algorithm_picker
* small refactoring
* fixing clang format
* fixing clang format
* fixing clang format

---------

Co-authored-by: Pavel Emeliyanenko <[email protected]>

[ROCm] Fixed linker issues with rocblas_get_version_string_size and rocblas_get_version_string (#52)

f3e91a6

Add multigpu script and disable triton tests

4ea5b6f

Merge pull request #53 from ROCm/rocm-jaxlib-v0.4.30-qa-multigpu-disable-triton

c718ef3

Add multigpu script and disable triton tests

[ROCm] Added include of hipblas.h in hipblaslt_wrapper.h

e8b1ff4

Merge pull request #55 from ROCm/rocm-jaxlib-v0.4.30-qa-hipblasfix

e2dde69

[ROCm] Added include of hipblas.h in hipblaslt_wrapper.h

buffer init fix and gpu_hlo_runner test

e2b918d

Merge pull request #59 from ROCm/r0.4.30_buffer_init_and_hlo_runner

bf81e49

buffer init fix and gpu_hlo_runner test

[ROCm] Fixed linker issues related to fp8 buffer_comparator functions

4951842

Merge pull request #66 from ROCm/rocm-jaxlib-v0.4.30-qa-SWDEV-476829-2

49a7651

[ROCm] Fixed linker issues related to fp8 buffer_comparator functions

[ROCm] Pass AMDGPU_TARGETS to crosstool wrapper

d9660ac

Passing amdgpu targets to crosstool wrapper which calls hipcc can restrict the kernels generated to specific set of supported amdgpu architectures.
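To illustrate the idea in the commit message above, here is a minimal sketch of how a wrapper script might turn a comma-separated target list into per-architecture compiler flags. It is not the actual crosstool wrapper code: the function names `offload_arch_flags` and `hipcc_command`, the argument layout, and the example targets are assumptions made for illustration. The only detail taken as given is that hipcc accepts `--offload-arch=<target>` to select the GPU architectures it compiles for.

```python
def offload_arch_flags(amdgpu_targets):
    """Turn a comma-separated target string into --offload-arch flags.

    Hypothetical helper: the real wrapper may read and forward the
    targets differently.
    """
    targets = [t.strip() for t in amdgpu_targets.split(",") if t.strip()]
    return ["--offload-arch=" + t for t in targets]


def hipcc_command(sources, amdgpu_targets, hipcc="hipcc"):
    """Build an illustrative hipcc command line restricted to the given targets."""
    # Only the listed architectures get device code, which keeps the set
    # of generated kernels limited to what is explicitly requested.
    return [hipcc, *offload_arch_flags(amdgpu_targets), "-c", *sources]


if __name__ == "__main__":
    # Example targets are placeholders, not the set used by this branch.
    print(" ".join(hipcc_command(["kernel.hip.cc"], "gfx90a,gfx942")))
```

With an empty target string the sketch emits no `--offload-arch` flags, leaving the compiler to its own default.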

[Rocm] fix arch

08d8691

Merge pull request #63 from ROCm/ci_clang_31_2

430d8c3

Merge fixes to 31 QA

Skip gpu_hlo_runner_test if input is not provided (#71)

1d70f15

Add NCCL_MAX_NCHANNELS env variable to multi gpu tests

ee307e3
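A rough sketch of what setting this variable for a test run could look like is shown below. `NCCL_MAX_NCHANNELS` is the environment variable named in the commit title; everything else (the helper name `multigpu_test_env`, the value 4, and the way the environment is passed to a test process) is an assumption for illustration, since the commit body is not shown here.

```python
import os


def multigpu_test_env(max_nchannels, base_env=None):
    """Return a copy of the environment with NCCL_MAX_NCHANNELS set.

    Hypothetical helper: the actual multi-GPU test script may export the
    variable in a different way and with a different value.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["NCCL_MAX_NCHANNELS"] = str(max_nchannels)
    return env


if __name__ == "__main__":
    # The value 4 is a placeholder, not the value chosen in the commit.
    env = multigpu_test_env(4)
    print("NCCL_MAX_NCHANNELS=" + env["NCCL_MAX_NCHANNELS"])
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when launching a test binary, so the setting applies only to that process.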

github-actions bot added the kokoro:force-run label Nov 28, 2024

hsharsha closed this Nov 28, 2024
