
[Tracking] ROCm packages #197885

Open · 25 of 34 tasks
Madouura opened this issue Oct 26, 2022 · 90 comments
Labels: 5. scope: tracking · 6.topic: hardware · 6.topic: rocm

Comments

@Madouura
Contributor

Madouura commented Oct 26, 2022

Tracking issue for ROCm derivations.

moar packages

Key

  • Package
    • Dependencies

WIP

Ready

TODO

Merged

ROCm-related

Notes

  • Update command: nix-shell maintainers/scripts/update.nix --argstr commit true --argstr keep-going true --arg predicate '(path: pkg: builtins.elem (pkg.pname or null) [ "rocm-llvm-llvm" "rocm-core" "rocm-cmake" "rocm-thunk" "rocm-smi" "rocm-device-libs" "rocm-runtime" "rocm-comgr" "rocminfo" "clang-ocl" "rdc" "rocm-docs-core" "hip-common" "hipcc" "clr" "hipify" "rocprofiler" "roctracer" "rocgdb" "rocdbgapi" "rocr-debug-agent" "rocprim" "rocsparse" "rocthrust" "rocrand" "rocfft" "rccl" "hipcub" "hipsparse" "hipfort" "hipfft" "tensile" "rocblas" "rocsolver" "rocwmma" "rocalution" "rocmlir" "hipsolver" "hipblas" "miopengemm" "composable_kernel" "half" "miopen" "migraphx" "rpp-hip" "mivisionx-hip" "hsa-amd-aqlprofile-bin" ])'

Won't implement

  • ROCmValidationSuite
    • Too many assumptions, not going to rewrite half the cmake files
  • rocm_bandwidth_test
    • Not really needed, will implement on request
  • atmi
    • Out-of-date
  • aomp
    • We basically already do this
  • Implement strictDeps for all derivations
    • Seems pointless now and I don't see many other derivations doing this
@Madouura
Contributor Author

Madouura commented Oct 30, 2022

Updating to 5.3.1, marking all WIP until pushed to their respective PRs and verified.

@Madouura
Contributor Author

Madouura commented Oct 30, 2022

If anyone is interested in helping me debug rocBLAS, here's the current derivation.
Edit: already fixed.

@Flakebi
Member

Flakebi commented Oct 31, 2022

Hi, thanks a lot for your work on ROCm packages!

So far, the updates were all aggregated in a single rocm: 5.a.b -> 5.x.y PR. I think that makes more sense than splitting the package updates into individual PRs, for a couple of reasons:

  • Often, packages have backward- (and forward-) incompatible changes, e.g. the 5.3.0 version of rocm-runtime only works with 5.3.0 of rocm-comgr, but not with 5.2.0 or 5.4.0 (made-up example).
  • Nobody tests a mixture of versions, i.e. only the set with all packages at the same version is known to work.
  • If I want to test hip, OpenCL and other things for an update, it’s easier to do it one time (and compile everything a single time), rather than 10 times.

tl;dr, do you mind merging all your 5.3.1 updates into a single PR?

PS: Not sure how you did the update; I usually do it with for f in rocm-smi rocm-cmake rocm-thunk rocm-runtime rocm-opencl-runtime rocm-device-libs rocm-comgr rocclr rocminfo llvmPackages_rocm.llvm hip; nix-shell maintainers/scripts/update.nix --argstr commit true --argstr package $f; end (fish syntax).

@Madouura
Contributor Author

I was actually afraid of the opposite being true, so I split them up.
Got it, I'll aggregate them.
Thanks for the tip on the update script; that would have saved me a lot of time.

@Madouura
Contributor Author

Madouura commented Oct 31, 2022

I think hip should stay separate though, since there are other changes.
Actually, never mind; it's just an extra dependency, so it should be fine to split it.

@ony
Contributor

ony commented Jun 16, 2024

As per pytorch/pytorch#119081 (comment), in 2.4.0+ (a future release) it should be possible to use something like:

  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (python-final: python-prev: {
      torch = python-prev.torch.overrideDerivation (oldAttrs: {
        TORCH_BLAS_PREFER_HIPBLASLT = 0;  # not yet in nixpkgs
      });
    })
  ];
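
A minimal sketch of how a fragment like this could be applied as a nixpkgs overlay (the let-binding name and the config.rocmSupport toggle here are illustrative, not from the comment above):

  let
    torchOverlay = final: prev: {
      pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
        (python-final: python-prev: {
          torch = python-prev.torch.overrideDerivation (oldAttrs: {
            TORCH_BLAS_PREFER_HIPBLASLT = 0;
          });
        })
      ];
    };
    # rocmSupport is the existing nixpkgs config toggle for ROCm-enabled builds
    pkgs = import <nixpkgs> {
      config.rocmSupport = true;
      overlays = [ torchOverlay ];
    };
  in
  pkgs.python3Packages.torch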

@AngryLoki

@ony, TORCH_BLAS_PREFER_HIPBLASLT is a runtime environment variable; pytorch still links against and requires hipblaslt even when it is unused. pytorch/pytorch#120551 should help, but I have no idea whether and when it will be accepted.
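
Since the variable is read at run time, a minimal sketch of setting it in the environment of whatever launches PyTorch, rather than in the torch derivation (the wrapper name and the withPackages selection are illustrative assumptions):

  with import <nixpkgs> { config.rocmSupport = true; };
  # hypothetical launcher: export the runtime variable, then start Python with torch
  writeShellScriptBin "python-no-hipblaslt" ''
    export TORCH_BLAS_PREFER_HIPBLASLT=0
    exec ${python3.withPackages (ps: [ ps.torch ])}/bin/python "$@"
  ''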

By the way, hipblaslt is not difficult to build. Just don't build the 6.0 release; skip directly to 6.1. When I tried, the bundled TensileLite in 6.0 generated a wall of unreadable errors, while 6.1 worked on the first attempt.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/4

@SomeoneSerge mentioned this issue Oct 16, 2024
@DerDennisOP
Contributor

I'm not able to build rocmlir-rock-6.0.2 when trying to install zluda.

FAILED: mlir/lib/Dialect/Rock/Transforms/CMakeFiles/obj.MLIRRockTransforms.dir/ViewToTransform.cpp.o
/nix/store/16pvlpl13g06f1rqxp7z0il9i4l9mlww-rocm-llvm-clang-wrapper-6.0.2/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/build/source/build/mlir/lib/Dialect/Rock/Transforms -I/build/source/mlir/lib/Dialect/Rock/Transforms -I/build/source/external/llvm-project/llvm/include -I/build/source/build/external/llvm-project/llvm/include -I/build/source/external/llvm-project/mlir/include -I/build/source/build/external/llvm-project/llvm/tools/mlir/include -I/build/source/external/mlir-hal/mlir/include -I/build/source/build/external/mlir-hal/include -I/build/source/external/mlir-hal/include -I/build/source/mlir/include -I/build/source/build/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=gnu++17 -fPIC -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT mlir/lib/Dialect/Rock/Transforms/CMakeFiles/obj.MLIRRockTransforms.dir/ViewToTransform.cpp.o -MF mlir/lib/Dialect/Rock/Transforms/CMakeFiles/obj.MLIRRockTransforms.dir/ViewToTransform.cpp.o.d -o mlir/lib/Dialect/Rock/Transforms/CMakeFiles/obj.MLIRRockTransforms.dir/ViewToTransform.cpp.o -c /build/source/mlir/lib/Dialect/Rock/Transforms/ViewToTransform.cpp
In file included from /build/source/mlir/lib/Dialect/Rock/Transforms/ViewToTransform.cpp:14:
/build/source/mlir/include/mlir/Conversion/TosaToRock/TosaToRock.h:21:10: fatal error: 'mlir/Conversion/RocMLIRPasses.h.inc' file not found
#include "mlir/Conversion/RocMLIRPasses.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is there an easy fix for it?

@AngryLoki

@DerDennisOP, it was addressed in pull request ROCm/rocMLIR#1640 (issue ROCm/rocMLIR#1620); you may want to use it.

@ilylily

ilylily commented Oct 26, 2024

@DerDennisOP @AngryLoki I think you'll actually also need ROCm/rocMLIR#1542 (closes ROCm/rocMLIR#1500); it's a similar patch in a nearby file.

@arunoruto
Contributor

> @DerDennisOP, it was addressed in pull request ROCm/rocMLIR#1640 (issue ROCm/rocMLIR#1620); you may want to use it.

> @DerDennisOP @AngryLoki I think you'll actually also need ROCm/rocMLIR#1542 (closes ROCm/rocMLIR#1500); it's a similar patch in a nearby file.

Is there a plan to patch these things upstream too? As far as I can see, the Hydra logs show the same error as @DerDennisOP's.
Would a general update to a newer version resolve this problem? Maybe updating everything to 6.2.4 or even 6.3.0 would be feasible.

@mschwaig
Member

Right now I do not have the time to update ROCm, but I could help out as a reviewer.
I think updating to a newer version would be a great idea.

@arunoruto
Contributor

arunoruto commented Dec 11, 2024

> Right now I do not have the time to update ROCm, but I could help out as a reviewer. I think updating to a newer version would be a great idea.

While I do not have that much experience with ROCm, I could try it.
I was also wondering: there are a lot of PRs from the auto-updater @r-ryantm, for example #358391. If nixpkgs-review succeeds on these PRs, would they be directly viable for a merge? Do we need to bring all the packages up to date at the same time? Are there more things to consider?

@bgamari
Contributor

bgamari commented Dec 12, 2024

> Would a general update to a newer version resolve this problem? Maybe updating everything to 6.2.4 or even 6.3.0 would be feasible.

FWIW, I have a branch where I have tried updating things to 6.2.4. Unfortunately, I am seeing linking failures in the clang (specifically libcxx) bootstrap that I doubt I will have time to fix in the near-term.

> I was also wondering: there are a lot of PRs from the auto-updater @r-ryantm,

I don't think any of those PRs are viable. Apart from the PRs themselves not building, the auto-updater generally doesn't seem to respect the fact that ROCm components expect to be upgraded in lockstep.

@bgamari
Contributor

bgamari commented Dec 12, 2024

FWIW, I have opened a draft PR to record the state of my attempt: #364423

@LunNova
Member

LunNova commented Dec 12, 2024

I have a mix of 6.3 and 6.2 working here with pytorch nightly, but it's in no state to upstream. It might be helpful for someone with more time who is trying to fix this in nixpkgs.

https://github.com/LunNova/ml.nix

@LunNova
Member

LunNova commented Dec 20, 2024

It may now be infeasible for Hydra to build some critical ROCm packages on its current runners within the 10-hour time limit.
Some packages take hundreds of GB of RAM or hundreds of GB of /build even with a restricted target set.
Building composable_kernel 6.3.0 for just two gfx targets on a current-generation 128-core build server takes ~3 hours and peaks at over 200 GB of RAM. Increasing this to the full target list and running on a less powerful builder will not complete within the limit.

@LunNova
Member

LunNova commented Dec 20, 2024

I think it's necessary to design a scheme that allows -minimal versions of the largest ROCm packages to be built with no targets built in, and then use environment variables to direct the libraries to load code objects for the current hardware from target-specific packages, instead of requiring mega-packages built for all targets or rebuilding the entire package set for your specific targets.
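
A very rough sketch of what such a split could look like; none of this exists in nixpkgs today, and the no-target override, the per-target build, and the reliance on ROCBLAS_TENSILE_LIBPATH are all assumptions:

  { pkgs ? import <nixpkgs> { config.rocmSupport = true; } }:
  rec {
    # hypothetical: the library itself with no GPU code objects baked in
    # (assumes rocblas keeps accepting gpuTargets and tolerates an empty list)
    rocblasMinimal = pkgs.rocmPackages.rocblas.override { gpuTargets = [ ]; };

    # a build restricted to a single target; a real design would package only
    # the code objects, not a second copy of the host library
    rocblasGfx1100 = pkgs.rocmPackages.rocblas.override { gpuTargets = [ "gfx1100" ]; };

    # at run time an environment variable points the minimal library at the
    # code objects for the installed hardware (rocBLAS reads
    # ROCBLAS_TENSILE_LIBPATH; other libraries would need an equivalent knob)
    runtimeEnv = {
      ROCBLAS_TENSILE_LIBPATH = "${rocblasGfx1100}/lib/rocblas/library";
    };
  }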

@AngryLoki

@LunNova, regarding memory/time consumption, you may want to disable -amdgpu-early-inline-all=true in composable-kernel and hipcc; the compilation time is unbearable for mortals. See llvm/llvm-project#86332 for the rationale.
Overall I'm searching for a solution to this problem too. LLVM 19 added new targets like gfx9-generic, so there are 6 targets at most, but even then it is still too much.
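
A rough, untested sketch of that suggestion: patch the injected default out of hipcc. The rocmPackages.hipcc attribute path and the assumption that the flag string appears literally in its sources are guesses, and whether this alone cuts composable_kernel build times is unverified:

  { pkgs ? import <nixpkgs> { config.rocmSupport = true; } }:
  pkgs.rocmPackages.hipcc.overrideAttrs (old: {
    postPatch = (old.postPatch or "") + ''
      # flip the hidden default that blows up composable_kernel build time and
      # memory use (see llvm/llvm-project#86332)
      grep -rlF 'amdgpu-early-inline-all=true' . | while read -r f; do
        sed -i 's/amdgpu-early-inline-all=true/amdgpu-early-inline-all=false/g' "$f"
      done || true
    '';
  })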

@LunNova
Member

LunNova commented Dec 20, 2024

In the future amdgcnspirv might be a solution: ROCm/ROCm#3985 (comment)

@LunNova
Member

LunNova commented Dec 20, 2024

I'm not seeing a significant improvement in composable_kernel compilation time from applying https://github.com/gentoo/gentoo/blob/582df03a0d0cb5e61661f64ee629e8d2d0e9ba6b/sci-libs/composable-kernel/files/composable-kernel-6.3.0-no-inline-all.patch

This is on rocm-6.3.0 + a few patches, some related to the inline pass performance issues. https://github.com/LunNova/llvm-project-rocm/commits/lunnova/rocm-6.3.x-optimized/

@Snektron
Contributor

Snektron commented Dec 20, 2024

> Overall I'm searching for a solution to this problem too. LLVM 19 added new targets like gfx9-generic, so there are 6 targets at most, but even then it is still too much.

Be advised that these targets are often not tested or optimized for. For example, rocPRIM does not have tunings for these architectures.

@Shawn8901
Contributor

Shawn8901 commented Dec 23, 2024

FYI: the ROCm toolchain seems to be mostly broken on staging-next ( https://hydra.nixos.org/eval/1810617?filter=rocm&compare=1810613&full=#tabs-still-fail ) due to compiler-rt failing on the compilation of the testing tools ( https://hydra.nixos.org/build/281942876/nixlog/3 ). Not sure if someone here wants to take a look / has an idea.
It might be worth a look before staging-next gets merged. Test builds should be quite easy, as most of the packages are already in the cache.

@AngryLoki

AngryLoki commented Dec 23, 2024

> I'm not seeing a significant improvement in composable_kernel compilation

You are using -DCMAKE_CXX_COMPILER=hipcc when building composable_kernel, and hipcc adds an extra hidden -mllvm -amdgpu-early-inline-all=true; that's why https://github.com/gentoo/gentoo/blob/582df03a0d0cb5e61661f64ee629e8d2d0e9ba6b/dev-util/hipcc/hipcc-6.3.0.ebuild#L73-L75 removes it too. The effect of this flag is very noticeable.

As an alternative solution, CMAKE_CXX_COMPILER=clang++ can now compile HIP code pretty well without the hidden flags (but this is novel).

@LunNova
Member

LunNova commented Dec 23, 2024

> You are using -DCMAKE_CXX_COMPILER=hipcc when building composable_kernel

$ nix log .#rocmPackages.composable_kernel_build | grep CMAKE_CXX_COMPILER                                            
CMAKE_CXX_COMPILER: /nix/store/45aqs6snwhnnhsvpbjrbrf4hfsk2pdix-clr-6.3.1/bin/clang++

@AngryLoki

> > You are using -DCMAKE_CXX_COMPILER=hipcc when building composable_kernel
>
> $ nix log .#rocmPackages.composable_kernel_build | grep CMAKE_CXX_COMPILER
> CMAKE_CXX_COMPILER: /nix/store/45aqs6snwhnnhsvpbjrbrf4hfsk2pdix-clr-6.3.1/bin/clang++

Maybe it is overridden. I checked https://hydra.nixos.org/build/253114834/log, and I see both -DCMAKE_CXX_COMPILER=clang++ ... -DCMAKE_CXX_COMPILER=hipcc set.

@LunNova
Member

LunNova commented Dec 23, 2024

I have been testing in this repo: https://github.com/LunNova/ml.nix/blob/main/rocm-6/composable_kernel/default.nix

@GZGavinZhao
Contributor

Hi, maintainer of the ROCm stack for Solus here and author of #298388 and #305920. I just finished updating the entire Solus ROCm stack to v6.2.4 (we're generally one minor version behind to wait for PyTorch to catch up) and I'm also in the process of trying to update ROCm in nixpkgs to v6.3. Feel free to reference our package recipes for the patches, compiler flags, and environment variables we used. Just search for the package name in the search bar and you should find the recipe in a package.yml file. I'm also happy to help with building composable_kernel because I had a ton of (admittedly unpleasant) experience trying to compile it 😅

@LunNova
Member

LunNova commented Dec 23, 2024

Turned the out-of-tree 6.3 work into a draft PR: #367695

@LunNova
Member

LunNova commented Dec 23, 2024

Related issue for packaging the ROCm out-of-tree amdgpu driver: #366242

Not needed for single-GPU setups; required for InfiniBand/RoCE clusters; probably required for multi-GPU single-node PCIe P2P to work (DMABUF IPC support in mainline appears to be broken).
