[Tracking] ROCm packages #197885
Comments
Updating to 5.3.1, marking all WIP until pushed to their respective PRs and verified. |
Hi, thanks a lot for your work on ROCm packages! So far, the updates were all aggregated in a single
tl;dr: do you mind merging all your 5.3.1 updates into a single PR? PS: Not sure how you did the update; I usually do it with |
I was actually afraid of the opposite being true so I split them up. |
HIP should stay separate though, I think, since there are other changes. |
As per pytorch/pytorch#119081 (comment), in 2.4.0+ (future release) it should be possible to use something like:
pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
  (python-final: python-prev: {
    torch = python-prev.torch.overrideDerivation (oldAttrs: {
      TORCH_BLAS_PREFER_HIPBLASLT = 0; # not yet in nixpkgs
    });
  })
]; |
@ony, TORCH_BLAS_PREFER_HIPBLASLT is a runtime environment variable; pytorch still links against and requires hipblaslt, even when it is unused. pytorch/pytorch#120551 should help, but I have no idea whether or when it will be accepted. By the way, hipblaslt is not difficult to build. Just don't build the 6.0 release; skip directly to 6.1. When I tried, the bundled TensileLite in 6.0 generated a wall of unreadable errors, while 6.1 worked on the first attempt. |
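Since the variable is read at runtime rather than at build time, one way to try it out is to export it in the environment that runs Python instead of setting it on the torch derivation. The following is a minimal sketch, not something posted in this thread: the file name `shell.nix` and the choice of `ps.torch` are assumptions, and it presumes a torch build with ROCm support and a PyTorch version that actually honors the variable (2.4.0+ according to the comment above). It does not remove the link-time dependency on hipblaslt.

```nix
# shell.nix: hypothetical development shell, for illustration only.
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  # Assumption: this torch is built with ROCm support in your nixpkgs
  # configuration; swap in whichever torch attribute you actually use.
  packages = [ (pkgs.python3.withPackages (ps: [ ps.torch ])) ];

  # Extra attributes passed to mkShell are exported as environment
  # variables inside the shell, so Python sees this value at runtime.
  TORCH_BLAS_PREFER_HIPBLASLT = "0";
}
```

Entering the shell with `nix-shell` and running `echo $TORCH_BLAS_PREFER_HIPBLASLT` should confirm that the variable is set before Python starts.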
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/4 |
I'm not able to build rocmlir-rock-6.0.2, when trying to install zluda.
Is there an easy fix for it? |
@DerDennisOP, it was addressed in pull request ROCm/rocMLIR#1640 (issue ROCm/rocMLIR#1620); you may want to use it. |
@DerDennisOP @AngryLoki I think you'll also need ROCm/rocMLIR#1542 (closes ROCm/rocMLIR#1500). It is a similar patch in a nearby file. |
Is there a plan to patch these things upstream too? As far as I can see, the Hydra logs show the same error as @DerDennisOP. |
Right now I do not have the time to update ROCm, but I could help out as a reviewer. |
While I do not have that much experience with ROCm, I could try it. |
FWIW, I have a branch where I have tried updating things to 6.2.4. Unfortunately, I am seeing linking failures in the
I don't think any of those PRs are viable. Apart from the MRs themselves not building, the auto-updater generally doesn't seem to respect the fact that ROCm components expect to be upgraded in lock-step. |
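A quick way to see whether a given nixpkgs checkout has drifted out of lock-step is to compare the `version` attribute of several ROCm derivations. This is a rough sketch only: the file name is hypothetical, the four attribute names are an assumed sample based on the package names in this issue's update command, and it treats "same version string" as a stand-in for "upgraded together".

```nix
# check-rocm-versions.nix: hypothetical helper, not part of nixpkgs.
let
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;

  # Assumption: these attributes exist under rocmPackages in your checkout.
  names = [ "clr" "rocblas" "miopen" "composable_kernel" ];

  # Map each attribute name to the version its derivation reports.
  versions = lib.genAttrs names (name: pkgs.rocmPackages.${name}.version);
in
{
  inherit versions;
  # True only when every sampled package reports the same version.
  inLockStep = lib.length (lib.unique (lib.attrValues versions)) == 1;
}
```

It can be evaluated with `nix-instantiate --eval --strict check-rocm-versions.nix`.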
FWIW, I have opened a draft MR to record the state of my attempt: #364423 |
I have a mix of 6.3 and 6.2 working here with pytorch nightly, but it is in no state to upstream. It might be helpful for someone with more time who is trying to fix it in nixpkgs. |
It may now be infeasible for Hydra to build some critical ROCm packages on its current runners within the 10-hour time limit. |
I think it's necessary to design a scheme that allows |
@LunNova , regarding memory/time consumption you may want to disable |
In the future amdgcnspirv might be a solution: ROCm/ROCm#3985 (comment) |
I'm not seeing a significant improvement in composable_kernel compilation time from applying https://github.com/gentoo/gentoo/blob/582df03a0d0cb5e61661f64ee629e8d2d0e9ba6b/sci-libs/composable-kernel/files/composable-kernel-6.3.0-no-inline-all.patch This is on rocm-6.3.0 + a few patches, some related to the inline pass performance issues. https://github.com/LunNova/llvm-project-rocm/commits/lunnova/rocm-6.3.x-optimized/ |
Be advised that these targets are often not tested or optimized for. For example, rocPRIM does not have tunings for these architectures. |
fyi: rocm toolchain seems to be mostly broken on staging-next ( https://hydra.nixos.org/eval/1810617?filter=rocm&compare=1810613&full=#tabs-still-fail ) due to compiler-rt failing on the compilation of the testing tools ( https://hydra.nixos.org/build/281942876/nixlog/3 ). Not sure if someone here might want to take a look / has an idea. |
You are using As alternative solution, |
Maybe it is overridden, I checked in https://hydra.nixos.org/build/253114834/log -
|
Have been testing in this repo: https://github.com/LunNova/ml.nix/blob/main/rocm-6/composable_kernel/default.nix |
Hi, maintainer of the ROCm stack for Solus here and author of #298388 and #305920. I just finished updating the entire Solus ROCm stack to v6.2.4 (we're generally one minor version behind to wait for PyTorch to catch up) and I'm also in the process of trying to update ROCm in nixpkgs to v6.3. Feel free to reference our package recipes for the patches, compiler flags, and environment variables we used. Just search for the package name in the search bar and you should find the recipe in a |
Turned the out of tree 6.3 into a draft PR: #367695 |
Related issue for packaging the ROCm out-of-tree amdgpu driver: #366242. It is not needed for single-GPU setups, is required for InfiniBand/RoCE clusters, and is probably required for multi-GPU single-node PCIe P2P to work (DMABUF IPC support in mainline appears to be broken). |
Tracking issue for ROCm derivations.
Key
WIP
Ready
TODO
Merged
ROCm-related
Notes
nix-shell maintainers/scripts/update.nix --argstr commit true --argstr keep-going true --arg predicate '(path: pkg: builtins.elem (pkg.pname or null) [ "rocm-llvm-llvm" "rocm-core" "rocm-cmake" "rocm-thunk" "rocm-smi" "rocm-device-libs" "rocm-runtime" "rocm-comgr" "rocminfo" "clang-ocl" "rdc" "rocm-docs-core" "hip-common" "hipcc" "clr" "hipify" "rocprofiler" "roctracer" "rocgdb" "rocdbgapi" "rocr-debug-agent" "rocprim" "rocsparse" "rocthrust" "rocrand" "rocfft" "rccl" "hipcub" "hipsparse" "hipfort" "hipfft" "tensile" "rocblas" "rocsolver" "rocwmma" "rocalution" "rocmlir" "hipsolver" "hipblas" "miopengemm" "composable_kernel" "half" "miopen" "migraphx" "rpp-hip" "mivisionx-hip" "hsa-amd-aqlprofile-bin" ])'
Won't implement
strictDeps for all derivations