-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cudaPackages.autoAddOpenGLRunpathHook: fix to skip unsupported #245789
Conversation
5ca97f2
to
7fac6b1
Compare
local outputPaths=($(for o in $(getAllOutputNames); do echo "${!o}"; done)) | ||
find "${outputPaths[@]}" -type f -executable -print0 | while IFS= read -rd "" f; do | ||
if isELF "$f"; then | ||
if containsAny "$(patchelf --print-needed "$f" 2>/dev/null || :)" "${openGLRunpathLibs[@]}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern here is that -- IIRC -- we have previously encountered situations in which a library attempts to load libcuda.so
(or others) but does not include it in DT_NEEDED. Not sure if there's a convenient resolution for that.
I suppose it could be argued that manual intervention should be required in those cases. But for whatever reason I don't think that's the approach that we've taken so far.
Perhaps @SomeoneSerge @ConnorBaker might have other thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I suspected that would be the case. autoPatchelf
is much stricter about this and requires that NEEDEDs be configured correctly for the auto-patching behavior. I think that's certainly the ideal place we should be, but it might be too painful to go there right now. I figured I'd put the ideal solution up first and we could walk our way back from there as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one "saving grace" we may be able to leverage is the fact that autoAddOpenGLRunpathHook
is actually only used in about a dozen places, and most of them are in cudaPackages itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one "saving grace" we may be able to leverage is the fact that autoAddOpenGLRunpathHook is actually only used in about a dozen places, and most of them are in cudaPackages itself.
Indeed this is convenient. However the main challenge is that we don't (yet) have a great way to test for failures... the sneaky ones all occur at runtime and require a GPU, something that nixpkgs checkPhase isn't currently set up for. Solutions to this are being discussed in #225912.
For the scope of this PR, I propose we as @NixOS/cuda-maintainers establish a policy as to whether autoAddOpenGLRunpathHook should respect DT_NEEDED or not, and then we can proceed from there. Waiting on #225912 could take a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! Thank you. I'm happy to adjust this PR for expediency (likely by only filtering on a successful call to patchelf --print-needed
so as to filter out statically linked exes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dt_needed has an extra effect that a needed soname is loaded at the process startup (or at dlopen of the depending library). E.g. if nvidia-smi had libcuda.so as needed you couldn't run nvidia-smi -h on a system without the driver.
Ooh that's a good point I had not previously considered. I imagine that folks might eg maintain a shared cache with cudaSupport = true
but then run the same software on non-GPU machines.
It'd be nice though to know in advance what an executable might try to load. E.g. if this info were included in the nvidia.s manifests...
I suppose we could modify the hook to patch a more specific set of libraries, something like openGLHookPatchThese = [ "foo.so", "bar.so.*" ];
. That would be akin to the manual addOpenGLRunpath
function, but could be made more ergonomic with globbing, DT_NEEDED detection, and so forth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just about to propose the exact same idea. It would be nice to have some control over what this hook patches. A a regex/glob that defaults to *
would at least let me limit it without having to write my own loops with addOpenGLRunpath
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also start by testing binaries whether they're dynamically linked instead. From what I gather that should address the failures already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll change it to only do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made that change.
pkgs/development/compilers/cudatoolkit/auto-add-opengl-runpath-hook.sh
Outdated
Show resolved
Hide resolved
5828d07
to
029bf6d
Compare
# autoAddOpenGLRunpathHook does not actually depend on or incur any dependency | ||
# of cudaPackages. It merely adds an impure, non-Nix PATH to the RPATHs of | ||
# executables that need to use cuda at runtime. | ||
cudaPackages_12.autoAddOpenGLRunpathHook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why hardcode the vetsion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This package depends on 3 different versions of cudaPackages
, none of which are the default cudaPackages
. Instead of pulling in a 4th variant of cudaPackages
, just for this, I randomly chose one of the existing ones. It's a shame this hook is attached to versions like this.
|
||
if [ -z "${dontUseAutoAddOpenGLRunpath-}" ]; then | ||
echo "Using autoAddOpenGLRunpathPhase" | ||
postFixupHooks+=(autoAddOpenGLRunpathPhase) | ||
echo "Using autoAddOpenGLRunpathPhase" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a formatting change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rewrite uses 2-space tabs so this is to make it consistent.
local outputPaths=($(for o in $(getAllOutputNames); do echo "${!o}"; done)) | ||
find "${outputPaths[@]}" -type f -executable -print0 | while IFS= read -rd "" f; do | ||
if isELF "$f"; then | ||
if patchelf --print-needed "$f" >/dev/null 2>&1; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the exit code that's significant here? Can we add a comment about how this works/what is the purpose of the test/what does the exit code mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to --print-interpreter
and added a comment.
❯ cat main.go
package main
func main() {}
❯ go build main.go
❯ patchelf --print-interpreter main; echo $?
patchelf: cannot find section '.interp'. The input file is most likely statically linked
1
❯ patchelf --print-interpreter $(realpath $(which patchelf)); echo $?
/nix/store/vfmkdh2s2cnnk3igcn0bjpx38kl3zw8a-glibc-2.37-8/lib/ld-linux-x86-64.so.2
0
Thank you @de11n ! Could you also run the script through shellcheck? |
a79d9b6
to
8b4275f
Compare
Result of 185 packages marked as broken and skipped:
257 packages failed to build:
1136 packages built:
|
Failed derivations
|
haven't inspected every failure log, but nothing jumps out at me as problematic here |
any blockers to merging this? |
I also haven't looked through all of them, but a number of them are test suites failing which don't look related. I think it's good to merge! |
ok, just merged! thanks everyone! |
03e72e4 seems to have introduced a regression, affecting pytorchWithCuda&c:
...I wonder if we could somehow unit-test these helper functions ( |
I was wrong about On the other hand, testing for
|
this is also a good usecase for us having a GPU-enabled test suite 🤔 |
Description of changes
cudaPackages.autoAddOpenGLRunpathHook
is far too liberal about what it attempts to patch. For this reason, it currently fails when run on a package that has statically linked executables.This rewrite does 3 things:
Limits which executables get patched based on theirDT_NEEDED
sPoints 1 and 2 are non-controversial. However, point 3 has potential fallout or unintended consequences that must be reviewed/considered. For example, while the currentautoAddOpenGLRunpathHook
patches 690 files in thecudaPackages
set, this version patches only 9 files. (This was determined by instrumenting the hook, manually building before/after versions, and inspecting the logs.1)If the idea of limiting the patching behavior is acceptable, we may need to tune it to patch the right things.EDIT: Feature 3 was removed.
cc @NixOS/cuda-maintainers @samuela
This is follow-up from #235024 (comment).
Things done
sandbox = true
set innix.conf
? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Footnotes
I used
nix-build -E 'with import ./. {}; lib.mapAttrs (n: v: if builtins.any (x: x == "x86_64-linux") v.meta.platforms or [] then v else null) cudaPackages' --keep-going |& tee after.txt
and thengrep 'autoAddOpenGLRunpathHook: Patching' after.txt | wc -l
. ↩