[Torch] Updated recipe #9785

Merged: 4 commits into JuliaPackaging:master on Dec 8, 2024

Conversation

@stemann (Contributor) commented Nov 14, 2024

Updated Torch recipe

  • Changed to provide lazy artifacts.
  • Replaced the CUDA_full sources with CUDA module dependencies (and similarly for CUDA 11.3).
  • Updated the CUDA part of the script accordingly.
  • Added an xcrun executable for macOS that can handle `--show-sdk-path`.
  • Switched to LLVM v13 (v17 has issues on macOS).
  • Added a cuda="none" tag to non-CUDA platforms that have matching CUDA platforms.

Added a comment to `should_build_platform` in fancy_toys.jl

`should_build_platform` returns true for e.g. "x86_64-linux-gnu-cxx11" when ARGS = ["x86_64-linux-gnu-cxx11-cuda+10.2"].
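The prefix-matching behavior described above can be illustrated with a small, hypothetical stand-in (this is not the actual fancy_toys.jl implementation; the function body is assumed for illustration only):

```julia
# Simplified, assumed stand-in for Yggdrasil's should_build_platform:
# a platform is built if no platforms were requested, or if any requested
# triplet starts with the platform's triplet.
should_build_platform(triplet, args) =
    isempty(args) || any(arg -> startswith(arg, triplet), args)

# Only the CUDA variant is requested, but the plain cxx11 triplet is a
# prefix of the requested one, so it is (perhaps unexpectedly) built too:
should_build_platform("x86_64-linux-gnu-cxx11",
                      ["x86_64-linux-gnu-cxx11-cuda+10.2"])  # true
```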

Added NVTX RuntimeDependency

@stemann force-pushed the feature/torch_1.10.2+1 branch 7 times, most recently from 257a506 to fbaf1dd on November 14, 2024 23:38
@stemann force-pushed the feature/torch_1.10.2+1 branch 3 times, most recently from 7962da4 to 172225c on November 17, 2024 07:49
@stemann force-pushed the feature/torch_1.10.2+1 branch from ffc57b4 to 59fd149 on December 1, 2024 18:23
@stemann force-pushed the feature/torch_1.10.2+1 branch from 7bdd1cf to a2c72e4 on December 1, 2024 23:45
@stemann changed the title from "[Torch] Changed recipe to use CUDA module" to "[Torch] Updated recipe" on Dec 1, 2024
@stemann marked this pull request as ready for review on December 1, 2024 23:46
Review comment on T/Torch/build_tarballs.jl (outdated, resolved)
@stemann force-pushed the feature/torch_1.10.2+1 branch from be2a7d7 to dadefa6 on December 2, 2024 11:02
@stemann (Contributor, Author) commented Dec 3, 2024

@maleadt Does this look alright to you with respect to the CUDA aspects?

I.e., using lazy artifacts, and building the x86_64-linux-gnu platforms for cuda="none", cuda="10.2", and cuda="11.3".

The goal is for Torch.jl to be automatically loadable (also) on x86_64-linux-gnu (for General registration etc.), while also selecting CUDA artifacts when appropriate (in accordance with the CUDA platform augmentation implementation).
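The platform set described here can be sketched with Base.BinaryPlatforms. The helper name and the hard-coded version list are assumptions for illustration, not the recipe's actual code:

```julia
using Base.BinaryPlatforms

cuda_versions = ["10.2", "11.3"]

# Assumed helper: one x86_64-linux-gnu-cxx11 platform per cuda tag value.
make_platform(cuda) = Platform("x86_64", "linux"; cxxstring_abi="cxx11", cuda=cuda)

platforms = [make_platform("none")]                # CPU-only artifact, tagged cuda="none"
append!(platforms, make_platform.(cuda_versions))  # one artifact per CUDA version

# The cuda tag shows up in the triplet, which is what platform
# augmentation matches against at load time:
foreach(p -> println(triplet(p)), platforms)
```

Tagging the CPU build cuda="none" (rather than leaving it untagged) keeps it distinguishable from the CUDA variants during artifact selection.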

@maleadt (Contributor) commented Dec 3, 2024

I think that looks right. Normally you'd use CUDA.is_supported to determine whether to build a cuda+none variant; here that happens not to matter because there is no Windows build, but it might still be better to do it the "proper" way.

That said, I'm not entirely convinced anymore that unified CPU+GPU artifacts are the best way forward (as opposed to split packages). There's some fragility, e.g. with the none tag, but it also removes the ability to use a non-GPU build as soon as the NVIDIA driver is installed. I'll leave that up to you to decide, though.

@stemann (Contributor, Author) commented Dec 3, 2024

Great, thanks!

I was a little unhappy with the handling around those lines anyway (the duplicated platforms/cuda_platforms loop), so I've adopted CUDA.is_supported (though I also had to add special handling for aarch64).

I agree that it might be more right to have separate packages... but keeping the current approach for now (to move things forward).
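The CUDA.is_supported-based filtering could look roughly like this. The predicate body, the platform list, and the way aarch64 is handled are all assumptions for illustration; the real check lives in Yggdrasil's platforms/cuda.jl:

```julia
using Base.BinaryPlatforms

# Assumed stand-in for CUDA.is_supported: here, only x86_64 Linux counts
# (the recipe reportedly also needed special handling for aarch64).
cuda_is_supported(p) = Sys.islinux(p) && arch(p) == "x86_64"

# Expand a platform into cuda="none" plus one variant per CUDA version.
function expand_cuda(p, versions)
    map(["none"; versions]) do v
        q = deepcopy(p)
        q["cuda"] = v   # Base.BinaryPlatforms allows setting tags by index
        q
    end
end

base_platforms = [Platform("x86_64", "linux"; cxxstring_abi="cxx11"),
                  Platform("aarch64", "macos")]

platforms = Platform[]
for p in base_platforms
    if cuda_is_supported(p)
        append!(platforms, expand_cuda(p, ["10.2", "11.3"]))
    else
        push!(platforms, p)   # unsupported platforms get no cuda tag at all
    end
end
```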

@stemann (Contributor, Author) commented Dec 3, 2024

Hoping the materialisation of BinaryBuilder2 will bring pkg extensions to Yggdrasil, to handle CUDA variants of JLLs...

@stemann requested a review from giordano on December 3, 2024 20:25
@giordano merged commit 7deefac into JuliaPackaging:master on Dec 8, 2024
15 checks passed
@stemann deleted the feature/torch_1.10.2+1 branch on December 9, 2024 08:19
@stemann (Contributor, Author) commented Dec 9, 2024

@maleadt So, since Torch_jll depends on CUDNN_jll, which depends on CUDA_Runtime_jll, which in turn depends on CUDA_Driver_jll, is it impossible to get the cuda="none" artifact (CUDA_Driver_jll finds a "Forward-compatible driver"), or is there a trick I've missed?

I tried setting version="none" for CUDA_Runtime_jll, but that does not seem to be anything other than "imagined API" :-)

Debugging in https://github.com/stemann/Torch.jl/actions/runs/12233357984/job/34120277156

@maleadt (Contributor) commented Dec 9, 2024

> is it impossible to get the cuda="none" artifact (CUDA_Driver_jll finds a "Forward-compatible driver")

Correct; that's what I meant with #9785 (comment):

> it removes the ability to use a non-GPU build as soon as you have the NVIDIA driver installed
