
Build Torch from source #4554

Merged (3 commits) Sep 13, 2022

Conversation

@stemann (Contributor) commented Mar 6, 2022

Alternative to #4477, as that bumped into FluxML/Torch.jl#17

Building Torch from source (for Linux and Linux with CUDA) would likely remedy (at least):

Related aims (but not the aim of this PR):

  • Updating Torch version.
  • Windows support.
  • Windows + CUDA support.
  • macOS support.

@stemann force-pushed the stemann/torch_from_source branch 3 times, most recently from 31ce087 to ae939c8, on March 9, 2022 12:32
@stemann force-pushed the stemann/torch_from_source branch 4 times, most recently from 4b4943e to 35e09b5, on March 18, 2022 00:51
# Derive the CUDA minor version and locate the unpacked CUDA_full artifact.
cuda_version_minor=`echo $cuda_version | cut -d . -f 2`
cuda_full_path="$WORKSPACE/srcdir/CUDA_full.v$cuda_version/cuda"
# Downgrade CMake below v3.17 so that Torch's old FindCUDA-based CUDA
# detection still works (see the discussion below).
apk del cmake
apk add 'cmake<3.17' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.11/main
Member:

Wait, what?

@stemann (Author):

I got fed up trying to hack the "find CUDA" part of Torch; downgrading CMake seems to work.

Torch seems to be using the "old" FindCUDA approach (https://cmake.org/cmake/help/latest/module/FindCUDA.html, i.e. find_package(CUDA)), which, as I understand it, was replaced by FindCUDAToolkit (https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html, find_package(CUDAToolkit)) in CMake v3.17.

Member:

Sigh

@stemann (Author):

At least it seems cleaner to me than patching some set of CMake files :-)

@stemann (Author):

... and then it works in the BB shell (e.g. with --debug=begin), but not in auto mode...

Member:

> ... and then it works in the BB shell (e.g. with --debug=begin), but not in auto mode...

I'm not 100% sure what you mean here, but note that package installation and environment-variable settings aren't persistent at the moment when dropping into the debug shell (packages are installed in a tmpfs which is lost when recreating the debug environment, and we don't have a way to remember all environment settings). It may be possible to address some of these issues in the future, but for now you have to repeat those operations manually.
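For example (a hedged illustration, reusing the apk commands quoted above): after dropping into the debug shell, the CMake downgrade done by the build script has to be redone by hand before retrying the configure step.

```bash
# Inside the BinaryBuilder debug shell (--debug=begin), package installs done
# by the build script are gone (tmpfs), so repeat them manually:
apk del cmake
apk add 'cmake<3.17' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.11/main
```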

@stemann (Author):

Yeah, that was a bit too terse. I meant that I could get CMake to find CUDA when running interactively, but not when just doing a build. It turned out to be something weird about the (copied/forked) FindCUDA/FindCUDAToolkit CMake code in pytorch/cmake; for now I've ended up with the hack of running "configure" two or three times.

@stemann (Author):

It might actually work with the bundled CMake v3.21 now; I have to check...

@stemann force-pushed the stemann/torch_from_source branch from 29cfcc0 to 3c03c11, on March 18, 2022 01:43
@stemann force-pushed the stemann/torch_from_source branch 19 times, most recently from 6cdc1c9 to 88f338a, on March 19, 2022 23:42
Comment on lines 245 to 262
Dependency(PackageSpec(name="CompilerSupportLibraries_jll", uuid="e66e0078-7015-5450-92f7-15fbd957f2ae")),
Dependency("blis_jll"; platforms = blis_platforms),
Dependency("CPUInfo_jll", v"0.0.20201217"),
Dependency("CUDNN_jll", v"8.2.4"; compat = "8", platforms = cuda_platforms),
Dependency("Gloo_jll", v"0.0.20210521"; platforms = filter(p -> nbits(p) == 64, platforms)),
Dependency("LAPACK_jll"; platforms = openblas_platforms),
Dependency("MKL_jll"; platforms = mkl_platforms),
BuildDependency("MKL_Headers_jll"; platforms = mkl_platforms),
Dependency("OpenBLAS_jll"; platforms = openblas_platforms),
Dependency("PThreadPool_jll", v"0.0.20210414"),
Dependency("SLEEF_jll", v"3.5.2"),
# Dependency("TensorRT_jll"; platforms = cuda_platforms), # Building with TensorRT is not supported: https://github.com/pytorch/pytorch/issues/60228
Dependency("XNNPACK_jll", v"0.0.20210622"),
BuildDependency(PackageSpec("protoc_jll", Base.UUID("c7845625-083e-5bbe-8504-b32d602b7110"), v"3.13.0")),
HostBuildDependency(PackageSpec("protoc_jll", Base.UUID("c7845625-083e-5bbe-8504-b32d602b7110"), v"3.13.0")),
Member:

When you specify the version of a runtime dependency you typically also want to specify the compat bound (or actually only the compat, since the build version is automatically inferred from the lowest compatible version).
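A minimal sketch of the suggestion, using the CUDNN entry from the excerpt above (the compat-only form is the suggested change):

```julia
# As currently pinned in the recipe (explicit build version + compat):
Dependency("CUDNN_jll", v"8.2.4"; compat = "8", platforms = cuda_platforms)

# Suggested: specify only compat; BinaryBuilder then infers the build
# version from the lowest release compatible with "8".
Dependency("CUDNN_jll"; compat = "8", platforms = cuda_platforms)
```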

@stemann force-pushed the stemann/torch_from_source branch 6 times, most recently from 8f0437d to 931edf6, on September 12, 2022 08:09
* Removed Torch v1.4.0 which included Torch.jl wrapper
* Skipped Torch.jl wrapper
* With MKL dependency on MKL-platforms
* Using protoc v3.13.0 JLL.
* Added protoc as a build dependency to get correct version
* Not using ONNX dependency to get past protoc issue
* Added micromamba install of pyyaml and typing_extensions - needed for build.
* Using XNNPACK JLL dependency
* Added CPUInfo and PThreadPool dependencies
* Added SLEEF dependency
* Turned off some features explicitly to silence some configure warnings
* Not using NNPACK or QNNPACK, and limited PYTORCH_QNNPACK to x86_64.
* Disabled use of breakpad on aarch64-linux-gnu
* Enabled configure on Windows via patch and disabling breakpad
* Disabled use of TensorPipe on linux-musl
* Excluded unsupported powerpc64le and i686-windows platforms
* Disabled kineto for w64 and freebsd
* Disabled breakpad for FreeBSD
* Disabled use of MKLDNN on macOS
* Added Gloo dependency - to aid linux-musl
* Disabled MKLDNN for linux-musl
* Disabled FreeBSD as Clang v12 crashes
* Disabled MKLDNN for w64-mingw32
* Using MKL, BLIS, or OpenBLAS + LAPACK - preferring MKL or BLIS
  * Restricted use of LAPACK to OpenBLAS platforms
  * Set preferred BLAS for armv6l-linux-gnu
* Disabled FBGEMM for x86_64-w64-mingw32
* Added MKL_Headers as dependency
  * Disabled MKL for Windows as CMake cannot find MKL
* Optimized git submodule update
* Added note about disabling MKLDNN for x86_64-apple-darwin
* Fixed a few warnings related to FBGEMM
* Fixed windows warning related to TensorPipe
* Disabled Metal to silence warning that it is only used on iOS
* Silence cmake developer warnings
* Disabled linux-musl and Windows
* Added additional library product libtorch_cpu
* Added SO version to libraries and disabled numpy
* Set GLIBCXX_USE_CXX11_ABI - like official libtorch builds.
* Added platform expansion for C++ string ABIs
* Added dep build versions and/or compat
* Disabled ARM 32-bit platforms
* Fixup for FBGEMM warning on aarch64-apple-darwin
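Many of these bullets map to configure-time feature toggles. As a hedged sketch of what such an invocation might look like (USE_* and GLIBCXX_USE_CXX11_ABI are standard PyTorch CMake options, but the authoritative flag set and per-platform values live in this PR's build_tarballs.jl):

```bash
# Illustrative only; see the recipe for the real per-platform flag set.
cmake -S pytorch -B build \
    -DUSE_NNPACK=OFF \
    -DUSE_QNNPACK=OFF \
    -DUSE_MKLDNN=OFF \
    -DUSE_FBGEMM=OFF \
    -DGLIBCXX_USE_CXX11_ABI=1
```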
@stemann force-pushed the stemann/torch_from_source branch from e7dc80e to afc8ffa, on September 12, 2022 11:15
* Using CUDA_full v11.3 to use v11.3.1+1 which includes Thrust library.
* Using CUDNN v8.2.4 for build version (similar to ONNXRuntime)
* Added patch for cmake to find CUDA
* Set CUDACXX to make cmake find CUDA
* Added CUDA libraries manually - and enabled CUDNN
* Added double-triple configure hack to make CUDA configure - To get past TRY_RUN for CUDA
* Added CUDA headers to CMAKE_INCLUDE_PATH
* Additional fixes for CUDA - and CUB
* Set TMPDIR for nvcc
* Added additional CUDA libraries
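Roughly, these bullets correspond to environment setup along the following lines (a hedged sketch: $cuda_full_path comes from the script excerpt earlier in the thread, $WORKSPACE/tmpdir is an assumed location, and the exact values are in the recipe):

```bash
# Illustrative only; exact paths/values are in the recipe.
export CUDACXX=$cuda_full_path/bin/nvcc            # let CMake find nvcc
export CMAKE_INCLUDE_PATH=$cuda_full_path/include  # expose CUDA headers
export TMPDIR=$WORKSPACE/tmpdir                    # temp dir for nvcc (assumed path)
mkdir -p $TMPDIR
# The "double-triple configure hack": re-run configure so the CUDA TRY_RUN
# checks eventually pass.
cmake -S pytorch -B build -DUSE_CUDA=ON
cmake -S pytorch -B build -DUSE_CUDA=ON
```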
@stemann force-pushed the stemann/torch_from_source branch from f466550 to 2dc37fa, on September 12, 2022 13:42
@stemann marked this pull request as ready for review on September 12, 2022 13:42
@giordano (Member):

Is this good to go now?

@stemann (Author) commented Sep 13, 2022

> Is this good to go now?

Yes, LGTM :-)

I plan to follow up with a PR adding a recipe for building an updated version of the C wrapper in https://github.com/FluxML/Torch.jl/tree/master/build

@rayegun (Contributor) commented Sep 13, 2022

> I plan to follow up with a PR adding a recipe for building an updated version of the C wrapper in

Awesome! I was just looking at whether to do the OCaml or Rust one today.
