CUDA build #120

Closed
wants to merge 33 commits into from

@izahn

izahn commented May 29, 2021

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase "@conda-forge-admin, please rerender" in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

This is a cleaned up PR that adds a CUDA build. It is based on the work and feedback in #118

The main issue is that I couldn't get a CUDA version to build with system libraries. I finally gave up and unset TF_SYSTEM_LIBS for the CUDA builds only (other builds should still use system libraries as before). It would be even more awesome if we could get CUDA builds and use conda system libraries. I tried hard but couldn't figure out how to do it, and I think a CUDA build will be valuable, even if we can't get system libraries working.
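
The CUDA-only opt-out described above might be sketched as follows for recipe/build.sh. This is a minimal sketch, not the PR's actual code: it assumes the conda-forge variant variable `cuda_compiler_version` is exported and is `None` (or unset) for CPU builds, and the function name `configure_system_libs` is hypothetical.

```shell
# Hypothetical helper: keep TF_SYSTEM_LIBS for CPU builds, drop it for CUDA builds.
# Assumes `cuda_compiler_version` is "None" (or unset) when CUDA is disabled.
configure_system_libs() {
  if [ "${cuda_compiler_version:-None}" != "None" ]; then
    # CUDA build: let bazel build its bundled third-party libraries instead.
    unset TF_SYSTEM_LIBS
  fi
}
```

Calling `configure_system_libs` just before the bazel invocation would leave the non-CUDA builds untouched, which matches the intent stated above.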

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

Member

@xhochy xhochy left a comment

Have a look at the bazel toolchain and the scripts that generate it. This was the groundbreaking part for me to get the CPU builds working.

@izahn
Author

izahn commented May 31, 2021

Have a look at the bazel toolchain and the scripts that generate it. This was the groundbreaking part for me to get the CPU builds working.

It turns out that the CFLAGS set in https://github.com/conda-forge/nvcc-feedstock/blob/master/recipe/install_nvcc.sh were the cause of the "The include path '/usr/local/cuda/include' references a path outside of the execution root." errors I was getting, and those errors were what led me to avoid the custom_toolchain. I think I have a work-around for that, and am building locally with custom_toolchain now. If it works locally I'll push these changes.

@izahn
Author

izahn commented May 31, 2021

I think I'm making progress, but hit another snag I don't understand:

ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/work/tensorflow/compiler/mlir/lite/quantization/BUILD:138:20: Linking of rule '//tensorflow/compiler/mlir/lite/quantization:op_quant_spec_getters_gen' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/conda/.cache/bazel/_bazel_conda/f7f47ae89e86280f733605a1378b0008/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/work:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/_build_env:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622495488587/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/host/bin/tensorflow/compiler/mlir/lite/quantization/op_quant_spec_getters_gen-2.params)
Execution platform: @local_execution_config_platform//:platform
/usr/bin/ld: cannot find -lz
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 5284.940s, Critical Path: 197.14s
INFO: 11007 processes: 8270 internal, 2737 local.
FAILED: Build did NOT complete successfully

Any ideas on this one? Full log at https://gist.github.com/izahn/47c950b53ffca4e8f68818b67538d495 in case that helps.

@izahn
Author

izahn commented Jun 1, 2021

Still having problems with this:

  [11,311 / 15,254] Compiling tensorflow/core/kernels/relu_op_gpu.cu.cc; 111s local ... (4 actions running)
ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/work/tensorflow/tools/proto_text/BUILD:31:10: Linking of rule '//tensorflow/tools/proto_text:gen_proto_text_functions' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/conda/.cache/bazel/_bazel_conda/8cea2436bc4316f162f1b5b3058eb302/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/work:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_build_env:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/host/bin/tensorflow/tools/proto_text/gen_proto_text_functions-2.params)
Execution platform: @local_execution_config_platform//:platform
/usr/bin/ld: cannot find -lprotobuf
/usr/bin/ld: cannot find -lsnappy
/usr/bin/ld: cannot find -lprotobuf
collect2: error: ld returned 1 exit status
ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/work/tensorflow/core/framework/BUILD:1261:31 Linking of rule '//tensorflow/tools/proto_text:gen_proto_text_functions' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/conda/.cache/bazel/_bazel_conda/8cea2436bc4316f162f1b5b3058eb302/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/work:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_build_env:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622561486085/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/host/bin/tensorflow/tools/proto_text/gen_proto_text_functions-2.params)
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 17383.472s, Critical Path: 1300.39s
INFO: 11640 processes: 6284 internal, 5356 local.
FAILED: Build did NOT complete successfully

Full log at https://gist.github.com/izahn/47c950b53ffca4e8f68818b67538d495

bazel is really killing me here, I still don't understand it.
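
One way to narrow down errors like "cannot find -lprotobuf" is to check whether the libraries are actually present in the directory passed via -L. The following is a minimal sketch that assumes the libraries are expected under ${PREFIX}/lib; the helper name `missing_libs` is hypothetical. If it prints nothing, the files exist, and a more likely explanation is that the -L flag never reaches the failing link command.

```shell
# Hypothetical helper: print each library name that has neither a shared
# (.so) nor a static (.a) file in the given directory.
# Usage: missing_libs DIR NAME...
missing_libs() {
  dir=$1
  shift
  for name in "$@"; do
    if [ ! -e "$dir/lib$name.so" ] && [ ! -e "$dir/lib$name.a" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example (library names taken from the linker errors above):
#   missing_libs "${PREFIX}/lib" z protobuf snappy
```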

recipe/build.sh Outdated
export TF_DOWNLOAD_CLANG=0
export TF_NEED_TENSORRT=0
export TF_NCCL_VERSION=""
BUILD_OPTS="${BUILD_OPTS} --config=cuda --linkopt=-L${PREFIX}/lib --define=LIBDIR=${PREFIX}/lib"
Contributor

Suggested change
- BUILD_OPTS="${BUILD_OPTS} --config=cuda --linkopt=-L${PREFIX}/lib --define=LIBDIR=${PREFIX}/lib"
+ BUILD_OPTS="${BUILD_OPTS} -s --config=cuda --linkopt=-L${PREFIX}/lib --define=LIBDIR=${PREFIX}/lib"

Apparently -s makes it print out the compilation commands.

Contributor

you might even want to try changing --linkopt to --copt or even --host_linkopt
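
The failing link commands above read their arguments from bazel-out/host/..., which suggests the affected tools are built in bazel's host configuration. In bazel, --linkopt and --copt apply to the target configuration, while --host_linkopt and --host_copt apply to tools built for the host. A sketch of passing the library path to both follows. It is not confirmed anywhere in this thread to fix the build, and `PREFIX` gets a placeholder default only so the snippet runs outside conda-build.

```shell
# PREFIX is set by conda-build; the fallback is a placeholder for illustration.
PREFIX="${PREFIX:-/path/to/host/prefix}"

BUILD_OPTS="${BUILD_OPTS:-}"
# Target configuration: code that ends up in the final package.
BUILD_OPTS="${BUILD_OPTS} --config=cuda --linkopt=-L${PREFIX}/lib"
# Host configuration: build-time tools such as gen_proto_text_functions.
BUILD_OPTS="${BUILD_OPTS} --host_linkopt=-L${PREFIX}/lib"
BUILD_OPTS="${BUILD_OPTS} --define=LIBDIR=${PREFIX}/lib"

echo "${BUILD_OPTS}"
```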

Author

Thanks, yeah, that is very verbose, but it didn't really give any more information about what goes wrong.

Contributor

how long does it take to get to failure?

Author

Time to failure varies, because the order isn't always the same. Can be anywhere from 30 minutes to several hours.

Contributor

sigh, this makes it harder to debug.
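
Since the failure point moves between runs, one way to shorten the loop is to pull the failing rule labels out of a log and then ask bazel to build only those targets. A minimal sketch: the helper name `failed_link_rules` is hypothetical, and the pattern is based only on the "Linking of rule '...' failed" lines quoted in this thread.

```shell
# Hypothetical helper: list the unique rule labels whose link step failed,
# reading a bazel log on stdin.
failed_link_rules() {
  sed -n "s/.*Linking of rule '\([^']*\)' failed.*/\1/p" | sort -u
}

# Example: failed_link_rules < build.log
# A printed label can then be built on its own, e.g.
#   bazel build ${BUILD_OPTS} //tensorflow/tools/proto_text:gen_proto_text_functions
```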

Author

you might even want to try changing --linkopt to --copt or even --host_linkopt

I think --copt=-L${PREFIX}/lib might have done the trick! I'm almost afraid to get my hopes up at this point, but we are safely past the place where the build failed last time. Fingers crossed!

Author

Nope, it fails later with

[12,897 / 29,765] Compiling tensorflow/lite/toco/graph_transformations/resolve_multiply_by_zero.cc [for host]; 3s local ... (4 actions running)
ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/work/tensorflow/tools/proto_text/BUILD:31:10: Linking of rule '//tensorflow/tools/proto_text:gen_proto_text_functions' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/conda/.cache/bazel/_bazel_conda/fc5b6f8a245ea6c7bfa068d002f44f78/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/work:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_build_env:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/host/bin/tensorflow/tools/proto_text/gen_proto_text_functions-2.params)
Execution platform: @local_execution_config_platform//:platform
/usr/bin/ld: cannot find -lprotobuf
/usr/bin/ld: cannot find -lsnappy
/usr/bin/ld: cannot find -lprotobuf
collect2: error: ld returned 1 exit status
ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/work/tensorflow/core/framework/BUILD:1261:31 Linking of rule '//tensorflow/tools/proto_text:gen_proto_text_functions' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/conda/.cache/bazel/_bazel_conda/fc5b6f8a245ea6c7bfa068d002f44f78/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/work:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_build_env:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1622642338317/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/host/bin/tensorflow/tools/proto_text/gen_proto_text_functions-2.params)
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 13126.052s, Critical Path: 444.51s
INFO: 13232 processes: 8181 internal, 5051 local.
FAILED: Build did NOT complete successfully

full log at https://gist.github.com/izahn/42712a8437361cb40aa5aed129f05155

Ista Zahn added 2 commits June 2, 2021 09:13
@izahn
Author

izahn commented Jun 2, 2021

Have a look at the bazel toolchain and the scripts that generate it. This was the groundbreaking part for me to get the CPU builds working.

I notice

WARNING: option '--config=cuda' (source command line options) was expanded and now overrides the explicit option --crosstool_top=//custom_toolchain:toolchain with --crosstool_top=@local_config_cuda//crosstool:toolchain

in the logs, and suspect that might be a problem, if not the problem, causing my builds to fail.
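
To check whether a given build is affected, the log can be searched for that warning. A minimal sketch: the helper name `crosstool_overridden` is hypothetical, and the search string is taken from the warning text quoted above.

```shell
# Hypothetical helper: succeed (exit 0) if the bazel log on stdin reports that
# an explicit --crosstool_top was overridden by a --config expansion.
crosstool_overridden() {
  grep -qF "overrides the explicit option --crosstool_top="
}

# Example:
#   if crosstool_overridden < build.log; then
#     echo "custom toolchain is not in effect" >&2
#   fi
```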

@izahn
Copy link
Author

izahn commented Jun 2, 2021

I've taken this as far as I can for the time being. I've pushed my latest changes, and I think it's in pretty good shape except for the linker issue. If someone has time and interest in picking it up, that would be great; otherwise I'll try to come back to this at some later date. Thanks to everyone who offered their time and advice, I appreciate your help!

@hmaarrfk
Contributor

hmaarrfk commented Jun 3, 2021

I find it strange that the logs reference tools in /usr/bin.

For one, there is the "/usr/bin/ld: cannot find -lz" error.

It should be using the conda-forge linker (ld).

I'm expecting it to be something like

${HOST_PREFIX}/bin/x86_64-conda-linux-gnu-ld
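
A quick check of which linker the compiler driver resolves might look like the sketch below. It assumes a GCC-style driver (`-print-prog-name=ld` is a GCC option) and that `CC` points at the compiler used for the build; the helper name `linker_origin` and its list of system directories are hypothetical.

```shell
# Hypothetical helper: classify a linker path.
#   system       the path lives in a typical system bin directory
#   non-system   any other path containing a directory component
#   path-lookup  a bare name, which would be resolved through PATH
linker_origin() {
  case "$1" in
    /usr/bin/*|/usr/local/bin/*|/bin/*) echo "system" ;;
    */*) echo "non-system" ;;
    *) echo "path-lookup" ;;
  esac
}

# Example: ask the compiler driver which ld it would run, then classify it.
#   linker_origin "$("${CC}" -print-prog-name=ld)"
```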

@hmaarrfk
Contributor

hmaarrfk commented Jun 7, 2021

whoa! did you find it?

@izahn
Author

izahn commented Jun 7, 2021

whoa! did you find it?

Kind of! I gave up on getting TF_SYSTEM_LIBS working with the CUDA build, and just let bazel build everything, like they do in https://github.com/AnacondaRecipes/tensorflow_recipes/blob/master/tensorflow-base-gpu. I also used the patches from there. This results in extremely long build times (all I have available for building is a 4 year old thinkpad, takes about two days...), and a huge package at the end of it all. But... it works!

The log from my most recent completed build is at https://gist.github.com/izahn/b194405de3595d460199cbc3bbce0c74, and the packages are at https://anaconda.org/izahn/repo. One of the libtensorflow tests failed, but the main tensorflow-base package works, and I "tested" it by running https://pypi.org/project/ai-benchmark/.

All in all I think I'm getting very close to something that I would be comfortable merging here. It needs investigation and fix for the failing libtensorflow test, and maybe a little more code cleanup, but I think we're close!

@izahn
Author

izahn commented Jun 23, 2021

The linux builds are working, both with and without cuda. Unfortunately it looks like I broke the osx builds, despite my efforts not to change anything with them.

Several build logs are available for review, and the corresponding artifacts are available at https://anaconda.org/izahn/repo

Given the complexity of this build, and the fact that I had to resort to using a totally different build process for cuda packages, I'm not sure it makes sense to keep trying to do all this in one recipe. It may be simpler to split the cuda builds out into a separate tensorflow-gpu recipe, as Anaconda does with their tensorflow packages. Please let me know if you have thoughts or suggestions about that.

@183amir

183amir commented Jun 23, 2021

Given the complexity of this build, and the fact that I had to resort to using a totally different build process for cuda packages, I'm not sure it makes sense to keep trying to do all this in one recipe. It may be simpler to split the cuda builds out into a separate tensorflow-gpu recipe, as Anaconda does with their tensorflow packages. Please let me know if you have thoughts or suggestions about that.

We can have a cuda or gpu branch here to track GPU recipes.

@hmaarrfk
Contributor

Sorry for chiming in so late. This just came back on my radar.

I think we should avoid having two branches.

I think that we can likely try to see if the OSX builds are just broken as is, or if we did something special in this recipe to break them.

I think I can try to build on the CIs the OSX part to see what broke.

@izahn
Author

izahn commented Aug 15, 2021

If we go the direction of this PR we effectively have two completely different builds. The cuda enabled recipe in this PR is basically just the Anaconda one, while the non-cuda build has diverged considerably from the anaconda recipe. Because of this I think a separate branch makes sense. Alternatively we can keep trying to get a cuda enabled build based on the conda-forge recipe here rather than based on the Anaconda recipe. I tried pretty hard to do that but never could get it working. The only thing that came from that effort was that I learned to hate bazel.

@h-vetinari
Member

Thanks for pushing this forward!

I don't like the divergence in recipes TBH, but I understand the realities of the situation (and your hate for bazel). I haven't looked in detail, but is there an argument against using the CUDA-enabled recipe to also build the non-CUDA version?

Also, in the hope of enriching our understanding of the different approaches - have you seen the work from open-ce on packaging tensorflow for conda?

At the time I commented:

Just discovered the following work by open-ce: https://github.com/open-ce/tensorflow-feedstock/blob/master/recipe
[...]

Since both the license of the feedstock and the open-ce project are Apache-2, there shouldn't be an issue with including some of this work here?

and @jayfurmanek responded:

Absolutely no problem. TensorFlow is a beast, we'd love to collaborate. Feel free to pull in any of the work we did (perhaps some credit or a link in a readme or comment would be nice). A couple of things to note:

  • The recipes in open-ce back to defaults, not to forge, so there could be some unforeseen issues related to that, especially since forge uses a newer toolchain.

  • We enable TensorRT and cuDNN for the CUDA variant. I know forge has CUDA packages now, at least, so that's good.

  • Our recipe creates a separate libtensorflow package as well that includes the extra C and C++ API libs and headers. There are likely other ways of providing those, but this is the best way we've found to include all of the needed headers and to avoid rebuilding everything, just for the C and C++ libs.

@xhochy
Member

xhochy commented Aug 15, 2021

Thanks for pushing this forward!

I don't like the divergence in recipes TBH, but I understand the realities of the situation (and your hate for bazel). I haven't looked in detail, but is there an argument against using the CUDA-enabled recipe to also build the non-CUDA version?

Not really, as the CUDA-enabled one uses the system compilers and not the conda-forge ones. We need the c-f ones on OSX independent of the target, and for cross-compiling on Linux. The latter is not used yet for TensorFlow but has been working really well with other bazel-based projects using the bazel-toolchain package. The conda-forge TensorFlow build actually just takes the Anaconda one for osx-64 to all platforms; there is divergence, but both come from the same origin.

I haven't looked into any issues here, as I don't need CUDA for my current use cases, but I would expect that we could get it working with a limited amount of patching. The main issue here is not Bazel but TensorFlow's extreme habit of doing everything in its own weird way. Bazel isn't a straightforward tool to use and requires a lot of upfront knowledge, but the issues here can't really be blamed on it 😉

Also, in the hope of enriching our understanding of the different approaches - have you seen the work from open-ce on packaging tensorflow for conda?

I had a brief look at that during the last iteration of the TensorFlow build, and it gave me some helpful pointers, but as I didn't enable CUDA at that point, I didn't use anything from there.

@izahn izahn mentioned this pull request Sep 24, 2021
5 tasks
@wolfv
Member

wolfv commented Sep 27, 2021

@xhochy are you planning on working on this? I'd love to help out as we need a CUDA-enabled build.

@xhochy
Member

xhochy commented Sep 27, 2021

@xhochy are you planning on working on this? I'd love to help out as we need a CUDA-enabled build.

Not really, I would have a look from time to time into the errors that come up but I don't need CUDA currently.

@izahn
Author

izahn commented Sep 27, 2021

The basic problem I've had when trying to make a cuda-enabled build based on the conda-forge recipe is that bazel stubbornly refuses to use the host system cuda headers. The error usually looks like

The include path '/usr/local/cuda/include' references a path outside of the execution root

@xhochy if you have any ideas about that it would be appreciated!

@izahn izahn mentioned this pull request Oct 4, 2021
@isuruf
Member

isuruf commented Oct 4, 2021

bazel stubbornly refuses to use the host system cuda headers.

One option is to just copy the headers to ${PREFIX}/include and delete them at the end. (And also remove -I/usr/local/cuda/include from CFLAGS, CPPFLAGS, CXXFLAGS.)
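
That suggestion could be sketched roughly as below. Nobody in this thread tried it, so treat it as an untested outline: the helper names `strip_flag`, `stage_headers`, and `unstage_headers` are hypothetical, as is the manifest file used to remember which headers were copied.

```shell
# Hypothetical helper: print the flag string "$2" with every exact
# occurrence of the single flag "$1" removed.
strip_flag() {
  flag=$1
  shift
  out=""
  for word in $*; do   # intentional word splitting of the flag string
    [ "$word" = "$flag" ] || out="${out:+$out }$word"
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: copy headers from SRC into DEST without overwriting
# existing files, recording each copied file in MANIFEST.
stage_headers() {
  src=$1 dest=$2 manifest=$3
  : >> "$manifest"
  (cd "$src" && find . -type f) | while IFS= read -r f; do
    if [ ! -e "$dest/$f" ]; then
      mkdir -p "$dest/$(dirname "$f")"
      cp "$src/$f" "$dest/$f"
      printf '%s\n' "$dest/$f" >> "$manifest"
    fi
  done
}

# Hypothetical helper: remove exactly the files recorded by stage_headers.
unstage_headers() {
  while IFS= read -r f; do
    rm -f "$f"
  done < "$1"
}

# Example use in a build script (the manifest path is an assumption):
#   stage_headers /usr/local/cuda/include "${PREFIX}/include" "${SRC_DIR}/cuda_headers.txt"
#   CFLAGS=$(strip_flag -I/usr/local/cuda/include "${CFLAGS}")
#   CPPFLAGS=$(strip_flag -I/usr/local/cuda/include "${CPPFLAGS}")
#   CXXFLAGS=$(strip_flag -I/usr/local/cuda/include "${CXXFLAGS}")
#   ... run the build ...
#   unstage_headers "${SRC_DIR}/cuda_headers.txt"
```

Recording a manifest rather than deleting the whole directory afterwards keeps any headers that other host packages already installed under ${PREFIX}/include.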

@izahn
Author

izahn commented Oct 5, 2021

bazel stubbornly refuses to use the host system cuda headers.

One option is to just copy the headers to ${PREFIX}/include and delete them at the end. (And also remove -I/usr/local/cuda/include from CFLAGS, CPPFLAGS, CXXFLAGS.)

I'll be more than happy if someone wants to try this, but I've already spent more time than I care to admit working on this. I have an alternative that works (see #134) and I don't plan to spend more time on this approach. I'm closing this PR, but I'll leave my branch up in case someone else wants to give it a shot.

@izahn izahn closed this Oct 5, 2021
@hmaarrfk hmaarrfk mentioned this pull request Oct 10, 2021
12 tasks
8 participants