
WIP: Cuda again #137

Closed
wants to merge 18 commits into from

Conversation

hmaarrfk
Contributor

@hmaarrfk hmaarrfk commented Oct 10, 2021

Just testing cuda recipes on CIs

My TODO list:

  • Add cuda dependencies to recipe (follow CUDA build #120)
  • Add cuda migrations
  • Double check python dependencies
  • Double check that cuda dependencies have all been followed through in the many sub recipes
  • Update build files to add cuda flags.
  • Many other steps I am forgetting
  • Build cuda packages that timeout locally.

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

recipe/build.sh Outdated
fi

# cuda builds don't work with custom_toolchain, instead we hard-code arguments, mostly copied
# from https://github.com/AnacondaRecipes/tensorflow_recipes/tree/master/tensorflow-base-gpu

This is part of the approach that is likely to be rejected based on the discussion in #134

Note also that if you just want a working cuda version for Linux, I have that already in https://github.com/izahn/tensorflow-feedstock/tree/cuda_2.6.0 , https://github.com/izahn/tensorflow-feedstock/tree/cuda_2.5.1 , and https://github.com/izahn/tensorflow-feedstock/tree/cuda_2.4.3 . Python 3.9 packages are available at https://anaconda.org/izahn/repo. It looks like these won't make it into conda-forge for the reasons discussed in #134, but they work fine for me.

Member

I consider using different BUILD_OPTS (up to & including the custom toolchain) to be fair game. Perhaps I misread #134 at the time (if so, I apologise), but this doesn't look too invasive.


But if you are using the custom toolchain, most of this is redundant, or overrides the overrides specified in the custom toolchain.

@@ -106,7 +115,11 @@ outputs:
build:
- {{ compiler('c') }}
- {{ compiler('cxx') }}
- {{ compiler('cuda') }} # [cuda_compiler_version != "None"]
Member

This needs to be added to the top level build requirements in addition to all the outputs.

Contributor Author

Thanks!

Contributor Author

There is a way to avoid duplicating dependencies. I think you have to name the top level package the same as the "first package".

I'm not sure if this is a feature or bug, but I use it in:
https://github.com/conda-forge/opencv-feedstock/blob/master/recipe/meta.yaml#L56

Would it be acceptable to use it here too?

@izahn

izahn commented Oct 10, 2021

An updated version for 2.6.0 is available at https://github.com/izahn/tensorflow-feedstock/tree/cuda_2.6.0

@hmaarrfk
Contributor Author

Every time I feel like I understand programming a little better, new frameworks leave me thinking that it is all jargon again.

While I feel like I should be able to understand what this error is about, I just can't make sense of it:
https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=390026&view=logs&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642&l=796

@wolfv
Member

wolfv commented Oct 27, 2021

Hm, we (@DerThorsten and I) were looking at this again, and we might have gotten a tiny bit closer.

So basically we got it to compile to a similar point as what you have, where it fails at a GPU target. It seems to try to invoke the GCC compiler where it should use nvcc.

In Tensorflow (the generated bazelrc) they seem to reference the following toolchain for linux-nvcc compilation: https://github.com/tensorflow/addons/blob/master/build_deps/toolchains/gcc7_manylinux2010-nvcc-cuda11/cc_toolchain_config.bzl

This toolchain also comes with a magic compiler wrapper in Python: https://github.com/tensorflow/addons/blob/334cd7ca8fb944aab38164a13d7d2203d7c39605/build_deps/toolchains/gcc7_manylinux2010-nvcc-cuda11/clang/bin/crosstool_wrapper_driver_is_not_gcc#L258-L278

That wrapper seems to either invoke nvcc or the host compiler.
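The dispatch idea can be sketched roughly like this. This is a simplified illustration, not the actual TensorFlow wrapper linked above; the compiler paths are made-up placeholders, and the only assumption carried over from the linked script is that Bazel marks CUDA compilations with `-x cuda`:

```python
#!/usr/bin/env python
"""Illustrative sketch of a crosstool compiler wrapper that dispatches
between nvcc and the host compiler (NOT the real TensorFlow wrapper)."""
import sys

# Hypothetical paths; the real wrapper hard-codes or templates these.
HOST_COMPILER = "/usr/bin/gcc"
NVCC = "/usr/local/cuda/bin/nvcc"


def pick_compiler(argv):
    """Return the compiler binary to invoke for this command line.

    CUDA compilations are assumed to be marked with `-x cuda`;
    everything else is forwarded to the host compiler unchanged.
    """
    for i, arg in enumerate(argv):
        if arg == "-x" and i + 1 < len(argv) and argv[i + 1] == "cuda":
            return NVCC
    return HOST_COMPILER


if __name__ == "__main__":
    compiler = pick_compiler(sys.argv[1:])
    # A real wrapper would also translate gcc-style flags into nvcc
    # flags (e.g. wrapping host flags in -Xcompiler) before exec'ing,
    # something like: os.execv(compiler, [compiler] + translated_args)
    print(compiler)
```

So when the toolchain routes every compile action through such a wrapper, `.cu` targets end up at nvcc while ordinary C/C++ targets still hit the host gcc.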

I am not sure (and not a bazel expert at all!) but maybe we're overruling this toolchain with our custom toolchain (https://github.com/conda-forge/tensorflow-feedstock/tree/master/recipe/custom_toolchain) ??

Maybe @xhochy knows a bit about this?

One idea would be to try the compiler wrapper thing that is in the other toolchain and see if we can log something interesting.

@wolfv
Member

wolfv commented Oct 27, 2021

We also found it's quite convenient to reproduce the error by compiling just the single target and skipping the purge:

# bazel clean --expunge
# bazel shutdown
./configure

bazel ${BAZEL_OPTS} build ${BUILD_OPTS} //tensorflow/core/kernels:quantize_and_dequantize_op_gpu

exit 0

And lastly, I've written a very simple script to enter the docker container from a build_locally.py build:

first commit the container using docker commit <hash> tf_snapshot and then run

FEEDSTOCK_ROOT="$(pwd)"
RECIPE_ROOT="${FEEDSTOCK_ROOT}/recipe"
DOCKER_IMAGE=tf_snapshot

echo "PWD: ${FEEDSTOCK_ROOT} :: ${RECIPE_ROOT}"

docker run -it \
           -v "${RECIPE_ROOT}":/home/conda/recipe_root:rw,z,delegated \
           -v "${FEEDSTOCK_ROOT}":/home/conda/feedstock_root:rw,z,delegated \
           "${DOCKER_IMAGE}" \
           /bin/bash

And that gets a proper interactive session going. I think automating this might be an interesting addition to conda-smithy :)

@wolfv
Member

wolfv commented Oct 27, 2021

Sooo ... I think I got ... something to work!

I copied the crosstool_wrapper_driver_is_not_gcc into our custom toolchain, and edited the toolchain to call it instead of gcc directly here:

I also hard coded the path to GCC in the crosstool_wrapper_driver_is_not_gcc tool (https://github.com/tensorflow/addons/blob/334cd7ca8fb944aab38164a13d7d2203d7c39605/build_deps/toolchains/gcc7_manylinux2010-nvcc-cuda11/clang/bin/crosstool_wrapper_driver_is_not_gcc#L41-L42)

I will try a full build now, let's see what happens :)

@xhochy @hmaarrfk just trying to get a sanity check if you think this makes sense at all? I am really no expert in bazel...

@hmaarrfk
Contributor Author

I think you are on the right track.

I figured it was going to be something related to diving into their hard-coded paths.

A solution might involve symlinking, or patching.

I'm very excited to get this working!

@wolfv
Member

wolfv commented Oct 27, 2021

I can find this file also in the tensorflow sources: https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl

We might want to somehow wire it up from there ...

@wolfv
Member

wolfv commented Oct 27, 2021

PS. Building the package now ... let's see how far it goes :)

@xhochy
Member

xhochy commented Oct 28, 2021

> Sooo ... I think I got ... something to work!
>
> I copied the crosstool_wrapper_driver_is_not_gcc into our custom toolchain, and edited the toolchain to call it instead of gcc directly here:
>
> I also hard coded the path to GCC in the crosstool_wrapper_driver_is_not_gcc tool (https://github.com/tensorflow/addons/blob/334cd7ca8fb944aab38164a13d7d2203d7c39605/build_deps/toolchains/gcc7_manylinux2010-nvcc-cuda11/clang/bin/crosstool_wrapper_driver_is_not_gcc#L41-L42)
>
> I will try a full build now, let's see what happens :)
>
> @xhochy @hmaarrfk just trying to get a sanity check if you think this makes sense at all? I am really no expert in bazel...

This makes sense. Bazel only supports a single compiler for C/C++/CUDA/... Thus, if you want to use different compilers for different languages, you need a wrapper script that makes the decision.

@wolfv
Member

wolfv commented Oct 28, 2021

@xhochy I think #157 works :)

@hmaarrfk hmaarrfk closed this Oct 31, 2021
7 participants