Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC: Add CUDA support for aarch64/ppc64le #2001

Closed

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Oct 11, 2021

This includes aarch64 & ppc64le in the CUDA build matrices, which should allow packages that build on these other architectures to also support CUDA as well. Leverages existing CUDA migrations to add this content in.

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #725

xref: conda-forge/cudatoolkit-feedstock#64

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham
Copy link
Member Author

Would be great to get some feedback on this approach. Happy to adjust if there's a better way to do this 🙂

cc @isuruf @jaimergp @kkraus14 @leofang @Ethyling @raydouglass

recipe/migrations/cuda110.yaml Outdated Show resolved Hide resolved
Comment on lines 39 to +44
- quay.io/condaforge/linux-anvil-cuda:11.1 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:11.1 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:11.1 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-cuda:11.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:11.2 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:11.2 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

John, could you remind me why we needed CUDA 11.1+ in this file? Don't we also have another migrator cuda111_112.yaml?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this was needed so we could add CentOS 7 support ( #1232 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's unrelated. Wouldn't worry about it 🙂

Comment on lines -52 to +61
- 7 # [linux64]
- 8 # [linux64]
- 7 # [linux]
- 8 # [linux]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we need to build cuDNN for aarch64/ppc64le first before kicking off the migrator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. How have we handled cudnn before on new platforms? Would think that cudnn would be migrated before any package that depends on it, but it is always possible things get jammed up before it gets there

@isuruf
Copy link
Member

isuruf commented Oct 12, 2021

Changing the migrators will not make the bot issue PRs.

@jakirkham
Copy link
Member Author

Right and that's a question that I've been thinking about as well. Do we want to start a migration or do we want to just migrate a few packages based on need. There's probably a short list of things we would really want to see built out. After that it seems reasonable to leave it up to the feedstocks to make a decision. This might keep the CI load lower if that's a concern.

Though don't have strong feelings here. Am interested to hear other thoughts 🙂

Comment on lines +119 to +120
- cos7 # [linux64 and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- cos7 # [linux64 and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be linux and ... instead of linux64 and ...

@@ -126,6 +128,8 @@ docker_image: # [os.environ.get("BUILD_PLATFOR

- quay.io/condaforge/linux-anvil-cuda:10.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64" and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos6"]
- quay.io/condaforge/linux-anvil-cos7-cuda:10.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64" and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:10.2 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:10.2 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not support CUDA 10.2 for aarch64. Should we use quay.io/condaforge/linux-anvil-aarch64-cuda:11.0 instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is trickier than it seems as all the other architectures are using CUDA 10.2. So we would need to drop CUDA 10.2 in order to change to CUDA 11.0. Should add I'm ok with doing that and have checked with others to make sure it won't be too disruptive, but there are some other challenges. Basically we would want to get PR ( #1708 ) in, which has some blockers atm that have been difficult to address around aligning compiler versions

@@ -31,14 +31,22 @@ __migrator:
- quay.io/condaforge/linux-anvil-cos7-cuda:10.1 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-cuda:10.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-cos7-cuda:10.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:10.2 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:10.2 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not support CUDA 10.2 for aarch64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Was look at ppc64le

Yeah starting with 11.0 makes sense. Maybe we can just do that with both. Thoughts @jaimergp?

We've been thinking about dropping the CUDA versions pre-11.0 anyway ( #1708 )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, totally fine.

Comment on lines 39 to +44
- quay.io/condaforge/linux-anvil-cuda:11.1 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:11.1 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:11.1 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-cuda:11.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:11.2 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:11.2 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- 10.2 # [linux64]
- 11.0 # [linux64]
- 10.2 # [linux]
- 11.0 # [linux]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not support CUDA 10.2 for aarch64

@@ -72,10 +81,14 @@ docker_image: # [os.environ.get("BUILD_PLATFOR

- quay.io/condaforge/linux-anvil-cuda:10.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64" and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos6"]
- quay.io/condaforge/linux-anvil-cos7-cuda:10.2 # [linux64 and os.environ.get("BUILD_PLATFORM") == "linux-64" and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-ppc64le-cuda:10.2 # [linux and ppc64le and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:10.2 # [linux and aarch64 and os.environ.get("DEFAULT_LINUX_VERSION", "cos6") == "cos7"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@h-vetinari
Copy link
Member

Just FYI, it seems that a separate PR for PPC is progressing independently from this PR #2527.

@leofang
Copy link
Member

leofang commented Feb 28, 2022

Just FYI, it seems that a separate PR for PPC is progressing independently from this PR #2527.

I am so confused... It seems things are treated in different ways between this and that PR...

@isuruf
Copy link
Member

isuruf commented Mar 7, 2022

As I said in #2001 (comment), this PR is wrong

@jakirkham
Copy link
Member Author

Superseded by PR ( #2527 )

@jakirkham jakirkham closed this Mar 22, 2022
@jakirkham jakirkham deleted the add_cuda_arch branch March 22, 2022 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CUDA on ppc64le / aarch64
7 participants