cutlass v3.1.0 #9
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR.
@conda-forge-admin, please rerender
Ah:
Not sure this is going to work with CUDA 11.
force-pushed from c6ecb04 to c10c495
…nda-forge-pinning 2023.07.30.16.32.56
force-pushed from df14740 to 62477f1
@jakirkham @leofang @ngam @hmaarrfk
though it could of course also be related to the version bump. Would it make sense to trim this list a bit? (how?) If that's not possible - would someone have time/resources to build this locally?
I’d say trimming the arches is okay, but I am not sure if people/developers use this package in more specialized ways. Having said that, this is already a pretty restrictive arch list… last time I thought about this, I settled on 60, 70, 75, 80, 86, 89, 90, 90a for CUDA 12 (at least that’s what I will propose for conda-forge to use with jaxlib, tensorflow, and pytorch). By the way, the build progress with these cuda-enabled packages can be misleading (usually not in our favor). I will look into this soon if no response from more involved developers/maintainers 😺
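For concreteness, here is a minimal, hypothetical sketch of how such an arch list could be expressed in a feedstock build script, assuming the recipe drives CUTLASS's CMake build; CUTLASS_NVCC_ARCHS and CUTLASS_ENABLE_TESTS are CUTLASS's own CMake options, but the exact invocation below is illustrative, not what this recipe currently does:

```sh
# Hypothetical excerpt from build.sh; the arch list mirrors the one proposed above.
# CMAKE_ARGS, SRC_DIR and CPU_COUNT are the usual conda-build environment variables.
cmake ${CMAKE_ARGS} -S "${SRC_DIR}" -B build \
  -DCUTLASS_NVCC_ARCHS="60;70;75;80;86;89;90;90a" \
  -DCUTLASS_ENABLE_TESTS=OFF \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel "${CPU_COUNT}" --target install
```

Every extra entry in that list multiplies the device-code compilation time, which is exactly the CI-cost trade-off being discussed here.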
I was personally thinking of reviving a 1060 GPU of mine as a benchmarking computer. I guess that is a bad idea. We kinda went through the architecture trimming before, and found that we would have to trim way too much for it to be acceptable. Maybe we should start building packages on a per-GPU-generation level?
Yeah, but AFAIK we have no metadata to select the right generation through virtual packages or similar. We only have the driver version, and that doesn't map 1:1 to architectures.
I mean, I feel like as more packages start to take longer and longer on CI, we have to find another solution. I'm also not a fan of the fat binaries; they are just slow to download....
I agree with you, but I'm asking how you imagine that would work? Perhaps conda needs a virtual package for the CUDA arch of the machine?
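As a rough illustration of what such a virtual package would need to surface (this is not an existing conda feature): the compute capability is already queryable from the driver, assuming an nvidia-smi recent enough to support the compute_cap query field, while conda today only exposes the driver's supported CUDA version via the __cuda virtual package:

```sh
# The raw datum a hypothetical per-arch virtual package would have to expose.
# Prints one line per GPU, e.g. "8.6" for an RTX 30-series card.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# What conda exposes today: only the maximum CUDA version the driver supports,
# e.g. "virtual packages : __cuda=12.2=0".
conda info | grep __cuda
```

Note the first command printing multiple lines on a multi-GPU box is precisely the "more than one kind of GPU" ambiguity raised below.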
Yeah, but you would have to make the assumption that there is only one kind of GPU. I feel like this is a "fair assumption"; mixing GPUs on one machine seems like a bad idea....
In that same vein, I'm not sure where we stand on targeting newer x86-64 CPU architectures. I feel like we target quite old instruction sets and might benefit from a bump there too. Again, this may blow up the build matrix.
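For reference, a small sketch of what such a bump could mean in terms of the psABI x86-64 microarchitecture levels, which GCC 11+ and Clang 12+ accept as -march targets; the file name is just a placeholder and none of this reflects current conda-forge compiler settings:

```sh
# Illustrative only: each level is a strict superset of the previous one.
gcc -O2 -march=x86-64    -c kernel.c   # baseline: SSE2 only
gcc -O2 -march=x86-64-v2 -c kernel.c   # adds SSE4.2, POPCNT, CMPXCHG16B
gcc -O2 -march=x86-64-v3 -c kernel.c   # adds AVX2, FMA, BMI1/2, MOVBE
gcc -O2 -march=x86-64-v4 -c kernel.c   # adds AVX-512 F/BW/CD/DQ/VL
```

The same matrix-explosion concern applies: each level that is packaged separately multiplies the number of builds.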
Re trimming… Yep, it was me instigating a fight on that front a while back, and we essentially discovered we likely needed to go down the single-arch route. I can see us potentially supporting ensembles of these arches. The problem is, we likely need more insight from practitioners for this to be practical and useful. When I was heavily involved in this a year ago, I privately started building for a single CPU (say an EPYC xyz with all its recommended flags) and targeted only A100 GPUs (with all their recommended flags). I soon discovered that the HPCs I have access to had a stupid setup… compute nodes had A100s, so-called viz nodes had V100s, and some miscellaneous nodes still had K80s. You get the idea of how this can be problematic, at least for one (small-ish) slice of use cases. The other thing is, when compiling and training models, we have to be careful about interchangeability. Maybe the new keras paradigm can help with that, but likely not…
There might be a way to optimize the compilation across arches (caching and reusing stuff, etc.), but I don't know enough.
Anyway, for this particular one, let's figure out an appropriate arch list and have someone build it. It's only one binary, so not the worst… I guess I really should get my stupid singularity PR submitted again to get this going as we get more packages done for 12.
Yeah, this has been long overdue, but has stalled for a long time. Though there has been movement recently: conda/ceps#59
Yeah, with this feedstock it's really just the wait for the compilation (🤞)
Thanks all! 🙏
Would it make sense to convert this into a Conda issue for further discussion?
I feel like I don't have a strong ask yet to keep the discussion focused.
Think that is ok. There's value in tracking the general need. Plus we can refine the ask into actionable steps through discussion.
Thanks a lot @hmaarrfk!
So a Threadripper 2950 isn't the best processor, but it still shows about 14 hours of CPU time....
Yeah, it seems that cutlass is a big baby...
It is very likely that the current package version for this feedstock is out of date.
Checklist before merging this PR:
- Updated license if changed and license_file is packaged

Information about this PR:
- If you want these PRs to be merged automatically, make an issue with "@conda-forge-admin, please add bot automerge" in the title and merge the resulting PR. This command will add our bot automerge feature to your feedstock.
- If this PR was opened in error or needs to be updated, please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase "@conda-forge-admin, please rerun bot" in a PR comment to have the conda-forge-admin add it for you.

Closes: #6
Closes: #7
Closes: #10
Closes: #11
Dependency Analysis
We couldn't run dependency analysis due to an internal error in the bot. :/ Help is very welcome!
This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/5073530612, please use this URL for debugging.