Leverage cuda-python for GPU detection #52
Comments
I think it's easier to do this in the C code rather than in Python. I have created this PR, which should make SCS fail cleanly if there is no GPU available: https://github.com/cvxgrp/scs/pull/181/files Are you able to patch this in and test? |
Sorry for the delayed response. I applied the patch in conda-forge/scs-feedstock#21, but the test suite still segfaults (both on linux & on windows)... |
Are we sure it's to do with the gpu? Just looking at this:
I don't see |
That's because it's not part of the test recipe
However, I'm sure it has to do with the GPU build/code paths somehow, because the test suite for the CPU version passes. |
This is very strange, I don't understand how just building the gpu version could break like this. Is it all platforms (linux, mac, windows)? |
For context, each of the |
There are no GPU builds for mac in conda-forge, but for linux & windows, the GPU builds are broken when trying to run the test suite (the imports work fine), while everything runs through for the CPU builds. To a degree (which this issue is about), this is to be expected, because the Azure CI that conda-forge uses does not have actual GPUs. So at runtime, if a GPU-enabled package tries to access a GPU that is not there, things fail. Hence the desire to add device detection so that GPU builds don't crash if there's no physical hardware. |
That's what I'm confused by, the Previous behavior with no GPU:
New behavior with no gpu:
|
Do you mean the package imports here? As I said above, the imports work (for the GPU builds, even on an agent without a GPU), but the test suite fails. In any case, great to hear that the failure should now be more graceful! I'm guessing these changes haven't made it to the repo(s) yet? |
By test suite do you mean running It should never segfault with or without a GPU (even before the latest change to make the failing more graceful). It's very strange, and it makes me think something else weird is going on. |
With testsuite I mean running the equivalent** of ** slight adaptation because the test-folder is not packaged in the same way as the package itself, but for basically all intents and purposes it should be the same as running the tests in the source tree. |
OK, I understand now, that does `import _scs_gpu`. Still, there shouldn't be a segfault even without a GPU, so I'm not sure what's going on here. |
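One way to narrow down a crash like this is to run the suspect import or test in a child process, so that a segfault shows up as an exit status instead of taking down the test runner. The following is a minimal sketch using only the standard library. `run_isolated` is a hypothetical helper name, not part of SCS, and the example assumes the `_scs_gpu` extension is importable in the active environment.

```python
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run `code` in a fresh interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on a crash.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = run_isolated("import _scs_gpu")
    # On POSIX a negative return code means the child was killed by a signal
    # (e.g. -11 for SIGSEGV); on Windows a crash appears as a large non-zero
    # status instead.
    print("return code:", rc)
    print(err)
```

The same helper can be pointed at a single test body rather than just the import, which may help tell apart "the import crashes" from "the first solve crashes".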
OK cool, glad we're on the same page now
I still have artefact persistence switched on in conda-forge/scs-feedstock#21. You could try again to download an appropriate artefact, unpack it, and then install it into an environment. If we can get past the resolver errors this time, then you could have a closer look at what's happening... 🙃 |
to recall;
|
@bodono, I've tried again for 3.1.0, and Could we give it another shot with you installing one of the artefacts? I think the setup has hopefully improved enough now that you should be able to install it (the last CI run on that PR has green CI because I switched off the failing test suite so that the artefacts are more easily installable) - the instructions in the previous comment remain correct. |
I'm looking at this now. Two issues:
|
Can you try On the artefact side, not all cuda versions support the relevant gcc versions, which are therefore mixed in and "pollute" the build string. Can you tell me which cuda/python version you need - it's possible to look up from the logs, but a bit tedious. For Windows, it should already be visible |
Using the absolute path worked. After installing and activating the environment I navigated to the scs-python directory and ran
|
Hmm, on my linux machine when I try to import _scs_gpu I get
|
Can you verify using |
Got it, looks like it's using scs from conda-forge:
Presumably it's because I'm using the wrong artifact. I'm in a linux machine with |
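As a complement to checking by hand, the origin of an installed package can also be read from the environment's `conda-meta` records. This is only a sketch: `conda_package_origin` is a hypothetical helper, and it assumes the usual conda layout in which each installed package has a JSON record with `name`, `version`, `build` and `channel` fields under `<prefix>/conda-meta`.

```python
import json
import sys
from pathlib import Path


def conda_package_origin(name, prefix=None):
    """List (version, build, channel) for conda records of `name` in `prefix`."""
    # Assumption: the environment follows the standard conda-meta layout.
    meta_dir = Path(prefix or sys.prefix) / "conda-meta"
    found = []
    for record in sorted(meta_dir.glob("*.json")):
        data = json.loads(record.read_text())
        if data.get("name") == name:
            found.append(
                (data.get("version"), data.get("build"), data.get("channel"))
            )
    return found


if __name__ == "__main__":
    # An empty list means no conda record for "scs" exists in this environment.
    print(conda_package_origin("scs"))
```

A channel pointing at conda-forge rather than at the local artefact directory would indicate that the solver picked the published package instead of the one under test.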
You need one of the builds that says cuda 11.2, for example this one (this is for python 3.9). The build variant is not fully visible in the artefact name, but it is visible in the job overview, which should also lead to the right artefact. Barring that, try the 19th one down on the artefact overview page. |
For context, 11.2 is actually 11.2+ (i.e. compatible with all later minor versions of cuda 11) |
You can probably also "fail faster" with the wrong artefacts by using strict channel priority:
Which is the recommended default anyway... |
Yes I found that page, but when I click on the '1 artifact produced' link it just brings me to the page of all the artifacts and I couldn't figure out which one corresponded to the link I had clicked. Anyway, with the 19th artifact down and using strict channel priority (both specifying and not specifying
I think the main way to use SCS is the direct CPU solver, the GPU solver is a bit niche and in many (most?) cases is actually slower than the direct solver for the time being. With that in mind maybe we should pause on this for now? |
It's possible that either you or I miscounted, or that the order on the artefact page is not the same as for the jobs. In any case it seems that you got the pypy build rather than the one for cpython 3.9. Could you maybe have a look at the windows side of things for the time being - there the artefacts should be named unambiguously. Once I have access to a computer again, I'll update the PR so that also the Linux builds have artefact names that are decipherable. I'm not in an urgent hurry to get this done, but it's still something that I think plays to conda-forge's strength, and it would be good to have sorted out. Presumably over time, the GPU variant will have some aspects where it outperforms the CPU version |
Can you post a link to the exact artifact I should use? You can get it on the right-hand side menu |
Sorry for the delay, didn't have a laptop available for a while. This download is for linux/x86 + python=3.9 + cuda>=11.2 - could you give it a try? 🙃 |
Sorry for the delay. I'm still getting UnsatisfiableError with that exact artifact:
|
On which system are you running, and what's your current glibc version? |
It's OK, we're not in a hurry 🙃 |
I'm running a fork of Debian, and glibc 2.33: └──[ins] => ldd --version |
After the 3.0.0 release, I tried to redo conda-forge/scs-feedstock#21, but the problems with running the test suite remain. In particular, the GPU builds segfault when there's no GPU hardware (as happens in the conda-forge CI).
Very recently, the new python-wrappers for cuda from NVIDIA reached general availability, and this would presumably be an excellent tool to use to determine dynamically whether the GPU can actually be used.
@bodono, what do you think about adding a check (possibly conditional on its availability) so that the GPU tests are only run if the drivers & GPU can be found?