New test failures with 2.1.4 #43
So the first set of errors is expected if there is no GPU. It should return 'nan' in the solution arrays in this case. The second set of errors occurs because SCS has been installed without BLAS / LAPACK support, which is needed to solve SDPs. You should be able to copy what I did, because I finally managed to get Windows to install BLAS / LAPACK correctly in GitHub Actions here. |
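To make the 'nan' behaviour described above easy to test for, here is a minimal sketch. It assumes the solver result is a dict with array-like entries under the keys `x`, `y` and `s`; the helper name `solution_has_nan` is hypothetical and not part of SCS.

```python
import math


def solution_has_nan(sol, keys=("x", "y", "s")):
    """Return True if any entry of the listed solution arrays is NaN.

    `sol` is assumed to be a dict of array-likes (lists or NumPy arrays).
    Missing keys are treated as empty arrays.
    """
    for key in keys:
        for value in sol.get(key, ()):
            if math.isnan(float(value)):
                return True
    return False
```

A test could then treat a NaN-filled result as "no usable GPU" rather than comparing it against a reference solution.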
I understand that this is the current status, though I still think it would be very good to check the runtime availability of GPU drivers. I googled this briefly, and it seems like the forthcoming @kkraus14 (since you co-wrote that blogpost), any hint or even timeline when that might be released (or any other tips)?
I'm aware of this of course, but some change in the infrastructure between 2.1.3 & 2.1.4 invalidated the existing conda-recipe (which is not a big deal, I'll fix it). Will update as soon as I have more. |
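For the runtime check of GPU driver availability raised above, one possible approach is to ask the CUDA driver library directly. This is only a sketch: it assumes the driver library can be found under its usual file name, uses the CUDA driver API calls `cuInit` and `cuDeviceGetCount` through `ctypes`, and the function names `cuda_device_count` and `has_cuda_gpu` are hypothetical.

```python
import ctypes
import sys


def cuda_device_count():
    """Return the number of CUDA devices the driver reports, or 0 if none."""
    if sys.platform.startswith("win"):
        names = ["nvcuda.dll"]
    elif sys.platform == "darwin":
        names = ["libcuda.dylib"]
    else:
        names = ["libcuda.so.1", "libcuda.so"]
    for name in names:
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue  # driver library not installed under this name
        try:
            if lib.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
                return 0
            count = ctypes.c_int(0)
            if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
                return 0
            return count.value
        except AttributeError:
            return 0  # library loaded but lacks the expected symbols
    return 0


def has_cuda_gpu():
    """True only if the driver loads and reports at least one device."""
    return cuda_device_count() > 0
```

On a machine without the driver installed, loading the library fails and the count is 0, so the check never needs the CUDA toolkit itself.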
OK, I get the same failures as in CI if there's physical GPU hardware... This is from downloading an artefact from here, installing it with
|
When I try to install the linux artifact I get an error:
However I can install the GPU code from source and it passes the tests on linux. Just eyeballing your error message I am guessing that it's something to do with different widths for the integer or floating point types that the CUDA library is using that are conflicting with the SCS widths. I see in your trace that you have |
OK sorry, I forgot that you might not have conda-forge configured as a channel yet. In that case, please try:
This should fix the resolver errors. |
Ok that worked, but I can't The other tests pass though. |
Can you check (e.g. using Sidenote: it's possible that the compiled scs shouldn't depend on the CUDA version. I'm not good enough at these things yet to know for sure, so I tend to compile per cuda version. |
I matched up the CUDA versions (11.1) but I still can't import _scs_gpu after creating the environment. I see these files in the unzipped artifact:
so it looks like it should work, but for some reason _scs_gpu does not get installed correctly. |
Thanks for checking. Is there any error message for importing Aside from setting up the required things, the installation is as trivial as I managed to keep it - basically comes down to: |
Ah, sorry, just saw this part only now. Is there a reason not to use 64 bits on GPU? I assume that the width is 64 bits there as well... |
I see the issue now: for some reason conda is using conda-forge to install scs. It sees the new 'channel' but isn't using it. GPUs are much better at 32-bit operations, and I think CUDA usually defaults to 32-bit ints; I'm not sure about floats. |
Can you explain what you mean by that? Conda-forge does its own builds (using its own infra & dependencies & envs) by design. All necessary steps need to be made explicit, in order to achieve reproducibility & portability...
I also deal with packaging other GPU-packages (e.g. faiss), and I'm not aware of this (but then, they maybe just build all variants by default). |
I don't know very much about conda, when I enter this: I see scs appear as:
which I assumed meant it was coming from conda-forge, rather than the channel I'm specifying. About the 32 bit ints see top comment here. That might not be the issue though, just a possibility. |
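To confirm which channel an installed package actually came from, the JSON output of `conda list` can be inspected. The sketch below assumes that `conda list --json` emits a list of records with `name`, `version`, `build_string` and `channel` fields; the function names are hypothetical.

```python
import json
import subprocess


def package_origin(conda_list_json, name):
    """Return (version, build_string, channel) for each record matching `name`."""
    records = json.loads(conda_list_json)
    return [
        (rec.get("version", ""), rec.get("build_string", ""), rec.get("channel", ""))
        for rec in records
        if rec.get("name") == name
    ]


def installed_origin(name):
    """Run `conda list --json` in the active environment and look up `name`."""
    result = subprocess.run(
        ["conda", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return package_origin(result.stdout, name)
```

Calling `installed_origin("scs")` inside the test environment would show whether the build came from the local channel or from conda-forge.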
Ah, now I understand you. It's possible that the path is not correct (e.g. you could be missing a |
OK, I've tried every which way now and I cannot get it to install from the channel. I'm presumably doing something wrong, so I don't want to waste any more of your time with it! |
One more thought (sorry for all the hassle, and thanks so much for your patience) - could you try adding |
I tried
My guess is somehow the packages are inconsistent somewhere, and when the strict channel priority is off it falls back to using conda-forge which for some reason works (though it has the same version by the look of it). |
ok, it's weird (classic "works on my machine"), but I accept that we're stuck 😅 |
Haha, thanks for your help anyway! It's probably something wrong with my setup somewhere. |
One more thought about this @bodono - which distribution are you on, and what version of glibc is running there? If it's below 2.17, this might be a reason for the conflicts. Probably the artefacts are gone from azure by now (and you'll have probably deleted yours), but I had forgotten to ask about the details from the |
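The glibc question can be answered from Python without extra tools. This is a small sketch using the standard library's `platform.libc_ver()`; the helper names and the way the version string is parsed are assumptions, and the 2.17 threshold is simply the one mentioned above.

```python
import platform


def glibc_version():
    """Return the glibc version as a tuple of ints, or None if not on glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    try:
        return tuple(int(part) for part in version.split(".")[:2])
    except ValueError:
        return None  # unexpected version format


def glibc_at_least(minimum=(2, 17)):
    """True if glibc is detected and is at least `minimum`."""
    version = glibc_version()
    return version is not None and version >= minimum
```

On non-glibc systems (Windows, macOS, musl-based distributions) `glibc_version()` returns `None`, so `glibc_at_least()` is `False` there.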
It potentially might have changed since I tried to run these, but I'm currently running:
|
Hey @bodono, thanks so much for the updates in 618962c and the release of 2.1.4!
Unfortunately, there seem to be a couple of new errors (also in the CI here, which really should upgrade its python version though, see #42) - see this CI run of conda-forge/scs-feedstock#21.
One aspect is that the determining factor in https://github.com/bodono/scs-python/blob/master/test/test_solve_random_cone_prob.py is whether
import _scs_gpu
succeeds. As it turns out, this does succeed even without a GPU, so the test suite for the GPU-enabled library ends up failing in conda-forge CI. This is not so terrible, since I could skip those tests for GPU builds, but it would still be nice to have more robust runtime detection of GPU hardware.

Still, even though the following tests are run with
gpu=True
on a machine without actual GPU hardware, the errors don't look like they're related to that (AFAICS, basically hiding ERROR_CUDA: no CUDA-capable device is detected
but still producing results - likely garbage?):

Details of test-suite failures on linux (GPU enabled lib running on machine with no GPU)
For windows, the failures are more extensive for the GPU-less builds (interestingly, the test failures of windows + GPU are the same as under linux + GPU).
Details of test suite failures on windows
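Since a successful `import _scs_gpu` says nothing about hardware, the GPU tests could be gated on both the module and a visible device. The following is a sketch only: it uses `nvidia-smi -L` as the device probe (an assumption, it requires the NVIDIA tools to be on `PATH`), and the helper names and the placeholder test class are hypothetical, not taken from the scs-python test suite.

```python
import importlib.util
import shutil
import subprocess
import unittest


def gpu_module_available(name="_scs_gpu"):
    """True if the extension module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def nvidia_smi_lists_gpu():
    """True if `nvidia-smi -L` runs successfully and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        out = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return False
    return out.returncode == 0 and "GPU " in out.stdout


RUN_GPU_TESTS = gpu_module_available() and nvidia_smi_lists_gpu()


class GpuSolveTest(unittest.TestCase):
    @unittest.skipUnless(RUN_GPU_TESTS, "needs _scs_gpu and a visible NVIDIA GPU")
    def test_gpu_solve(self):
        # Placeholder: the real GPU solve and its assertions would go here.
        self.assertTrue(RUN_GPU_TESTS)
```

With this gate, a GPU-enabled build on a GPU-less CI machine reports the tests as skipped instead of failing.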