[WIP] Download IDAKLU from pybammsolvers #4487

Draft
wants to merge 27 commits into
base: develop

kratman
Contributor

@kratman kratman commented Oct 3, 2024

Description

This will separate the IDAKLU C++ code from pybamm.

Fixes #3564

Type of change

This should speed up CI by skipping the build of the C++ code.

  • Optimization (back-end change that speeds up the code)

Key checklist:

  • No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python run-tests.py --all (or $ nox -s tests)
  • The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@kratman kratman self-assigned this Oct 3, 2024
@kratman
Contributor Author

kratman commented Oct 3, 2024

A new link error cropped up, but it looks like we could get a lot of savings on time with this update.

Edit: Most of the run time appears to be in the integration tests, so unfortunately the time savings are not as good as I would have hoped.

@agriyakhetarpal
Member

The linkage error is the same one as in #3783, which comes from CasADi's plugin system. I am not sure it is worth fixing: @martinjrobins fixed it for the linear interpolant case by dropping down to Python, but IIRC there wasn't a way in CasADi to do the same for the cubic case.

@kratman
Contributor Author

kratman commented Oct 3, 2024

@agriyakhetarpal Yeah, I was looking at that issue as well. As far as I can tell, CasADi sets a path for plugins. I am trying to see if there is a decent workaround, since this was part of #4464.

My guess is that the wheels for the next release will be broken as well, but I have not confirmed it yet.

@agriyakhetarpal
Member

agriyakhetarpal commented Oct 3, 2024

There is a workaround for Linux and macOS, but not for Windows (different toolchain); sadly, it's not decent enough to include. I think I'll raise a PR upstream in CasADi to get one part of the linkage going and see if we can migrate to a non-MSVC toolchain (which can potentially help provide that workaround for this on Windows later on). It's been on my list of things to do for a while, but I've yet to do it.

@kratman
Contributor Author

kratman commented Oct 3, 2024

This is fixed locally with this: export CASADIPATH=.venv/lib/python3.12/site-packages/casadi
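
For reference, the same workaround can be written without hard-coding the virtual environment path. This is only a sketch: `package_dir` and `set_casadi_path` are hypothetical helper names, and it assumes that CasADi reads `CASADIPATH` when it looks for plugins, as the export above suggests. It presumably has to run before CasADi loads the plugin, and it has not been checked on Windows.

```python
import importlib.util
import os
from pathlib import Path


def package_dir(name: str) -> Path:
    """Return the directory of an installed package (hypothetical helper)."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {name!r} is not installed")
    return Path(spec.origin).resolve().parent


def set_casadi_path(package: str = "casadi") -> str:
    """Point CASADIPATH at the package directory (assumed to hold the plugins)."""
    path = str(package_dir(package))
    os.environ["CASADIPATH"] = path
    return path
```

Calling `set_casadi_path()` in the active environment would set the variable to the installed `casadi` directory, whatever the Python version or venv name.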


codecov bot commented Oct 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.66%. Comparing base (e17b549) to head (baf25df).
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4487      +/-   ##
===========================================
- Coverage    99.25%   98.66%   -0.59%     
===========================================
  Files          302      302              
  Lines        22897    22896       -1     
===========================================
- Hits         22726    22591     -135     
- Misses         171      305     +134     

@agriyakhetarpal
Member

> This is fixed locally with this: export CASADIPATH=.venv/lib/python3.12/site-packages/casadi

Yes, but that won't work on Windows.

@martinjrobins
Contributor

martinjrobins commented Nov 6, 2024

I had a look at this. The linker error is the same one I came across in the linear interpolation case. The solution there was to switch to the direct CasADi function rather than their plugin system; the plugin system won't work if the CasADi function is evaluated in C++ on Windows, as we compile everything statically.

I think I might be able to access the direct bspline interface by calculating the spline coefficients in SciPy and then using the CasADi Function.bspline function to construct a bspline. Fingers crossed this doesn't use the plugin system anywhere! Going to try this out in #4570.
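
A rough sketch of that idea, for a 1D spline only, might look like the following. The helper names `bspline_data` and `to_casadi` and the sample data are made up for illustration, and this is not the code from #4570. Whether `Function.bspline` avoids the plugin system is exactly the open question above, so the CasADi step is kept in a separate function that is not called here.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def bspline_data(x, y, degree=3):
    """Fit an interpolating B-spline with SciPy; return (knots, coeffs, degree)."""
    spline = make_interp_spline(np.asarray(x, float), np.asarray(y, float), k=degree)
    return spline.t.tolist(), spline.c.tolist(), int(spline.k)


def to_casadi(name, knots, coeffs, degree):
    """Build a CasADi function from the B-spline data (casadi imported lazily)."""
    import casadi

    # One knot vector and one degree per input dimension; this spline is 1D
    # with a single output (m=1).
    return casadi.Function.bspline(name, [knots], coeffs, [degree], 1)


# Illustrative data only
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x)
knots, coeffs, degree = bspline_data(x, y)
```

With CasADi installed, `to_casadi("interp", knots, coeffs, degree)` should return a function that can be evaluated like any other CasADi `Function`. Higher-dimensional interpolants would need care with coefficient ordering, which this sketch does not cover.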

@kratman
Contributor Author

kratman commented Nov 6, 2024

> I had a look at this. The linker error is the same one I came across in the linear interpolation case. The solution there was to switch to the direct CasADi function rather than their plugin system; the plugin system won't work if the CasADi function is evaluated in C++ on Windows, as we compile everything statically.
>
> I think I might be able to access the direct bspline interface by calculating the spline coefficients in SciPy and then using the CasADi Function.bspline function to construct a bspline. Fingers crossed this doesn't use the plugin system anywhere!

Yeah, I was going to approach this by seeing if I could just change the build itself. It is something that should work if we are compiling and delivering everything correctly. If that does not work, then I will look at workarounds for interpolation.

@kratman
Contributor Author

kratman commented Nov 6, 2024

I expect to work on this again next week; I have been caught up with other stuff.

@martinjrobins
Contributor

It looks like there are still issues with the IDAKLU JAX solver on Windows. I can look into these?

@kratman
Contributor Author

kratman commented Nov 21, 2024

@martinjrobins Sure, if you want to look at it, you are more than welcome. I am hopefully going to be able to take another look this evening.

I recently got a Windows laptop so I could start looking into this stuff locally. Most of my recent commits to this branch have been me testing things for the release, as I have been focused on getting that out the door.

@martinjrobins
Contributor

martinjrobins commented Nov 21, 2024

I tried to figure this one out today but no luck :( It's crashing with a fatal exception when JAX tries to JIT compile, and I'm still in the dark as to why. It might be a threading issue, as the problem is intermittent (it occurs in about 95% of test runs). It might be triggered by some interaction with pytest, because when I copy the test into a stand-alone script it works fine.

@agriyakhetarpal
Member

For a stopgap solution, we can isolate these tests into their own xdist_group and allow only one worker to touch them at a time.
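
As a minimal illustration of that stopgap (not the actual PyBaMM tests), pytest-xdist lets tests be tagged with an `xdist_group` marker so that every test in the group is sent to the same worker. The group name `idaklu_jax` and the placeholder tests below are hypothetical.

```python
import pytest


# All tests that share a group name are sent to the same xdist worker when
# pytest runs with `--dist loadgroup`, so they never run concurrently.
@pytest.mark.xdist_group(name="idaklu_jax")
def test_placeholder_one():
    # Hypothetical stand-in for a test in test_idaklu_jax.py
    assert 1 + 1 == 2


@pytest.mark.xdist_group(name="idaklu_jax")
def test_placeholder_two():
    # Second stand-in that would share the worker with the first
    assert "jax".upper() == "JAX"
```

The marker only takes effect when the suite is run with something like `pytest -n auto --dist loadgroup`; under the default distribution mode the grouped tests can still land on different workers.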

@kratman
Contributor Author

kratman commented Nov 21, 2024

> For a stopgap solution, we can isolate these tests into their own xdist_group and allow only one worker to touch them at a time.

Yeah that is my fallback option.

I want to take a closer look at the linking/delivery as well. We have failures on Windows when you download the wheels:

  • pybammsolvers has some crashed workers
  • My i5 (without AVX-512 instructions) has ~45 test failures on both 24.9.0 and 24.11.0 when running tests with the wheels
  • A colleague's i7 (with AVX-512 instructions) has ~25 test failures on both 24.9.0 and 24.11.0 when running tests with the wheels

So it appears that the tests work when you run them in the build environment, but not in a different environment. I will be digging into this more and will see what I come up with.

@jsbrittain
Contributor

Hi - I took a very quick look at this yesterday and agree that it seems to be a threading issue. More specifically, jaxify() can only be called once per solver instance (this is the first test), which then caches the full solve result so the JAX wrapper can query samples without repeatedly re-running the solver.

My suspicion is that running these tests in parallel is causing test pollution, probably because the test script currently instantiates the solver and JAX wrapper objects at the start of the test script, not as a fixture for each test (although in that case the Ubuntu tests should also be failing?). Refactoring the tests with fixtures would be a good way to see if that resolves things. I can take a look at that if you like, but if I'm right then the xdist_group solution should also work if you need a quick fix.

I can't remember the precise details as to why we can't jaxify more than once per object, but I do remember that it was more complex than just the cache issue (something to do with the JAX primitives...).
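
To make the suspected pollution concrete, here is a toy sketch that uses neither PyBaMM nor JAX. `OnceOnlyWrapper` and `make_wrapper` are invented stand-ins for an object whose `jaxify()` may be called only once, and whether the real failure has this cause is unconfirmed.

```python
class OnceOnlyWrapper:
    """Toy stand-in for a solver whose jaxify() may be called only once."""

    def __init__(self):
        self._jaxified = False
        self.cache = None

    def jaxify(self, data):
        if self._jaxified:
            raise RuntimeError("jaxify() was already called on this instance")
        self._jaxified = True
        self.cache = list(data)  # stands in for the cached solve result
        return self.cache


def make_wrapper():
    """Factory that a per-test fixture could call to get a fresh instance."""
    return OnceOnlyWrapper()


# Shared instance: the second use fails because state leaks between "tests".
shared = OnceOnlyWrapper()
shared.jaxify([1, 2, 3])
try:
    shared.jaxify([4, 5, 6])
    shared_reuse_failed = False
except RuntimeError:
    shared_reuse_failed = True

# Fresh instance per "test": each call starts from clean state.
first = make_wrapper().jaxify([1, 2, 3])
second = make_wrapper().jaxify([4, 5, 6])
```

In a pytest suite, `make_wrapper` would typically be wrapped in a function-scoped fixture so that each test receives its own object rather than a module-level one.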

@martinjrobins
Contributor

martinjrobins commented Nov 22, 2024

Just a small note to say that running the tests in serial is not enough to make them pass. I had to turn off both the pytest workers and the faulthandler (pytest -n 0 -p no:faulthandler) before the tests would pass. With these options, all the tests in test_idaklu_jax.py pass reliably.


Successfully merging this pull request may close these issues.

Tracking issue: migrate to scikit-build-core
4 participants