Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make scipy an optional dependency #2062

Open
tylerflex opened this issue Nov 7, 2024 · 24 comments · May be fixed by #2073
Open

Make scipy an optional dependency #2062

tylerflex opened this issue Nov 7, 2024 · 24 comments · May be fixed by #2073

Comments

@tylerflex
Copy link
Collaborator

tylerflex commented Nov 7, 2024

Motivation: allow a more lightweight base package for validation

Not sure if it's possible:

  1. There are some methods that use scipy (like SimulationData.to_mat) so lazy import should be fine there.
  2. xarray interpolation uses scipy, so we'd need to install different xarray versions depending on the installation?
  3. not sure if scipy comes with autograd already

Either way, we probably want regular pip install tidy3d users to have scipy and interp() support. So not sure if to accomplish this goal of having a "lightweight" tidy3d package if we just need to create a more custom installation script to MC rather than using pyproject.toml?

Thoughts @daquintero @yaugenst @momchil-flex ?

@momchil-flex
Copy link
Collaborator

Motivation: allow a more lightweight base package for validation

Not sure if it's possible:

  1. There are some methods that use scipy (like SimulationData.to_mat) so lazy import should be fine there.
  2. xarray interpolation uses scipy, so we'd need to install different xarray versions depending on the installation?
  3. not sure if scipy comes with autograd already

One other thing that comes to mind is we may want to add it to the trimesh option then. MC basically installs our core packages + trimesh for the validation API to be able to validate all possible simulations, and I remember that some STL validator requires some scipy functionality and it fails if not installed.

Either way, we probably want regular pip install tidy3d users to have scipy and interp() support. So not sure if to accomplish this goal of having a "lightweight" tidy3d package if we just need to create a more custom installation script to MC rather than using pyproject.toml?

That is effectively what this block achieves, albeit in a bit of a dirty way.

tidy3d/pyproject.toml

Lines 41 to 48 in 3f087b5

### NOT CORE
boto3 = "^1.28.0"
requests = "^2.31.0"
pyjwt = "*"
click = "^8.1.0"
responses = "*"
joblib = "*"
### END NOT CORE

But yeah MC basically comments that out for the validation API. So the question to me is if scipy can be moved inside this block. It was moved up I think because of that trimesh issue actually, because the other 3 things that you point out should not be needed for validation. Well actually I think we have a validator for custom data that checks that the data the user has provided can be interpolated (e.g. has no repeated coords), so it may be needed for that too.

@yaugenst-flex
Copy link
Contributor

Not really sure what the minimum requirements for MC are but autograd depends on scipy (as does xarray as @tylerflex pointed out). That seems pretty tricky to get around?
I think autograd only imports scipy when you import autograd.scipy though, so in principle you could pip install autograd --no-deps.

@tylerflex
Copy link
Collaborator Author

Not really sure what the minimum requirements for MC are but autograd depends on scipy (as does xarray as @tylerflex pointed out). That seems pretty tricky to get around?

yea it does, i wonder if we just need to say that we only validate the .to_static() version of the component? and just not install autograd (or use --no-deps)?

@Frank-DingYC
Copy link

I would like to share some background about the problem. We use Web Assembly (More specifically, Pyodide) to import Tidy3D so that we can implement some features, like plot source_time, script objects, show grid, plot permittivity and so on. If we can't limit the core dependencies of Tidy3D, the front-end webpage will suffer from crashing because of out-of-memory issues. For now, @majinge7846 accomplished lots of work to remove Scipy for front-end Tidy3D utilization. However, the problem becomes critical when upgrading Tidy3d to v2.7.6.

Recently, we want to improve the user experience for frequently used functions (Validate/Estimate). We plan to use the same technical solution to implement it. However, we are facing the same problem. Refer to this. Currently, we use lambda to do validation/estimation and pipeline as backup, which will spend lots of time wasted during the simulation.json file transmission. Instead, if we use web assembly to implement validation/estimation, users can get the results immediately.

@majinge7846
Copy link
Contributor

https://github.com/pydata/xarray/blob/9b632fd36edba7c6fcd1eea778ba3d8710c68349/pyproject.toml#L36
image

scipy is an optional deependency of xarray. And I found that it just used for interpolate(scipy.interpolate.interp1d).
If the tidy3d wasm without scipy does't call related functions while do validating. the validation of tidy3d wasm is OK.

webgui can even do a check, if tidy3d wasm error/log validation error, "lack scipy please install first" , this is acceptable, we can use lambda/pipeline as fallback. (Of course, it would be better if numpy or other methods can be used to avoid calling Scipy)

I think webgui validation not need designed to handle all simulation cases, just most simulation cases are OK too. lightweight base package may not be needed.

@yaugenst-flex
Copy link
Contributor

yea it does, i wonder if we just need to say that we only validate the .to_static() version of the component? and just not install autograd (or use --no-deps)?

oh nevermind, scipy is optional in autograd too. i guess that's good news then.

what would be the best way to go here? sounds to me like it would make sense to add something like a minimal install option under [tool.poetry.extras]?

@daquinteroflex
Copy link
Collaborator

daquinteroflex commented Nov 8, 2024

Just catching up on this. Ultimately, this is one of the package refactor aspects we want to implement in 3.0. In the sense that we want to have an essential core tidy3d API and extend the package functionality from there. So this is helpful to appreciate the core functionality and extensible functionality. Like the core functionality probably should just be an static API data-structure validation and then have a clear boundary for further functionality. In the future, we do want to have minimal tidy3d lightweight pacakge that is extended through tidy3d[option] or simply other independent packages that depend on the minimal tidy3d. This should help for separating lightweight memory operations in the web built upon a minimal installation.

This kind of leads to the issue of how to approach this right now (edit to be clear 3.0 is a bit far away still). Unfortunately poetry does not enable us to "minimize" the base package [tool.poetry.dependencies] : it will always install those when pip install tidy3d. --no-deps is also an option since autograd or xarray don't directly depend on it. Based on previously looking into this, tool.poetry.extras will install extra packages on top of the base installation. So we shouldn't really make scipy an optional dependency and install it in an extra because then users will not have it in their base tidy3d installation by default. Unfortunately, I think the kind of hacky solution we've been using of the CORE NOTCORE commented section in the pyproject.toml would enable it not to be directly installed by MC in their current workflow possibly. Possibly we can add trimesh as Momchil suggests, also since this hack is shared in the interim before 3.0.

If the pipeline can be a good fallback for all the other non-web-assembly local validation cases then I think this could be a good way to get it to work before 3.0! We'd just have to check the scipy to be sure they are on-the-fly per-function like Tyler mentioned.

Happy to do a PR for this if this is the approach we want to go for.

@momchil-flex
Copy link
Collaborator

momchil-flex commented Nov 8, 2024

So my thoughts:

  • We should definitely get something working before 3.0 as the latter may take quite a while
  • Yeah I remember why we went with CORE NOTCORE and I think we may just have to continue with that. Basically I don't think we want to require our typical users to do anything else but pip install tidy3d to be able to do most of their work.
  • On the other hand, we do separate some packages out as extras that users may still need, when we've encountered installation issues with such packages in the past. This is why trimesh and gdstk and vtk are separate extras. So I don't think we should put trimesh into CORE if it's then going to make it always installed for the usual installation too, but we could consider somehow defining a CORE (or some other name) set of dependencies which are needed for all validations but nothing more.
  • Finally, whatever we do about this, we need to make sure that we are testing that everything can be validated using only these base dependencies. That's probably the biggest chunk of effort. We also discussed on slack at some point that we may need to refactor our SIM_FULL into a more formal set of tests for all possible components, also probably splitting it into multiple simulations as some components are becoming incompatible to be in the same simulation. I mean, without too much refactoring of the tests, testing the core/validate_only install could potentially also be done by defining a subset of existing tests to run with the limited installation, which should cover quite a lot already.

@daquinteroflex
Copy link
Collaborator

Nice, yeah I agree!

How urgent is this out of curiosity? If it is very urgent for the web UI (sounds a bit?), probably someone else should do this given my current limited timing constraints. Otherwise, think I've got a few ideas how to implement and properly test this with the right environments as you suggest and happy to focus on cracking on this. Or happy to propose a few implementation approaches too as need be.

@majinge7846
Copy link
Contributor

majinge7846 commented Nov 11, 2024

Nice, yeah I agree!

How urgent is this out of curiosity? If it is very urgent for the web UI (sounds a bit?), probably someone else should do this given my current limited timing constraints. Otherwise, think I've got a few ideas how to implement and properly test this with the right environments as you suggest and happy to focus on cracking on this. Or happy to propose a few implementation approaches too as need be.

test for web to usage:
do not install scipy, load/create most simulation (validation) successfully.

@tylerflex
Copy link
Collaborator Author

I spoke with MC and they are saying this is becoming a very urgent issue. Can we come up with a plan for it and assign someone?

@daquinteroflex
Copy link
Collaborator

daquinteroflex commented Nov 21, 2024

So I began a PR that is linked, unfortunately I'm in crunch time on the first thesis submission by the end of the month so I can't really devote any more extra time on top of the backend PRs I was helping with recently. I become more free early December to devote extra time but maybe someone else can finish implementing this? I was also a bit concerned on the timelines that weren't super clear to me.

In the PR I've been focused on the testing implementation using nox for multiple environments including one without scipy. We could do as agreed on moving scipy on the pyproject.toml, but the problem is making sure the testing suite still works accordingly and hence why I was focused on that within the PR. Sorry it might be very tricky for me to implement this in the timeframe needed until next month.

Sorry to nominate @yaugenst-flex if you'd be ok with that?

@tylerflex
Copy link
Collaborator Author

Thanks @daquinteroflex , let's also consider whether there's a really simple approach that removes scipy but still allows us to create and validate components (unless your branch already plans to implement that?)

@momchil-flex
Copy link
Collaborator

So if I understand things correctly,

  • tidy3d can still be used for a lot of things without scipy. Some specific validators may fail but many simulations can be loaded still
  • Whatever MC is doing they need to have a plan about what to do with simulations that do need scipy and so their websm approach fails. Even if we want to try to completely eliminate scipy by rewriting validators that need it (which I don't think was included in the original scope?), they need to have a backup plan for e.g. simulations that are too big

So is our refactor really blocking for them?

@tylerflex
Copy link
Collaborator Author

I think there's at least one validator that explicitly uses interp (scipy).

#1684

but yea other than that maybe we can just make it work without scipy? we could make that validator skip on an import error for example

@momchil-flex
Copy link
Collaborator

Well it depends on what they are trying to do, if they are trying to validate it should not skip but fall back on something that can validate properly.

I think there's a trimesh validator that also requires scipy internally.

@majinge7846
Copy link
Contributor

Well it depends on what they are trying to do, if they are trying to validate it should not skip but fall back on something that can validate properly.

I think there's a trimesh validator that also requires scipy internally.

yes, and webGUI do not try to support trimesh validation, because although trimesh itself is a pure python package, but it depeends on networkx / rtree (hard to get wasm version) and scipy (big), It is acceptable to fallback to pipeline when meet these case.

@majinge7846
Copy link
Contributor

majinge7846 commented Nov 22, 2024

I think there's at least one validator that explicitly uses interp (scipy).

#1684

but yea other than that maybe we can just make it work without scipy? we could make that validator skip on an import error for example

yes, In the short term remove scipy here is a good and big start for wasm validation.
My suggestions are (just for reference)

  1. replace scipy by numpy/xarray(functions do not depeends on scipy)/pandas completely. (maybe need more code work)
  2. do a import check in interp, if env do not has scipy, enter an extra logic, and do a simple can_interp check

And gui do not want to lose the ability to check simulation includes stl and custom source/medium with custom dataset . So could we do not skip.

@yaugenst-flex
Copy link
Contributor

I tinkered around a bit with this, see #2087. This right now just fails if there is a medium on which to interpolate. However, if we need to validate interp, there is not really a way around not using scipy other than reimplementing scipy's RegularGridInterpolator ourselves..

@tylerflex
Copy link
Collaborator Author

can we just change the validator in question to not interp directly but instead analyze the coords to determine (to good approximation) whether this DataArray could be interpolated?

@tylerflex
Copy link
Collaborator Author

I think it's intended to just catch an edge cases that pops up from time to time on the backend, but maybe we can target that edge case without doing the interp directly.

@tylerflex
Copy link
Collaborator Author

@momchil-flex @marc-flex

@momchil-flex
Copy link
Collaborator

Yeah we could loosen the validator. The original issue was that there were arrays provided which had repeated coords along some dimensions. We could revert to testing that only (we made it more general to potentially capture other errors).

However like I mentioned I'm pretty sure there was at least one other validator that required scipy internally. @yaugenst-flex did you try loading e.g. the tests SIM_FULL in your trimmed test? Well even if that works it doesn't necessarily mean all is good since e.g. some validators only apply to structures that cross a given plane.

@yaugenst-flex
Copy link
Contributor

yaugenst-flex commented Nov 22, 2024

@momchil-flex ok yeah i tested with SIM_FULL, it's failing because of trimesh and the interp validator. Interp validator is easy to skip, trimesh is failing a bit more badly...

I'm also not super sure what verify_packages_import does. Basically it fails if the module is not found, which will happen anyways if the module import fails?

pydantic.v1.error_wrappers.ValidationError: 4 validation errors for Simulation
structures -> 8 -> geometry -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
structures -> 31 -> geometry -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
structures -> 32 -> geometry -> GeometryGroup -> geometries -> 0 -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)
structures -> 32 -> geometry -> GeometryGroup -> geometries -> 1 -> TriangleMesh -> __root__
  The package 'trimesh' is required for this operation, but it was not found. Please install the 'trimesh' dependencies using, for example, 'pip install tidy3d[<see_options_in_pyproject.toml>] (type=value_error.tidy3dimport)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants