Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sympl interface spec for arrays/units #46

Open
JoyMonteiro opened this issue Aug 13, 2019 · 0 comments
Open

sympl interface spec for arrays/units #46

JoyMonteiro opened this issue Aug 13, 2019 · 0 comments

Comments

@JoyMonteiro
Copy link
Collaborator

I would like to try these ideas out on a fork if that makes more sense, and
merge it later.

Currently, sympl assumes that the arrays inside the state dictionary are
instances of DataArray. While this made sense initially, I'm continually
coming up against performance issues (like #43).

For instance,

  • get_numpy_array uses the .transpose() function of DataArray
    which is very slow. I wrote an equivalent version of this function
    which instead used the numpy version which is ~20-30% faster (and passes all tests).

  • accessing an attribute like .values or .dims involves multiple function calls since
    they are properties which reference other properties internally.

  • creating a new DataArray has a huge __init__ overhead with all kinds of checks
    which are really not necessary in our use case.

These issues really come to the front when writing models which work with a single
column of data, which currently is the major use-case for climt at least.

While it is desirable to keep the DataArray interface, it would be really helpful
downstream if sympl described an API which any array object must implement.
This will require some re-writing of internal code which assumes that the
arrays are DataArrays, but in the end will allow more performant array representations
like unyt to be used seamlessly in sympl components.

This might also require sympl to allow an implementing library to replace functions
like get_numpy_array with custom versions.

In general, it might be good to specify a number of functions that an implementing library
must provide which can replace the logic that currently resides within __call__ of any
sympl component. This will make it easy to add functionality without having to
build custom subclasses of the base sympl components, which is undesirable.

IMO this also makes sense since sympl is a framework, and it need
not be opinionated about what kind of arrays are used, or how the validation
of these arrays and their dimensions is done. sympl could register
callbacks based on the type of the input array formats and use them
for validation and reshaping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant