I would like to try these ideas out on a fork first, if that makes more sense, and merge them later.

Currently, sympl assumes that the arrays inside the state dictionary are instances of `DataArray`. While this made sense initially, I keep running into performance issues (like #43).

For instance:

- `get_numpy_array` uses the `.transpose()` method of `DataArray`, which is very slow. I wrote an equivalent version of this function that uses the numpy version instead; it is ~20-30% faster and passes all tests.
- Accessing an attribute like `.values` or `.dims` involves multiple function calls, since they are properties that reference other properties internally.
- Creating a new `DataArray` has a huge `__init__` overhead, with all kinds of checks that are not necessary in our use case.

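A minimal sketch of what a numpy-based replacement could look like. It assumes a hypothetical lightweight container, `LightArray`, which stores `values`, `dims` and `attrs` as plain slot attributes and skips all validation in `__init__` (touching the second and third points as well). `get_numpy_array_fast` is also a hypothetical name: it is not the version benchmarked above, and it does not try to reproduce everything sympl's `get_numpy_array` does. It only reorders axes with `np.transpose` and adds length-1 axes for requested dimensions the array lacks.

```python
import numpy as np


class LightArray:
    """Hypothetical minimal array container: plain attributes, no validation."""

    __slots__ = ("values", "dims", "attrs")

    def __init__(self, values, dims, attrs=None):
        # Stored directly; no coordinate or shape checks as in DataArray.__init__.
        self.values = np.asarray(values)
        self.dims = tuple(dims)
        self.attrs = {} if attrs is None else attrs


def get_numpy_array_fast(array, out_dims):
    """Return array.values with axes ordered as out_dims, using np.transpose.

    Dims in out_dims that the array does not have become length-1 axes.
    """
    missing = set(array.dims) - set(out_dims)
    if missing:
        raise ValueError(f"array dims {sorted(missing)} not present in out_dims")
    # Axis indices of the array, in the order they appear in out_dims.
    axes = [array.dims.index(d) for d in out_dims if d in array.dims]
    data = np.transpose(array.values, axes)
    # Keep existing axes, insert np.newaxis for requested dims the array lacks.
    index = tuple(slice(None) if d in array.dims else np.newaxis for d in out_dims)
    return data[index]
```

For example, an array with dims `("lat", "lon")` and shape `(3, 4)` requested as `["lon", "height", "lat"]` comes back with shape `(4, 1, 3)`.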
These issues become most noticeable when writing models that work with a single column of data, which is currently the main use case, at least for climt.

While it is desirable to keep the `DataArray` interface, it would be really helpful downstream if sympl specified an API that any array object must implement. This will require rewriting some internal code that assumes the arrays are `DataArray`s, but in the end it will allow more performant array representations like unyt to be used seamlessly in sympl components.

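As a sketch of what such an API could mean, here is a hypothetical `typing.Protocol` named `SymplArray` that requires only `values`, `dims` and `attrs`. The protocol name and the choice of these three attributes are assumptions for illustration, not an existing sympl interface. `xarray.DataArray` already exposes all three, so existing states would keep working, while a lighter class (the hypothetical `PlainArray` below) can satisfy it without inheriting from anything.

```python
from typing import Any, Dict, Protocol, Tuple, runtime_checkable

import numpy as np


@runtime_checkable
class SymplArray(Protocol):
    """Hypothetical minimal interface for arrays stored in a sympl state."""

    values: np.ndarray      # raw data
    dims: Tuple[str, ...]   # one name per axis of `values`
    attrs: Dict[str, Any]   # metadata, e.g. {"units": "K"}


class PlainArray:
    """Hypothetical lightweight array: plain attributes, no inheritance needed."""

    def __init__(self, values, dims, units):
        self.values = np.asarray(values)
        self.dims = tuple(dims)
        self.attrs = {"units": units}


def check_state_arrays(state):
    """Raise TypeError for any entry that lacks the SymplArray attributes."""
    for name, value in state.items():
        if not isinstance(value, SymplArray):
            raise TypeError(f"{name!r} is a {type(value).__name__}, not a SymplArray")
```

Note that `isinstance` against a runtime-checkable protocol only checks that the attributes exist, not their types, so a real specification would probably still need explicit validation on top of it.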
This might also require sympl to allow an implementing library to replace functions like `get_numpy_array` with custom versions.

More generally, it might be good to specify a set of functions that an implementing library must provide, which can replace the logic that currently resides within the `__call__` of any sympl component. This would make it easy to add functionality without having to build custom subclasses of the base sympl components, which is undesirable.

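One possible shape for this, sketched under assumptions: a hypothetical `ArrayBackend` bundles the two functions an implementing library would supply (convert a state array to a numpy array in given dims, and wrap a numpy result back into the library's array type), and a hypothetical `PluggableComponent` base delegates all array handling in `__call__` to it. None of these names are existing sympl API; the `input_properties`/`array_call` naming is only loosely modeled on sympl components, and the cooling physics is illustrative.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import Any, Callable, Dict

import numpy as np


@dataclass
class ArrayBackend:
    """Hypothetical set of functions an implementing library would supply."""

    to_numpy: Callable[[Any, tuple], np.ndarray]                    # state array -> ndarray
    from_numpy: Callable[[np.ndarray, tuple, Dict[str, Any]], Any]  # ndarray -> library array


class PluggableComponent:
    """Hypothetical base whose __call__ delegates all array handling to a backend."""

    input_properties: Dict[str, Dict[str, Any]] = {}
    tendency_properties: Dict[str, Dict[str, Any]] = {}

    def __init__(self, backend: ArrayBackend):
        self.backend = backend

    def __call__(self, state):
        # 1. Convert each required input to a bare ndarray in the declared dim order.
        raw_state = {
            name: self.backend.to_numpy(state[name], props["dims"])
            for name, props in self.input_properties.items()
        }
        # 2. The physics works on plain numpy arrays only.
        raw_tendencies = self.array_call(raw_state)
        # 3. Wrap the results back into whatever array type the backend uses.
        return {
            name: self.backend.from_numpy(
                raw_tendencies[name], props["dims"], {"units": props["units"]}
            )
            for name, props in self.tendency_properties.items()
        }

    def array_call(self, raw_state):
        raise NotImplementedError


# A minimal backend built on a namedtuple (hypothetical, for illustration).
Arr = namedtuple("Arr", ["values", "dims", "attrs"])


def arr_to_numpy(arr, dims):
    # Reorder axes to match the requested dim names.
    return np.transpose(arr.values, [arr.dims.index(d) for d in dims])


def arr_from_numpy(values, dims, attrs):
    return Arr(values, tuple(dims), attrs)


class NewtonianCooling(PluggableComponent):
    """Relax temperature toward 200 K on a one-day timescale (illustrative physics)."""

    input_properties = {"air_temperature": {"dims": ("lon", "z"), "units": "K"}}
    tendency_properties = {"air_temperature": {"dims": ("lon", "z"), "units": "K s^-1"}}

    def array_call(self, raw_state):
        return {"air_temperature": (200.0 - raw_state["air_temperature"]) / 86400.0}
```

Swapping `DataArray` for another array type then means passing a different `ArrayBackend` to the component, rather than subclassing the base component.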
IMO this also makes sense because sympl is a framework, and it need not be opinionated about what kind of arrays are used, or how the validation of these arrays and their dimensions is done. sympl could register callbacks based on the type of the input array and use them for validation and reshaping.

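Python's standard `functools.singledispatch` already provides this kind of type-based registration, so one option (an assumption, not a decided design) is to build the callbacks on it. In the sketch below, `validate_dims`, `to_dims` and `PlainArray` are hypothetical names; an implementing library would register its own array type the same way, and because dispatch follows the MRO, subclasses of a registered type reuse its callback.

```python
from functools import singledispatch

import numpy as np


class PlainArray:
    """Hypothetical lightweight array type used to illustrate registration."""

    def __init__(self, values, dims):
        self.values = np.asarray(values)
        self.dims = tuple(dims)


@singledispatch
def validate_dims(array, expected_dims):
    """Fallback: no validator has been registered for this array type."""
    raise TypeError(f"no dims validator registered for {type(array).__name__}")


@validate_dims.register
def _(array: PlainArray, expected_dims):
    # Named dims: the set of names must match, in any order.
    if set(array.dims) != set(expected_dims):
        raise ValueError(f"expected dims {expected_dims}, got {array.dims}")


@singledispatch
def to_dims(array, out_dims):
    """Fallback: no reshaping callback has been registered for this array type."""
    raise TypeError(f"no reshaping callback registered for {type(array).__name__}")


@to_dims.register
def _(array: PlainArray, out_dims):
    # Reorder axes with numpy according to the stored dim names.
    return np.transpose(array.values, [array.dims.index(d) for d in out_dims])


@to_dims.register
def _(array: np.ndarray, out_dims):
    # A bare ndarray carries no dim names, so assume it is already in out_dims order.
    if array.ndim != len(out_dims):
        raise ValueError(f"expected {len(out_dims)} axes, got {array.ndim}")
    return array
```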