
Improved duck array wrapping #9798

Merged Nov 26, 2024 (30 commits).
Changes from 5 commits.
fd6b339 lots more duck array compat, plus tests (slevang, Nov 18, 2024)
893408c Merge branch 'main' into more-array-api-compat (slevang, Nov 18, 2024)
f7866ce merge sliding_window_view (slevang, Nov 18, 2024)
90037fe namespaces constant (slevang, Nov 18, 2024)
5ba1a2f revert dask allowed (slevang, Nov 18, 2024)
6225ae3 fix up some tests (slevang, Nov 19, 2024)
e2911c2 backwards compat sparse mask (slevang, Nov 19, 2024)
2ac37f9 add as_array methods (slevang, Nov 21, 2024)
1cc344b to_like_array helper (slevang, Nov 21, 2024)
69080a5 Merge branch 'main' into more-array-api-compat (slevang, Nov 21, 2024)
372439c only cast non-numpy (slevang, Nov 21, 2024)
0eef2cb better idxminmax approach (slevang, Nov 21, 2024)
6739504 fix mypy (slevang, Nov 21, 2024)
9e6d6f8 naming, add is_array_type (slevang, Nov 21, 2024)
e721011 add public doc and whats new (slevang, Nov 21, 2024)
1fe4131 update comments (slevang, Nov 21, 2024)
205c199 add support for chunked arrays in as_array_type (slevang, Nov 21, 2024)
7752088 Merge branch 'main' into more-array-api-compat (slevang, Nov 21, 2024)
c8d4e5e revert array_type methods (slevang, Nov 22, 2024)
e67a819 Merge branch 'main' into more-array-api-compat (slevang, Nov 22, 2024)
f306768 fix up whats new (slevang, Nov 22, 2024)
18ebdcd comment about bool_ (slevang, Nov 22, 2024)
f51e3fb Merge branch 'main' into more-array-api-compat (slevang, Nov 22, 2024)
121af9e add jax to complete ci envs (slevang, Nov 23, 2024)
472ae7e add pint and sparse to tests (slevang, Nov 23, 2024)
5aa4a39 remove from windows (slevang, Nov 23, 2024)
390df6f mypy, xfail one more sparse (slevang, Nov 23, 2024)
f6074d2 add dask and a few other methods (slevang, Nov 25, 2024)
561f21b Merge branch 'main' into more-array-api-compat (slevang, Nov 25, 2024)
bfd6aeb move whats new (slevang, Nov 25, 2024)
28 changes: 28 additions & 0 deletions xarray/core/array_api_compat.py
@@ -1,5 +1,7 @@
import numpy as np

from xarray.namedarray.pycompat import array_type


def is_weak_scalar_type(t):
return isinstance(t, bool | int | float | complex | str | bytes)
@@ -42,3 +44,29 @@ def result_type(*arrays_and_dtypes, xp) -> np.dtype:
return xp.result_type(*arrays_and_dtypes)
else:
return _future_array_api_result_type(*arrays_and_dtypes, xp=xp)


def get_array_namespace(*values):
def _get_single_namespace(x):
if hasattr(x, "__array_namespace__"):
return x.__array_namespace__()
elif isinstance(x, array_type("cupy")):
# special case cupy for now
slevang (Contributor Author) commented:

cupy seems to have full compliance with the standard, but doesn't yet actually have __array_namespace__ on the core API. Others may be the same?

import cupy as cp

return cp
else:
return np

A reviewer replied:

you probably want to wrap array-api-compat's array_namespace here, see https://github.com/scipy/scipy/blob/ec30b43e143ac0cb0e30129d4da0dbaa29e74c34/scipy/_lib/_array_api.py#L118-L152 for what we do in SciPy.

slevang (Contributor Author) replied:

Happy to go that route. I did actually try that, but array-api-compat doesn't handle a bunch of things we end up passing through this (scalars, index wrappers, etc.), so it would require some careful prefiltering.

The only things this package effectively wraps that don't have __array_namespace__ are cupy, dask, and torch. I tried the torch wrapper but it doesn't pass our as_compatible_data check because the wrapper object itself doesn't have __array_namespace__ or __array_function__ 😕

A reviewer replied:

Support for scalars was just merged half an hour ago! data-apis/array-api-compat#147

slevang (Contributor Author) replied:

Looks like this would handle the array-like check well, but would require adding this as a core xarray dependency to use it in as_compatible_data. Not sure if there is any appetite for that.

A reviewer replied:

In SciPy we vendor array-api-compat via a Git submodule. There's a little bit of build system bookkeeping needed, but otherwise it works well without introducing a dependency.

slevang (Contributor Author) replied:

I played around with this some more.

Getting an object that is compliant from the perspective of array_api_compat.is_array_api_obj to work through the hierarchy of xarray objects basically requires replacing a number of the more restrictive hasattr(x, "__array_namespace__") checks with this function. Not too bad.

I was able to get things to the point that xarray can wrap torch.Tensor, which is pretty cool. But the reality is that xarray relies on a lot of functionality beyond the Array API, so this just doesn't work in any practical sense. Similarly, swapping in the more restrictive cupy.array_api for the main cupy namespace causes all kinds of things to break.

It seems torch has had little movement on supporting the standard since 2021. From xarray's perspective, cupy implements everything we need, except for the actual __array_namespace__ attribute declaring compatibility, which seems to be planned for v14.

So while this compat module is nice in theory, I don't think it's very useful for xarray. cupy, jax, and sparse are good to go without it, and we only need a single special case for cupy to fetch its __array_namespace__.


namespaces = {_get_single_namespace(t) for t in values}
non_numpy = namespaces - {np}

if len(non_numpy) > 1:
names = [module.__name__ for module in non_numpy]
raise TypeError(f"Mixed array types {names} are not supported.")
elif non_numpy:
[xp] = non_numpy
else:
xp = np

return xp
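Outside the diff, the dispatch pattern this file introduces can be sketched standalone. The following is an illustrative reimplementation (the name get_namespace and the omission of the cupy special case are simplifications, not xarray's actual API):

```python
import numpy as np

def get_namespace(*values):
    """Illustrative version of the namespace lookup added above:
    anything advertising __array_namespace__ supplies its own namespace;
    everything else (scalars, plain numpy arrays on numpy < 2) is numpy.
    """
    def _one(x):
        if hasattr(x, "__array_namespace__"):
            return x.__array_namespace__()
        return np

    namespaces = {_one(v) for v in values}
    non_numpy = namespaces - {np}
    if len(non_numpy) > 1:
        names = [mod.__name__ for mod in non_numpy]
        raise TypeError(f"Mixed array types {names} are not supported.")
    return non_numpy.pop() if non_numpy else np

# plain numpy arrays and python scalars both resolve to numpy
assert get_namespace(np.ones(3), 2.0) is np
```

On numpy >= 2.0 the ndarray itself has __array_namespace__ (returning the main numpy namespace), so both branches of _one converge on np for numpy inputs.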
5 changes: 3 additions & 2 deletions xarray/core/common.py
@@ -496,7 +496,7 @@ def clip(
keep_attrs = _get_keep_attrs(default=True)

return apply_ufunc(
np.clip, self, min, max, keep_attrs=keep_attrs, dask="allowed"
duck_array_ops.clip, self, min, max, keep_attrs=keep_attrs, dask="allowed"
)

def get_index(self, key: Hashable) -> pd.Index:
@@ -1760,7 +1760,8 @@ def _full_like_variable(
**from_array_kwargs,
)
else:
data = np.full_like(other.data, fill_value, dtype=dtype)
xp = duck_array_ops.get_array_namespace(other.data)
data = xp.full_like(other.data, fill_value, dtype=dtype)

return Variable(dims=other.dims, data=data, attrs=other.attrs)

8 changes: 6 additions & 2 deletions xarray/core/computation.py
@@ -34,7 +34,7 @@
from xarray.core.utils import is_dict_like, is_scalar, parse_dims_as_set, result_name
from xarray.core.variable import Variable
from xarray.namedarray.parallelcompat import get_chunked_array_type
from xarray.namedarray.pycompat import is_chunked_array
from xarray.namedarray.pycompat import is_chunked_array, to_numpy
from xarray.util.deprecation_helpers import deprecate_dims

if TYPE_CHECKING:
@@ -1702,7 +1702,7 @@ def cross(
)

c = apply_ufunc(
np.cross,
duck_array_ops.cross,
a,
b,
input_core_dims=[[dim], [dim]],
@@ -2174,9 +2174,13 @@ def _calc_idxminmax(
# we need to attach back the dim name
res.name = dim
else:
indx.data = to_numpy(indx.data)
(dcherian marked this conversation as resolved.)
res = array[dim][(indx,)]
# The dim is gone but we need to remove the corresponding coordinate.
del res.coords[dim]
# Cast to array namespace
xp = duck_array_ops.get_array_namespace(array.data)
res.data = xp.asarray(res.data)

if skipna or (skipna is None and array.dtype.kind in na_dtypes):
# Put the NaN values back in after removing them
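The two casts in _calc_idxminmax can be seen in isolation: the argmin/argmax index is pulled to numpy so it can index the numpy-backed dimension coordinate, and the looked-up values are cast back to the data's namespace. A numpy-only sketch with illustrative names (not xarray's actual helpers):

```python
import numpy as np

def lookup_coord_values(coord, indx, xp=np):
    # indx arrives as a duck array; pull it to numpy to index the
    # numpy-backed coordinate, then cast the result back via xp.asarray
    indx_np = np.asarray(indx)
    return xp.asarray(coord[indx_np])

coord = np.array([10.0, 20.0, 30.0])
assert lookup_coord_values(coord, np.array([2, 0])).tolist() == [30.0, 10.0]
```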
12 changes: 8 additions & 4 deletions xarray/core/dataset.py
@@ -127,7 +127,7 @@
calculate_dimensions,
)
from xarray.namedarray.parallelcompat import get_chunked_array_type, guess_chunkmanager
from xarray.namedarray.pycompat import array_type, is_chunked_array
from xarray.namedarray.pycompat import array_type, is_chunked_array, to_numpy
from xarray.plot.accessor import DatasetPlotAccessor
from xarray.util.deprecation_helpers import _deprecate_positional_args, deprecate_dims

@@ -6564,7 +6564,7 @@ def dropna(
array = self._variables[k]
if dim in array.dims:
dims = [d for d in array.dims if d != dim]
count += np.asarray(array.count(dims))
count += to_numpy(array.count(dims).data)
size += math.prod([self.sizes[d] for d in dims])

if thresh is not None:
@@ -8678,16 +8678,20 @@ def _integrate_one(self, coord, datetime_unit=None, cumulative=False):
coord_names.add(k)
else:
if k in self.data_vars and dim in v.dims:
# cast coord data to duck array if needed
coord_data = duck_array_ops.get_array_namespace(v.data).asarray(
coord_var.data
)
if _contains_datetime_like_objects(v):
v = datetime_to_numeric(v, datetime_unit=datetime_unit)
if cumulative:
integ = duck_array_ops.cumulative_trapezoid(
v.data, coord_var.data, axis=v.get_axis_num(dim)
v.data, coord_data, axis=v.get_axis_num(dim)
)
v_dims = v.dims
else:
integ = duck_array_ops.trapz(
v.data, coord_var.data, axis=v.get_axis_num(dim)
v.data, coord_data, axis=v.get_axis_num(dim)
)
v_dims = list(v.dims)
v_dims.remove(dim)
138 changes: 90 additions & 48 deletions xarray/core/duck_array_ops.py
@@ -18,22 +18,17 @@
import pandas as pd
from numpy import all as array_all # noqa: F401
from numpy import any as array_any # noqa: F401
from numpy import concatenate as _concatenate
from numpy import ( # noqa: F401
full_like,
gradient,
isclose,
isin,
isnat,
take,
tensordot,
transpose,
unravel_index,
)
from packaging.version import Version
from pandas.api.types import is_extension_array_dtype

from xarray.core import dask_array_compat, dask_array_ops, dtypes, nputils
from xarray.core.array_api_compat import get_array_namespace
from xarray.core.options import OPTIONS
from xarray.core.utils import is_duck_array, is_duck_dask_array, module_available
from xarray.namedarray import pycompat
@@ -54,28 +49,6 @@
dask_available = module_available("dask")


def get_array_namespace(*values):
def _get_array_namespace(x):
if hasattr(x, "__array_namespace__"):
return x.__array_namespace__()
else:
return np

namespaces = {_get_array_namespace(t) for t in values}
non_numpy = namespaces - {np}

if len(non_numpy) > 1:
raise TypeError(
"cannot deal with more than one type supporting the array API at the same time"
)
elif non_numpy:
[xp] = non_numpy
else:
xp = np

return xp


def einsum(*args, **kwargs):
from xarray.core.options import OPTIONS

@@ -84,7 +57,23 @@ def einsum(*args, **kwargs):

return opt_einsum.contract(*args, **kwargs)
else:
return np.einsum(*args, **kwargs)
xp = get_array_namespace(*args)
return xp.einsum(*args, **kwargs)


def tensordot(*args, **kwargs):
xp = get_array_namespace(*args)
return xp.tensordot(*args, **kwargs)


def cross(*args, **kwargs):
xp = get_array_namespace(*args)
return xp.cross(*args, **kwargs)


def gradient(f, *varargs, axis=None, edge_order=1):
xp = get_array_namespace(f)
return xp.gradient(f, *varargs, axis=axis, edge_order=edge_order)


def _dask_or_eager_func(
@@ -133,15 +122,20 @@ def fail_on_dask_array_input(values, msg=None, func_name=None):
"masked_invalid", eager_module=np.ma, dask_module="dask.array.ma"
)

# sliding_window_view will not dispatch arbitrary kwargs (automatic_rechunk),
# so we need to hand-code this.
sliding_window_view = _dask_or_eager_func(
"sliding_window_view",
eager_module=np.lib.stride_tricks,
dask_module=dask_array_compat,
dask_only_kwargs=("automatic_rechunk",),
numpy_only_kwargs=("subok", "writeable"),
)

def sliding_window_view(array, window_shape, axis=None, **kwargs):
# TODO: some libraries (e.g. jax) don't have this, implement an alternative?
slevang (Contributor Author) commented:

This is one of the biggest outstanding bummers of wrapping jax arrays. There is apparently openness to adding this as an API (even though it would not offer any performance benefit in XLA). But given this is way outside the API standard, it's unclear whether it makes sense to implement a general version within xarray that doesn't rely on stride tricks.

A contributor replied:

You could implement a version using "summed area tables" (basically run a single accumulator and then compute differences between the window edges); or convolutions I guess.

slevang (Contributor Author) replied:

I have something that works pretty well with this style of gather operation. But only in a jit context where XLA can work its magic. So I guess this is better left to the specific library to implement, or the user.
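The "summed area table" idea suggested above can be sketched for rolling sums with plain numpy. This is a hedged illustration of the technique, not code from the PR, and it only covers reductions like sum (and mean, by division), not a true window view:

```python
import numpy as np

def windowed_sum_1d(a, w):
    """Moving-window sum via a single accumulator: with an exclusive
    prefix sum c, sum(a[i:i+w]) == c[i+w] - c[i]. No stride tricks."""
    c = np.concatenate([[0.0], np.cumsum(a)])
    return c[w:] - c[:-w]

a = np.arange(6.0)
# matches summing over stride-trick windows, without stride tricks
ref = np.lib.stride_tricks.sliding_window_view(a, 3).sum(axis=-1)
assert np.allclose(windowed_sum_1d(a, 3), ref)
```

One caveat of the accumulator approach: for long float arrays the running sum can lose precision relative to summing each window directly.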

xp = get_array_namespace(array)
# sliding_window_view will not dispatch arbitrary kwargs (automatic_rechunk),
# so we need to hand-code this.
func = _dask_or_eager_func(
"sliding_window_view",
eager_module=xp.lib.stride_tricks,
dask_module=dask_array_compat,
dask_only_kwargs=("automatic_rechunk",),
numpy_only_kwargs=("subok", "writeable"),
)
return func(array, window_shape, axis=axis, **kwargs)


def round(array):
@@ -174,7 +168,7 @@ def isnull(data):
)
):
# these types cannot represent missing values
return full_like(data, dtype=bool, fill_value=False)
return full_like(data, dtype=xp.bool, fill_value=False)
else:
# at this point, array should have dtype=object
if isinstance(data, np.ndarray) or is_extension_array_dtype(data):
@@ -215,11 +209,23 @@ def cumulative_trapezoid(y, x, axis):

# Pad so that 'axis' has same length in result as it did in y
pads = [(1, 0) if i == axis else (0, 0) for i in range(y.ndim)]
integrand = np.pad(integrand, pads, mode="constant", constant_values=0.0)

xp = get_array_namespace(y, x)
integrand = xp.pad(integrand, pads, mode="constant", constant_values=0.0)

return cumsum(integrand, axis=axis, skipna=False)
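As a sanity check on the math (this is an illustrative 1-D rewrite, not the PR's n-dimensional code), cumulative_trapezoid with the same leading-zero padding can be verified directly, since the trapezoid rule is exact for linear integrands:

```python
import numpy as np

def cumulative_trapezoid_1d(y, x):
    # pairwise trapezoid areas, cumulatively summed, with a leading zero
    # so the result has the same length as y (mirrors the pad in the diff)
    dx = np.diff(x)
    areas = 0.5 * dx * (y[1:] + y[:-1])
    return np.concatenate([[0.0], np.cumsum(areas)])

x = np.linspace(0.0, 1.0, 5)
out = cumulative_trapezoid_1d(x, x)  # integral of y=x from 0 to x is x**2/2
assert np.allclose(out, x**2 / 2)
```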


def full_like(a, fill_value, **kwargs):
xp = get_array_namespace(a)
return xp.full_like(a, fill_value, **kwargs)


def empty_like(a, **kwargs):
xp = get_array_namespace(a)
return xp.empty_like(a, **kwargs)


def astype(data, dtype, **kwargs):
if hasattr(data, "__array_namespace__"):
xp = get_array_namespace(data)
@@ -350,7 +356,8 @@ def array_notnull_equiv(arr1, arr2):

def count(data, axis=None):
"""Count the number of non-NA in this array along the given axis or axes"""
return np.sum(np.logical_not(isnull(data)), axis=axis)
xp = get_array_namespace(data)
return xp.sum(xp.logical_not(isnull(data)), axis=axis)


def sum_where(data, axis=None, dtype=None, where=None):
@@ -365,7 +372,7 @@

def where(condition, x, y):
"""Three argument where() with better dtype promotion rules."""
xp = get_array_namespace(condition)
xp = get_array_namespace(condition, x, y)
return xp.where(condition, *as_shared_dtype([x, y], xp=xp))


@@ -382,15 +389,25 @@ def fillna(data, other):
return where(notnull(data), data, other)


def logical_not(data):
xp = get_array_namespace(data)
return xp.logical_not(data)


def clip(data, min=None, max=None):
xp = get_array_namespace(data)
return xp.clip(data, min, max)


def concatenate(arrays, axis=0):
"""concatenate() with better dtype promotion rules."""
# TODO: remove the additional check once `numpy` adds `concat` to its array namespace
if hasattr(arrays[0], "__array_namespace__") and not isinstance(
arrays[0], np.ndarray
):
xp = get_array_namespace(arrays[0])
# TODO: `concat` is the xp compliant name, but fallback to concatenate for
# older numpy and for cupy
xp = get_array_namespace(*arrays)
if hasattr(xp, "concat"):
return xp.concat(as_shared_dtype(arrays, xp=xp), axis=axis)
return _concatenate(as_shared_dtype(arrays), axis=axis)
else:
return xp.concatenate(as_shared_dtype(arrays, xp=xp), axis=axis)
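The concat/concatenate probing above can be illustrated with a small stand-in (concat_compat is a hypothetical name, and the real function also routes dtypes through as_shared_dtype first):

```python
import numpy as np

def concat_compat(arrays, axis=0):
    # `concat` is the array-API-standard name; older numpy and cupy
    # only provide `concatenate`, so probe for the standard name first.
    xp = np  # stand-in for get_array_namespace(*arrays)
    func = getattr(xp, "concat", None) or xp.concatenate
    return func(arrays, axis=axis)

out = concat_compat([np.zeros((1, 2)), np.ones((1, 2))], axis=0)
assert out.shape == (2, 2)
```

On numpy >= 2.0 this picks np.concat; on 1.x it falls back to np.concatenate, which is exactly the hasattr dance in the diff.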


def stack(arrays, axis=0):
@@ -408,6 +425,26 @@ def ravel(array):
return reshape(array, (-1,))


def transpose(array, axes=None):
xp = get_array_namespace(array)
return xp.transpose(array, axes)


def moveaxis(array, source, destination):
xp = get_array_namespace(array)
return xp.moveaxis(array, source, destination)


def pad(array, pad_width, **kwargs):
xp = get_array_namespace(array)
return xp.pad(array, pad_width, **kwargs)


def quantile(array, q, axis=None, **kwargs):
xp = get_array_namespace(array)
return xp.quantile(array, q, axis=axis, **kwargs)


@contextlib.contextmanager
def _ignore_warnings_if(condition):
if condition:
@@ -749,6 +786,11 @@ def last(values, axis, skipna=None):
return take(values, -1, axis=axis)


def isin(element, test_elements, **kwargs):
xp = get_array_namespace(element, test_elements)
return xp.isin(element, test_elements, **kwargs)


def least_squares(lhs, rhs, rcond=None, skipna=False):
"""Return the coefficients and residuals of a least-squares fit."""
if is_duck_dask_array(rhs):
2 changes: 1 addition & 1 deletion xarray/core/nanops.py
@@ -128,7 +128,7 @@ def nanmean(a, axis=None, dtype=None, out=None):
"ignore", r"Mean of empty slice", category=RuntimeWarning
)

return np.nanmean(a, axis=axis, dtype=dtype)
return nputils.nanmean(a, axis=axis, dtype=dtype)


def nanmedian(a, axis=None, out=None):
Expand Down
6 changes: 6 additions & 0 deletions xarray/core/nputils.py
@@ -7,6 +7,7 @@
import pandas as pd
from packaging.version import Version

from xarray.core.array_api_compat import get_array_namespace
from xarray.core.utils import is_duck_array, module_available
from xarray.namedarray import pycompat

@@ -179,6 +180,11 @@ def f(values, axis=None, **kwargs):
dtype = kwargs.get("dtype")
bn_func = getattr(bn, name, None)

xp = get_array_namespace(values)
if xp is not np:
func = getattr(xp, name, None)
if func is not None:
return func(values, axis=axis, **kwargs)
if (
module_available("numbagg")
and pycompat.mod_version("numbagg") >= Version("0.5.0")
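The lookup order added to nputils, where a non-numpy namespace's own reduction wins before any numpy fallback, can be sketched as follows (dispatch_reduction is a hypothetical name, and the numbagg/bottleneck branch the real code tries for numpy arrays is elided):

```python
import numpy as np

def dispatch_reduction(name, values, axis=None, **kwargs):
    # namespace lookup as in get_array_namespace (simplified)
    xp = values.__array_namespace__() if hasattr(values, "__array_namespace__") else np
    if xp is not np:
        func = getattr(xp, name, None)
        if func is not None:  # duck array's native implementation wins
            return func(values, axis=axis, **kwargs)
    # numpy path (the real code tries numbagg/bottleneck before this)
    return getattr(np, name)(values, axis=axis, **kwargs)

assert dispatch_reduction("nansum", np.array([1.0, np.nan, 2.0])) == 3.0
```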