RFC: SciPy array types & libraries support #18286
Comments
I read it over, sounds exciting. I have to admit that evaluating API designs on this scale is not one of my strengths. I'll briefly mention that the US National Labs will probably quietly test out their own array API compatible technologies behind the scenes as well, if this moves forward, to see if performance gains are observed on bleeding-edge supercomputer hardware. One question I have related to this--for nodes with multiple devices (like 4 x A100 is pretty common these days), do we leave the use of multiple devices up to the GPU library under the hood? Do we explicitly not encourage using kwargs for controlling concurrency that is specific to say GPU libs (i.e., too much risk of optional kwarg API bloat)? You did briefly mention testing a few different CPU-based scenarios by default. I'm curious if we'd use a
One thing that may be more general re: testing is how we might handle bug reports that are effectively isolated to a subset of pass-throughs--only library X with hardware Y. I suppose the array API is supposed to prevent that from happening, but there is also reality... I guess maybe occasional shims and regression tests that only run when the conditions are right, assuming that we can't just consider these upstream issues most of the time. I will say that I don't see any way for this overall improvement to make our testing situation any simpler, unfortunately/obviously. Maybe after the first few PRs go in, it will be easier for those of us who aren't familiar with every corner of those NEPs to start contributing since we might be able to understand the changes needed as basic engineering tasks. I wonder if |
What is the policy for handling mixed inputs when a function accepts more than one argument? This is probably covered in one of the array API documents or discussions, but I think we need it stated explicitly here, too. I expect the most common use-case is mixing $SOME_OTHER_ARRAY_TYPE and standard numpy arrays (e.g. multiplying a sparse matrix or sparse array by a numpy array, convolving some other array type with a kernel that is a numpy array, etc.). |
The array API spec says:
https://data-apis.org/array-api/latest/purpose_and_scope.html#out-of-scope See also the interchange section about DLPack https://data-apis.org/array-api/latest/design_topics/data_interchange.html |
Thanks Ralf, for the detailed RFC; after reading it a couple of times, I have a few implementation-specific comments/questions/clarifications (some arising from the past work on the Gravitational waves demo). It seems
With my understanding, I'd try to expand the above for everyone with a real-world example taken from scipy/scipy/signal/_spectral_py.py Lines 1748 to 1752 in c8206b8
Here Now I'd expect at some point, PyTorch will update their implementation to support array API for these functions, which are still left, and there is probably effort around it already (see this But for now, Questions:
Apologies if my questions are a bit too specific with implementation details and are special examples, and if this RFC is about discussing the high-level design, feel free to ignore it. Overall, excited that all of this work is coming together, and the current state looks pretty solid to me to start baking in this functionality in the "array consumer libraries" ( |
Thanks for the feedback so far! Let me try to answer the high-level questions first.
Yes, we should not carry anything specific to the execution model within SciPy code. That's a key design principle of the array API standard. You could use a multithreaded CPU library, a GPU library, or something even fancier like what you have here. The code should be unchanged. There's quite a bit of variation between different distributed libraries in particular, and that's out of scope to deal with I'd say.
Yes, that's what I'd like to aim for. This has worked pretty well for the array API test suite; a single switch allows you to select any compatible libraries. I imagine we'd have that as a decorator for relevant tests, or some such thing.
Probably. I imagine modules like
Good point, I'll add it to the issue description. @asmeurer gave a good answer already. I'd add that numpy arrays + sequences will work unchanged; non-numpy arrays + sequences should error; and in general it's a bad idea to mix array objects coming from different libraries. Rather get a clear exception there and have the user convert one kind of array to the other, to get the behavior they intended. That's how all libraries except NumPy work, and what NumPy does is an endless source of bugs and user confusion.
The sparse-dense mix is the only one that in principle makes sense. I think that could be allowed in the future, but I'm not exactly sure under what circumstances. Right now all sparse array types forbid auto-densification, so where you can mix sparse-dense is quite restricted. PyTorch has the best story here, because it natively provides both dense and sparse tensors - but even there it's still a work in progress. |
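A minimal sketch of the mixing policy described in this comment: NumPy arrays plus sequences keep working, non-NumPy arrays plus sequences raise, and arrays from different libraries raise. The names `common_namespace` and `FakeArray` are hypothetical and exist only for illustration; letting Python scalars through for any array type is an assumption of this sketch, not something stated above, and none of this is SciPy's actual implementation.

```python
import numpy as np


class FakeArray:
    """Hypothetical stand-in for an array type from another library."""

    def __init__(self, data):
        self.data = list(data)

    def __array_namespace__(self, api_version=None):
        # A real library would return its array API namespace module here.
        return "fake_namespace"


def common_namespace(*args):
    """Return the single namespace shared by all inputs, or raise TypeError."""
    namespaces = set()
    for a in args:
        if isinstance(a, (np.ndarray, np.generic)):
            namespaces.add("numpy")
        elif hasattr(a, "__array_namespace__"):
            namespaces.add(a.__array_namespace__())
        elif isinstance(a, (list, tuple, int, float, complex)):
            # Sequences and scalars do not select a namespace on their own.
            continue
        else:
            raise TypeError(f"unsupported input type: {type(a).__name__}")
    if len(namespaces) > 1:
        raise TypeError(
            f"inputs come from different array libraries: {sorted(namespaces)}"
        )
    if not namespaces:
        # Only sequences/scalars were given: these are coerced to NumPy.
        return "numpy"
    namespace = namespaces.pop()
    if namespace != "numpy" and any(isinstance(a, (list, tuple)) for a in args):
        raise TypeError("non-NumPy arrays cannot be combined with Python sequences")
    return namespace
```

With this helper, `common_namespace(np.ones(3), [1, 2, 3])` returns `"numpy"`, while `common_namespace(np.ones(3), FakeArray([1]))` raises a `TypeError`, so the user has to convert one array explicitly to get the behavior they intended.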
Yes indeed, that is expected in many cases I'd say.
I'd say "it depends". If it's simple to reimplement
I'd prefer for coherent sets of functionality to be upgraded at once. To stay with Tyler's example of |
Thanks for the proposal! I doubt it's urgent to figure out (see end), but I need to push back a little on the promotion/mixing story here. The big issues are really not about mixing/promotion. They are about NumPy thinking that:
Unlike NumPy, most array implementers do not pretend that it's OK to promote almost everything to their array type. But promotion/mixing is a different thing. Many do promote: sparse/dense
I admit sparse/dense may be trickier. But I can see that for many things it either makes perfect sense or can be trusted to raise a good error. Some ops densify, some ops don't, and those where it's not clear raise. Is supporting limited promotion important enough to be needed? I don't know. I tend to think the answer is yes and I somewhat suspect e.g. Dask users may well be annoyed if it's missing (I am assuming Dask has a good reason to support it very well, but trying to gauge it better. Would post the result on the array-api issue if there is a clearer one). Overall my angle/answer would be: Fortunately, it should not be a big mess to add simple promotion/mixing support here. (I am confident about that, but I think it's important to clarify since it is not obvious.) |
Thanks for your thoughts Sebastian!
yes please:) I think our NumPy 2.0 plate is pretty full already, so I'm hesitant to invest a lot of time now - but if there's easy wins there to avoid promote-to-object-array issues, let's try and take them.
This is not really "promotion" in any proper sense. JAX and NumPy arrays are two different types of arrays that are unrelated by class hierarchy. There happens to be an ordering though, through I've worked on quite a few PyTorch issues with the same problem. PyTorch tensors and NumPy arrays are unrelated, so things like
Again, only when there is a clear relation.
If this is well-defined, I agree. We just need to be very explicit what the mechanisms are, and they should be introspectable. I want to avoid "just try it and see what rolls out, YMMV". |
Collapsing everything into details, because... It doesn't matter much for the RFC, but I can't leave some of these things unanswered. (We could get hung up and say I should use "implicit conversion" like the C-standard making the argument about implicit/explicit conversion – but Julia uses "promotion" the way I do as far as I understand, so I will change my habit when someone asks me to or points out why it's very wrong.)
This is not about promote-to-object-array issues. It's about JAX saying it's higher priority than NumPy, but NumPy ignoring that when you write:
Aha? JAX and NumPy arrays are two different types of arrays that are unrelated by class hierarchy. Just as: Unless you also say So yes, there is an (abstract) base class ( Sure, "array" is a bit more complicated than "number".
Fine, it's confusing and probably has things that need improving. But also But I don't even think that matters. If a type is comfortable with implementing And there is no problem with allowing such a type to signal that it wants to do just that. That still allows Torch to signal that it doesn't want to mix! So is it annoyingly hard to tell NumPy "no"? Yes, and maybe that needs to be improved. But frankly, I think it is misleading to use those difficulties as an argument that mixing is generally pretty much doomed to fail. We have many examples of successful implementations that mix fine. Like dask, masked arrays, or quantities...
But because you stop stuffing things blindly into Take
No, I would be very surprised if Allan's masked array is a subclass and I am sure it has better behavior than The important point is that it wants to work. That can just as well be by implementing
This is a reason, unlike the "it's a mess" argument, IMO. We do have quite a bit of prior art, since every other protocol defines a mechanism. It still needs care of course (there are small variations after all). But at that point it's: We still need to figure that out because we don't want to get it wrong. And not "it's a bad idea". |
Maybe we're not disagreeing all that much. I said briefly "it's a bad idea" because (a) it's apparently so complex that after all these years we still haven't figured it out indeed, and (b) the gains are in many cases a minor amount of characters saved to deal with explicit conversions in the user's code instead. So my personal assessment is that this isn't worth the effort. Others may disagree of course, and spend that effort. I think the appropriate place for discussing this topic further is either on the numpy repo or on pydata/duck-array-discussion, which was/is an effort specifically to sort out this hierarchy.
I agree. We're not closing any doors in this RFC, it can in principle be enabled in a future extension. |
A few assorted questions/comments:
|
Thanks @peterbell10, all good questions.
|
Let me get a bit more specific now. I started studying the demo work here: https://quansight-labs.github.io/array-api-demo/GW_Demo_Array_API.html
Even for this "well-behaved"/reasonable target, I ran into problems pretty early on. For example, both in the original feature branch and in my branch, there doesn't seem to be an elegant solution made for handling So, I guess my first real-world experience makes me wonder what our policy on special casing in these scenarios will be--ideally, I'd like to just remove the usage of Anyway, maybe I'll get an idea of what we want to do in general based on reactions to this specific case. I assume we could move much faster if we just accept out-of-array-API common shims by checking the array types like Anirudh and Ralf did here: https://github.com/AnirudhDagar/scipy/blob/array-api-demo/scipy/signal/spectral.py#L2007 in a sort of "perfection is the enemy of done" approach, though this is obviously not quite perfect. Something like this is clearly out of scope for |
Thanks @tylerjereddy, that's a very good question. I think that indeed this is something that will come up regularly, and the specific cases at hand may determine how fast things can be implemented and whether the end result is more/less maintainable or more/less performant. Every case will be a bit different, and in some a rewrite will help, in some it may lead to an extra code path or some such thing. Let's have a look at this case - this is the code in question:

    # Created strided array of data segments
    if nperseg == 1 and noverlap == 0:
        result = x[..., np.newaxis]
    else:
        # https://stackoverflow.com/a/5568169
        step = nperseg - noverlap
        shape = x.shape[:-1]+((x.shape[-1]-noverlap)//step, nperseg)
        strides = x.strides[:-1]+(step*x.strides[-1], x.strides[-1])
        result = np.lib.stride_tricks.as_strided(x, shape=shape,
                                                 strides=strides)

The

    result = detrend_func(result)

So now

    result2 = np.empty(shape, dtype=x.dtype)
    for ii in range(shape[0]):
        result2[ii, :] = x[ii*step:(ii*step + nperseg)]

Now, to complicate matters a little, when we use

    # This is temporary, once all libraries have device support, this check is no longer needed
    if hasattr(x, 'device'):
        result = xp.empty(shape, dtype=x.dtype, device=x.device)
    else:
        result = xp.empty(shape, dtype=x.dtype)
    for ii in range(shape[0]):
        result[ii, :] = x[ii*step:(ii*step + nperseg)]

At that point, we can say we're happy with more readability at the cost of some performance (TBD if the for-loop matters, my guess is it will). Or, we just keep the special-casing for numpy using
Agreed, that will naturally emerge I think. |
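To make the trade-off in this comment concrete, here is a small self-contained sketch that builds the same overlapping segments both ways for a 1-D input and lets you check that they agree. The function names `segments_strided` and `segments_loop` are hypothetical; the loop version only uses slicing and item assignment, which is closer to what a portable code path would look like, though libraries with immutable arrays would still need a different formulation.

```python
import numpy as np


def segments_strided(x, nperseg, noverlap):
    """Overlapping segments as a zero-copy view (NumPy-specific strides)."""
    step = nperseg - noverlap
    shape = x.shape[:-1] + ((x.shape[-1] - noverlap) // step, nperseg)
    strides = x.strides[:-1] + (step * x.strides[-1], x.strides[-1])
    return np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides)


def segments_loop(x, nperseg, noverlap, xp=np):
    """Same segments built with slicing only; copies the data (1-D input)."""
    step = nperseg - noverlap
    nseg = (x.shape[-1] - noverlap) // step
    result = xp.empty((nseg, nperseg), dtype=x.dtype)
    for ii in range(nseg):
        result[ii, :] = x[ii * step:ii * step + nperseg]
    return result


x = np.arange(10.0)
a = segments_strided(x, nperseg=4, noverlap=2)
b = segments_loop(x, nperseg=4, noverlap=2)
print(np.array_equal(a, b))
```

The strided version returns a view and costs no extra memory, while the loop version allocates a full copy and pays Python-level loop overhead per segment, which is the performance question raised above.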
FWIW, array-api-compat has a |
Thanks @tylerjereddy, as Ralf said, I'm pretty sure these will have to be dealt case by case, and hence in the previous comment above (Question 1), I wanted to know exactly what will be our policy on such cases. And I guess the answer is again dealing case by case. It would be very hard to know all such cases before starting to work on the transition. Regarding the demo, at the time of its development, our goal was to get a prototype version using Array API out (which is not perfect). The way it's done in my feature branch, special casing, is not ideal, and in such cases, it would come down to the possibility of refactoring the code to achieve the same using methods compliant with Array API as shown above by Ralf. @tylerjereddy there is this other demo that was developed by @aktech, and might be of interest too: https://labs.quansight.org/blog/making-gpus-accessible-to-pydata-ecosystem-via-array-api
This is a good idea, not only for SciPy, but I expect similar issues/problems will arrive with other array-consuming libraries (eg. |
It sounds like we propose to require libraries to be FFT API implementation complete--curiously, avoiding the usage of

    diff --git a/scipy/signal/_spectral_py.py b/scipy/signal/_spectral_py.py
    index befe9d3bf..ea1f6eb27 100644
    --- a/scipy/signal/_spectral_py.py
    +++ b/scipy/signal/_spectral_py.py
    @@ -1986,10 +1986,10 @@ def _fft_helper(x, win, detrend_func, nperseg, noverlap, nfft, sides):
         # Perform the fft. Acts on last axis by default. Zero-pads automatically
         if sides == 'twosided':
    -        func = xp.fft.fft
    +        func = sp_fft.fft
         else:
             result = result.real
    -        func = xp.fft.rfft
    +        func = sp_fft.rfft
         result = func(result, n=nfft)

Also, for stuff like |
Brief follow-up, that's reproducible on |
There will be small differences in precision, because the |
You should be able to use |
I think this will clutter the SciPy code with endless container type conversions in any exported function or method. It will contribute to code rot and also make things slower. Basically I think users should be responsible for their own container types. |
The diff for my draft branch for enabling CuPy/NumPy swapping for One thing I found myself wanting was a universal/portable |
I don't think that will be the case. It's possible there will be a few more type checks, but no more (or even fewer) type conversions. A few examples:
Thanks for sharing your progress on trying this out Tyler! Regarding this performance expectation: typically CuPy will be slower than NumPy for small arrays (due to overhead of data transfer and starting a CUDA kernel) and there will be a cross-over point beyond which CuPy gets faster. Typically that's in the 1e4 to 1e5 elements range. And beyond 1e6 or 1e7 you'll see a roughly constant speedup, the magnitude of which will depend on the details but typically in the 5x-50x range. |
For this small script on that branch:

    import sys
    import time

    import numpy as np
    import cupy as cp
    from scipy.signal import welch


    def main(namespace):
        size = 70_000_000
        if namespace == "numpy":
            x = np.zeros(size)
        elif namespace == "cupy":
            x = cp.zeros(size)
        x[0] = 1
        x[8] = 1
        f, p = welch(x, nperseg=8)
        print("f:", f)
        print("p:", p)


    if __name__ == "__main__":
        start = time.perf_counter()
        main(namespace=sys.argv[1])
        end = time.perf_counter()
        print("Elapsed Time (s):", end - start)

I don't know that we expect that particular incantation to get faster though. Could do some profiling or check the device copy activity with
|
Some questions have come up recently w.r.t. dtype conversions:
|
|
I've seen speed and memory arguments made for using lower-precision types internally, so I'm not sure we can make a prescription for everyone. We can add notes to the documentation of functions that may suffer unexpectedly high precision loss with lower precision types, though. And users are of course always welcome to use |
Some hardware (e.g. some types of GPU) is very inefficient when it comes to float64 operations, and some hardware (e.g. some types of TPU) does not support float64 at all. So if you're talking about array API implementations where users might be using accelerator-backed arrays, I think users will very much care what precision is used internally |
In a lot of cases, such numerical stability issues can be handled by writing dtype specific implementations of functions. This is the route I'm planning to take for |
I'd say that type promotion is not strictly related to the introduction of support for other array types - having SciPy functions behave uniformly and preserving lower-precision input dtypes makes perfect sense in a numpy-only world as well. We just happen to have a lot of historical inconsistencies here. There are two related questions here:
Putting it behind the There are other considerations as well, like we don't want to inflate binary size too much by templating over many dtypes. There are few functions which involve compiled code that can afford doing that.
The last bit here is very much a valid point. The array API design also doesn't care about what happens internally, the type promotion rules are for the expected output dtype. Even JAX and PyTorch will do upcasts internally (e.g., accumulators for float16 reductions are often float32 or float64) when precision requirements can't otherwise be met. My thoughts on @mdhaber's questions 1-4:
Not necessarily. I think we should only do this if we have
That seems like churn for no real reason. The end result is the same for the end user, with slightly more verbose code for them and a change they have to make.
Only if it makes the
That's the most tricky question perhaps, and also the only one that's directly array API standard related. I'd lean toward leaving the behavior for NumPy arrays unchanged for backwards compat reasons, but not accepting integer dtypes for other array types. |
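The distinction drawn in this comment, between the dtype that is returned and the precision used internally, can be sketched as follows. The helper name `var_preserve_dtype` is hypothetical, and upcasting only float16 to float32 is an assumption chosen for illustration, not a proposed SciPy policy.

```python
import numpy as np


def var_preserve_dtype(x):
    """Variance computed in higher precision, returned in the input dtype."""
    x = np.asarray(x)
    # Assumption for this sketch: only float16 gets a wider working dtype.
    if x.dtype == np.float16:
        work_dtype = np.float32
    else:
        work_dtype = x.dtype
    result = np.var(x.astype(work_dtype))
    # Cast back so that the output dtype follows the input dtype.
    return result.astype(x.dtype)


x = np.asarray([30000.0, 30010.0, 30020.0], dtype=np.float16)
v = var_preserve_dtype(x)
print(v, v.dtype)
```

The caller still gets a float16 result, so the type promotion rules for the output are respected, but the intermediate sums of squares no longer have to fit in float16.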
It's not, but I think it's going to come up a lot with array API conversions. I've planned for a while to go through and look at
I'm proposing that the desired design is for

    import numpy as np
    x = np.asarray([30000., 30010., 30020.], dtype=np.float16)
    np.var(x)  # inf - expected behavior, unless we change dtypes or work in log-space
    # similar for other array backends

Besides following precedent, this gives users control over the dtype used rather than imposing one on them. We could add a Is that the appropriate behavior for As a more detailed example of issue 1, let's look at a scipy/scipy/stats/_stats_py.py Line 7858 in 224dea5
Alternatively, here is a bare-bones implementation of a one-sided

    import numpy as np
    from scipy import stats, special

    def ttest_rel_respect_dtype(a, b):
        dtype = np.result_type(a, b)
        n = np.asarray(a.shape[0], dtype=dtype)
        df = np.asarray(n - 1, dtype=dtype)
        d = a - b
        v = np.var(d, ddof=1)
        dm = np.mean(d)
        t = dm/np.sqrt(v/n)
        p = special.stdtr(df, t)
        return t, p

At least in some cases, if we pass

    rng = np.random.default_rng()
    x = rng.standard_normal(size=100)
    y = rng.standard_normal(size=100)
    dtype = np.float32
    x, y = np.asarray(x, dtype=dtype), np.asarray(y, dtype=dtype)
    ref = stats.ttest_rel(x, y, alternative='less')
    t, p = ttest_rel_respect_dtype(x, y)
    eps = np.finfo(t.dtype).eps
    np.testing.assert_allclose(t, ref.statistic, rtol=eps)
    np.testing.assert_allclose(p, ref.pvalue, rtol=eps)

(If we use I think this is a case in which we could have a solid Do we do that, something else (e.g. perform variance calculation in log-space, add a
SciPy 1.13 produces |
Integers should not be promoted to float32 in stats because a float32 does not have the numerical precision to represent all int32 values precisely. The mantissa is too small. That is why float64 is needed. We still have the problem that not all int64 can be represented by a float64, but the problem only exists for very large ones. It is common in statistics that datasets consist of integers but computations have to be done in floating point. Then we want a floating point mantissa that allow us to represent all values in the dataset precisely. A float32 will often be sufficient, but sometimes it will not. A float64 will nearly always suffice, except for astronomically large integers. |
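A short sketch of the mantissa argument made here, using plain NumPy casts. float32 has a 24-bit significand, so integers above 2**24 are no longer all representable; float64 has a 53-bit significand, so the same issue only appears above 2**53.

```python
import numpy as np

# Smallest positive integer that float32 cannot represent exactly.
ints32 = np.array([2**24, 2**24 + 1], dtype=np.int32)
as_f32 = ints32.astype(np.float32)
as_f64 = ints32.astype(np.float64)

# The same effect for float64 only shows up for much larger values.
ints64 = np.array([2**53, 2**53 + 1], dtype=np.int64)
big_as_f64 = ints64.astype(np.float64)

print(as_f32[0] == as_f32[1])          # the two distinct integers collapse
print(as_f64[0] == as_f64[1])          # float64 keeps them apart
print(big_as_f64[0] == big_as_f64[1])  # float64 collapses only near 2**53
```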
There is no such thing as a solid float32 implementation when a covariance matrix is close to singularity (which is not uncommon in statistics). I feel there is a general lack of appreciation for one of the most fundamental aspects of scientific computing, i.e. that correctness always beats performance. If you want statistics to run very fast, you can e.g. use Cholesky or LU instead of QR or SVD for inverting covariance matrices and fitting least-squares models. But there is a reason we don’t do that. And if you are sacrificing that kind of speed for numerical stability, there is no good reason to throw that stability away by introducing float32 and obtaining large rounding and truncation errors, which is often catastrophic in the case of an ill-conditioned covariance. |
(2) isn't that relevant I think, since pre-NEP 50 behavior is going away. We should use the array API and NumPy 2.0 rules. (1) is useful indeed, and should be a consistency improvement - but not necessarily an excuse for large-scale breaking changes.
Not necessarily, that's implementation-defined. Your example is a little misleading to the point you're trying to make, because the end result also doesn't fit in

    >>> f16_max = np.finfo(np.float16).max
    >>> f16_max
    65500.0
    >>> f16_max + np.array(25.0, dtype=np.float16)
    inf
    >>> x = np.array([25., f16_max, -25.], dtype=np.float16)
    >>> np.sum(x)  # final result fits in float16, but a naive implementation would overflow
    65500.0
    >>> np.std(x)  # this does overflow
    inf
    >>> np.std(x.astype(np.float32)).astype(np.float16)  # but the end result fits
    30880.0

The returned end result should have dtype
I think this is a potentially huge amount of work, with more breakage than adding array API support. It's also going to be very hard (impossible?) to achieve consistently, since
I agree - but don't think anyone said this? NumPy will promote to

    >>> (np.array([3, 4], dtype=np.int32) + 1.5).dtype
    dtype('float64')
    >>> (np.array([3, 4], dtype=np.int32) + np.float16(1.5)).dtype
    dtype('float64')

I somewhat agree with the sentiment. There are types of algorithms that make sense to provide in lower precision variants (float32 in particular), because accuracy is still acceptable and performance really matters. FFT routines are a good example. For many element-wise functions (e.g., in |
I agree this is fundamental in some areas of scientific computing, but there are areas of computational science where it is not. As an example, look at the adoption of bfloat16 for the case of deep neural nets: we literally take a float32 and chop off the last 16 bits of mantissa, resulting in much faster computation—and surprisingly, in many cases this truncation actually improves model performance, as it acts as a sort of implicit regularization. I wouldn't claim the same exercise would work in every context, but it's an example where what's fundamental in one domain is not fundamental in another. Does SciPy intend to support these kinds of use cases? The answer may be no – but if the answer is no, let's say that explicitly, rather than pretending such domains don't exist! |
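The truncation described in this comment can be sketched with NumPy bit manipulation. This is an illustration only: the helper name `truncate_to_bfloat16` is hypothetical, the result is kept in a float32 container because NumPy has no native bfloat16 dtype, and real conversions typically round rather than truncate.

```python
import numpy as np


def truncate_to_bfloat16(x):
    """Zero the low 16 bits of each float32, keeping a float32 container."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    # Keep sign (1 bit), exponent (8 bits) and the top 7 mantissa bits.
    bits = x.view(np.uint32) & np.uint32(0xFFFF0000)
    return bits.view(np.float32)


x = np.array([1.0, 1.0 + 2.0**-7, 1.0 + 2.0**-10, 3.14159274], dtype=np.float32)
y = truncate_to_bfloat16(x)
print(y)
```

Values whose extra information sits below the seventh mantissa bit, such as `1 + 2**-10`, collapse onto their neighbor, which shows how coarse the resulting precision is.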
I've changed the I'm fine leaving compiled code working with I proposed what I did as the typical behavior in part because it's easier than tip-toeing around existing promotions to I won't post anymore about this here since it may not be strictly related to array API conversion and this seems to be a sticking point. I'll open a separate issue if needed to get to the desired behavior for these stats conversions. |
Just for the special suite; the dtype downcasting is not very helpful in a sizable chunk of special because the very reason they exist is due to the fact that a precise calculation is needed; say, It is on paper great to have them bitsize agnostic, however, not all, but many will have diminishing returns if not confusing the user by the underflows. A zebra-patterned API, some with |
Agreed - and I'd say yes, we do intend to support such use cases assuming they are judged to fit in some submodule well enough. It's all context/domain-specific.
Not quite true I think - there's a lot of compiled code we have that will dispatch back to CuPy/JAX/PyTorch when they have matching functions. And that set may change/grow over time.
The obvious alternative is "status quo" with careful changes where we don't like the current behavior. It will really depend per module or per set of functionality. There is significant demand for It's hard to say more though. I think "always return
|
I would also be opposed to a zebra like pattern, but just want to point out that catastrophic cancellation becomes an issue for the naive implementations of I also want to mention that it should be relatively straightforward to create lower precision versions of special functions. I think the main step would be refitting minimax polynomial approximations to the desired precision; and it's actually easier to find good minimax approximations at lower precision because one can evaluate more exhaustively over floating point values. I'm not saying we would go to such lengths, but it's actually feasible to create correctly rounded implementations of mathematical functions in 32 bit, such as here, something which cannot be done for 64 bit math. Beyond refitting minimax polynomials, I think only small adjustments would be needed. |
This is a long read and a proposal with quite a large scope and a decent amount of backwards-compatibility impact. I hope it'll make SciPy's behavior much more consistent and predictable, as well as yield significant performance gains. I'll post this on the mailing list too. I'd suggest to discuss the big picture first; will ask for process suggestions on the mailing list too in case we'd like to split up the discussion, open a doc/proposal PR for detailed review, or anything like that.
The basic design principle I propose aiming for in all of SciPy for array libraries is: container type in == container type out.
Python sequences (lists, tuples, etc.) will continue to be converted to NumPy arrays, as will other unknown types that are coercible with np.asarray.
The scope of this proposal is: how to treat array inputs. This includes different kinds of NumPy arrays and ndarrays with particular dtypes, as well as array/tensor objects coming from other libraries.
Out of scope for this proposal are (a) dataframe libraries, and (b) implementation of a dispatch mechanism for when non-numpy array types hit C/C++/Cython/Fortran code inside SciPy. Both of these topics are touched upon in the Appendix section.
I'll dive straight into the design here; for context on why we'd want/need this design or what has been done and discussed before, see the Context and Problems & opportunities sections further down.
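As a rough illustration of the "container type in == container type out" principle, here is a minimal sketch. The helper `_namespace_and_array` and the function `center` are hypothetical names, and looking up the namespace via `__array_namespace__` directly is a simplification made for this example rather than the mechanism the RFC prescribes.

```python
import numpy as np


def _namespace_and_array(x):
    """Return (namespace, array) for x, coercing unknown inputs to NumPy."""
    if hasattr(x, "__array_namespace__"):
        # Array API compliant object: keep it as is and use its own namespace.
        return x.__array_namespace__(), x
    # Python sequences and other coercible objects become NumPy arrays.
    return np, np.asarray(x)


def center(x):
    """Subtract the mean, returning the same kind of array that came in."""
    xp, x = _namespace_and_array(x)
    return x - xp.mean(x)


print(type(center([1.0, 2.0, 3.0])))
print(center(np.array([1.0, 2.0, 3.0])))
```

A list input comes back as a NumPy array, while an array from a compliant library would be processed with that library's own functions and returned as that library's array type.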
array/tensor types support
Array types and how to treat them:
__array_namespace__
object
dtypestats.mstats
nan_policy
/mstats
plans are unaffected by thisnp.asanyarray
checkstorch.Tensor
on CPU/GPUarray-api-compat
When a non-NumPy array type sees compiled code in SciPy (which tends to use the NumPy C API), we have a couple of options:
__array__), use the compiled code in question, then convert back.
I'll note that (1) is the long-term goal; how to implement this is out of scope for this proposal - for more on that, see the Appendix section. For now we choose to do (2) when possible, and (3) otherwise. Switching from that approach to (1) in the future will be backwards-compatible.
A note on numpy.matrix: the only place this is needed is scipy.sparse, it can be vendored there and for NumPy >= 2.0 instances of the vendored matrix code can be returned. That allows deprecating it in NumPy. We need to support scipy.sparse.*_matrix long-term for backwards compatibility (they're too widely used to deprecate), however for new code we have sparse.*_array instances and PyData/Sparse.
Regarding array API support: when it's present in an array library, SciPy will require it to be complete (v2022.12), including complex number support and the linalg and fft submodules - supporting partial implementations without linalg support in particular seems unnecessary.
For as-yet-unsupported GPU execution when hitting compiled code, we will raise exceptions. The alternative considered was to transfer to CPU, execute, and transfer back (e.g., for PyTorch). A pro of doing that would be that everything works, and there may still be performance gains. A con is that it silently does device transfers, usually not a good idea. On balance, something like this is only a good idea if there's a well-defined plan to make GPU execution work for most functionality on a reasonable time scale (~12-18 months max). Which means both addressing the dispatch (uarray & co) problem that I am trying hard to avoid diving into here, and having time commitments for doing the work. Since we don't have that, raising exceptions is the way to go.
Development and introduction strategy
The following strategy is meant to allow implementing support in a gradual way, and have control over when to default to a change in behavior - which necessarily is going to have some backwards compatibility impact.
array-api-compat as an optional dependency for now
array-api-compat
numpy.array_api for compliance testing in APIs that start supporting array API compliant input, and perhaps pandas Series/DataFrame. GPU CI may materialize at some point, but let's not count on or plan for that.
scipy._lib and depend on that from other submodules. That means we'll have a single replacement for np.asarray/np.asanyarray in one central location.
Context
numpy namespace
numpy.array_api module is useful for testing that code is actually using only standard APIs, because it's a minimal implementation. It's not going to be used as a layer for production usage though, it's for portability testing only.
array-api-compat as the way to go. This looks quite a bit nicer than the previous approach with numpy.array_api, and the work to support PyTorch that way has been quite well-received.
__duckarray__), NEP 31 (uarray) and NEP 37 (__array_module__) are all effectively dead - I'll propose rejecting these.
numpy.array_api) has been implemented, however I'm going to propose marking it superseded via a new NEP for the main numpy namespace
__array_function__ and __array_ufunc__ are fully implemented and will continue to exist and be supported in NumPy. We won't support those mechanisms in SciPy though, since we are coercing unknown input types to ndarray and error out if that fails. The exception here is ufuncs in scipy.special, which happen to work already because we're reusing the numpy ufunc machinery there. We can probably leave that as is, since it's not problematic.
np.matrix instances not being Liskov-substitutable (i.e. it's a drop-in replacement, no changes in behavior but only extensions), making it difficult to accept them. We can explicitly start rejecting those with clear exceptions, that will make regular subclasses a lot more useful.
numpy.ma has tons of issues and isn't well-maintained. There's a full rewrite floating around that is ~90% complete and with some luck will make it into NumPy 2.0. However, it hasn't seen movement for several years, and the work on that is not planned.
numpy.ma in its current form should be considered legacy.
scipy.stats.mstats is the only module that specifically supports masked arrays.
scipy.stats has a nan_policy keyword in many functions that is well-maintained at this point, and has a documented specification. That is probably not applicable to other submodules though.
interpolate and spatial.KDTree (see gh-18230) may make sense, but ideally that'd use solid numpy support (e.g., via a null-aware dtype) and that does not exist.
numpy.ma.MaskedArray instances".
Problems & opportunities
The motivation for all the effort on interoperability is that the current state of SciPy's behavior causes issues, and that there's an opportunity/desire to gain a lot of performance by using other libraries (e.g., PyTorch, CuPy, JAX).
Problems include:
- `Decimal` and other random things that people stuff into object arrays on purpose or (more likely) by accident, being handled very inconsistently.

Opportunities include:
References
- `numpy.array_api` is a bit cumbersome): gh-15395

Appendix
Note: these topics are included for reference/context because they are related, but they are out of scope for this proposal. Please avoid diving into these (I suggest to ping me directly first in case you do see a reason to discuss these topics).
dataframe support
For dataframe library support, the situation is a little trickier. We have to think about pandas `Series` and `DataFrame` instances with numpy and non-numpy dtypes, the presence of nullable integers, and other dataframe types which may be completely backed by Apache Arrow, another non-numpy library (e.g., cuDF & CuPy), or have implemented things completely within their own library and may or may not have any NumPy compatibility.

This would be one option, which is somewhat similar to what scikit-learn does (except, it converts nullable integers to float64 with nans):

- `__dataframe__`
There's a lot to learn from the effort scikit-learn went through to support pandas dataframes better. See, e.g., the scikit-learn 1.2 release highlights showing the `set_output` feature to request pandas dataframe as the return type.

Note: I'd like to not work this out in lots of detail here, because it will require time and that should not block progress on array library support. I just want to put it on the radar, because we do need to deal with it at some point; current treatment of dataframes is quite patchy.
Dispatching mechanism
For compiled code, other array types (whether CPU, GPU or distributed) are likely not going to work at all; the SciPy code is written for the NumPy C API. It's not impossible that some Cython code will work with other array types if those support the buffer protocol and the Cython code uses memoryviews - but that's uncommon (won't work at all on GPU, and PyTorch doesn't support the buffer protocol on CPU either).
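To make the buffer-protocol condition above concrete, here is a minimal Python-level sketch. The helper name `supports_buffer_protocol` is hypothetical and not part of SciPy or NumPy; it only checks whether an object can be wrapped in a `memoryview`, which is roughly the precondition for passing it to Cython code that uses typed memoryviews. Only standard-library types are used as stand-ins; NumPy arrays export a buffer in the same way.

```python
import array


def supports_buffer_protocol(obj):
    """Return True if `obj` exports a buffer that a memoryview can wrap."""
    try:
        # The `with` block releases the view immediately after the check.
        with memoryview(obj):
            return True
    except TypeError:
        return False


# Stand-ins for array types that export a buffer (no copy is made).
typed = array.array("d", [1.0, 2.0, 3.0])
print(supports_buffer_protocol(typed))
print(supports_buffer_protocol(bytearray(4)))

# A plain list exports no buffer, so memoryview-based code cannot consume it.
print(supports_buffer_protocol([1.0, 2.0, 3.0]))

# A memoryview shares memory with its exporter: a write through the view
# is visible in the original object.
view = memoryview(typed)
view[0] = 10.0
print(typed[0])
```

An array type that fails this check would need an explicit conversion before it could reach memoryview-based compiled code.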
There has been a lot of discussion on how this should work. The leading candidate is Uarray, which we already use in `scipy.fft` (as do matching `fft` APIs in CuPy and PyFFTW) and which has other PRs pending in both SciPy and CuPy. However, there is also resistance to that because it involves a lot of complexity - perhaps too much. So significant work is needed to simplify that, or to switch to another mechanism. This is important work that has to be done, but I'd prefer not to mix that with this proposal.

Whatever the mechanism, it should work transparently such that `scipy.xxx.yyy(x)`, where `x` is a non-numpy array, dispatches to the library which implements the array type of `x`.

We have a uarray label in the issue tracker. See gh-14353 for the tracker with completed and pending implementations. For more context and real-world examples, see:
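As an illustration of the transparent dispatch described in this section, the sketch below routes a call to an implementation chosen by the type of `x`. It is not uarray and not a proposed design: it uses the standard library's `functools.singledispatch` as a stand-in mechanism, and `yyy`, `FakeGpuArray` and both implementations are hypothetical names invented for the example.

```python
from functools import singledispatch


class FakeGpuArray:
    """Hypothetical stand-in for an array type owned by another library."""

    def __init__(self, data):
        self.data = list(data)


@singledispatch
def yyy(x):
    """Hypothetical public function standing in for scipy.xxx.yyy."""
    raise TypeError(f"no implementation registered for {type(x).__name__}")


@yyy.register(list)
def _yyy_reference(x):
    # Reference implementation (a plain list plays the role of ndarray here).
    return ("reference", sum(x))


@yyy.register(FakeGpuArray)
def _yyy_fake_gpu(x):
    # Implementation that the other library would register for its own type.
    return ("fake-gpu", sum(x.data))


print(yyy([1, 2, 3]))
print(yyy(FakeGpuArray([1, 2])))
```

The caller always writes `yyy(x)`; which implementation runs depends only on the type of `x`, and an unregistered type fails with a clear `TypeError` rather than being silently coerced.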