Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add atleast_nd #3

Merged
merged 11 commits into from
Sep 24, 2024
Merged

ENH: add atleast_nd #3

merged 11 commits into from
Sep 24, 2024

Conversation

lucascolley
Copy link
Collaborator

No description provided.

@lucascolley lucascolley force-pushed the atleast_nd branch 3 times, most recently from 3953b1d to 60efb68 Compare September 20, 2024 11:17
@lucascolley
Copy link
Collaborator Author

@asmeurer are you sure we want to allow not passing xp and calling array_namespace ourselves?

If we didn't allow that, I think we could make this package have no dependency on array-api-compat, which would make it a lot easier to vendor. From SciPy's perspective, we will always have access to xp when using functions from this package, so it isn't necessary.

@lucascolley lucascolley force-pushed the atleast_nd branch 2 times, most recently from 2dae393 to c417ab4 Compare September 20, 2024 17:51
@lucascolley
Copy link
Collaborator Author

This is ready to merge from my side.

@asmeurer
Copy link
Member

We should probably try to get feedback from other people who would be using this. My feeling is that people might prefer to call atleast_nd(x) rather than atleast_nd(x, xp).

I suppose one option would be to make xp optional, like

def atleast_nd(x, xp=None):
    if xp is None:
        from array_api_compat import array_namespace
        xp = array_namespace(x)

That way you can get the best of both worlds.

We would probably want to move that logic into a common decorator. And also do some indirection for array_api_compat so that people can easily swap it out for a vendored version. (although OTOH the more indirection and decorators we use, the harder it is for people to just copy-paste a single function).

@lucascolley
Copy link
Collaborator Author

lucascolley commented Sep 20, 2024

@betatim @ogrisel @thomasjpfan hello 👋 this is the start of an attempt to share array-agnostic implementations of functions that build on top of the standard across consumer libraries. So instead of storing private implementations in each consumer library, they'll be stored here and one can do:

from ... import array_api_extra as xpx
...
xpx.atleast_nd(x, ndim=2, xp=xp)

I'd appreciate if any of you could weigh in from a scikit-learn perspective on the question of whether these functions should require xp as an argument? (EDIT: where xp is a standard-compatible namespace for x)

We could make xp optional and call array_namespace on the input arrays when it isn't passed, which would make using these functions ever so slightly more flexible and keep the APIs "clean" in a sense. But from my perspective in SciPy, we will always be using these functions after xp = array_namespace(x), so xp will always be available to us to pass. Furthermore, passing xp around will avoid redundant array_namespace calls.

If we force developers to pass xp into these functions, we can remove any dependency on array-api-compat here which will make it easier to enable simultaneous vendoring of the two libraries. I think that sounds like a win with no serious downside - what do you think?

@asmeurer
Copy link
Member

I should point out that not using array-api-compat will make writing this library much harder as you would have to avoid using APIs that differ between the array libraries (or else effectively re-implement array-api-compat in helpers). I personally don't think it's a worthwhile goal.

@lucascolley
Copy link
Collaborator Author

you would have to avoid using APIs that differ between the array libraries

Which APIs are you thinking of? I forgot that we're probably at least going to need array-api-compat's size helper. Maybe we can get away with just copying some functions from array-api-compat?

To be clear, xp should be a namespace that comes from array-api-compat.

@asmeurer
Copy link
Member

Literally any API you might use. For example, in this function you're using expand_dims, but expand_dims does not exist in torch, only in array_api_compat.torch.

To be clear, xp should be a namespace that comes from array-api-compat.

I suppose that does sidestep the issue. I still think it would be annoying as a user to have to pass in xp to every function. I'm a little surprised you're doing that in scipy, but I can't imagine every library wants to do that.

@lucascolley
Copy link
Collaborator Author

lucascolley commented Sep 20, 2024

I suppose that does sidestep the issue. I still think it would be annoying as a user to have to pass in xp to every function. I'm a little surprised you're doing that in scipy, but I can't imagine every library wants to do that.

To clarify, the user-base for this library is consumer-library authors who are already using array-api-compat namespaces everywhere. In other words, we are just targeting the spec, not existing libraries.

@thomasjpfan
Copy link

I'd appreciate if any of you could weigh in from a scikit-learn perspective on the question of whether these functions should require xp as an argument?

I prefer always passing it in as an argument.

@lucascolley
Copy link
Collaborator Author

hello @KelSolaar too! You had previously expressed interest in functions like atleast_nd from scipy._lib._array_api being available in a public namespace. Do you have an opinion on the question in #3 (comment) above?

@lucascolley
Copy link
Collaborator Author

Here's a PR showing that vendoring this package and using xpx.atleast_nd works in SciPy: scipy/scipy#21600

@lucascolley
Copy link
Collaborator Author

pixi run docs:

image

@lucascolley
Copy link
Collaborator Author

@vnmabus perhaps you have an opinion on the question in #3 (comment) too?

@vnmabus
Copy link

vnmabus commented Sep 23, 2024

@vnmabus perhaps you have an opinion on the question in #3 (comment) too?

I am not sure of why you request my particular feedback. Personally, I think that it is probably better to prevent writing code that do unnecessary calls to array_namespace, but having to pass xp to each function seems also very redundant, and it is not the same style used by the array API functions.

I wonder if it wouldn't be cleaner to have a function that receives the xp namespace and then returns an "extended" namespace with the extra functions, and with xp already "bound". That way the usage would be more similar to the array API pattern (i.e. obtain a namespace for particular arrays and then just use it).

@lucascolley
Copy link
Collaborator Author

lucascolley commented Sep 23, 2024

I am not sure of why you request my particular feedback.

Mainly because I don't know many other array-api-compat users!

I wonder if it wouldn't be cleaner to have a function that receives the xp namespace and then returns an "extended" namespace with the extra functions, and with xp already "bound". That way the usage would be more similar to the array API pattern (i.e. obtain a namespace for particular arrays and then just use it).

Cool idea! I guess something like the following would do it?

extra_funcs = {'atleast_nd': atleast_nd}

def get_extra_namespace(xp: ModuleType) -> ModuleType:
    class ExtraNamespace:
        def __getattr__(self, name: str):
            if name in extra_funcs:
                func = functools.partial(extra_funcs[name], xp=xp)
                return func
            else:
                return getattr(xp, name)
				# or `return AttributeError` ?

    return ExtraNamespace(xp)

I'm not sure whether we'd want to use that yet in SciPy, since in most functions we would only be calling an xpx function once, so the extra line to acquire the extra namespace isn't really worth it. But I do think the following pattern looks quite clean:

xp = array_namespace(x)
xpx = get_extra_namespace(xp)
...
x = xpx.atleast_nd(x, ndim=5)
y = xp.sum(x)

It would probably be better to keep uses of the two patterns separate, to avoid xpx sometimes needing an xp kwarg and sometimes not.

Maybe it would be better for get_extra_namespace to return a namespace with just the extra functions, rather than an extended namespace, though? I think it would be nice to keep calls to standard functions as calls to xp rather than xpx.

Thanks for the thoughts!

@betatim
Copy link
Member

betatim commented Sep 23, 2024

In scikit-learn we use the foo(..., xp=None) pattern. That way it is optional, but you can pass it in.

I don't have a strong opinion either way. It feels weird to pass it in, but it saves some repetitive code, but then making it optional you need repetitive code again (or decorator). I am not massively convinced regarding the performance benefit of not repeatedly calling array_namespace. So overall, it is a definitive maybe from me :D

@lucascolley
Copy link
Collaborator Author

I opened gh-6 to discuss potential alternatives to the xp=xp interface to explore in the future. Right now, I'm happy to keep it as the pattern is (optionally) used across SciPy and scikit-learn already.

I'm going to merge this as the function is passing NumPy-quality tests, works in SciPy, and is well documented. Thanks all for the thoughts!

@lucascolley lucascolley merged commit 241f566 into data-apis:main Sep 24, 2024
3 checks passed
@lucascolley lucascolley deleted the atleast_nd branch September 24, 2024 11:42
@asmeurer
Copy link
Member

OK, that's fine. I think we can easily change things around in the future without too many backwards compatibility concerns.

@asmeurer
Copy link
Member

Did you want to make a release on PyPI? If we set up something like https://github.com/data-apis/array-api-compat/blob/main/.github/workflows/publish-package.yml and set up things on PyPI.

@lucascolley
Copy link
Collaborator Author

Sure, if you could do whatever needs Owner permissions I can do the rest - I would like to learn the process.

@asmeurer
Copy link
Member

If you want you can try to set it up yourself, and add me as an owner on PyPI. I'll give you maintain role on this repo, which should hopefully be enough to add the right info to the repo settings.

I would start with the publish file from array-api-compat, as well as the instructions at data-apis/array-api-compat#51 (I don't know if there are better docs for this somewhere). The best thing to do is to get everything working on TestPyPI first, and then once you have that working, enable publishing to real PyPI. I use a workflow where I push a tag up to the repo, but if you'd prefer to do a release via the web interface I think you might need to change things a little bit. Just remember that once you push a release to PyPI, you cannot undo it. If there's a problem, the only way to fix it is to bump the version number.

I would also recommend adding https://github.com/data-apis/array-api-compat/blob/main/.github/dependabot.yml and https://github.com/data-apis/array-api-compat/blob/main/.github/workflows/dependabot-auto-merge.yml to this repo. That will keep the GitHub Actions versions up-to-date automatically. Actually the newest version of publish-package is supposed to have something that makes setting up PyPI easier https://github.com/pypa/gh-action-pypi-publish/releases

@lucascolley
Copy link
Collaborator Author

Thanks Aaron, sounds good to me.

I don't know if there are better docs for this somewhere

I suppose there is Henry’s https://learn.scientific-python.org/development/guides/gha-pure/#job-setup, but who knows which is “better” :)

The repo template (scientific-python/cookie) already included dependabot. I’ll add the auto-merge workflow.

@lucascolley
Copy link
Collaborator Author

lucascolley commented Sep 24, 2024

@asmeurer I've requested access for the all contributors bot, would you be able to grant that? And could you also import the project on readthedocs?

@asmeurer
Copy link
Member

@asmeurer I've requested access for the all contributors bot, would you be able to grant that?

I think this might be something we used to have but removed, so I'll need to check if that's the case and why it was removed if so.

And could you also import the project on readthedocs?

Let's use GitHub pages (or netlify if you prefer) so the docs appear under data-apis.org like the rest of the projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request new function
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants