This repository has been archived by the owner on Oct 25, 2022. It is now read-only.

Stability of histogram types #42

Closed
jonas-eschle opened this issue Sep 28, 2020 · 3 comments

Comments

@jonas-eschle

Hi all, thanks a lot for this repo!

How stable are the binning types? We are interested in using them as the binning definitions in zfit. Or is there a subset that you would consider "stable", especially the simpler ones?

@jpivarski
Member

Aghast is not in active development, so in that sense, it's absolutely stable. If it were in active development, I'd want to reshape the interface, so in that sense, it is not stable.

The key thing came in March 2019 (!), @nsmith-'s observation that all binning being rectilinear is a problem for ever supporting sparse data, which are essentially jagged arrays (#10). It got me thinking that the bins shouldn't be a NumPy array, but an Awkward Array. At the time, that was a non-starter because Awkward Array had to be rebuilt as Awkward 1 (August 2019 through April 2020).

This year, @henryiii and @LovelyBuggies developed quite a lot of hist; if the official 2.0 release isn't out yet, it will be very soon. They did some Aghast integration (e.g. #39) and we had many conversations about it, in which I made it clear that I'm unable to maintain Aghast, let alone give it the essential upgrade it needs to future-proof it for sparse histograms.

A great idea that came out of those conversations was to backpedal this somewhat and introduce a histogram protocol, rather than a universal format, using Python typing. A protocol is an API that histogram libraries can adhere to (not necessarily exclusively) and histogram-using libraries can expect (as a minimum). If that protocol is expressed in Python types, then it is an interface that only histogram libraries in Python can share. Aghast was more ambitious; it was intended to be an ABI, a block of bytes that can be interpreted as a histogram between processes and across languages, but that may be more than we need. If at least one histogram library that shares the API has a good serialization, they all effectively inherit it.
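
To make the protocol idea concrete, here is a minimal sketch using `typing.Protocol`. It is only an illustration: the names `HistogramLike`, `ListHistogram`, `edges`, `values`, and `integral` are hypothetical and are not taken from Aghast, hist, or any agreed-upon protocol. It assumes a one-dimensional histogram and Python 3.8 or later.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class HistogramLike(Protocol):
    """Hypothetical minimal protocol; the method names are illustrative only."""

    def edges(self) -> Sequence[float]:
        """Bin edges of a 1D histogram (one more entry than there are bins)."""
        ...

    def values(self) -> Sequence[float]:
        """Bin contents, one entry per bin."""
        ...


class ListHistogram:
    """A toy producer: it never imports or subclasses HistogramLike."""

    def __init__(self, edges: Sequence[float], values: Sequence[float]) -> None:
        if len(edges) != len(values) + 1:
            raise ValueError("need exactly one more edge than values")
        self._edges = list(edges)
        self._values = list(values)

    def edges(self) -> Sequence[float]:
        return self._edges

    def values(self) -> Sequence[float]:
        return self._values


def integral(h: HistogramLike) -> float:
    """A toy consumer: sums content times bin width using only the protocol."""
    e = h.edges()
    return sum(v * (hi - lo) for v, lo, hi in zip(h.values(), e[:-1], e[1:]))


h = ListHistogram(edges=[0.0, 1.0, 3.0], values=[2.0, 5.0])
total = integral(h)  # structural typing: h satisfies HistogramLike by shape
```

The point of the design is that the producer and the consumer share only the protocol, not a base class or a byte layout, so any library whose objects have the right methods can be passed to `integral` unchanged.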

I haven't been able to find a link to those conversations; I've been searching GitHub commits and issues, but I don't know where we talked about it.

@henryiii
Member

Discussions of the API (scikit-hep/boost-histogram#423, scikit-hep/uproot3#511) are not progressing very well; see scikit-hep/boost-histogram#459.

@jonas-eschle
Author

Many thanks for the answer, it's quite insightful!

So overall, we will stay closer to hist's development and axis types (knowing that it is still under development, but we don't need much of it either).
