[FEATURE] Non-uniform rebinning #345
Comments
Below is my own implementation of the histogram manipulation that we want. Right now it only handles NamedHist and is probably super slow for large histograms, but it is a good starting point for describing what we want to do. We would run it with something like: https://gist.github.com/yimuchen/a5e200c001ef4ea01681a7dd8fe89162 |
A nice interface for this would be using array indexing, e.g. doing:
Would return a new histogram with variable binning, and the central three bins merged into one. |
+1 on this |
If I may add, it would also be nice to be able to rebin a histogram based on a second one: >>> h1.rebin(h2.axes.edges) |
I just ran into a setup where I was looking for such a feature as well. Both the ability to specify new bin edges explicitly, and the possibility to pick specific bins to be merged (like @swertz's example) would be very useful! |
+1 on this. Having this functionality would make it a lot easier to produce quality plots for coffea based analyses. |
Probably others have too, but I've written a function for this for my own studies using hist: https://gist.github.com/kdlong/d697ee691c696724fc656186c25f8814 I think it differs from the previous implementation in that it uses np.add.reduceat, so it shouldn't be so slow. I have fought with some details of it (like the treatment of overflow and underflow when rebinning to a subset of the range), and I think I've validated it, but I wouldn't swear in blood that there aren't mistakes. Others can try it out if it's useful, and I could convert it to a PR if it goes in the direction the developers would want. |
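To make the np.add.reduceat idea concrete, here is a minimal sketch on plain NumPy arrays. It is not the code from the gist above: the function name rebin_counts is hypothetical, it assumes every new edge coincides with an existing edge, and it ignores underflow/overflow bins entirely.

```python
import numpy as np


def rebin_counts(counts, old_edges, new_edges):
    """Merge adjacent bins of a 1D histogram so its edges become new_edges.

    Hypothetical helper for illustration only. Assumes each new edge matches
    an old edge; flow bins are not handled.
    """
    counts = np.asarray(counts, dtype=float)
    old_edges = np.asarray(old_edges, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)

    # For each new edge, find the index of the nearest old edge.
    idx = np.abs(old_edges[:, None] - new_edges[None, :]).argmin(axis=0)
    if not np.allclose(old_edges[idx], new_edges):
        raise ValueError("every new edge must coincide with an old edge")
    if np.any(np.diff(idx) <= 0):
        raise ValueError("new edges must be strictly increasing")

    # reduceat sums counts[idx[i]:idx[i + 1]]; its last segment runs to the end
    # of the array, so cut the array at the last requested edge first.
    return np.add.reduceat(counts[: idx[-1]], idx[:-1])


counts = np.arange(1, 11)   # ten bins holding 1, 2, ..., 10
old_edges = np.arange(0, 11)  # edges 0, 1, ..., 10
merged = rebin_counts(counts, old_edges, [0, 1, 3, 6, 10])
```

Bins outside the requested range are simply dropped here, which is exactly the place where a real implementation has to decide what to do with the flow bins.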
Based on the feedback above, here is the non-uniform rebinning I was working on: https://github.com/fabriceMUKARAGE/rebinning_histogram Maybe you can check whether it makes sense. The README.md and the comments in the code explain it better, I guess. |
@kdlong Thanks! Not a developer but I think it would be a useful PR 🙂 |
From what I can tell looking at related posts/issues/features the real developers are close to having something centrally supported. |
Added to boost-histogram in scikit-hep/boost-histogram#913 |
Yes, it's available now; copying from @Saransh-cpp:
In [1]: import hist
In [2]: hist.__version__
Out[2]: '2.8.0'
In [3]: import numpy as np
In [4]: h = hist.Hist(hist.axis.Regular(10, 0, 1))
In [5]: h
Out[5]: Hist(Regular(10, 0, 1, label='Axis 0'), storage=Double())
In [6]: h.fill(np.random.normal(size=1_000_000))
Out[6]: Hist(Regular(10, 0, 1, label='Axis 0'), storage=Double()) # Sum: 341415.0 (1000000.0 with flow)
In [7]: rebin = hist.rebin(factor=2)
In [8]: h[::rebin]
Out[8]: Hist(Regular(5, 0, 1, label='Axis 0'), storage=Double()) # Sum: 341415.0 (1000000.0 with flow)
In [9]: rebin = hist.rebin(groups=[1, 2, 3, 4])
In [10]: h[::rebin]
Out[10]: Hist(Variable([0, 0.1, 0.3, 0.6, 1], metadata=...), storage=Double()) # Sum: 341415.0 |
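The relation between groups and the resulting Variable edges in the session above can be sketched in plain Python. The helper name edges_from_groups is hypothetical, and the check that the group sizes add up to the number of bins is an assumption of this sketch (it holds for the examples in this thread), not a statement about what the library enforces.

```python
from itertools import accumulate


def edges_from_groups(edges, groups):
    """Return the edges left after merging consecutive bins in runs of the given sizes.

    Hypothetical helper for illustration only; assumes the group sizes are
    positive and sum to the number of bins.
    """
    n_bins = len(edges) - 1
    if any(g < 1 for g in groups) or sum(groups) != n_bins:
        raise ValueError("group sizes must be positive and sum to the number of bins")
    # Running totals 0, g0, g0 + g1, ... are the indices of the surviving edges.
    return [edges[i] for i in accumulate(groups, initial=0)]


regular_edges = [i / 10 for i in range(11)]  # Regular(10, 0, 1)
new_edges = edges_from_groups(regular_edges, [1, 2, 3, 4])
```

With groups=[1, 2, 3, 4], the first bin is kept, the next two are merged, then the next three, then the last four, which reproduces the edges shown in Out[10].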
Thank you for introducing this new feature.
I have tested the same snippet of code with hist version 2.8.0 and it works for me. Then I tried to rebin a 2D histogram with:
In [1]: from coffea.util import load
In [2]: import hist
In [3]: filename = "/work/mmarcheg/ttHbb/analysis/ttlf_background_calibration/ttlf_background_calibration_betterbins_2018/output_all.coffea"
...:
In [4]: o = load(filename)
In [5]: years = list(o["datasets_metadata"]['by_datataking_period'])
...: h = o["variables"]["Njet_Ht"]
In [6]: samples_data = [s for s in h.keys() if s.startswith("DATA")]
In [7]: h_data = sum({k :val for s in samples_data for k, val in h[s].items() }.values())
In [8]: h_data = h_data[{'cat':'semilep'}]
In [9]: h_data
Out[9]:
Hist(
Variable([4, 5, 6, 7, 8, 9, 11, 20], name='events.nJetGood', label='$N_{JetGood}$'),
Variable([0, 100, 200, 300, 400, 500, 750, 1000, 1250, 1500, 2000, 2500, 5000], name='events.JetGood_Ht', label='Jets $H_T$ [GeV]'),
storage=Weight()) # Sum: WeightedSum(value=1.64868e+06, variance=1.64868e+06) (WeightedSum(value=1.64868e+06, variance=1.64868e+06) with flow)
In [10]: rebin = hist.rebin(groups=[1,1,2,3])
In [11]: h_data[::rebin,sum]
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
Cell In[11], line 1
----> 1 h_data[::rebin,sum]
File /work/mmarcheg/micromamba/envs/pocket-coffea/lib/python3.9/site-packages/hist/basehist.py:417, in BaseHist.__getitem__(self, index)
410 def __getitem__( # type: ignore[override]
411 self, index: IndexingExpr
412 ) -> Self | float | bh.accumulators.Accumulator:
413 """
414 Get histogram item.
415 """
--> 417 return super().__getitem__(self._index_transform(index))
File /work/mmarcheg/micromamba/envs/pocket-coffea/lib/python3.9/site-packages/boost_histogram/_internal/hist.py:924, in Histogram.__getitem__(self, index)
922 for _ in range(group):
923 pos = [slice(None)] * (i)
--> 924 new_view[(*pos, new_j + 1, ...)] += reduced_view[ # type: ignore[arg-type]
925 (*pos, j, ...) # type: ignore[arg-type]
926 ]
927 j += 1
929 reduced = new_reduced
UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype([('value', '<f8'), ('variance', '<f8')]), dtype([('value', '<f8'), ('variance', '<f8')])) -> None
In [12]: h_data.storage_type
Out[12]: boost_histogram.storage.Weight
Is the rebinning of histograms with
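The UFuncTypeError in the traceback can be reproduced with NumPy alone: the dtype in the message is a structured dtype with value and variance fields, and NumPy's add ufunc has no loop for structured dtypes. The sketch below only illustrates that mechanism. It is not the boost-histogram code or an official workaround, and add_records is a hypothetical helper that adds the two fields separately.

```python
import numpy as np

# Same field layout as the dtype shown in the error message.
weight_dtype = np.dtype([("value", "<f8"), ("variance", "<f8")])

view = np.zeros(4, dtype=weight_dtype)
view["value"] = [1.0, 2.0, 3.0, 4.0]
view["variance"] = [1.0, 2.0, 3.0, 4.0]

# Adding structured arrays directly fails; UFuncTypeError subclasses TypeError.
try:
    view[:2] + view[2:]
    plain_add_works = True
except TypeError:
    plain_add_works = False


def add_records(a, b):
    """Add two structured arrays field by field (hypothetical helper)."""
    out = np.empty_like(a)
    for name in a.dtype.names:
        out[name] = a[name] + b[name]
    return out


# Merging bins pairwise: values add, and so do the variances.
merged = add_records(view[:2], view[2:])
```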
Right now, histograms can only be rebinned by an integer factor via the hist.rebin indicator. It would be nice if there were some way to rebin a given axis to arbitrary bin edges, as we might want to rebin a regular axis to be irregular just for low-statistics regions, without requiring the exact binning scheme to be known during histogram construction.
I'm not sure what the best method would be. Maybe the existing hist.rebin class could be extended to accept something like hist.rebin( <new_axis>/<new bin edges> ) to specify the new binning scheme of interest?