fix: Report expected p-values based on their quantiles under background hypotheses #1162
Tests need to get fixed. We should probably also have a good test to assert correct behavior.
I added a validation script, StandardHypoTestDemo.py, which roughly follows StandardHypoTestDemo.C except that it is configured for exclusion instead of discovery. With this we can cross-check toys against ROOT. For the workspace 'xmlimport_input_bkg.json' (which is xmlimport_input_bkg except with the data set to the background expectation) it looks like this https://root.cern.ch/doc/v614/StandardHypoTestDemo_8C.html The pyhf code is
ok some more important points
Once this is done, the delta peak gets broadened (left: ROOT, right: toys). The Brazil band still doesn't look too great, but it's looking better; trying with more toys. This is the workspace
This pull request introduces 2 alerts when merging a8c2f29 into 81c9adb (view on LGTM.com).
Force-pushed from a8c2f29 to 08247c0.
This pull request introduces 2 alerts when merging d4d1898 into a3b34a5 (view on LGTM.com).
Force-pushed from d4d1898 to 4f79f05.
This pull request introduces 4 alerts when merging 63eaa6d into e4011ff (view on LGTM.com).
Codecov Report
@@           Coverage Diff           @@
##           master    #1162   +/-   ##
=======================================
  Coverage   97.48%   97.48%
=======================================
  Files          63       63
  Lines        3733     3740    +7
  Branches      530      531    +1
=======================================
+ Hits         3639     3646    +7
  Misses         55       55
  Partials       39       39
Force-pushed from b77d7de to ab1b78e.
Thanks for the nice rebasing work @kratsg.
So a concern that I have here, that I myself contributed to when working on this weeks ago, is that we're introducing inconsistent return types for CLs-like values now. Previously, all CLs-like / p-value-like values were 0-d tensors:
pyhf/src/pyhf/infer/__init__.py, line 149 in 0e71f2f:
# Ensure that all CL values are 0-d tensors
However, now we have things like hypotest returning 0-d tensors and things like pvalues and expected_pvalues returning floats only for NumPy and for all other backends returning 0-d tensors, yet in all the examples we're showing CLs-like / p-value-like values being returned.
Example: Run the docstring example from this PR for expected_pvalues for various backends and note the return type of CLs_exp_band[0].
import pyhf

for backend in ["numpy", "pytorch", "jax"]:
    print(f"\nbackend: {backend}")
    pyhf.set_backend(backend)
    model = pyhf.simplemodels.hepdata_like(
        signal_data=[12.0, 11.0], bkg_data=[50.0, 52.0], bkg_uncerts=[3.0, 7.0]
    )
    observations = [51, 48]
    data = observations + model.config.auxdata
    mu_test = 1.0
    asymptotic_calculator = pyhf.infer.calculators.AsymptoticCalculator(
        data, model, test_stat="qtilde"
    )
    _ = asymptotic_calculator.teststatistic(mu_test)
    sig_plus_bkg_dist, bkg_dist = asymptotic_calculator.distributions(mu_test)
    CLsb_exp_band, CLb_exp_band, CLs_exp_band = asymptotic_calculator.expected_pvalues(
        sig_plus_bkg_dist, bkg_dist
    )
    print(f"CLs expected band: {CLs_exp_band}")
    print(
        f"of type: {type(CLs_exp_band[0])} and shape {pyhf.tensorlib.shape(CLs_exp_band[0])}"
    )
gives
backend: numpy
CLs expected band: [0.0026062609501074576, 0.01382005356161206, 0.06445320535890459, 0.23525643861460702, 0.573036205919389]
of type: <class 'numpy.float64'> and shape ()
backend: pytorch
CLs expected band: [tensor(0.0026), tensor(0.0138), tensor(0.0645), tensor(0.2353), tensor(0.5730)]
of type: <class 'torch.Tensor'> and shape ()
backend: jax
CLs expected band: [DeviceArray(0.00260626, dtype=float64), DeviceArray(0.01382005, dtype=float64), DeviceArray(0.0644532, dtype=float64), DeviceArray(0.23525643, dtype=float64), DeviceArray(0.57303619, dtype=float64)]
of type: <class 'jax.interpreters.xla._DeviceArray'> and shape ()
We should try to come to a consensus on what the return type for a CLs-like / p-value-like value should be. From looking back at PR #944 and Issue #714, I think that the CLs-like values should be 0-d tensors, as that allows us to sidestep this difference in backend behavior (I believe this is the motivation for the current behavior). While conceptually it makes sense to have a p-value-like value just be a float to emphasize its scalar nature, having it be a 0-d tensor still makes it appear to a user as having scalar-like behavior.
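To make the 0-d convention concrete, here is a minimal NumPy-only sketch of coercing mixed scalar-like values to a uniform 0-d type. It is not pyhf's implementation, and the helper name `as_0d_tensor` is hypothetical; the band values are the first three from the output above.

```python
import numpy as np


def as_0d_tensor(value):
    """Coerce a scalar-like value to a 0-d float64 array (hypothetical helper)."""
    tensor = np.asarray(value, dtype=np.float64)
    if tensor.shape != ():
        raise ValueError(f"expected a scalar-like value, got shape {tensor.shape}")
    return tensor


# A band mixing a Python float, a NumPy scalar, and a 0-d array
mixed_band = [0.0026, np.float64(0.0138), np.asarray(0.0645)]
uniform_band = [as_0d_tensor(value) for value in mixed_band]

# After coercion every entry has the same type and an empty shape
band_types = {type(value) for value in uniform_band}
band_shapes = {value.shape for value in uniform_band}
```

A 0-d array still behaves like a scalar in arithmetic and comparisons, and `float(uniform_band[0])` recovers a plain Python float when one is needed.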
The validation notebooks here are fine for now and we can revise them when we address Issue #1241.
@kratsg all the notebook stuff is great though. 👍
The CLs-like / p-value-like return type problem has been moved to Issue #1268 and will be resolved in a follow-up PR.
Description
Fixes issue reported by @kpachal
This changes the way we report expected limits
we used to
However, this can lead to orderings in which the p-values are not monotonic, as the relationship between CLs and the test statistic value is not necessarily monotonic.
we now
This is also what ROOT does. It might present some future questions regarding #966 but is ok for now.
produces this plot
ReadTheDocs build: https://pyhf.readthedocs.io/en/fix_for_kate/api.html#inference
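The ordering issue in the description can be illustrated with a small standard-library sketch. It assumes the earlier approach evaluated the p-value at quantiles of the test statistic; the exact earlier procedure is not spelled out here. The function `toy_pvalue` is a made-up, deliberately non-monotonic map, not pyhf's CLs calculation, and the Gaussian toys are likewise only illustrative.

```python
import random
from statistics import NormalDist


def quantile(sorted_values, prob):
    """Nearest-rank quantile of an already sorted list."""
    index = round(prob * (len(sorted_values) - 1))
    return sorted_values[index]


def toy_pvalue(teststat):
    """Hypothetical non-monotonic map from test statistic to a p-value-like number."""
    return 1.0 / (1.0 + (teststat - 2.0) ** 2)


rng = random.Random(0)
# Toy test statistic values, standing in for a background-only ensemble
teststat_toys = sorted(rng.gauss(2.0, 1.0) for _ in range(10_000))

# Cumulative probabilities of the -2, -1, 0, +1, +2 sigma points
probs = [NormalDist().cdf(n_sigma) for n_sigma in (-2, -1, 0, 1, 2)]

# Option A: p-value evaluated at quantiles of the test statistic
pvalues_at_teststat_quantiles = [
    toy_pvalue(quantile(teststat_toys, prob)) for prob in probs
]

# Option B: quantiles of the p-value distribution itself
pvalue_toys = sorted(toy_pvalue(teststat) for teststat in teststat_toys)
quantiles_of_pvalues = [quantile(pvalue_toys, prob) for prob in probs]
```

Option B is ordered by construction because the quantiles are read off a sorted list of p-values, whereas option A inherits any non-monotonicity of the map.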
For the PR Assignees: