diffNuisances.py: adding per-nuisance delta NLL #827
This PR makes a few changes to `diffNuisances.py`, the largest of them being the ability to print and plot explicitly the change in log-likelihood (delta NLL) between the background-only and signal+background fits for each nuisance parameter.

Because the likelihood factorizes into a Poisson term for each bin and a constraint term for each nuisance, the contribution of each bin and/or nuisance can be determined directly. These contributions simply sum to give the total delta NLL.

When a workspace is passed, the constraint pdf for each nuisance is evaluated at its background-only best-fit point and at its S+B best-fit point to get the delta NLL. This is optional: if no workspace is passed, `diffNuisances.py` simply runs as it used to.

A plot of the delta NLL is also made, ordered from largest to smallest delta NLL and showing a cumulative line. This helps to quickly identify whether the change in post-fit nuisances is contributing to a significance, and which nuisances in particular are contributing.
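The per-nuisance bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this PR: it assumes every constraint is a unit-width Gaussian centred at zero (the script itself evaluates the constraint pdfs from the workspace), it takes delta NLL as NLL(b-only) minus NLL(s+b), and the function names and example nuisance values are hypothetical.

```python
def gaussian_constraint_nll(theta, theta0=0.0, sigma=1.0):
    """-ln of a Gaussian constraint term, dropping the constant normalisation.

    ASSUMPTION: unit-width Gaussian constraints; real constraints may be
    Poisson, log-normal, etc. and would be evaluated from the workspace.
    """
    return 0.5 * ((theta - theta0) / sigma) ** 2


def per_nuisance_delta_nll(fit_b, fit_s):
    """Return (name, dnll, cumulative) tuples, largest dnll first.

    fit_b / fit_s map nuisance names to post-fit values from the
    background-only and signal+background fits (hypothetical inputs).
    ASSUMPTION: dnll is defined as NLL(b-only) - NLL(s+b).
    """
    rows = []
    for name, theta_b in fit_b.items():
        if name not in fit_s:
            continue  # only compare nuisances present in both fits
        dnll = gaussian_constraint_nll(theta_b) - gaussian_constraint_nll(fit_s[name])
        rows.append((name, dnll))

    # Order from largest to smallest contribution, as in the plot.
    rows.sort(key=lambda row: row[1], reverse=True)

    # Running sum gives the cumulative line; its last value is the total
    # constraint-term delta NLL.
    ranked, running = [], 0.0
    for name, dnll in rows:
        running += dnll
        ranked.append((name, dnll, running))
    return ranked


if __name__ == "__main__":
    fit_b = {"lumi": 0.5, "jes": -1.0, "btag": 0.2}   # made-up values
    fit_s = {"lumi": 0.1, "jes": -0.4, "btag": 0.3}   # made-up values
    for name, dnll, cumulative in per_nuisance_delta_nll(fit_b, fit_s):
        print(f"{name:10s} {dnll:+.3f} {cumulative:+.3f}")
```

Because the constraint terms are independent, the last cumulative value equals the sum of the individual contributions, which is what the cumulative line in the plot is meant to show.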
Other smaller changes:

- Added a `--max-nuis` option, which limits the number of nuisances per plot. If more nuisances exist, multiple plots of each type are created, each containing at most the maximum number of nuisances per plot. I've also increased the bottom margin of the plots to help make the nuisance names visible.
- Moved the `diffNuisances.py` script from `test/` to `scripts/` and made it executable, so that it can be run as a command-line tool. I've also removed the version under `data/tutorial/longexercise/`, which had become slightly out-of-date, and updated the documentation in the exercise to call the script without needing the explicit `python` invocation.

A few thoughts for future PRs:
- I'd like to add a similar plot of the dNLL contribution per bin, including a cumulative (per-region) line. This is probably better suited to live somewhere else, perhaps in `FitDiagnostics` directly, though I think `PostFitShapesFromWorkspace` is probably the better place, to avoid jamming everything into `FitDiagnostics`. I'm open to suggestions.
- For future developments, it might be useful to separate the table formatting into some functions that are more generalizable and reusable. On the other hand, if we incorporate more modern tools like pandas, they are already set up to do things like this.
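As a rough idea of what a factored-out table formatter could look like, here is a minimal plain-Python sketch. The name `format_table`, its parameters, and the example row are hypothetical and are not part of `diffNuisances.py`.

```python
def format_table(headers, rows, float_fmt="{:+.3f}"):
    """Render rows as a fixed-width plain-text table (hypothetical helper).

    headers: list of column titles.
    rows: iterable of tuples, one entry per column; floats are formatted
          with float_fmt, everything else with str().
    """
    cells = [
        [float_fmt.format(v) if isinstance(v, float) else str(v) for v in row]
        for row in rows
    ]
    # Each column is as wide as its widest entry, header included.
    widths = [
        max([len(h)] + [len(row[i]) for row in cells])
        for i, h in enumerate(headers)
    ]

    def render(values):
        line = "  ".join(v.ljust(w) for v, w in zip(values, widths))
        return line.rstrip()

    lines = [render(headers), render(["-" * w for w in widths])]
    lines.extend(render(row) for row in cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example row, for illustration only.
    print(format_table(["name", "dNLL"], [("jes", 0.42)]))
```

Keeping the formatting in one function like this would let the text, LaTeX, or HTML variants differ only in how `render` joins the cells.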