Adding Documentation Section focused on underlying stats without code #839
I've left this open for an unreasonably long time for no good reason. I just gave it another check, and despite what I'm sure are many flaws, I am happy enough with it to merge it and make it public. Unless there are any loud complaints soon, I will go ahead with the merge. Closer to the time of releasing the paper, I will go through and try to harmonize some notation etc.
The observation model, $\mathcal{M}_0( \vec{\Phi}_{0})$, defines the probability for any set of observations given specific values of the input parameters of the model $\vec{\Phi}_0$.
The probability for any observed data is denoted:

$$ p_{\mathcal{M}_{0}}(\mathrm{data}; \vec{\Phi}_0 ) $$
Do we really need the _{0}? I think it looks better without this subscript
The event-count portion of the model consists of a sum over different processes.
The expected observations, $\vec{\lambda}$, are then the sum of the expected observations for each of the processes, $\vec{\lambda} =\sum_{p} \vec{\lambda}_{p}$.

The model can also be composed of multiple channels, in which case the expected observation is the set of all expected observations from the various channels $\vec{\lambda}_{0} = \{ \vec{\lambda}_{c1}, \vec{\lambda}_{c2}, \ldots, \vec{\lambda}_{cN}\}$.
See above, we have _{0} here but in the previous paragraph, there's no subscript (prefer without)
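To make the quoted paragraph concrete, here is a minimal Python sketch of the sum over processes and the per-channel collection. The process names, yields, and the helper `total_expected` are hypothetical and chosen only for illustration; this is not how combine itself stores or computes these quantities.

```python
# Hypothetical helper: sum per-process expected yields bin by bin,
# i.e. lambda = sum_p lambda_p for a single channel.
def total_expected(process_yields):
    n_bins = len(next(iter(process_yields.values())))
    return [sum(y[b] for y in process_yields.values()) for b in range(n_bins)]

# Illustrative (made-up) expected yields per bin for each process.
channel_1 = {"signal": [1.0, 2.5], "background": [10.0, 8.0]}
channel_2 = {"signal": [0.5], "background": [4.0]}

# With multiple channels, the expectation is the collection of the
# per-channel expectations {lambda_c1, lambda_c2, ...}.
expected = {
    "c1": total_expected(channel_1),
    "c2": total_expected(channel_2),
}
```

Each channel keeps its own binning, so the multi-channel expectation is kept as a collection rather than being summed across channels.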
For any given model, $\mathcal{M}(\vec{\Phi})$, [the likelihood](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#section.40.1) defines the probability of observing a given dataset.
It is numerically equal to the probability of observing the data, given the model.

$$ \mathcal{L}_\mathcal{M}(\vec{\Phi};\mathrm{data}) = p_{\mathcal{M}}(\mathrm{data};\vec{\Phi}) $$
Given the amount of time we spent on the review of the paper for this, I would really try to stick to the paper. Specifically, we never write a likelihood with "; data", and we don't have it later in the figure or elsewhere, so I would drop it here and just keep the parameters.
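As an illustration of the statement that the likelihood is numerically equal to the probability of the data, here is a small Python sketch using a single Poisson count. The single-bin model and the function names are assumptions made for this example; the point is only that the same number is read as a function of the parameter with the data held fixed.

```python
import math

def poisson_prob(n, lam):
    """Probability of observing n events given an expectation lam."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def likelihood(lam, n_obs):
    """Same number as poisson_prob, viewed as a function of lam at fixed data."""
    return poisson_prob(n_obs, lam)

n_obs = 3  # the observed data stay fixed
# Scan the parameter: only lam varies between evaluations.
scan = {lam: likelihood(lam, n_obs) for lam in (1.0, 3.0, 5.0)}
best = max(scan, key=scan.get)
```

Among the scanned values, the likelihood is largest at the expectation equal to the observed count, as expected for a Poisson model.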
The likelihood in combine takes the general form:

$$ \mathcal{L} = \mathcal{L}_{\textrm{data}} \cdot \mathcal{L}_{\textrm{constraint}} $$
Can we use "primary" and "auxiliary" as in the paper, instead of "data" and "constraint"?
Where $\mathcal{L}_{\mathrm{data}}$ is equal to the probability of observing the event count data for a given set of model parameters, and $\mathcal{L}_{\mathrm{constraint}}$ represents some external constraints on the parameters.
The constraint term may encode previous measurements (such as jet energy scales) or prior beliefs about the value that some parameter in the model should have.

Both $\mathcal{L}_{\mathrm{data}}$ and $\mathcal{L}_{\mathrm{constraint}}$ can be composed of many sublikelihoods, for example for observations of different bins and constraints on different nuisance parameters.
As before (data->primary, constraint->auxiliary)
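The factorization discussed in this thread, a primary (data) term multiplied by an auxiliary (constraint) term, can be sketched in Python for a single counting bin. Everything specific here is an assumption for illustration: the yields, the linear effect of the nuisance parameter `theta` on the background, and the unit-Gaussian constraint. It is not combine's implementation.

```python
import math

def poisson_prob(n, lam):
    return math.exp(-lam) * lam**n / math.factorial(n)

def gaussian_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def full_likelihood(mu, theta, n_obs, signal=5.0, background=20.0, rel_unc=0.1):
    """L = L_primary * L_auxiliary for one bin and one nuisance parameter."""
    # Assumed model: theta shifts the background by rel_unc per unit of theta.
    lam = mu * signal + background * (1.0 + rel_unc * theta)
    primary = poisson_prob(n_obs, lam)        # event-count term
    auxiliary = gaussian_pdf(0.0, theta, 1.0)  # external constraint on theta
    return primary * auxiliary

nominal = full_likelihood(mu=1.0, theta=0.0, n_obs=25)
shifted = full_likelihood(mu=1.0, theta=2.0, n_obs=25)
```

With these made-up numbers, moving `theta` two units from its nominal value lowers both factors, so `shifted` is smaller than `nominal`.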
While we presented the likelihoods for the template and parametric models separately, they can also be combined into a single likelihood, by treating them each as separate channels.
When combining the models, the data likelihoods of the binned and unbinned channels are multiplied.

$$ \mathcal{L}_{\mathrm{combined}} = \mathcal{L}_{\mathrm{data}} \cdot \mathcal{L}_\mathrm{constraint} = (\prod_{c_\mathrm{template}} \mathcal{L}_{\mathrm{data}}^{c_\mathrm{template}}) (\prod_{c_\mathrm{parametric}} \mathcal{L}_{\mathrm{data}}^{c_\mathrm{parametric}}) \mathcal{L}_{\mathrm{constraint}} $$
just one more repeat (data->primary, constraint -> auxiliary)
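The combined form quoted above is a product over channels, which becomes a sum in log space. The following Python sketch shows that bookkeeping under stated assumptions: the binned term is a product of Poisson probabilities, the unbinned term is taken to be an extended likelihood (a Poisson term for the event count times the pdf evaluated at each event), and all inputs are made up. None of the function names come from combine.

```python
import math

def log_poisson(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def log_template_channel(observed, expected):
    """Binned channel: sum of per-bin Poisson log-probabilities."""
    return sum(log_poisson(n, lam) for n, lam in zip(observed, expected))

def log_parametric_channel(events, n_expected, pdf):
    """Unbinned channel (assumed extended form): count term plus per-event pdf."""
    return log_poisson(len(events), n_expected) + sum(math.log(pdf(x)) for x in events)

def log_combined(template_terms, parametric_terms, log_auxiliary):
    """Log of the product over all channels times the auxiliary term."""
    return sum(template_terms) + sum(parametric_terms) + log_auxiliary

exp_pdf = lambda x: math.exp(-x)  # hypothetical pdf, normalized on [0, inf)
t = log_template_channel([11, 9], [11.0, 10.5])
p = log_parametric_channel([0.2, 1.1, 0.7], 3.5, exp_pdf)
theta = 0.0
aux = -0.5 * theta**2 - 0.5 * math.log(2.0 * math.pi)  # unit Gaussian on theta
total = log_combined([t], [p], aux)
```

Working with logarithms avoids numerical underflow when many bins or events are multiplied together; exponentiating `total` recovers the product form.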
Thanks, good points Nick. I changed those cases. I also updated the text to match the primary/auxiliary wording better, and found some other instances throughout where I made the notation and wording more consistent with what's in the paper.
This is (the start of) an attempt to make the underlying model, statistical tests, etc. used by combine clearer to users.
These pages are designed to give users concise but thorough and precise references on the details of what is being done.
The existing documentation includes much of this material, but spread throughout. It might help users to have more complete explanations in one easy-to-find spot, with reminders and references back to that material in the other parts of the documentation, which focus on how to run procedures and commands.
Open to suggestions/comments at all levels (overall structure, content, flow, choice of notation, etc.).
For those not familiar with setting up the documentation locally to have a look, please see the instructions in the contributing.md document from #838 (you can see it here: https://github.com/kcormi/HiggsAnalysis-CombinedLimit/blob/contributing/contributing.md). A page which should be identical to the one here has also been put up at: https://kcormi.github.io/HiggsAnalysis-CombinedLimit/ . The new pages are the ones under the 'what combine does' tab.