From c4575db72fe3d724fa149bf2fa82f907e50d130f Mon Sep 17 00:00:00 2001
From: Kyle Cormier
Date: Wed, 27 Sep 2023 06:16:04 +0200
Subject: [PATCH] minor notation changes

---
 docs/what_combine_does/fitting_concepts.md     | 10 +++++-----
 docs/what_combine_does/model_and_likelihood.md |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/what_combine_does/fitting_concepts.md b/docs/what_combine_does/fitting_concepts.md
index f69a8d95d83..950288a7234 100644
--- a/docs/what_combine_does/fitting_concepts.md
+++ b/docs/what_combine_does/fitting_concepts.md
@@ -12,10 +12,10 @@ Likelihood fits typically either follow a frequentist framework of maximum likel
 
 ### Maximum Likelihood fits
 
-A [maximum likelihood fit](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#subsection.40.2.2) means finding the values of the model parameters $(\vec{\mu}, \vec{\theta})$ which maximize the likelihood, $\mathcal{L}(\vec{\mu},\vec{\theta}|\mathrm{data})$
+A [maximum likelihood fit](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#subsection.40.2.2) means finding the values of the model parameters $(\vec{\mu}, \vec{\theta})$ which maximize the likelihood, $\mathcal{L}(\vec{\mu},\vec{\theta};\mathrm{data})$
 The values which maximize the likelihood, are the parameter estimates, denoted with a "hat" ($\hat{}$):
 
-$$(\vec{\hat{\mu}}, \vec{\hat{\theta}}) \equiv \underset{\vec{\mu},\vec{\theta}}{\operatorname{argmax}} \mathcal{L}(\vec{\mu}, \vec{\theta}|\mathrm{data})$$
+$$(\vec{\hat{\mu}}, \vec{\hat{\theta}}) \equiv \underset{\vec{\mu},\vec{\theta}}{\operatorname{argmax}} \mathcal{L}(\vec{\mu}, \vec{\theta};\mathrm{data})$$
 
 These values provide **point estimates** for the parameter values.
 
@@ -27,7 +27,7 @@ In a bayesian framework, the likelihood represents the probability of observing
 
 Beliefs about the values of the parameters are updated based on the data to provide a [posterior distributions](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#subsection.40.2.6)
 
-$$ p(\vec{\theta}|\mathrm{data}) = \frac{ p(\mathrm{data}|\vec{\theta}) p(\vec{\theta}) }{ p(\mathrm{data}) } = \frac{ \mathcal{L}_{\mathrm{data}}(\vec{\theta}|\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta}) }{ \int_{\vec{\theta'}} \mathcal{L}_{\mathrm{data}}(\vec{\theta'}|\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta'}) }$$
+$$ p(\vec{\theta}|\mathrm{data}) = \frac{ p(\mathrm{data}|\vec{\theta}) p(\vec{\theta}) }{ p(\mathrm{data}) } = \frac{ \mathcal{L}_{\mathrm{data}}(\vec{\theta};\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta}) }{ \int_{\vec{\theta'}} \mathcal{L}_{\mathrm{data}}(\vec{\theta'};\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta'}) }$$
 
 The posterior distribution p$(\vec{\theta}|\mathrm{data})$ defines the updated belief about the parameters $\vec{\theta}$.
 
@@ -74,8 +74,8 @@ Parameter uncertainties describe regions of parameter values which are considere
 These can be defined either in terms of frequentist **confidence regions** or bayesian **credibility regions**.
 In both cases the region is defined by a confidence or credibility level $CL$, which quantifies the meaning of the region.
 
-For frequentist confidence regions, the confidence level $CL$ describes how often the confidence region will contain the true parameter values.
-For bayesian credibility regions, the credibility level $CL$ describes the bayesian probability that the true parameter value is in that region.
+For frequentist confidence regions, the confidence level $CL$ describes how often the confidence region will contain the true parameter values if the model is a sufficiently accurate approximation of the truth.
+For bayesian credibility regions, the credibility level $CL$ describes the bayesian probability that the true parameter value is in that region under the given model.
 
 The confidence or credibility regions are described by a set of points $\{ \vec{\theta} \}_{\mathrm{CL}}$ which meet some criteria.
 
diff --git a/docs/what_combine_does/model_and_likelihood.md b/docs/what_combine_does/model_and_likelihood.md
index 01b04a3446e..87070249c8b 100644
--- a/docs/what_combine_does/model_and_likelihood.md
+++ b/docs/what_combine_does/model_and_likelihood.md
@@ -34,7 +34,7 @@ These are discussed in detail in the context of the full likelihood below.
 For any given model, $\mathcal{M}(\vec{\mu},\vec{\theta})$, [the likelihood](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#section.40.1) defines the probability of observing a given dataset.
 It is numerically equal to the probability of observing the data, given the model.
 
-$$ \mathcal{L}(\vec{\mu},\vec{\theta}|\mathrm{data}) = p(\mathrm{data}|\vec{\mu},\vec{\theta}) $$
+$$ \mathcal{L}(\vec{\mu},\vec{\theta};\mathrm{data}) = p(\mathrm{data}|\vec{\mu},\vec{\theta}) $$
 
 It should be understood through, that the likelihood depends on the parameters through the observation model, $\mathcal{L}(\vec{\theta},\vec{\mu}) = \mathcal{L}_{\mathcal{M}}(\vec{\mu},\vec{\theta})$.
 Changing the observation model, though it may depend on the same parameters, will also change the likelihood function.
@@ -55,7 +55,7 @@ This form is entirely general. However, as with the model itself, there are typi
 
 For a binned likelihood, the probability of observing a certain number of counts, given a model takes on a simple form. For each bin:
 $$
-\mathcal{L}_{\mathrm{bin}}(\vec{\mu},\vec{\theta}|\mathrm{data}) = \mathrm{Poiss}(n_{\mathrm{obs}}| n_{\mathrm{exp}})
+\mathcal{L}_{\mathrm{bin}}(\vec{\mu},\vec{\theta};\mathrm{data}) = \mathrm{Poiss}(n_{\mathrm{obs}}| n_{\mathrm{exp}})
 $$
 
 i.e. it is a poisson distribution with the mean given by the expected number of events in that bin.
@@ -85,7 +85,7 @@ In bayesian frameworks, these terms represent the prior[^1].
 We will write in a mostly frequentist framework, though combine can be used for either frequentist or bayesian analyses.
 In this framework, each constraint term represents the likelihood of some parameter, $\theta$, given some previous observation $\tilde{\theta}$, often called a "global observable".
 
-$$ \mathcal{L}_{\mathrm{constraint}}( \theta | \tilde{\theta} ) = p(\tilde{\theta} | \theta ) $$
+$$ \mathcal{L}_{\mathrm{constraint}}( \theta ; \tilde{\theta} ) = p(\tilde{\theta} | \theta ) $$
 
 In principle the form of the likelihood can be any function where the corresponding $p$ is a valid probability distribution.
 In practice, most constraint terms are gaussian, and the definition of $\theta$ is chosen such that the central observation $\tilde{\theta} = 0$ , and the width of the gaussian is one.
@@ -297,7 +297,7 @@ Note that $M_{cp}$ can be chosen by the user from a set of predefined models, or
 
 ### Parametric Likelihoods in Combine
 
-As with the template likelihood, the parameteric likelihood implemented in combine implements likelihoods which for multiple process and multiple channels.
+As with the template likelihood, the parametric likelihood implemented in combine implements likelihoods for multiple processes and multiple channels.
 Unlike the template likelihoods, the [parametric likelihoods are defined using custom probability density functions](../../part2/settinguptheanalysis/#unbinned-or-parametric-shape-analysis), which are functions of continuous observables, rather than discrete, binned counts.
 Because the pdfs are functions of a continuous variable, the likelihood can be evaluated over unbinned data.
 They can still, also, be used for analysis on [binned data](../../part2/settinguptheanalysis/#caveat-on-using-parametric-pdfs-with-binned-datasets).
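
The semicolon notation adopted in this patch, $\mathcal{L}(\vec{\mu},\vec{\theta};\mathrm{data})$, is the usual way of signalling that the likelihood is a function of the parameters with the data held fixed, while $p(\mathrm{data}|\vec{\mu},\vec{\theta})$ keeps the conditioning bar because it is a probability of the data. The Python sketch below illustrates this for a hypothetical two-bin counting experiment: per-bin Poisson terms times a unit Gaussian constraint (global observable $\tilde{\theta}=0$) are evaluated as a function of $(\mu, \theta)$, maximised by a brute-force grid scan to give $(\hat{\mu}, \hat{\theta})$, and normalised over the grid to give a discrete posterior with a flat prior in $\mu$. The counts, templates, the 10% background uncertainty `KAPPA`, its $\kappa^{\theta}$ scaling, and all helper names (`n_exp`, `log_likelihood`, `grid_mle`, `grid_posterior`) are assumptions made for illustration, not combine's API or implementation.

```python
import math

# Illustrative two-bin counting experiment. All numbers are hypothetical
# assumptions chosen for this sketch; they do not come from combine.
N_OBS = [12, 7]            # observed counts per bin (the fixed "data")
SIGNAL = [4.0, 2.0]        # expected signal per bin at mu = 1
BACKGROUND = [8.0, 5.0]    # expected background per bin at theta = 0
KAPPA = 1.1                # assumed 10% background normalisation uncertainty


def n_exp(mu, theta):
    """Expected counts per bin for signal strength mu and nuisance theta."""
    scale = KAPPA ** theta  # log-normal-style scaling, an assumption of this sketch
    return [mu * s + b * scale for s, b in zip(SIGNAL, BACKGROUND)]


def log_poisson(n, lam):
    """log Poiss(n | lam) = n*log(lam) - lam - log(n!)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_likelihood(mu, theta, data=N_OBS):
    """log L(mu, theta; data): a function of the parameters, data held fixed."""
    ll = sum(log_poisson(n, lam) for n, lam in zip(data, n_exp(mu, theta)))
    # Unit Gaussian constraint term with global observable theta_tilde = 0.
    ll += -0.5 * theta ** 2 - 0.5 * math.log(2.0 * math.pi)
    return ll


def grid_mle(mu_grid, theta_grid):
    """Point estimate (mu_hat, theta_hat) by brute-force grid maximisation."""
    return max(((mu, th) for mu in mu_grid for th in theta_grid),
               key=lambda p: log_likelihood(p[0], p[1]))


def grid_posterior(mu_grid, theta_grid):
    """Discrete posterior mass per grid point, assuming a flat prior in mu.

    The constraint term acts as the prior on theta; the grid sum stands in
    for the normalising integral in the denominator of Bayes' theorem.
    """
    weights = {(mu, th): math.exp(log_likelihood(mu, th))
               for mu in mu_grid for th in theta_grid}
    total = sum(weights.values())
    return {point: w / total for point, w in weights.items()}


if __name__ == "__main__":
    mus = [i * 0.01 for i in range(301)]          # mu in [0, 3]
    thetas = [j * 0.05 for j in range(-60, 61)]   # theta in [-3, 3]
    mu_hat, theta_hat = grid_mle(mus, thetas)
    print(f"mu_hat = {mu_hat:.2f}, theta_hat = {theta_hat:.2f}")
    post = grid_posterior(mus, thetas)
    print(f"P(mu > 2 | data) = {sum(p for (m, _), p in post.items() if m > 2):.3f}")
```

With these made-up counts the observed data coincide with the expected yields at $\mu = 1$, $\theta = 0$, so the scan should land on that grid point. The grid is used only for transparency; a realistic fit would use a numerical minimiser rather than exhaustive evaluation.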