From c4575db72fe3d724fa149bf2fa82f907e50d130f Mon Sep 17 00:00:00 2001
From: Kyle Cormier <kcormier@physik.uzh.ch>
Date: Wed, 27 Sep 2023 06:16:04 +0200
Subject: [PATCH] minor notation changes

---
 docs/what_combine_does/fitting_concepts.md     | 10 +++++-----
 docs/what_combine_does/model_and_likelihood.md |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/what_combine_does/fitting_concepts.md b/docs/what_combine_does/fitting_concepts.md
index f69a8d95d83..950288a7234 100644
--- a/docs/what_combine_does/fitting_concepts.md
+++ b/docs/what_combine_does/fitting_concepts.md
@@ -12,10 +12,10 @@ Likelihood fits typically either follow a frequentist framework of maximum likel
 
 ### Maximum Likelihood fits
 
-A [maximum likelihood fit](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#subsection.40.2.2) means finding the values of the model parameters $(\vec{\mu}, \vec{\theta})$ which maximize the likelihood, $\mathcal{L}(\vec{\mu},\vec{\theta}|\mathrm{data})$
+A [maximum likelihood fit](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#subsection.40.2.2) means finding the values of the model parameters $(\vec{\mu}, \vec{\theta})$ which maximize the likelihood, $\mathcal{L}(\vec{\mu},\vec{\theta};\mathrm{data})$.
 The values which maximize the likelihood, are the parameter estimates, denoted with a "hat" ($\hat{}$):
 
-$$(\vec{\hat{\mu}}, \vec{\hat{\theta}}) \equiv \underset{\vec{\mu},\vec{\theta}}{\operatorname{argmax}} \mathcal{L}(\vec{\mu}, \vec{\theta}|\mathrm{data})$$ 
+$$(\vec{\hat{\mu}}, \vec{\hat{\theta}}) \equiv \underset{\vec{\mu},\vec{\theta}}{\operatorname{argmax}} \mathcal{L}(\vec{\mu}, \vec{\theta};\mathrm{data})$$ 
 
 These values provide **point estimates** for the parameter values.
 
@@ -27,7 +27,7 @@ In a bayesian framework, the likelihood represents the probability of observing
 
 Beliefs about the values of the parameters are updated based on the data to provide a [posterior distributions](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#subsection.40.2.6)
 
-$$ p(\vec{\theta}|\mathrm{data}) = \frac{ p(\mathrm{data}|\vec{\theta}) p(\vec{\theta}) }{ p(\mathrm{data}) } = \frac{ \mathcal{L}_{\mathrm{data}}(\vec{\theta}|\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta}) }{ \int_{\vec{\theta'}} \mathcal{L}_{\mathrm{data}}(\vec{\theta'}|\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta'}) }$$ 
+$$ p(\vec{\theta}|\mathrm{data}) = \frac{ p(\mathrm{data}|\vec{\theta}) p(\vec{\theta}) }{ p(\mathrm{data}) } = \frac{ \mathcal{L}_{\mathrm{data}}(\vec{\theta};\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta}) }{ \int_{\vec{\theta'}} \mathcal{L}_{\mathrm{data}}(\vec{\theta'};\mathrm{data}) \mathcal{L}_{\mathrm{constraint}}(\vec{\theta'}) }$$ 
 
 The posterior distribution p$(\vec{\theta}|\mathrm{data})$ defines the updated belief about the parameters $\vec{\theta}$.
 
@@ -74,8 +74,8 @@ Parameter uncertainties describe regions of parameter values which are considere
 These can be defined either in terms of frequentist **confidence regions** or bayesian **credibility regions**.
 
 In both cases the region is defined by a confidence or credibility level $CL$, which quantifies the meaning of the region.
-For frequentist confidence regions, the confidence level $CL$ describes how often the confidence region will contain the true parameter values.
-For bayesian credibility regions, the credibility level $CL$ describes the bayesian probability that the true parameter value is in that region.
+For frequentist confidence regions, the confidence level $CL$ describes how often the confidence region will contain the true parameter values if the model is a sufficiently accurate approximation of the truth.
+For bayesian credibility regions, the credibility level $CL$ describes the bayesian probability that the true parameter value is in that region under the given model.
 
 
 The confidence or credibility regions are described by a set of points $\{ \vec{\theta} \}_{\mathrm{CL}}$ which meet some criteria.
diff --git a/docs/what_combine_does/model_and_likelihood.md b/docs/what_combine_does/model_and_likelihood.md
index 01b04a3446e..87070249c8b 100644
--- a/docs/what_combine_does/model_and_likelihood.md
+++ b/docs/what_combine_does/model_and_likelihood.md
@@ -34,7 +34,7 @@ These are discussed in detail in the context of the full likelihood below.
 For any given model, $\mathcal{M}(\vec{\mu},\vec{\theta})$, [the likelihood](https://pdg.lbl.gov/2022/web/viewer.html?file=../reviews/rpp2022-rev-statistics.pdf#section.40.1) defines the probability of observing a given dataset. 
 It is numerically equal to the probability of observing the data, given the model. 
 
-$$ \mathcal{L}(\vec{\mu},\vec{\theta}|\mathrm{data}) = p(\mathrm{data}|\vec{\mu},\vec{\theta}) $$
+$$ \mathcal{L}(\vec{\mu},\vec{\theta};\mathrm{data}) = p(\mathrm{data}|\vec{\mu},\vec{\theta}) $$
 
 It should be understood through, that the likelihood depends on the parameters through the observation model, $\mathcal{L}(\vec{\theta},\vec{\mu}) = \mathcal{L}_{\mathcal{M}}(\vec{\mu},\vec{\theta})$.
 Changing the observation model, though it may depend on the same parameters, will also change the likelihood function.
@@ -55,7 +55,7 @@ This form is entirely general. However, as with the model itself, there are typi
 For a binned likelihood, the probability of observing a certain number of counts, given a model takes on a simple form. For each bin:
 
 $$
-\mathcal{L}_{\mathrm{bin}}(\vec{\mu},\vec{\theta}|\mathrm{data}) = \mathrm{Poiss}(n_{\mathrm{obs}}| n_{\mathrm{exp}}) 
+\mathcal{L}_{\mathrm{bin}}(\vec{\mu},\vec{\theta};\mathrm{data}) = \mathrm{Poiss}(n_{\mathrm{obs}}| n_{\mathrm{exp}}) 
 $$
 
 i.e. it is a poisson distribution with the mean given by the expected number of events in that bin. 
@@ -85,7 +85,7 @@ In bayesian frameworks, these terms represent the prior[^1].
 We will write in a mostly frequentist framework, though combine can be used for either frequentist or bayesian analyses.
 In this framework, each constraint term represents the likelihood of some parameter, $\theta$, given some previous observation $\tilde{\theta}$, often called a "global observable".
 
-$$ \mathcal{L}_{\mathrm{constraint}}( \theta | \tilde{\theta} ) = p(\tilde{\theta} | \theta ) $$
+$$ \mathcal{L}_{\mathrm{constraint}}( \theta ; \tilde{\theta} ) = p(\tilde{\theta} | \theta ) $$
 
 In principle the form of the likelihood can be any function where the corresponding $p$ is a valid probability distribution.
 In practice, most constraint terms are gaussian, and the definition of $\theta$ is chosen such that the central observation $\tilde{\theta} = 0$ , and the width of the gaussian is one.
@@ -297,7 +297,7 @@ Note that $M_{cp}$ can be chosen by the user from a set of predefined models, or
 
 ### Parametric Likelihoods in Combine
 
-As with the template likelihood, the parameteric likelihood implemented in combine implements likelihoods which for multiple process and multiple channels.
+As with the template likelihood, the parametric likelihood implemented in combine implements likelihoods for multiple processes and multiple channels.
 Unlike the template likelihoods, the [parametric likelihoods are defined using custom probability density functions](../../part2/settinguptheanalysis/#unbinned-or-parametric-shape-analysis), which are functions of continuous observables, rather than discrete, binned counts.
 Because the pdfs are functions of a continuous variable, the likelihood can be evaluated over unbinned data.
 They can still, also, be used for analysis on [binned data](../../part2/settinguptheanalysis/#caveat-on-using-parametric-pdfs-with-binned-datasets).