
Commit

update readme
ekourlit committed Dec 3, 2023
1 parent 7ac8dbf commit 09c9eb1
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions SUSY_pyfits.ipynb
@@ -6,14 +6,14 @@
"source": [
"# Introduction\n",
"\n",
"A common characteristic of the ATLAS Supersymmetry (SUSY) analyses is the Standard Model (SM) background estimation strategy along with the statistical approaches for testing first the compatibility of the observed data with the predicted SM background and second, in absence of a data excess, setting exclusion limits on SUSY models. Additionally, limits are usually set in a model-independent manner too, in terms of generic Beyond SM (BSM) events. Alternatively, in the case of an actual data excess, measurement of the signal strength of a signal model can be performed.\n",
"A quite common characteristic of the ATLAS Supersymmetry (SUSY) analyses is the Standard Model (SM) background estimation strategy along with the statistical approaches for testing, first, the compatibility of the observed data with the predicted SM background and, second, in absence of a data excess, setting exclusion limits on SUSY models. Additionally, limits are usually set in a model-independent manner too, in terms of a generic Beyond SM (BSM) model. Alternatively, in the case of an actual data excess, measurement of the signal strength of a model can be performed.\n",
"\n",
"This notebook implements the three types of statistical fits regularly used by ATLAS SUSY analyses to achieve the above: \n",
"1. **Background-only fit** to assess the compatibility of the observed data with the background prediction.\n",
"2. **Exclusion fit** to place model-dependent exclusion limits or measure the signal strength of a model.\n",
"3. **Discovery fit** to place model-independent exclusion limits.\n",
"\n",
"Those types of fits are originally introduced in the [HistFitter](https://doi.org/10.1140/epjc/s10052-015-3327-7) framework, which is using the HistFactory and RooStats packages that are implemented in ROOT. This notebook provides an implementation of the above fit strategies using solely python packages: pyhf and cabinetry."
"Those types of fits are originally introduced in the HistFitter framework, which is using the HistFactory and RooStats packages that are implemented in ROOT [[1](https://doi.org/10.1140/epjc/s10052-015-3327-7)]. This notebook provides an implementation of the above fit strategies solely in python. The two main libraries used are [`pyhf`](https://github.com/scikit-hep/pyhf) and [`cabinetry`](https://github.com/scikit-hep/cabinetry)."
]
},
{
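As a minimal illustration of the three fit types listed in the cell above, the sketch below uses `pyhf` alone on a hypothetical one-bin signal region. All yields are invented for illustration; the notebook itself builds its own model with `pyhf` and `cabinetry`.

```python
# Sketch of the three fit types with pyhf on a toy one-bin region.
# All yields are illustrative, not taken from the notebook.
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
data = [53.0] + model.config.auxdata  # observed counts + auxiliary data

# 1. Background-only fit: fix the signal strength to zero and fit the
#    remaining (background) parameters to the observed data.
bkg_only_pars = pyhf.infer.mle.fixed_poi_fit(0.0, data, model)

# 2. Exclusion fit: CLs hypothesis test of the nominal signal model,
#    the ingredient of model-dependent exclusion limits.
cls_obs = pyhf.infer.hypotest(1.0, data, model, test_stat="qtilde")

# 3. Discovery fit: the q0 test statistic quantifies the significance
#    of any excess over the background-only expectation.
p0 = pyhf.infer.hypotest(0.0, data, model, test_stat="q0")

print(f"CLs = {cls_obs:.3f}, discovery p0 = {p0:.3f}")
```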
@@ -26,7 +26,7 @@
"1. *Reducible* background: typically a high cross section SM processes that *fakes* the signal characteristics and estimated by independent data-driven methods.\n",
"2. *Irreducible* background: all those SM processes that actually have the same physical characteristics with the signal. Their estimate relies in semi-data-driven methods and is one of the main topics of this presentation.\n",
"\n",
"To estimate the contribution of the irreducible backgrounds, the shape of the distributions is taken from Monte Carlo (MC) simulation while the normalization is extracted from data, thus the semi-data-driven term. To extract the aforementioned normalization factors ($\\mu$-factors), background enriched regions, called *Control Regions* (CRs), are constructed and utilized to scale the initial MC predictions to the observed event levels. Additionally, *Validation Regions* (VRs) are also defined, with a topology as similar as possible to SRs but with low signal contamination, in order to verify the background agreement with the data before looking the yields in the SRs.\n",
"To estimate the contribution of the irreducible backgrounds, the shape of the kinematic distributions is taken from Monte Carlo (MC) simulation while the normalization is extracted from data, thus the semi-data-driven term. To extract the normalization factors ($\\mu$-factors), background enriched regions, called *Control Regions* (CRs), are constructed and utilized to scale the initial MC predictions to the observed data. Additionally, *Validation Regions* (VRs) are also defined, with a topology as similar as possible to SRs, but with low signal contamination, in order to verify the background agreement with the data before looking the yields in the SRs.\n",
"\n",
"The strategy described can be schematically seen below: \n",
"\n",
@@ -36,7 +36,7 @@
"\n",
"## Likelihood Function\n",
"\n",
"A likelihood function is build and fit to estimate both the background and the signal normalization factors, along with nuisance parameters of the fit describing the measurement systematic uncertainties. Such likelihood function is build as a product of Poisson probability functions $P(n_i, \\lambda_i)$ describing the probability to observe $n$ events when $\\lambda$ are expected in region $i$ (CR, VR, SR). $\\lambda_i$ depends on the normalization factors, generally from both the (irreducible) background ($\\boldsymbol{\\mu}$) and the signal processes ($\\mu_{\\mathrm{sig}}$). Systematic uncertainties of the measurement are treated as nuisance parameters and constrained by normal distributions $C(\\boldsymbol{\\theta}^\\mathrm{0},\\boldsymbol{\\theta})$, where $\\boldsymbol{\\theta}^\\mathrm{0}$ is the expected values of the nuisance parameters. The resulting likelihood is:\n",
"A binned likelihood function is build and fit to estimate both the background and the signal normalization factors, along with nuisance parameters of the fit describing the measurement systematic uncertainties. Such likelihood function is build as a product of Poisson probability functions $P(n_i, \\lambda_i)$ describing the probability to observe $n$ events when $\\lambda$ are expected in region $i$ (SR, CR, VR). The $\\lambda_i$ depends on the normalization factors, generally from both the (irreducible) background ($\\boldsymbol{\\mu}$) and the signal processes ($\\mu_{\\mathrm{sig}}$). Systematic uncertainties of the measurement (nuisance parameters) are constrained by normal distributions $C(\\boldsymbol{\\theta}^\\mathrm{0},\\boldsymbol{\\theta})$, where $\\boldsymbol{\\theta}^\\mathrm{0}$ is the expected values of the nuisance parameters. The resulting likelihood is:\n",
"\n",
"$$\n",
"L = \\prod_{i\\in\\text{SR}} \\mathcal{P}(n_i|\\lambda_i(\\mu_\\text{sig},\\boldsymbol{\\mu},\\boldsymbol{\\theta})) \\times \\prod_{i\\in\\text{CR}} \\mathcal{P}(n_i|\\lambda_i(\\mu_\\text{sig},\\boldsymbol{\\mu},\\boldsymbol{\\theta})) \\times C(\\boldsymbol{\\theta}^\\mathrm{0},\\boldsymbol{\\theta})\n",
@@ -56,7 +56,7 @@
"\n",
"![](figures/sqsq-C1C1-Hplus.png)\n",
"\n",
"It should be noted that this is a toy model and the couplings used violate conserved quantities.\n",
"It should be stressed that this is a toy model and the couplings used violate conserved quantities.\n",
"\n",
"The *irreducible* SM backgrounds considered are:\n",
"1. top pair production in association with a photon (*tty*)\n",
@@ -1139,7 +1139,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## From a single model to the whole signal grid"
"## Bonus: from a single model to the whole signal grid"
]
},
{
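A sketch of how the per-model exclusion fit extends to a signal grid: each grid point gets its own model and a CLs test at mu = 1, and a point is excluded at 95% CL when CLs < 0.05. The mass points and all yields below are hypothetical.

```python
# Sketch: repeat the exclusion fit for every point of a signal grid.
# Mass points and yields are hypothetical.
import pyhf

# (m_squark, m_neutralino) -> predicted SR signal yield
signal_grid = {(500, 100): [8.0], (600, 200): [4.0], (700, 300): [1.5]}
background, bkg_unc, observed = [50.0], [7.0], [53.0]

excluded = {}
for masses, signal in signal_grid.items():
    model = pyhf.simplemodels.uncorrelated_background(
        signal=signal, bkg=background, bkg_uncertainty=bkg_unc
    )
    data = observed + model.config.auxdata
    cls = pyhf.infer.hypotest(1.0, data, model, test_stat="qtilde")
    excluded[masses] = bool(cls < 0.05)  # excluded at 95% CL if CLs < 0.05

print(excluded)
```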
