Finish spellcheck part 3-4-5
adewit committed Dec 12, 2023
1 parent 03e8249 commit eb551d7
Showing 13 changed files with 349 additions and 349 deletions.
6 changes: 3 additions & 3 deletions docs/part2/bin-wise-stats.md
@@ -15,7 +15,7 @@ The following line should be added at the bottom of the datacard, underneath the

The first string `channel` should give the name of the channels (bins) in the datacard for which the new histogram classes should be used. The wildcard `*` is supported for selecting multiple channels in one go. The value of `threshold` should be set to a value greater than or equal to zero to enable the creation of automatic bin-wise uncertainties, or `-1` to use the new histogram classes without these uncertainties. A positive value sets the threshold on the effective number of unweighted events above which the uncertainty will be modeled with the Barlow-Beeston-lite approach described above. Below the threshold an individual uncertainty per-process will be created. The algorithm is described in more detail below.

The last two settings are optional. The first of these, `include-signal`, has a default value of `0` but can be set to `1` as an alternative. By default, the total nominal yield and uncertainty used to test the threshold excludes signal processes. The reason for this is that typically the initial signal normalization is arbitrary, and could unduly lead to a bin being considered well-populated despite poorly populated background templates. Setting this flag will include the signal processes in the uncertainty analysis. Note that this option only affects the logic for creating a single Barlow-Beeston-lite parameter vs. separate per-process parameters - the uncertainties on all signal processes are always included in the actual model! The second flag changes the way the normalization effect of shape-altering uncertainties is handled. In the default mode (`1`) the normalization is handled separately from the shape morphing via an asymmetric log-normal term. This is identical to how combine has always handled shape morphing. When set to `2`, the normalization will be adjusted in the shape morphing directly. Unless there is a strong motivation we encourage users to leave this on the default setting.
The last two settings are optional. The first of these, `include-signal`, has a default value of `0` but can be set to `1` as an alternative. By default, the total nominal yield and uncertainty used to test the threshold excludes signal processes. The reason for this is that typically the initial signal normalization is arbitrary, and could unduly lead to a bin being considered well-populated despite poorly populated background templates. Setting this flag will include the signal processes in the uncertainty analysis. Note that this option only affects the logic for creating a single Barlow-Beeston-lite parameter vs. separate per-process parameters - the uncertainties on all signal processes are always included in the actual model! The second flag changes the way the normalization effect of shape-altering uncertainties is handled. In the default mode (`1`) the normalization is handled separately from the shape morphing via an asymmetric log-normal term. This is identical to how <sub><sup>COMBINE</sup></sub> has always handled shape morphing. When set to `2`, the normalization will be adjusted in the shape morphing directly. Unless there is a strong motivation we encourage users to leave this on the default setting.
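The per-bin choice between one Barlow-Beeston-lite parameter and separate per-process parameters can be sketched in Python as below. This is an illustration only, not the combine implementation: it assumes the effective number of unweighted events is $n_{\text{tot}}^2/e_{\text{tot}}^2$, with $e_{\text{tot}}$ the quadrature sum of the per-process uncertainties, and the function name `choose_bin_model` and its arguments are hypothetical.

```python
def choose_bin_model(yields, errors, is_signal, threshold, include_signal=False):
    """Return 'none', 'bb-lite' or 'per-process' for a single bin (sketch).

    yields, errors: per-process nominal yield and statistical uncertainty in this bin
    is_signal: per-process flag; signal is left out of the threshold test
               unless include_signal is True (mirrors the include-signal option)
    threshold: -1 disables bin-wise uncertainties, >= 0 enables them
    """
    if threshold < 0:
        # -1: new histogram classes, but no automatic bin-wise uncertainties
        return "none"
    n_tot = 0.0
    e2_tot = 0.0
    for n, e, sig in zip(yields, errors, is_signal):
        if sig and not include_signal:
            continue  # signal excluded from the threshold test only
        n_tot += n
        e2_tot += e * e  # uncertainties combined in quadrature
    if e2_tot == 0.0:
        # sketch choice: nothing to test against the threshold in this bin
        return "none"
    n_eff = n_tot * n_tot / e2_tot  # assumed effective unweighted event count
    return "bb-lite" if n_eff >= threshold else "per-process"
```

With a large, arbitrarily normalised signal and a sparse background, for example `choose_bin_model([1, 100], [1, 1], [False, True], 10)`, the bin falls below the threshold and gets per-process parameters; passing `include_signal=True` lets the signal push it over the threshold. This is the behaviour the default exclusion of signal is meant to avoid.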

## Description of the algorithm

@@ -68,7 +68,7 @@ Bin Contents Error Notes
</details>

## Analytic minimisation
One significant advantage of the Barlow-Beeston-lite approach is that the maximum likelihood estimate of each nuisance parameter has a simple analytic form that depends only on $n_{\text{tot}}$, $e_{\text{tot}}$ and the observed number of data events in the relevant bin. Therefore when minimising the negative log-likelihood of the whole model it is possible to remove these parameters from the fit and set them to their best-fit values automatically. For models with large numbers of bins this can reduce the fit time and increase the fit stability. The analytic minimisation is enabled by default starting in combine v8.2.0; you can disable it by adding the option `--X-rtd MINIMIZER_no_analytic` when running combine.
One significant advantage of the Barlow-Beeston-lite approach is that the maximum likelihood estimate of each nuisance parameter has a simple analytic form that depends only on $n_{\text{tot}}$, $e_{\text{tot}}$ and the observed number of data events in the relevant bin. Therefore when minimising the negative log-likelihood of the whole model it is possible to remove these parameters from the fit and set them to their best-fit values automatically. For models with large numbers of bins this can reduce the fit time and increase the fit stability. The analytic minimisation is enabled by default starting in combine v8.2.0; you can disable it by adding the option `--X-rtd MINIMIZER_no_analytic` when running <sub><sup>COMBINE</sup></sub>.
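To show where the closed form comes from, here is a sketch under one stated assumption: the Barlow-Beeston-lite parameter $x$ shifts the bin yield to $n_{\text{tot}} + x\,e_{\text{tot}}$ and has a unit Gaussian constraint. Minimising $(n_{\text{tot}} + x e_{\text{tot}}) - d \ln(n_{\text{tot}} + x e_{\text{tot}}) + x^2/2$ for $d$ observed events then reduces to the quadratic $e_{\text{tot}} x^2 + (n_{\text{tot}} + e_{\text{tot}}^2)\,x + e_{\text{tot}}(n_{\text{tot}} - d) = 0$. The function `bb_lite_best_fit` is hypothetical and is not the code used inside combine.

```python
import math


def bb_lite_best_fit(n_tot, e_tot, data):
    """Best-fit Barlow-Beeston-lite parameter x for one bin (sketch).

    Assumes expected yield n_tot + x * e_tot, a Poisson term for `data`
    and a unit Gaussian constraint on x, i.e. it minimises
        NLL(x) = (n_tot + x*e_tot) - data*ln(n_tot + x*e_tot) + x**2 / 2
    """
    if e_tot == 0.0:
        return 0.0  # no uncertainty in this bin: x has no effect
    # dNLL/dx = 0  ->  e*x^2 + (n + e^2)*x + e*(n - d) = 0
    a = e_tot
    b = n_tot + e_tot * e_tot
    c = e_tot * (n_tot - data)
    disc = b * b - 4.0 * a * c  # equals (n - e^2)^2 + 4 e^2 d, never negative
    # the '+' root keeps the expected yield n_tot + x*e_tot non-negative
    return (-b + math.sqrt(disc)) / (2.0 * a)
```

When data equals the nominal yield the parameter stays at zero; with more data it is pulled upwards. Because each bin's solution needs only three numbers, it can be re-evaluated cheaply at every step of the minimisation instead of being fitted numerically.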

The figure below shows a performance comparison of the analytical minimisation versus the number of bins in the likelihood function. The real time (in seconds) for a typical minimisation of a binned likelihood is shown as a function of the number of bins when invoking the analytic minimisation of the nuisance parameters versus the default numerical approach.

@@ -84,7 +84,7 @@ The figure below shows a performance comparison of the analytical minimisation v
Up until recently `text2workspace.py` would only construct the PDF for each channel using a `RooAddPdf`, i.e. each component process is represented by a separate PDF and normalization coefficient. However, in order to model bin-wise statistical uncertainties, the alternative `RooRealSumPdf` can be more useful, as each process is represented by a RooFit function object instead of a PDF, and we can vary the bin yields directly. As such, a new RooFit histogram class `CMSHistFunc` is introduced, which offers the same vertical template morphing algorithms offered by the current default histogram PDF, `FastVerticalInterpHistPdf2`. Accompanying this is the `CMSHistErrorPropagator` class. This evaluates a sum of `CMSHistFunc` objects, each multiplied by a coefficient. It is also able to scale the summed yield of each bin to account for bin-wise statistical uncertainty nuisance parameters.
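The role described for `CMSHistErrorPropagator` (sum the coefficient-scaled process histograms, then scale each bin for its statistical nuisance parameter) can be sketched in plain Python. This is not the class's RooFit interface. It assumes the additive form $n_{\text{tot}} + x\,e_{\text{tot}}$ per bin, and the function name `propagate` is hypothetical.

```python
def propagate(templates, coeffs, bb_errors, bb_params):
    """Sum coefficient-scaled templates, then apply bin-wise BB-lite shifts (sketch).

    templates: per-process bin contents, one list per process
    coeffs:    normalisation coefficient for each process
    bb_errors: total statistical uncertainty e_tot for each bin
    bb_params: Barlow-Beeston-lite nuisance value x for each bin
    """
    nbins = len(bb_errors)
    totals = [0.0] * nbins
    for template, c in zip(templates, coeffs):
        for i, value in enumerate(template):
            totals[i] += c * value  # coefficient times process yield
    # assumed form: shift each summed bin yield by x * e_tot
    return [n + x * e for n, x, e in zip(totals, bb_params, bb_errors)]
```

Working on the summed yield is what makes a single parameter per bin possible, and also why recovering individual process shapes needs extra work.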

!!! warning
One disadvantage of this new approach comes when evaluating the expectation for individual processes, for example when using the `--saveShapes` option in the `FitDiagnostics` mode of combine. The Barlow-Beeston-lite parameters scale the sum of the process yields directly, so extra work is needed to distribute this total scaling back to each individual process. To achieve this, an additional class `CMSHistFuncWrapper` has been created through which, for a given `CMSHistFunc`, the `CMSHistErrorPropagator` distributes an appropriate fraction of the total yield shift to each bin. As a consequence of the extra computation needed to distribute the yield shifts in this way, the evaluation of individual process shapes in `--saveShapes` can take longer than previously.
One disadvantage of this new approach comes when evaluating the expectation for individual processes, for example when using the `--saveShapes` option in the `FitDiagnostics` mode of <sub><sup>COMBINE</sup></sub>. The Barlow-Beeston-lite parameters scale the sum of the process yields directly, so extra work is needed to distribute this total scaling back to each individual process. To achieve this, an additional class `CMSHistFuncWrapper` has been created through which, for a given `CMSHistFunc`, the `CMSHistErrorPropagator` distributes an appropriate fraction of the total yield shift to each bin. As a consequence of the extra computation needed to distribute the yield shifts in this way, the evaluation of individual process shapes in `--saveShapes` can take longer than previously.
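One simple way to hand a bin's total yield shift back to individual processes is sketched below. The weighting is an assumption made for illustration: each process receives a share proportional to its nominal yield in the bin. The exact fraction used by `CMSHistFuncWrapper` may differ, and `split_shift` is a hypothetical name.

```python
def split_shift(process_yields, bb_error, bb_param):
    """Share one bin's BB-lite yield shift x * e_tot among processes (sketch).

    Assumption: each process gets a fraction of the shift equal to its
    share of the bin's total nominal yield.
    """
    total = sum(process_yields)
    shift = bb_param * bb_error
    if total == 0.0:
        return list(process_yields)  # no yield to weight the shift by
    return [y + shift * y / total for y in process_yields]
```

By construction the per-process results add up to the shifted total, so the summed shape is unchanged. The extra per-process arithmetic in every bin is the kind of overhead that makes `--saveShapes` slower.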



6 changes: 3 additions & 3 deletions docs/part2/physicsmodels.md
@@ -1,6 +1,6 @@
# Physics Models

Combine can be run directly on the text-based datacard. However, for more advanced physics models, the internal step to convert the datacard to a binary workspace should be performed by the user. To create a binary workspace starting from a `datacard.txt`, you can run
<sub><sup>COMBINE</sup></sub> can be run directly on the text-based datacard. However, for more advanced physics models, the internal step to convert the datacard to a binary workspace should be performed by the user. To create a binary workspace starting from a `datacard.txt`, you can run

```sh
text2workspace.py datacard.txt -o workspace.root
@@ -110,7 +110,7 @@ Below are some (more generic) example models that also exist in GitHub.

### MultiSignalModel ready made model for multiple signal processes

Combine already contains a model **`HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel`** that can be used to assign different signal strengths to multiple processes in a datacard, configurable from the command line.
<sub><sup>COMBINE</sup></sub> already contains a model **`HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel`** that can be used to assign different signal strengths to multiple processes in a datacard, configurable from the command line.

The model is configured by passing one or more mappings in the form **`--PO 'map=bin/process:parameter'`** to text2workspace:

@@ -126,7 +126,7 @@ The MultiSignalModel will define all parameters as parameters of interest, but t

Some examples, taking as reference the toy datacard [test/multiDim/toy-hgg-125.txt](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/test/multiDim/toy-hgg-125.txt):

- Scale both `ggH` and `qqH` with the same signal strength `r` (that is what the default physics model of combine does for all signals; if they all have the same systematic uncertainties, it is also equivalent to adding up their yields and writing them as a single column in the card)
- Scale both `ggH` and `qqH` with the same signal strength `r` (that is what the default physics model of <sub><sup>COMBINE</sup></sub> does for all signals; if they all have the same systematic uncertainties, it is also equivalent to adding up their yields and writing them as a single column in the card)

```nohighlight
$ text2workspace.py -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/ggH:r[1,0,10]' --PO 'map=.*/qqH:r' toy-hgg-125.txt -o toy-1d.root