From 6c5af5c2a1f938a4318f6d1d193673f69be9ea61 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 11:52:10 +0100 Subject: [PATCH 1/7] adjusted introduction --- docs/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 3737f44abbe..5853533fac4 100644 --- a/docs/index.md +++ b/docs/index.md @@ -217,7 +217,8 @@ See [contributing.md](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimi # CombineHarvester/CombineTools -CombineTools is an additional tool for submitting Combine jobs to batch systems or crab, which was originally developed in the context of Higgs to tau tau analyses. Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/) subpackage, speeding up the checkout and compile times: +CombineTools is an additional tool with useful features for Combine, such as the automated production of datacards among others. 
+Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/) subpackage, speeding up the checkout and compile times: git clone via ssh: From 3bc5920e7b540dcc8ecf61ab72200b44cc48ff42 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 12:00:11 +0100 Subject: [PATCH 2/7] updated description in part5 of long exercise --- docs/part5/longexercise.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/part5/longexercise.md b/docs/part5/longexercise.md index c291669e8d3..82bb098663a 100644 --- a/docs/part5/longexercise.md +++ b/docs/part5/longexercise.md @@ -452,7 +452,7 @@ is perfectly valid and only one `rateParam` will be created. These parameters wi ### B: Nuisance parameter impacts -It is often useful to examine in detail the effects the systematic uncertainties have on the signal strength measurement. This is often referred to as calculating the "impact" of each uncertainty. What this means is to determine the shift in the signal strength, with respect to the best-fit, that is induced if a given nuisance parameter is shifted by its $\pm1\sigma$ post-fit uncertainty values. If the signal strength shifts a lot, it tells us that it has a strong dependency on this systematic uncertainty. In fact, what we are measuring here is strongly related to the correlation coefficient between the signal strength and the nuisance parameter. The `MultiDimFit` method has an algorithm for calculating the impact for a given systematic: `--algo impact -P [parameter name]`, but it is typical to use a higher-level script, `combineTool.py` (part of the CombineHarvester package you checked out at the beginning) to automatically run the impacts for all parameters. 
Full documentation on this is given [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts). There is a three step process for running this. First we perform an initial fit for the signal strength and its uncertainty: +It is often useful to examine in detail the effects the systematic uncertainties have on the signal strength measurement. This is often referred to as calculating the "impact" of each uncertainty. What this means is to determine the shift in the signal strength, with respect to the best-fit, that is induced if a given nuisance parameter is shifted by its $\pm1\sigma$ post-fit uncertainty values. If the signal strength shifts a lot, it tells us that it has a strong dependency on this systematic uncertainty. In fact, what we are measuring here is strongly related to the correlation coefficient between the signal strength and the nuisance parameter. The `MultiDimFit` method has an algorithm for calculating the impact for a given systematic: `--algo impact -P [parameter name]`, but it is typical to use a higher-level script, `combineTool.py`, to automatically run the impacts for all parameters. Full documentation on this is given [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts). There is a three step process for running this. 
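Taken together, the three steps can be sketched as follows (a sketch: the workspace name, mass point, and `r` range are the ones used in this exercise; the later steps follow the impacts documentation linked above):

```shell
# step 1: initial fit for the signal strength and its uncertainty
combineTool.py -M Impacts -d workspace_part3.root -m 200 --rMin -1 --rMax 2 --robustFit 1 --doInitialFit
# step 2: one fit per nuisance parameter, scanning each at its +/-1 sigma post-fit values
combineTool.py -M Impacts -d workspace_part3.root -m 200 --rMin -1 --rMax 2 --robustFit 1 --doFits
# step 3: collect the fit results into a json file and draw the summary plot
combineTool.py -M Impacts -d workspace_part3.root -m 200 -o impacts.json
plotImpacts.py -i impacts.json -o impacts
```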
First we perform an initial fit for the signal strength and its uncertainty: ```shell combineTool.py -M Impacts -d workspace_part3.root -m 200 --rMin -1 --rMax 2 --robustFit 1 --doInitialFit From 4d5f631fad8c90a8655d000aa42b0bf8d106ed39 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 12:01:10 +0100 Subject: [PATCH 3/7] updated tutorial2023 --- docs/tutorial2023/parametric_exercise.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorial2023/parametric_exercise.md b/docs/tutorial2023/parametric_exercise.md index c8131249fb3..6e21e64369e 100644 --- a/docs/tutorial2023/parametric_exercise.md +++ b/docs/tutorial2023/parametric_exercise.md @@ -461,7 +461,7 @@ To perform a likelihood scan (i.e. calculate 2NLL at fixed values of the signal ```shell combine -M MultiDimFit datacard_part1_with_norm.root -m 125 --freezeParameters MH -n .scan --algo grid --points 20 --setParameterRanges r=lo,hi ``` -We can use the `plot1DScan.py` function from combineTools to plot the likelihood scan: +We can use the `plot1DScan.py` function from CombineTools to plot the likelihood scan: ```shell plot1DScan.py higgsCombine.scan.MultiDimFit.mH125.root -o part2_scan ``` @@ -781,7 +781,7 @@ These methods are not limited to this particular grouping of systematics. We can ### Impacts It is often useful/required to check the impacts of the nuisance parameters (NP) on the parameter of interest, r. The impact of a NP is defined as the shift $\Delta r$ induced as the NP, $\theta$, is fixed to its $\pm1\sigma$ values, with all other parameters profiled as normal. More information can be found in the combine documentation via this [link](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts). -Let's calculate the impacts for our analysis. We can use the `combineTool.py` from the `CombineHarvester` package to automate the scripts. +Let's calculate the impacts for our analysis. We can use `combineTool.py` to automate the fits.
The impacts are calculated in a few stages: +Let's calculate the impacts for our analysis. We can use the `combineTool.py` to automate the scripts. The impacts are calculated in a few stages: 1) Do an initial fit for the parameter of interest, adding the `--robustFit 1` option: ```shell From acb65d32f6690f32ea542e4d90a2f90362f33123 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 12:03:17 +0100 Subject: [PATCH 4/7] updated tutorial2020 --- docs/tutorial2020/exercise.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorial2020/exercise.md b/docs/tutorial2020/exercise.md index 3f7bde1fb44..298e6ec7a15 100644 --- a/docs/tutorial2020/exercise.md +++ b/docs/tutorial2020/exercise.md @@ -606,7 +606,7 @@ Start by setting a threshold of 0, i.e. `[channel] autoMCStats 0`, to force the # Part 3: Further investigation of shape-based models ## A: Nuisance parameter impacts -It is often useful to examine in detail the effects the systematic uncertainties have on the signal strength measurement. This is often referred to as calculating the "impact" of each uncertainty. What this means is to determine the shift in the signal strength, with respect to the best-fit, that is induced if a given nuisance parameter is shifted by its $\pm1\sigma$ post-fit uncertainty values. If the signal strength shifts a lot, it tells us that it has a strong dependency on this systematic uncertainty. In fact, what we are measuring here is strongly related to the correlation coefficient between the signal strength and the nuisance parameter. The `MultiDimFit` method has an algorithm for calculating the impact for a given systematic: `--algo impact -P [parameter name]`, but it is typical to use a higher-level script, `combineTool.py` (part of the CombineHarvester package you checked out at the beginning) to automatically run the impacts for all parameters. 
Full documentation on this is given [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts). There is a three step process for running this. We will demonstrate this with a similar analysis to what we were using before, but at a lower mass point (200 GeV) as this region is more sensitive to the background uncertainties. The datacard is provided for you (datacard_part3.txt). Make the workspace first to be able to perform the steps for the impacts. First we perform an initial fit for the signal strength and its uncertainty: +It is often useful to examine in detail the effects the systematic uncertainties have on the signal strength measurement. This is often referred to as calculating the "impact" of each uncertainty. What this means is to determine the shift in the signal strength, with respect to the best-fit, that is induced if a given nuisance parameter is shifted by its $\pm1\sigma$ post-fit uncertainty values. If the signal strength shifts a lot, it tells us that it has a strong dependency on this systematic uncertainty. In fact, what we are measuring here is strongly related to the correlation coefficient between the signal strength and the nuisance parameter. The `MultiDimFit` method has an algorithm for calculating the impact for a given systematic: `--algo impact -P [parameter name]`, but it is typical to use a higher-level script, `combineTool.py`, to automatically run the impacts for all parameters. Full documentation on this is given [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts). There is a three step process for running this. We will demonstrate this with a similar analysis to what we were using before, but at a lower mass point (200 GeV) as this region is more sensitive to the background uncertainties. The datacard is provided for you (datacard_part3.txt). Make the workspace first to be able to perform the steps for the impacts. 
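Making the workspace is a single `text2workspace.py` call (a sketch; the datacard and output names are the ones this part of the exercise uses):

```shell
text2workspace.py datacard_part3.txt -m 200 -o workspace_part3.root
```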
First we perform an initial fit for the signal strength and its uncertainty: ```shell combineTool.py -M Impacts -d workspace_part3.root -m 200 --rMin -1 --rMax 2 --robustFit 1 --doInitialFit From ca14cef0e0742f9430ace7e06f3a88daa51fc6d9 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 12:18:13 +0100 Subject: [PATCH 5/7] updated part3 of docs --- docs/part3/commonstatsmethods.md | 3 ++- docs/part3/nonstandard.md | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/part3/commonstatsmethods.md b/docs/part3/commonstatsmethods.md index 8de660cc981..d343ebeb797 100644 --- a/docs/part3/commonstatsmethods.md +++ b/docs/part3/commonstatsmethods.md @@ -754,7 +754,8 @@ where the former gives the result for the S+B model, while the latter gives the ### Making a plot of the GoF test statistic distribution -If you have also checked out the [combineTool](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool), you can use this to run batch jobs or on the grid (see [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission)) and produce a plot of the results. Once the jobs have completed, you can hadd them together and run (e.g for the saturated model), +You can use `combineTool.py` to run jobs on batch systems or on the grid (see [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission)). +Additionally, you can produce a plot of the results if you have also checked out the [combineTool](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool).
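As a sketch, the submission step for the toys might look like the following (the datacard name and the HTCondor `--job-mode` backend are assumptions for illustration; `-s 1:10:1` splits the toys into one job per seed):

```shell
# observed test statistic, computed locally
combineTool.py -M GoodnessOfFit datacard.txt --algo saturated -m 125 -n .data_run
# toy test statistics: 10 jobs of 50 toys each, submitted to the batch system
combineTool.py -M GoodnessOfFit datacard.txt --algo saturated -m 125 -t 50 -s 1:10:1 -n .toys_run --job-mode condor --task-name gof_toys
```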
Once the jobs have completed, you can hadd them together and run (e.g for the saturated model), ```sh combineTool.py -M CollectGoodnessOfFit --input data_run.root toys_run.root -m 125.0 -o gof.json diff --git a/docs/part3/nonstandard.md b/docs/part3/nonstandard.md index e8f1b6dc275..ecc23a4cd1b 100644 --- a/docs/part3/nonstandard.md +++ b/docs/part3/nonstandard.md @@ -231,7 +231,7 @@ The impact of a nuisance parameter (NP) θ on a parameter of interest (POI) μ i This is effectively a measure of the correlation between the NP and the POI, and is useful for determining which NPs have the largest effect on the POI uncertainty. -It is possible to use the `FitDiagnostics` method of Combine with the option `--algo impact -P parameter` to calculate the impact of a particular nuisance parameter on the parameter(s) of interest. We will use the `combineTool.py` script to automate the fits (see the [`combineTool`](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) section to check out the tool. +It is possible to use the `FitDiagnostics` method of Combine with the option `--algo impact -P parameter` to calculate the impact of a particular nuisance parameter on the parameter(s) of interest. We will use the `combineTool.py` script to automate the fits (see the [`combineTool`](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) section to check out the tool). 
We will use an example workspace from the [$H\rightarrow\tau\tau$ datacard](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/htt/125/htt_tt.txt), From 6849026d3cb823b438dca8c2a30d6bda179c9647 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 12:25:58 +0100 Subject: [PATCH 6/7] updated useful links section --- docs/part4/usefullinks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/part4/usefullinks.md b/docs/part4/usefullinks.md index 43d44bcc89d..adfb0ee91fd 100644 --- a/docs/part4/usefullinks.md +++ b/docs/part4/usefullinks.md @@ -74,7 +74,7 @@ There is no document currently which can be cited for using the Combine reports the fit status in some routines (for example in the `FitDiagnostics` method). These are typically the status of the last call from Minuit. For details on the meanings of these status codes see the [Minuit2Minimizer](https://root.cern.ch/root/html/ROOT__Minuit2__Minuit2Minimizer.html) documentation page. * _Why does my fit not converge?_ - * There are several reasons why some fits may not converge. Often some indication can be obtained from the `RooFitResult` or status that you will see information from when using the `--verbose X` (with $X>2$) option. Sometimes however, it can be that the likelihood for your data is very unusual. You can get a rough idea about what the likelihood looks like as a function of your parameters (POIs and nuisances) using `combineTool.py -M FastScan -w myworkspace.root` (use --help for options). + * There are several reasons why some fits may not converge. Often some indication can be obtained from the `RooFitResult` or from the fit status reported when using the `--verbose X` (with $X>2$) option. Sometimes, however, it can be that the likelihood for your data is very unusual.
You can get a rough idea about what the likelihood looks like as a function of your parameters (POIs and nuisances) using `combineTool.py -M FastScan -w myworkspace.root` (use --help for options, see also [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/debugging/#analyzing-the-nll-shape-in-each-parameter)). * We have often seen that fits in Combine using `RooCBShape` as a parametric function will fail. This is related to an optimization that fails. You can try to fix the problem as described in this issue: [issues#347](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/347) (i.e add the option `--X-rtd ADDNLL_CBNLL=0`). * _Why does the fit/fits take so long?_ * The minimization routines are common to many methods in Combine. You can tune the fits using the generic optimization command line options described [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#generic-minimizer-options). For example, setting the default minimizer strategy to 0 can greatly improve the speed, since this avoids running HESSE. In calculations such as `AsymptoticLimits`, HESSE is not needed and hence this can be done, however, for `FitDiagnostics` the uncertainties and correlations are part of the output, so using strategy 0 may not be particularly accurate. From 0737f762d9697a9263a76b5721c8625faa2601d5 Mon Sep 17 00:00:00 2001 From: Philip Daniel Keicher Date: Thu, 22 Feb 2024 13:10:30 +0100 Subject: [PATCH 7/7] refine formulation for index --- docs/index.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/index.md b/docs/index.md index 5853533fac4..9ee192e8656 100644 --- a/docs/index.md +++ b/docs/index.md @@ -217,8 +217,13 @@ See [contributing.md](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimi # CombineHarvester/CombineTools -CombineTools is an additional tool with useful features for Combine, such as the automated production of datacards among others.
-Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/) subpackage, speeding up the checkout and compile times: +!!! info + Starting with Combine v10, CombineTools functionality for job submission and parallelization (`combineTool.py`), as well as many plotting functions, has been integrated into the Combine package. + For these tasks, you no longer need to follow the instructions below. + + +CombineTools is an additional package with useful features for Combine, used, for example, for automated datacard validation (see [instructions](docs/part3/validation)). +Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/tree/main/CombineTools/) subpackage, speeding up the checkout and compile times: git clone via ssh:
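If you prefer not to use the helper scripts, an equivalent sparse checkout can be done by hand with a recent git (≥ 2.25) — a sketch, assuming the default branch is `main`:

```shell
# clone over ssh without populating the working tree
git clone --no-checkout git@github.com:cms-analysis/CombineHarvester.git CombineHarvester
cd CombineHarvester
# restrict the working tree to the CombineTools subpackage (cone mode)
git sparse-checkout set CombineTools
git checkout main
```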