diff --git a/contributing.md b/contributing.md index fe2e5dab2a6..4f0992a2461 100644 --- a/contributing.md +++ b/contributing.md @@ -1,6 +1,6 @@ # Contributing -Contributions to combine of all sizes, from minor documentation updates to big code improvements, are welcome and encouraged. +Contributions to Combine of all sizes, from minor documentation updates to big code improvements, are welcome and encouraged. To ensure good development of the tool, we try to coordinate contributions. However, we are happy to help overcome any steps that may pose issues for contributors. @@ -76,7 +76,7 @@ We appreciate you putting in some effort and thought to ensure: ### Technical details of the documentation -We use [mkdocs](www.mkdocs.org) to produce the static website that documents combine. +We use [mkdocs](www.mkdocs.org) to produce the static website that documents Combine. The documentation files are all under the `docs/` folder. Which pages get included in the site, and other configuration details are set in the `mkdocs.yml` file. @@ -103,7 +103,7 @@ If you'd like to test the deployment directly, the suggested method is to set up ## Big Contributions -We welcome large contributions to combine. +We welcome large contributions to Combine. Note, however, that we also follow long term planning, and there is a dedicated group stewarding the overall direction and development of the code. This means that the code development should fit in with our long term vision; diff --git a/docs/index.md b/docs/index.md index 7aebbd3da34..da40bb43012 100644 --- a/docs/index.md +++ b/docs/index.md @@ -3,9 +3,9 @@ These pages document the [RooStats](https://twiki.cern.ch/twiki/bin/view/RooStats/WebHome) / [RooFit](https://root.cern.ch/roofit) - based software tool used for -statistical analysis within the CMS experiment - **combine**. Note that while this tool was originally developed in the [Higgs PAG](HiggsWG), its usage is now widespread within CMS. +statistical analysis within the CMS experiment - Combine. Note that while this tool was originally developed in the [Higgs PAG](HiggsWG), its usage is now widespread within CMS. -Combine provides a command-line interface to many different statistical techniques, available inside RooFit/RooStats, that are used widely inside CMS. +Combine provides a command-line interface to many different statistical techniques, available inside RooFit/RooStats, that are used widely inside CMS. The package exists on GitHub under [https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit) @@ -31,7 +31,7 @@ releases on github under The nominal installation method is inside CMSSW. The current release targets the CMSSW `11_3_X` series because this release has both python2 and python3 ROOT -bindings, allowing a more gradual migration of user code to python3. Combine is +bindings, allowing a more gradual migration of user code to python3. Combine is fully python3-compatible and, with some adaptations, can also work in 12_X releases. ```sh @@ -150,12 +150,12 @@ conda activate combine make CONDA=1 -j 8 ``` -Using combine from then on should only require sourcing the conda environment +Using Combine from then on should only require sourcing the conda environment ``` conda activate combine ``` -**Note:** on OS X, `combine` can only accept workspaces, so run `text2workspace.py` first. +**Note:** on OS X, Combine can only accept workspaces, so run `text2workspace.py` first. 
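For example, a minimal two-step workflow on OS X could look like the following sketch (the datacard and workspace names are just placeholders):

```sh
# build the binary workspace from the text datacard first ...
text2workspace.py datacard.txt -o workspace.root
# ... then run combine on the workspace rather than on the text datacard directly
combine -M AsymptoticLimits workspace.root
```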
This is due to an issue with child processes and `LD_LIBRARY_PATH` (see note in Makefile) # What has changed between tags? @@ -185,7 +185,7 @@ See [contributing.md](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimi # CombineHarvester/CombineTools -CombineTools is an additional tool for submitting combine jobs to batch systems or crab, which was originally developed in the context of Higgs to tau tau analyses. Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/) subpackage, speeding up the checkout and compile times: +CombineTools is an additional tool for submitting Combine jobs to batch systems or crab, which was originally developed in the context of Higgs to tau tau analyses. Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/) subpackage, speeding up the checkout and compile times: git clone via ssh: diff --git a/docs/part2/bin-wise-stats.md b/docs/part2/bin-wise-stats.md index 6e92260e9d3..10a02591bab 100644 --- a/docs/part2/bin-wise-stats.md +++ b/docs/part2/bin-wise-stats.md @@ -15,7 +15,7 @@ The following line should be added at the bottom of the datacard, underneath the The first string `channel` should give the name of the channels (bins) in the datacard for which the new histogram classes should be used. The wildcard `*` is supported for selecting multiple channels in one go. The value of `threshold` should be set to a value greater than or equal to zero to enable the creation of automatic bin-wise uncertainties, or `-1` to use the new histogram classes without these uncertainties. A positive value sets the threshold on the effective number of unweighted events above which the uncertainty will be modeled with the Barlow-Beeston-lite approach described above. Below the threshold an individual uncertainty per-process will be created. The algorithm is described in more detail below. -The last two settings are optional. The first of these, `include-signal` has a default value of `0` but can be set to `1` as an alternative. By default, the total nominal yield and uncertainty used to test the threshold excludes signal processes. The reason for this is that typically the initial signal normalization is arbitrary, and could unduly lead to a bin being considered well-populated despite poorly populated background templates. Setting this flag will include the signal processes in the uncertainty analysis. Note that this option only affects the logic for creating a single Barlow-Beeston-lite parameter vs. separate per-process parameters - the uncertainties on all signal processes are always included in the actual model! The second flag changes the way the normalization effect of shape-altering uncertainties is handled. In the default mode (`1`) the normalization is handled separately from the shape morphing via a an asymmetric log-normal term. This is identical to how COMBINE has always handled shape morphing. When set to `2`, the normalization will be adjusted in the shape morphing directly. Unless there is a strong motivation we encourage users to leave this on the default setting. +The last two settings are optional. 
The first of these, `include-signal`, has a default value of `0` but can be set to `1` as an alternative. By default, the total nominal yield and uncertainty used to test the threshold excludes signal processes. The reason for this is that typically the initial signal normalization is arbitrary, and could unduly lead to a bin being considered well-populated despite poorly populated background templates. Setting this flag will include the signal processes in the uncertainty analysis. Note that this option only affects the logic for creating a single Barlow-Beeston-lite parameter vs. separate per-process parameters - the uncertainties on all signal processes are always included in the actual model! The second flag changes the way the normalization effect of shape-altering uncertainties is handled. In the default mode (`1`) the normalization is handled separately from the shape morphing via an asymmetric log-normal term. This is identical to how Combine has always handled shape morphing. When set to `2`, the normalization will be adjusted in the shape morphing directly. Unless there is a strong motivation, we encourage users to leave this on the default setting. ## Description of the algorithm @@ -68,7 +68,7 @@ Bin Contents Error Notes ## Analytic minimisation -One significant advantage of the Barlow-Beeston-lite approach is that the maximum likelihood estimate of each nuisance parameter has a simple analytic form that depends only on $n_{\text{tot}}$, $e_{\text{tot}}$ and the observed number of data events in the relevant bin. Therefore when minimising the negative log-likelihood of the whole model it is possible to remove these parameters from the fit and set them to their best-fit values automatically. For models with large numbers of bins this can reduce the fit time and increase the fit stability. The analytic minimisation is enabled by default starting in combine v8.2.0, you can disable it by adding the option `--X-rtd MINIMIZER_no_analytic` when running COMBINE. +One significant advantage of the Barlow-Beeston-lite approach is that the maximum likelihood estimate of each nuisance parameter has a simple analytic form that depends only on $n_{\text{tot}}$, $e_{\text{tot}}$ and the observed number of data events in the relevant bin. Therefore when minimising the negative log-likelihood of the whole model it is possible to remove these parameters from the fit and set them to their best-fit values automatically. For models with large numbers of bins this can reduce the fit time and increase the fit stability. The analytic minimisation is enabled by default starting in Combine v8.2.0; you can disable it by adding the option `--X-rtd MINIMIZER_no_analytic` when running Combine. The figure below shows a performance comparison of the analytical minimisation versus the number of bins in the likelihood function. The real time (in sections) for a typical minimisation of a binned likelihood is shown as a function of the number of bins when invoking the analytic minimisation of the nuisance parameters versus the default numerical approach. @@ -84,7 +84,7 @@ The figure below shows a performance comparison of the analytical minimisation v Up until recently `text2workspace.py` would only construct the PDF for each channel using a `RooAddPdf`, i.e. each component process is represented by a separate PDF and normalization coefficient.
However, in order to model bin-wise statistical uncertainties, the alternative `RooRealSumPdf` can be more useful, as each process is represented by a RooFit function object instead of a PDF, and we can vary the bin yields directly. As such, a new RooFit histogram class `CMSHistFunc` is introduced, which offers the same vertical template morphing algorithms offered by the current default histogram PDF, `FastVerticalInterpHistPdf2`. Accompanying this is the `CMSHistErrorPropagator` class. This evaluates a sum of `CMSHistFunc` objects, each multiplied by a coefficient. It is also able to scale the summed yield of each bin to account for bin-wise statistical uncertainty nuisance parameters. !!! warning - One disadvantage of this new approach comes when evaluating the expectation for individual processes, for example when using the `--saveShapes` option in the `FitDiagnostics` mode of COMBINE. The Barlow-Beeston-lite parameters scale the sum of the process yields directly, so extra work is needed to distribute this total scaling back to each individual process. To achieve this, an additional class `CMSHistFuncWrapper` has been created that, given a particular `CMSHistFunc`, the `CMSHistErrorPropagator` will distribute an appropriate fraction of the total yield shift to each bin. As a consequence of the extra computation needed to distribute the yield shifts in this way, the evaluation of individual process shapes in `--saveShapes` can take longer then previously. + One disadvantage of this new approach comes when evaluating the expectation for individual processes, for example when using the `--saveShapes` option in the `FitDiagnostics` mode of Combine. The Barlow-Beeston-lite parameters scale the sum of the process yields directly, so extra work is needed to distribute this total scaling back to each individual process. To achieve this, an additional class `CMSHistFuncWrapper` has been created: given a particular `CMSHistFunc`, the `CMSHistErrorPropagator` will distribute an appropriate fraction of the total yield shift to each bin. As a consequence of the extra computation needed to distribute the yield shifts in this way, the evaluation of individual process shapes in `--saveShapes` can take longer than previously. diff --git a/docs/part2/physicsmodels.md b/docs/part2/physicsmodels.md index 1ee3bac4ef0..9cc0dd77875 100644 --- a/docs/part2/physicsmodels.md +++ b/docs/part2/physicsmodels.md @@ -1,6 +1,6 @@ # Physics Models -COMBINE can be run directly on the text-based datacard. However, for more advanced physics models, the internal step to convert the datacard to a binary workspace should be performed by the user. To create a binary workspace starting from a `datacard.txt`, you can run +Combine can be run directly on the text-based datacard. However, for more advanced physics models, the internal step to convert the datacard to a binary workspace should be performed by the user. To create a binary workspace starting from a `datacard.txt`, you can run ```sh text2workspace.py datacard.txt -o workspace.root ``` @@ -110,7 +110,7 @@ Below are some (more generic) example models that also exist in GitHub. ### MultiSignalModel ready made model for multiple signal processes -COMBINE already contains a model **`HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel`** that can be used to assign different signal strengths to multiple processes in a datacard, configurable from the command line.
+Combine already contains a model **`HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel`** that can be used to assign different signal strengths to multiple processes in a datacard, configurable from the command line. The model is configured by passing one or more mappings in the form **`--PO 'map=bin/process:parameter'`** to text2workspace: @@ -126,7 +126,7 @@ The MultiSignalModel will define all parameters as parameters of interest, but t Some examples, taking as reference the toy datacard [test/multiDim/toy-hgg-125.txt](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/test/multiDim/toy-hgg-125.txt): -- Scale both `ggH` and `qqH` with the same signal strength `r` (that is what the default physics model of COMBINE does for all signals; if they all have the same systematic uncertainties, it is also equivalent to adding up their yields and writing them as a single column in the card) +- Scale both `ggH` and `qqH` with the same signal strength `r` (that is what the default physics model of Combine does for all signals; if they all have the same systematic uncertainties, it is also equivalent to adding up their yields and writing them as a single column in the card) ```nohighlight $ text2workspace.py -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/ggH:r[1,0,10]' --PO 'map=.*/qqH:r' toy-hgg-125.txt -o toy-1d.root diff --git a/docs/part2/settinguptheanalysis.md b/docs/part2/settinguptheanalysis.md index 04703091123..7a695f23e91 100644 --- a/docs/part2/settinguptheanalysis.md +++ b/docs/part2/settinguptheanalysis.md @@ -1,19 +1,19 @@ # Preparing the datacard -The input to COMBINE, which defines the details of the analysis, is a plain ASCII file we will refer to as datacard. This is true whether the analysis is a simple counting experiment or a shape analysis. +The input to Combine, which defines the details of the analysis, is a plain ASCII file we will refer to as datacard. This is true whether the analysis is a simple counting experiment or a shape analysis. ## A simple counting experiment The file [data/tutorials/counting/realistic-counting-experiment.txt](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/counting/realistic-counting-experiment.txt) shows an example of a counting experiment. -The first lines can be used to add some descriptive information. Those lines must start with a "#", and they are not parsed by COMBINE: +The first lines can be used to add some descriptive information. Those lines must start with a "#", and they are not parsed by Combine: ```nohighlight # Simple counting experiment, with one signal and a few background processes # Simplified version of the 35/pb H->WW analysis for mH = 160 GeV ``` -Following this, one declares the **number of observables**, **`imax`**, that are present in the model used to set limits / extract confidence intervals. The number of observables will typically be the number of channels in a counting experiment. The value **`*`** can be specified for **`imax`**, which tells COMBINE to determine the number of observables from the rest of the datacard. In order to better catch mistakes, it is recommended to explicitly specify the value. +Following this, one declares the **number of observables**, **`imax`**, that are present in the model used to set limits / extract confidence intervals. The number of observables will typically be the number of channels in a counting experiment. 
The value **`*`** can be specified for **`imax`**, which tells Combine to determine the number of observables from the rest of the datacard. In order to better catch mistakes, it is recommended to explicitly specify the value. ```nohighlight imax 1 number of channels @@ -95,7 +95,7 @@ The expected shape can be parametric, or not. In the first case the parametric P As with the counting experiment, the total nominal *rate* of a given process must be identified in the **rate** line of the datacard. However, there are special options for shape-based analyses, as follows: - * A value of **-1** in the rate line means COMBINE will calculate the rate from the input TH1 (via TH1::Integral) or RooDataSet/RooDataHist (via RooAbsData::sumEntries). + * A value of **-1** in the rate line means Combine will calculate the rate from the input TH1 (via TH1::Integral) or RooDataSet/RooDataHist (via RooAbsData::sumEntries). * For parametric shapes (RooAbsPdf), if a parameter with the name pdfname**_norm** is found in the input workspace, the rate will be multiplied by the value of that parameter. Note that since this parameter can be freely floating, the normalization of a process can be set freely float this way. This can also be achieved through the use of [`rateParams`](#rate-parameters). ### Binned shape analyses @@ -106,7 +106,7 @@ For each channel, histograms have to be provided for the observed shape and for - The normalization of the data histogram must correspond to the number of observed events. - The normalization of the expected histograms must match the expected event yields. -The COMBINE tool can take as input histograms saved as TH1, as RooAbsHist in a RooFit workspace (an example of how to create a RooFit workspace and save histograms is available in [github](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/benchmarks/shapes/make_simple_shapes.cxx)), or from a pandas dataframe ([example](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/shapes/simple-shapes-df.txt)). +The Combine tool can take as input histograms saved as TH1, as RooAbsHist in a RooFit workspace (an example of how to create a RooFit workspace and save histograms is available in [github](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/benchmarks/shapes/make_simple_shapes.cxx)), or from a pandas dataframe ([example](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/shapes/simple-shapes-df.txt)). The block of lines defining the mapping (first block in the datacard) contains one or more rows of the form @@ -206,7 +206,7 @@ or a postifx can be added to the histogram name: `shapes * * shapes.root $CHANNEL/$PROCESS $CHANNEL/$PROCESS_$SYSTEMATIC` !!! warning - If you have a nuisance parameter that has shape effects on some processes (using `shape`) *and* rate effects on other processes (using `lnN`) you should use a single line for the systematic uncertainty with `shape?`. This will tell COMBINE to fist look for Up/Down systematic templates for that process and if it doesnt find them, it will interpret the number that you put for the process as a `lnN` instead. + If you have a nuisance parameter that has shape effects on some processes (using `shape`) *and* rate effects on other processes (using `lnN`) you should use a single line for the systematic uncertainty with `shape?`. 
This will tell Combine to first look for Up/Down systematic templates for that process and, if it doesn't find them, it will interpret the number that you put for the process as a `lnN` instead. For a detailed example of a template-based binned analysis, see the [H→ττ 2014 DAS tutorial](https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideCMSDataAnalysisSchool2014HiggsCombPropertiesExercise#A_shape_analysis_using_templates) @@ -223,7 +223,7 @@ In the datacard using templates, the column after the file name would have been The first part identifies the name of the input [RooWorkspace](http://root.cern.ch/root/htmldoc/RooWorkspace.html) containing the PDF, and the second part the name of the [RooAbsPdf](http://root.cern.ch/root/htmldoc/RooAbsPdf.html) inside it (or, for the observed data, the [RooAbsData](http://root.cern.ch/root/htmldoc/RooAbsData.html)). It is possible to have multiple input workspaces, just as there can be multiple input ROOT files. You can use any of the usual RooFit pre-defined PDFs for your signal and background models. !!! warning - If in your model you are using RooAddPdfs, in which the coefficients are *not defined recursively*, COMBINE will not interpret them correctly. You can add the option `--X-rtd ADDNLL_RECURSIVE=0` to any COMBINE command in order to recover the correct interpretation, however we recommend that you instead re-define your PDF so that the coefficients are recursive (as described in the [RooAddPdf documentation](https://root.cern.ch/doc/master/classRooAddPdf.html)) and keep the total normalization (i.e the extended term) as a separate object, as in the case of the tutorial datacard. + If in your model you are using RooAddPdfs, in which the coefficients are *not defined recursively*, Combine will not interpret them correctly. You can add the option `--X-rtd ADDNLL_RECURSIVE=0` to any Combine command in order to recover the correct interpretation, however we recommend that you instead re-define your PDF so that the coefficients are recursive (as described in the [RooAddPdf documentation](https://root.cern.ch/doc/master/classRooAddPdf.html)) and keep the total normalization (i.e. the extended term) as a separate object, as in the case of the tutorial datacard. For example, take a look at the [data/tutorials/shapes/simple-shapes-parametric.txt](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/shapes/simple-shapes-parametric.txt). We see the following line: @@ -256,11 +256,11 @@ RooDataSet::data_obs(j) ``` -In this datacard, the signal is parameterized in terms of the hypothesized mass (`MH`). COMBINE will use this variable, instead of creating its own, which will be interpreted as the value for `-m`. For this reason, we should add the option `-m 30` (or something else within the observable range) when running COMBINE. You will also see there is a variable named `bkg_norm`. This is used to normalize the background rate (see the section on [Rate parameters](#rate-parameters) below for details). +In this datacard, the signal is parameterized in terms of the hypothesized mass (`MH`). Combine will use this variable, instead of creating its own, which will be interpreted as the value for `-m`. For this reason, we should add the option `-m 30` (or something else within the observable range) when running Combine. You will also see there is a variable named `bkg_norm`. This is used to normalize the background rate (see the section on [Rate parameters](#rate-parameters) below for details). !!!
warning - COMBINE will not accept RooExtendedPdfs as input. This is to alleviate a bug that lead to improper treatment of the normalization when using multiple RooExtendedPdfs to describe a single process. You should instead use RooAbsPdfs and provide the rate as a separate object (see the [Rate parameters](#rate-parameters) section). + Combine will not accept RooExtendedPdfs as input. This is to alleviate a bug that led to improper treatment of the normalization when using multiple RooExtendedPdfs to describe a single process. You should instead use RooAbsPdfs and provide the rate as a separate object (see the [Rate parameters](#rate-parameters) section). The part of the datacard related to the systematics can include lines with the syntax @@ -274,16 +274,16 @@ In the [data/tutorials/shapes/simple-shapes-parametric.txt](https://github.com/c sigma param 1.0 0.1 ``` -meaning there is a parameter in the input workspace called **`sigma`**, that should be *constrained* with a Gaussian centered at 1.0 with a width of 0.1. Note that the exact interpretation of these parameters is left to the user since the signal PDF is constructed externally by you. All COMBINE knows is that 1.0 should be the most likely value and 0.1 is its 1σ uncertainy. Asymmetric uncertainties are written using the syntax **-1σ/+1σ** in the datacard, as is the case for `lnN` uncertainties. +meaning there is a parameter in the input workspace called **`sigma`**, that should be *constrained* with a Gaussian centered at 1.0 with a width of 0.1. Note that the exact interpretation of these parameters is left to the user since the signal PDF is constructed externally by you. All Combine knows is that 1.0 should be the most likely value and 0.1 is its 1σ uncertainty. Asymmetric uncertainties are written using the syntax **-1σ/+1σ** in the datacard, as is the case for `lnN` uncertainties. If one wants to specify a parameter that is freely floating across its given range, and not Gaussian constrained, the following syntax is used: - **name *flatParam* ** -Though this is *not strictly necessary* in frequentist methods using profiled likelihoods, as COMBINE will still profile these nuisances when performing fits (as is the case for the `simple-shapes-parametric.txt` datacard). +Though this is *not strictly necessary* in frequentist methods using profiled likelihoods, as Combine will still profile these nuisances when performing fits (as is the case for the `simple-shapes-parametric.txt` datacard). !!! warning - All parameters that are floating or constant in the user's input workspaces will remain floating or constant. COMBINE will ***not*** modify those for you! + All parameters that are floating or constant in the user's input workspaces will remain floating or constant. Combine will ***not*** modify those for you! A full example of a parametric analysis can be found in this [H→γγ 2014 DAS tutorial](https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideCMSDataAnalysisSchool2014HiggsCombPropertiesExercise#A_parametric_shape_analysis_H). @@ -332,7 +332,7 @@ The overall rate "expected" of a particular process in a particular bin does not name rateParam bin process initial_value [min,max] ``` -The `[min,max]` argument is optional. If it is not included, COMBINE will remove the range of this parameter. This will produce a new parameter, which multiplies the rate of that particular **process** in the given **bin** by its value, in the model (unless it already exists). +The `[min,max]` argument is optional.
If it is not included, Combine will remove the range of this parameter. This will produce a new parameter in the model (unless it already exists), which multiplies the rate of that particular **process** in the given **bin** by its value. You can attach the same `rateParam` to multiple processes/bins by either using a wild card (eg `*` will match everything, `QCD_*` will match everything starting with `QCD_`, etc.) in the name of the bin and/or process, or by repeating the `rateParam` line in the datacard for different bins/processes with the same name. @@ -350,7 +350,7 @@ name rateParam bin process formula args where `args` is a comma-separated list of the arguments for the string `formula`. You can include other nuisance parameters in the `formula`, including ones that are Gaussian constrained (i,e via the `param` directive.) -Below is an example datacard that uses the `rateParam` directive to implement an ABCD-like method in COMBINE. For a more realistic description of its use for ABCD, see the single-lepton SUSY search implementation described [here](http://cms.cern.ch/iCMS/jsp/openfile.jsp?tp=draft&files=AN2015_207_v5.pdf). +Below is an example datacard that uses the `rateParam` directive to implement an ABCD-like method in Combine. For a more realistic description of its use for ABCD, see the single-lepton SUSY search implementation described [here](http://cms.cern.ch/iCMS/jsp/openfile.jsp?tp=draft&files=AN2015_207_v5.pdf). ```nohighlight imax 4 number of channels @@ -457,7 +457,7 @@ The syntax is simple: **`combineCards.py Name1=card1.txt Name2=card2.txt .... > If the input datacards had just one bin each, the output channels will be called `Name1`, `Name2`, and so on. Otherwise, a prefix `Name1_` ... `Name2_` will be added to the bin labels in each datacard. The supplied bin names `Name1`, `Name2`, etc. must themselves conform to valid C++/python identifier syntax. !!! warning - When combining datacards, you should keep in mind that systematic uncertainties that have different names will be assumed to be uncorrelated, and those with the same name will be assumed 100% correlated. An uncertainty correlated across channels must have the same PDF. in all cards (i.e. always **`lnN`**, or all **`gmN`** with same `N`. Note that `shape` and `lnN` can be interchanged via the `shape?` directive). Furthermore, when using *parametric models*, "parameter" objects such as `RooRealVar`, `RooAbsReal`, and `RooAbsCategory` (parameters, PDF indices etc) with the same name will be assumed to be the same object. If this is not intended, you may encounter unexpected behaviour, such as the order of combining cards having an impact on the results. Make sure that such objects are named differently in your inputs if they represent different things! Instead, COMBINE will try to rename other "shape" objects (such as PDFs) automatically. + When combining datacards, you should keep in mind that systematic uncertainties that have different names will be assumed to be uncorrelated, and those with the same name will be assumed 100% correlated. An uncertainty correlated across channels must have the same PDF in all cards (i.e. always **`lnN`**, or all **`gmN`** with same `N`. Note that `shape` and `lnN` can be interchanged via the `shape?` directive). Furthermore, when using *parametric models*, "parameter" objects such as `RooRealVar`, `RooAbsReal`, and `RooAbsCategory` (parameters, PDF indices etc) with the same name will be assumed to be the same object.
If this is not intended, you may encounter unexpected behaviour, such as the order of combining cards having an impact on the results. Make sure that such objects are named differently in your inputs if they represent different things! Instead, Combine will try to rename other "shape" objects (such as PDFs) automatically. The `combineCards.py` script will fail if you are trying to combine a *shape* datacard with a *counting* datacard. You can however convert a *counting* datacard into an equivalent shape-based one by adding a line `shapes * * FAKE` in the datacard after the `imax`, `jmax`, and `kmax` section. Alternatively, you can add the option `-S` to `combineCards.py`, which will do this for you while creating the combined datacard. diff --git a/docs/part3/commonstatsmethods.md b/docs/part3/commonstatsmethods.md index 3c237e41514..e1087812115 100644 --- a/docs/part3/commonstatsmethods.md +++ b/docs/part3/commonstatsmethods.md @@ -1,6 +1,6 @@ # Common Statistical Methods -In this section, the most commonly used statistical methods from combine will be covered, including specific instructions on how to obtain limits, significances, and likelihood scans. For all of these methods, the assumed parameter of interest (POI) is the overall signal strength **r** (i.e the default PhysicsModel). In general however, the first POI in the list of POIs (as defined by the PhysicsModel) will be taken instead of **r**. This may or may not make sense for any particular method, so care must be taken. +In this section, the most commonly used statistical methods from Combine will be covered, including specific instructions on how to obtain limits, significances, and likelihood scans. For all of these methods, the assumed parameter of interest (POI) is the overall signal strength **r** (i.e the default PhysicsModel). In general however, the first POI in the list of POIs (as defined by the PhysicsModel) will be taken instead of **r**. This may or may not make sense for any particular method, so care must be taken. This section will assume that you are using the default physics model, unless otherwise specified. @@ -11,7 +11,7 @@ The limit calculation relies on an asymptotic approximation of the distributions * The test statistic is defined using the ratio of likelihoods $q_{r} = -2\ln[\mathcal{L}(\mathrm{data}|r,\hat{\theta}_{r})/\mathcal{L}(\mathrm{data}|r=\hat{r},\hat{\theta})]$ , in which the nuisance parameters are profiled separately for $r=\hat{r}$ and $r$. The value of $q_{r}$ is set to 0 when $\hat{r}>r$, giving a one-sided limit. Furthermore, the constraint $r>0$ is enforced in the fit. This means that if the unconstrained value of $\hat{r}$ would be negative, the test statistic $q_{r}$ is evaluated as $-2\ln[\mathcal{L}(\mathrm{data}|r,\hat{\theta}_{r})/\mathcal{L}(\mathrm{data}|r=0,\hat{\theta}_{0})]$ -This method is the default combine method: if you call Combine without specifying `-M`, the `AsymptoticLimits` method will be run. +This method is the default Combine method: if you call Combine without specifying `-M`, the `AsymptoticLimits` method will be run. A realistic example of a datacard for a counting experiment can be found in the HiggsCombination package: [data/tutorials/counting/realistic-counting-experiment.txt](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/counting/realistic-counting-experiment.txt) @@ -43,7 +43,7 @@ By default, the limits are calculated using the CLs prescription, as !!! 
warning - You may find that combine issues a warning that the best fit for the background-only Asimov dataset returns a nonzero value for the signal strength; + You may find that Combine issues a warning that the best fit for the background-only Asimov dataset returns a nonzero value for the signal strength; `WARNING: Best fit of asimov dataset is at r = 0.220944 (0.011047 times` `rMax), while it should be at zero` @@ -253,7 +253,7 @@ Running the script on the output file produced for the same datacard (including 0.950975 % (0.95 %) interval (target) = 0 < r < 2.2 -along with a plot of the posterior distribution shown below. This is the same as the output from combine, but the script can also be used to find lower limits (for example) or credible intervals. +along with a plot of the posterior distribution shown below. This is the same as the output from Combine, but the script can also be used to find lower limits (for example) or credible intervals. ![](images/bayes1D.png) @@ -289,7 +289,7 @@ The expected limit is computed by generating many toy MC data sets and computing The program will print out the mean and median limit, as well as the 68% and 95% quantiles of the distributions of the limits. This time, the output ROOT tree will contain **one entry per toy**. -For more heavy methods (eg the `MarkovChainMC`) you will probably want to split this calculation into multiple jobs. To do this, just run `combine` multiple times specifying a smaller number of toys (as low as `1`), using a different seed to initialize the random number generator each time. The option `-s` can be used for this; if you set it to -1, the starting seed will be initialized randomly at the beginning of the job. Finally, you can merge the resulting trees with `hadd` and look at the distribution in the merged file. +For more heavy methods (eg the `MarkovChainMC`) you will probably want to split this calculation into multiple jobs. To do this, just run Combine multiple times specifying a smaller number of toys (as low as `1`), using a different seed to initialize the random number generator each time. The option `-s` can be used for this; if you set it to -1, the starting seed will be initialized randomly at the beginning of the job. Finally, you can merge the resulting trees with `hadd` and look at the distribution in the merged file. ### Multidimensional bayesian credible regions @@ -346,7 +346,7 @@ While the above shortcuts are the commonly used versions, variations can be test If you have unconstrained parameters in your model (`rateParam`, or if you are using a `_norm` variable for a PDF) and you want to use the "Hybrid-Bayesian" method, you **must** declare these as `flatParam` in your datacard. When running text2workspace you must add the option `--X-assign-flatParam-prior` in the command line. This will create uniform priors for these parameters. These are needed for this method and they would otherwise not get created. !!! info - Note that (observed and expected) values of the test statistic stored in the instances of `RooStats::HypoTestResult` when the option `--saveHybridResult` is passed are defined without the factor 2. They are therefore twice as small as the values given by the formulas above. This factor is however included automatically by all plotting scripts supplied within the Combine package. If you use your own plotting scripts, you need to make sure to incorporate the factor 2. 
+ Note that (observed and expected) values of the test statistic stored in the instances of `RooStats::HypoTestResult` when the option `--saveHybridResult` is passed are defined without the factor 2. They are therefore twice as small as the values given by the formulas above. This factor is however included automatically by all plotting scripts supplied within the Combine package. If you use your own plotting scripts, you need to make sure to incorporate the factor 2. ### Simple models @@ -547,7 +547,7 @@ The search for the limit is performed using an adaptive algorithm, terminating w - `clsAcc`: this determines the absolute accuracy up to which the CLs values are computed when searching for the limit. The default is 0.5%. Raising the accuracy above this value will significantly increase the time needed to run the algorithm, as you need N2 more toys to improve the accuracy by a factor N. You can consider increasing this value if you are computing limits with a larger CL (e.g. 90% or 68%). Note that if you are using the `CLsplusb` rule, this parameter will control the uncertainty on $p_{\mu}$ rather than CLs. - `T` or `toysH`: controls the minimum number of toys that are generated for each point. The default value of 500 should be sufficient when computing the limit at 90-95% CL. You can decrease this number if you are computing limits at 68% CL, or increase it if you are using 99% CL. -Note, to further improve the accuracy when searching for the upper limit, COMBINE will also fit an exponential function to several of the points and interpolate to find the crossing. +Note, to further improve the accuracy when searching for the upper limit, Combine will also fit an exponential function to several of the points and interpolate to find the crossing. ### Complex models @@ -690,9 +690,9 @@ The output limit tree will contain the value of the test statistic in each toy ( For analyses that employ a simultaneous fit across signal and control regions, it may be useful to mask one or more analysis regions, either when the likelihood is maximized (fit) or when the test statistic is computed. This can be done by using the options `--setParametersForFit` and `--setParametersForEval`, respectively. The former will set parameters *before* each fit, while the latter is used to set parameters *after* each fit, but before the NLL is evaluated. Note, of course, that if the parameter in the list is floating, it will still be floating in each fit. Therefore, it will not affect the results when using `--setParametersForFit`. -A realistic example for a binned shape analysis performed in one signal region and two control samples can be found in this directory of the combine package [Datacards-shape-analysis-multiple-regions](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/tree/81x-root606/data/tutorials/rate_params). +A realistic example for a binned shape analysis performed in one signal region and two control samples can be found in this directory of the Combine package [Datacards-shape-analysis-multiple-regions](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/tree/81x-root606/data/tutorials/rate_params). 
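As a rough illustration of the difference between the two options (the workspace and mask-parameter names here are placeholders; the walkthrough that follows builds a real masked workspace step by step):

```sh
# mask the signal region only when the test statistic is evaluated
combine combined_card.root -M Significance -m 125 --setParametersForEval mask_signal=1

# mask the signal region already in the fits themselves
combine combined_card.root -M Significance -m 125 --setParametersForFit mask_signal=1
```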
-First of all, one needs to combine the individual datacards to build a single model, and to introduce the channel masking variables as follow: +First of all, one needs to combine the individual datacards to build a single model, and to introduce the channel masking variables as follows: ```sh combineCards.py signal_region.txt dimuon_control_region.txt singlemuon_control_region.txt > combined_card.txt @@ -774,7 +774,7 @@ In the default model built from the datacards, the signal strengths in all chann When run with a verbosity of 1, as is the default, the program also prints out the best fit signal strengths in all channels. As the fit to all channels is done simultaneously, the correlation between the other systematic uncertainties is taken into account. Therefore, these results can differ from the ones obtained when fitting each channel separately. -Below is an example output from COMBINE, +Below is an example output from Combine, ```nohighlight $ combine -M ChannelCompatibilityCheck comb_hww.txt -m 160 -n HWW @@ -903,7 +903,7 @@ best_fit->SetMarkerSize(3); best_fit->SetMarkerStyle(34); best_fit->Draw("p same ![](images/nll2D.png) -To make the full profiled scan, just remove the `--fastScan` option from the combine command. +To make the full profiled scan, just remove the `--fastScan` option from the Combine command. Similarly, 1D scans can be drawn directly from the tree, however for 1D likelihood scans, there is a python script from the [`CombineHarvester/CombineTools`](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) package [plot1DScan.py](https://github.com/cms-analysis/CombineHarvester/blob/113x/CombineTools/scripts/plot1DScan.py) that can be used to make plots and extract the crossings of the `2*deltaNLL` - e.g the 1σ/2σ boundaries. @@ -918,7 +918,7 @@ A number of common, useful options (especially for computing likelihood scans wi * `--squareDistPoiStep`: POI step size based on distance from the midpoint ( either (max-min)/2 or the best fit if used with `--autoRange` or `--centeredRange` ) rather than linear separation. * `--skipInitialFit`: Skip the initial fit (saves time if, for example, a snapshot is loaded from a previous fit) -Below is a comparison in a likelihood scan, with 20 points, as a function of **`r_qqH`** with our `toy-hgg-125.root` workspace with and without some of these options. The options added tell COMBINE to scan more points closer to the minimum (best-fit) than with the default. +Below is a comparison in a likelihood scan, with 20 points, as a function of **`r_qqH`** with our `toy-hgg-125.root` workspace with and without some of these options. The options added tell Combine to scan more points closer to the minimum (best-fit) than with the default. ![](images/r_qqH.png) @@ -988,7 +988,7 @@ The Feldman-Cousins (FC) procedure for computing confidence intervals for a gene With a critical value $\alpha$. -In COMBINE, you can perform this test on each individual point (**param1, param2,...**) = (**value1,value2,...**) by doing, +In Combine, you can perform this test on each individual point (**param1, param2,...**) = (**value1,value2,...**) by doing, ```sh combine workspace.root -M HybridNew --LHCmode LHC-feldman-cousins --clsAcc 0 --singlePoint param1=value1,param2=value2,param3=value3,...
--saveHybridResult [Other options for toys, iterations etc as with limits] @@ -997,7 +997,7 @@ combine workspace.root -M HybridNew --LHCmode LHC-feldman-cousins --clsAcc 0 --s The point belongs to your confidence region if $p_{x}$ is larger than $\alpha$ (e.g. 0.3173 for a 1σ region, $1-\alpha=0.6827$). !!! warning - You should not use this method without the option `--singlePoint`. Although COMBINE will not complain, the algorithm to find the crossing will only find a single crossing and therefore not find the correct interval. Instead you should calculate the Feldman-Cousins intervals as described above. + You should not use this method without the option `--singlePoint`. Although Combine will not complain, the algorithm to find the crossing will only find a single crossing and therefore not find the correct interval. Instead you should calculate the Feldman-Cousins intervals as described above. ### Physical boundaries @@ -1018,7 +1018,7 @@ $q(x) = - 2 \ln \mathcal{L}(\mathrm{data}|x,\hat{\theta}_{x})/\mathcal{L}(\mathr This can sometimes be an issue as Minuit may not know if has successfully converged when the minimum lies outside of that range. If there is no upper/lower boundary, just set that value to something far from the region of interest. !!! info - One can also imagine imposing the boundaries by first allowing Minuit to find the minimum in the *unrestricted* region and then setting the test statistic to that in the case that minimum lies outside the physical boundary. This would avoid potential issues of convergence. If you are interested in implementing this version in combine, please contact the development team. + One can also imagine imposing the boundaries by first allowing Minuit to find the minimum in the *unrestricted* region and then setting the test statistic to that in the case that minimum lies outside the physical boundary. This would avoid potential issues of convergence. If you are interested in implementing this version in Combine, please contact the development team. ### Extracting contours from results files @@ -1039,7 +1039,7 @@ You can produce a plot of the value of $p_{x}$ vs the parameter of interest $x$ #### Extracting 2D contours -There is a tool for extracting *2D contours* from the output of `HybridNew` located in `test/makeFCcontour.py`. This can be used provided the option `--saveHybridResult` was included when running `HybridNew`. It can be run with the usual combine output files (or several of them) as input, +There is a tool for extracting *2D contours* from the output of `HybridNew` located in `test/makeFCcontour.py`. This can be used provided the option `--saveHybridResult` was included when running `HybridNew`. It can be run with the usual Combine output files (or several of them) as input, ```sh ./test/makeFCcontour.py toysfile1.root toysfile2.root .... [options] -out outputfile.root diff --git a/docs/part3/debugging.md b/docs/part3/debugging.md index 34f6f68dd9b..cecb0178fd2 100644 --- a/docs/part3/debugging.md +++ b/docs/part3/debugging.md @@ -1,6 +1,6 @@ # Debugging fits -When a fit fails there are several things you can do to investigate. CMS users can have a look at [these slides](https://indico.cern.ch/event/976099/contributions/4138476/attachments/2163625/3651175/CombineTutorial-2020-debugging.pdf) from a previous COMBINE tutorial. +When a fit fails there are several things you can do to investigate. 
CMS users can have a look at [these slides](https://indico.cern.ch/event/976099/contributions/4138476/attachments/2163625/3651175/CombineTutorial-2020-debugging.pdf) from a previous Combine tutorial. This section contains a few pointers for some of the methods mentioned in the slides. ## Analyzing the NLL shape in each parameter diff --git a/docs/part3/nonstandard.md b/docs/part3/nonstandard.md index b44392fb7a7..fcc66ebe89e 100644 --- a/docs/part3/nonstandard.md +++ b/docs/part3/nonstandard.md @@ -1,6 +1,6 @@ # Advanced Use Cases -This section will cover some of the more specific use cases for COMBINE that are not necessarily related to the main results of the analysis. +This section will cover some of the more specific use cases for Combine that are not necessarily related to the main results of the analysis. ## Fit Diagnostics @@ -131,7 +131,7 @@ The normalization values for each toy will be stored in the branches inside the Additionally, there will be branches that provide the value of the expected **bin** content for each process, in each channel. These are named **n\_exp[\_final]\_binxxx\_proc\_yyy_i** (where **\_final** will only be in the name if there are systematic uncertainties affecting this process) for channel `xxx`, process `yyy`, bin number `i`. In the case of the post-fit trees (`tree_fit_s/b`), these will be the expectations from the _fitted_ models, while for the pre-fit tree, they will be the expectation from the generated model (i.e if running toys with `-t N` and using `--genNuisances`, they will be randomized for each toy). These can be useful, for example, for calculating correlations/covariances between different bins, in different channels or processes, within the model from toys. !!! info - Be aware that for *unbinned* models, a binning scheme is adopted based on the `RooRealVar::getBinning` for the observable defining the shape, if it exists, or COMBINE will adopt some appropriate binning for each observable. + Be aware that for *unbinned* models, a binning scheme is adopted based on the `RooRealVar::getBinning` for the observable defining the shape, if it exists, or Combine will adopt some appropriate binning for each observable. ### Plotting @@ -164,7 +164,7 @@ The distributions and normalizations are guaranteed to give the correct interpre - For shape datacards whose inputs are `TH1`, the histograms/data points will have the bin number as the x-axis and the content of each bin will be a number of events. -- For datacards whose inputs are `RooAbsPdf`/`RooDataHist`s, the x-axis will correspond to the observable and the bin content will be the PDF density / events divided by the bin width. This means the absolute number of events in a given bin, i, can be obtained from `h.GetBinContent(i)*h.GetBinWidth(i)` or similar for the data graphs. **Note** that for *unbinned* analyses COMBINE will make a reasonable guess as to an appropriate binning. +- For datacards whose inputs are `RooAbsPdf`/`RooDataHist`s, the x-axis will correspond to the observable and the bin content will be the PDF density / events divided by the bin width. This means the absolute number of events in a given bin, i, can be obtained from `h.GetBinContent(i)*h.GetBinWidth(i)` or similar for the data graphs. **Note** that for *unbinned* analyses Combine will make a reasonable guess as to an appropriate binning. Uncertainties on the shapes will be added with the option `--saveWithUncertainties`. 
These uncertainties are generated by re-sampling of the fit covariance matrix, thereby accounting for the full correlation between the parameters of the fit. @@ -231,7 +231,7 @@ The impact of a nuisance parameter (NP) θ on a parameter of interest (POI) μ i This is effectively a measure of the correlation between the NP and the POI, and is useful for determining which NPs have the largest effect on the POI uncertainty. -It is possible to use the `FitDiagnostics` method of COMBINE with the option `--algo impact -P parameter` to calculate the impact of a particular nuisance parameter on the parameter(s) of interest. We will use the `combineTool.py` script to automate the fits (see the [`combineTool`](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) section to check out the tool. +It is possible to use the `FitDiagnostics` method of Combine with the option `--algo impact -P parameter` to calculate the impact of a particular nuisance parameter on the parameter(s) of interest. We will use the `combineTool.py` script to automate the fits (see the [`combineTool`](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) section to check out the tool. We will use an example workspace from the [$H\rightarrow\tau\tau$ datacard](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/htt/125/htt_tt.txt), @@ -240,7 +240,7 @@ $ cp HiggsAnalysis/CombinedLimit/data/tutorials/htt/125/htt_tt.txt . $ text2workspace.py htt_tt.txt -m 125 ``` -Calculating the impacts is done in a few stages. First we just fit for each POI, using the `--doInitialFit` option with `combineTool.py`, and adding the `--robustFit 1` option that will be passed through to COMBINE, +Calculating the impacts is done in a few stages. First we just fit for each POI, using the `--doInitialFit` option with `combineTool.py`, and adding the `--robustFit 1` option that will be passed through to Combine, combineTool.py -M Impacts -d htt_tt.root -m 125 --doInitialFit --robustFit 1 @@ -250,7 +250,7 @@ Next we perform a similar scan for each nuisance parameter with the `--doFits` o combineTool.py -M Impacts -d htt_tt.root -m 125 --robustFit 1 --doFits -Note that this will run approximately 60 scans, and to speed things up the option `--parallel X` can be given to run X COMBINE jobs simultaneously. The batch and grid submission methods described in the [combineTool for job submission](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission) section can also be used. +Note that this will run approximately 60 scans, and to speed things up the option `--parallel X` can be given to run X Combine jobs simultaneously. The batch and grid submission methods described in the [combineTool for job submission](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission) section can also be used. Once all jobs are completed, the output can be collected and written into a json file: @@ -267,7 +267,7 @@ The first page of the output is shown below. The direction of the +1σ and -1σ impacts (i.e. when the NP is moved to its +1σ or -1σ values) on the POI indicates whether the parameter is correlated or anti-correlated with it. -For models with multiple POIs, the COMBINE option `--redefineSignalPOIs X,Y,Z...` should be specified in all three of the `combineTool.py -M Impacts [...]` steps above. 
The final step will produce the `impacts.json` file which will contain the impacts for all the specified POIs. In the `plotImpacts.py` script, a particular POI can be specified with `--POI X`. +For models with multiple POIs, the Combine option `--redefineSignalPOIs X,Y,Z...` should be specified in all three of the `combineTool.py -M Impacts [...]` steps above. The final step will produce the `impacts.json` file which will contain the impacts for all the specified POIs. In the `plotImpacts.py` script, a particular POI can be specified with `--POI X`. !!! warning The plot also shows the *best fit* value of the POI at the top and its uncertainty. You may wish to allow the range to go negative (i.e using `--setParameterRanges` or `--rMin`) to avoid getting one-sided impacts! @@ -289,7 +289,7 @@ The left panel in the summary plot shows the value of $(\theta-\theta_{0})/\Delt ## Breakdown of uncertainties -Often you will want to report the breakdown of your total (systematic) uncertainty on a measured parameter due to one or more groups of nuisance parameters. For example, these groups could be theory uncertainties, trigger uncertainties, ... The prodecude to do this in COMBINE is to sequentially freeze groups of nuisance parameters and subtract (in quadrature) from the total uncertainty. Below are the steps to do so. We will use the `data/tutorials/htt/125/htt_tt.txt` datacard for this. +Often you will want to report the breakdown of your total (systematic) uncertainty on a measured parameter due to one or more groups of nuisance parameters. For example, these groups could be theory uncertainties, trigger uncertainties, ... The procedure to do this in Combine is to sequentially freeze groups of nuisance parameters and subtract (in quadrature) from the total uncertainty. Below are the steps to do so. We will use the `data/tutorials/htt/125/htt_tt.txt` datacard for this. 1. Add groups to the datacard to group nuisance parameters. Nuisance parameters not in groups will be considered as "rest" in the later steps. The lines should look like the following and you should add them to the end of the datacard ``` @@ -339,7 +339,7 @@ The plot below is produced, ## Channel Masking -The COMBINE tool has a number of features for diagnostics and plotting results of fits. It can often be useful to turn off particular channels in a combined analysis to see how constraints/shifts in parameter values can vary. It can also be helpful to plot the post-fit shapes and uncertainties of a particular channel (for example a signal region) *without* including the constraints from the data in that region. +The Combine tool has a number of features for diagnostics and plotting results of fits. It can often be useful to turn off particular channels in a combined analysis to see how constraints/shifts in parameter values can vary. It can also be helpful to plot the post-fit shapes and uncertainties of a particular channel (for example a signal region) *without* including the constraints from the data in that region. This can in some cases be achieved by removing a specific datacard when running `combineCards.py`. However, when doing so, the information of particular nuisance parameters and PDFs in that region will be lost. Instead, it is possible to ***mask*** that channel from the likelihood. This is achieved at the `text2Workspace` step using the option `--channel-masks`.
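In sketch form, the masking workflow looks like the following (the file and channel names are placeholders; the tutorial below works through a concrete example):

```sh
# create one mask_<channel> parameter per channel in the workspace
text2workspace.py datacard.txt --channel-masks
# drop the chosen channel from the likelihood at runtime
combine datacard.root -M FitDiagnostics --setParameters mask_signal=1
```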
@@ -350,7 +350,7 @@ We will take the control region example from the rate parameters tutorial from [ The first step is to combine the cards combineCards.py signal=signal_region.txt dimuon=dimuon_control_region.txt singlemuon=singlemuon_control_region.txt > datacard.txt -Note that we use the directive `CHANNELNAME=CHANNEL_DATACARD.txt` so that the names of the channels are under our control and easier to interpret. Next, we make a workspace and tell COMBINE to create the parameters used to *mask channels* +Note that we use the directive `CHANNELNAME=CHANNEL_DATACARD.txt` so that the names of the channels are under our control and easier to interpret. Next, we make a workspace and tell Combine to create the parameters used to *mask channels* text2workspace.py datacard.txt --channel-masks @@ -359,18 +359,18 @@ Now we will try to do a fit *ignoring* the signal region. We can turn off the si combine datacard.root -M FitDiagnostics --saveShapes --saveWithUncertainties --setParameters mask_signal=1 !!! warning - There will be a lot of warnings from COMBINE. These are safe to ignore as they are due to the s+b fit not converging. This is expected as the free signal parameter cannot be constrained because the data in the signal region is being ignored. + There will be a lot of warnings from Combine. These are safe to ignore as they are due to the s+b fit not converging. This is expected as the free signal parameter cannot be constrained because the data in the signal region is being ignored. We can compare the post-fit background and uncertainties with and without the signal region included by re-running with `--setParameters mask_signal=0` (or just removing that option completely). Below is a comparison of the background in the signal region with and without masking the data in the signal region. We take these from the shapes folder **shapes_fit_b/signal/total_background** in the `fitDiagnostics.root` output. ![](images/masking_tutorial.png) -Clearly the background shape is different and much less constrained *without including the signal region*, as expected. Channel masking can be used with *any method* in COMBINE. +Clearly the background shape is different and much less constrained *without including the signal region*, as expected. Channel masking can be used with *any method* in Combine. ## RooMultiPdf conventional bias studies -Several analyses in CMS use a functional form to describe the background. This functional form is fit to the data. Often however, there is some uncertainty associated with the choice of which background function to use, and this choice will impact the fit results. It is therefore often the case that in these analyses, a bias study is performed. This study will give an indication of the size of the potential bias in the result, given a certain choice of functional form. These studies can be conducted using COMBINE. +Several analyses in CMS use a functional form to describe the background. This functional form is fit to the data. Often however, there is some uncertainty associated with the choice of which background function to use, and this choice will impact the fit results. It is therefore often the case that in these analyses, a bias study is performed. This study will give an indication of the size of the potential bias in the result, given a certain choice of functional form. These studies can be conducted using Combine. Below is an example script that will produce a workspace based on a simplified Higgs to diphoton (Hgg) analysis with a *single* category. 
It will produce the data and PDFs necessary for this example, and you can use it as a basis to construct your own studies. @@ -458,7 +458,7 @@ void makeRooMultiPdfWorkspace(){ } ``` -The signal is modelled as a simple Gaussian with a width approximately that of the diphoton resolution. For the background there is a choice of 3 functions: an exponential, a power-law, and a 2nd order polynomial. This choice is accessible within COMBINE through the use of the [RooMultiPdf](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/interface/RooMultiPdf.h) object, which can switch between the functions by setting their associated indices (herein called **pdf_index**). This (as with all parameters in COMBINE) can be set via the `--setParameters` option. +The signal is modelled as a simple Gaussian with a width approximately that of the diphoton resolution. For the background there is a choice of 3 functions: an exponential, a power-law, and a 2nd order polynomial. This choice is accessible within Combine through the use of the [RooMultiPdf](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/interface/RooMultiPdf.h) object, which can switch between the functions by setting their associated indices (herein called **pdf_index**). This (as with all parameters in Combine) can be set via the `--setParameters` option. To assess the bias, one can throw toys using one function and fit with another. To do this, only a single datacard is needed: [hgg_toy_datacard.txt](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/tree/main/data/tutorials/bias_studies/hgg_toy_datacard.txt). @@ -468,7 +468,7 @@ The bias studies are performed in two stages. The first is to generate toys usin combine hgg_toy_datacard.txt -M GenerateOnly --setParameters pdf_index=0 --toysFrequentist -t 100 --expectSignal 1 --saveToys -m 125 --freezeParameters pdf_index ``` !!! warning - It is important to freeze `pdf_index`, otherwise COMBINE will try to iterate over the index in the frequentist fit. + It is important to freeze `pdf_index`, otherwise Combine will try to iterate over the index in the frequentist fit. Now we have 100 toys which, by setting `pdf_index=0`, sets the background PDF to the exponential function. This means we assume that the exponential is the *true* function. Note that the option `--toysFrequentist` is added; this first performs a fit of the PDF, assuming a signal strength of 1, to the data before generating the toys. This is the most obvious choice as to where to throw the toys from. @@ -525,7 +525,7 @@ The above output will produce the following scans. As expected, the curve obtained by allowing the `pdf_index` to float (labelled "Envelope") picks out the best function (maximum corrected likelihood) for each value of the signal strength. -In general, the performance of COMBINE can be improved when using the discrete profiling method by including the option `--X-rtd MINIMIZER_freezeDisassociatedParams`. This will stop parameters not associated to the current PDF from floating in the fits. Additionally, you can include the following options: +In general, the performance of Combine can be improved when using the discrete profiling method by including the option `--X-rtd MINIMIZER_freezeDisassociatedParams`. This will stop parameters not associated to the current PDF from floating in the fits. 
Additionally, you can include the following options: * `--X-rtd MINIMIZER_multiMin_hideConstants`: hide the constant terms in the likelihood when recreating the minimizer * `--X-rtd MINIMIZER_multiMin_maskConstraints`: hide the constraint terms during the discrete minimization process @@ -533,11 +533,11 @@ In general, the performance of COMBINE can be improved whe * ` 1`: keeps unmasked all channels that are participating in the discrete minimization. * ` 2`: keeps unmasked only the channel whose index is being scanned at the moment. -You may want to check with the COMBINE development team if you are using these options, as they are somewhat for *expert* use. +You may want to check with the Combine development team if you are using these options, as they are somewhat for *expert* use. ## RooSplineND multidimensional splines -[RooSplineND](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/interface/RooSplineND.h) can be used to interpolate from a tree of points to produce a continuous function in N-dimensions. This function can then be used as input to workspaces allowing for parametric rates/cross-sections/efficiencies. It can also be used to up-scale the resolution of likelihood scans (i.e like those produced from COMBINE) to produce smooth contours. +[RooSplineND](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/interface/RooSplineND.h) can be used to interpolate from a tree of points to produce a continuous function in N-dimensions. This function can then be used as input to workspaces allowing for parametric rates/cross-sections/efficiencies. It can also be used to up-scale the resolution of likelihood scans (i.e. like those produced from Combine) to produce smooth contours. The spline makes use of a radial basis decomposition to produce a continuous $N \to 1$ map (function) from $M$ provided sample points. The function of the $N$ variables $\vec{x}$ is assumed to be of the form, @@ -937,7 +937,7 @@ As before, we also need to create the `RooParametricHist` for this process in th RooAddition p_CRbkg_norm("bkg_CR_norm","Total Number of events from background in control region",bkg_CR_bins); ``` -Finally, we can also create alternative shape variations (Up/Down) that can be fed to COMBINE as we do with `TH1` or `RooDataHist` type workspaces. These need +Finally, we can also create alternative shape variations (Up/Down) that can be fed to Combine as we do with `TH1` or `RooDataHist` type workspaces. These need to be of type `RooDataHist`. The example below is for a Jet Energy Scale type shape uncertainty. ```c++ @@ -1024,7 +1024,7 @@ acceptance param 0 1 ``` -Note that for the control region, our nuisance parameters appear as `param` types, so that COMBINE will correctly constrain them. +Note that for the control region, our nuisance parameters appear as `param` types, so that Combine will correctly constrain them. If we combine the two cards and fit the result with `-M MultiDimFit -v 3` we can see that the parameters that give the rate of background in each bin of the signal region, along with the nuisance parameters and signal strength, are determined by the fit - i.e. we have properly included the constraint from the control region, just as with the 1-bin `gmN`.
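To make that last step explicit, a minimal sketch of the combination and fit is given below. The card and workspace names are illustrative placeholders, not files provided with this example:

```sh
# Combine the signal-region and control-region cards, keeping control of the channel names
combineCards.py signal=signal_region_card.txt control=control_region_card.txt > combined_card.txt
# Build the workspace and run a verbose fit to see how the per-bin background rates are determined
text2workspace.py combined_card.txt -o combined_card.root
combine combined_card.root -M MultiDimFit -v 3
```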
diff --git a/docs/part3/regularisation.md b/docs/part3/regularisation.md index 4a83b35b6d0..6cf42ced570 100644 --- a/docs/part3/regularisation.md +++ b/docs/part3/regularisation.md @@ -1,9 +1,9 @@ # Unfolding & regularization -This section details how to perform an unfolded cross-section measurement, *including regularization*, within COMBINE. +This section details how to perform an unfolded cross-section measurement, *including regularization*, within Combine. There are many resources available that describe unfolding, including when to use it (or not), and what the common issues surrounding it are. For CMS users, useful summary is available in the [CMS Statistics Committee pages](https://twiki.cern.ch/twiki/bin/view/CMS/ScrecUnfolding) on unfolding. You can also -find an overview of unfolding and its usage in COMBINE in [these slides](https://indico.cern.ch/event/399923/contributions/956409/attachments/800899/1097609/2015_06_24_LHCXSWG.pdf#search=Marini%20AND%20StartDate%3E%3D2015-06-24%20AND%20EndDate%3C%3D2015-06-24). +find an overview of unfolding and its usage in Combine in [these slides](https://indico.cern.ch/event/399923/contributions/956409/attachments/800899/1097609/2015_06_24_LHCXSWG.pdf#search=Marini%20AND%20StartDate%3E%3D2015-06-24%20AND%20EndDate%3C%3D2015-06-24). The basic idea behind the unfolding technique is to describe smearing introduced through the reconstruction (e.g. of the particle energy) in a given truth level bin $x_{i}$ through a linear relationship with the effects in the nearby truth-bins. We can make statements about the probability $p_{j}$ that the event falling in the truth bin $x_{i}$ is reconstructed in the bin $y_{i}$ via the linear relationship, @@ -22,7 +22,7 @@ Unfolding aims to find the distribution at truth level $x$, given the observatio ## Likelihood-based unfolding -Since COMBINE has access to the full likelihood for any analysis written in the usual datacard format, we will use likelihood-based unfolding +Since Combine has access to the full likelihood for any analysis written in the usual datacard format, we will use likelihood-based unfolding throughout - for other approaches, there are many other tools available (eg `RooUnfold` or `TUnfold`), which can be used instead. The benefits of the likelihood-based approach are that, @@ -34,7 +34,7 @@ The benefits of the likelihood-based approach are that, In practice, one must construct the *response matrix* and unroll it in the reconstructed bins: * First, one derives the truth distribution, e.g. after the generator-level selection only, $x_{i}$. -* Each reconstructed bin (e.g. each datacard) should describe the contribution from each truth bin - this is how COMBINE knows about the response matrix $\boldsymbol{R}$ +* Each reconstructed bin (e.g. each datacard) should describe the contribution from each truth bin - this is how Combine knows about the response matrix $\boldsymbol{R}$ and folds in the acceptance/efficiency effects as usual. * The out-of-acceptance contributions can also be included in the above. @@ -70,7 +70,7 @@ The figure below shows a comparison of likelihood-based unfolding and a least-sq The main difference with respect to other models with multiple signal contributions is the introduction of **Regularization**, which is used to stabilize the unfolding process. 
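To make the construction above concrete: each truth bin $j$ is treated as an independent signal process with its own signal-strength modifier $r_{j}$, so that, schematically (ignoring out-of-acceptance terms), the expected yield in reconstructed bin $i$ is

$$ \lambda_{i} = \sum_{j} r_{j}\, R_{ij}\, x_{j} + b_{i}, $$

where $x_{j}$ is the truth-level yield, $R_{ij}$ the response matrix element and $b_{i}$ the background expectation in reconstructed bin $i$. Maximizing the likelihood over the $r_{j}$ (together with the nuisance parameters) yields the unfolded result; the regularization terms introduced below add a penalty to the likelihood that disfavours large bin-to-bin fluctuations of the $r_{j}$. Note that this is only a sketch of the conventions used in this section, not an additional feature of the tool.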
-An example of unfolding in COMBINE with and without regularization, can be found under +An example of unfolding in Combine with and without regularization, can be found under [data/tutorials/regularization](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/tree/102x/data/tutorials/regularization). Running `python createWs.py [-r]` will create a simple datacard and perform a fit both with and without including regularization. @@ -133,7 +133,7 @@ Alternative valid syntaxes are constr1 constr r_bin0+r_bin1 {r_bin0,r_bin1} delta[0.01] ``` -The figure below shows an example unfolding using the "SVD regularization" approach with the least squares method (as implemented by `RooUnfold`) and implemented as a penalty term added to the likelihood using the maximum likelihood approach in COMBINE. +The figure below shows an example unfolding using the "SVD regularization" approach with the least squares method (as implemented by `RooUnfold`) and implemented as a penalty term added to the likelihood using the maximum likelihood approach in Combine. /// details | **Show comparison** @@ -144,7 +144,7 @@ The figure below shows an example unfolding using the "SVD regularization" appro ### TUnfold method The Tikhonov regularization as implemented in `TUnfold` uses the MC information, or rather the density prediction, as a bias vector. -In order to give this information to COMBINE, a single datacard for each reconstruction-level bin needs to be produced, so that we have access to the proper normalization terms during the minimization. In this case the bias vector is $\vec{x}_{obs}-\vec{x}_{true}$ +In order to give this information to Combine, a single datacard for each reconstruction-level bin needs to be produced, so that we have access to the proper normalization terms during the minimization. In this case the bias vector is $\vec{x}_{obs}-\vec{x}_{true}$ Then one can write a constraint term in the datacard via, for example, diff --git a/docs/part3/runningthetool.md b/docs/part3/runningthetool.md index 0d20d7b9d65..bffcc5c520c 100644 --- a/docs/part3/runningthetool.md +++ b/docs/part3/runningthetool.md @@ -1,6 +1,6 @@ # How to run the tool -The executable COMBINE provided by the package is used to invoke the tools via the command line. The statistical analysis method, as well as user settings, are also specified on the command line. To see the full list of available options, you can run: +The executable Combine provided by the package is used to invoke the tools via the command line. The statistical analysis method, as well as user settings, are also specified on the command line. To see the full list of available options, you can run: ```sh combine --help @@ -72,13 +72,13 @@ There are a number of useful command-line options that can be used to alter the - The name of the branch will be **trackedError_*name***. - The behaviour, in terms of which values are saved, is the same as `--trackParameters` above. -By default, the data set used by COMBINE will be the one listed in the datacard. You can tell COMBINE to use a different data set (for example a toy data set that you generated) by using the option `--dataset`. The argument should be `rootfile.root:workspace:location` or `rootfile.root:location`. In order to use this option, you must first convert your datacard to a binary workspace and use this binary workspace as the input to COMBINE. +By default, the data set used by Combine will be the one listed in the datacard. 
You can tell Combine to use a different data set (for example a toy data set that you generated) by using the option `--dataset`. The argument should be `rootfile.root:workspace:location` or `rootfile.root:location`. In order to use this option, you must first convert your datacard to a binary workspace and use this binary workspace as the input to Combine. ### Generic Minimizer Options -COMBINE uses its own minimizer class, which is used to steer Minuit (via RooMinimizer), named the `CascadeMinimizer`. This allows for sequential minimization, which can help in case a particular setting or algorithm fails. The `CascadeMinimizer` also knows about extra features of COMBINE such as *discrete* nuisance parameters. +Combine uses its own minimizer class, which is used to steer Minuit (via RooMinimizer), named the `CascadeMinimizer`. This allows for sequential minimization, which can help in case a particular setting or algorithm fails. The `CascadeMinimizer` also knows about extra features of Combine such as *discrete* nuisance parameters. -All of the fits that are performed in COMBINE's methods use this minimizer. This means that the fits can be tuned using these common options, +All of the fits that are performed in Combine's methods use this minimizer. This means that the fits can be tuned using these common options, * `--cminPoiOnlyFit`: First, perform a fit floating *only* the parameters of interest. This can be useful to find, roughly, where the global minimum is. * `--cminPreScan`: Do a scan before the first minimization. @@ -107,7 +107,7 @@ More of these options can be found in the **Cascade Minimizer options** section ### Output from combine -Most methods will print the results of the computation to the screen. However, in addition, COMBINE will also produce a root file containing a tree called **limit** with these results. The name of this file will be of the format, +Most methods will print the results of the computation to the screen. However, in addition, Combine will also produce a root file containing a tree called **limit** with these results. The name of this file will be of the format, higgsCombineTest.MethodName.mH$MASS.[word$WORD].root @@ -141,9 +141,9 @@ In some cases, the precise meanings of the branches will depend on the method be By default, each of the methods described so far will be run using the **observed data** as the input. In several cases (as detailed below), it is useful to run the tool using toy datasets, including Asimov data sets. -The option `-t` is used to tell COMBINE to first generate one or more toy data sets, which will be used instead of the observed data. There are two versions, +The option `-t` is used to tell Combine to first generate one or more toy data sets, which will be used instead of the observed data. There are two versions, - * `-t N` with N > 0. COMBINE will generate N toy datasets from the model and re-run the method once per toy. The seed for the toy generation can be modified with the option `-s` (use `-s -1` for a random seed). The output file will contain one entry in the tree for each of these toys. + * `-t N` with N > 0. Combine will generate N toy datasets from the model and re-run the method once per toy. The seed for the toy generation can be modified with the option `-s` (use `-s -1` for a random seed). The output file will contain one entry in the tree for each of these toys. * `-t -1` will produce an Asimov data set, in which statistical fluctuations are suppressed. 
The procedure for generating this Asimov data set depends on the type of analysis you are using. More details are given below. @@ -158,15 +158,15 @@ The output file will contain the toys (as `RooDataSets` for the observables, inc ### Asimov datasets -If you are using either `-t -1` or `AsymptoticLimits`, COMBINE will calculate results based on an Asimov data set. +If you are using either `-t -1` or `AsymptoticLimits`, Combine will calculate results based on an Asimov data set. * For counting experiments, the Asimov data set will just be the total number of expected events (given the values of the nuisance parameters and POIs of the model) * For shape analyses with templates, the Asimov data set will be constructed as a histogram using the same binning that is defined for your analysis. - * If your model uses parametric shapes, there are some options as to what Asimov data set to produce. By *default*, COMBINE will produce the Asimov data set as a histogram using the binning that is associated with each observable (ie as set using `RooRealVar::setBins`). If this binning does not exist, COMBINE will **guess** a suitable binning - it is therefore best to use `RooRealVar::setBins` to associate a binning with each observable, even if your data is unbinned, if you intend to use Asimov data sets. + * If your model uses parametric shapes, there are some options as to what Asimov data set to produce. By *default*, Combine will produce the Asimov data set as a histogram using the binning that is associated with each observable (ie as set using `RooRealVar::setBins`). If this binning does not exist, Combine will **guess** a suitable binning - it is therefore best to use `RooRealVar::setBins` to associate a binning with each observable, even if your data is unbinned, if you intend to use Asimov data sets. -You can also ask COMBINE to use a **Pseudo-Asimov** dataset, which is created from many weighted unbinned events. +You can also ask Combine to use a **Pseudo-Asimov** dataset, which is created from many weighted unbinned events. Setting `--X-rtd TMCSO_AdaptivePseudoAsimov=`$\beta$ with $\beta>0$ will trigger the internal logic of whether to produce a Pseudo-Asimov dataset. This logic is as follows; @@ -201,7 +201,7 @@ If you are using `toysFrequentist`, be aware that the values set by `--setParame ### Generate only -It is also possible to generate the toys first, and then feed them to the methods in COMBINE. This can be done using `-M GenerateOnly --saveToys`. The toys can then be read and used with the other methods by specifying `--toysFile=higgsCombineTest.GenerateOnly...` and using the same options for the toy generation. +It is also possible to generate the toys first, and then feed them to the methods in Combine. This can be done using `-M GenerateOnly --saveToys`. The toys can then be read and used with the other methods by specifying `--toysFile=higgsCombineTest.GenerateOnly...` and using the same options for the toy generation. !!! warning Some methods also use toys within the method itself (eg `AsymptoticLimits` and `HybridNew`). For these, you should **not** specify the toy generation with `-t` or the options above. Instead, you should follow the method-specific instructions. @@ -225,7 +225,7 @@ Here are a few examples of calculations with toys from post-fit workspaces using ## combineTool for job submission -For longer tasks that cannot be run locally, several methods in COMBINE can be split to run on a *batch* system or on the *Grid*. 
The splitting and submission is handled using the `combineTool` (see [this getting started](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) section to check out the tool) +For longer tasks that cannot be run locally, several methods in Combine can be split to run on a *batch* system or on the *Grid*. The splitting and submission is handled using the `combineTool` (see [this getting started](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#combine-tool) section to check out the tool) ### Submission to Condor @@ -236,7 +236,7 @@ The syntax for running on condor with the tool is combineTool.py -M ALGO [options] --job-mode condor --sub-opts='CLASSADS' --task-name NAME [--dry-run] ``` -with `options` being the usual list of COMBINE options. The help option `-h` will give a list of both COMBINE and `combineTool` options. It is possible to use this tool with several different methods from COMBINE. +with `options` being the usual list of Combine options. The help option `-h` will give a list of both Combine and `combineTool` options. It is possible to use this tool with several different methods from Combine. The `--sub-opts` option takes a string with the different ClassAds that you want to set, separated by `\n` as argument (e.g. `'+JobFlavour="espresso"\nRequestCpus=1'`). @@ -259,7 +259,7 @@ The option `--split-points` issues the command to split the jobs for `MultiDimFi combineTool.py datacard.txt -M MultiDimFit --algo grid --points 50 --rMin 0 --rMax 1 --job-mode condor --split-points 10 --sub-opts='+JobFlavour="workday"' --task-name mytask -n mytask ``` -Remember, any usual options (such as redefining POIs or freezing parameters) are passed to COMBINE and can be added to the command line for `combineTool`. +Remember, any usual options (such as redefining POIs or freezing parameters) are passed to Combine and can be added to the command line for `combineTool`. !!! info The option `-n NAME` should be included to avoid overwriting output files, as the jobs will be run inside the directory from which the command is issued. @@ -269,7 +269,7 @@ Remember, any usual options (such as redefining POIs or freezing parameters) are For more CPU-intensive tasks, for example determining limits for complex models using toys, it is generally not feasible to compute all the results interactively. Instead, these jobs can be submitted to the Grid. -In this example we will use the `HybridNew` method of COMBINE to determine an upper limit for a sub-channel of the Run 1 SM $H\rightarrow\tau\tau$ analysis. For full documentation, see the section on [computing limits with toys](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/commonstatsmethods/#computing-limits-with-toys). +In this example we will use the `HybridNew` method of Combine to determine an upper limit for a sub-channel of the Run 1 SM $H\rightarrow\tau\tau$ analysis. For full documentation, see the section on [computing limits with toys](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/commonstatsmethods/#computing-limits-with-toys). With this model it would take too long to find the limit in one go, so instead we create a set of jobs in which each one throws toys and builds up the test statistic distributions for a fixed value of the signal strength. These jobs can then be submitted to a batch system or to the Grid using `crab3`. From the set of output distributions it is possible to extract the expected and observed limits. 
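Before going through the individual steps, it can help to see roughly what the final extraction looks like once the per-point toy outputs have been merged. The snippet below is only a sketch: the merged file name is an illustrative placeholder, and the intermediate steps (building the workspace, generating the toy grids) are described next. The options used here (`--readHybridResults`, `--grid`, `--expectedFromGrid`) belong to the `HybridNew` toy workflow documented in the link above:

```sh
# Merge the per-point HybridNew outputs, then read the merged grid back in
hadd -f merged_grid.root higgsCombine*.HybridNew*.root
# Observed limit from the merged grid
combine -M HybridNew htt_mt.root -m 125 --LHCmode LHC-limits --readHybridResults --grid=merged_grid.root
# Median expected limit (use e.g. 0.160 or 0.840 for the other quantiles)
combine -M HybridNew htt_mt.root -m 125 --LHCmode LHC-limits --readHybridResults --grid=merged_grid.root --expectedFromGrid 0.500
```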
@@ -282,7 +282,7 @@ $ text2workspace.py data/tutorials/htt/125/htt_mt.txt -m 125 $ mv data/tutorials/htt/125/htt_mt.root ./ ``` -To get an idea of the range of signal strength values we will need to build test-statistic distributions for, we will first use the `AsymptoticLimits` method of COMBINE, +To get an idea of the range of signal strength values we will need to build test-statistic distributions for, we will first use the `AsymptoticLimits` method of Combine, ```nohighlight $ combine -M Asymptotic htt_mt.root -m 125 @@ -299,9 +299,9 @@ Expected 97.5%: r < 1.7200 Based on this, a range of 0.2 to 2.0 should be suitable. -We can use the same command for generating the distribution of test statistics with `combineTool`. The `--singlePoint` option is now enhanced to support expressions that generate a set of calls to COMBINE with different values. The accepted syntax is of the form **MIN:MAX:STEPSIZE**, and multiple comma-separated expressions can be specified. +We can use the same command for generating the distribution of test statistics with `combineTool`. The `--singlePoint` option is now enhanced to support expressions that generate a set of calls to Combine with different values. The accepted syntax is of the form **MIN:MAX:STEPSIZE**, and multiple comma-separated expressions can be specified. -The script also adds an option `--dry-run`, which will not actually call comCOMBINEbine but just prints out the commands that would be run, e.g, +The script also adds an option `--dry-run`, which will not actually call Combine but just prints out the commands that would be run, e.g., ```sh combineTool.py -M HybridNew -d htt_mt.root --LHCmode LHC-limits --singlePoint 0.2:2.0:0.2 -T 2000 -s -1 --saveToys --saveHybridResult -m 125 --dry-run @@ -347,5 +347,5 @@ $ mv higgsCombine*.root ../../ $ cd ../../ ``` -These output files should be combined with `hadd`, after which we invoke COMBINE as usual to calculate observed and expected limits from the merged grid, as usual. +These output files should be combined with `hadd`, after which we invoke Combine as usual to calculate observed and expected limits from the merged grid. diff --git a/docs/part3/simplifiedlikelihood.md b/docs/part3/simplifiedlikelihood.md index 8cd3b3694af..cde56c9fee1 100644 --- a/docs/part3/simplifiedlikelihood.md +++ b/docs/part3/simplifiedlikelihood.md @@ -4,9 +4,9 @@ This page is to give a brief outline for the creation of (potentially aggregated ## Requirements -You need an up to date version of COMBINE. Note You should use the latest release of COMBINE for the exact commands on this page. You should be using COMBINE tag `v9.0.0` or higher or the latest version of the `112x` branch to follow these instructions. +You need an up-to-date version of Combine. Note that you should use the latest release of Combine for the exact commands on this page. You should be using Combine tag `v9.0.0` or higher or the latest version of the `112x` branch to follow these instructions.
-You will find the python scripts needed to convert COMBINE outputs into simplified likelihood inputs under `test/simplifiedLikelihood` +You will find the python scripts needed to convert Combine outputs into simplified likelihood inputs under `test/simplifiedLikelihood` If you're using the `102x` branch (not recommended), then you can obtain these scripts from here by running: ``` @@ -72,7 +72,7 @@ You should check that the order of the bins in the covariance matrix is as expec ## Produce simplified likelihood inputs -Head over to the `test/simplifiedLikelihoods` directory inside your COMBINE area. The following instructions depend on whether you are aggregating or not aggregating your signal regions. Choose the instructions for your case. +Head over to the `test/simplifiedLikelihoods` directory inside your Combine area. The following instructions depend on whether you are aggregating or not aggregating your signal regions. Choose the instructions for your case. ### Not Aggregating Run the `makeLHInputs.py` script to prepare the inputs for the simplified likelihood. The filter flag can be used to select only signal regions based on the channel names. To include all channels do not include the filter flag. @@ -107,7 +107,7 @@ At this point you have the inputs as ROOT files necessary to publish and run the ## Validating the simplified likelihood approach -The simplified likelihood relies on several assumptions (detailed in the documentation at the top). To test the validity for your analysis, statistical results between COMBINE and the simplified likelihood can be compared. +The simplified likelihood relies on several assumptions (detailed in the documentation at the top). To test the validity for your analysis, statistical results between Combine and the simplified likelihood can be compared. We will use the package [SLtools](https://gitlab.cern.ch/SimplifiedLikelihood/SLtools/-/blob/master/README.md) from the [Simplified Likelihood Paper](https://link.springer.com/article/10.1007/JHEP04(2019)064) for this. The first step is to convert the ROOT files into python configs to run in the tool. @@ -191,7 +191,7 @@ combine data/tutorials/longexercise/datacard_part3.root -M FitDiagnostics --save combine data/tutorials/longexercise/datacard_part3.root -M FitDiagnostics --saveShapes --saveWithUnc --numToysForShape 1 --saveOverall --preFitValue 1 -n SimpleTH1_Signal1 -m 200 ``` -We will also want to compare our scan to that from the full likelihood, which we can get as usual from COMBINE. +We will also want to compare our scan to that from the full likelihood, which we can get as usual from Combine. ``` combine -M MultiDimFit data/tutorials/longexercise/datacard_part3.root --rMin -0.5 --rMax 2 --algo grid -n SimpleTH1 -m 200 @@ -227,7 +227,7 @@ We can convert this to a python module that we can use to run a scan with the `S python test/simplifiedLikelihoods/convertSLRootToPython.py -O mymodel.py -s SLinput_Signal1.root:shapes_prefit/total_signal -b SLinput.root:shapes_prefit/total_M1-d SLinput.root:shapes_prefit/total_data -c SLinput.root:shapes_prefit/total_M2 ``` -We can compare the profiled likelihood scans from our simplified likelihood (using the python file we just created) and from the full likelihood (that we created with COMBINE.). For the former, we need to first checkout the `SLtools` package +We can compare the profiled likelihood scans from our simplified likelihood (using the python file we just created) and from the full likelihood (that we created with Combine.). 
For the former, we need to first checkout the `SLtools` package ``` git clone https://gitlab.cern.ch/SimplifiedLikelihood/SLtools.git @@ -277,4 +277,4 @@ This will produce a figure like the one below. ![](SLexample.jpg) -It is also possible to include the third moment of each bin to improve the precision of the simplified likelihood [ [JHEP 64 2019](https://link.springer.com/article/10.1007/JHEP04(2019)064) ]. The necessary information is stored in the outputs from COMBINE, therefore you just need to include the option `-t SLinput.root:shapes_prefit/total_M3` in the options list for `convertSLRootToPython.py` to include this in the model file. The third moment information can be included in `SLtools` by using ` sl.SLParams(background, covariance, third_moment, obs=data, sig=signal)` +It is also possible to include the third moment of each bin to improve the precision of the simplified likelihood [ [JHEP 64 2019](https://link.springer.com/article/10.1007/JHEP04(2019)064) ]. The necessary information is stored in the outputs from Combine, therefore you just need to include the option `-t SLinput.root:shapes_prefit/total_M3` in the options list for `convertSLRootToPython.py` to include this in the model file. The third moment information can be included in `SLtools` by using ` sl.SLParams(background, covariance, third_moment, obs=data, sig=signal)` diff --git a/docs/part3/validation.md b/docs/part3/validation.md index 8e034605b4f..06499c3fadb 100644 --- a/docs/part3/validation.md +++ b/docs/part3/validation.md @@ -69,7 +69,7 @@ To print information to screen, the script parses the json file that contains th The options `--checkUncertOver` and `--reportSigUnder` will be described in more detail in the section that discusses the checks for which they are relevant. -Note: the `--mass` argument should only be set if you normally use it when running COMBINE, otherwise you can leave it at the default. +Note: the `--mass` argument should only be set if you normally use it when running Combine, otherwise you can leave it at the default. The datacard validation tool is primarily intended for shape (histogram) based analyses. However, when running on a parametric model or counting experiment the checks for small signal processes, empty processes, and uncertainties with large normalization effects can still be performed. diff --git a/docs/part4/usefullinks.md b/docs/part4/usefullinks.md index 378d4fc069c..43d44bcc89d 100644 --- a/docs/part4/usefullinks.md +++ b/docs/part4/usefullinks.md @@ -2,7 +2,7 @@ ### Tutorials and reading material -There are several tutorials that have been run over the last few years with instructions and examples for running the COMBINE tool. +There are several tutorials that have been run over the last few years with instructions and examples for running the Combine tool. 
Tutorial Sessions: @@ -15,7 +15,7 @@ Tutorial Sessions: * [7th tutorial 3rd Feb 2023](https://indico.cern.ch/event/1227742/) - Uses `113x` branch -Worked examples from Higgs analyses using COMBINE: +Worked examples from Higgs analyses using Combine: * [The CMS DAS at CERN 2014](https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideCMSDataAnalysisSchool2014HiggsCombPropertiesExercise) * [The CMS DAS at DESY 2018](https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCMSDataAnalysisSchoolHamburg2018LongStatisticsExercise) @@ -25,11 +25,11 @@ Higgs combinations procedures * [Conventions to be used when preparing inputs for Higgs combinations](https://twiki.cern.ch/twiki/bin/view/CMS/HiggsWG/HiggsCombinationConventions) - * [CMS AN-2011/298](http://cms.cern.ch/iCMS/jsp/db_notes/noteInfo.jsp?cmsnoteid=CMS AN-2011/298) Procedure for the LHC Higgs boson search combination in summer 2011. This describes in more detail some of the methods used in COMBINE. + * [CMS AN-2011/298](http://cms.cern.ch/iCMS/jsp/db_notes/noteInfo.jsp?cmsnoteid=CMS AN-2011/298) Procedure for the LHC Higgs boson search combination in summer 2011. This describes in more detail some of the methods used in Combine. ### Citations -There is no document currently which can be cited for using the COMBINE tool, however, you can use the following publications for the procedures we use, +There is no document currently which can be cited for using the Combine tool, however, you can use the following publications for the procedures we use, * [Summer 2011 public ATLAS-CMS note](https://cds.cern.ch/record/1379837) for any Frequentist limit setting procedures with toys or Bayesian limits, constructing likelihoods, descriptions of nuisance parameter options (like log-normals (`lnN`) or gamma (`gmN`), and for definitions of test-statistics. @@ -61,7 +61,7 @@ There is no document currently which can be cited for using the COMBIN # FAQ -* _Why does COMBINE have trouble with bins that have zero expected contents?_ +* _Why does Combine have trouble with bins that have zero expected contents?_ * If you are computing only upper limits, and your zero-prediction bins are all empty in data, then you can just set the background to a very small value instead of zero as the computation is regular for background going to zero (e.g. a counting experiment with $B\leq1$ will have essentially the same expected limit and observed limit as one with $B=0$). If you are computing anything else, e.g. p-values, or if your zero-prediction bins are not empty in data, you're out of luck, and you should find a way to get a reasonable background prediction there (and set an uncertainty on it, as per the point above) * _How can an uncertainty be added to a zero quantity?_ * You can put an uncertainty even on a zero event yield if you use a gamma distribution. That is in fact the more proper way of doing it if the prediction of zero comes from the limited size of your MC or data sample used to compute it. @@ -69,15 +69,15 @@ There is no document currently which can be cited for using the COMBIN * The expected limit (if using either the default behaviour of `-M AsymptoticLimits` or using the `LHC-limits` style limit setting with toys) uses the _**post-fit**_ expectation of the background model to generate toys. This means that first the model is fit to the _**observed data**_ before toy generation. 
See the sections on [blind limits](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/commonstatsmethods/#blind-limits) and [toy generation](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#toy-data-generation) to avoid this behavior. * _How can I deal with an interference term which involves a negative contribution?_ * You will need to set up a specific PhysicsModel to deal with this; however, you can [see this section](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part2/physicsmodels/#interference) to implement such a model that can incorporate a negative contribution to the physics process -* _How does COMBINE work?_ * That is not a question that can be answered without someone's head exploding; please try to formulate something specific. +* _How does Combine work?_ * That is not a question that can be answered without someone's head exploding; please try to formulate something specific. * _What does fit status XYZ mean?_ - * COMBINE reports the fit status in some routines (for example in the `FitDiagnostics` method). These are typically the status of the last call from Minuit. For details on the meanings of these status codes see the [Minuit2Minimizer](https://root.cern.ch/root/html/ROOT__Minuit2__Minuit2Minimizer.html) documentation page. + * Combine reports the fit status in some routines (for example in the `FitDiagnostics` method). These are typically the status of the last call from Minuit. For details on the meanings of these status codes see the [Minuit2Minimizer](https://root.cern.ch/root/html/ROOT__Minuit2__Minuit2Minimizer.html) documentation page. * _Why does my fit not converge?_ * There are several reasons why some fits may not converge. Often some indication can be obtained from the `RooFitResult` or status that you will see information from when using the `--verbose X` (with $X>2$) option. Sometimes however, it can be that the likelihood for your data is very unusual. You can get a rough idea about what the likelihood looks like as a function of your parameters (POIs and nuisances) using `combineTool.py -M FastScan -w myworkspace.root` (use --help for options). - * We have often seen that fits in COMBINE using `RooCBShape` as a parametric function will fail. This is related to an optimization that fails. You can try to fix the problem as described in this issue: [issues#347](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/347) (i.e add the option `--X-rtd ADDNLL_CBNLL=0`). + * We have often seen that fits in Combine using `RooCBShape` as a parametric function will fail. This is related to an optimization that fails. You can try to fix the problem as described in this issue: [issues#347](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/347) (i.e. add the option `--X-rtd ADDNLL_CBNLL=0`). * _Why does the fit/fits take so long?_ - * The minimization routines are common to many methods in COMBINE. You can tune the fits using the generic optimization command line options described [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#generic-minimizer-options). For example, setting the default minimizer strategy to 0 can greatly improve the speed, since this avoids running HESSE. In calculations such as `AsymptoticLimits`, HESSE is not needed and hence this can be done, however, for `FitDiagnostics` the uncertainties and correlations are part of the output, so using strategy 0 may not be particularly accurate. + * The minimization routines are common to many methods in Combine.
You can tune the fits using the generic optimization command line options described [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#generic-minimizer-options). For example, setting the default minimizer strategy to 0 can greatly improve the speed, since this avoids running HESSE. In calculations such as `AsymptoticLimits`, HESSE is not needed and hence this can be done, however, for `FitDiagnostics` the uncertainties and correlations are part of the output, so using strategy 0 may not be particularly accurate. * _Why are the results for my counting experiment so slow or unstable?_ * There is a known issue with counting experiments with ***large*** numbers of events that will cause unstable fits or even the fit to fail. You can avoid this by creating a "fake" shape datacard (see [this section](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part2/settinguptheanalysis/#combination-of-multiple-datacards) from the setting up the datacards page). The simplest way to do this is to run `combineCards.py -S mycountingcard.txt > myshapecard.txt`. You may still find that your parameter uncertainties are not correct when you have large numbers of events. This can be often fixed using the `--robustHesse` option. An example of this issue is detailed [here](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/498). * _Why do some of my nuisance parameters have uncertainties > 1?_ @@ -85,7 +85,7 @@ There is no document currently which can be cited for using the COMBIN * _How can I avoid using the data?_ * For almost all methods, you can use toy data (or an Asimov dataset) in place of the real data for your results to be blind. You should be careful however as in some methods, such as `-M AsymptoticLimits` or `-M HybridNew --LHCmode LHC-limits` or any other method using the option `--toysFrequentist`, the data will be used to determine the most likely nuisance parameter values (to determine the so-called a-posteriori expectation). See the section on [toy data generation](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#toy-data-generation) for details on this. * _What if my nuisance parameters have correlations which are not 0 or 1?_ - * COMBINE is designed under the assumption that each *source* of nuisance parameter is uncorrelated with the other sources. If you have a case where some pair (or set) of nuisances have some known correlation structure, you can compute the eigenvectors of their correlation matrix and provide these *diagonalised* nuisances to COMBINE. You can also model *partial correlations*, between different channels or data taking periods, of a given nuisance parameter using the `combineTool` as described in [this page](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/503). + * Combine is designed under the assumption that each *source* of nuisance parameter is uncorrelated with the other sources. If you have a case where some pair (or set) of nuisances have some known correlation structure, you can compute the eigenvectors of their correlation matrix and provide these *diagonalised* nuisances to Combine. You can also model *partial correlations*, between different channels or data taking periods, of a given nuisance parameter using the `combineTool` as described in [this page](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/503). 
* _My nuisances are (artificially) constrained and/or the impact plot show some strange behaviour, especially after including MC statistical uncertainties. What can I do?_ * Depending on the details of the analysis, several solutions can be adopted to mitigate these effects. We advise to run the validation tools at first, to identify possible redundant shape uncertainties that can be safely eliminated or replaced with lnN ones. Any remaining artificial constraints should be studies. Possible mitigating strategies can be to (a) smooth the templates or (b) adopt some rebinning in order to reduce statistical fluctuations in the templates. A description of possible strategies and effects can be found in [this talk by Margaret Eminizer](https://indico.cern.ch/event/788727/contributions/3401374/attachments/1831680/2999825/higgs_combine_4_17_2019_fitting_details.pdf) * _What do CLs, CLs+b and CLb in the code mean?_ diff --git a/docs/part5/longexercise.md b/docs/part5/longexercise.md index 4bd482286b7..c291669e8d3 100644 --- a/docs/part5/longexercise.md +++ b/docs/part5/longexercise.md @@ -1,5 +1,5 @@ # Long exercise: main features of Combine -This exercise is designed to give a broad overview of the tools available for statistical analysis in CMS using the combine tool. COMBINE is a high-level tool for building `RooFit`/`RooStats` models and running common statistical methods. We will cover the typical aspects of setting up an analysis and producing the results, as well as look at ways in which we can diagnose issues and get a deeper understanding of the statistical model. This is a long exercise - expect to spend some time on it especially if you are new to COMBINE. If you get stuck while working through this exercise or have questions specifically about the exercise, you can ask them on [this mattermost channel](https://mattermost.web.cern.ch/cms-exp/channels/hcomb-tutorial). Finally, we also provide some solutions to some of the questions that are asked as part of the exercise. These are available [here](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part5/longexerciseanswers). +This exercise is designed to give a broad overview of the tools available for statistical analysis in CMS using the combine tool. Combine is a high-level tool for building `RooFit`/`RooStats` models and running common statistical methods. We will cover the typical aspects of setting up an analysis and producing the results, as well as look at ways in which we can diagnose issues and get a deeper understanding of the statistical model. This is a long exercise - expect to spend some time on it especially if you are new to Combine. If you get stuck while working through this exercise or have questions specifically about the exercise, you can ask them on [this mattermost channel](https://mattermost.web.cern.ch/cms-exp/channels/hcomb-tutorial). Finally, we also provide some solutions to some of the questions that are asked as part of the exercise. These are available [here](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part5/longexerciseanswers). For the majority of this course we will work with a simplified version of a real analysis, that nonetheless will have many features of the full analysis. The analysis is a search for an additional heavy neutral Higgs boson decaying to tau lepton pairs. Such a signature is predicted in many extensions of the standard model, in particular the minimal supersymmetric standard model (MSSM). 
You can read about the analysis in the paper [here](https://arxiv.org/pdf/1803.06553.pdf). The statistical inference makes use of a variable called the total transverse mass ($M_{\mathrm{T}}^{\mathrm{tot}}$) that provides good discrimination between the resonant high-mass signal and the main backgrounds, which have a falling distribution in this high-mass region. The events selected in the analysis are split into several categories which target the main di-tau final states as well as the two main production modes: gluon-fusion (ggH) and b-jet associated production (bbH). One example is given below for the fully-hadronic final state in the b-tag category which targets the bbH signal: @@ -12,7 +12,7 @@ You can find a presentation with some more background on likelihoods and extract If you are not yet familiar with these concepts, or would like to refresh your memory, we recommend that you have a look at these presentations before you start with the exercise. ## Getting started -We need to set up a new CMSSW area and checkout the COMBINE package: +We need to set up a new CMSSW area and check out the Combine package: ```shell cmsrel CMSSW_11_3_4 @@ -26,7 +26,7 @@ git fetch origin git checkout v9.0.0 ``` -We will also make use another package, `CombineHarvester`, which contains some high-level tools for working with COMBINE. The following command will download the repository and checkout just the parts of it we need for this tutorial: +We will also make use of another package, `CombineHarvester`, which contains some high-level tools for working with Combine. The following command will download the repository and check out just the parts of it we need for this tutorial: ```shell bash <(curl -s https://raw.githubusercontent.com/cms-analysis/CombineHarvester/main/CombineTools/scripts/sparse-checkout-https.sh) ``` @@ -82,10 +82,10 @@ The layout of the datacard is as follows: - The first line starting with `bin` gives a unique label to each channel, and the following line starting with `observation` gives the number of events observed in data. - In the remaining part of the card there are several columns: each one represents one process in one channel. The first four lines labelled `bin`, `process`, `process` and `rate` give the channel label, the process label, a process identifier (`<=0` for signal, `>0` for background) and the number of expected events respectively. - The remaining lines describe sources of systematic uncertainty. Each line gives the name of the uncertainty (which will become the name of the nuisance parameter inside our RooFit model), the type of uncertainty ("lnN" = log-normal normalisation uncertainty) and the effect on each process in each channel. E.g. a 20% uncertainty on the yield is written as 1.20. - It is also possible to add a hash symbol (`#`) at the start of a line, which COMBINE will then ignore when it reads the card. + - It is also possible to add a hash symbol (`#`) at the start of a line, which Combine will then ignore when it reads the card. -We can now run COMBINE directly using this datacard as input. The general format for running COMBINE is: +We can now run Combine directly using this datacard as input. The general format for running Combine is: ```shell combine -M [method] [datacard] [additional options...] ```
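For example, filling in this general format with options that are introduced in the following sections (the choices shown here are purely illustrative):

```shell
# Run the asymptotic limit calculation on the tutorial card, setting the signal mass
# hypothesis and raising the verbosity to see more detail from the minimizer
combine -M AsymptoticLimits datacard_part1.txt -m 800 -v 1
```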
### A: Computing limits using the asymptotic approximation -As we are searching for a signal process that does not exist in the standard model, it's natural to set an upper limit on the cross section times branching fraction of the process (assuming our dataset does not contain a significant discovery of new physics). COMBINE has dedicated method for calculating upper limits. The most commonly used one is `AsymptoticLimits`, which implements the [CLs criterion](https://inspirehep.net/literature/599622) and uses the profile likelihood ratio as the test statistic. As the name implies, the test statistic distributions are determined analytically in the [asymptotic approximation](https://arxiv.org/abs/1007.1727), so there is no need for more time-intensive toy throwing and fitting. Try running the following command: +As we are searching for a signal process that does not exist in the standard model, it's natural to set an upper limit on the cross section times branching fraction of the process (assuming our dataset does not contain a significant discovery of new physics). Combine has dedicated method for calculating upper limits. The most commonly used one is `AsymptoticLimits`, which implements the [CLs criterion](https://inspirehep.net/literature/599622) and uses the profile likelihood ratio as the test statistic. As the name implies, the test statistic distributions are determined analytically in the [asymptotic approximation](https://arxiv.org/abs/1007.1727), so there is no need for more time-intensive toy throwing and fitting. Try running the following command: ```shell combine -M AsymptoticLimits datacard_part1.txt -n .part1A ``` -You should see the results of the observed and expected limit calculations printed to the screen. Here we have added an extra option, `-n .part1A`, which is short for `--name`, and is used to label the output file COMBINE produces, which in this case will be called `higgsCombine.part1A.AsymptoticLimits.mH120.root`. The file name depends on the options we ran with, and is of the form: `higgsCombine[name].[method].mH[mass].root`. The file contains a TTree called `limit` which stores the numerical values returned by the limit computation. Note that in our case we did not set a signal mass when running COMBINE (i.e. `-m 800`), so the output file just uses the default value of `120`. This does not affect our result in any way though, just the label that is used on the output file. +You should see the results of the observed and expected limit calculations printed to the screen. Here we have added an extra option, `-n .part1A`, which is short for `--name`, and is used to label the output file Combine produces, which in this case will be called `higgsCombine.part1A.AsymptoticLimits.mH120.root`. The file name depends on the options we ran with, and is of the form: `higgsCombine[name].[method].mH[mass].root`. The file contains a TTree called `limit` which stores the numerical values returned by the limit computation. Note that in our case we did not set a signal mass when running Combine (i.e. `-m 800`), so the output file just uses the default value of `120`. This does not affect our result in any way though, just the label that is used on the output file. The limits are given on a parameter called `r`. This is the default **parameter of interest (POI)** that is added to the model automatically. It is a linear scaling of the normalization of all signal processes given in the datacard, i.e. 
if $s_{i,j}$ is the nominal number of signal events in channel $i$ for signal process $j$, then the normalization of that signal in the model is given as $r\cdot s_{i,j}(\vec{\theta})$, where $\vec{\theta}$ represents the set of nuisance parameters which may also affect the signal normalization. We therefore have some choice in the interpretation of `r`: for the measurement of a process with a well-defined SM prediction we may enter this as the nominal yield in the datacard, such that $r=1$ corresponds to this SM expectation, whereas for setting limits on BSM processes we may choose the nominal yield to correspond to some cross section, e.g. 1 pb, such that we can interpret the limit as a cross section limit directly. In this example the signal has been normalised to a cross section times branching fraction of 1 fb.
@@ -116,19 +116,19 @@ In this case we only computed the values for one signal mass hypothesis, indicat
- Now try changing the number of observed events. The observed limit will naturally change, but the expected does too - why might this be?
-There are other command line options we can supply to COMBINE which will change its behaviour when run. You can see the full set of supported options by doing `combine -h`. Many options are specific to a given method, but others are more general and are applicable to all methods. Throughout this tutorial we will highlight some of the most useful options you may need to use, for example:
+There are other command line options we can supply to Combine which will change its behaviour when run. You can see the full set of supported options by doing `combine -h`. Many options are specific to a given method, but others are more general and are applicable to all methods. Throughout this tutorial we will highlight some of the most useful options you may need to use, for example:
- The range on the signal strength modifier: `--rMin=X` and `--rMax=Y`: In `RooFit` parameters can optionally have a range specified. The implication of this is that their values cannot be adjusted beyond the limits of this range. The min and max values can be adjusted though, and we might need to do this for our POI `r` if the order of magnitude of our measurement is different from the default range of `[0, 20]`. This will be discussed again later in the tutorial.
- Verbosity: `-v X`: By default combine does not usually produce much output on the screen other than the main result at the end. However, much more detailed information can be printed by setting `-v N` with N larger than zero. For example at `-v 3` the logs from the minimizer, Minuit, will also be printed. These are very useful for debugging problems with the fit.
### Advanced section: B: Computing limits with toys
-Now we will look at computing limits without the asymptotic approximation, so instead using toy datasets to determine the test statistic distributions under the signal+background and background-only hypotheses. This can be necessary if we are searching for signal in bins with a small number of events expected. In COMBINE we will use the `HybridNew` method to calculate limits using toys. This mode is capable of calculating limits with several different test statistics and with fine-grained control over how the toy datasets are generated internally. To calculate LHC-style profile likelihood limits (i.e. the same as we did with the asymptotic) we set the option `--LHCmode LHC-limits`. You can read more about the different options in the [Combine documentation](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/commonstatsmethods/#computing-limits-with-toys).
+Now we will look at computing limits without the asymptotic approximation, instead using toy datasets to determine the test statistic distributions under the signal+background and background-only hypotheses. This can be necessary if we are searching for signal in bins with a small number of events expected. In Combine we will use the `HybridNew` method to calculate limits using toys. This mode is capable of calculating limits with several different test statistics and with fine-grained control over how the toy datasets are generated internally. To calculate LHC-style profile likelihood limits (i.e. the same as we did with the asymptotic) we set the option `--LHCmode LHC-limits`. You can read more about the different options in the [Combine documentation](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/commonstatsmethods/#computing-limits-with-toys).
Run the following command:
```shell
combine -M HybridNew datacard_part1.txt --LHCmode LHC-limits -n .part1B --saveHybridResult --fork 0
```
-In contrast to `AsymptoticLimits` this will only determine the observed limit, and will take a few minutes. There will not be much output to the screen while combine is running. You can add the option `-v 1` to get a better idea of what is going on. You should see COMBINE stepping around in `r`, trying to find the value for which CLs = 0.05, i.e. the 95% CL limit. The `--saveHybridResult` option will cause the test statistic distributions that are generated at each tested value of `r` to be saved in the output ROOT file.
+In contrast to `AsymptoticLimits`, this will only determine the observed limit, and will take a few minutes. There will not be much output to the screen while Combine is running. You can add the option `-v 1` to get a better idea of what is going on. You should see Combine stepping around in `r`, trying to find the value for which CLs = 0.05, i.e. the 95% CL limit. The `--saveHybridResult` option will cause the test statistic distributions that are generated at each tested value of `r` to be saved in the output ROOT file.
To get an expected limit add the option `--expectedFromGrid X`, where `X` is the desired quantile, e.g. for the median:
@@ -136,7 +136,7 @@ To get an expected limit add the option `--expectedFromGrid X`, where `X` is the
combine -M HybridNew datacard_part1.txt --LHCmode LHC-limits -n .part1B --saveHybridResult --fork 0 --expectedFromGrid 0.500
```
-Calculate the median expected limit and the 68% range. The 95% range could also be done, but note it will take much longer to run the 0.025 quantile. While COMBINE is running you can move on to the next steps below.
+Calculate the median expected limit and the 68% range. The 95% range could also be done, but note it will take much longer to run the 0.025 quantile. While Combine is running you can move on to the next steps below.
**Tasks and questions:**
- In contrast to `AsymptoticLimits`, with `HybridNew` each limit comes with an uncertainty. What is the origin of this uncertainty?
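As a concrete sketch of the expected-limit instructions above (same datacard and options as before; the quantiles `0.160` and `0.840` bound the 68% band), the three `HybridNew` calls could be collected in a simple loop:

```shell
# Median expected limit plus the 68% expected band: one HybridNew call per quantile
for quantile in 0.160 0.500 0.840; do
    combine -M HybridNew datacard_part1.txt --LHCmode LHC-limits -n .part1B \
        --saveHybridResult --fork 0 --expectedFromGrid ${quantile}
done
```

Each call should produce its own output file, labelled with the corresponding quantile.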
@@ -169,7 +169,7 @@ Note that for more complex models the fitting time can increase significantly, m Topics covered in this section: - A: Setting up the datacard - - B: Running COMBINE for a blind analysis + - B: Running Combine for a blind analysis - C: Using FitDiagnostics - D: MC statistical uncertainties @@ -247,12 +247,12 @@ A more general way of blinding is to use combine's toy and Asimov dataset genera **Task:** Calculate a blind limit by generating a background-only Asimov with the `-t -1` option instead of using the `AsymptoticLimits` specific options. You should find the observed limit is the same as the expected. Then see what happens if you inject a signal into the Asimov dataset using the `--expectSignal [X]` option. ### C: Using FitDiagnostics -We will now explore one of the most commonly used modes of COMBINE: `FitDiagnostics` . As well as allowing us to make a **measurement** of some physical quantity (as opposed to just setting a limit on it), this method is useful to gain additional information about the model and the behaviour of the fit. It performs two fits: +We will now explore one of the most commonly used modes of Combine: `FitDiagnostics` . As well as allowing us to make a **measurement** of some physical quantity (as opposed to just setting a limit on it), this method is useful to gain additional information about the model and the behaviour of the fit. It performs two fits: - A "background-only" (b-only) fit: first POI (usually "r") fixed to zero - A "signal+background" (s+b) fit: all POIs are floating -With the s+b fit COMBINE will report the best-fit value of our signal strength modifier `r`. As well as the usual output file, a file named `fitDiagnosticsTest.root` is produced which contains additional information. In particular it includes two `RooFitResult` objects, one for the b-only and one for the s+b fit, which store the fitted values of all the **nuisance parameters (NPs)** and POIs as well as estimates of their uncertainties. The covariance matrix from both fits is also included, from which we can learn about the correlations between parameters. Run the `FitDiagnostics` method on our workspace: +With the s+b fit Combine will report the best-fit value of our signal strength modifier `r`. As well as the usual output file, a file named `fitDiagnosticsTest.root` is produced which contains additional information. In particular it includes two `RooFitResult` objects, one for the b-only and one for the s+b fit, which store the fitted values of all the **nuisance parameters (NPs)** and POIs as well as estimates of their uncertainties. The covariance matrix from both fits is also included, from which we can learn about the correlations between parameters. Run the `FitDiagnostics` method on our workspace: ```shell combine -M FitDiagnostics workspace_part2.root -m 800 --rMin -20 --rMax 20 @@ -334,13 +334,13 @@ The numbers in each column are respectively $\frac{\theta-\theta_I}{\sigma_I}$ ( So far there is an important source of uncertainty we have neglected. Our estimates of the backgrounds come either from MC simulation or from sideband regions in data, and in both cases these estimates are subject to a statistical uncertainty on the number of simulated or data events. In principle we should include an independent statistical uncertainty for every bin of every process in our model. 
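For orientation, a minimal sketch of the size of this effect (assuming a given bin is filled by weighted MC events with weights $w_{k}$): the predicted yield is $n = \sum_{k} w_{k}$, its statistical uncertainty is $\sqrt{\sum_{k} w_{k}^{2}}$, and the corresponding effective number of unweighted events is $n_{\mathrm{eff}} = \left(\sum_{k} w_{k}\right)^{2} / \sum_{k} w_{k}^{2}$.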
-It's important to note that COMBINE/`RooFit` does not take this into account automatically - statistical fluctuations of the data are implicitly accounted +It's important to note that Combine/`RooFit` does not take this into account automatically - statistical fluctuations of the data are implicitly accounted for in the likelihood formalism, but statistical uncertainties in the model must be specified by us. One way to implement these uncertainties is to create a `shape` uncertainty for each bin of each process, in which the up and down histograms have the contents of the bin shifted up and down by the $1\sigma$ uncertainty. However this makes the likelihood evaluation computationally inefficient, and can lead to a large number of nuisance parameters -in more complex models. Instead we will use a feature in COMBINE called `autoMCStats` that creates these automatically from the datacard, +in more complex models. Instead we will use a feature in Combine called `autoMCStats` that creates these automatically from the datacard, and uses a technique called "Barlow-Beeston-lite" to reduce the number of systematic uncertainties that are created. This works on the assumption that for high MC event counts we can model the uncertainty with a Gaussian distribution. Given the uncertainties in different bins are independent, the total uncertainty of several processes in a particular bin is just the sum of $N$ individual Gaussians, which is itself a Gaussian distribution. So instead of $N$ nuisance parameters we need only one. This breaks down when the number of events is small and we are not in the Gaussian regime. @@ -484,7 +484,7 @@ To produce these distributions add the `--saveShapes` and `--saveWithUncertainti combine -M FitDiagnostics workspace_part3.root -m 200 --rMin -1 --rMax 2 --saveShapes --saveWithUncertainties -n .part3B ``` -COMBINE will produce pre- and post-fit distributions (for fit_s and fit_b) in the fitDiagnosticsTest.root output file: +Combine will produce pre- and post-fit distributions (for fit_s and fit_b) in the fitDiagnosticsTest.root output file: ![](images/fit_diag_shapes.png) @@ -497,7 +497,7 @@ combine -M FitDiagnostics workspace_part3.root -m 200 --rMin -1 --rMax 2 --saveS ### D: Calculating the significance -In the event that you observe a deviation from your null hypothesis, in this case the b-only hypothesis, COMBINE can be used to calculate the p-value or significance. To do this using the asymptotic approximation simply do: +In the event that you observe a deviation from your null hypothesis, in this case the b-only hypothesis, Combine can be used to calculate the p-value or significance. To do this using the asymptotic approximation simply do: ```shell combine -M Significance workspace_part3.root -m 200 --rMin -1 --rMax 2 @@ -601,7 +601,7 @@ python plot1DScan.py higgsCombine.part3E.MultiDimFit.mH200.root --others 'higgsC ![](images/freeze_first_attempt.png) -This doesn't look quite right - the best-fit has been shifted because unfortunately the `--freezeParameters` option acts before the initial fit, whereas we only want to add it for the scan after this fit. To remedy this we can use a feature of COMBINE that lets us save a "snapshot" of the best-fit parameter values, and reuse this snapshot in subsequent fits. 
First we perform a single fit, adding the `--saveWorkspace` option: +This doesn't look quite right - the best-fit has been shifted because unfortunately the `--freezeParameters` option acts before the initial fit, whereas we only want to add it for the scan after this fit. To remedy this we can use a feature of Combine that lets us save a "snapshot" of the best-fit parameter values, and reuse this snapshot in subsequent fits. First we perform a single fit, adding the `--saveWorkspace` option: ```shell combine -M MultiDimFit workspace_part3.root -n .part3E.snapshot -m 200 --rMin -1 --rMax 2 --saveWorkspace @@ -642,7 +642,7 @@ While it is perfectly fine to just list the relevant nuisance parameters in the - How important are these tau-related uncertainties compared to the others? ### F: Use of channel masking -We will now return briefly to the topic of blinding. We've seen that we can compute expected results by performing any COMBINE method on an Asimov dataset generated using `-t -1`. This is useful, because we can optimise our analysis without introducing any accidental bias that might come from looking at the data in the signal region. However our control regions have been chosen specifically to be signal-free, and it would be useful to use the data here to set the normalisation of our backgrounds even while the signal region remains blinded. Unfortunately there's no easy way to generate a partial Asimov dataset just for the signal region, but instead we can use a feature called "channel masking" to remove specific channels from the likelihood evaluation. One useful application of this feature is to make post-fit plots of the signal region from a control-region-only fit. +We will now return briefly to the topic of blinding. We've seen that we can compute expected results by performing any Combine method on an Asimov dataset generated using `-t -1`. This is useful, because we can optimise our analysis without introducing any accidental bias that might come from looking at the data in the signal region. However our control regions have been chosen specifically to be signal-free, and it would be useful to use the data here to set the normalisation of our backgrounds even while the signal region remains blinded. Unfortunately there's no easy way to generate a partial Asimov dataset just for the signal region, but instead we can use a feature called "channel masking" to remove specific channels from the likelihood evaluation. One useful application of this feature is to make post-fit plots of the signal region from a control-region-only fit. To use the masking we first need to rerun `text2workspace.py` with an extra option that will create variables named like `mask_[channel]` in the workspace: @@ -659,7 +659,7 @@ Topics covered in this section: - A: Writing a simple physics model - B: Performing and plotting 2D likelihood scans -With COMBINE we are not limited to parametrising the signal with a single scaling parameter `r`. In fact we can define any arbitrary scaling using whatever functions and parameters we would like. +With Combine we are not limited to parametrising the signal with a single scaling parameter `r`. In fact we can define any arbitrary scaling using whatever functions and parameters we would like. For example, when measuring the couplings of the Higgs boson to the different SM particles we would introduce a POI for each coupling parameter, for example $\kappa_{\text{W}}$, $\kappa_{\text{Z}}$, $\kappa_{\tau}$ etc. 
We would then generate scaling terms for each $i\rightarrow \text{H}\rightarrow j$ process in terms of how the cross section ($\sigma_i(\kappa)$) and branching ratio ($\frac{\Gamma_i(\kappa)}{\Gamma_{\text{tot}}(\kappa)}$) scale relative to the SM prediction.
This parametrisation of the signal (and possibly backgrounds too) is specified in a **physics model**. This is a python class that is used by `text2workspace.py` to construct the model in terms of RooFit objects. There is documentation on using physics models [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part2/physicsmodels/#physics-models).
@@ -692,7 +692,7 @@ dasModel = DASModel()
In this we override two methods of the basic `PhysicsModel` class: `doParametersOfInterest` and `getYieldScale`. In the first we define our POI variables, using the `doVar` function which accepts the RooWorkspace factory syntax for creating variables, and then define all our POIs in a set via the `doSet` function. The second function will be called for every process in every channel (bin), and using the corresponding strings we have to specify how that process should be scaled. Here we check if the process was declared as signal in the datacard, and if so scale it by `r`, otherwise if it is a background no scaling is applied (`1`).
-To use the physics model with `text2workspace.py` first copy it to the python directory in the COMBINE package:
+To use the physics model with `text2workspace.py` first copy it to the python directory in the Combine package:
```shell
cp DASModel.py $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/python/
```
@@ -712,7 +712,7 @@ combine -M MultiDimFit workspace_part4.root -n .part4A -m 200 --rMin 0 --rMax 2
### B: Performing and plotting 2D likelihood scans
-For a model with two POIs it is often useful to look at the how well we are able to measure both simultaneously. A natural extension of determining 1D confidence intervals on a single parameter like we did in part 3D is to determine confidence level regions in 2D. To do this we also use combine in a similar way, with `-M MultiDimFit --algo grid`. When two POIs are found, COMBINE will scan a 2D grid of points instead of a 1D array.
+For a model with two POIs it is often useful to look at how well we are able to measure both simultaneously. A natural extension of determining 1D confidence intervals on a single parameter like we did in part 3D is to determine confidence level regions in 2D. To do this we also use Combine in a similar way, with `-M MultiDimFit --algo grid`. When two POIs are found, Combine will scan a 2D grid of points instead of a 1D array.
**Tasks and questions:**
diff --git a/docs/part5/roofit.md b/docs/part5/roofit.md
index a9c964f5c63..cbd8cf2d220 100644
--- a/docs/part5/roofit.md
+++ b/docs/part5/roofit.md
@@ -687,7 +687,7 @@ nominal_values = (MH=124.627 +/- 0.398094,resolution=1[C],norm_s=33.9097 +/- 11.
```
-This is exactly what needs to be done when you want to use shape based datacards in COMBINE with parametric models.
+This is exactly what needs to be done when you want to use shape based datacards in Combine with parametric models.
## A likelihood for a counting experiment
An introductory presentation about likelihoods and interval estimation is available [here](https://indico.cern.ch/event/976099/contributions/4138517/).
@@ -791,7 +791,7 @@ canv.SaveAs("likelihoodscan.pdf")
```
Well, this is doable - but we were only looking at a simple one-bin counting experiment. This might become rather cumbersome for large models...
$[*]$
-We will now switch to COMBINE which will make it a lot easier to set up your model and do the statistical analysis than trying to build the likelihood yourself.
+We will now switch to Combine, which will make it a lot easier to set up your model and do the statistical analysis than trying to build the likelihood yourself.
$[*]$ Side note - `RooFit` does have additional functionality to help with statistical model building, but we will not go into detail today.
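To make the contrast concrete, here is a minimal sketch of how a one-bin counting experiment like this can be described to Combine as a datacard. The file name, yields and the single log-normal uncertainty below are purely illustrative and are not the numbers from the RooFit example above:

```shell
# Write a toy one-bin counting datacard (hypothetical numbers) and run an asymptotic limit on it
cat > counting_datacard.txt <<'EOF'
imax 1  number of channels
jmax 1  number of backgrounds
kmax 1  number of nuisance parameters
-------------------------------------
bin         ch1
observation 10
-------------------------------------
bin         ch1   ch1
process     sig   bkg
process     0     1
rate        4.0   9.0
-------------------------------------
lumi  lnN   1.025 1.025
EOF
combine -M AsymptoticLimits counting_datacard.txt
```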