Skip to content

Commit

Permalink
Merge pull request #842 from cms-analysis/interference
Browse files Browse the repository at this point in the history
Accelerated interference models
  • Loading branch information
anigamova authored Nov 6, 2023
2 parents 7be2d15 + 6ca7196 commit bed1eed
Show file tree
Hide file tree
Showing 16 changed files with 878 additions and 15 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -65,3 +65,9 @@ jobs:
run: |
cd HiggsAnalysis/CombinedLimit
make CONDA=1 -j 2
- name: Run tests
if: startsWith(matrix.python, '3.')
shell: bash -l {0}
run: |
cd HiggsAnalysis/CombinedLimit/test
python test_interference.py
5 changes: 3 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -53,11 +53,12 @@ DICTNAME=$(LIBNAME)_xr
# Linker and flags -------------------------------------------------------------
LD = $(shell root-config --ld)
ROOTLDFLAGS = $(shell root-config --ldflags)
ROOTLIBDIR = $(shell root-config --libdir)
# OS x specific linkage
DARWIN := $(shell uname|grep Darwin)
ifdef DARWIN
LDFLAGS = $(ROOTLDFLAGS) -shared -install_name @rpath/$(SONAME) -fPIC
EXELDFLAGS = -Wl,-rpath,'@executable_path/../lib'
LDFLAGS = $(ROOTLDFLAGS) -g -shared -install_name @rpath/$(SONAME) -fPIC
EXELDFLAGS = -Wl,-rpath,'@executable_path/../lib' -Wl,-rpath,$(ROOTLIBDIR)
else
LDFLAGS = $(ROOTLDFLAGS) -shared -Wl,-soname,$(SONAME) -Wl,-E -Wl,-z,defs -fPIC
EXELDFLAGS =
Expand Down
120 changes: 115 additions & 5 deletions docs/part2/physicsmodels.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,19 +213,19 @@ The `PhysicsModel` that encodes the signal model above is the [twoHypothesisHigg

The two processes (S and S_ALT) will get different scaling parameters. The LEP-style likelihood for hypothesis testing can now be performed by setting **x** or **not_x** to 1 and 0 and comparing two likelihood evaluations.

### Interference
### Signal-background interference

Since there are no such things as negative probability distribution functions, the recommended way to implement this is to start from the expression for the individual amplitudes and the parameter of interest $k$,
Since there are no such things as negative probability distribution functions, the recommended way to implement this is to start from the expression for the individual amplitudes $A$ and the parameter of interest $k$,

$$
\mathrm{Yield} = (k * A_{s} + A_{b})^2
= k^2 * A_{s}^2 + k * 2 A_{s} A_{b} + A_{b}^2
\mathrm{Yield} = |k * A_{s} + A_{b}|^2
= k^2 * |A_{s}|^2 + k * 2 \Re(A_{s}^* A_{b}) + |A_{b}|^2
= \mu * S + \sqrt{\mu} * I + B
$$

where

$\mu = k^2, ~S = A_{s}^2,~B = Ab^2$ and $ S+B+I = (As + Ab)^2$.
$\mu = k^2, ~S = |A_{s}|^2,~B = |A_b|^2$ and $S+B+I = |A_s + A_b|^2$.

With some algebra you can work out that,

Expand All @@ -244,3 +244,113 @@ An example of this scheme is implemented in a [HiggsWidth](https://gitlab.cern.c
self.modelBuilder.factory_( "expr::qqH_b_func(\"1-sqrt(@0)\", CMS_zz4l_mu)")
self.modelBuilder.factory_( "expr::qqH_sbi_func(\"sqrt(@0)\", CMS_zz4l_mu)")
```

### Multi-process interference

The above formulation can be extended to multiple parameters of interest
(POIs). See
[AnalyticAnomalousCoupling](https://github.com/amassiro/AnalyticAnomalousCoupling)
for an example. However, the computational performance scales quadratically
with the number of POIs, and can get extremely expensive for 10 or more, as may
be encountered often with EFT analyses. To alleviate this issue, an accelerated
interference modeling technique is implemented for template-based analyses via
the `interferenceModel` physics model. In this model, each bin yield $y$ is parameterized
$$y(\theta) = y_0 (\theta^\top M \theta)$$
as a function of the POI vector $\theta$, a nominal template $y_0$, and a scaling matrix $M$.
To see how this parameterization relates to that of the previous section, we can define:

$$
y_0 = A_b^2, \qquad
M = \frac{1}{A_b^2} \begin{bmatrix}
|A_s|^2 & \Re(A_s^* A_b) \\
\Re(A_s A_b^*) & |A_b|^2
\end{bmatrix}, \qquad \theta = \begin{bmatrix}
\sqrt{\mu} \\
1
\end{bmatrix}
$$

which leads to the same parameterization. At present, this technique only works with
`CMSHistFunc`-based workspaces, as these are the most common workspace types
encountered and the default when using
[autoMCStats](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part2/bin-wise-stats).
To use this model, for each bin find $y_0$ and put it into the datacard as a signal process, then find $M$ and
save the lower triangular component as an array in a `scaling.json` file with a
syntax as follows:

```json
[
{
"channel": "my_channel",
"process": "my_nominal_process",
"parameters": ["sqrt_mu[1,0,2]", "Bscaling[1]"],
"scaling": [
[0.5, 0.1, 1.0],
[0.6, 0.2, 1.0],
[0.7, 0.3, 1.0]
]
}
]
```

where the parameters are declared using RooFit's [factory
syntax](https://root.cern.ch/doc/v622/classRooWorkspace.html#a0ddded1d65f5c6c4732a7a3daa8d16b0)
and each row of the `scaling` field represents the scaling information of a bin, e.g. if $y_0 = |A_b|^2$
then each row would contain three entries:
$$|A_s|^2 / |A_b|^2,\quad \Re(A_s^* A_b)/|A_b|^2,\quad 1$$

For several coefficients, one would enumerate as follows:
```python
scaling = []
for ibin in range(nbins):
binscaling = []
for icoef in range(ncoef):
for jcoef in range(icoef + 1):
binscaling.append(amplitude_squared_for(ibin, icoef, jcoef))
scaling.append(binscaling)
```

Then, to construct the workspace, run

```bash
text2workspace.py card.txt -P HiggsAnalysis.CombinedLimit.InterferenceModels:interferenceModel \
--PO verbose --PO scalingData=scaling.json
```
For large amounts of scaling data, you can optionally use gzipped json (`.json.gz`) or pickle (`.pkl.gz`)
files with 2D numpy arrays for the scaling coefficients instead of lists. The function `numpy.tril_indices(ncoef)`
is helpful for extracting the lower triangle of a square matrix.

You could pick any
nominal template, and adjust the scaling as appropriate. Generally it is
advisable to use a nominal template corresponding to near where you expect the
POIs to land so that the shape systematic effects are well-modeled in that
region.

It may be the case that the relative contributions of the terms are themselves
a function of the POIs. For example, in VBF di-Higgs production, BSM
modifications to the production rate can be parameterized in the "kappa"
framework via three diagrams, with scaling coefficients $\kappa_V \kappa_\lambda$,
$\kappa_V^2$, and $\kappa_{2V}$, respectively, that interfere. In
that case, you can declare formulas with the factory syntax to represent each
amplitude as follows:

```json
[
{
"channel": "a_vbf_channel",
"process": "VBFHH",
"parameters": ["expr::a0('@0*@1', kv[1,0,2], kl[1,0,2])", "expr::a1('@0*@0', kv[1,0,2])", "k2v[1,0,2]"],
"scaling": [
[3.30353674666415, -8.54170982038222, 22.96464188467882, 4.2353483207128, -11.07996258835088, 5.504469544697623],
[2.20644332142891, -7.076836641962523, 23.50989689214267, 4.053185685866683, -13.08569222837996, 7.502346155380032]
]
}
]
```

However, you will need to manually specify what the POIs should be when creating the workspace using the `POIs=` physics option, e.g.

```bash
text2workspace.py card.txt -P HiggsAnalysis.CombinedLimit.InterferenceModels:interferenceModel \
--PO scalingData=scaling.json --PO 'POIs=kl[1,0,2]:kv[1,0,2]:k2v[1,0,2]'
```
45 changes: 45 additions & 0 deletions interface/CMSExternalMorph.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
#ifndef CMSExternalMorph_h
#define CMSExternalMorph_h
#include <vector>

#include "RooAbsReal.h"
#include "RooRealVar.h"
#include "RooRealProxy.h"

class CMSExternalMorph : public RooAbsReal {
public:
CMSExternalMorph();
/*
* All subclasses need to provide an edges array of length nbins+1
* of the observable (x)
* TODO: CMSHistFunc and CMSHistSum do not check the binning is compatible
* with their binning other than having the correct length
*/
CMSExternalMorph(
const char* name,
const char* title,
RooRealVar& x,
const std::vector<double>& edges
);
CMSExternalMorph(CMSExternalMorph const& other, const char* name = 0);
virtual ~CMSExternalMorph();

/* Batch accessor for CMSHistFunc / CMSHistSum, to be overriden by concrete
* implementations. hasChanged() should indicate whether or not
* batchGetBinValues() would return a new vector, given the state of
* any dependent variables.
*/
virtual bool hasChanged() const = 0;
virtual const std::vector<double>& batchGetBinValues() const = 0;

protected:
RooRealProxy x_;
std::vector<double> edges_;

double evaluate() const;

private:
ClassDef(CMSExternalMorph, 1)
};

#endif // CMSExternalMorph_h
9 changes: 7 additions & 2 deletions interface/CMSHistFunc.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#include "CMSHistV.h"
#include "FastTemplate_Old.h"
#include "SimpleCacheSentry.h"
#include "CMSExternalMorph.h"

class CMSHistFuncWrapper;

Expand Down Expand Up @@ -85,7 +86,7 @@ class CMSHistFunc : public RooAbsReal {
virtual TObject* clone(const char* newname) const {
return new CMSHistFunc(*this, newname);
}
virtual ~CMSHistFunc() {}
virtual ~CMSHistFunc();

void addHorizontalMorph(RooAbsReal& hvar, TVectorD hpoints);

Expand Down Expand Up @@ -148,6 +149,7 @@ class CMSHistFunc : public RooAbsReal {
friend class CMSHistV<CMSHistFunc>;
friend class CMSHistSum;

void injectExternalMorph(CMSExternalMorph& morph);
/*
– RooAbsArg::setVerboseEval(Int_t level) • Level 0 – No messages
Expand Down Expand Up @@ -190,6 +192,9 @@ class CMSHistFunc : public RooAbsReal {

static bool enable_fast_vertical_; //! not to be serialized

// This is an "optional proxy", i.e. a list with either zero or one entry
RooListProxy external_morph_;

private:
void initialize() const;
void setGlobalCache() const;
Expand All @@ -214,7 +219,7 @@ class CMSHistFunc : public RooAbsReal {

void applyRebin() const;

ClassDef(CMSHistFunc, 1)
ClassDef(CMSHistFunc, 2)
};


Expand Down
9 changes: 8 additions & 1 deletion interface/CMSHistSum.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,13 @@ class CMSHistSum : public RooAbsReal {
RooArgList const& coefList() const { return coeffpars_; }
// RooArgList const& funcList() const { return funcs_; }

RooAbsReal const& getXVar() const { return x_.arg(); }

static void EnableFastVertical();
friend class CMSHistV<CMSHistSum>;

void injectExternalMorph(int idx, CMSExternalMorph& morph);

protected:
RooRealProxy x_;

Expand Down Expand Up @@ -128,6 +132,9 @@ class CMSHistSum : public RooAbsReal {
mutable int fast_mode_; //! not to be serialized
static bool enable_fast_vertical_; //! not to be serialized

RooListProxy external_morphs_;
std::vector<int> external_morph_indices_;

inline int& morphField(int const& ip, int const& iv) {
return vmorph_fields_[ip * n_morphs_ + iv];
}
Expand All @@ -143,7 +150,7 @@ class CMSHistSum : public RooAbsReal {


private:
ClassDef(CMSHistSum,1)
ClassDef(CMSHistSum,2)
};

#endif
55 changes: 55 additions & 0 deletions interface/CMSInterferenceFunc.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#ifndef CMSInterferenceFunc_h
#define CMSInterferenceFunc_h
#include "RooListProxy.h"
#include "SimpleCacheSentry.h"

#include "CMSExternalMorph.h"

class _InterferenceEval;

class CMSInterferenceFunc : public CMSExternalMorph {
public:
CMSInterferenceFunc();
CMSInterferenceFunc(CMSInterferenceFunc const& other, const char* name = 0);
/*
* For a coefficients list of length n and edges array of length b+1,
* the binscaling nested vector should have b entries (for b bins) with
* each being of length n*(n+1)/2, corresponding to the lower triangular
* elements of the scaling matrix, i.e. (m_00, m_10, m_11, m_20, m_21, m_22, ...)
*/
CMSInterferenceFunc(
const char* name,
const char* title,
RooRealVar& x,
const std::vector<double>& edges,
const RooArgList& coefficients,
const std::vector<std::vector<double>> binscaling
);
virtual ~CMSInterferenceFunc();

virtual TObject* clone(const char* newname) const override {
return new CMSInterferenceFunc(*this, newname);
};

void printMultiline(
std::ostream& os, Int_t contents, Bool_t verbose, TString indent
) const override;

bool hasChanged() const override { return !sentry_.good(); };
const std::vector<double>& batchGetBinValues() const override;

protected:
RooListProxy coefficients_;
std::vector<std::vector<double>> binscaling_;

mutable SimpleCacheSentry sentry_; //!
mutable std::unique_ptr<_InterferenceEval> evaluator_; //!

private:
void initialize() const;
void updateCache() const;

ClassDefOverride(CMSInterferenceFunc, 1)
};

#endif // CMSInterferenceFunc_h
Loading

0 comments on commit bed1eed

Please sign in to comment.