Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerated interference models #842

Merged
merged 25 commits into from
Nov 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
2fd2c11
Working CMSInterferenceFunc
nsmith- Jun 9, 2023
8c494e8
Give up on RooChangeTracker, add hook in CMSHistFunc
nsmith- Jun 13, 2023
6619f7e
Add test to exercise functionality
nsmith- Jun 13, 2023
a62418d
Lint
nsmith- Jun 13, 2023
7bc66a4
Lint correctly
nsmith- Jun 13, 2023
1473ddf
Errant space is a problem on linux
nsmith- Jun 13, 2023
76fa6ac
Use static cast for speed
nsmith- Aug 22, 2023
7f90b0c
Implement a PhysicsModel to set up interference morphing
nsmith- Aug 23, 2023
45dd13b
Docs for interference model
nsmith- Aug 23, 2023
6f5e6de
Some clarifications in docs
nsmith- Aug 23, 2023
a7737f9
Improve interference docs
nsmith- Aug 25, 2023
171431a
Actually test the CMSInterferenceFunc against reference values
nsmith- Aug 29, 2023
886335c
Switch to lower triangular ordering
nsmith- Aug 29, 2023
3d99a6f
Also switch to lower triangular in docs
nsmith- Aug 29, 2023
f99a2ee
Python2 compatibility 😢😢😢
nsmith- Aug 29, 2023
9e40e3c
Ensure contiguous
nsmith- Aug 30, 2023
5b061b9
Allow robustHesse without a fit
nsmith- Aug 30, 2023
e80071c
Actually run test
nsmith- Oct 27, 2023
3e25193
Implement external morph for CMSHistSum as well
nsmith- Oct 31, 2023
06a807f
Merge remote-tracking branch 'origin/main' into interference
nsmith- Oct 31, 2023
6123b59
Refactor CMSInterferenceFunc to use base class CMSExternalMorph
nsmith- Nov 1, 2023
8857916
Only run tests in python3
nsmith- Nov 1, 2023
334e052
Remove f-string
nsmith- Nov 1, 2023
484a45e
Unique external morph name
nsmith- Nov 1, 2023
6ca7196
Sometimes coefficient has a different name
nsmith- Nov 2, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -65,3 +65,9 @@ jobs:
run: |
cd HiggsAnalysis/CombinedLimit
make CONDA=1 -j 2
- name: Run tests
if: startsWith(matrix.python, '3.')
shell: bash -l {0}
run: |
cd HiggsAnalysis/CombinedLimit/test
python test_interference.py
5 changes: 3 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -53,11 +53,12 @@ DICTNAME=$(LIBNAME)_xr
# Linker and flags -------------------------------------------------------------
LD = $(shell root-config --ld)
ROOTLDFLAGS = $(shell root-config --ldflags)
ROOTLIBDIR = $(shell root-config --libdir)
# OS x specific linkage
DARWIN := $(shell uname|grep Darwin)
ifdef DARWIN
LDFLAGS = $(ROOTLDFLAGS) -shared -install_name @rpath/$(SONAME) -fPIC
EXELDFLAGS = -Wl,-rpath,'@executable_path/../lib'
LDFLAGS = $(ROOTLDFLAGS) -g -shared -install_name @rpath/$(SONAME) -fPIC
EXELDFLAGS = -Wl,-rpath,'@executable_path/../lib' -Wl,-rpath,$(ROOTLIBDIR)
else
LDFLAGS = $(ROOTLDFLAGS) -shared -Wl,-soname,$(SONAME) -Wl,-E -Wl,-z,defs -fPIC
EXELDFLAGS =
Expand Down
120 changes: 115 additions & 5 deletions docs/part2/physicsmodels.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,19 +213,19 @@ The `PhysicsModel` that encodes the signal model above is the [twoHypothesisHigg

The two processes (S and S_ALT) will get different scaling parameters. The LEP-style likelihood for hypothesis testing can now be performed by setting **x** or **not_x** to 1 and 0 and comparing two likelihood evaluations.

### Interference
### Signal-background interference

Since there are no such things as negative probability distribution functions, the recommended way to implement this is to start from the expression for the individual amplitudes and the parameter of interest $k$,
Since there are no such things as negative probability distribution functions, the recommended way to implement this is to start from the expression for the individual amplitudes $A$ and the parameter of interest $k$,

$$
\mathrm{Yield} = (k * A_{s} + A_{b})^2
= k^2 * A_{s}^2 + k * 2 A_{s} A_{b} + A_{b}^2
\mathrm{Yield} = |k * A_{s} + A_{b}|^2
= k^2 * |A_{s}|^2 + k * 2 \Re(A_{s}^* A_{b}) + |A_{b}|^2
= \mu * S + \sqrt{\mu} * I + B
$$

where

$\mu = k^2, ~S = A_{s}^2,~B = Ab^2$ and $ S+B+I = (As + Ab)^2$.
$\mu = k^2, ~S = |A_{s}|^2,~B = |A_b|^2$ and $S+B+I = |A_s + A_b|^2$.

With some algebra you can work out that,

Expand All @@ -244,3 +244,113 @@ An example of this scheme is implemented in a [HiggsWidth](https://gitlab.cern.c
self.modelBuilder.factory_( "expr::qqH_b_func(\"1-sqrt(@0)\", CMS_zz4l_mu)")
self.modelBuilder.factory_( "expr::qqH_sbi_func(\"sqrt(@0)\", CMS_zz4l_mu)")
```

### Multi-process interference

The above formulation can be extended to multiple parameters of interest
(POIs). See
[AnalyticAnomalousCoupling](https://github.com/amassiro/AnalyticAnomalousCoupling)
for an example. However, the computational performance scales quadratically
with the number of POIs, and can get extremely expensive for 10 or more, as may
be encountered often with EFT analyses. To alleviate this issue, an accelerated
interference modeling technique is implemented for template-based analyses via
the `interferenceModel` physics model. In this model, each bin yield $y$ is parameterized
$$y(\theta) = y_0 (\theta^\top M \theta)$$
as a function of the POI vector $\theta$, a nominal template $y_0$, and a scaling matrix $M$.
To see how this parameterization relates to that of the previous section, we can define:

$$
y_0 = A_b^2, \qquad
M = \frac{1}{A_b^2} \begin{bmatrix}
|A_s|^2 & \Re(A_s^* A_b) \\
\Re(A_s A_b^*) & |A_b|^2
\end{bmatrix}, \qquad \theta = \begin{bmatrix}
\sqrt{\mu} \\
1
\end{bmatrix}
$$

which leads to the same parameterization. At present, this technique only works with
`CMSHistFunc`-based workspaces, as these are the most common workspace types
encountered and the default when using
[autoMCStats](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part2/bin-wise-stats).
To use this model, for each bin find $y_0$ and put it into the datacard as a signal process, then find $M$ and
save the lower triangular component as an array in a `scaling.json` file with a
syntax as follows:

```json
[
{
"channel": "my_channel",
"process": "my_nominal_process",
"parameters": ["sqrt_mu[1,0,2]", "Bscaling[1]"],
"scaling": [
[0.5, 0.1, 1.0],
[0.6, 0.2, 1.0],
[0.7, 0.3, 1.0]
]
}
]
```

where the parameters are declared using RooFit's [factory
syntax](https://root.cern.ch/doc/v622/classRooWorkspace.html#a0ddded1d65f5c6c4732a7a3daa8d16b0)
and each row of the `scaling` field represents the scaling information of a bin, e.g. if $y_0 = |A_b|^2$
then each row would contain three entries:
$$|A_s|^2 / |A_b|^2,\quad \Re(A_s^* A_b)/|A_b|^2,\quad 1$$

For several coefficients, one would enumerate as follows:
```python
scaling = []
for ibin in range(nbins):
binscaling = []
for icoef in range(ncoef):
for jcoef in range(icoef + 1):
binscaling.append(amplitude_squared_for(ibin, icoef, jcoef))
scaling.append(binscaling)
```

Then, to construct the workspace, run

```bash
text2workspace.py card.txt -P HiggsAnalysis.CombinedLimit.InterferenceModels:interferenceModel \
--PO verbose --PO scalingData=scaling.json
```
For large amounts of scaling data, you can optionally use gzipped json (`.json.gz`) or pickle (`.pkl.gz`)
files with 2D numpy arrays for the scaling coefficients instead of lists. The function `numpy.tril_indices(ncoef)`
is helpful for extracting the lower triangle of a square matrix.

You could pick any
nominal template, and adjust the scaling as appropriate. Generally it is
advisable to use a nominal template corresponding to near where you expect the
POIs to land so that the shape systematic effects are well-modeled in that
region.

It may be the case that the relative contributions of the terms are themselves
a function of the POIs. For example, in VBF di-Higgs production, BSM
modifications to the production rate can be parameterized in the "kappa"
framework via three diagrams, with scaling coefficients $\kappa_V \kappa_\lambda$,
$\kappa_V^2$, and $\kappa_{2V}$, respectively, that interfere. In
that case, you can declare formulas with the factory syntax to represent each
amplitude as follows:

```json
[
{
"channel": "a_vbf_channel",
"process": "VBFHH",
"parameters": ["expr::a0('@0*@1', kv[1,0,2], kl[1,0,2])", "expr::a1('@0*@0', kv[1,0,2])", "k2v[1,0,2]"],
"scaling": [
[3.30353674666415, -8.54170982038222, 22.96464188467882, 4.2353483207128, -11.07996258835088, 5.504469544697623],
[2.20644332142891, -7.076836641962523, 23.50989689214267, 4.053185685866683, -13.08569222837996, 7.502346155380032]
]
}
]
```

However, you will need to manually specify what the POIs should be when creating the workspace using the `POIs=` physics option, e.g.

```bash
text2workspace.py card.txt -P HiggsAnalysis.CombinedLimit.InterferenceModels:interferenceModel \
--PO scalingData=scaling.json --PO 'POIs=kl[1,0,2]:kv[1,0,2]:k2v[1,0,2]'
```
45 changes: 45 additions & 0 deletions interface/CMSExternalMorph.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
#ifndef CMSExternalMorph_h
#define CMSExternalMorph_h
#include <vector>

#include "RooAbsReal.h"
#include "RooRealVar.h"
#include "RooRealProxy.h"

class CMSExternalMorph : public RooAbsReal {
public:
CMSExternalMorph();
/*
* All subclasses need to provide an edges array of length nbins+1
* of the observable (x)
* TODO: CMSHistFunc and CMSHistSum do not check the binning is compatible
* with their binning other than having the correct length
*/
CMSExternalMorph(
const char* name,
const char* title,
RooRealVar& x,
const std::vector<double>& edges
);
CMSExternalMorph(CMSExternalMorph const& other, const char* name = 0);
virtual ~CMSExternalMorph();

/* Batch accessor for CMSHistFunc / CMSHistSum, to be overriden by concrete
* implementations. hasChanged() should indicate whether or not
* batchGetBinValues() would return a new vector, given the state of
* any dependent variables.
*/
virtual bool hasChanged() const = 0;
virtual const std::vector<double>& batchGetBinValues() const = 0;

protected:
RooRealProxy x_;
std::vector<double> edges_;

double evaluate() const;

private:
ClassDef(CMSExternalMorph, 1)
};

#endif // CMSExternalMorph_h
9 changes: 7 additions & 2 deletions interface/CMSHistFunc.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#include "CMSHistV.h"
#include "FastTemplate_Old.h"
#include "SimpleCacheSentry.h"
#include "CMSExternalMorph.h"

class CMSHistFuncWrapper;

Expand Down Expand Up @@ -85,7 +86,7 @@ class CMSHistFunc : public RooAbsReal {
virtual TObject* clone(const char* newname) const {
return new CMSHistFunc(*this, newname);
}
virtual ~CMSHistFunc() {}
virtual ~CMSHistFunc();

void addHorizontalMorph(RooAbsReal& hvar, TVectorD hpoints);

Expand Down Expand Up @@ -148,6 +149,7 @@ class CMSHistFunc : public RooAbsReal {
friend class CMSHistV<CMSHistFunc>;
friend class CMSHistSum;

void injectExternalMorph(CMSExternalMorph& morph);
/*

– RooAbsArg::setVerboseEval(Int_t level) • Level 0 – No messages
Expand Down Expand Up @@ -190,6 +192,9 @@ class CMSHistFunc : public RooAbsReal {

static bool enable_fast_vertical_; //! not to be serialized

// This is an "optional proxy", i.e. a list with either zero or one entry
RooListProxy external_morph_;

private:
void initialize() const;
void setGlobalCache() const;
Expand All @@ -214,7 +219,7 @@ class CMSHistFunc : public RooAbsReal {

void applyRebin() const;

ClassDef(CMSHistFunc, 1)
ClassDef(CMSHistFunc, 2)
};


Expand Down
9 changes: 8 additions & 1 deletion interface/CMSHistSum.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,13 @@ class CMSHistSum : public RooAbsReal {
RooArgList const& coefList() const { return coeffpars_; }
// RooArgList const& funcList() const { return funcs_; }

RooAbsReal const& getXVar() const { return x_.arg(); }

static void EnableFastVertical();
friend class CMSHistV<CMSHistSum>;

void injectExternalMorph(int idx, CMSExternalMorph& morph);

protected:
RooRealProxy x_;

Expand Down Expand Up @@ -128,6 +132,9 @@ class CMSHistSum : public RooAbsReal {
mutable int fast_mode_; //! not to be serialized
static bool enable_fast_vertical_; //! not to be serialized

RooListProxy external_morphs_;
std::vector<int> external_morph_indices_;

inline int& morphField(int const& ip, int const& iv) {
return vmorph_fields_[ip * n_morphs_ + iv];
}
Expand All @@ -143,7 +150,7 @@ class CMSHistSum : public RooAbsReal {


private:
ClassDef(CMSHistSum,1)
ClassDef(CMSHistSum,2)
};

#endif
55 changes: 55 additions & 0 deletions interface/CMSInterferenceFunc.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#ifndef CMSInterferenceFunc_h
#define CMSInterferenceFunc_h
#include "RooListProxy.h"
#include "SimpleCacheSentry.h"

#include "CMSExternalMorph.h"

class _InterferenceEval;

class CMSInterferenceFunc : public CMSExternalMorph {
public:
CMSInterferenceFunc();
CMSInterferenceFunc(CMSInterferenceFunc const& other, const char* name = 0);
/*
* For a coefficients list of length n and edges array of length b+1,
* the binscaling nested vector should have b entries (for b bins) with
* each being of length n*(n+1)/2, corresponding to the lower triangular
* elements of the scaling matrix, i.e. (m_00, m_10, m_11, m_20, m_21, m_22, ...)
*/
CMSInterferenceFunc(
const char* name,
const char* title,
RooRealVar& x,
const std::vector<double>& edges,
const RooArgList& coefficients,
const std::vector<std::vector<double>> binscaling
);
virtual ~CMSInterferenceFunc();

virtual TObject* clone(const char* newname) const override {
return new CMSInterferenceFunc(*this, newname);
};

void printMultiline(
std::ostream& os, Int_t contents, Bool_t verbose, TString indent
) const override;

bool hasChanged() const override { return !sentry_.good(); };
const std::vector<double>& batchGetBinValues() const override;

protected:
RooListProxy coefficients_;
std::vector<std::vector<double>> binscaling_;

mutable SimpleCacheSentry sentry_; //!
mutable std::unique_ptr<_InterferenceEval> evaluator_; //!

private:
void initialize() const;
void updateCache() const;

ClassDefOverride(CMSInterferenceFunc, 1)
};

#endif // CMSInterferenceFunc_h
Loading
Loading