Finding uncertainty on parameter of interest with non-parabolic likelihood #844
-
Hello, I am currently trying to find a suitable library to perform some simple fits. I have a histogram of pseudo-data (exactly following the Standard Model prediction) with a covariance matrix, and a histogram of possible deviations (beyond-the-Standard-Model contributions). Essentially, this amounts to a parameter inference for which a likelihood can be specified. I assumed that I could perform such a fit with iminuit using an UnbinnedNLL cost function, since I treat the bin contents as observations that should be compared with an array of means "data_hist + a * a * deviations_hist". I tried to implement this in the following code; for illustrative purposes I start with a linear deviation (a * deviations_hist) because, in that case, it works well.
This returns a 68% interval of [-0.707, 0.707], which agrees with my expectation from the argument that the (2 * negative log-likelihood + 1) contour should determine the 1-sigma interval of the parameter of interest: 2 * a^2 - 1 = 0 (we have two bins) leads to a_68% = +- 1/sqrt(2), which is approximately 0.707. However, it breaks down when I specify a quadratic scaling of possible deviations by simply changing the model to "data_hist + a * a * deviations_hist". I tried to tune the initial value and the limits a bit and found that a = 0.05 with limits from -1 to 1 yields a decent, but not very good, result: a best fit of 0.05 with a down uncertainty of 1.05 and an up uncertainty of 1.03. Note that the analytic calculation of the boundaries of the 68% interval yields a_68% = +- (1/2)^(1/4), approximately 0.8408964. I am quite puzzled that the minimisation returns the given initial value of 0.05 as the best fit, because the verbose printouts show that the FCN is smaller towards 0. I would appreciate any guidance on how to obtain the uncertainties / the upper limit on the "a" parameter when it enters quadratically. Thanks a lot! Best,
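[Editor's note: the two boundaries quoted in the question can be cross-checked numerically. This sketch is not part of the original post; it assumes the same toy setup used throughout the thread: two unit-variance, uncorrelated bins, so 2 * delta-NLL equals 2a^2 for linear coupling and 2a^4 for quadratic coupling.]

```python
# Cross-check of the quoted 68% boundaries for linear and quadratic coupling.
# Assumes two bins with unit variance and no correlations, so each bin
# contributes resid^2 / 2 to the NLL, where resid = a (linear) or a^2 (quadratic).
import numpy as np
from scipy.optimize import brentq

def two_delta_nll(a, quadratic):
    resid = a * a if quadratic else a     # per-bin shift in units of sigma
    return 2 * (2 * 0.5 * resid**2)       # 2 * NLL difference, two bins

# Solve 2 * delta-NLL = 1 for the positive boundary in each case.
a_lin = brentq(lambda a: two_delta_nll(a, False) - 1.0, 0.0, 2.0)
a_quad = brentq(lambda a: two_delta_nll(a, True) - 1.0, 0.0, 2.0)
print(a_lin)   # ~0.7071 = 1/sqrt(2)
print(a_quad)  # ~0.8409 = (1/2)**0.25
```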
-
You have to call `flatten` in your model, because the data you want to fit is 1D; the model prediction must match the data. (You can also fit multivariate data with a multivariate model with iminuit.) If you have binned data (a histogram), you cannot use UnbinnedNLL; that is for unbinned samples. That you need to set `m.errordef = Minuit.LEAST_SQUARES` is a consequence of the misuse of UnbinnedNLL; it would be wrong if you used UnbinnedNLL in its correct context. The correct likelihood for binned data is BinnedNLL, but you cannot use that either, because your histogram does not contain Poisson-distributed counts but has bin-to-bin correlations, which are captured by the covariance matrix.

There is currently no builtin cost function in iminuit.cost for your case, but it is easy to write a likelihood from scratch. You assume that the data is distributed according to a multivariate normal, so the log-likelihood is logL = mvnorm.logpdf(observation, prediction, covariance), where observation and prediction are vectors and covariance is a matrix. There is no sum here, because you have a single observation, the histogram. iminuit expects a negative log-likelihood, so your cost function looks like this:

```python
from iminuit import Minuit
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal as mvnorm

# likelihood has a single free parameter "a"
def nll(a):
    SM = np.ones(2)
    BSM = np.ones(2)
    cov = np.eye(2)  # can also use a non-diagonal covariance matrix; diagonal for simplicity here
    return -mvnorm.logpdf(SM, SM + a * BSM, cov)

m = Minuit(nll, a=0.0)  # give an initial value of the best fit that is known a priori
m.limits["a"] = (-10, 10)  # we expect the interval roughly in this range
m.errordef = Minuit.LIKELIHOOD  # required: nll is a negative log-likelihood
m.migrad()
print(m)
m.draw_profile("a")
plt.xlim(-2, 2)
plt.ylim(0, 3)
plt.show()
```

This produces the correct confidence interval of [-0.707, 0.707].
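[Editor's note: as a sanity check, not part of the original reply, the mvnorm-based nll above reduces, up to an additive constant, to a^2 for two unit-variance uncorrelated bins, so the 2 * delta-NLL = 1 crossing sits exactly at a = 1/sqrt(2):]

```python
import numpy as np
from scipy.stats import multivariate_normal as mvnorm

# Same toy nll as in the reply above (linear coupling).
def nll(a):
    SM = np.ones(2)
    BSM = np.ones(2)
    cov = np.eye(2)
    return -mvnorm.logpdf(SM, SM + a * BSM, cov)

# NLL(a) - NLL(0) should equal a^2 (two bins, each contributing a^2 / 2),
# so 2 * delta-NLL reaches 1 at a = 1/sqrt(2) ~ 0.707.
for a in (0.25, 0.5, 1 / np.sqrt(2)):
    assert np.isclose(nll(a) - nll(0.0), a * a)
print(2 * (nll(1 / np.sqrt(2)) - nll(0.0)))  # -> 1.0, the 68% crossing
```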
Yes, with `m.minos()` this also works for the quadratic model:

```python
from iminuit import Minuit
import numpy as np
from scipy.stats import multivariate_normal as mvnorm

# likelihood has a single free parameter "a"
def nll(a):
    SM = np.ones(2)
    BSM = np.ones(2)
    cov = np.eye(2)  # can also use a non-diagonal covariance matrix; diagonal for simplicity here
    return -mvnorm.logpdf(SM, SM + a * a * BSM, cov)

m = Minuit(nll, a=0.0)  # give an initial value of the best fit that is known a priori
m.limits["a"] = (-10, 10)  # we expect the interval roughly in this range
m.errordef = Minuit.LIKELIHOOD  # required: nll is a negative log-likelihood
m.migrad()
m.minos()
print(m)
```

For me, MINOS succeeds:
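[Editor's note, not from the original reply: this also explains why MIGRAD tends to return the initial value here. With the quadratic coupling the NLL behaves like a^4 near the minimum, so both the gradient and the second derivative vanish at a = 0; the parabolic (HESSE) error is therefore degenerate, while MINOS, which scans the actual profile, still works. A quick numerical check on the same toy nll:]

```python
import numpy as np
from scipy.stats import multivariate_normal as mvnorm

# Toy nll with quadratic coupling, as in the reply above.
def nll(a):
    SM = np.ones(2)
    BSM = np.ones(2)
    return -mvnorm.logpdf(SM, SM + a * a * BSM, np.eye(2))

# Central finite differences at the minimum a = 0.
h = 1e-3
d1 = (nll(h) - nll(-h)) / (2 * h)               # first derivative: exactly 0 by symmetry
d2 = (nll(h) - 2 * nll(0.0) + nll(-h)) / h**2   # second derivative: ~2*h^2, vanishes as h -> 0
print(d1, d2)  # both ~0: no usable parabolic approximation at the minimum
```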
-
@JaLuka98 If your question has been answered, please indicate that here on GitHub by clicking the button. Thanks!
-
Ah sure, sorry, I just forgot. It helped me a lot. If I need any further help, I will open a new thread. Thank you very much!
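[Editor's addendum, not from the thread: the code comments above mention non-diagonal covariance matrices. Since the reply notes it is easy to write this likelihood from scratch, here is a sketch of the parameter-dependent part, 0.5 * r^T C^-1 r, which matches mvnorm.logpdf up to the constant normalisation term 0.5 * log(det(2*pi*C)):]

```python
import numpy as np
from scipy.stats import multivariate_normal as mvnorm

def nll_scratch(obs, pred, cov):
    # Negative log-likelihood of a multivariate normal, dropping the
    # parameter-independent normalisation constant: 0.5 * r^T C^-1 r
    r = obs - pred
    return 0.5 * r @ np.linalg.solve(cov, r)

obs = np.array([1.0, 1.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])  # correlated bins
for a in (0.0, 0.5, 1.0):
    pred = obs + a * a * np.ones(2)
    full = -mvnorm.logpdf(obs, pred, cov)
    # The two differ only by the constant 0.5 * log(det(2*pi*C)),
    # which does not depend on "a" and so does not affect the fit.
    assert np.isclose(full - nll_scratch(obs, pred, cov),
                      0.5 * np.log(np.linalg.det(2 * np.pi * cov)))
print("scratch NLL matches mvnorm.logpdf up to a constant")
```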