changes in inputs.json and run sequence.sh adding also dir runImpacts…

…ALT_0M_ALT_0M
cms-analysis · Mar 28, 2024 · 9935bbd · 9935bbd
2 parents 1ce8b56 + 1f5e2a6
commit 9935bbd
Show file tree

Hide file tree

Showing 12 changed files with 719 additions and 11 deletions.
diff --git a/Combine/Checks/Bias_in_significance/README.md b/Combine/Checks/Bias_in_significance/README.md
@@ -0,0 +1,57 @@
+# Bias in significance
+
+Here we provide a script to perform the bias studies developed for the HIG-22-012 analysis.
+
+For extremely low stat categories the nominal bias studies can fail as the fitted PDF goes negative for a large fraction of the toys, unless we truncate r at 0 which then means we get a truncated pull distribution.
+
+For HIG-22-012 we developed a method which looks at the **bias in significance**. In short, the method uses the significance as a metric, and compares the Type-1 error rate (fraction of toys with significance greater than some threshold) for fixed (best-fit) and floating PDF indices, where the toys are generated with the best-fit index. If the curves differ significantly between these two choices then it indicates some bias in the method. The plot also compares the Type-1 error rate to the asymptotic approximation curve, indicating whether the analysis will need to use the `HybridNew` method in the limit setting. More information is provided in `AN2022_074_v7` (section 9.5), which is attached to the HIG-22-012 CADI. 
+
+A description of what the plots show is provided in the following slide:
+![](plot_explanation.png)
+
+## Inputs
+You can run the study for the full workspace or for each category individually.
+To prepare individual category workspaces you can use the existing combineCards functionality, for example:
+```
+combineCards.py Datacard.txt --ic cat_name > Datacard_cat_name.txt
+```
+
+This creates a .txt datacard with only categories matching the reg exp `cat_name` included. 
+Be careful: you will probably have to manually delete some pdfindex lines at the bottom; 
+the script does not know that these correspond to the analysis categories, 
+and therefore will leave them all in (you only want the one corresponding to the category you are studying). 
+
+Once that is done, you can run your usual `text2workspace` command to generate the `-d, --datacard` input for this script. 
+
+For the full workspace, just use the compiled datacard with all categories as input.
+
+## Usage
+
+The script is split into four different stages. You can configure the number of toys with the `--nToys` option. If the run-time is too long you could consider creating a stat-only workspace e.g. remove all systematics (not including the PDF indices) from the txt datacard and recompile. The bias studies are probably fine to do stat-only (for stat dominated analyses). If still too long then you will want to parallelize into separate jobs and hadd the fit results at the end. An example on how to do this for the SGE batch submission system is provided in the `RunBiasInSignificance_hig22012.py` script.
+
+### Setup
+Do an initial fit to the workspace, fixing the signal strength (r) to zero. Save the values of the best-fit pdf indices. Combine requires at least one parameter to fit which you can define with the `--initial-fit-param` option. You can set this to any parameter in the workspace (ideally a nuisance a low impact). The default is a parameter named `lumi_13TeV_uncorrelated_2016`.
+```
+python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --mode setup
+```
+
+### Generate toys
+```
+python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --nToys N --mode generate
+```
+### Fit with the PDF indices fixed
+Extract significance for each toy, where the PDF is fixed to the generator PDF (best-fit from setup step).
+```
+python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --nToys N --mode fixed
+```
+
+### Fit with the PDF indices floating
+Extract significance for each toy, where the PDF index if floating
+```
+python RunBiasInSignificance.py --inputWSFile Datacard.root --MH 125.38 --nToys N --mode envelope
+```
+
+## Plotting the bias in significance results
+```
+python SummaryBiasSignificance.py
+```
diff --git a/Combine/Checks/Bias_in_significance/RunBiasInSignificance.py b/Combine/Checks/Bias_in_significance/RunBiasInSignificance.py
@@ -0,0 +1,80 @@
+import ROOT
+import os
+import glob
+import re
+from optparse import OptionParser
+import subprocess
+import json
+
+def rooiter(x):
+  iter = x.iterator()
+  ret = iter.Next()
+  while ret:
+    yield ret
+    ret = iter.Next()
+
+def get_options():
+    parser = OptionParser()
+    parser.add_option('--inputWSFile', dest='inputWSFile', default='Datacard.root', help="Input workspace")
+    parser.add_option('--MH', dest='MH', default='125.38', help="MH")
+    parser.add_option('--initial-fit-param', dest='initial_fit_param', default='lumi_13TeV_uncorrelated_2016', help="Initial fit parameter (combine must have an input parameter to fit to, pick any low impact nuisance)")
+    parser.add_option('--nToys', dest='nToys', default=2000, type='int', help="Number of toys")
+    parser.add_option('--mode', dest='mode', default="setup", help="[setup,generate,fixed,envelope]")
+    return parser.parse_args()
+(opt,args) = get_options()
+
+if opt.mode == "setup":
+
+    # Get list of pdfindeices
+    f = ROOT.TFile(opt.inputWSFile)
+    w = f.Get("w")
+    cats = w.allCats()
+    pdf_index = []
+    for cat in rooiter(cats):
+        if "pdfindex" in cat.GetName(): pdf_index.append(cat.GetName()) 
+    f.Close()
+
+    # Initial fit fixing params to be zero
+    cmd = "combine -m %s -d %s -M MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,r=0 --freezeParameters MH,r -P %s -n _initial --saveWorkspace --saveSpecifiedIndex %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2; cd .."%(opt.MH,opt.inputWSFile,opt.MH,opt.initial_fit_param,",".join(pdf_index))
+    print(cmd)
+    os.system(cmd)
+
+    # Save best fit pdf indices to json file
+    f_res = ROOT.TFile("higgsCombine_initial.MultiDimFit.mH%s.root"%opt.MH)
+    t = f_res.Get("limit")
+    t.GetEntry(0)
+    pdf_index_bf = {}
+    for index in pdf_index: pdf_index_bf[index] = getattr(t,index)
+    f_res.Close()
+    with open("pdfindex.json","w") as jf:
+        json.dump(pdf_index_bf, jf)
+
+if opt.mode == "generate":
+
+    cmd = "combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M GenerateOnly --setParameters MH=%s --freezeParameters MH --expectSignal 0 -n _toy_ --saveToys --snapshotName MultiDimFit -t %s -s -1\n\n"%(opt.MH,opt.MH,opt.MH,opt.nToys)
+    print(cmd)
+    os.system(cmd)
+
+    cmd = "mv higgsCombine_toy_* toys.root"
+    os.system(cmd)
+
+if opt.mode == "fixed":
+
+    # Get pdf index and the best fit values
+    with open("pdfindex.json", "r") as jf:
+        pdf_index_bf = json.load(jf)
+
+    pdf_index_freeze = ",".join(pdf_index_bf.keys())
+    pdf_index_set = ""
+    for k,v in pdf_index_bf.items(): pdf_index_set += "%s=%s,"%(k,v)
+    pdf_index_set = pdf_index_set[:-1]
+
+    cmd = "combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,%s --expectSignal 0 --freezeParameters MH,%s -n _fixed_ -t %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys.root --X-rtd ADDNLL_RECURSIVE=1; mv higgsCombine_fixed_* fit_fixed.root"%(opt.MH,opt.MH,opt.MH,pdf_index_set,pdf_index_freeze,opt.nToys)
+    print(cmd)
+    os.system(cmd)
+
+if opt.mode == "envelope":
+
+    cmd = "combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s --expectSignal 0 --freezeParameters MH -n _envelope_ -t %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys.root --X-rtd ADDNLL_RECURSIVE=1; mv higgsCombine_envelope_* fit_envelope.root"%(opt.MH,opt.MH,opt.MH,opt.nToys)
+    print(cmd)
+    os.system(cmd)
diff --git a/Combine/Checks/Bias_in_significance/RunBiasInSignificance_hig22012.py b/Combine/Checks/Bias_in_significance/RunBiasInSignificance_hig22012.py
@@ -0,0 +1,193 @@
+import ROOT
+import os
+import glob
+import re
+from optparse import OptionParser
+import subprocess
+import json
+
+def rooiter(x):
+  iter = x.iterator()
+  ret = iter.Next()
+  while ret:
+    yield ret
+    ret = iter.Next()
+
+def write_preamble(_file):
+    _file.write("#!/bin/bash\n")
+    _file.write("ulimit -s unlimited\n")
+    _file.write("set -e\n")
+    _file.write("cd %s/src\n"%os.environ['CMSSW_BASE'])
+    _file.write("export SCRAM_ARCH=%s\n"%os.environ['SCRAM_ARCH'])
+    _file.write("source /cvmfs/cms.cern.ch/cmsset_default.sh\n")
+    _file.write("eval `scramv1 runtime -sh`\n")
+    _file.write("cd %s\n"%os.environ['PWD'])
+
+def get_options():
+    parser = OptionParser()
+    parser.add_option('--MH', dest='MH', default='125.38', help="MH")
+    parser.add_option('--initial-fit-param', dest='initial_fit_param', default='lumi_13TeV_2016', help="Initial fit parameters")
+    parser.add_option('--nToys', dest='nToys', default=2000, type='int', help="Number of toys")
+    parser.add_option('--nJobs', dest='nJobs', default=10, type='int', help="Number of jobs")
+    parser.add_option('--mode', dest='mode', default="setup", help="[setup,generate,fixed,envelope,hadd]")
+    return parser.parse_args()
+(opt,args) = get_options()
+
+if opt.mode == "setup":
+
+    os.system("cp %s/Combine/Datacard/Datacard_ggtt_resonant_mx%smy%s.txt %s"%(opt.inputDir,opt.MX,opt.MY,run_dir))
+
+    if not os.path.isdir("Models%s"%ext_str):
+        os.system("cp -rp %s/Combine/Models ./Models%s"%(opt.inputDir,ext_str))
+
+    # Make stat only datacard and delete pdf indices
+    with open("%s/Datacard_ggtt_resonant_mx%smy%s.txt"%(run_dir,opt.MX,opt.MY), "r") as f:
+        lines = f.readlines()
+
+    new_lines_all = []
+    for line in lines:
+        line = re.sub("\./Models", "%s/Models%s"%(os.environ['PWD'],ext_str), line)
+        new_lines_all.append(line)
+    with open("%s/Datacard.txt"%run_dir, "w") as f:
+        for line in new_lines_all:
+           f.write(line)
+
+    cmd = "cd %s; text2workspace.py Datacard.txt -m %s higgsMassRange=65,150; cd .."%(run_dir,opt.MH)
+    print(cmd)
+    os.system(cmd)
+
+    # Get list of pdfindeices
+    f = ROOT.TFile("%s/Datacard.root"%run_dir)
+    w = f.Get("w")
+    cats = w.allCats()
+    pdf_index = []
+    for cat in rooiter(cats):
+        if "pdfindex" in cat.GetName(): pdf_index.append(cat.GetName()) 
+    f.Close()
+
+    # Initial fit fixing params to be zero
+    cmd = "cd %s; combine -m %s -d Datacard.root -M MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,MY=%s,MX=%s,r=0 --freezeParameters MH,MX,MY,r -P %s -n _initial --setParameterRanges r=0,0.5 --saveWorkspace --saveSpecifiedIndex %s --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2; cd .."%(run_dir,opt.MH,opt.MH,opt.MY,opt.MX,opt.initial_fit_param,",".join(pdf_index))
+    print(cmd)
+    os.system(cmd)
+
+    # Save best fit pdf indices to json file
+    f_res = ROOT.TFile("%s/higgsCombine_initial.MultiDimFit.mH%s.root"%(run_dir,opt.MH))
+    t = f_res.Get("limit")
+    t.GetEntry(0)
+    pdf_index_bf = {}
+    for index in pdf_index: pdf_index_bf[index] = getattr(t,index)
+    f_res.Close()
+    with open("%s/pdfindex.json"%run_dir,"w") as jf:
+        json.dump(pdf_index_bf, jf)
+
+if opt.mode == "generate":
+
+    if not os.path.isdir("toys"):
+        os.system("mkdir -p toys")
+
+    if not os.path.isdir("jobs_toys"):
+        os.system("mkdir -p jobs_toys")
+    else:
+        # Remove previous jobs
+        os.system("rm jobs_toys/sub*")
+
+    for i_job in range( opt.nJobs ):
+        f_sub_name = "jobs_toys/sub_%g.sh"%(i_job)
+        f_sub = open(f_sub_name, "w")
+        write_preamble(f_sub)
+        f_sub.write("combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M GenerateOnly --setParameters MH=%s --freezeParameters MH --expectSignal 0 -n _toy_%g_ --saveToys --snapshotName MultiDimFit -t %s -s -1\n\n"%(opt.MH,opt.MH,opt.MH,i_job,opt.nToys))
+        # Move toy to toys folder
+        f_sub.write("mv higgsCombine_toy_%g_* toys/toy_%g.root"%(i_job,i_job))
+        f_sub.close()
+
+        os.system("chmod 775 %s"%f_sub_name)
+
+        f_out_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".out",f_sub_name))
+        f_err_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".err",f_sub_name))
+        cmd = "qsub -q hep.q -l h_rt=1:0:0 -o %s -e %s %s"%(f_out_name,f_err_name,f_sub_name)
+        print(cmd)
+        os.system(cmd)
+
+if opt.mode == "fixed":
+
+    if not os.path.isdir("fits_fixed"):
+        os.system("mkdir -p fits_fixed")
+
+    if not os.path.isdir("jobs_fits_fixed"):
+        os.system("mkdir -p jobs_fits_fixed")
+    else:
+        # Remove previous jobs
+        os.system("rm jobs_fits_fixed/sub*")
+
+    # Get pdf index and the best fit values
+    with open("pdfindex.json", "r") as jf:
+        pdf_index_bf = json.load(jf)
+
+    pdf_index_save = ",".join(pdf_index_bf.keys())
+    pdf_index_freeze = ",".join(pdf_index_bf.keys())
+    pdf_index_set = ""
+    for k,v in pdf_index_bf.items(): pdf_index_set += "%s=%s,"%(k,v)
+    pdf_index_set = pdf_index_set[:-1]
+
+    for i_job in range( opt.nJobs ):
+        f_sub_name = "jobs_fits_fixed/sub_%g.sh"%(i_job)
+        f_sub = open(f_sub_name, "w")
+        write_preamble(f_sub)
+        f_sub.write("combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s,%s --expectSignal 0 --freezeParameters MH,%s --trackCats %s -n _fixed_%g_ -t %s --setParameterRanges r=0,50 --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys/toy_%g.root --X-rtd ADDNLL_RECURSIVE=1\n\n"%(opt.MH,opt.MH,opt.MH,pdf_index_set,pdf_index_freeze,pdf_index_save,i_job,opt.nToys,i_job))
+        # Move toy to toys folder
+        f_sub.write("mv higgsCombine_fixed_%g_* fits_fixed/fit_fixed_%g.root"%(i_job,i_job))
+        f_sub.close()
+
+        os.system("chmod 775 %s"%f_sub_name)
+
+        f_out_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".out",f_sub_name))
+        f_err_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".err",f_sub_name))
+        cmd = "qsub -q hep.q -l h_rt=2:0:0 -o %s -e %s %s"%(f_out_name,f_err_name,f_sub_name)
+        print(cmd)
+        os.system(cmd)
+
+if opt.mode == "envelope":
+
+    if not os.path.isdir("fits_envelope"):
+        os.system("mkdir -p fits_envelope")
+
+    if not os.path.isdir("jobs_fits_envelope"):
+        os.system("mkdir -p jobs_fits_envelope")
+    else:
+        # Remove previous jobs
+        os.system("rm jobs_fits_envelope/sub*")
+
+    # Get pdf index and the best fit values
+    with open("pdfindex.json", "r") as jf:
+        pdf_index_bf = json.load(jf)
+
+    pdf_index_save = ",".join(pdf_index_bf.keys())
+
+    for i_job in range( opt.nJobs ):
+        f_sub_name = "jobs_fits_envelope/sub_%g.sh"%(i_job)
+        f_sub = open(f_sub_name, "w")
+        write_preamble(f_sub)
+        f_sub.write("combine -m %s -d higgsCombine_initial.MultiDimFit.mH%s.root -M Significance --snapshotName MultiDimFit --cminDefaultMinimizerStrategy 0 --setParameters MH=%s --expectSignal 0 --freezeParameters MH --trackCats %s -n _envelope_%g_ -t %s --setParameterRanges r=0,50 --X-rtd MINIMIZER_freezeDisassociatedParams --X-rtd MINIMIZER_multiMin_hideConstants --X-rtd MINIMIZER_multiMin_maskConstraints --X-rtd MINIMIZER_multiMin_maskChannels=2 --toysFile toys/toy_%g.root --X-rtd ADDNLL_RECURSIVE=1\n\n"%(opt.MH,opt.MH,opt.MH,pdf_index_save,i_job,opt.nToys,i_job))
+        # Move toy to toys folder
+        f_sub.write("mv higgsCombine_envelope_%g_* fits_envelope/fit_envelope_%g.root"%(i_job,i_job))
+        f_sub.close()
+
+        os.system("chmod 775 %s"%f_sub_name)
+
+        f_out_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".out",f_sub_name))
+        f_err_name = "%s/%s"%(os.environ['PWD'],re.sub("\.sh",".err",f_sub_name))
+        cmd = "qsub -q hep.q -l h_rt=2:0:0 -o %s -e %s %s"%(f_out_name,f_err_name,f_sub_name)
+        print(cmd)
+        os.system(cmd)
+
+
+
+if opt.mode == "hadd":
+
+    cmd = "hadd -f fit_fixed.root fits_fixed/fit_fixed_*.root"
+    print(cmd)
+    os.system(cmd)
+
+    cmd = "hadd -f fit_envelope.root fits_envelope/fit_envelope_*.root"
+    print(cmd)
+    os.system(cmd)