diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/combined_mPhi_1500_2018.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/combined_mPhi_1500_2018.txt new file mode 100644 index 00000000000..a2ec87cf790 --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/combined_mPhi_1500_2018.txt @@ -0,0 +1,28 @@ +Combination of mPhi_1500_Catany_2018_CR_B.txt mPhi_1500_Catany_2018_CR_C.txt mPhi_1500_Catany_2018_CR_D.txt mPhi_1500_Catany_2018_SR.txt +imax 4 number of bins +jmax 1 number of processes minus 1 +kmax 2 number of nuisance parameters +---------------------------------------------------------------------------------------------------------------------------------- +shapes Bkg ch1 param_ws.root wspace:bkg_B +shapes data_obs ch1 param_ws.root wspace:data_obs_B +shapes mPhi_1500 ch1 param_ws.root wspace:mPhi_1500_B +shapes Bkg ch2 param_ws.root wspace:bkg_C +shapes data_obs ch2 param_ws.root wspace:data_obs_C +shapes mPhi_1500 ch2 param_ws.root wspace:mPhi_1500_C +shapes Bkg ch3 param_ws.root wspace:bkg_D +shapes data_obs ch3 param_ws.root wspace:data_obs_D +shapes mPhi_1500 ch3 param_ws.root wspace:mPhi_1500_D +shapes Bkg ch4 param_ws.root wspace:bkg_A +shapes data_obs ch4 param_ws.root wspace:data_obs_A +shapes mPhi_1500 ch4 param_ws.root wspace:mPhi_1500_A +---------------------------------------------------------------------------------------------------------------------------------- +bin ch1 ch2 ch3 ch4 +observation -1 -1 -1 -1 +---------------------------------------------------------------------------------------------------------------------------------- +bin ch1 ch1 ch2 ch2 ch3 ch3 ch4 ch4 +process mPhi_1500 Bkg mPhi_1500 Bkg mPhi_1500 Bkg mPhi_1500 Bkg +process 0 1 0 1 0 1 0 1 +rate -1 1 -1 1 -1 1 -1 1 +---------------------------------------------------------------------------------------------------------------------------------- +BkgRate lnN - - - - - - - 1.05 +lumi lnN 1.016 - 1.016 - 1.016 - 1.016 - diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_B.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_B.txt new file mode 100644 index 00000000000..181d2017626 --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_B.txt @@ -0,0 +1,17 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs B param_ws.root wspace:data_obs_B +shapes Bkg B param_ws.root wspace:bkg_B +shapes mPhi_1500 B param_ws.root wspace:mPhi_1500_B +----------------------------------------------------------------------------------- +bin B +observation -1 +----------------------------------------------------------------------------------- +bin B B +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_C.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_C.txt new file mode 100644 index 00000000000..3eb6d9d699b --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_C.txt @@ -0,0 +1,17 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs C param_ws.root wspace:data_obs_C +shapes Bkg C param_ws.root wspace:bkg_C +shapes mPhi_1500 C param_ws.root wspace:mPhi_1500_C +----------------------------------------------------------------------------------- +bin C +observation -1 +----------------------------------------------------------------------------------- +bin C C +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_D.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_D.txt new file mode 100644 index 00000000000..06ebc9cdace --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_D.txt @@ -0,0 +1,17 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs D param_ws.root wspace:data_obs_D +shapes Bkg D param_ws.root wspace:bkg_D +shapes mPhi_1500 D param_ws.root wspace:mPhi_1500_D +----------------------------------------------------------------------------------- +bin D +observation -1 +----------------------------------------------------------------------------------- +bin D D +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_SR.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_SR.txt new file mode 100644 index 00000000000..eafebec2bde --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_SR.txt @@ -0,0 +1,18 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs A param_ws.root wspace:data_obs_A +shapes Bkg A param_ws.root wspace:bkg_A +shapes mPhi_1500 A param_ws.root wspace:mPhi_1500_A +----------------------------------------------------------------------------------- +bin A +observation -1 +----------------------------------------------------------------------------------- +bin A A +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 +BkgRate lnN 1.05 -- diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/param_ws.root b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/param_ws.root new file mode 100644 index 00000000000..6b110f0c60f Binary files /dev/null and b/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/mPhi1500/param_ws.root differ diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/combined_mPhi_1500_2018.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/combined_mPhi_1500_2018.txt new file mode 100644 index 00000000000..a2ec87cf790 --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/combined_mPhi_1500_2018.txt @@ -0,0 +1,28 @@ +Combination of mPhi_1500_Catany_2018_CR_B.txt mPhi_1500_Catany_2018_CR_C.txt mPhi_1500_Catany_2018_CR_D.txt mPhi_1500_Catany_2018_SR.txt +imax 4 number of bins +jmax 1 number of processes minus 1 +kmax 2 number of nuisance parameters +---------------------------------------------------------------------------------------------------------------------------------- +shapes Bkg ch1 param_ws.root wspace:bkg_B +shapes data_obs ch1 param_ws.root wspace:data_obs_B +shapes mPhi_1500 ch1 param_ws.root wspace:mPhi_1500_B +shapes Bkg ch2 param_ws.root wspace:bkg_C +shapes data_obs ch2 param_ws.root wspace:data_obs_C +shapes mPhi_1500 ch2 param_ws.root wspace:mPhi_1500_C +shapes Bkg ch3 param_ws.root wspace:bkg_D +shapes data_obs ch3 param_ws.root wspace:data_obs_D +shapes mPhi_1500 ch3 param_ws.root wspace:mPhi_1500_D +shapes Bkg ch4 param_ws.root wspace:bkg_A +shapes data_obs ch4 param_ws.root wspace:data_obs_A +shapes mPhi_1500 ch4 param_ws.root wspace:mPhi_1500_A +---------------------------------------------------------------------------------------------------------------------------------- +bin ch1 ch2 ch3 ch4 +observation -1 -1 -1 -1 +---------------------------------------------------------------------------------------------------------------------------------- +bin ch1 ch1 ch2 ch2 ch3 ch3 ch4 ch4 +process mPhi_1500 Bkg mPhi_1500 Bkg mPhi_1500 Bkg mPhi_1500 Bkg +process 0 1 0 1 0 1 0 1 +rate -1 1 -1 1 -1 1 -1 1 +---------------------------------------------------------------------------------------------------------------------------------- +BkgRate lnN - - - - - - - 1.05 +lumi lnN 1.016 - 1.016 - 1.016 - 1.016 - diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_B.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_B.txt new file mode 100644 index 00000000000..181d2017626 --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_B.txt @@ -0,0 +1,17 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs B param_ws.root wspace:data_obs_B +shapes Bkg B param_ws.root wspace:bkg_B +shapes mPhi_1500 B param_ws.root wspace:mPhi_1500_B +----------------------------------------------------------------------------------- +bin B +observation -1 +----------------------------------------------------------------------------------- +bin B B +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_C.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_C.txt new file mode 100644 index 00000000000..3eb6d9d699b --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_C.txt @@ -0,0 +1,17 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs C param_ws.root wspace:data_obs_C +shapes Bkg C param_ws.root wspace:bkg_C +shapes mPhi_1500 C param_ws.root wspace:mPhi_1500_C +----------------------------------------------------------------------------------- +bin C +observation -1 +----------------------------------------------------------------------------------- +bin C C +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_D.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_D.txt new file mode 100644 index 00000000000..06ebc9cdace --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_CR_D.txt @@ -0,0 +1,17 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs D param_ws.root wspace:data_obs_D +shapes Bkg D param_ws.root wspace:bkg_D +shapes mPhi_1500 D param_ws.root wspace:mPhi_1500_D +----------------------------------------------------------------------------------- +bin D +observation -1 +----------------------------------------------------------------------------------- +bin D D +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_SR.txt b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_SR.txt new file mode 100644 index 00000000000..eafebec2bde --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/mPhi_1500_Catany_2018_SR.txt @@ -0,0 +1,18 @@ +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs A param_ws.root wspace:data_obs_A +shapes Bkg A param_ws.root wspace:bkg_A +shapes mPhi_1500 A param_ws.root wspace:mPhi_1500_A +----------------------------------------------------------------------------------- +bin A +observation -1 +----------------------------------------------------------------------------------- +bin A A +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN -- 1.016 +BkgRate lnN 1.05 -- diff --git a/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/param_ws.root b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/param_ws.root new file mode 100644 index 00000000000..aede24da824 Binary files /dev/null and b/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/param_ws.root differ diff --git a/data/tutorials/abcd_rooparametrichist_exercise/docs/inputs.png b/data/tutorials/abcd_rooparametrichist_exercise/docs/inputs.png new file mode 100644 index 00000000000..6c1561a20da Binary files /dev/null and b/data/tutorials/abcd_rooparametrichist_exercise/docs/inputs.png differ diff --git a/data/tutorials/abcd_rooparametrichist_exercise/docs/limits.png b/data/tutorials/abcd_rooparametrichist_exercise/docs/limits.png new file mode 100644 index 00000000000..ef95eec48cf Binary files /dev/null and b/data/tutorials/abcd_rooparametrichist_exercise/docs/limits.png differ diff --git a/data/tutorials/abcd_rooparametrichist_exercise/docs/post_fit_plots_A.png b/data/tutorials/abcd_rooparametrichist_exercise/docs/post_fit_plots_A.png new file mode 100644 index 00000000000..62192355c0c Binary files /dev/null and b/data/tutorials/abcd_rooparametrichist_exercise/docs/post_fit_plots_A.png differ diff --git a/data/tutorials/abcd_rooparametrichist_exercise/utils/create_datacards.py b/data/tutorials/abcd_rooparametrichist_exercise/utils/create_datacards.py new file mode 100644 index 00000000000..8547a39da71 --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/utils/create_datacards.py @@ -0,0 +1,159 @@ +## Create workspace storing information with RooParametricHist + +import ROOT +import os +import optparse + +# add arguments +parser = optparse.OptionParser(description="Option parser") +parser.add_option("-m", "--mass", dest="mass", help="Input mass point", default=1500, type=int) +parser.add_option( + "--deplete_crs_from_signal", dest="deplete_crs_from_signal", help="Deplete the control regions from signal", action="store_true", default=False +) +(opt, args) = parser.parse_args() + + +# main code starting here +def main(): + + print("Creating datacards for the analysis with mass point: ", opt.mass) + + # depletion of control regions from signal + deplete_str = "" + if opt.deplete_crs_from_signal: + print("Depleting control regions from signal") + deplete_str = "_depletedCRs" + else: + print("No depletion of control regions from signal") + deplete_str = "" + + # get current directory + current_directory = os.getcwd() + card_output_directory = current_directory + "/example_analysis%s/" % deplete_str + "datacards/" + "mPhi%s" % int(opt.mass) + "/" + # check if output directory exists, if not exit with error + if not os.path.exists(card_output_directory): + print("Error: Directory does not exist. Please run create_workspace.py first.") + exit() + + signal = "mPhi_%s" % int(opt.mass) + + # OOpen workspace + workfile_name = "param_ws.root" + wspace_name = "wspace" + + # systematics applied + lumi_sys = 1.016 + non_closure_sys = 1.05 + + # Adding header datacards + cardSR_A = "imax * number of bins \n" + cardSR_A += "jmax * number of processes minus 1 \n" + cardSR_A += "kmax * number of nuisance parameters\n" + cardSR_A += "-----------------------------------------------------------------------------------\n" + + cardCR_B = "imax * number of bins \n" + cardCR_B += "jmax * number of processes minus 1 \n" + cardCR_B += "kmax * number of nuisance parameters\n" + cardCR_B += "-----------------------------------------------------------------------------------\n" + + cardCR_C = "imax * number of bins \n" + cardCR_C += "jmax * number of processes minus 1 \n" + cardCR_C += "kmax * number of nuisance parameters\n" + cardCR_C += "-----------------------------------------------------------------------------------\n" + + cardCR_D = "imax * number of bins \n" + cardCR_D += "jmax * number of processes minus 1 \n" + cardCR_D += "kmax * number of nuisance parameters\n" + cardCR_D += "-----------------------------------------------------------------------------------\n" + + # Adding shapes section + cardSR_A += "shapes %s %s %s %s\n" % ("data_obs", "A", workfile_name, wspace_name + ":data_obs_A") + cardSR_A += "shapes %s %s %s %s\n" % ("Bkg", "A", workfile_name, wspace_name + ":bkg_A") + cardSR_A += "shapes %s %s %s %s\n" % (signal, "A", workfile_name, wspace_name + ":" + signal + "_A") + cardSR_A += "-----------------------------------------------------------------------------------\n" + + cardCR_B += "shapes %s %s %s %s\n" % ("data_obs", "B", workfile_name, wspace_name + ":data_obs_B") + cardCR_B += "shapes %s %s %s %s\n" % ("Bkg", "B", workfile_name, wspace_name + ":bkg_B") + cardCR_B += "shapes %s %s %s %s\n" % (signal, "B", workfile_name, wspace_name + ":" + signal + "_B") + cardCR_B += "-----------------------------------------------------------------------------------\n" + + cardCR_C += "shapes %s %s %s %s\n" % ("data_obs", "C", workfile_name, wspace_name + ":data_obs_C") + cardCR_C += "shapes %s %s %s %s\n" % ("Bkg", "C", workfile_name, wspace_name + ":bkg_C") + cardCR_C += "shapes %s %s %s %s\n" % (signal, "C", workfile_name, wspace_name + ":" + signal + "_C") + cardCR_C += "-----------------------------------------------------------------------------------\n" + + cardCR_D += "shapes %s %s %s %s\n" % ("data_obs", "D", workfile_name, wspace_name + ":data_obs_D") + cardCR_D += "shapes %s %s %s %s\n" % ("Bkg", "D", workfile_name, wspace_name + ":bkg_D") + cardCR_D += "shapes %s %s %s %s\n" % (signal, "D", workfile_name, wspace_name + ":" + signal + "_D") + cardCR_D += "-----------------------------------------------------------------------------------\n" + + # Adding bin section + cardSR_A += "bin %s\n" % "A" + cardSR_A += "observation %s\n" % ("-1") + cardSR_A += "-----------------------------------------------------------------------------------\n" + cardSR_A += "bin %-43s %-43s\n" % ("A", "A") + cardSR_A += "process %-43s %-43s\n" % ("Bkg", signal) + cardSR_A += "process %-43s %-43s\n" % ("1", "0") + cardSR_A += "rate %-43s %-43s\n" % ("1", "-1") + cardSR_A += "-----------------------------------------------------------------------------------\n" + + cardCR_B += "bin %s\n" % "B" + cardCR_B += "observation %s\n" % ("-1") + cardCR_B += "-----------------------------------------------------------------------------------\n" + cardCR_B += "bin %-43s %-43s\n" % ("B", "B") + cardCR_B += "process %-43s %-43s\n" % ("Bkg", signal) + cardCR_B += "process %-43s %-43s\n" % ("1", "0") + cardCR_B += "rate %-43s %-43s\n" % ("1", "-1") + cardCR_B += "-----------------------------------------------------------------------------------\n" + + cardCR_C += "bin %s\n" % "C" + cardCR_C += "observation %s\n" % ("-1") + cardCR_C += "-----------------------------------------------------------------------------------\n" + cardCR_C += "bin %-43s %-43s\n" % ("C", "C") + cardCR_C += "process %-43s %-43s\n" % ("Bkg", signal) + cardCR_C += "process %-43s %-43s\n" % ("1", "0") + cardCR_C += "rate %-43s %-43s\n" % ("1", "-1") + cardCR_C += "-----------------------------------------------------------------------------------\n" + + cardCR_D += "bin %s\n" % "D" + cardCR_D += "observation %s\n" % ("-1") + cardCR_D += "-----------------------------------------------------------------------------------\n" + cardCR_D += "bin %-43s %-43s\n" % ("D", "D") + cardCR_D += "process %-43s %-43s\n" % ("Bkg", signal) + cardCR_D += "process %-43s %-43s\n" % ("1", "0") + cardCR_D += "rate %-43s %-43s\n" % ("1", "-1") + cardCR_D += "-----------------------------------------------------------------------------------\n" + + # add systematics section + cardSR_A += "lumi lnN %-43s %-43s\n" % ("--", lumi_sys) + cardSR_A += "BkgRate lnN %-43s %-43s\n" % (non_closure_sys, "--") + + cardCR_B += "lumi lnN %-43s %-43s\n" % ("--", lumi_sys) + + cardCR_C += "lumi lnN %-43s %-43s\n" % ("--", lumi_sys) + + cardCR_D += "lumi lnN %-43s %-43s\n" % ("--", lumi_sys) + + # write datacards in output directory + cardSR_A_file = open(card_output_directory + signal + "_Catany_2018_SR.txt", "w") + cardSR_A_file.write(cardSR_A) + cardSR_A_file.close() + + cardCR_B_file = open(card_output_directory + signal + "_Catany_2018_CR_B.txt", "w") + cardCR_B_file.write(cardCR_B) + cardCR_B_file.close() + + cardCR_C_file = open(card_output_directory + signal + "_Catany_2018_CR_C.txt", "w") + cardCR_C_file.write(cardCR_C) + cardCR_C_file.close() + + cardCR_D_file = open(card_output_directory + signal + "_Catany_2018_CR_D.txt", "w") + cardCR_D_file.write(cardCR_D) + cardCR_D_file.close() + + print("Datacards created in directory: " + card_output_directory) + + +if __name__ == "__main__": + + main() diff --git a/data/tutorials/abcd_rooparametrichist_exercise/utils/create_workspace.py b/data/tutorials/abcd_rooparametrichist_exercise/utils/create_workspace.py new file mode 100644 index 00000000000..30acc7afa15 --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/utils/create_workspace.py @@ -0,0 +1,259 @@ +## Create workspace storing information with RooParametricHist + +import ROOT +from ROOT import RooRealVar, RooDataHist, RooArgList, RooWorkspace, RooParametricHist, RooFit, RooAddition, RooFormulaVar +import os +import optparse + +# add arguments +parser = optparse.OptionParser(description="Option parser") +parser.add_option("-m", "--mass", dest="mass", help="Input mass point", default=1500, type=int) +parser.add_option( + "--deplete_crs_from_signal", dest="deplete_crs_from_signal", help="Deplete the control regions from signal", action="store_true", default=False +) +(opt, args) = parser.parse_args() + + +# Get histograms from input file for a given process and all regions +def __get_histograms_regions(process, input_file): + hist_nameA = "A/" + "h_" + process + "_A" + hist_nameB = "B/" + "h_" + process + "_B" + hist_nameC = "C/" + "h_" + process + "_C" + hist_nameD = "D/" + "h_" + process + "_D" + print("Reading histogram: ", hist_nameA) + print("Reading histogram: ", hist_nameB) + print("Reading histogram: ", hist_nameC) + print("Reading histogram: ", hist_nameD) + histA = input_file.Get(hist_nameA) + histA.SetDirectory(0) + histB = input_file.Get(hist_nameB) + histB.SetDirectory(0) + histC = input_file.Get(hist_nameC) + histC.SetDirectory(0) + histD = input_file.Get(hist_nameD) + histD.SetDirectory(0) + + return histA, histB, histC, histD + + +# main code starting here +def main(): + + print("Creating workspace for the analysis with mass point: ", opt.mass) + + # depletion of control regions from signal + deplete_str = "" + if opt.deplete_crs_from_signal: + print("Depleting control regions from signal") + deplete_str = "_depletedCRs" + else: + print("No depletion of control regions from signal") + deplete_str = "" + + # get current directory + current_directory = os.getcwd() + card_output_directory = current_directory + "/example_analysis%s/" % deplete_str + "datacards/" + "mPhi%s" % int(opt.mass) + "/" + # check if output directory exists, if not create it + if not os.path.exists(card_output_directory): + os.makedirs(card_output_directory) + + try: + input_file_bkg = ROOT.TFile.Open(current_directory + "/generated_histograms/background.root") + except IOError: + print("Error: Background file does not exist. Please generate first histogram for background.") + exit() + + try: + input_file_sgn = ROOT.TFile.Open(current_directory + "/generated_histograms/mPhi_%s.root" % int(opt.mass)) + except IOError: + print("Error: Signal file does not exist. Please generate first histogram for mass point %s." % int(opt.mass)) + exit() + + signal = "mPhi_%s" % int(opt.mass) + + # Output file and workspace + # Here we create a TFile where to store the workspace + output_file_ws = ROOT.TFile(card_output_directory + "param_ws.root", "RECREATE") + ws = RooWorkspace("wspace", "wspace") + + # Define a RooRealVar for the observable z to fit + variable_z = RooRealVar("z", "z", 200, 14000, "GeV") + + # Getting histograms for observed data saved in input ROOT file as TH1F for all the regions + histA_obs, histB_obs, histC_obs, histD_obs = __get_histograms_regions("bkg", input_file_bkg) + + # Save TH1F histograms for data in RooDataHist for all the regions. + # RooDataHist can be initialized using a RooArgList with the observable to use, the TH1F for a given region and weight=1 + histData_A = RooDataHist("data_obs_A", "Obs Data region A", RooArgList(variable_z), histA_obs, 1.0) + histData_B = RooDataHist("data_obs_B", "Obs Data region B", RooArgList(variable_z), histB_obs, 1.0) + histData_C = RooDataHist("data_obs_C", "Obs Data region C", RooArgList(variable_z), histC_obs, 1.0) + histData_D = RooDataHist("data_obs_D", "Obs Data region D", RooArgList(variable_z), histD_obs, 1.0) + + # Import data in workspace + getattr(ws, "import")(histData_A, RooFit.Rename("data_obs_A")) + getattr(ws, "import")(histData_B, RooFit.Rename("data_obs_B")) + getattr(ws, "import")(histData_C, RooFit.Rename("data_obs_C")) + getattr(ws, "import")(histData_D, RooFit.Rename("data_obs_D")) + + # Save the signals in RooDataHist + histA_sgn, histB_sgn, histC_sgn, histD_sgn = __get_histograms_regions("sgn_" + signal, input_file_sgn) + + # if depletion of control regions from signal is requested, we deplete the signal histograms in the control regions + if opt.deplete_crs_from_signal: + histB_sgn_scaled = histB_sgn.Clone() + histB_sgn_scaled.Scale(0.99) + histB_sgn.Add(histB_sgn_scaled, -1) + + histC_sgn_scaled = histC_sgn.Clone() + histC_sgn_scaled.Scale(0.99) + histC_sgn.Add(histC_sgn_scaled, -1) + + histD_sgn_scaled = histD_sgn.Clone() + histD_sgn_scaled.Scale(0.99) + histD_sgn.Add(histD_sgn_scaled, -1) + + histSgn_A = RooDataHist(signal + "_A", "Sgn Data region A", RooArgList(variable_z), histA_sgn, 1.0) + histSgn_B = RooDataHist(signal + "_B", "Sgn Data region B", RooArgList(variable_z), histB_sgn, 1.0) + histSgn_C = RooDataHist(signal + "_C", "Sgn Data region C", RooArgList(variable_z), histC_sgn, 1.0) + histSgn_D = RooDataHist(signal + "_D", "Sgn Data region D", RooArgList(variable_z), histD_sgn, 1.0) + + # Import signals in workspace + getattr(ws, "import")(histSgn_A, RooFit.Rename(signal + "_A")) + getattr(ws, "import")(histSgn_B, RooFit.Rename(signal + "_B")) + getattr(ws, "import")(histSgn_C, RooFit.Rename(signal + "_C")) + getattr(ws, "import")(histSgn_D, RooFit.Rename(signal + "_D")) + + # Here we define the background histograms from "data" (which in our case is equal to the background) + # They will be used to build the parametric histograms we use for modelling the background + histA_pr, histB_pr, histC_pr, histD_pr = __get_histograms_regions("bkg", input_file_bkg) + + # Save in RooArgList the content of the bins of histB_pr to define the RooParametricHist for the B region + process_B_region_bins = RooArgList() + process_B_region_bins_list = [] + + # Save in RooArgList the content of the bins of histC_pr to define the RooParametricHist for the C region + process_C_region_bins = RooArgList() + process_C_region_bins_list = [] + + # Save in RooArgList the content of the bins of histD_pr to define the RooParametricHist for the D region + process_D_region_bins = RooArgList() + process_D_region_bins_list = [] + + # Add yields for each bin for the RooParametricHist in the B Region + # each bin is defined as a RooRealVar initialized at the nominal bin content, and with a range between 0 and 2 times the nominal rate + for i in range(1, histB_obs.GetNbinsX() + 1): + bin_B_i = RooRealVar( + "Bkg_B_region_bin_" + str(i), + "Background yield in control region B bin " + str(i), + histB_obs.GetBinContent(i), + 0.0, + 2.0 * histB_obs.GetBinContent(i), + ) + process_B_region_bins_list.append(bin_B_i) + process_B_region_bins.add(process_B_region_bins_list[i - 1]) + + # Add yields for each bin for the RooParametricHist in the C Region + # each bin is defined as a RooRealVar initialized at the nominal bin content, and with a range between 0 and 2 times the nominal rate + for i in range(1, histC_obs.GetNbinsX() + 1): + bin_C_i = RooRealVar( + "Bkg_C_region_bin_" + str(i), + "Background yield in control region C bin " + str(i), + histC_obs.GetBinContent(i), + 0.0, + 2.0 * histC_obs.GetBinContent(i), + ) + process_C_region_bins_list.append(bin_C_i) + process_C_region_bins.add(process_C_region_bins_list[i - 1]) + + # Add yields for each bin for the RooParametricHist in the D Region + # each bin is defined as a RooRealVar initialized at the nominal bin content, and with a range between 0 and 2 times the nominal rate + for i in range(1, histD_obs.GetNbinsX() + 1): + bin_D_i = RooRealVar( + "Bkg_D_region_bin_" + str(i), + "Background yield in control region D bin " + str(i), + histD_obs.GetBinContent(i), + 0.0, + 2.0 * histD_obs.GetBinContent(i), + ) + process_D_region_bins_list.append(bin_D_i) + process_D_region_bins.add(process_D_region_bins_list[i - 1]) + + # Define the parametric histogram for control region B. + # Here we consider the B region to be the transfering region, so the region for which each bin content will be multiplied by a transfer factor (determined by C, D yields) + # The RooParametricHist is initalized giving as input the observable, the RooArgList of the bins previously built and a template TH1F. + param_hist_B_region = RooParametricHist("bkg_B", "Background PDF in B region", variable_z, process_B_region_bins, histB_pr) + + # Here we define the total normalization for the RooparametricHist in the B region + param_Bkg_B_norm = RooAddition("bkg_B" + "_norm", "Total Number of events from background in control region B", process_B_region_bins) + + # Here we import the the parametric histogram and the normalization in our workspace + getattr(ws, "import")(param_hist_B_region, RooFit.Rename("bkg_B")) + getattr(ws, "import")(param_Bkg_B_norm, RooFit.Rename("bkg_B" + "_norm"), RooFit.RecycleConflictNodes()) + + # Define the parametric histogram for control region C. + param_hist_C_region = RooParametricHist("bkg_C", "Background PDF in C region", variable_z, process_C_region_bins, histC_pr) + + # Here we define the total normalization for the RooparametricHist in the C region + param_Bkg_C_norm = RooAddition("bkg_C" + "_norm", "Total Number of events from background in control region C", process_C_region_bins) + + # Here we import the the parametric histogram and the normalization in our workspace + getattr(ws, "import")(param_hist_C_region, RooFit.Rename("bkg_C")) + getattr(ws, "import")(param_Bkg_C_norm, RooFit.Rename("bkg_C" + "_norm"), RooFit.RecycleConflictNodes()) + + # Define the parametric histogram for control region D. + param_hist_D_region = RooParametricHist("bkg_D", "Background PDF in D region", variable_z, process_D_region_bins, histD_pr) + + # Here we define the total normalization for the RooparametricHist in the D region + param_Bkg_D_norm = RooAddition("bkg_D" + "_norm", "Total Number of events from background in control region D", process_D_region_bins) + + # Here we import the the parametric histogram and the normalization in our workspace + getattr(ws, "import")(param_hist_D_region, RooFit.Rename("bkg_D")) + getattr(ws, "import")(param_Bkg_D_norm, RooFit.Rename("bkg_D" + "_norm"), RooFit.RecycleConflictNodes()) + + # Relate the signal region (A) to control region B via transfer factors + # Define RooArgList for expected yields in region A after applying transfer factors + process_AB_region_bins = RooArgList() + # Define list for transfer factors + TF_list = [] + # Define list for expected yields in region A after applying transfer factors + process_AB_region_bins_list = [] + + # Compute per-bin transfer factor + # Loop over the bins of the transfering region B, and compute the transfer factors as C/D + for i in range(1, histB_pr.GetNbinsX() + 1): + # Define transfer factor as a RooFormulaVar. Use the method .obj for RooWorkSpace to retrieve the yield for a given bin and region + TF_i = RooFormulaVar( + "TF" + str(i), + "Transfer factor C/D bin " + str(i), + "(@0/@1)", + RooArgList(ws.obj("Bkg_C_region_bin_" + str(i)), ws.obj("Bkg_D_region_bin_" + str(i))), + ) + TF_list.append(TF_i) + # Compute the expected yield in the signal region as A = B * TF. This will be used to initialise the RooParametricHist in the Signal Region + bin_AB_i = RooFormulaVar( + "Bkg_AB_region_bin_" + str(i), "Background yield in SR A region bin " + str(i), "@0*@1", RooArgList(TF_i, ws.obj("Bkg_B_region_bin_" + str(i))) + ) + process_AB_region_bins_list.append(bin_AB_i) + process_AB_region_bins.add(process_AB_region_bins_list[i - 1]) + + # Create parametric histogram for signal region (A) using bin contents saved in process_AB_region_bins + param_hist_A_region = RooParametricHist("bkg_A", "Background PDF in A region", variable_z, process_AB_region_bins, histB_pr) + + # Define total normalization parameter + param_bkg_A_norm = RooAddition("bkg_A" + "_norm", "Total Number of events from background in A region", process_AB_region_bins) + + # Store in Workspace + getattr(ws, "import")(param_hist_A_region, RooFit.Rename("bkg_A")) + getattr(ws, "import")(param_bkg_A_norm, RooFit.Rename("bkg_A" + "_norm"), RooFit.RecycleConflictNodes()) + + # Save workspace in output file + output_file_ws.cd() + ws.Write() + output_file_ws.Close() + + print("Workspace saved in: ", card_output_directory + "param_ws.root") + + +if __name__ == "__main__": + + main() diff --git a/data/tutorials/abcd_rooparametrichist_exercise/utils/produce_input_histograms_and_analyse.py b/data/tutorials/abcd_rooparametrichist_exercise/utils/produce_input_histograms_and_analyse.py new file mode 100644 index 00000000000..3ad41d2975e --- /dev/null +++ b/data/tutorials/abcd_rooparametrichist_exercise/utils/produce_input_histograms_and_analyse.py @@ -0,0 +1,473 @@ +## ABCD with RooParametricHist Tutorial +## Input data preparation +##As a first step we want to produce some toy input data. We imagine some BSM particle () decaying in a t-channel topology with an hypothetical mass between 1500 and 5000 GeV. +##After applying some selections, we are left with the following expected yields for the different signal points and background. + +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +import ROOT +from scipy.stats import kde + +ROOT.TH1.SetDefaultSumw2() +ROOT.gStyle.SetOptStat(0) +ROOT.gStyle.SetOptStat(0) +ROOT.gStyle.SetOptTitle(False) +ROOT.gROOT.SetBatch(ROOT.kTRUE) +from array import array +import os + + +# Saving the distributions in ROOT files +def save_histograms_in_root_files(bkg_A, bkg_B, bkg_C, bkg_D, new_bin_edges, signals_expected_rates, signals_A, signals_B, signals_C, signals_D, out_dir): + + # use FillN to fill the histograms with weights + h_A_bkg = ROOT.TH1F("h_bkg_A", "h_bkg_A", len(new_bin_edges) - 1, array("d", new_bin_edges)) + h_B_bkg = ROOT.TH1F("h_bkg_B", "h_bkg_B", len(new_bin_edges) - 1, array("d", new_bin_edges)) + h_C_bkg = ROOT.TH1F("h_bkg_C", "h_bkg_C", len(new_bin_edges) - 1, array("d", new_bin_edges)) + h_D_bkg = ROOT.TH1F("h_bkg_D", "h_bkg_D", len(new_bin_edges) - 1, array("d", new_bin_edges)) + + # use FillN to fill the histograms (for now set weights to 1) + h_A_bkg.FillN(len(bkg_A["z"]), bkg_A["z"].values, bkg_A["w"].values) + h_B_bkg.FillN(len(bkg_B["z"]), bkg_B["z"].values, bkg_B["w"].values) + h_C_bkg.FillN(len(bkg_C["z"]), bkg_C["z"].values, bkg_C["w"].values) + h_D_bkg.FillN(len(bkg_D["z"]), bkg_D["z"].values, bkg_D["w"].values) + + # save histograms for signals + h_A_signals = {} + h_B_signals = {} + h_C_signals = {} + h_D_signals = {} + + for signal, rate in signals_expected_rates.items(): + h_A_signals[signal] = ROOT.TH1F(f"h_sgn_{signal}_A", f"h_sgn_{signal}_A", len(new_bin_edges) - 1, array("d", new_bin_edges)) + h_B_signals[signal] = ROOT.TH1F(f"h_sgn_{signal}_B", f"h_sgn_{signal}_B", len(new_bin_edges) - 1, array("d", new_bin_edges)) + h_C_signals[signal] = ROOT.TH1F(f"h_sgn_{signal}_C", f"h_sgn_{signal}_C", len(new_bin_edges) - 1, array("d", new_bin_edges)) + h_D_signals[signal] = ROOT.TH1F(f"h_sgn_{signal}_D", f"h_sgn_{signal}_D", len(new_bin_edges) - 1, array("d", new_bin_edges)) + + h_A_signals[signal].FillN(len(signals_A[signal]["z"]), signals_A[signal]["z"].values, signals_A[signal]["w"].values) + h_B_signals[signal].FillN(len(signals_B[signal]["z"]), signals_B[signal]["z"].values, signals_B[signal]["w"].values) + h_C_signals[signal].FillN(len(signals_C[signal]["z"]), signals_C[signal]["z"].values, signals_C[signal]["w"].values) + h_D_signals[signal].FillN(len(signals_D[signal]["z"]), signals_D[signal]["z"].values, signals_D[signal]["w"].values) + + # save histograms in a root file (create directory for background in current directory and separate one for signals) + # create in current directory a new directory called generated_histograms + f = ROOT.TFile(out_dir + "/background.root", "recreate") + # save each histogram in a tdirectory with the region name + d_A = f.mkdir("A") + d_A.cd() + h_A_bkg.Write() + f.cd() + d_B = f.mkdir("B") + d_B.cd() + h_B_bkg.Write() + f.cd() + d_C = f.mkdir("C") + d_C.cd() + h_C_bkg.Write() + f.cd() + d_D = f.mkdir("D") + d_D.cd() + h_D_bkg.Write() + # close file + f.Close() + + # save each signal in a separate file + for signal, rate in signals_expected_rates.items(): + f = ROOT.TFile(out_dir + f"/{signal}.root", "recreate") + d_A = f.mkdir("A") + d_A.cd() + h_A_signals[signal].Write() + f.cd() + d_B = f.mkdir("B") + d_B.cd() + h_B_signals[signal].Write() + f.cd() + d_C = f.mkdir("C") + d_C.cd() + h_C_signals[signal].Write() + f.cd() + d_D = f.mkdir("D") + d_D.cd() + h_D_signals[signal].Write() + f.Close() + + +# plot the expected yields for signal as a function of the mass +def plot_expected_signal_yields(signals_expected_rates, out_dir): + masses = [] + rates = [] + for key, value in signals_expected_rates.items(): + masses.append(int(key.split("_")[1])) + rates.append(value) + plt.plot(masses, rates, "o") + plt.xlabel("mPhi [GeV]") + plt.ylabel("Expected Signal Yield") + + # save the plot in current directory + plt.savefig(out_dir + "/expected_signal_yields.png") + + +# make 2D contour plots for background and signal +def plot_2d_countors(bkg, signals, bkg_weights, signals_weights, out_dir): + # plot the 2D distributions for background in a 2D histogram + fig, ax = plt.subplots() + + k = kde.gaussian_kde(bkg.T) + ks = kde.gaussian_kde(signals["mPhi_1500"].T) + + nbins = 20 + x_bkg_i, y_bkg_i = np.mgrid[-0.5 : 1.5 : nbins * 1j, -0.5 : 1.5 : nbins * 1j] + z_bkg_i = k(np.vstack([x_bkg_i.flatten(), y_bkg_i.flatten()])) * bkg_weights[0] + z_sgn_i = ks(np.vstack([x_bkg_i.flatten(), y_bkg_i.flatten()])) * signals_weights["mPhi_1500"][0] + + plt.pcolormesh(x_bkg_i, y_bkg_i, z_bkg_i.reshape(x_bkg_i.shape), cmap=plt.cm.viridis) + plt.contour(x_bkg_i, y_bkg_i, z_sgn_i.reshape(x_bkg_i.shape), cmap=plt.cm.Reds) + + # set limit for the plot and add labels + ax.set_xlim(-0.5, 1.5) + ax.set_ylim(-0.5, 1.5) + ax.set_xlabel("x") + ax.set_ylabel("y") + + # plot vertical and orizontal lines to divide the plane in 4 regions (at 0.5 on x and y axis), name them A,B,C,D + plt.axvline(0.5, color="black") + plt.axhline(0.5, color="black") + plt.text(-0.25, 1.25, "B", fontsize=20) + plt.text(1.25, 1.25, "A", fontsize=20) + plt.text(-0.25, -0.25, "D", fontsize=20) + plt.text(1.25, -0.25, "C", fontsize=20) + + # Set title + plt.title("Signal (1500) vs Background distributions") + + # save the plot in current directory + plt.savefig(out_dir + "/2D_contours.png") + + # plot the 2D distributions for background in a 2D histogram + fig, ax = plt.subplots() + x_sgn, y_sgn = signals["mPhi_5000"].T + ks = kde.gaussian_kde(signals["mPhi_5000"].T) + + z_sgn_i = ks(np.vstack([x_bkg_i.flatten(), y_bkg_i.flatten()])) * signals_weights["mPhi_5000"][0] + + plt.pcolormesh(x_bkg_i, y_bkg_i, z_bkg_i.reshape(x_bkg_i.shape), cmap=plt.cm.viridis) + plt.contour(x_bkg_i, y_bkg_i, z_sgn_i.reshape(x_bkg_i.shape), cmap=plt.cm.Reds) + + # set limit for the plot and add labels + ax.set_xlim(-0.5, 1.5) + ax.set_ylim(-0.5, 1.5) + ax.set_xlabel("x") + ax.set_ylabel("y") + + # plot vertical and orizontal lines to divide the plane in 4 regions (at 0.5 on x and y axis), name them A,B,C,D + plt.axvline(0.5, color="black") + plt.axhline(0.5, color="black") + plt.text(-0.25, 1.25, "B", fontsize=20) + plt.text(1.25, 1.25, "A", fontsize=20) + plt.text(-0.25, -0.25, "D", fontsize=20) + plt.text(1.25, -0.25, "C", fontsize=20) + + # Set title + plt.title("Signal (5000) vs Background distributions") + + # save the plot in current directory + plt.savefig(out_dir + "/2D_contours_5000.png") + + +# Here we visualize the distributions of the observables we want to fit +# plot the 1D distributions for z observable for bkg and signal +def plot_1d_distributions(bkg_z, signals_z, bkg_weights, signals_weights, signals_expected_rates, outdir): + + bin_edges = np.arange(200, np.max(signals_z["mPhi_5000"]), 100) + fig, ax = plt.subplots() + plt.hist(bkg_z, bins=100, weights=bkg_weights, histtype="step", label="bkg", density=True) + for signal, rate in signals_expected_rates.items(): + if signal == "mPhi_1500" or signal == "mPhi_3000" or signal == "mPhi_5000": + plt.hist(signals_z[signal], bins=bin_edges, weights=signals_weights[signal], histtype="step", label=signal, density=True) + plt.legend() + plt.xlabel("z") + plt.ylabel("Normalized entries") + plt.title("Signal and Background distributions for z observable") + # set log scale for y axis + plt.yscale("log") + + # save the plot in current directory + plt.savefig(outdir + "/1D_distributions.png") + + +def plot_1d_distributions_regions(bkg_A, bkg_B, bkg_C, bkg_D, signals_A, new_bin_edges, out_dir): + + # plot the shape of the z observable in the 4 regions for bkg with the optimal binning + fig, ax = plt.subplots() + plt.hist(bkg_A["z"], bins=new_bin_edges, weights=bkg_A["w"], histtype="step", label="A") + plt.hist(bkg_B["z"], bins=new_bin_edges, weights=bkg_B["w"], histtype="step", label="B") + plt.hist(bkg_C["z"], bins=new_bin_edges, weights=bkg_C["w"], histtype="step", label="C") + plt.hist(bkg_D["z"], bins=new_bin_edges, weights=bkg_D["w"], histtype="step", label="D") + + plt.legend() + plt.xlabel("z") + plt.ylabel("Entries") + # set y axis lower limit to 1 + plt.ylim(1, 5.0 * 10**5) + + plt.title("Background distributions for z observable in the 4 regions") + # set log scale + plt.yscale("log") + plt.savefig(out_dir + "/1D_distributions_bkg.png") + + # plot signals with the optimal binning (normalize histograms) together with background signals_z[signal] + fig, ax = plt.subplots() + plt.hist(bkg_A["z"], bins=new_bin_edges, weights=bkg_A["w"], histtype="step", label="Region A - bkg", density=True) + plt.hist( + signals_A["mPhi_1500"]["z"], + bins=new_bin_edges, + weights=signals_A["mPhi_1500"]["w"], + histtype="step", + label="Region A - signal mPhi_1500", + density=True, + ) + plt.hist( + signals_A["mPhi_3000"]["z"], + bins=new_bin_edges, + weights=signals_A["mPhi_3000"]["w"], + histtype="step", + label="Region A - signal mPhi_3000", + density=True, + ) + plt.hist( + signals_A["mPhi_5000"]["z"], + bins=new_bin_edges, + weights=signals_A["mPhi_5000"]["w"], + histtype="step", + label="Region A - signal mPhi_5000", + density=True, + ) + + plt.legend() + plt.xlabel("z") + plt.ylabel("Entries") + + plt.title("Background distributions for z observable in SR vs signals") + # set log scale + plt.yscale("log") + plt.savefig(out_dir + "/1D_distributions_signals_A.png") + + +def print_signal_significance(signals_expected_rates, n_signals_A, n_signals_B, n_signals_C, n_signals_D, n_bkg_A, n_bkg_B, n_bkg_C, n_bkg_D): + + # compute s/sqrt(b) for each signal mass point in each region and print the results + for signal, rate in signals_expected_rates.items(): + s = n_signals_A[signal] + b = n_bkg_A + print("****************************************************************************************") + # print number of signal events + print(f"background in region A: {b}") + if b > 0: + print(f"{signal} in region A: s/sqrt(b) = {s/np.sqrt(b)}") + else: + print(f"{signal} in region A: s/sqrt(b) = inf") + + s = n_signals_B[signal] + b = n_bkg_B + print(f"background in region B: {b}") + + if b > 0: + print(f"{signal} in region B: s/sqrt(b) = {s/np.sqrt(b)}") + else: + print(f"{signal} in region B: s/sqrt(b) = inf") + + s = n_signals_C[signal] + b = n_bkg_C + print(f"background in region C: {b}") + + if b > 0: + print(f"{signal} in region C: s/sqrt(b) = {s/np.sqrt(b)}") + else: + print(f"{signal} in region C: s/sqrt(b) = inf") + + s = n_signals_D[signal] + b = n_bkg_D + print(f"background in region D: {b}") + + if b > 0: + print(f"{signal} in region D: s/sqrt(b) = {s/np.sqrt(b)}") + else: + print(f"{signal} in region D: s/sqrt(b) = inf") + print("****************************************************************************************") + + +def print_expected_background_in_regions_and_closure( + n_bkg_A, n_bkg_B, n_bkg_C, n_bkg_D, bkg_A, bkg_B, bkg_C, bkg_D, signals_A, signals_B, signals_C, signals_D, signals_expected_rates +): + + print("****************************************************************************************") + + # Printout the expected yields for the background in regions A,B,C,D + print("bkg A: ", n_bkg_A) + print("bkg B: ", n_bkg_B) + print("bkg C: ", n_bkg_C) + print("bkg D: ", n_bkg_D) + + # Compute total expected predicted number of events in region A applying the ABCD method and expected closure of normalization + # compute for bkg B*C/D + predicted_A = n_bkg_B * (n_bkg_C / n_bkg_D) + + print("predicted A: ", predicted_A) + print("closure (%): ", (np.abs(predicted_A - n_bkg_A) / (n_bkg_A)) * 100) + + print("****************************************************************************************") + + +# main code starting here +def main(): + + # create directories for plots and histograms + cwd = os.getcwd() + out_dir_plots = os.path.join(cwd, "plots") + if not os.path.exists(out_dir_plots): + os.makedirs(out_dir_plots) + + out_dir_histos = os.path.join(cwd, "generated_histograms") + if not os.path.exists(out_dir_histos): + os.makedirs(out_dir_histos) + + # As a first step we want to produce some toy input data. We imagine some BSM particle Phi decaying in a t-channel topology with an hypothetical mass between 1500 and 5000 GeV. + # After applying some selections, we are left with the following expected yields for the different signal points and background. + + ################## Part 1 - Generate distributions ################## + + # signal points expected yields + signals_expected_rates = { + "mPhi_1500": 1000.25 * 3, + "mPhi_2000": 316.1 * 7, + "mPhi_3000": 63.5 * 8, + "mPhi_4000": 20.3 * 8, + "mPhi_5000": 8.6 * 8, + } + + # bkg expected yields + bkg_rate = 1000500.5 + + plot_expected_signal_yields(signals_expected_rates, out_dir_plots) + + # Here generate 2D distributions of the ABCD plane to visualize data + # generate bkg events distributed as a 2D gaussian centred at (0.3,0.3) + np.random.seed(0) + bkg = np.random.multivariate_normal([0.2, 0.2], [[1.0, 0], [0, 1.0]], 1000000) + + # compute weights for bkg events + bkg_weights = np.ones(len(bkg)) * bkg_rate / len(bkg) + + # generate signal events distributed as a 2D gaussian centred at given centroid + signals_centroids = { + "mPhi_1500": [0.63, 0.63], + "mPhi_2000": [0.68, 0.68], + "mPhi_3000": [0.72, 0.72], + "mPhi_4000": [0.76, 0.76], + "mPhi_5000": [0.8, 0.8], + } + signals = {} + for signal, rate in signals_expected_rates.items(): + signals[signal] = np.random.multivariate_normal(signals_centroids[signal], [[0.05, 0], [0, 0.05]], 10000) + + # compute weights for signal events + signals_weights = {} + for signal, rate in signals_expected_rates.items(): + signals_weights[signal] = np.ones(len(signals[signal])) * rate / len(signals[signal]) + + # plot the 2D distributions for background and signal + plot_2d_countors(bkg, signals, bkg_weights, signals_weights, out_dir_plots) + + # generate 1D distribution for z observable for bkg (falling exponential in the range 1500-9000) + bkg_z = np.random.exponential(1000, len(bkg)) + # shift the distribution to start at 1500 + bkg_z = bkg_z + 200 + + # generate 1D distribution for z observable for signal (gaussian in the range 1500-9000) + signals_z = {} + for signal, rate in signals_expected_rates.items(): + # take an exponential factor that shift with the signal mass + signals_z[signal] = np.random.exponential(1000 + 1500 * (float(signal.split("_")[1]) / 1500), 10000) + signals_z[signal] = signals_z[signal] + 200 + + # plot the 1D distributions for z observable for bkg and signal + plot_1d_distributions(bkg_z, signals_z, bkg_weights, signals_weights, signals_expected_rates, out_dir_plots) + + ################## Part 2 - ABCD method ################## + + # Here we select events based on the cut that defines the regions A,B,C,D + # put bkg x,y,z and weights in a dataframe + bkg_df = pd.DataFrame(bkg, columns=["x", "y"]) + bkg_df["z"] = bkg_z + bkg_df["w"] = bkg_weights + + # put signal x,y,z and weights in a dataframe + signals_df = {} + for signal, rate in signals_expected_rates.items(): + signals_df[signal] = pd.DataFrame(signals[signal], columns=["x", "y"]) + signals_df[signal]["z"] = signals_z[signal] + signals_df[signal]["w"] = signals_weights[signal] + + # choose cuts for the ABCD method: boundaries for x,y at 0.5 + cut_A = "x >= 0.5 and y >= 0.5" + cut_B = "x < 0.5 and y >= 0.5" + cut_D = "x <= 0.5 and y < 0.5" + cut_C = "x > 0.5 and y < 0.5" + + # apply the cuts to the dataframes for bkg and signal and save them in new dataframes for A,B,C,D regions + bkg_A = bkg_df.query(cut_A) + bkg_B = bkg_df.query(cut_B) + bkg_C = bkg_df.query(cut_C) + bkg_D = bkg_df.query(cut_D) + + signals_A = {} + signals_B = {} + signals_C = {} + signals_D = {} + + for signal, rate in signals_expected_rates.items(): + signals_A[signal] = signals_df[signal].query(cut_A) + signals_B[signal] = signals_df[signal].query(cut_B) + signals_C[signal] = signals_df[signal].query(cut_C) + signals_D[signal] = signals_df[signal].query(cut_D) + + # Printout the expected number of events in each region, as well as the signal significance + # compute number of bkg and signal events in each region based on the weights + n_bkg_A = bkg_A["w"].sum() + n_bkg_B = bkg_B["w"].sum() + n_bkg_C = bkg_C["w"].sum() + n_bkg_D = bkg_D["w"].sum() + + n_signals_A = {} + n_signals_B = {} + n_signals_C = {} + n_signals_D = {} + + for signal, rate in signals_expected_rates.items(): + n_signals_A[signal] = signals_A[signal]["w"].sum() + n_signals_B[signal] = signals_B[signal]["w"].sum() + n_signals_C[signal] = signals_C[signal]["w"].sum() + n_signals_D[signal] = signals_D[signal]["w"].sum() + + # print the signal significance in each region + print_signal_significance(signals_expected_rates, n_signals_A, n_signals_B, n_signals_C, n_signals_D, n_bkg_A, n_bkg_B, n_bkg_C, n_bkg_D) + + # print closure test + print_expected_background_in_regions_and_closure( + n_bkg_A, n_bkg_B, n_bkg_C, n_bkg_D, bkg_A, bkg_B, bkg_C, bkg_D, signals_A, signals_B, signals_C, signals_D, signals_expected_rates + ) + + ################## Part 3 - Save histograms ################## + + # now plot the shape of the z observable in the 4 regions for signal + new_bin_edges = [200, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 14000] + plot_1d_distributions_regions(bkg_A, bkg_B, bkg_C, bkg_D, signals_A, new_bin_edges, out_dir_plots) + + # save the distributions in ROOT files + save_histograms_in_root_files(bkg_A, bkg_B, bkg_C, bkg_D, new_bin_edges, signals_expected_rates, signals_A, signals_B, signals_C, signals_D, out_dir_histos) + + +if __name__ == "__main__": + + main() diff --git a/data/tutorials/longexercise/postFitPlot.py b/data/tutorials/longexercise/postFitPlot.py index 2619c491555..f2623e9342e 100644 --- a/data/tutorials/longexercise/postFitPlot.py +++ b/data/tutorials/longexercise/postFitPlot.py @@ -2,18 +2,28 @@ import HiggsAnalysis.CombinedLimit.util.plotting as plot import ROOT +import argparse ROOT.PyConfig.IgnoreCommandLineOptions = True ROOT.gROOT.SetBatch(ROOT.kTRUE) plot.ModTDRStyle() +parser = argparse.ArgumentParser(description="Arguments parser") +parser.add_argument("--input_file", dest="input_file", help="Input root file from fitDiagnostics", required=True) +parser.add_argument( + "--shape_type", dest="shape_type", help="Shape directory in input root file from fitDiagnostics (e.g. shapes_fit_b, shapes_fit_s)", required=True +) +parser.add_argument("--region", dest="cards_dir", help="Region of interest (e.g. signal_region, ch1, ch2, etc.)", required=True) +args = parser.parse_args() + + canvas = ROOT.TCanvas() -fin = ROOT.TFile("fitDiagnostics.part3B.root") +fin = ROOT.TFile(args.input_file) -first_dir = "shapes_fit_b" -second_dir = "signal_region" +first_dir = args.shape_type +second_dir = args.cards_dir h_bkg = fin.Get(first_dir + "/" + second_dir + "/total_background") h_sig = fin.Get(first_dir + "/" + second_dir + "/total_signal") @@ -41,5 +51,6 @@ legend.AddEntry(h_err, "Background uncertainty", "F") legend.Draw() -canvas.SaveAs("plot.pdf") -canvas.SaveAs("plot.png") + +canvas.SaveAs("plot_%s_%s.png" % (first_dir, second_dir)) +canvas.SaveAs("plot_%s_%s.pdf" % (first_dir, second_dir)) diff --git a/docs/abcd_rooparametrichist_tutorial/figures/inputs.png b/docs/abcd_rooparametrichist_tutorial/figures/inputs.png new file mode 100644 index 00000000000..719f41cde10 Binary files /dev/null and b/docs/abcd_rooparametrichist_tutorial/figures/inputs.png differ diff --git a/docs/abcd_rooparametrichist_tutorial/figures/limits.png b/docs/abcd_rooparametrichist_tutorial/figures/limits.png new file mode 100644 index 00000000000..edd057f7bdc Binary files /dev/null and b/docs/abcd_rooparametrichist_tutorial/figures/limits.png differ diff --git a/docs/abcd_rooparametrichist_tutorial/figures/post_fit_plots_A.png b/docs/abcd_rooparametrichist_tutorial/figures/post_fit_plots_A.png new file mode 100644 index 00000000000..dbc561a6434 Binary files /dev/null and b/docs/abcd_rooparametrichist_tutorial/figures/post_fit_plots_A.png differ diff --git a/docs/abcd_rooparametrichist_tutorial/rooparametrichist_exercise.md b/docs/abcd_rooparametrichist_tutorial/rooparametrichist_exercise.md new file mode 100644 index 00000000000..2d7b5337b6a --- /dev/null +++ b/docs/abcd_rooparametrichist_tutorial/rooparametrichist_exercise.md @@ -0,0 +1,525 @@ +# combine_tutorial_ABCD_rooParametricHist +tutorial for Combine using RooParamertricHist to perform the ABCD method + +## Getting started +To get started, you should have a working setup of ```Combine```, please follow the instructions from the [home page](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/tutorial2023/parametric_exercise/). Make sure to use the latest recommended release. + +Now let's move to the working directory for this tutorial which contains all of the inputs and scripts needed to run the ABCD method with RooParametricHist exercise: + +``` +cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/abcd_rooparametrichist_exercise + +``` + +## Introduction +The goal of this tutorial is to exemplify the usage of ```RooParametricHist``` in [CMS Combine](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/) to implement a bin-by-bin ABCD method. +In this tutorial we will work with a toy example that could resemble a real physics analysis case. We consider the search for a BSM particle $\Phi$ with a mass range between 1500 and 5000 GeV that leads to some excess in the tails of an observable $z$ (which could be $p_{T,\mathrm{miss}}$ ). We assume that we have found two uncorrelated discriminating features $x$ and $y$ that can be used to build the ABCD plane (the regions A,B,C,D will be defined by cutting on $x,y$), and we assume that $z$ is uncorrelated with respect to these two features. In this way, binning the variable $z$ in the same way in the regions A,B,C,D, per-bin transfer factors in the $z$ variable can be derived with the ABCD method to obtain the estimate of the background in the signal region. In our example, we will generate a set of pseudodata from our background-only model, and then the background will be estimated from these data from the control regions B,C,D and compared to data in the Signal Region A. + +The tutorial has 4 main parts: + +1. [Generate input data](#inputs) +2. [Create RooParametricHist for ABCD method](#rooparametrichist) +3. [Prepare Combine datacards](#datacards) +4. [Run fit](#fit) +5. [Produce limits](#limits) + +## Generate input data + + +The histograms for the $z$ observable in the different regions A,B,C,D can be produced using the ```produce_input_histograms_and_analyse.py``` script in ```utils/produce_input_histograms_and_analyse.py```. In the script the expected rates for different signal hypotheses (as a function of $\Phi$ mass $m_{\Phi} \in \{1500, 2000, 3000, 4000, 5000 \}$ GeV) and the background yields are specified, as well as the distributions in $x,y,z$ of the signals and backgrounds. In the following steps of the tutorial we will just consider one of the mass points generated, $m_{\Phi} = 1500$ GeV, but the same analysis can be run separately on other mass points as well. In $x,y$, the signal and the background are assumed to be distributed as multivariate gaussians, with the background centred at $(0,2,0.2)$ in $(x,y)$ while the signals centred in the upper-right corner of the plane ($x,y>0.5$). For the $z$ feature, the background and the signal distributions are sampled from an exponential, for the signal the tails of the exponential get enhanced with the mass parameter $m_{\Phi}$. + +![input distributions](figures/inputs.png) + +The ABCD boundaries are chosen in the example to be $(0.5,0.5)$, and A is defined as the signal region, while the others are control regions used for the estimation of the background. From the example provided, the signal contamination in the control regions is expected to be low, and the non-closure of the background estimation to be small. The histograms for the different regions are saved in separate root files for each signal hypothesis and total background. + +To generate your own input data, run: + +``` +python3 utils/produce_input_histograms_and_analyse.py + +``` + +## Create RooParametricHist for ABCD method + + +In order to prepare the datacards for our ABCD method we will need to pass the histograms of our data in the A, B, C, and D regions to the datacard to be read in by Combine. +Moreover, we will need to relate the bins of our signal region A $N_{A}^{bin,i}$ to the bins of the control regions $N_{B/C/D}^{bin,i}$ via the ABCD method formula $N_{A}^{bin,i} = N_{B}^{bin,i} \cdot TF^{bin,i}$, where the transfer factor is $TF^{bin,i} = N_{C}^{bin,i}/N_{D}^{bin,i}$. To achieve our goal, we can use ```RooParametricHist``` implemented in Combine (for further documentation look [here](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/part3/nonstandard/?h=rooparametrichist#rooparametrichist-gamman-for-shapes)). + +A ```RooParametricHist``` is a custom implementation within the ROOT framework, specifically designed for handling parametric histograms in a way that integrates with ROOT's RooFit package. This class extends ```RooAbsPdf```, indicating it's meant to represent a probability density function (PDF) that can be parameterized and manipulated within the RooFit framework (see implentation in Combine [here](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/interface/RooParametricHist.h)). The idea is that ```RooParametricHist``` allows to define histograms as PDFs where each bin can be either a ```RooRealVar``` or a ```RooFormulaVar```. This will allow us to relate each bin of our signal region histogram A with the corresponding one of the control regions via the ABCD method formula. + +A ```RooParamtricHist``` can be initialized as follows: + +``` +RooParametricHist parametric_hist("paramtric_hist", "Parametric Hist",variable,roo_arg_list_bins,data_th1) + +``` +where ```variable``` is a ```RooRealVar``` defining the observable which is being binned, ```roo_arg_list_bins``` is a ```RooArgList``` containing bins defined as ```RooRealVar``` or ```RooFormulaVar``` and ```data_th1``` is a ```TH1``` used to initialize the ```RooParametricHist```. It is also possible to define a normalization parameter for the parametric histogram as follows: + +``` +RooAddition parametric_hist_norm("paramtric_hist_norm","Total Number of events for Parametric Hist",roo_arg_list_bins) + +``` + +This normalization parameter is relevant for our ABCD method since we would like to know also the total predicted background yield in our Signal Region. + +In the following we describe how to build parametric histograms for our ABCD method regions and how to build the worskpace to link to the datacard. + + + + +## Prepare Combine datacards + + +From the input histograms, for each signal hypothesis, 4 datacards can be built, one for each region of the ABCD plane. Examples of the templates for the datacards (for a signal mass point at 1500 GeV) can be found in the following. All the example datacards are stored in the directory ```sgn_CRs``` in ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/abcd_rooparametrichist_exercise/datacards/```. We consider for now the datacards stored in the directory ```sgn_CRs```, for which the signal is present in the control regions. +Let's take as an example the cards for the $m_{\Phi} = 1500$ GeV in the directory ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/abcd_rooparametrichist_exercise/datacards/sgn_CRs/mPhi1500/```: + + +
+ Datacard Region A (Signal Region) + +``` +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs A param_ws.root wspace:data_obs_A +shapes Bkg A param_ws.root wspace:bkg_A +shapes mPhi_1500 A param_ws.root wspace:mPhi_1500_A +----------------------------------------------------------------------------------- +bin A +observation -1 +----------------------------------------------------------------------------------- +bin A A +process Bkg mPhi_1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN - 1.0160000000 +BkgRate lnN 1.05 - + +``` + +
+ +
+ Datacard Region B + +``` +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs B param_ws.root wspace:data_obs_B +shapes Bkg B param_ws.root wspace:bkg_B +shapes mPhi_1500 B param_ws.root wspace:mPhi_1500_B +----------------------------------------------------------------------------------- +bin B +observation -1 +----------------------------------------------------------------------------------- +bin B B +process Bkg mPhi1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN - 1.0160000000 + +``` + +
+ +
+ Datacard Region C + +``` +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs C param_ws.root wspace:data_obs_C +shapes Bkg C param_ws.root wspace:bkg_C +shapes mPhi_1500 C param_ws.root wspace:mPhi_1500_C +----------------------------------------------------------------------------------- +bin C +observation -1 +----------------------------------------------------------------------------------- +bin C C +process Bkg mPhi1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN - 1.0160000000 + +``` + +
+ +
+ Datacard Region D + +``` +imax * number of bins +jmax * number of processes minus 1 +kmax * number of nuisance parameters +----------------------------------------------------------------------------------- +shapes data_obs D param_ws.root wspace:data_obs_D +shapes Bkg D param_ws.root wspace:bkg_D +shapes mPhi_1500 D param_ws.root wspace:mPhi_1500_D +----------------------------------------------------------------------------------- +bin D +observation -1 +----------------------------------------------------------------------------------- +bin D D +process Bkg mPhi1500 +process 1 0 +rate 1 -1 +----------------------------------------------------------------------------------- +lumi lnN - 1.0160000000 + +``` +
+ +As an example, for each datacard, we have assigned a systematic uncertainty of 1.6% due to lumi to the signal processes, and a systematic of 5% to background in the SR (to take into account of non-closure of the method). +Notice that each datacard for each region has a ```shapes``` section for the observed data ```data_obs```, for the background ```Bkg``` and for the signal. The signal and data shapes are stored in a workspace ```wspace``` linked to the shapes section in the datacard, while the background shapes are stored in a ```RooParametricHist``` object. In the following we show how to build the workspace. + +We follow the main steps implemented in a working code to create the workspace ```create_workspace.py``` in ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/abcd_rooparametrichist_exercise/utils/```. +First create a RooWorkspace, implement a function ```__get_histograms_regions``` to read the input histograms from the A,B,C,D regions and import them as ```RooDataHist``` in the workspace. + +
+ Import histograms in workspace for signal and observed data + +``` +#Output file and workspace +#Here we create a TFile where to store the workspace +output_file_ws = ROOT.TFile(card_output_directory+"param_ws.root","RECREATE") +ws = RooWorkspace("wspace","wspace") + +#Define a RooRealVar for the observable z to fit +variable_z = RooRealVar( "z", "z", 200, 14000, "GeV") + +#Getting histograms for observed data saved in input ROOT file as TH1F for all the regions +histA_obs , histB_obs, histC_obs, histD_obs = __get_histograms_regions("bkg", input_file_bkg) + +#Save TH1F histograms for data in RooDataHist for all the regions. +#RooDataHist can be initialized using a RooArgList with the observable to use, the TH1F for a given region and weight=1 +histData_A = RooDataHist("data_obs_A", "Obs Data region A", RooArgList(variable_z), histA_obs, 1.) +histData_B = RooDataHist("data_obs_B", "Obs Data region B", RooArgList(variable_z), histB_obs, 1.) +histData_C = RooDataHist("data_obs_C", "Obs Data region C", RooArgList(variable_z), histC_obs, 1.) +histData_D = RooDataHist("data_obs_D", "Obs Data region D", RooArgList(variable_z), histD_obs, 1.) + +#Import data in workspace +getattr(ws, "import")(histData_A, RooFit.Rename("data_obs_A")) +getattr(ws, "import")(histData_B, RooFit.Rename("data_obs_B")) +getattr(ws, "import")(histData_C, RooFit.Rename("data_obs_C")) +getattr(ws, "import")(histData_D, RooFit.Rename("data_obs_D")) + +#Save the signals in RooDataHist +histA_sgn , histB_sgn, histC_sgn, histD_sgn = __get_histograms_regions("sgn", input_file_sgn) +histSgn_A = RooDataHist(signal+"_A", "Sgn Data region A", RooArgList(variable_z), histA_sgn, 1.) +histSgn_B = RooDataHist(signal+"_B", "Sgn Data region B", RooArgList(variable_z), histB_sgn, 1.) +histSgn_C = RooDataHist(signal+"_C", "Sgn Data region C", RooArgList(variable_z), histC_sgn, 1.) +histSgn_D = RooDataHist(signal+"_D", "Sgn Data region D", RooArgList(variable_z), histD_sgn, 1.) + +#Import signals in workspace +getattr(ws, "import")(histSgn_A, RooFit.Rename(signal+"_A")) +getattr(ws, "import")(histSgn_B, RooFit.Rename(signal+"_B")) +getattr(ws, "import")(histSgn_C, RooFit.Rename(signal+"_C")) +getattr(ws, "import")(histSgn_D, RooFit.Rename(signal+"_D")) + +``` +
+ +For B,C,D regions, create a ```RooParametricHist``` object, storing the content of the background bins in B,C,D as ```RooRealVar``` : + +
+ Create RooParametricHist for control regions background templates + +``` +#Here we define the background histograms from "data" (which in our case is equal to the background) +#They will be used to build the parametric histograms we use for modelling the background +histA_pr , histB_pr, histC_pr, histD_pr = __get_histograms_regions("bkg", input_file_bkg) + +#Save in RooArgList the content of the bins of histB_pr to define the RooParametricHist for the B region +process_B_region_bins = RooArgList() +process_B_region_bins_list = [] + +#Save in RooArgList the content of the bins of histC_pr to define the RooParametricHist for the C region +process_C_region_bins = RooArgList() +process_C_region_bins_list = [] + +#Save in RooArgList the content of the bins of histD_pr to define the RooParametricHist for the D region +process_D_region_bins = RooArgList() +process_D_region_bins_list = [] + +#Add yields for each bin for the RooParametricHist in the B Region +#each bin is defined as a RooRealVar initialized at the nominal bin content, and with a range between 0 and 2 times the nominal rate +for i in range(1,histB_obs.GetNbinsX()+1): + bin_B_i = RooRealVar("Bkg_B_region_bin_"+str(i),"Background yield in control region B bin " + str(i),histB_obs.GetBinContent(i),0.,2.0*histB_obs.GetBinContent(i)) + process_B_region_bins_list.append(bin_B_i) + process_B_region_bins.add(process_B_region_bins_list[i-1]) + + +#Add yields for each bin for the RooParametricHist in the C Region +#each bin is defined as a RooRealVar initialized at the nominal bin content, and with a range between 0 and 2 times the nominal rate +for i in range(1,histC_obs.GetNbinsX()+1): + bin_C_i = RooRealVar("Bkg_C_region_bin_"+str(i),"Background yield in control region C bin " + str(i),histC_obs.GetBinContent(i),0.,2.0*histC_obs.GetBinContent(i)) + process_C_region_bins_list.append(bin_C_i) + process_C_region_bins.add(process_C_region_bins_list[i-1]) + + +#Add yields for each bin for the RooParametricHist in the D Region +#each bin is defined as a RooRealVar initialized at the nominal bin content, and with a range between 0 and 2 times the nominal rate +for i in range(1,histD_obs.GetNbinsX()+1): + bin_D_i = RooRealVar("Bkg_D_region_bin_"+str(i),"Background yield in control region D bin " + str(i),histD_obs.GetBinContent(i),0.,2.0*histD_obs.GetBinContent(i)) + process_D_region_bins_list.append(bin_D_i) + process_D_region_bins.add(process_D_region_bins_list[i-1]) + + +#Define the parametric histogram for control region B. +#Here we consider the B region to be the transfering region, so the region for which each bin content will be multiplied by a transfer factor (determined by C, D yields) +#The RooParametricHist is initalized giving as input the observable, the RooArgList of the bins previously built and a template TH1F. +param_hist_B_region = RooParametricHist("bkg_B", "Background PDF in B region",variable_z,process_B_region_bins,histB_pr) + +#Here we define the total normalization for the RooparametricHist in the B region +param_Bkg_B_norm = RooAddition("bkg_B"+"_norm","Total Number of events from background in control region B",process_B_region_bins) + +#Here we import the the parametric histogram and the normalization in our workspace +getattr(ws, "import")(param_hist_B_region, RooFit.Rename("bkg_B")) +getattr(ws, "import")(param_Bkg_B_norm, RooFit.Rename("bkg_B"+"_norm"),RooFit.RecycleConflictNodes()) + +#Define the parametric histogram for control region C. +#The RooParametricHist is initalized giving as input the observable, the RooArgList of the bins previously built and a template TH1F. +param_hist_C_region = RooParametricHist("bkg_C", "Background PDF in C region",variable_z,process_C_region_bins,histC_pr) + +#Here we define the total normalization for the RooparametricHist in the C region +param_Bkg_C_norm = RooAddition("bkg_C"+"_norm","Total Number of events from background in control region C",process_C_region_bins) + +#Here we import the the parametric histogram and the normalization in our workspace +getattr(ws, "import")(param_hist_C_region, RooFit.Rename("bkg_C")) +getattr(ws, "import")(param_Bkg_C_norm, RooFit.Rename("bkg_C"+"_norm"),RooFit.RecycleConflictNodes()) + +#Define the parametric histogram for control region D. +param_hist_D_region = RooParametricHist("bkg_D", "Background PDF in D region",variable_z,process_D_region_bins,histD_pr) + +#Here we define the total normalization for the RooparametricHist in the D region +param_Bkg_D_norm = RooAddition("bkg_D"+"_norm","Total Number of events from background in control region D",process_D_region_bins) + +#Here we import the the parametric histogram and the normalization in our workspace +getattr(ws, "import")(param_hist_D_region, RooFit.Rename("bkg_D")) +getattr(ws, "import")(param_Bkg_D_norm, RooFit.Rename("bkg_D"+"_norm"),RooFit.RecycleConflictNodes()) + +``` +
+ +For the signal region A, create a ```RooParametricHist``` with each bin made from a ```RooFormulaVar``` relating the C,D,B regions to A via the ABCD formula. + +
+ Create RooParametricHist for SR background template + +``` +#Relate the signal region (A) to control region B via transfer factors +#Define RooArgList for expected yields in region A after applying transfer factors +process_AB_region_bins = RooArgList() +#Define list for transfer factors +TF_list = [] +#Define list for expected yields in region A after applying transfer factors +process_AB_region_bins_list = [] + +#Compute per-bin transfer factor +#Loop over the bins of the transfering region B, and compute the transfer factors as C/D +for i in range(1,histB_pr.GetNbinsX()+1): + #Define transfer factor as a RooFormulaVar. Use the method .obj for RooWorkSpace to retrieve the yield for a given bin and region + TF_i = RooFormulaVar("TF"+str(i),"Transfer factor C/D bin " + str(i),"(@0/@1)",RooArgList(ws.obj("Bkg_C_region_bin_"+str(i)) , ws.obj("Bkg_D_region_bin_"+str(i)) )) + TF_list.append(TF_i) + #Compute the expected yield in the signal region as A = B * TF. This will be used to initialise the RooParametricHist in the Signal Region + bin_AB_i = RooFormulaVar("Bkg_AB_region_bin_"+str(i),"Background yield in SR A region bin " + str(i), "@0*@1", RooArgList(TF_i, ws.obj("Bkg_B_region_bin_"+str(i)) )) + process_AB_region_bins_list.append(bin_AB_i) + process_AB_region_bins.add(process_AB_region_bins_list[i-1]) + +#Create parametric histogram for signal region (A) using bin contents saved in process_AB_region_bins +param_hist_A_region = RooParametricHist("bkg_A", "Background PDF in A region",variable_z,process_AB_region_bins,histB_pr) + +#Define total normalization parameter +param_bkg_A_norm = RooAddition("bkg_A"+"_norm","Total Number of events from background in A region",process_AB_region_bins) + +#Store in Workspace +getattr(ws, "import")(param_hist_A_region, RooFit.Rename("bkg_A")) +getattr(ws, "import")(param_bkg_A_norm, RooFit.Rename("bkg_A"+"_norm"),RooFit.RecycleConflictNodes()) + +``` +
+ +To run the workspace creation script: + +``` +python3 utils/create_workspace.py -m 1500 + +``` + +where ```-m``` is the flag for the mass point you want to run the script on. The script will use by default the histograms stored in ```generated_histograms``` in the tutorial directory. +After running the script, the workspace will be saved in ```example_analysis/datacards/```. To create the datacards automatically fatching the correct workspace, run: + + +``` +python3 utils/create_datacards.py -m 1500 + +``` + +The datacards will be created in the directory ```example_analysis/datacards/mPhi1500/``` inside the tutorial directory. Move into the directory where the datacards are stored: +``` +cd example_analysis/datacards/mPhi1500/ + +``` + +The datacards can be combined then using the usual command: + +``` +combineCards.py mPhi_1500_*2018*.txt > combined_mPhi_1500_2018.txt + +``` + +
+ Datacard with A,B,C,D regions combined + +``` +Combination of mPhi1500_Catany_2018_CR_B.txt mPhi1500_Catany_2018_CR_C.txt mPhi1500_Catany_2018_CR_D.txt mPhi1500_Catany_2018_SR.txt +imax 4 number of bins +jmax 1 number of processes minus 1 +kmax 2 number of nuisance parameters +---------------------------------------------------------------------------------------------------------------------------------- +shapes Bkg ch1 param_ws.root wspace:bkg_B +shapes data_obs ch1 param_ws.root wspace:data_obs_B +shapes mPhi_1500 ch1 param_ws.root wspace:mPhi_1500_B +shapes Bkg ch2 param_ws.root wspace:bkg_C +shapes data_obs ch2 param_ws.root wspace:data_obs_C +shapes mPhi_1500 ch2 param_ws.root wspace:mPhi_1500_C +shapes Bkg ch3 param_ws.root wspace:bkg_D +shapes data_obs ch3 param_ws.root wspace:data_obs_D +shapes mPhi_1500 ch3 param_ws.root wspace:mPhi_1500_D +shapes Bkg ch4 param_ws.root wspace:bkg_A +shapes data_obs ch4 param_ws.root wspace:data_obs_A +shapes mPhi_1500 ch4 param_ws.root wspace:mPhi_1500_A +---------------------------------------------------------------------------------------------------------------------------------- +bin ch1 ch2 ch3 ch4 +observation -1 -1 -1 -1 +---------------------------------------------------------------------------------------------------------------------------------- +bin ch1 ch1 ch2 ch2 ch3 ch3 ch4 ch4 +process mPhi_1500 Bkg mPhi_1500 Bkg mPhi_1500 Bkg mPhi_1500 Bkg +process 0 1 0 1 0 1 0 1 +rate -1 1 -1 1 -1 1 -1 1 +---------------------------------------------------------------------------------------------------------------------------------- +BkgRate lnN - - - - - - - 1.05 +lumi lnN 1.016 - 1.016 - 1.016 - 1.016 - + + +``` +
+ + + +## Run Fit + + +From the combined datacard for all the regions, one can run the usual fit diagnostics as follows: + +``` +combine -M FitDiagnostics combined_mPhi_1500_2018.txt -m 1500 --saveShapes --saveWithUncertainties --saveNormalizations + +``` + +
+ Results FitDiagnostics + +``` + --- FitDiagnostics --- +Best fit r: 1.25367e-06 -1.25367e-06/+0.278001 (68% CL) + +``` +
+ + +Using the output ```fitDiagnosticsTest.root```, one can run the script ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py``` to get the predictions for the normalizations: + +``` +python3 $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/mlfitNormsToText.py fitDiagnosticsTest.root + +``` + +
+ Normalizations predictions + +``` +Channel Process Pre-fit S+B Fit B-Only Fit +--------------------------------------------------------------------------------------------------------------------------------- +ch1 mPhi_1500 594.749 0.001 0.000 +ch2 mPhi_1500 605.251 0.001 0.000 +ch3 mPhi_1500 245.161 0.000 0.000 +ch4 mPhi_1500 1541.786 0.002 0.000 +ch1 Bkg 236222.924 236225.827 236225.566 +ch2 Bkg 235703.859 235706.772 235706.507 +ch3 Bkg 381479.390 381476.654 381476.592 +ch4 Bkg 145963.042 146872.863 146873.546 +ch1 total 236817.673 236225.828 236225.566 +ch1 total_signal 594.749 0.001 0.000 +ch1 total_background 236222.924 236225.827 236225.566 +ch2 total 236309.110 235706.773 235706.507 +ch2 total_signal 605.251 0.001 0.000 +ch2 total_background 235703.859 235706.772 235706.507 +ch3 total 381724.551 381476.654 381476.592 +ch3 total_signal 245.161 0.000 0.000 +ch3 total_background 381479.390 381476.654 381476.592 +ch4 total 147504.828 146872.865 146873.546 +ch4 total_signal 1541.786 0.002 0.000 +ch4 total_background 145963.042 146872.863 146873.546 + +``` +
+ +Moreover, you can run the script in ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/longexercise/postFitPlot.py``` to get pre-fit and post-fit plots in the signal region (in the combined datacard ```ch4```): + +``` +python3 $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/longexercise/postFitPlot.py --input_file fitDiagnosticsTest.root --shape_type --region + +``` + +where the argument `````` can be ```shapes_prefit```, ```shapes_fit_b``` or ```shapes_fit_s``` to get prefit, background-only fit or signal+background fit plots. While the argument `````` specifies the region, which can be ```ch4``` for the signal region A. + +![input distributions](figures/post_fit_plots_A.png) + +## Produce limits + + +Limits can be computed from the combined datacard for all the regions using the following command (in case of asymptotic limits): + +``` +combine -M AsymptoticLimits combined_mPhi_1500_2018.txt -m 1500 2>&1 | tee asymp_limits_mPhi_1500_2018.txt + +``` + +
+ Results AsymptoticLimits + +``` + -- AsymptoticLimits ( CLs ) -- +Observed Limit: r < 0.5986 +Expected 2.5%: r < 0.3532 +Expected 16.0%: r < 0.4662 +Expected 50.0%: r < 0.6367 +Expected 84.0%: r < 0.8702 +Expected 97.5%: r < 1.1325 + +``` +
+ + +Both the observed (from nominal Monte Carlo) and the expected limits can be computed for each mass point. + + +The same exercise can be repeated generating a workspace where the control regions are depleted from the signal (see datacards in ```$CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/data/tutorials/abcd_rooparametrichist_exercise/datacards/no_sgn_CRs/```), and re-running the limits. This should give a hint of how much the signal contamination in the control regions is worsening the limits. If you want to generate by your self the workspace and the cards where the signal is removed from the CRs, just run the scripts ```create_workspace.py``` and ```create_datacards.py``` with the flag ```--deplete_crs_from_signal```. + + +![limit distributions](figures/limits.png) + +The analysis illustrated so far can be run for all mass point, and finally the typical brazil exclusion plot can be produced. As we can see the expected limit without signal in the control regions is better compared to the one taking into account the signal, showing that in this example there is an impact from the signal contamination of the control regions affecting the final sensitivity. Instead, the observed line is showing the impact of statistical fluctuations inducing a non-closure effect of the ABCD method: mainly the background is overestimated, thus leading to slightly worse expected limits (equivalent to an underfluctuation of the data). + +In our single mass point analysis one can check indeed the behaviour illustrated in the limit plot above by looking at the $50\%$ quantile on $r$ after running the limit with and without signal injection in the control regions. Moreover, the effect of statistical fluctuations inducing a non-closure effect can be seen looking at the observed limit on $r$. + diff --git a/mkdocs.yml b/mkdocs.yml index 6b99dd90de2..2cb135cce5c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -34,6 +34,7 @@ nav: - "Likelihood Based Unfolding": tutorial2023_unfolding/unfolding_exercise.md - "Statitiscal Tests Basics": tutorial_stat_routines/stat_routines.md - "Model building": model_building_tutorial2024/model_building_exercise.md + - "Per-bin Transfer Factors with Parametric Histograms": abcd_rooparametrichist_tutorial/rooparametrichist_exercise.md - Links & FAQ: part4/usefullinks.md theme: