Version 2 of inference workflow #3

jomatthi · 2024-06-11T15:19:25Z

This PR reworks the task structure of the inference workflow that calculates the top tagging scale factors (SFs), the impacts of the considered nuisance parameters, and the pre- and postfit shapes. The inheritance and internal dependencies between the different blocks are restructured, and a CombineBaseTask and a FitMixin class are introduced to hold the common parameters needed by the other tasks. The tasks introduced in PR #1 are also implemented in this version of the workflow. In addition, the topsf.PlotShapesV2 task is added, which takes the output ROOT file of topsf.PostFitShapesFromWorkspaceV2 as input. It is not yet final: it still needs polishing and should be extended to also plot the shapes of the individual subprocess contributions.

To make sure that only the shape information relevant for the desired fit is used, the topsf.CreateDatacards task now requires the wp_name parameter to be set, since the SFs for the different working points are not fitted simultaneously.

The confirm_and_run.sh script now also accepts and expands some predefined, frequently used extra_params.

An example command that should trigger the entire workflow and produce impact plots for each of the SFs:

source $TOPSF_BASE/confirm_and_run.sh

version=my_version

fit_args=(
    "--physics-model" "topsf.inference.combine_physics_model:topsf_model"
    "--wp-name" "very_tight"
    "--fit-modes" "3q:ThetaLike,2q:ThetaLike,0o1q:ThetaLike"
    "--years" "UL17"
    "--pt-bins" "pt_300_400,pt_400_480"
    "--sf-range" "[1,0.2,2.0]"
    "--mode" "exp"
)

common_args=(
    "--version" "${version}"
    "--inference-model" "uhh2"
    "--producers" "weights,features"
)

args=(
    "--poi" "SF__0o1q__UL17__pt_300_400,SF__2q__UL17__pt_300_400,SF__0o1q__UL17__pt_400_480,SF__2q__UL17__pt_400_480,SF__3q__UL17__pt_300_400,SF__3q__UL17__pt_400_480"
    "--cms-label" "'Simulation Private Work'"
    "--height" "800"
    "--topsf.ImpactsV2-mass" "0"
    "--topsf.ImpactsV2-robust-fit" "1"
    "--topsf.ImpactsV2-combine-parallel" "12"  # FIXME How can I remove this?
    "--topsf.GenToysV2-set-parameters" "SF__0o1q__UL17__pt_300_400=1.,SF__2q__UL17__pt_300_400=1.,SF__3q__UL17__pt_300_400=1.,SF__0o1q__UL17__pt_400_480=1.,SF__2q__UL17__pt_400_480=1.,SF__3q__UL17__pt_400_480=1."
    "--topsf.GenToysV2-freeze-gen-parameters" "SF__0o1q__UL17__pt_300_400,SF__2q__UL17__pt_300_400,SF__3q__UL17__pt_300_400,SF__0o1q__UL17__pt_400_480,SF__2q__UL17__pt_400_480,SF__3q__UL17__pt_400_480"
    "--topsf.GenToysV2-gen-name" "_toy"
    "${fit_args[@]}"
    "${common_args[@]}"
)

command_to_run="law run topsf.PlotImpactsV2 ${args[@]}"
confirm_and_run "$command_to_run"
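
The POI names in the command above appear to follow the pattern SF__<category>__<year>__<pt_bin>. Assuming that pattern holds (it is inferred from the example only, not from the task code), the long --poi, set-parameters and freeze lists could be generated rather than typed by hand. A minimal sketch; the helper names poi_names and set_parameters_arg are hypothetical and not part of this PR:

```python
from itertools import product


def poi_names(categories, years, pt_bins):
    """Build POI names of the assumed form SF__<category>__<year>__<pt_bin>."""
    return [
        f"SF__{cat}__{year}__{pt_bin}"
        for cat, year, pt_bin in product(categories, years, pt_bins)
    ]


def set_parameters_arg(pois, value=1.0):
    """Join the POIs into a comma-separated 'name=value' string."""
    return ",".join(f"{poi}={value}" for poi in pois)


pois = poi_names(["0o1q", "2q", "3q"], ["UL17"], ["pt_300_400", "pt_400_480"])
poi_arg = ",".join(pois)            # candidate value for --poi and the freeze list
set_arg = set_parameters_arg(pois)  # candidate value for the set-parameters option
```

The order of the generated names differs from the hand-written list above; this sketch assumes the order does not matter to the tasks.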

jomatthi · 2024-08-16T15:29:42Z

Small additions to the code to now also perform observed fits.

An expected fit command that should work (using existing topsf.CreateDatacards outputs):

law run topsf.ImpactsV2 \
--mass 0 \
--robust-fit 1 \
--combine-parallel 12 \
--asimov-data \
--topsf.GenToysV2-set-parameters SF__0o1q__22__pt_300_400=1.,SF__2q__22__pt_300_400=1.,SF__3q__22__pt_300_400=1. \
--topsf.GenToysV2-freeze-gen-parameters SF__0o1q__22__pt_300_400,SF__2q__22__pt_300_400,SF__3q__22__pt_300_400 \
--topsf.GenToysV2-gen-name _toy \
--mode exp \
--physics-model topsf.inference.combine_physics_model:topsf_model \
--wp-name very_tight \
--fit-modes 3q:TagAndProbe,2q:TagAndProbe,0o1q:TagAndProbe \
--years 22 \
--pt-bins pt_300_400 \
--sf-range [1,0.2,2.0] \
--version my_version \
--config run3_sf_2022_preEE_nano_v12 \
--analysis topsf.config.run3.analysis_sf.analysis_sf \
--inference-model uhh2 \
--producers weights,features

And an observed fit command:

law run topsf.ImpactsV2 \
--mass 0 \
--robust-fit 1 \
--combine-parallel 12 \
--asimov-data \
--mode obs \
--physics-model topsf.inference.combine_physics_model:topsf_model \
--wp-name very_tight \
--fit-modes 3q:TagAndProbe,2q:TagAndProbe,0o1q:TagAndProbe \
--years 22 \
--pt-bins pt_300_400 \
--sf-range [1,0.2,2.0] \
--version my_version \
--config run3_sf_2022_preEE_nano_v12 \
--analysis topsf.config.run3.analysis_sf.analysis_sf \
--inference-model uhh2 \
--producers weights,features

dsavoiu

This looks great, thanks @jomatthi! But I think there are still a few things to be ironed out. See below for some inline comments.

Here I only managed to do a partial review, but maybe one could address this set of comments first and then do another one for the second part. I also have a general comment:

In many places you define properties called <name>_inst which just return self.<name>, which seems needlessly complex. In most cases you can use self.<name> directly wherever the property is needed. The _inst suffix would make sense for string-valued parameters that actually represent a Python object, e.g. an od.Category or od.Variable. The value of <name>_inst is then set by looking up the object called <name> in the config (or a similar container).
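
To illustrate the distinction, here is a minimal sketch in plain Python. The Config class is a hypothetical stand-in for a config container, and the task classes are not real law tasks; none of these names come from the PR.

```python
class Config:
    """Hypothetical stand-in for a config object holding named categories."""

    def __init__(self, categories):
        self._categories = dict(categories)

    def get_category(self, name):
        return self._categories[name]


class RedundantTask:
    def __init__(self, wp_name):
        self.wp_name = wp_name

    @property
    def wp_name_inst(self):
        # Adds nothing: callers can use self.wp_name directly.
        return self.wp_name


class LookupTask:
    def __init__(self, config, category):
        self.config = config
        self.category = category  # plain string, as passed on the command line

    @property
    def category_inst(self):
        # Here the suffix is justified: the string is resolved to an object.
        return self.config.get_category(self.category)
```

In the second class the property does real work, so the two names carry different types: category is a string and category_inst is the object it refers to.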

law.cfg

topsf/config/run2/config_sf.py

topsf/config/run2/config_wp.py

topsf/config/run3/analysis_sf.py

topsf/tasks/inference_v2/combine_base.py

topsf/tasks/inference_v2/run_combine.py

jomatthi · 2024-08-23T15:15:39Z

Thank you for the first set of comments and the valuable feedback, @dsavoiu. The last commits address the comments, more or less one commit per comment to hopefully help with reviewing.

Further improvements using mixins are currently in development.

jomatthi · 2024-08-29T12:36:12Z

The latest commits include a restructured version of the inference tasks, which now use mixins to declare the parameters used by each task.

Also, instead of using one task to produce both expected and observed impacts, two classes are now defined, topsf.ImpactsExpV2 and topsf.ImpactsObsV2, to avoid the dependency on the topsf.GenToysV2 task in the observed case. This is not perfect and definitely open to discussion; it is currently constrained by my limited knowledge of how to properly implement a dynamic dependency. Feedback is appreciated!
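
One possible alternative to two separate classes, sketched here with plain Python stand-ins rather than real law tasks: requires() is an ordinary method evaluated on the task instance, so it can branch on a parameter value. The class names and the shape of the returned dict are assumptions for illustration only, and whether this fits the actual task hierarchy would need to be checked.

```python
class GenToys:
    """Stand-in for topsf.GenToysV2 (hypothetical, no law dependency)."""


class Workspace:
    """Stand-in for the task that creates the workspace (hypothetical)."""


class Impacts:
    """Single impacts task whose requirements depend on the fit mode."""

    def __init__(self, mode):
        if mode not in ("exp", "obs"):
            raise ValueError(f"unknown fit mode: {mode}")
        self.mode = mode

    def requires(self):
        reqs = {"workspace": Workspace()}
        # Toys are only needed for the expected fit.
        if self.mode == "exp":
            reqs["toys"] = GenToys()
        return reqs
```

With this layout, Impacts("obs").requires() contains no toy requirement, so the observed fit would not trigger toy generation.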

jomatthi added 30 commits June 6, 2024 13:50

Add tasks for the combine workflow to perform fits and plot impacts.

665afc4

Fix broken physics model.

01555d9

Add predefined extra commands to confirm_and_run script.

9885c7c

Add wp parameter to topsf.CreateDatacards task that now filters the categories.

678423f

Add CombineBaseTask that defines common parameters, sandbox, requirements, run_command.

18ae12a

Add CreateWorkspaceV2 that runs combine's text2workspace.py given an input datacard.

21f83e3

Add RunCommand task defining common requirements and parameters used for any combine command of the form 'combine -m method -t -l'.

8543358

Add FitFixin defining the parameters specifying the fit.

967ec90

Add GenToysV2 generating toys from workspace.

f0c28df

Add MultiDimFitV2 performing expected fit first without then with frozen systematics.

824d85a

Add PostFitShapesFromWorkspaceV2 calculating and storing both pre- and postfit shapes in a root file.

2bcb230

Add PlotShapesV2 plotting pre- and postfit shapes, missing: stacked plot of the contributions of the (sub-) processes.

20e9ffd

Add ImpactsV2 performing intial fit followed by calculating impacts of all nuisances, collecting the results in a json file.

2a5030e

Add PlotImpactsV2 plotting the impacts given a json file.

45e8708

Add V2 tasks to law config.

65fcd22

Init file for inference tasks V2.

22eaa78

Add definition of colors to confirm_and_run script.

7edf2a6

Minor fix of law config.

ac0bed3

Fix scram arch, cmssw and combine versions to adapt to NAF migration to EL9.

a0d9f3d

Include wp name in cat_label and change position of label in plot.

f580008

Update config for 22preEE analysis, including dataset names and clean up.

c130797

Update wp config for 22preEE analysis, including dataset names and clean up.

87411ad

Fix broken increment_stats function.

e340f41

Fix postfit shapes task to adapt to change in combine syntax.

79ba342

Fix physics model to write out correct (Anti)SF name.

b5e9d51

Move fit setup parameter and process rates to config for easier access.

4d8739c

Further small adaptations and typo fixes.

8fbcb02

Add CombineHarvester to cmssw_combine sandbox setup script.

8e7f9be

Add probejet_tau2 and probejet_tau3 variables.

7c46170

Include updated cmsdb.

23cf0a4

jomatthi added 5 commits August 15, 2024 14:04

Add possibility to produce plots with a cleaned up legend.

f881c37

Remove fit mode (exp, obs) from CombineBaseTask.

4aa786b

Small change of workspace name to include hashed physics model instead of fit mode.

3e110f0

Use fit mode param as job_name, adapt reqs in exp fit mode.

a94fb11

Add fit mode (exp, obs) param to various tasks and adapt the combine tasks to perform exp and obs fits.

7b22376

dsavoiu mentioned this pull request Aug 21, 2024

Feature/pretty legend #4

Closed

Merge branch 'master' into feature/fit_tasks_v2

a55c15c

dsavoiu requested changes Aug 22, 2024

jomatthi added 16 commits August 22, 2024 17:21

Remove property definition of parameters previously ending with _inst.

753aa42

Change default analysis to also be Run 3 SF analysis to match default config.

b62cbbd

Change dy_lep to dy to match cmsdb convention.

8042b4d

Added comment about not needing minbias_xs any more after move to correctionlib based pileup treatment.

cb33d19

Rephrased description of analysis_id in Run 3 to include campaign.

b0fe448

Small adaptations of the Run 3 config.

82e3b36

Rename base inference and combine tasks.

a693b4a

Rename combine verbosity and help parameter.

fc8506a

Use shorter idiom for touching output directory.

caf8d77

Renamed names and cleaned up variable duplicate.

d7e546f

Reverted significance of per_catergory parameter to be True.

2fe8856

Use SettingsParameter to resolve fit_modes.

a6cca62

Log files now also inludes the run combine command and the cwd information.

a153afb

Simplify axes transformation.

558943d

Renamed files to match name change of base classes.

fc6ffd5

Simplify workflow requirement implementation.

9ed30c4

jomatthi added 2 commits August 29, 2024 14:26

Restructured inference tasks to use mixins for parameter definitions.

5cc280e

Fixed output path of topsf.CreateDatacards.

032c4f1

Fix plot name.

cd644e5
