Version 2 of inference workflow #3

Open
wants to merge 55 commits into
base: master


jomatthi
Contributor

This PR implements a more carefully designed task structure for the inference workflow that calculates the top tagging scale factors (SFs), the impacts of the considered nuisance parameters, and the pre- and postfit shapes. It restructures the inheritance and internal dependencies between the different blocks and introduces a CombineBaseTask and a FitMixin class that hold the common parameters needed by the other tasks. The tasks introduced in PR #1 are also implemented in this version of the workflow. In addition, the topsf.PlotShapesV2 task is implemented; it takes the output ROOT file of topsf.PostFitShapesFromWorkspaceV2 as input. This task is not yet final: it still needs to be polished and extended to also plot the shapes of the contributions from the individual subprocesses.

To make sure that only the shape information relevant for the desired fit is used, the topsf.CreateDatacards task now requires the wp_name parameter to be set, since the SFs for the different working points are not fitted simultaneously.

The confirm_and_run.sh script now also accepts and expands some predefined, frequently used extra_params.

An example command that should trigger the entire workflow and produce impact plots for each of the SFs:

source $TOPSF_BASE/confirm_and_run.sh

version=my_version

fit_args=(
    "--physics-model" "topsf.inference.combine_physics_model:topsf_model"
    "--wp-name" "very_tight"
    "--fit-modes" "3q:ThetaLike,2q:ThetaLike,0o1q:ThetaLike"
    "--years" "UL17"
    "--pt-bins" "pt_300_400,pt_400_480"
    "--sf-range" "[1,0.2,2.0]"
    "--mode" "exp"
)

common_args=(
    "--version" "${version}"
    "--inference-model" "uhh2"
    "--producers" "weights,features"
)

args=(
    "--poi" "SF__0o1q__UL17__pt_300_400,SF__2q__UL17__pt_300_400,SF__0o1q__UL17__pt_400_480,SF__2q__UL17__pt_400_480,SF__3q__UL17__pt_300_400,SF__3q__UL17__pt_400_480"
    "--cms-label" "'Simulation Private Work'"
    "--height" "800"
    "--topsf.ImpactsV2-mass" "0"
    "--topsf.ImpactsV2-robust-fit" "1"
    "--topsf.ImpactsV2-combine-parallel" "12"  # FIXME How can I remove this?
    "--topsf.GenToysV2-set-parameters" "SF__0o1q__UL17__pt_300_400=1.,SF__2q__UL17__pt_300_400=1.,SF__3q__UL17__pt_300_400=1.,SF__0o1q__UL17__pt_400_480=1.,SF__2q__UL17__pt_400_480=1.,SF__3q__UL17__pt_400_480=1."
    "--topsf.GenToysV2-freeze-gen-parameters" "SF__0o1q__UL17__pt_300_400,SF__2q__UL17__pt_300_400,SF__3q__UL17__pt_300_400,SF__0o1q__UL17__pt_400_480,SF__2q__UL17__pt_400_480,SF__3q__UL17__pt_400_480"
    "--topsf.GenToysV2-gen-name" "_toy"
    ${fit_args[@]}
    ${common_args[@]}
)

command_to_run="law run topsf.PlotImpactsV2 ${args[@]}"
confirm_and_run "$command_to_run"
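
The values passed to --poi, --topsf.GenToysV2-set-parameters and --topsf.GenToysV2-freeze-gen-parameters above all appear to follow the pattern SF__<fit mode>__<year>__<pt bin>. A minimal sketch of generating these strings instead of typing them by hand, assuming that this pattern holds for every combination; the helper names poi_names and set_parameters_arg are hypothetical and not part of topsf:

```python
from itertools import product


def poi_names(fit_modes, years, pt_bins):
    """Build POI names of the form SF__<mode>__<year>__<pt_bin>."""
    return [
        f"SF__{mode}__{year}__{pt_bin}"
        for mode, year, pt_bin in product(fit_modes, years, pt_bins)
    ]


def set_parameters_arg(pois, value=1.0):
    """Format 'name=value' pairs as one comma-separated string."""
    return ",".join(f"{poi}={value}" for poi in pois)


# Same fit modes, year and pt bins as in the example command above.
pois = poi_names(["0o1q", "2q", "3q"], ["UL17"], ["pt_300_400", "pt_400_480"])
poi_arg = ",".join(pois)            # candidate value for --poi
set_arg = set_parameters_arg(pois)  # candidate value for ...-set-parameters
```

The resulting order differs from the hand-written lists, and the values are written as 1.0 rather than 1., so this is only a starting point to adapt.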

…for any combine command of the form 'combine -m method -t -l'.
…lot of the contributions of the (sub-) processes.
…f all nuisances, collecting the results in a json file.
@jomatthi
Contributor Author

Small additions to the code to now also perform observed fits.

An expected fit command that should work (using existing topsf.CreateDatacards outputs):

law run topsf.ImpactsV2 \
--mass 0 \
--robust-fit 1 \
--combine-parallel 12 \
--asimov-data \
--topsf.GenToysV2-set-parameters SF__0o1q__22__pt_300_400=1.,SF__2q__22__pt_300_400=1.,SF__3q__22__pt_300_400=1. \
--topsf.GenToysV2-freeze-gen-parameters SF__0o1q__22__pt_300_400,SF__2q__22__pt_300_400,SF__3q__22__pt_300_400 \
--topsf.GenToysV2-gen-name _toy \
--mode exp \
--physics-model topsf.inference.combine_physics_model:topsf_model \
--wp-name very_tight \
--fit-modes 3q:TagAndProbe,2q:TagAndProbe,0o1q:TagAndProbe \
--years 22 \
--pt-bins pt_300_400 \
--sf-range [1,0.2,2.0] \
--version my_version \
--config run3_sf_2022_preEE_nano_v12 \
--analysis topsf.config.run3.analysis_sf.analysis_sf \
--inference-model uhh2 \
--producers weights,features

And an observed fit command:

law run topsf.ImpactsV2 \
--mass 0 \
--robust-fit 1 \
--combine-parallel 12 \
--asimov-data \
--mode obs \
--physics-model topsf.inference.combine_physics_model:topsf_model \
--wp-name very_tight \
--fit-modes 3q:TagAndProbe,2q:TagAndProbe,0o1q:TagAndProbe \
--years 22 \
--pt-bins pt_300_400 \
--sf-range [1,0.2,2.0] \
--version my_version \
--config run3_sf_2022_preEE_nano_v12 \
--analysis topsf.config.run3.analysis_sf.analysis_sf \
--inference-model uhh2 \
--producers weights,features

@dsavoiu dsavoiu mentioned this pull request Aug 21, 2024
Member

@dsavoiu dsavoiu left a comment

This looks great, thanks @jomatthi! But I think there are still a few things to be ironed out. See below for some inline comments.

I only managed to do a partial review here, but maybe this set of comments could be addressed first, followed by another review round for the second part. I also have a general comment:

  • In many places you define properties called <name>_inst which simply do return self.<name>, which seems needlessly complex. In most cases you can use self.<name> directly wherever the property is needed. The _inst suffix would make sense for string-valued parameters that actually represent a Python object, e.g. an od.Category or od.Variable. The value of <name>_inst is then set by looking up the object called <name> in the config (or a similar container).
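
A minimal sketch of the distinction described in this comment, using stand-in classes rather than the real order or law API. The Variable dataclass, the ExampleTask class and the variable name jet_mass are hypothetical; a plain dict plays the role of the config:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Stand-in for a config object such as od.Variable (hypothetical)."""
    name: str
    n_bins: int


class ExampleTask:
    def __init__(self, config, variable, height):
        self.config = config      # dict mapping names to Variable objects
        self.variable = variable  # string-valued parameter naming an object
        self.height = height      # plain value: just use self.height

    @property
    def variable_inst(self):
        # Here the _inst property is useful: it resolves the name stored
        # in the parameter into the actual object held by the config.
        return self.config[self.variable]


config = {"jet_mass": Variable("jet_mass", 20)}
task = ExampleTask(config, variable="jet_mass", height=800)
```

In this sketch task.height is used as-is, while task.variable_inst returns the Variable object named by task.variable. A height_inst property that only returned self.height would add nothing.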

law.cfg (outdated)
law.cfg
topsf/config/run2/config_sf.py (outdated)
topsf/config/run2/config_wp.py
topsf/config/run3/analysis_sf.py (outdated)
topsf/tasks/inference_v2/combine_base.py
topsf/tasks/inference_v2/combine_base.py (outdated)
topsf/tasks/inference_v2/combine_base.py (outdated)
topsf/tasks/inference_v2/run_combine.py (outdated)
topsf/tasks/inference_v2/run_combine.py (outdated)
@jomatthi
Contributor Author

Thank you for the first set of comments and the valuable feedback, @dsavoiu. The last commits address the comments, more or less one commit per comment to hopefully help with reviewing.

Further improvements using mixins are currently in development.

@jomatthi
Contributor Author

The latest commits include a restructured version of the inference tasks, which now use mixins to declare the parameters used by each task.

Also, instead of using one task to produce both expected and observed impacts, two classes are now defined, topsf.ImpactsExpV2 and topsf.ImpactsObsV2, to avoid the dependency on the topsf.GenToysV2 task in the observed case. This is not perfect and definitely up for discussion; it is currently constrained by my limited knowledge of how to properly implement a dynamic dependency. Feedback is appreciated!
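
One possible direction, sketched in plain Python with stand-in classes rather than the actual law/luigi base classes: since requires() is an ordinary method, it can branch on the mode parameter so that the toy-generation dependency is only declared for expected fits. The class names loosely mirror the tasks discussed here, but the bodies are hypothetical and have not been tested against the real workflow:

```python
class Task:
    """Minimal stand-in for a workflow task (not the law/luigi base class)."""

    def __init__(self, **params):
        self.params = params

    def requires(self):
        return {}


class CreateDatacards(Task):
    pass


class GenToys(Task):
    def requires(self):
        return {"datacards": CreateDatacards(**self.params)}


class Impacts(Task):
    def requires(self):
        reqs = {"datacards": CreateDatacards(**self.params)}
        # Assumption: only the expected mode needs generated toys.
        if self.params.get("mode") == "exp":
            reqs["toys"] = GenToys(**self.params)
        return reqs


exp_reqs = Impacts(mode="exp").requires()
obs_reqs = Impacts(mode="obs").requires()
```

With this layout a single Impacts class would serve both modes. Whether this maps cleanly onto the existing mixins and parameter handling would still need to be checked.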

2 participants