Understand and fix the observed cluster splitting in the CMS Stage 2 reconstruction on FPGAs

isehle/bye_splits

 
 


Table of Contents

  1. Installation
  2. Data production
    1. Skimming
    2. Data sources
  3. Reconstruction Chain
    1. Cluster Size Studies
  4. Event Visualization
    1. Setup
    2. Setup in local browser
    3. Visualization in local browser
      1. 2D display app
      2. 3D display app
    4. Visualization with OpenShift OKD4
      1. Additional information
  5. Cluster Radii Studies
  6. Merging plotly and bokeh with flask
    1. Introduction
    2. Flask embedding
      1. Note
  7. Producing tikz standalone pictures

This repository reproduces the CMS HGCAL L1 Stage2 reconstruction chain in Python for quick testing. It can generate an event visualization app. It was originally used for understanding and fixing the observed cluster splitting.

Installation

# setup conda environment
conda create -n <EnvName> python=3 pandas uproot pytables h5py
conda activate <EnvName>

# setup a ssh key if not yet done and clone the repository
git clone git@github.com:bfonta/bye_splits.git
# enforce git hooks locally (required for development)
git config core.hooksPath .githooks

The user could also use Mamba, a fast and robust package manager. It is fully compatible with conda packages and supports most of conda’s commands.

Data production

Skimming

To make the size of the files more manageable, a skimming step was implemented that relies on ROOT's RDataFrame. Several cuts are applied, and additionally many type conversions are run for uproot usage at later steps. To run it:

python bye_splits/production/produce.py --nevents -1 --particles photons

where "-1" represents all events, and the input file is defined in config.yaml.
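The convention that "--nevents -1" selects every event can be sketched as follows. This is an illustrative parser only, not the one implemented in produce.py: the function names parse_args and events_to_process are hypothetical, and the list of accepted particle names is an assumption based on the gun samples listed under Data sources.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the two flags of the command above."""
    parser = argparse.ArgumentParser(description="Skimming sketch")
    parser.add_argument("--nevents", type=int, default=-1,
                        help="number of events to process; -1 means all")
    # Assumed choices, taken from the sample types named in this README.
    parser.add_argument("--particles", default="photons",
                        choices=("photons", "electrons", "pions"))
    return parser.parse_args(argv)


def events_to_process(nevents, total):
    """Translate the -1 convention into an actual event count."""
    if nevents == -1:
        return total
    if nevents < 0:
        raise ValueError("nevents must be -1 or a non-negative integer")
    # Never ask for more events than the input file contains.
    return min(nevents, total)


if __name__ == "__main__":
    args = parse_args(["--nevents", "-1", "--particles", "photons"])
    print(args.particles, events_to_process(args.nevents, total=5000))
```

Capping the request at the number of available events is a design choice of this sketch; the actual script may behave differently.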

Data sources

This framework relies on photon-, electron- and pion-gun samples produced via CRAB. The most up to date versions are currently stored under:

Photons (PU0) /dpm/in2p3.fr/home/cms/trivcat/store/user/lportale/DoublePhoton_FlatPt-1To100/GammaGun_Pt1_100_PU0_HLTSummer20ReRECOMiniAOD_2210_BCSTC-FE-studies_v3-29-1_realbcstc4/221025_153226/0000/
Electrons (PU0) /dpm/in2p3.fr/home/cms/trivcat/store/user/lportale/DoubleElectron_FlatPt-1To100/ElectronGun_Pt1_100_PU200_HLTSummer20ReRECOMiniAOD_2210_BCSTC-FE-studies_v3-29-1_realbcstc4/221102_102633/0000/
Pions (PU0) /dpm/in2p3.fr/home/cms/trivcat/store/user/lportale/SinglePion_PT0to200/SinglePion_Pt0_200_PU0_HLTSummer20ReRECOMiniAOD_2210_BCSTC-FE-studies_v3-29-1_realbcstc4/221102_103211/0000
Photons (PU200) /eos/user/i/iehle/data/PU200/photons/ntuples
Electrons (PU200) /eos/user/i/iehle/data/PU200/electrons/ntuples

The PU0 files above were merged and are stored under /data_CMS/cms/alves/L1HGCAL/, accessible to LLR users and under /eos/user/b/bfontana/FPGAs/new_algos/, accessible to all lxplus and LLR users. The latter is used since it is well interfaced with CERN services. The PU200 files were merged and stored under /eos/user/i/iehle/data/PU200/<particle>/.

Reconstruction Chain

The reconstruction chain is implemented in Python. To run it:

python bye_splits/run_chain.py

where the -h flag lists the available options. To use the steps separately in your own script, call the functions defined under bye_splits/tasks/, as done in the iterative_optimization.py script.

For plotting results as a function of the optimization trigger cell parameter:

python plot/meta_algorithm.py

The above will create html files with interactive outputs.

Cluster Size Studies

The script bye_splits/scripts/cluster_size.py reads the configuration file bye_splits/scripts/cl_size_params.yaml and runs the Reconstruction Chain on the .root file corresponding to the chosen particle. The clustering step is repeated for a range of cluster radii, specified in the parameter file under cl_size: Coeffs.

The most convenient way of running the study is to do:

bash run_cluster_size.sh <username>

where <username> is your lxplus username. This creates .hdf5 files containing pandas DataFrames with cluster properties (notably energy, eta, phi) and the associated gen-level particle information for each radius. The bash script is a wrapper around the Python script; it sets a few options that are convenient for the cluster size studies but are not the defaults of the general reconstruction chain. For now, the output .hdf5 files are written to your local directory using the structure:

├── /<base_dir>
│            ├── out
│            ├── data
│            │   ├──new_algos

with the files ending up in new_algos/. An option to send the files directly to your eos/ directory is being implemented, assuming the structure:

├── /eos/user/<first_letter>/<username>
│                                   ├── out
│                                   ├── data
│                                   │   ├──PU0
│                                   │   │   ├──electrons
│                                   │   │   ├──photons
│                                   │   │   ├──pions
│                                   │   ├──PU200
│                                   │   │   ├──electrons
│                                   │   │   ├──photons
│                                   │   │   ├──pions
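The eos layout above maps directly onto a path built from the username, the pileup scenario and the particle type. A minimal sketch, assuming the structure shown in the tree; the helper eos_data_dir and the username "jdoe" are hypothetical and are not part of the repository:

```python
from pathlib import PurePosixPath

# Names taken from the directory tree above.
PILEUPS = ("PU0", "PU200")
PARTICLES = ("electrons", "photons", "pions")


def eos_data_dir(username, pileup, particles):
    """Build /eos/user/<first_letter>/<username>/data/<pileup>/<particles>."""
    if not username:
        raise ValueError("username must not be empty")
    if pileup not in PILEUPS:
        raise ValueError(f"unknown pileup scenario: {pileup}")
    if particles not in PARTICLES:
        raise ValueError(f"unknown particle type: {particles}")
    return (PurePosixPath("/eos/user") / username[0] / username
            / "data" / pileup / particles)


if __name__ == "__main__":
    print(eos_data_dir("jdoe", "PU200", "photons"))
    # prints /eos/user/j/jdoe/data/PU200/photons
```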

Event Visualization

The repository creates two web apps that can be visualized in a browser. The code is stored under bye_splits/plot.

Setup

Please install the following from within the conda environment you should have already created:

conda install -c conda-forge pyarrow
#if the above fails: python -m pip install pyarrow
python3 -m pip install --upgrade pip setuptools #to avoid annoying "Setuptools is replacing distutils." warning

Setup in local browser

Since browser usage directly in the server will necessarily be slow, we can either:

  1. Use LLR's intranet at llruicms01.in2p3.fr:<port>/display

  2. Forward it to our local machines via ssh. To establish a connection between the local machine and the remote llruicms01 server, passing by the gate, use:

ssh -L <port>:llruicms01.in2p3.fr:<port> -N <llr_username>@llrgate01.in2p3.fr
# for instance: ssh -L 8080:llruicms01.in2p3.fr:8080 -N <llr_username>@llrgate01.in2p3.fr

The two ports do not have to be the same, but it avoids possible confusion. Leave the terminal open and running (it will not produce any output).
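The role of the two ports can be made explicit by assembling the ssh command programmatically. The sketch below only builds the argument list and does not open a connection; the function name tunnel_command and the username "jdoe" are hypothetical, while the host names are the ones used above.

```python
import shlex


def tunnel_command(llr_username, remote_port, local_port=None,
                   server="llruicms01.in2p3.fr", gate="llrgate01.in2p3.fr"):
    """Return the ssh argument list for the port forwarding described above."""
    if local_port is None:
        # Using the same port on both ends avoids possible confusion.
        local_port = remote_port
    return ["ssh", "-L", f"{local_port}:{server}:{remote_port}",
            "-N", f"{llr_username}@{gate}"]


if __name__ == "__main__":
    # Local port 9090 is forwarded to port 8080 on the remote server.
    cmd = tunnel_command("jdoe", remote_port=8080, local_port=9090)
    print(shlex.join(cmd))
```

The list can be passed to subprocess.run to open the tunnel. With different ports, the browser must point to the local one (9090 in this example).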

Visualization in local browser

1) 2D display app

In a new terminal window, log in to the llruicms01 machine and launch one of the apps, for instance:

bokeh serve bye_splits/plot/display/ --address llruicms01.in2p3.fr --port <port>  --allow-websocket-origin=localhost:<port>
# if visualizing directly at LLR: --allow-websocket-origin=llruicms01.in2p3.fr:<port>

This uses the server-creation capabilities of bokeh, a Python package for interactive visualization. Note that the port number must match the one forwarded via ssh. For further customisation of bokeh serve, see the bokeh serve documentation. The above command should give access to the visualization under http://localhost:<port>/display (http://localhost:8080/display for port 8080). For debugging, run python bye_splits/plot/display/main.py and check that no errors are raised.

2) 3D display app

Make sure you have activated your conda environment:

conda activate <EnvName>

Run the following lines. With these commands, some useful packages to run the web application (e.g. dash, uproot, awkward, etc) will be installed in your conda environment:

conda install dash
python3 -m pip install dash-bootstrap-components
python3 -m pip install dash-bootstrap-templates
conda install pandas pyyaml numpy bokeh awkward uproot h5py pytables
conda install -c conda-forge pyarrow fsspec

Then go to the llruicms01 machine (if you are inside the LLR intranet) or to your preferred machine and launch:

python bye_splits/plot/display_plotly/main.py --port 5004 --host localhost

In a browser, go to http://localhost:5004/. Make sure you have access to the geometry and event files, to be configured in config.yaml.

Visualization with OpenShift OKD4

We use the S2I (Source to Image) service via CERN's PaaS (Platform-as-a-Service), which relies on OpenShift to deploy and host web apps in the CERN computing environment. There are three ways to deploy such an app; S2I is the easiest (but least flexible) of the three. It effectively abstracts away the need for Dockerfiles.

We use the simplest possible S2I configuration, defined in app.sh. The image is created alongside the packages specified in requirements.txt. Both files are described in the S2I documentation.

We are currently running a pod at https://viz2-hgcal-event-display.app.cern.ch/. The port being served by bokeh in app.sh must match the one the pod is listening to, specified at configuration time before deployment in the OpenShift management console at CERN. The network visibility was also updated to allow access from outside the CERN network.

Additional information

Cluster Radii Studies

A DashApp has been built to interactively explore the effect of cluster size on various cluster properties, which is currently hosted at https://bye-splits-app-hgcal-cl-size-studies.app.cern.ch/. To run the app locally, you can do:

bash run_cluster_app.sh <username>

where <username> is your lxplus username. The app reads the configuration file bye_splits/plot/display_clusters/config.yaml and assumes that you have a directory structure equivalent to the directories described in the cluster size step (depending on your choice of Local).

It performs the necessary analysis on the files in the specified directory to generate the data for each page, which are themselves written to files in this directory. In order to minimize duplication and greatly speed up the user experience, if one of these files does not exist in your own directory, it looks for it under the appropriate directories (listed in our Data Sources), where a large number of the possible files already exist. The same procedure is used for reading the generated cluster size files, so you can use the app without having had to run the study yourself.
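The lookup order described above (your own directory first, then the shared ones) can be sketched as follows. This illustrates the idea only and is not the app's actual code; the function locate and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def locate(filename, own_dir, shared_dirs):
    """Return the first existing copy of filename, or None if it must be produced."""
    # The user's own directory takes precedence over the shared ones.
    for directory in [own_dir, *shared_dirs]:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as own, tempfile.TemporaryDirectory() as shared:
        (Path(shared) / "clusters.hdf5").touch()
        # Absent from the own directory, so the shared copy is returned.
        print(locate("clusters.hdf5", own, [shared]))
        # Found nowhere: the caller would have to generate the file.
        print(locate("missing.hdf5", own, [shared]))
```

Returning None rather than raising leaves the decision to regenerate the file to the caller, which matches the "reuse if available, otherwise produce" behaviour described above.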

Merging plotly and bokeh with flask

Introduction

Flask is a Python micro web framework that simplifies web development. It is considered "micro" because it is lightweight and only provides essential components. Given that plotly's dashboard framework, dash, runs on top of flask, and that bokeh can produce html components programmatically (which can be embedded in a flask app), it should be possible to develop a flask-powered web app mixing these two plotting packages. Having a common web framework also simplifies future integration.

Flask embedding

The embedding of bokeh and plotly plots within flask is currently demonstrated in plot/join/app.py. Two servers run: one from flask and the other from bokeh, so special care is required to ensure the browser where the app is being served listens to both ports. Listening to flask's port only will cause the html plot/join/templates/embed.html to be rendered without bokeh plots.

Note

Running a server is required when more advanced callbacks are needed. Currently only bokeh has a server of its own; plotly simply creates an html block with all the required information. If not-so-simple callbacks are required for plotly plots, another port will have to be listened to.

Producing tikz standalone pictures

For the purpose of illustration, tikz standalone scripts have been included under docs/tikz/. To run them (taking docs/tikz/flowchart.tex as an example):

cd docs/tikz/
pdflatex -shell-escape flowchart.tex

The above should produce the flowchart.svg file. The code depends on latex and pdf2svg.
