CLUE algorithm and Phoenix visualiser #1416

Open. Wants to merge 41 commits into base: trunk.

Conversation

@Lysarina commented Aug 28, 2024:

I am updating ldmx-sw; here are the details.

What are the issues that this addresses?

This resolves #1411

I have worked on ECal clustering this summer and implemented the CLUE algorithm from CMS. The main goal was to use it for electron clustering, so I have aimed for number of clusters == number of electrons while also working towards a high energy purity (i.e. how much of the energy in a cluster comes from the same initial electron). It works pretty well! I have also added a simple reclustering clause to try to handle merged clusters, which results in a more prominent peak at number of clusters == number of electrons but also introduces some overcounting, so it is currently an optional parameter.
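For context, here is a minimal, hypothetical Python sketch of the density-based idea behind CLUE (local density, nearest-higher separation, seed/outlier classification), using the parameter names that appear in the config further down (dc, rhoc, deltac, deltao). This is only an illustration and not the C++ implementation in this PR, which is more involved (per-layer treatment, reclustering, etc.).

import math

def clue(hits, dc, rhoc, deltac, deltao):
    """Hypothetical sketch: hits are (x, y, energy) tuples; returns a
    cluster id per hit (-1 marks outliers)."""
    n = len(hits)

    def dist(i, j):
        return math.hypot(hits[i][0] - hits[j][0], hits[i][1] - hits[j][1])

    # 1) local density: energy collected within a radius dc of each hit
    rho = [sum(hits[j][2] for j in range(n) if dist(i, j) <= dc)
           for i in range(n)]

    # 2) separation: distance to the nearest hit with higher density
    delta = [float("inf")] * n
    nearest_higher = [-1] * n
    for i in range(n):
        for j in range(n):
            if rho[j] > rho[i] and dist(i, j) < delta[i]:
                delta[i], nearest_higher[i] = dist(i, j), j

    # 3) classify: dense, well-separated hits seed clusters; sparse,
    #    isolated hits are outliers; the rest follow their nearest-higher
    cluster = [-2] * n  # -2 = not yet assigned
    n_clusters = 0
    for i in range(n):
        if rho[i] >= rhoc and delta[i] > deltac:
            cluster[i] = n_clusters  # seed
            n_clusters += 1
        elif rho[i] < rhoc and delta[i] > deltao:
            cluster[i] = -1          # outlier

    # 4) propagate cluster ids from seeds down the density-ordered chains
    def assign(i):
        if cluster[i] == -2:
            cluster[i] = assign(nearest_higher[i])
        return cluster[i]

    for i in range(n):
        assign(i)
    return cluster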

This also partly resolves #1394 by getting the cluster producer to work.

Additionally, I have added a Phoenix-based visualiser, LDMX-VIS, located in EventDisplay/ldmx-vis. I have also made an analyzer, VisGenerator, in DQM (maybe not the right folder for it) that writes event data into JSON files that can be visualised by Phoenix.

Fixes

While I feel mostly happy with the code, I created what is basically a copy of WorkingCluster.h in Ecal, called WorkingEcalCluster, that stores EcalHits as normal objects instead of pointers, since when I started I did not really know how to handle pointers :^) I never had time to make CLUE use WorkingCluster instead, but this would be a nice fix. WorkingEcalCluster does contain some other improvements, such as a parameterless constructor and the elimination of EcalGeometry, which is not actually needed in that file.

Check List

  • I successfully compiled ldmx-sw with my developments
  • I ran my developments and the following shows that they are successful.

Below are histograms produced by EcalClusterAnalyzer, an analyzer I created to evaluate the performance of the clustering.

Below are some graphs for unmodified CLUE (2 and 3 electrons): number of clusters, energy purity, and clusterless percentage. [histogram images]

versus the initial algorithm (TemplatedClusterFinder), with the same histograms for 2 and 3 electrons. [histogram images]

Reclustering with CLUE: number of clusters for 2e and 3e. [histogram images]

Visualisation examples: 3e_clue_32, layers_7. [screenshots]
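For reference, one plausible way to compute the energy purity shown above is the fraction of a cluster's energy contributed by its dominant initial electron; a hypothetical Python sketch follows (the definition actually used by EcalClusterAnalyzer may differ):

def energy_purity(energy_by_electron):
    # energy_by_electron: {electron/track id: summed hit energy in this cluster}
    total = sum(energy_by_electron.values())
    return max(energy_by_electron.values()) / total if total > 0 else 0.0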

Here is the (slightly messy) config I use for ldmx fire:

from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('sim')
events = 100
p.maxEvents = events
p.termLogLevel = 0
p.logFrequency = 1

nbrOfElectrons = 2

# Initial algo
seedThresh = 350. 
cutoff = 10.

# CLUE
CLUE = True
debug = False
recluster = False
layers = 1
dc = 0.
rhoc = 550.
deltac = 10.
deltao = 40.

inputFiles = []
for i in range(1, 31):
    inputFiles.append(f"data/{nbrOfElectrons}e/sim_{i}_{nbrOfElectrons}e.root")
p.inputFiles = inputFiles
p.outputFiles = ["all.root"]
p.histogramFile = "clusters.root"

import LDMX.Ecal.EcalGeometry # geometry required by sim

# import chip / geometry conditions
import LDMX.Ecal.ecal_hardcoded_conditions
import LDMX.Ecal.digi as ecal_digi
import LDMX.DQM.dqm as dqm
import LDMX.Ecal.ecalClusters as cl
import LDMX.Ecal.ecal_trig_digi as etrigdigi
import LDMX.Trigger.trigger_energy_sums as etrig

json = dqm.VisGenerator()
json.filename = "vis.json"
json.originIdAvailable = True
json.nbrOfElectrons = nbrOfElectrons

cluster = cl.EcalClusterProducer()
cluster.seedThreshold = seedThresh
cluster.cutoff = cutoff
# CLUE
cluster.CLUE = CLUE
cluster.dc = dc
cluster.rhoc = rhoc
cluster.deltac = deltac
cluster.deltao = deltao
cluster.debug = debug
cluster.nbrOfLayers = layers
cluster.reclustering = recluster

clan = cl.EcalClusterAnalyzer()
clan.nbrOfElectrons = nbrOfElectrons

p.logPerformance = True

p.sequence.extend([
    ecal_digi.EcalDigiProducer(),
    ecal_digi.EcalRecProducer(),
    cluster,
    clan,
    json,
    ])
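For completeness: assuming the standard ldmx-sw container workflow and that this config is saved as, say, config.py (a name chosen only for illustration), it is run with ldmx fire config.py and produces all.root and clusters.root as configured above.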

  • Only non-empty clusters above the seed threshold are saved
  • Shows incident IDs and PDG IDs that contributed to each hit
  • Needs to be cleaned up a bit
  • Added option for ground truth
  • Generates a set of histograms to evaluate how good clusters are
@tomeichlersmith (Member) left a comment:

Thank you for creating this PR! I'm wondering if it's possible to separate the visualization stuff from the clustering stuff. They have rather separate purposes, and so separating them would allow them to be reviewed at different rates.

No worries if not! Just wanted to check.

@tvami (Member) left a comment:

I second the request to break this up into two PRs. Also, it seems there are some files that were not supposed to be pushed, like the .vscode directory and .yarn, but also I don't think we need the CMS and ATLAS logos, etc. :)

@bryngemark (Contributor) commented:

You might be right @tomeichlersmith that visualization and clustering are conceptually separate... even if they were used in tandem for this project. Do you have some handy GitHub wizardry that you could share with @Lysarina for how to cherry-pick the different parts into two separate PRs, to help this move along?

Also, any good ideas for where to keep the visualizer analyzer? In EventDisplay or DQM?

@tomeichlersmith (Member) commented:

In this specific situation, I would just make new branches and copy over your updates to those new branches. This also gives you the opportunity to make branch names that reference the issue number. You can use git checkout to do this copying by specifying the files you want to get from that branch after --. (Note: tab-complete probably won't work since these files don't exist on trunk.)

git switch trunk
git pull
git switch -c 1411-ecal-clue-clustering
# just an example, you'll need to do more of these to get all of the files
git checkout ella-dev-clustering -- Ecal/include/Ecal/CLUE.h

You can git commit whenever you want. The files you've checked out this way will already be git added. I would suggest many small commits so that the commit messages reference what you are adding, but to each their own.

The same procedure can be done for the vis branch (make sure to switch back to trunk before creating the new branch), but I have some cleanup notes that may be helpful at this stage.

  • I think the processors that produce JSON for Phoenix should reside in the EventDisplay submodule. You can delete everything that is currently there - it is broken and no one has used it in a long time. Use git mv after you git checkout a file so that git registers the move.
  • I do not want a full copy of nlohmann json.hpp in our source repo. I think adding it as a submodule git submodule add https://github.com/nlohmann/json.git EventDisplay/json makes the most sense and then using add_subdirectory within EventDisplay/CMakeLists.txt (see https://json.nlohmann.me/integration/cmake/#embedded).
  • I would like to avoid copying a pile of TypeScript into ldmx-sw, so if we can use Phoenix another way, I would prefer that; however, if we need to create our own TypeScript application in order to properly use Phoenix and all its features, then I second @tvami's points about removing the extra files pertaining to other experiments. (Maybe we put the TypeScript app in some other repository? Does it even run from within the container? These are the types of details I would iron out on an event display-specific PR.)

@EinarElen (Contributor) commented:

* I do not want a full copy of nlohmann json.hpp in our source repo. I think adding it as a submodule `git submodule add https://github.com/nlohmann/json.git EventDisplay/json` makes the most sense and then using `add_subdirectory` within `EventDisplay/CMakeLists.txt`. https://json.nlohmann.me/integration/cmake/#embedded

Is there a reason for not wanting the full header? I think that is a super common way to use it.

@tomeichlersmith (Member) commented:

I don't have a good reason; I just like that it's easier to find the original project (and thus its documentation). I also want to point out that acts also uses nlohmann/json, so we could do an install into the image for both to use in the future.

Probably should have worded that comment less strongly - I would like to move it to a submodule, but it's not necessary. If we move to putting it in the image, then we can just as easily remove the header here as well.

@bryngemark (Contributor) commented Nov 21, 2024:

@tomeichlersmith @EinarElen I have some evidence that this change messes with the overlay producer. I suspect it is because it collapses all trackIDs, incident IDs and PDG codes to nonsensical values. Can you see, though, how the changes done here would affect it?

Here's what is done in overlay for the ECal:

if (needsContribsAdded) {  // special treatment for (for now only) ecal
  int overlayHitID = overlayHit.getID();
  if (hitMap.find(overlayHitID) == hitMap.end()) {
    // there wasn't already a simhit in this id
    hitMap[overlayHitID] = ldmx::SimCalorimeterHit();
    hitMap[overlayHitID].setID(overlayHitID);
    std::vector<float> hitPos = overlayHit.getPosition();
    hitMap[overlayHitID].setPosition(hitPos[0], hitPos[1], hitPos[2]);
  }
  // add the overlay hit (as a) contrib
  // incidentID = -1000, trackID = -1000, pdgCode = 0 <-- these are
  // set in the header for now but could be parameters
  hitMap[overlayHitID].addContrib(overlayIncidentID_, overlayTrackID_,
                                  overlayPdgCode_, overlayHit.getEdep(),
                                  overlayTime);

The effect I see is that the ecal total energy comes out as really strange spikes at multiples of roughly 5 GeV. I have debugged my overlay config. When running it with trunk, though, my inclusive 2e events all came out nicely distributed around 16 GeV.

A Contributor commented:

To add some details: I think this was developed using a multiparticle gun sample, where the trackID for each beam electron is different. In overlay, all beam electrons have the same truth IDs, and I could make some sort of intelligent mapping of them to different numbers (e.g. the number of the pileup interaction, as in 2, 3, ..., N_ele, or perhaps that but negative, to avoid clashing with e.g. a hard brem trackID in the process of interest).

For any signal or ecalPN studies with ECal clustering, we need to run overlay and not an MPG sample.
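A hypothetical Python illustration of the ID-remapping idea above; the specific scheme (pileup interaction k gets ID -k) is only one option and not something implemented in this PR:

def remap_overlay_track_ids(n_interactions):
    # Map pileup interaction number k (2, 3, ..., N_ele) to a negative ID so
    # overlay beam electrons don't all share one truth trackID and don't clash
    # with e.g. a hard-brem trackID from the process of interest.
    return {k: -k for k in range(2, n_interactions + 1)}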

A Member commented:

This does alter the on-disk layout of the SimCalorimeterHit, which should mean that ROOT warns you; but ROOT may not be warning you because we have "just" added one more variable, which ROOT may be able to handle without concern (i.e. reading old-style hits without the originID using the class that does have originID).

The total SimCalorimeterHit edep is calculated while we addContrib, so since OverlayProducer is creating these overlayed hits, there could be an issue there?

Things I would try:

  • I would update the version number for the SimCalorimeterHit in its ClassDef macro to see if that helps.
  • Update overlay producer, copying this origin parameter over?

Besides that, I really don't know why it's happening right now. I would confirm that the issue is not present when the in-memory data layout and the on-disk data layout are the same (i.e. the SimCalorimeterHit class hasn't been changed).

A Contributor commented:

Ok so you're saying that even if I generated the SimHits "from scratch" with this branch including this change to SimCore (which I did), there might be some confusion since the ClassDef number wasn't updated?

As for confirming, I have only been able to verify that this behaviour does not happen with trunk. The only place I think it is reasonable to look for an effect is right here, in the Ecal SD and SimCalorimeterHit changes.

I would have assumed that since I don't specify originID in the overlay implementation, it is just set to its default value (-1) if left unspecified, which shouldn't upset any downstream hit reconstruction (just the origin tracking).

I take it that there's nothing obvious here and I'll have to dig deeper.

A Member commented:

My hypothesis about the ClassDef number would only apply if you were reading the SimHits from files created with trunk using this branch. I don't think there would be confusion if the SimHits were created and read on this branch.

Besides that, I have the same assumptions, so I think this does require digging deeper, unfortunately.

Successfully merging this pull request may close these issues: "Update Ecal clustering with the CLUE algorithm"; "Tune clustering within the ECal".