-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLUE algorithm and Phoenix visualiser #1416
base: trunk
Are you sure you want to change the base?
Conversation
Ugly import, fix
+ some visgenerator tweaks
Only non-empty clusters above seed threshold are saved
Shows incident IDs and PDG IDs that contributed to hit Needs to be cleaned a bit Added option for ground truth
Generates a bunch of histograms to evaluate how good clusters are
Does not work too well
Committing in case I get a stroke of genius later
Chonky commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for creating this PR! I'm wondering if its possible to separate the visualization stuff from the clustering stuff. They have rather separate purposes and so separating them would allow them to be reviewed at different rates.
No worries if not! Just wanted to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second the request two break this up to two PRs, also it seems there are some files that were not supposed to be pushed, like the .vscode directory and .yarn, but also I dont think we need the CMS and ATLAS logos, etc. :)
You might be right @tomeichlersmith that visualization and clustering are conceptually separate... even if they were used in tandem for this project. Do you have some handy github wizardry that you could share with @Lysarina for how to cherry pick the different parts into two separate PRs, to help this move along? Also, any good ideas for where to keep the visualizer analyzer? In |
In this specific situation, I would just make new branches and copy over your updates to those new branches. This also give you the opportunity to make branch names that reference the issue number. You can use
You can The same procedure can be done for the vis branch (make sure to switch back to
|
Is there a reason for not wanting the full header? I think that is a super common way to use it |
I don't have a good reason, I just like that its easier to find the original project (and thus its documentation). I also want to point out that acts also uses nlohmann/json so we could do an install into the image for both to use in the future. Probably should have worded that comment less strongly - I would like to move it to a submodule but its not necessary. If we move to putting it in the image, then we can just as easily remove the header here as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomeichlersmith @EinarElen I have some evidence that this change messes with the overlay producer. I suspect it is because it collapses all trackIDs, incident IDs and PDG codes to nonsensical values. Can you see, though, how the changes done here would affect it?
here's what is done in overlay for ecal
ldmx-sw/Recon/src/Recon/OverlayProducer.cxx
Lines 271 to 286 in 96f6980
if (needsContribsAdded) { // special treatment for (for now only) | |
// ecal | |
int overlayHitID = overlayHit.getID(); | |
if (hitMap.find(overlayHitID) == | |
hitMap.end()) { // there wasn't already a simhit in this id | |
hitMap[overlayHitID] = ldmx::SimCalorimeterHit(); | |
hitMap[overlayHitID].setID(overlayHitID); | |
std::vector<float> hitPos = overlayHit.getPosition(); | |
hitMap[overlayHitID].setPosition(hitPos[0], hitPos[1], hitPos[2]); | |
} | |
// add the overlay hit (as a) contrib | |
// incidentID = -1000, trackID = -1000, pdgCode = 0 <-- these are | |
// set in the header for now but could be parameters | |
hitMap[overlayHitID].addContrib(overlayIncidentID_, overlayTrackID_, | |
overlayPdgCode_, | |
overlayHit.getEdep(), overlayTime); |
The effect I see is that the ecal total energy comes out as really strange spikes at multiples of roughly 5 GeV. I have debugged my overlay config. When running it with trunk
, though, my inclusive 2e events all came out nicely distributed around 16 GeV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to add some details, i think this was developed using a multiparticle gun sample, where the trackID for each beam electron is different. in overlay, all beam electrons have the same truth IDs, and i could make some sort of intelligent mapping of them to different numbers (e.g. the number of the pileup interaction as in 2, 3, ... , N_ele or perhaps that but negative to avoid clashing with e.g. a hard brem trackID in the process of interest).
for any signal or ecalPN studies with ecal clustering we need to run overlay and not an mpg sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does alter the on-disk layout of the SimCalorimeterHit which should mean that ROOT is warning you, but ROOT may not be warning you because we have "just" added one more variable which ROOT may be able to handle without concern. (i.e. reading old style hits without the originID using the class that does have originID)
The total SimCalorimterHit edep is calculated while we addContrib
so since OverlayProducer is creating these overlayed hits, there could be an issue there?
Things I would try:
- I would update the version number for the SimCalorimeterHit in its
ClassDef
macro to see if that helps. - Update overlay producer, copying this origin parameter over?
Besides that, I really don't know why its happening right now. I would confirm that the issue is not present when the in-memory data layout and the on-disk data layout is the same. (i.e. the SimCalorimeterHit class hasn't been changed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so you're saying that even if I generated the SimHits "from scratch" with this branch including this change to SimCore (which I did), there might be some confusion since the ClassDef number wasn't updated?
As for confirming, I have only been able to verify that this behaviour does not happen with trunk. The only place I think is reasonable to look for an effect, is right here in the Ecal SD and SimCalorimeterHit changes.
I would have asumed that since I don't specify originID in the overlay implementation, it is just set to its default value (-1) if left unspecified. Which shouldn't upset any downstream hit reconstruction (just the origin tracking).
I take it that there's nothing obvious here and I'll have to dig deeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hypothesis about the ClassDef number would only apply if you were reading the SimHits from files created with trunk using this branch. I don't think there would be confusion if the SimHits were created and read on this branch.
Besides that, I have the same assumptions so I think this does require digging deeper unfortunately.
I am updating ldmx-sw, here are the details.
What are the issues that this addresses?
This resolves #1411
I have worked with ECal clustering this summer and implemented the CLUE algorithm from CMS. The main goal was to use this for electron clustering so I've strived for number of clusters == number of electrons, and also working to get a high energy purity (i.e. how much of the energy in the cluster comes from the same initial electron). It works pretty well! I've also added a simple reclustering clause to try and handle merged clusters, which results in a more prominent peak at number of clusters == number of electrons, but also introduces some overcounting, so it is currently an optional parameter.
This also partly resolves #1394 in getting the cluster producer to work
Additionally, I have added a Phoenix-based visualiser LDMX-VIS, located in EventDisplay/ldmx-vis. I have also made an analyzer VisGenerator in DQM (maybe not the right folder for this) that writes event data into JSON files that can be visualised by Phoenix.
Fixes
While I feel mostly happy with the code, I created what is basically a copy of WorkingCluster.h in Ecal called WorkingEcalCluster, that stores EcalHits as normal objects instead of pointers, as when I started I did not really know how to handle pointers :^) I never had time to fix so that CLUE would use WorkingCluster instead but this would be a nice fix. WorkingEcalCluster does contain some other improvements such as a parameterless constructor and eliminating EcalGeometry as it is not actually needed in that file.
Check List
Below are histograms produced by EcalClusterAnalyzer, an analyzer I created to analyze the performance of the clustering
Below are some graphs for unmodified CLUE
2 electrons
3 electrons
versus the initial algorithm (TemplatedClusterFinder)
2 electrons
3 electrons
Reclustering with CLUE
2e
3e
Visualisation examples
Here is the (slightly messy) config I use for ldmx fire