Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First implementation of build_evt() and build_skm() #519

Merged
merged 73 commits into from
Jan 30, 2024

Conversation

patgo25
Copy link
Contributor

@patgo25 patgo25 commented Oct 13, 2023

event tier builder to build a static event level from dsp, hit, and tcm files with help of json configuration file.

@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

Attention: 420 lines in your changes are missing coverage. Please review.

Comparison is base (a404ab0) 18.63% compared to head (013dc04) 22.62%.
Report is 19 commits behind head on main.

❗ Current head 013dc04 differs from pull request most recent head 78134ca. Consider uploading reports for the commit 78134ca to get more accurate results

Files Patch % Lines
src/pygama/pargen/dplms_ge_dict.py 0.00% 310 Missing ⚠️
src/pygama/evt/build_evt.py 82.63% 25 Missing ⚠️
src/pygama/evt/modules/spm.py 84.56% 25 Missing ⚠️
src/pygama/pargen/utils.py 6.25% 15 Missing ⚠️
src/pygama/evt/modules/legend.py 0.00% 14 Missing ⚠️
src/pygama/evt/utils.py 85.52% 11 Missing ⚠️
src/pygama/skm/build_skm.py 88.09% 10 Missing ⚠️
src/pygama/evt/aggregators.py 95.65% 4 Missing ⚠️
src/pygama/flow/file_db.py 50.00% 4 Missing ⚠️
src/pygama/pargen/energy_optimisation.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #519      +/-   ##
==========================================
+ Coverage   18.63%   22.62%   +3.98%     
==========================================
  Files          33       42       +9     
  Lines        7206     8106     +900     
==========================================
+ Hits         1343     1834     +491     
- Misses       5863     6272     +409     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@patgo25 patgo25 marked this pull request as ready for review October 28, 2023 17:16
@patgo25 patgo25 marked this pull request as draft October 28, 2023 17:41
@patgo25 patgo25 marked this pull request as ready for review October 29, 2023 11:24
@patgo25
Copy link
Contributor Author

patgo25 commented Oct 30, 2023

Things one may still want to implement:

  • allow for function call in channel definition (define "stock" functions for legendmeta)
  • remove ambiguity from field calls (e.g. tier.field instead of just field)
  • move mode filter into own field (e.g. mode_filter)
  • allow for evt tier fields in mode filter
  • Generalization is taking its toll on speed --> speed up again

But current version can be merged as is.

@gipert
Copy link
Member

gipert commented Oct 31, 2023

My suggestions about the config field names, to be further discussed (there is certainly still room for improvement):

{
  "channels_defs": {
    "geds_on": ["ch1084803", "ch1084804", "ch1121600"], // explicit
    "my_spms": "legendmeta.get_on_spms_channels('20220629T095300Z')", // function provided by a 'legendmeta' subpackage
    "pulser": "legendmeta.get_pulser_channel('20220629T095300Z')" // same
  },
  "operations": {
    "multiplicity": {  // a scalar
      "channels": "geds_on",
      "hit_aggregation_mode": "sum", // sum booleans (see expression)
      "store_channel_id": true,
      "expression": "hit.cuspEmax_ctc_cal > a", // a boolean
      "parameters": {"a": 25},
      "default": 0
    },
    "energy": { // a variable-length array
      "channels": "geds_on",
      "aggregation_mode": "collect", // or just "none"/omitted?
      "query": "hit.cuspEmax_ctc_cal > 25", // or "where" instead of "query" (numpy)?
      "store_channel_id": true,
      "expression": "hit.cuspEmax_ctc_cal" // the hit field itself
    },
    "tot_energy": { // a scalar
      "channels": "geds_on",
      "aggregation_mode": "collect",
      "query": "hit.cuspEmax_ctc_cal > 25"
      "expression": "hit.cuspEmax_ctc_cal",
    },
    "aoe": { // a variable-length array
      "aggregation_mode": "keep_at:energy_id", // keep same channel ids as in field "energy"
      "expression": "hit.AoE_Classifier"
    },
    "is_pulser": {
      "channels": "pulser",
      "aggregation_mode": "logical_or", // omittable since only one channel?
      "expression": "dsp.trapTmax > 200"
    },
    "is_global_ringing": {
      "channels": "geds_on",
      "aggregation_mode": "logical_or", // OR
      "expression": "~(is_valid_neg_current)",
      "default": false
    },
    "is_global_physical": {
      "channels": "geds_on",
      "aggregation_mode": "logical_and", // AND
      "expression": "(is_valid_0vbb) & (is_valid_tail) & (is_valid_baseline) & ~(is_noise_burst) & ~(is_negative)",
      "default": false
    },
    "lar_multiplicity": {
      "channels": "spms_on",
      "mode": "function",
      "expression": "pygama.evt.modules.spm.get_majority(0.5, evt.t0, 48000, 1000, 5000)"
    },
    "is_lar_rejected": {
      "expression": "(evt.lar_energy >4) | (evt.lar_multiplicity > 4)" // use other already-computed evt outputs
    },
}

Further features for discussion

  • in the build_hit() config we reference parameters in the expression with @. We should be consistent: we can decide to drop the @, but the build_hit() config syntax should be updated accordingly
  • do we want to have an outputs array at the top of the config, to make clear which fields get written to disk (see build_dsp() and build_hit() configs)?
  • Should we promote definition of channel rawid columns to its own config block? In the end these are columns like the others and we might want to set custom column names (more details in next comment)

@gipert
Copy link
Member

gipert commented Oct 31, 2023

Follow up on concept of config blocks for channel id columns. Instead of

    "energy": {
      "channels": "geds_on",
      "aggregation_mode": "collect",
      "query": "hit.cuspEmax_ctc_cal > 25",
      "store_channel_id": true,
      "expression": "hit.cuspEmax_ctc_cal"
    },

We could write:

    "rawid_that_fired": {
      "channels": "geds_on",
      "aggregation_mode": "collect",
      "query": "hit.cuspEmax_ctc_cal > 25",
      "expression": "tcm.array_id" // or whatever makes sense
    },
    "energy": {
      "channels": "geds_on",
      "aggregation_mode": "keep_at:rawid_that_fired",
      "expression": "hit.cuspEmax_ctc_cal"
    },

@gipert gipert changed the title Evt tier First iplementation of build_evt() Nov 3, 2023
@gipert gipert added flow High-level data management evt build events from hit data and removed flow High-level data management labels Nov 3, 2023
@patgo25 patgo25 marked this pull request as draft November 28, 2023 14:55
@patgo25 patgo25 marked this pull request as ready for review November 29, 2023 18:33
@patgo25
Copy link
Contributor Author

patgo25 commented Nov 29, 2023

An config example:

{
  "channels": {
      "geds_on": ["ch1084803", "ch1084804", "ch1121600"],
       "spms_on": {
           "module": "pygama.evt.modules.legend.legend_meta",
            "meta_path": "/data2/public/prodenv/prod-blind/ref/v02.00/inputs",
            "system": "spms",
            "selectors":{
                "analysis.usability": "on"
            }
      "muon": "ch1027202",
  },
  "outputs": ["energy_id","is_muon_rejected","multiplicity","lar_energy"],
  "operations": {
      "energy_id":{
          "channels": "geds_on",
          "aggregation_mode": "vectorize",
          "query": "hit.cuspEmax_ctc_cal>25",
          "expression": "tcm.array_id",
          "sort": "ascend_by:dsp.tp_0_est"
      },
      "energy":{
          "aggregation_mode": "keep_at:evt.energy_id",
          "expression": "hit.cuspEmax_ctc_cal"
      },
      "is_muon_rejected":{
          "channels": "muon",
          "aggregation_mode": "any",
          "expression": "dsp.wf_max>a",
          "parameters": {"a":15100},
          "initial": false
      },
      "multiplicity":{
          "channels":  ["geds_on","geds_no_psd","geds_ac"],
          "aggregation_mode": "sum",
          "expression": "hit.cuspEmax_ctc_cal > a",
          "parameters": {"a":25},
          "initial": 0
      },
      "t0":{
          "aggregation_mode": "keep_at:evt.energy_id",
          "expression": "dsp.tp_0_est"
      },
      "lar_energy":{
          "channels": "spms_on",
          "aggregation_mode": "function",
          "expression": ".modules.spm.get_energy(0.5,evt.t0,48000,1000,5000)"
      },
   }
}

Major changes since last time:

  • Clearer keywords (tbd)
  • Modulized channel list loading
  • dot notation to explicitly mark fields
  • moved rawid generation in own block

@gipert
Copy link
Member

gipert commented Jan 2, 2024

@patgo25 you should now be able to use all latest pydataobj features here, if you merge main into this PR.

Copy link
Member

@gipert gipert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments about pydataobj related renamings, but did not have time to go through all the code...

src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
src/pygama/evt/build_evt.py Outdated Show resolved Hide resolved
@gipert gipert force-pushed the evt_tier branch 4 times, most recently from 8a4906a to dea499e Compare January 11, 2024 16:10
@gipert gipert marked this pull request as draft January 11, 2024 16:22
@gipert
Copy link
Member

gipert commented Jan 11, 2024

TODOs I am thinking about atm:

  • split build_evt module
  • fix saving of evt data to Table (and remove temporary file saving)
  • use Awkward routines on VoVs
  • update code to new VoV format for SiPM data
  • seems to me that there is a lot of duplicated code in evt.modules.spm, can be simplified?
  • cleanup deprecated LH5 I/O calls in tests

@gipert gipert changed the title First iplementation of build_evt() First implementation of build_evt() and build_skm() Jan 11, 2024
@patgo25 patgo25 marked this pull request as ready for review January 19, 2024 20:14
Copy link
Member

@gipert gipert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see harcoded tier names or reliance on specific file naming conventions, can we make it more agnostic? What happens if we add another tier or change name (e.g. pht tier)? I like more the idea of being more explicit about things in the config file.

src/pygama/evt/modules/legend.py Outdated Show resolved Hide resolved
src/pygama/evt/modules/spm.py Outdated Show resolved Hide resolved
src/pygama/skm/build_skm.py Outdated Show resolved Hide resolved
src/pygama/skm/build_skm.py Outdated Show resolved Hide resolved
@patgo25
Copy link
Contributor Author

patgo25 commented Jan 24, 2024

pre-commit.ci autofix

@patgo25 patgo25 requested a review from gipert January 24, 2024 14:47
@gipert gipert merged commit 724655c into legend-exp:main Jan 30, 2024
10 checks passed
@gipert
Copy link
Member

gipert commented Jan 30, 2024

God forgive me for what I have done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
evt build events from hit data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants