First implementation of `build_evt()` and `build_skm()` #519

patgo25 · 2023-10-13T17:17:04Z

event tier builder to build a static event level from dsp, hit, and tcm files with help of json configuration file.

codecov · 2023-10-13T17:21:07Z

Codecov Report

Attention: 420 lines in your changes are missing coverage. Please review.

Comparison is base (a404ab0) 18.63% compared to head (013dc04) 22.62%.
Report is 19 commits behind head on main.

❗ Current head 013dc04 differs from pull request most recent head 78134ca. Consider uploading reports for the commit 78134ca to get more accurate results

Files	Patch %	Lines
src/pygama/pargen/dplms_ge_dict.py	0.00%	310 Missing ⚠️
src/pygama/evt/build_evt.py	82.63%	25 Missing ⚠️
src/pygama/evt/modules/spm.py	84.56%	25 Missing ⚠️
src/pygama/pargen/utils.py	6.25%	15 Missing ⚠️
src/pygama/evt/modules/legend.py	0.00%	14 Missing ⚠️
src/pygama/evt/utils.py	85.52%	11 Missing ⚠️
src/pygama/skm/build_skm.py	88.09%	10 Missing ⚠️
src/pygama/evt/aggregators.py	95.65%	4 Missing ⚠️
src/pygama/flow/file_db.py	50.00%	4 Missing ⚠️
src/pygama/pargen/energy_optimisation.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #519      +/-   ##
==========================================
+ Coverage   18.63%   22.62%   +3.98%     
==========================================
  Files          33       42       +9     
  Lines        7206     8106     +900     
==========================================
+ Hits         1343     1834     +491     
- Misses       5863     6272     +409

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

patgo25 · 2023-10-30T20:31:10Z

Things one may still want to implement:

allow for function call in channel definition (define "stock" functions for legendmeta)
remove ambiguity from field calls (e.g. tier.field instead of just field)
move mode filter into own field (e.g. mode_filter)
allow for evt tier fields in mode filter
Generalization is taking its toll on speed --> speed up again

But current version can be merged as is.

gipert · 2023-10-31T14:27:44Z

My suggestions about the config field names, to be further discussed (there is certainly still room for improvement):

{
  "channels_defs": {
    "geds_on": ["ch1084803", "ch1084804", "ch1121600"], // explicit
    "my_spms": "legendmeta.get_on_spms_channels('20220629T095300Z')", // function provided by a 'legendmeta' subpackage
    "pulser": "legendmeta.get_pulser_channel('20220629T095300Z')" // same
  },
  "operations": {
    "multiplicity": {  // a scalar
      "channels": "geds_on",
      "hit_aggregation_mode": "sum", // sum booleans (see expression)
      "store_channel_id": true,
      "expression": "hit.cuspEmax_ctc_cal > a", // a boolean
      "parameters": {"a": 25},
      "default": 0
    },
    "energy": { // a variable-length array
      "channels": "geds_on",
      "aggregation_mode": "collect", // or just "none"/omitted?
      "query": "hit.cuspEmax_ctc_cal > 25", // or "where" instead of "query" (numpy)?
      "store_channel_id": true,
      "expression": "hit.cuspEmax_ctc_cal" // the hit field itself
    },
    "tot_energy": { // a scalar
      "channels": "geds_on",
      "aggregation_mode": "collect",
      "query": "hit.cuspEmax_ctc_cal > 25"
      "expression": "hit.cuspEmax_ctc_cal",
    },
    "aoe": { // a variable-length array
      "aggregation_mode": "keep_at:energy_id", // keep same channel ids as in field "energy"
      "expression": "hit.AoE_Classifier"
    },
    "is_pulser": {
      "channels": "pulser",
      "aggregation_mode": "logical_or", // omittable since only one channel?
      "expression": "dsp.trapTmax > 200"
    },
    "is_global_ringing": {
      "channels": "geds_on",
      "aggregation_mode": "logical_or", // OR
      "expression": "~(is_valid_neg_current)",
      "default": false
    },
    "is_global_physical": {
      "channels": "geds_on",
      "aggregation_mode": "logical_and", // AND
      "expression": "(is_valid_0vbb) & (is_valid_tail) & (is_valid_baseline) & ~(is_noise_burst) & ~(is_negative)",
      "default": false
    },
    "lar_multiplicity": {
      "channels": "spms_on",
      "mode": "function",
      "expression": "pygama.evt.modules.spm.get_majority(0.5, evt.t0, 48000, 1000, 5000)"
    },
    "is_lar_rejected": {
      "expression": "(evt.lar_energy >4) | (evt.lar_multiplicity > 4)" // use other already-computed evt outputs
    },
}

Further features for discussion

in the build_hit() config we reference parameters in the expression with @. We should be consistent: we can decide to drop the @, but the build_hit() config syntax should be updated accordingly
do we want to have an outputs array at the top of the config, to make clear which fields get written to disk (see build_dsp() and build_hit() configs)?
Should we promote definition of channel rawid columns to its own config block? In the end these are columns like the others and we might want to set custom column names (more details in next comment)

gipert · 2023-10-31T14:33:51Z

Follow up on concept of config blocks for channel id columns. Instead of

    "energy": {
      "channels": "geds_on",
      "aggregation_mode": "collect",
      "query": "hit.cuspEmax_ctc_cal > 25",
      "store_channel_id": true,
      "expression": "hit.cuspEmax_ctc_cal"
    },

We could write:

    "rawid_that_fired": {
      "channels": "geds_on",
      "aggregation_mode": "collect",
      "query": "hit.cuspEmax_ctc_cal > 25",
      "expression": "tcm.array_id" // or whatever makes sense
    },
    "energy": {
      "channels": "geds_on",
      "aggregation_mode": "keep_at:rawid_that_fired",
      "expression": "hit.cuspEmax_ctc_cal"
    },

patgo25 · 2023-11-29T18:40:08Z

An config example:

{
  "channels": {
      "geds_on": ["ch1084803", "ch1084804", "ch1121600"],
       "spms_on": {
           "module": "pygama.evt.modules.legend.legend_meta",
            "meta_path": "/data2/public/prodenv/prod-blind/ref/v02.00/inputs",
            "system": "spms",
            "selectors":{
                "analysis.usability": "on"
            }
      "muon": "ch1027202",
  },
  "outputs": ["energy_id","is_muon_rejected","multiplicity","lar_energy"],
  "operations": {
      "energy_id":{
          "channels": "geds_on",
          "aggregation_mode": "vectorize",
          "query": "hit.cuspEmax_ctc_cal>25",
          "expression": "tcm.array_id",
          "sort": "ascend_by:dsp.tp_0_est"
      },
      "energy":{
          "aggregation_mode": "keep_at:evt.energy_id",
          "expression": "hit.cuspEmax_ctc_cal"
      },
      "is_muon_rejected":{
          "channels": "muon",
          "aggregation_mode": "any",
          "expression": "dsp.wf_max>a",
          "parameters": {"a":15100},
          "initial": false
      },
      "multiplicity":{
          "channels":  ["geds_on","geds_no_psd","geds_ac"],
          "aggregation_mode": "sum",
          "expression": "hit.cuspEmax_ctc_cal > a",
          "parameters": {"a":25},
          "initial": 0
      },
      "t0":{
          "aggregation_mode": "keep_at:evt.energy_id",
          "expression": "dsp.tp_0_est"
      },
      "lar_energy":{
          "channels": "spms_on",
          "aggregation_mode": "function",
          "expression": ".modules.spm.get_energy(0.5,evt.t0,48000,1000,5000)"
      },
   }
}

Major changes since last time:

Clearer keywords (tbd)
Modulized channel list loading
dot notation to explicitly mark fields
moved rawid generation in own block

gipert · 2024-01-02T17:04:12Z

@patgo25 you should now be able to use all latest pydataobj features here, if you merge main into this PR.

gipert

I left some comments about pydataobj related renamings, but did not have time to go through all the code...

src/pygama/evt/build_evt.py

gipert · 2024-01-11T16:35:34Z

TODOs I am thinking about atm:

split build_evt module
fix saving of evt data to Table (and remove temporary file saving)
use Awkward routines on VoVs
update code to new VoV format for SiPM data
seems to me that there is a lot of duplicated code in evt.modules.spm, can be simplified?
cleanup deprecated LH5 I/O calls in tests

gipert

I see harcoded tier names or reliance on specific file naming conventions, can we make it more agnostic? What happens if we add another tier or change name (e.g. pht tier)? I like more the idea of being more explicit about things in the config file.

src/pygama/evt/modules/legend.py

src/pygama/evt/modules/spm.py

src/pygama/skm/build_skm.py

patgo25 · 2024-01-24T14:23:24Z

pre-commit.ci autofix

…h Python 3.9

…ording suggestions

gipert · 2024-01-30T09:51:59Z

God forgive me for what I have done

patgo25 marked this pull request as ready for review October 28, 2023 17:16

patgo25 marked this pull request as draft October 28, 2023 17:41

patgo25 marked this pull request as ready for review October 29, 2023 11:24

gipert changed the title ~~Evt tier~~ First iplementation of build_evt() Nov 3, 2023

gipert added flow High-level data management evt build events from hit data and removed flow High-level data management labels Nov 3, 2023

patgo25 marked this pull request as draft November 28, 2023 14:55

patgo25 marked this pull request as ready for review November 29, 2023 18:33

gipert reviewed Jan 4, 2024

View reviewed changes

gipert force-pushed the evt_tier branch 4 times, most recently from 8a4906a to dea499e Compare January 11, 2024 16:10

gipert marked this pull request as draft January 11, 2024 16:22

gipert changed the title ~~First iplementation of build_evt()~~ First implementation of build_evt() and build_skm() Jan 11, 2024

patgo25 marked this pull request as ready for review January 19, 2024 20:14

gipert requested changes Jan 22, 2024

View reviewed changes

src/pygama/evt/modules/legend.py Outdated Show resolved Hide resolved

src/pygama/evt/modules/spm.py Outdated Show resolved Hide resolved

src/pygama/skm/build_skm.py Outdated Show resolved Hide resolved

src/pygama/skm/build_skm.py Outdated Show resolved Hide resolved

patgo25 requested a review from gipert January 24, 2024 14:47

gipert approved these changes Jan 24, 2024

View reviewed changes

patgo25 added 2 commits January 30, 2024 10:35

first idea for evt tier

a4ea61a

3x speedup and added LAr classifier module

1a966de

patgo25 and others added 26 commits January 30, 2024 10:35

add possibility to add atributes to evt LGDO

d359769

removed explicit typing in a spm module function to be compatible wit…

c332447

…h Python 3.9

routine to create dplms dictionary for Ge processing

978836e

removed dependency on nopt routine

ff5419a

loading full raw table

e800f39

smale change on loading data

ffa4617

moved load data out of pargen routine

e4f7d68

small changes

8811839

update the LH5 file writes/reads to match the new LH5Store syntax acc…

d2b162d

…ording suggestions

modification for dsp processing

2aa52e2

revert modification on ene_opt

e255e1e

change load data

ce69014

added lh5 output format for skm tier

5523ac7

full sparse mode compatibility

b6bd4e6

Friday evening changes

b4672ee

allow passing of env vars in legend meta module

b5240f4

add pyarrow dependence

a7856e5

error parameter deprecation in pandas 2.2 to_numeric function

ccc1b71

removed other output format options than lh5 from build_skm

56ec1d4

spm module cleanup

a8c8393

making channel table names agnostic

18df7b9

handle divide by 0 warnings correctly

5138711

become agnostic to group nameing

8fcd1fc

split build_evt into sub modules

eaa6cb4

agnostify also skm tier

2c5b9b4

updated etc classifier

013dc04

gipert force-pushed the evt_tier branch from b7efed2 to 013dc04 Compare January 30, 2024 09:39

Docstring cosmetics

78134ca

gipert merged commit 724655c into legend-exp:main Jan 30, 2024
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

First implementation of `build_evt()` and `build_skm()` #519

First implementation of `build_evt()` and `build_skm()` #519

patgo25 commented Oct 13, 2023

codecov bot commented Oct 13, 2023 •

edited

Loading

patgo25 commented Oct 30, 2023 •

edited

Loading

gipert commented Oct 31, 2023

gipert commented Oct 31, 2023 •

edited

Loading

patgo25 commented Nov 29, 2023 •

edited by gipert

Loading

gipert commented Jan 2, 2024

gipert left a comment

gipert commented Jan 11, 2024 •

edited

Loading

gipert left a comment

patgo25 commented Jan 24, 2024

gipert commented Jan 30, 2024

First implementation of build_evt() and build_skm() #519

First implementation of build_evt() and build_skm() #519

Conversation

patgo25 commented Oct 13, 2023

codecov bot commented Oct 13, 2023 • edited Loading

Codecov Report

patgo25 commented Oct 30, 2023 • edited Loading

gipert commented Oct 31, 2023

Further features for discussion

gipert commented Oct 31, 2023 • edited Loading

patgo25 commented Nov 29, 2023 • edited by gipert Loading

gipert commented Jan 2, 2024

gipert left a comment

Choose a reason for hiding this comment

gipert commented Jan 11, 2024 • edited Loading

gipert left a comment

Choose a reason for hiding this comment

patgo25 commented Jan 24, 2024

gipert commented Jan 30, 2024

First implementation of `build_evt()` and `build_skm()` #519

First implementation of `build_evt()` and `build_skm()` #519

codecov bot commented Oct 13, 2023 •

edited

Loading

patgo25 commented Oct 30, 2023 •

edited

Loading

gipert commented Oct 31, 2023 •

edited

Loading

patgo25 commented Nov 29, 2023 •

edited by gipert

Loading

gipert commented Jan 11, 2024 •

edited

Loading