Francesco Pannarale edited this page Oct 26, 2024 · 191 revisions

PyCBC multi-interferometer coherent followup

Note: this page is publicly viewable and should not be used to record sensitive or confidential information.

A project board is associated with this work.

Context

PyGRB is a coherent matched-filtering search that looks for compact binary coalescence (CBC) signals associated with external triggers, such as gamma-ray bursts (GRBs). The O1, O2, and O3 LVK analyses used a workflow that was a mixture of PyCBC and lalapps_coh_PTF code, as well as parts of PyLAL during the post-processing. A flowchart of the PyGRB workflow can be seen here (from p.85 of Iain Dorrington's thesis).

Objective

The final goal is the full integration of PyGRB within PyCBC. This will allow us to use optimised PyCBC code to speed up the analysis, to pick up any new features introduced in PyCBC, and to drop the maintenance of lalapps_coh_PTF and PyLAL.

Development

There are five main parts to the development:

The filtering engine

Make a coherent matched-filtering executable. This is called pycbc_multi_inspiral.

  • develop an executable to calculate the coherent SNR for a single sky point at and around the time of an event (e.g., a GRB)
  • allow for face-on / face-away projection of signals (appropriate for beamed GRBs)
  • add time slides (see PR1 and PR2)
  • loop over a sky patch (see PR)
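As a rough illustration of what the filtering engine computes at one sky point and one time sample, the sketch below projects the single-detector complex SNRs onto the signal subspace spanned by the sensitivity-weighted antenna responses. It also projects them onto a single circular-polarisation direction, which is the face-on/face-away case. This is a minimal numpy sketch under stated assumptions, not the pycbc_multi_inspiral implementation: the function names are hypothetical, the antenna responses F+ and Fx and the template sensitivities sigma are taken as given inputs, and the sign convention that labels the two circular polarisations is an assumption.

```python
import numpy as np


def coherent_snr(z, fp, fc, sigma):
    """Coherent SNR at one sky point and one time sample (sketch).

    z      : complex single-detector SNRs, shape (n_ifo,), n_ifo >= 2
    fp, fc : plus/cross antenna responses at the sky point, shape (n_ifo,)
    sigma  : template sensitivity of each detector, shape (n_ifo,)
    """
    z = np.asarray(z, dtype=complex)
    sigma = np.asarray(sigma, dtype=float)
    # Sensitivity-weighted antenna responses span the signal subspace
    m = np.column_stack([np.asarray(fp) * sigma, np.asarray(fc) * sigma])
    # Real orthogonal projector onto span{f+, fx}: P = M (M^T M)^-1 M^T
    proj = m @ np.linalg.solve(m.T @ m, m.T)
    pz = proj @ z
    return float(np.sqrt(np.vdot(pz, pz).real))


def circular_snr(z, fp, fc, sigma, sign=+1):
    """Coherent SNR for a circularly polarised (face-on/face-away) signal.

    sign = +1 or -1 selects one of the two circular polarisations; which
    one is called "left" or "right" depends on conventions (assumption).
    """
    z = np.asarray(z, dtype=complex)
    sigma = np.asarray(sigma, dtype=float)
    w = (np.asarray(fp) + sign * 1j * np.asarray(fc)) * sigma
    # Project z onto the single complex direction w
    return float(abs(np.vdot(w, z)) / np.sqrt(np.vdot(w, w).real))


# Usage: a noise-free face-on signal in three detectors (made-up numbers)
fp = np.array([0.5, -0.3, 0.2])
fc = np.array([0.1, 0.6, -0.4])
sigma = np.array([1.0, 1.2, 0.8])
z = 5.0 * np.exp(0.7j) * (fp + 1j * fc) * sigma
network = np.linalg.norm(z)
print(coherent_snr(z, fp, fc, sigma), circular_snr(z, fp, fc, sigma), network)
```

For this noise-free signal the three printed values agree: the signal lies entirely in the projected subspace, so no power is lost. For pure noise the circular SNR can never exceed the coherent SNR, which in turn can never exceed the network SNR. With only two detectors and non-degenerate responses the projector is the identity, so the coherent SNR reduces to the network SNR.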

Injections

For the record, this is the (outdated!) wikipage originally set up to coordinate this effort.

Ranking statistic

  • Make (an equivalent of) the BestNR ranking statistic used by cohPTF available to pycbc_multi_inspiral. BestNR is defined in Iain Dorrington's thesis.
  • Implement the ranking statistic based on machine learning proposed in this thesis.
  • Test one against the other to make an informed decision on whether to improve BestNR or adopt the machine-learning ranking statistic as the new standard.
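For orientation, the sketch below shows only the chi-squared reweighting step, written in the form PyCBC's all-sky searches use for the reweighted ("new") SNR with the usual exponents q = 6 and n = 2. The full BestNR defined in Iain Dorrington's thesis may include further terms and cuts that are not reproduced here. The function name is hypothetical, and production code should rely on the implementation in pycbc/events/ranking.py rather than on this sketch.

```python
import numpy as np


def reweighted_snr(snr, reduced_chisq, q=6.0, n=2.0):
    """Down-weight triggers with a poor chi-squared fit (newSNR-style sketch).

    snr           : SNR values of the triggers
    reduced_chisq : chi-squared divided by its number of degrees of freedom
    """
    snr = np.asarray(snr, dtype=float)
    rchisq = np.asarray(reduced_chisq, dtype=float)
    # Triggers with reduced chi-squared <= 1 keep their SNR unchanged
    factor = np.where(rchisq > 1.0,
                      ((1.0 + rchisq ** (q / n)) / 2.0) ** (-1.0 / q),
                      1.0)
    return snr * factor


print(reweighted_snr([10.0, 10.0], [0.8, 3.0]))
```

The first trigger keeps its SNR of 10, while the second, with a reduced chi-squared of 3, is pulled down to about 6.44.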

Post-processing

This area of development addresses the post-processing and the output webpages. The former is very memory-intensive and will be rewritten to use

  • hdf5 files, rather than xml ones, and
  • PyCBC style output webpages.

As of April 2024 this work is essentially complete; adjustments of scripts, configuration files, etc. will follow when needed. Here is one of the most recent example pages.

The workflow generator(s)

In order to run the PyGRB pipeline one needs code that generates an analysis workflow, e.g., pycbc_make_offline_grb_workflow for offline GRB follow-ups. This works out the relationships between the different elements of the pipeline (querying data servers, fetching template banks, gating data, filtering data, preparing injections, searching for injections, making plots, etc.).
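The kind of dependency bookkeeping a workflow generator performs can be illustrated with Python's standard-library graphlib module (Python 3.9+). This is only a toy sketch: the job names and dependencies are hypothetical simplifications of the steps listed above, and the real generators build their workflows through PyCBC's own workflow module, not through graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily simplified jobs: each job maps to the jobs it needs first.
DEPENDENCIES = {
    "query_data": set(),
    "fetch_template_bank": set(),
    "prepare_injections": set(),
    "gate_data": {"query_data"},
    "filter_data": {"gate_data", "fetch_template_bank"},
    "filter_injections": {"gate_data", "fetch_template_bank", "prepare_injections"},
    "make_plots": {"filter_data", "filter_injections"},
}


def execution_stages(deps):
    """Group jobs into stages; jobs within one stage can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for i, stage in enumerate(execution_stages(DEPENDENCIES)):
    print(i, stage)
# 0 ['fetch_template_bank', 'prepare_injections', 'query_data']
# 1 ['gate_data']
# 2 ['filter_data', 'filter_injections']
# 3 ['make_plots']
```

Grouping into stages, rather than producing a single linear order, makes explicit which jobs a scheduler is free to run concurrently.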

The offline workflow generator was progressively updated to work with the new executables to perform:

  • injections
  • filtering
  • post-processing and production of results webpage

The post-processing is handled by pycbc_pygrb_results_workflow, a dedicated workflow generator called by pycbc_make_offline_grb_workflow. Another relevant file is pycbc/workflow/grb_utils.py.

The workflow generator currently allows for timeslides and looping over sky-points. The most recent example of the output page is available here.

The last open point pertaining to the workflow generator(s) is to fold in vetoes.

The end products of this activity are:

  • A complete workflow generator for offline analyses
  • A complete workflow generator for medium-latency analyses

Configuration files for testing must be maintained here, where installation and run scripts for LDG clusters are available.

pycbc_multi_inspiral Tasks

| Issue | Description | Assigned to | Status |
|---|---|---|---|
| Time slides | Short (and maybe long) time slides need to be implemented. This will require changes to the event manager: it must keep track of which time slide we are on and treat events from different time slides as separate events. It also requires changes to trig cluster, trig combiner, inj cluster, and inj combiner. | Erin, Hannah, Apratim | |
| A new reweighted SNR | We are now using single-detector chisq tests rather than coherent chisq, so we need a new way of calculating the reweighted SNR. The one described in Iain Dorrington's thesis is implemented. | Jam/(Tom) | Relevant issue: #3466 |
| Make work with injections | Injections were originally too slow. Need to implement the pycbc injfilterrejector functionality to make them faster. | Iain | |
| Rewrite injection code | These should use hdf5 and generally not be so ad hoc. We should get rid of the inspinj, em filter, jitter sky loc, and align spin codes and replace them with one executable that makes suitable injections for a GRB search, piggybacking on pycbc_create_injections and pycbc_hdf5_splitbank. | Prasia | Missing jittering stage. #3468 #4591 |
| Search over sky grid | At the moment we can only search over a single sky point. pycbc_multi_inspiral will need to loop over a sky patch. | Stephanie | old wiki, #4380 |
| Circularly polarised coherent SNR | This requires a new function to replace the old projection matrix function. It needs to find two matrices (left and right polarised), so later parts of the code must then be changed to apply both of them. | Jam/(Tom)/Andrew | #3599 |
| Frame filelist must be a list of GWFs | Cache file functionality to be removed. Can run with cache. | -- | Not started |
| Remove precessing stuff from event manager | There is no point in the event manager storing these values, so we should remove them. They are never used and unlikely to be used (new ideas about how to do a precessing search would require a rewrite). | -- | Not started |
| Add other chisq tests | At the moment we only use the single-IFO power chisq test, and assemble a cut by combining these. pycbc_multi_inspiral also calculates single-IFO bank and auto chisq values, but no cut is based on them. The new PyCBC chisq calculation is no longer a bottleneck, but it would be good to check whether PyGRB could gain a speed-up by using these cheaper cuts. | Jacob | Issues fixed with chisq tests. Attempting to use BestNR as a metric for whether the bank and auto chisq tests give useful information. |
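The bookkeeping described in the time-slides row can be sketched as follows: each trigger carries the identifier of the slide it came from, and clustering is done separately within each slide, so triggers from different slides never compete with each other. The offset scheme (detector k shifted by slide_id * k * step seconds), the tuple layout of the triggers, and the fixed-window clustering, which is a simplification of PyCBC's clustering, are all hypothetical choices made for illustration.

```python
def slide_offsets(ifos, n_slides, step):
    """Hypothetical scheme: in slide s, detector k is shifted by s * k * step seconds."""
    return {s: {ifo: s * k * step for k, ifo in enumerate(ifos)}
            for s in range(n_slides)}


def cluster_per_slide(triggers, window):
    """Keep the loudest trigger in each fixed time window, separately per slide.

    triggers : iterable of (slide_id, end_time, snr) tuples
    window   : clustering window length in seconds
    """
    loudest = {}
    for slide_id, end_time, snr in triggers:
        # The slide id is part of the key, so different slides never compete
        key = (slide_id, int(end_time // window))
        if key not in loudest or snr > loudest[key][2]:
            loudest[key] = (slide_id, end_time, snr)
    return sorted(loudest.values())


print(slide_offsets(["H1", "L1", "V1"], n_slides=2, step=8.0))
triggers = [(0, 10.1, 5.0), (0, 10.4, 7.0), (1, 10.2, 6.0), (1, 12.5, 4.0)]
print(cluster_per_slide(triggers, window=1.0))
# [(0, 10.4, 7.0), (1, 10.2, 6.0), (1, 12.5, 4.0)]
```

Although the triggers at 10.4 s and 10.2 s fall in the same time window, both survive clustering because they belong to different slides.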

Post-processing and Results Webpage

The work undertaken focused on

  • abandoning all PyLAL dependencies
  • having one executable per plot/table-production task rather than a couple of scripts producing several outputs, and on
  • using hdf5 as I/O format rather than xml #4419.

All new post-processing executables are placed in pycbc/bin/pygrb and all utility functions in pycbc/results/pygrb_postprocessing_utils.py, or pycbc/results/pygrb_plotting_utils.py. pycbc/results/legacy_grb.py was removed (see #4288).

  • A list of todos and desiderata for the results webpage is maintained in issue #3660.

Credit for development so far: Cameron, Duncan, Francesco, Gino, Michael, Nathan, Viviana, Shamita, Erin, Marco, Jacob.

Tasks

| Issue | Description | Assigned to | Status |
|---|---|---|---|
| xml to hdf | Switch from xml to hdf in the trigger combination and clustering. | Duncan, with fixes from Erin and Francesco | |
| Add chisq cut | At the moment the power chisq is calculated and saved, but no cut is made. This should be added to the post-processing so that we can make the cut plots on the webpage. An order-of-magnitude sanity check could confirm that there is no speed/memory saving from making this cut in pycbc_multi_inspiral, but post-processing with h5py should be fine. | -- | Not started |
| Abandon PyLAL | Rewrite of PyLAL plotting scripts and removal of all PyLAL dependencies. | Francesco, Gino, Cam | |
| Make new plotting codes work for new filtering output | Conversion of xml handling to hdf5 handling. Additionally, the new output has different statistics; the coherent chi2 tests are gone, for example. | Francesco, Hannah, Viviana, Shamita | Done #4034 |
| Postprocessing and webpage production | Switch to PyCBC-style webpage and webpage generation | Francesco, Michael, Nathan | ✅ (but regular updates will happen as new results capabilities are introduced) |
| Automatic follow-up of loudest triggers and missed injections | | Francesco, Michael | Done in xml first and then hdf5: see example as PRs are being set up |
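A post-processing chi-squared cut of the kind described in the "Add chisq cut" row could, for example, take the saved single-detector power chisq values, divide them by their degrees of freedom, and keep only the triggers that pass in every detector. The threshold, the dictionary layout, and the rule of combining detectors with a logical AND are assumptions made for this sketch, not the cut PyGRB actually applies.

```python
import numpy as np


def chisq_cut_mask(chisq_by_ifo, dof_by_ifo, threshold):
    """Boolean mask of triggers passing a reduced power chisq cut in every IFO.

    chisq_by_ifo : dict mapping IFO name -> array of power chisq values
    dof_by_ifo   : dict mapping IFO name -> array of chisq degrees of freedom
    threshold    : maximum allowed reduced chisq (hypothetical value)
    """
    masks = []
    for ifo, chisq in chisq_by_ifo.items():
        reduced = np.asarray(chisq, dtype=float) / np.asarray(dof_by_ifo[ifo], dtype=float)
        masks.append(reduced <= threshold)
    # A trigger survives only if it passes the cut in all detectors
    return np.logical_and.reduce(masks)


chisq = {"H1": [10.0, 40.0, 12.0], "L1": [8.0, 9.0, 50.0]}
dof = {"H1": [14, 14, 14], "L1": [14, 14, 14]}
print(chisq_cut_mask(chisq, dof, threshold=2.0))
# [ True False False]
```

Returning a mask rather than filtered arrays lets the same selection be applied to every trigger column (SNR, times, template parameters) read from the hdf5 output.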

Telecons

07 March 2024

Attendance: Francesco, Jacob, Erin, Tito, Marco, Sebastian

  • Francesco: merged Jacob's PR #397 to update config files

  • Erin: Full workflow run complete. Discussed why the html pages were not produced

    • Discussion led to finding job failures related to pycbc_grb_inj_finder
  • Marco: Done with pycbc_pygrb_page_tables PR #4649. However, something is currently broken in the pycbc repo that prevents his PR from passing the tests.

  • Jacob:

    • Implemented fixes on pycbc_pygrb_inj_combiner related to injection names not matching what the postprocessing code expects.
    • Reported that the efficiency code produces negative error bars. This is likely related to background triggers being wiped out by an SNR cutoff (work in progress).
  • Sebastian: Started implementing on-source/off-source window separation. Opened new issue #4659

  • Tito: mentioned that Marion's work on sky-grid fixes is currently in progress; see issue #4610.

22 February 2024

Attendance: Erin, Francesco, Jacob, Marco, Prasia, Sebastian

  • Erin
    • Running the standard GW170817 workflow, but has noticed that it is quite slow.
    • Suggested integrating code lines to increase disk space allocation into the standard GitLab workflow executable.
  • Francesco
    • Agreed with Erin's comment and suggested opening a merge request
  • Jacob
    • Running a workflow to check if the past PRs give consistent results.
    • Noticed that the injection file produced by the workflow is empty and suggested investigating its causes
  • Marco
    • Finishing merging his work on page tables with pycbc master
    • Opened draft PR #4649
  • Prasia
  • Sebastian
    • Going to give a presentation about the updates made to PyGRB in 2023

08 February 2024

Attendance: Jacob, Sebastian, Prasia, Erin

  • Prasia: Asked whether her code should be a class in PyCBC or a separate Python file. Was told that normally we try to include it in existing scripts, e.g., PyCBC_utils.
  • Jacob
    • Waiting for PRs to go through.
    • Will try to get efficiency code into workflow soon.
    • Has run the workflow during development and got it to run, except for the post-processing, which is still in progress.
  • Sebastian
    • Was asked to give a presentation on the status of PyGRB. Comments on the PRs that we worked on would be appreciated.
    • Made a wikipage for PyGRB performance.
    • Working on separating on-source and off-source.
  • Erin: Will be working on running the workflow.

01 February 2024

Attendance: Jacob, Francesco, Marco, Sebastian

  • Jacob: PRs ready for review. See #4502, #4550, #4562.
  • Francesco: will start reviewing Jacob's PRs.
  • Marco: Waiting for Jacob's changes to be merged.
  • Sebastian: Almost done with the PyGRB_performance wikipage.

25 January 2024

Attendance: Jacob, Tito, Francesco, Marco, Erin, Prasia

  • Jacob illustrates the PRs he is working on and the solution to use template ids to overcome one of the points raised in the review of his PR for pycbc_pygrb_efficiency (see #4419). Plans to update the PR with this solution by next week.
  • Marco illustrates his fork for pycbc_pygrb_page_tables (see #4419). Suggestion to avoid duplicating functions and rely on, e.g., ranking.py for methods to compute the new SNR. Will clean up the fork and open a PR once Jacob's is through (there is some dependency).
  • Erin's work on pycbc_pygrb_plot_stats_distribution (see #4419) is not affected by the development above.
  • Francesco approves PR on Rachel's sky-grid code and Tito picks up issue #4610 to improve it.
  • Prasia needs to ping #pycbc-code to decide where to place code to jitter injection distances. Options are (participants prefer option 1, but this must be discussed with the wider PyCBC crowd):
    1. a method in the injection class; this could then be generalized/improved and all PyCBC would benefit from it;
    2. a standalone executable that the PyGRB workflows would use.

21 December 2023

Attendance: Jacob, Tito, Sebastian

Jacob: presented slides with his approved/in-progress PRs

  • Expanded pycbc_multi_inspiral and pygrb_postprocessing_utils to enable loading complete information on the post-processing side.
    • PR #4427 extended the time-slide information.
    • PR #4542 did similar work with segments.
  • Bugfixes on the injection code, see PR #4502.
  • Currently working on pycbc_pygrb_efficiency PR #4562. Planning to finish it in the next few weeks.

Sebastian: presented some slides with all the performance results for pycbc_multi_inspiral vs lalapps_coh_PTF_inspiral

  • These results will be posted soon in issue #4434.
  • The developer-friendly comments PR has been approved.
  • Discussed a typical error raised at the pycbc.strain.py level when the wrong input parameters are chosen. Rather than introducing assert statements in pycbc_multi_inspiral, Tito suggested handling this at the workflow generation level via comments in the config files and/or opening a PR to make this error more explicit. See issue
  • Shared some insights on possible bottlenecks when reading large frame files.

Tito:

  • Approved Sebastian's PR.
  • Commented on how much template information Jacob's PR should consider, based on what is currently done by the all-sky searches.
  • Asked Sebastian to share his performance test scripts, in order to confirm the possible bottleneck when reading frame files.
  • Gave maintainer rights to Jacob and Sebastian, since PR conflicts cannot easily be resolved without them.

7 December 2023

Attendance: Francesco, Prasia, Sebastian.

Francesco:

  • Reviewed one of Jacob's PRs.
  • Started looking into Tito and Stephanie's PR for the sky-grid implementation.

Sebastian:

  • Discussed new performance plots. Found that multi_inspiral (lalframe) spends a lot of time reading 5000+ second .gwf files.
  • Prasia suggested looking at how pycbc_inspiral splits large .gwf files using a cache to avoid this bottleneck.

Prasia:

  • Done with the script for computing injected distances from hdf files.
  • Will open an issue containing her fork.
  • The issue will follow up on the discussion of whether her code should go in a new separate script or inside PyCBC's existing injection code.

30 November 2023

Attendance: Erin, Hannah, Marco, Sebastian.

Marco:

  • Tables development ready on his fork.
  • Waiting for Sebastian's pycbc_multi_inspiral dev comments PR to be approved.
  • Started running the O4 workflow generation example.

Sebastian:

  • PR still waiting for approval.
  • Currently running performance tests on CIT pcdev6, matching Dorrington's old configuration on the Cardiff cluster.

Erin:

  • Back from holiday. Will continue to look into the dev workflow generation config files to compare the O3 and O4 versions and see how to make them production-ready.

Jacob:

  • Two PRs are ready for a re-review: PR1, PR2

23 November 2023

  • Attendance: Prasia, Marco, Francesco, Sebastian

  • Francesco: will soon look into Tito's and Jacob's PRs.

  • Prasia: working on a script that computes injected distances from hdf files.

    • Sam's notebook at home/samuel.higginbotham/Injection_testing/pycbc_test_env/pycbc/jitter_skyloc/eff_distance_testing.ipynb was discussed to help further injected-distance developments
  • Marco: Almost done with the new tables. Still having some issues with avoiding vetoes. The script is in his fork

  • Sebastian: started to implement the onsource/offsource analysis in pycbc_multi_inspiral. Rerunning the timing scripts one last time to match Dorrington's runs on the Cardiff cluster /home/iain.dorrington/170817_pmi.

16 November 2023

  • Attendance: Marco, Erin, Prasia, Sebastian
  • Sebastian:
    • Waiting for Francesco to approve pull requests for comments and sanity checks within multi_inspiral.
    • Jacob's PR is waiting still.
    • Testing multi_inspiral timing in different template bank sizes vs short slides, incrementing block and segment duration. Created 2D plot of results. Will post plot in Github issue.
  • Marco:
    • Had a question about chi_squared. multi_inspiral has power_chisq, autochisq, and bank_chisq, but bank_chisq returns nothing; Sebastian replied that vetoes have not been applied yet. Marco cannot reweight the SNR when these values are None, so the only thing he can work with is power_chisq.
    • Had a question about veto files being in XML instead of HDF. Was told to temporarily ignore vetoes until a conclusion is reached about whether we will be using them.
  • Prasia:
    • Was checking injection distances and now has to test the whole workflow.
  • Erin:
  • Jacob:
    • Slowly working through PR reviews. Segment PR is done (until further review).

02 November 2023

  • Attendance: Francesco, Erin, Prasia, Hannah, Sebastian

  • Prasia:

    • Asked for an example injection hdf5 file computed after the Fisher distribution changes.
  • Francesco:

    • Merged Erin's PR and planning on reviewing Jacob's PR soon.
    • Next step: put together the workflow generation with Erin's and Jacob's changes to the postprocessing side.
    • Suggested that other developers check the following config files ahead of the upcoming round of test workflow runs: https://git.ligo.org/ligo-cbc/pycbc-config/-/tree/master/O4/pygrb/dev
  • Discussion: what to do with the Bank Veto xml files, and how not including them in the analysis would affect several computed statistics of PyGRB vs PyCBC all sky.

  • Sebastian:

    • Done with the dev comments PR https://github.com/gwastro/pycbc/pull/4513.
    • Done with the coh_PTF vs multi_insp comparison in the large block duration regime, will post the plots in the optimization issue soon.
  • Jacob:

26 October 2023

  • Attendance: Sebastian, Prasia, Erin, Hannah
  • Prasia:
    • Using the newly computed injection distances and getting rid of old dependencies (xml -> hdf):
    • Testing, asked if anyone has an hdf inj trigger file generated by the workflow with the embright filter
    • Otherwise code is ready
  • Erin:
    • Waiting on Jacob's PR to go through
    • Will begin testing once post-processing loose ends are tied up
  • Sebastian:
    • Performance:
      • Found a regime in which multi-inspiral is faster than cohPTF
      • cohPTF struggles when frame files are large; multi-inspiral handles them much better
      • Multi-inspiral doesn’t need cache files and does not crash for 2000 s frames as cohPTF does
      • Preparing analysis for this regime for next week
  • Jacob:
    • Update on Slack 10/24: "I just opened a new PR for the segment dictionary (https://github.com/gwastro/pycbc/pull/4542). I think I will have one or two more PRs after this for the remainder of the background stuff, and then the direct changes to pycbc_pygrb_efficiency."

19 October 2023

  • Attendance: Francesco, Tito, Erin, Marco, Hannah, Prasia, Sebastian

  • Tito discussed progress on the new sky-grid code. It is now possible to make a uniform circular sky grid. See issue

  • Prasia working on phase and amplitude errors for the injected distance computation.

  • Marco working on page tables. Reported problems with construct_trials and load_segment_dict in /pycbc/results/pygrb_postprocessing_utils.py

  • Sebastian showed new coh_PTF vs multi_inspiral performance comparison results. See issue and fork containing the optimized code

  • Erin and Jacob are done with xml to hdf transition on the postprocessing side. Currently waiting for PR approvals to start running full workflow tests. See issue

17 January 2022

  • Attendance: Andrew, Francesco, Erin, Hannah, Sam, Viviana, Steve
  • Call will move to 1 hour earlier from next week (2pm UTC)
  • Erin, Max, Andrew met to discuss time slides work, will be testing/developing together on branch this coming week
  • Francesco, Hannah, Viviana, Andrew to meet to plan work on outstanding post-processing tasks
  • Sam and Steve have discussed EM-bright injections work and Sam has started coding
  • We might have HEALPix sky grids coming in during O4 (TBC), so we may want to make use of that for injection placement as well as grids for filtering

13 September 2021

  • Attendance: Andrew, Francesco, Max, Jam
  • Andrew: Injection code has an open PR. With Sam will look at getting that merged. Then integrating the EM-bright cut is next on list.
  • ACTION: Finish off injection PR.
  • Jam: Talking with Steve. Issue with the matrix definition, apparent inconsistency needs to be investigated a bit more. Will talk to Tom about the tuning of detection statistic.
  • Max: Can start by looking at workflow generation to check everything still runs end-to-end. Also can test the pycbc_multi_inspiral executable to see how that runs and look at the output. This can feed into the post processing development tasks.
  • ACTION: Andrew to send Max example scripts and config files for these tests.
  • Francesco: Post processing tasks can move on once the code output in hdf5 format can be used.
  • ACTION: Andrew to email usual contributors to check for other updates and send poll on the best time to have meetings this term.

08 Mar 2021

  • Attendance: Andrew, Francesco, Jam, Michael, Ryan, Sam
  • Action item (Francesco): create an issue on gwastro/pycbc for the new PyGRB webpage.
  • Can start including the missed injection/loudest off-source event follow-up work by Michael in the webpages.
  • Jam hits an error on CIT running /home/jam.sadiq/pygrbtest/pycbc/IainCodes/ExperimentswithInjections/runscript_pycbcMultiInspiral.sh. Discussed how to proceed.
  • Sam will have to check which reference frame the injection spins are defined in; if it is the one aligned with the total angular momentum, we should be fine and no extra layer of coding will be needed.

01 Mar 2021

  • Attendance: Cam, Francesco, Jam, Michael, Sam
  • Discussed use of injections in pycbc_multi_inspiral and dc-errs in pycbc_pygrb_efficiency

22 Feb 2021

  • Attendance: Cam, Francesco, Jam, Ryan, Sam
  • Francesco & Cam: the old pylal_cbc_cohptf_efficiency code has been split into 4 independent executables, each tackling a specific need (injection plots, statistics vs FAP distributions, html tables, efficiency calculations). Now we can clean these up independently, optimize them, etc.
  • Sam: managed to reproduce the original injection distributions with my tools for a single point; I now have to work on the sky jittering.
  • Jam: able to use pycbc_create_injections and learning about inclination angles. Problems with options to then use these in pycbc_multi_inspiral.

The rest of the call is dedicated to debugging this and it appears a more recent installation of PyCBC is needed.

15 Feb 2021

  • Attendance: Francesco, Jam, Ryan, Sam
  • Francesco: progress on simplifying pylal_cohptf_efficiency. All injection-related scatter plots are now produced in a loop rather than serially. The loop will then be handled at workflow level and the plot production by a stand-alone executable. Once efficiency curve plots are also handled similarly, we will be able to abandon pylal_cohptf_efficiency, generate PyGRB output pages in PyCBC style, and move on to hdf5.
  • Jam: trying to experiment with new definitions of reweighted SNR. Help needed in generating injections. Will iterate with Sam on Slack.
  • Sam: learning more about hdf5 file creation and manipulation.
  • Ryan: starting summer student on PyGRB. Expected to work on chi-square cuts in June and July.

8 Feb 2021

  • Attendance: Andrew, Cam, Francesco, Jam, Ryan, Sam
  • Jam: expect to start making progress on new reweighted SNR starting next week
  • Sam: rudimentary ligolw_cbc_jitter_skyloc runs with hdf5 files I/O.
  • Jam: Circularly polarised coherent SNR function now works. Discussion follows on trigger timeseries with standard SNR and left-polarized SNR and how to compare them. Suggestion is to compare noise distributions and start with face-on injections. Then increase inclination and compare behaviours.

25 Jan 2021

  • Attendance: Andrew, Francesco, Jam, Michal, Michael
  • Summary of previous call.
  • Jam made good progress on circular polarization. His implementation runs without failures. Will check and understand all output and then submit it for review in the coming 1-2 weeks.

18 Jan 2021

  • Updates on pylal_cbc_cohptf_efficiency rewrite and injection handling.
  • Sorted out usage of Gaussian prior distribution in setting up injection configuration files.

17 Dec 2020

10 Dec 2020

  • Please document your task(s), indicating git repo where devel work happens and code(s) that will be changed.
  • Injections work (Andrew/Sam/Steve): had a telecon about this, starting after Christmas break.
  • Iain’s thesis (linked above) formulated a few ideas for a new reweighted SNR, and he carried out tests on raven/hawk. He tried to write a chi-square test that focuses on a single detector's SNR. This was weighted by the detector SNR.
  • Jam (and Tom) on this task.
  • Jam (with Andrew and Tom’s guidance) also on circular polarized SNR but the reweighted SNR task has higher priority. This involves implementing the formula from Andrew’s paper and checking the paper-code conventions are consistent.
  • KAGRA colleagues can log on to clusters and run (short!) tests on the head nodes, but cannot submit to queues.

3 Dec 2020

  • Attendance: Francesco, Hideyuki, Jam, Patrick, Philippe, Ryan, Samuel, Tessa
  • We will host this call at 2 PM GMT until the end of the year. Francesco will set up a new poll for the first half of 2021.
  • Tessa: I set up a repo for end-to-end test runs; it contains a very reduced template bank and a few GRBs from O3a.
  • Timeslides and skyloc fine with Tessa and Cam, but not the rest: Tessa will update the tables in this wikipage accordingly.
  • Cameron has started work on pylal_inj_efficiency. Francesco committing to it too.
  • Sam taking over from where Cam and Tessa leave.
  • Next call: continue assigning tasks and outlining work to be done.
  • 2 weeks from now: add a wikipage for your task(s) and document work done and to-do items as much as possible. Point to commits, lines of code, etc.
  • Jam will inquire about picking up the two SNR-related items.

20 Sep 2020

  • Tasks were discussed and prioritized. Task specific comments were added to the descriptions above. Other general remarks below.
  • Steve: Overarching goal is to get a pipeline that runs and produces a webpage, maybe for a single sky point with just short slides.
  • Francesco: pycbc_multi_inspiral does not support xml.
  • Patrick: split execs into “do calculations” and “make plots” so that different versions of the code can be compared.
  • Francesco: Yes this is the plan.
  • Francesco: would be good to work on the efficiency stuff now so that when we have the new executable can make a webpage.
  • Andrew: suggest development cycles with forks to implement a specific task and merge often.
  • The timeline is April 2021. Mid November goal: pylal efficiency gone. End of November: time slides. First milestone by Christmas 2020: get short timeslides with one sky point webpage working.