
Add pipeline manager #181

Merged 8 commits into master on Jan 8, 2025
Conversation

xgarrido (Collaborator) commented Jan 2, 2025

This PR adds a pspipe-run binary to ease the sequential execution of the different Python scripts needed when analysing data (such as ACT DR6). The configuration of the pipeline is done via a YAML file. Several examples for ACT DR6 data are provided within the data_analysis/yaml directory.

There are several options in the pipeline.yml file that relate to the Slurm configuration at NERSC. By default, the Python scripts are looked up where pspipe has been installed, but you can set a different path to them with the script_base_dir variable. You can also set the location of the global.dict file and the directory where all the pipeline products will be stored; both parameters can also be set via the command line.
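As a rough illustration, the top of a pipeline.yml could look like the sketch below. script_base_dir and product_dir are the names discussed in this PR; dict_file is an assumed key name for the global.dict location, given here only for illustration.

  # Sketch of the global pipeline.yml options, assuming the key names below.
  script_base_dir: /path/to/my/scripts   # defaults to the pspipe installation location
  dict_file: global.dict                 # assumed key name; can also be set on the command line
  product_dir: products/dr6              # where all pipeline products are stored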

The variables block (see yaml/pipeline_dr6.yml) allows you to overload values from the original dict file without changing the content of this file. This way the dict file always remains the same and the values are only changed at the time of the pipeline execution. The current dict file used by the run is in any case stored within the product_dir directory.
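For example, overriding a couple of dict entries at pipeline execution time could look like the following sketch (lmax and binning_file are assumed dict entries, used purely for illustration):

  variables:
    # Assumed dict entries, for illustration only: each value below overrides
    # the corresponding entry of global.dict for this run, without modifying
    # the dict file itself.
    lmax: 5000
    binning_file: data/binning_dr6.dat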

Finally, you have to define a pipeline section with the needed Python modules. For each module, you can also set different options such as the number of tasks ntasks and the number of CPUs per task cpus_per_task. The script checks whether the module has already been run and, if so, skips it. You can force the re-execution of a module by adding the option force: true at the module level. You can also ask for a minimal amount of time needed to run the module: if the remaining allocation time at NERSC is not enough, the program will tell you to re-allocate time. Here is an example of such a block and its options:

  get_covariance_blocks:
    force: true
    slurm:
      nodes: 2
      ntasks: 8
      cpus_per_task: 64
      minimal_needed_time: 03:00:00

The yaml/pipeline_dust.yml file also shows how to handle different options for the same module name (using a matrix block, in a similar way to what GitHub does for CI).
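As a sketch of the idea only (the module name, the matrix layout and the variable names below are assumptions; the actual syntax is the one used in yaml/pipeline_dust.yml), a matrix could enumerate several variable sets for one module:

  # Illustrative sketch only; see yaml/pipeline_dust.yml for the real syntax.
  get_dust_spectra:                # assumed module name
    matrix:
      - variables:
          dust_amplitude: 0.1      # assumed dict entry
      - variables:
          dust_amplitude: 0.2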

@xgarrido added the enhancement (New feature or request) label on Jan 2, 2025
@thibautlouis merged commit 2827b72 into master on Jan 8, 2025 (3 checks passed)