This PR adds a `pspipe-run` binary to ease the sequential execution of the different Python scripts needed when analysing data (such as ACT DR6). The pipeline is configured via a `yaml` file; several examples for ACT DR6 data are provided within the `data_analysis/yaml` directory.

Several options of the `pipeline.yml` file relate to the `slurm` configuration at NERSC. By default, the Python scripts are looked up where `pspipe` has been installed, but you can set a different path to the `python` scripts with the `script_base_dir` variable. You can also set the location of the `global.dict` file and the directory where all the pipeline products will be stored. Both parameters can also be set via the command line.
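As a sketch, the top of a `pipeline.yml` file might then look as follows. Only `script_base_dir` is named above; the `dict_file` and `product_dir` key names are assumptions made for illustration and may differ from the actual schema:

```yaml
# Hypothetical sketch of the top-level options; only script_base_dir
# is named in this description, the other key names are illustrative.
script_base_dir: /path/to/my/python/scripts  # where the python scripts live
dict_file: /path/to/global.dict              # assumed key: location of the dict file
product_dir: /path/to/products               # assumed key: where pipeline products are stored
```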
The `variables` block (see `yaml/pipeline_dr6.yml`) allows you to overload values from the original `dict` file without changing the content of this file. This way the `dict` file always remains the same and the values are only changed at the time of the pipeline execution. The current `dict` file used by the run is in any case stored within the `product_dir` directory.

Finally, you have to define a `pipeline` section with the needed Python modules. For each module, you can also set different options such as the number of tasks `ntasks` and the number of CPUs per task `cpus_per_task`. The script checks whether a module has already been run and, if so, skips it. You can force the re-execution of a module by adding the option `force: true` at the module level. You can also ask for a minimal amount of time needed to run the module: if the remaining allocation time at NERSC is not enough, the program will tell you to re-allocate time. Here is an example of such a block and its options:
The `yaml/pipeline_dust.yml` file also shows how to handle different options for the same module name (using a `matrix` block, in a similar way to what GitHub does for CI).
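For reference, such a `matrix` block could look roughly like the following; only the `matrix` keyword itself comes from this description, while the module and parameter names are assumptions for illustration:

```yaml
# Illustrative sketch: the module and parameter names are assumptions.
pipeline:
  compute_dust_spectra:     # hypothetical module name
    matrix:
      freq: [90, 150, 220]  # run the same module once per listed value
```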