4. How to run the pipeline
The pipeline has to be run inside a terminal multiplexer such as tmux or screen so that the Snakemake process can be sent to the background and keeps running after you disconnect (e.g. see this introduction to tmux).
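As a minimal sketch (assuming tmux is installed; the session name `generode` is arbitrary, not required by the pipeline), a run inside tmux could be started like this:

```shell
# Start a detached tmux session named "generode" (the name is arbitrary)
tmux new-session -d -s generode

# Type a command into the session, e.g. activating the conda environment
tmux send-keys -t generode 'conda activate generode' Enter

# List sessions; re-attach any time with: tmux attach -t generode
tmux ls

# Clean up the demo session (normally you would leave it running)
tmux kill-session -t generode
```

With the session detached, the Snakemake process keeps running even after you log out of the cluster login node.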
If your cluster uses a workload manager such as slurm, the pipeline can submit jobs to the system automatically, one job per rule. Please check point 5 of the wiki page with requirements for information on how to use a slurm profile for cluster execution.
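For illustration only (this is not the official GenErode profile; the exact settings are described on the requirements wiki page), a slurm profile is a directory containing a `config.yaml` whose keys map to Snakemake command-line flags. A hypothetical sketch:

```yaml
# Hypothetical slurm/config.yaml sketch -- see the requirements wiki
# page for the actual GenErode profile settings.
cluster: "sbatch -A {cluster.account} -p {cluster.partition} -t {cluster.time}"
cluster-config: "slurm/cluster.yaml"   # per-rule resources (hypothetical file)
jobs: 100                              # max number of jobs submitted at once
restart-times: 2                       # retry failed jobs twice
rerun-incomplete: true
keep-going: true
printshellcmds: true
```

Each key corresponds to the long form of a Snakemake flag, so `jobs: 100` has the same effect as passing `--jobs 100` on the command line.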
For more information on Snakemake, including a tutorial, see the official Snakemake documentation.
The following instructions assume that you first went through all the pipeline requirements before attempting to start a pipeline run.
Detailed instructions on how to run GenErode on the PDC/KTH cluster Dardel can be found here.
Activate the conda environment (replace "generode" with the name you chose when creating the conda environment):

```
conda activate generode
```
Perform a dry run (rename the `--profile` parameter if you called your profile anything other than `slurm`):

```
snakemake --profile slurm -n &> YYMMDD_dry_run.out
```

Check the log file (`YYMMDD_dry_run.out`) to confirm that everything works as it should.
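At the end of a successful dry run, Snakemake prints a job summary and the message "This was a dry-run (flag -n)", which you can look for in the log. A sketch using a fabricated log excerpt (the real file is produced by Snakemake):

```shell
# Fabricated example of what a dry run writes to the log file;
# the job names and counts here are made up for illustration
printf 'Job stats:\njob        count\nfastqc     4\ntotal      4\n\nThis was a dry-run (flag -n).\n' > YYMMDD_dry_run.out

# Confirm the run completed as a dry run
grep 'dry-run' YYMMDD_dry_run.out

# Inspect the planned job counts
grep -A4 'Job stats:' YYMMDD_dry_run.out
```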
Start the main run (rename the `--profile` parameter if you called your profile anything other than `slurm`):

```
snakemake --profile slurm &> YYMMDD_main_run.out
```

Check the log file (`YYMMDD_main_run.out`) regularly while the pipeline is running.
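One way to monitor the log is with `tail` and `grep`. A sketch with a fabricated log excerpt (the rule name below is made up; on a real run you would point these commands at the actual log file):

```shell
# Fabricated log excerpt standing in for a real YYMMDD_main_run.out
printf 'Finished job 12.\n2 of 8 steps (25%%) done\nError in rule repair_fastq:\n' > YYMMDD_main_run.out

# Follow progress (use tail -f on the real file while the pipeline runs)
tail -n 2 YYMMDD_main_run.out

# Check whether any rule has failed so far
grep -ci 'error' YYMMDD_main_run.out
```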
Note that Snakemake changed its rerun behaviour in version 7.8 (see https://github.com/snakemake/snakemake/issues/1694). This means that when metadata tables are changed, Snakemake will now run everything from the beginning, stating "Set of input files has changed since last execution". To get around this, add `--rerun-triggers mtime` to the Snakemake command when starting the pipeline from the command line. The same applies to any local changes to code or other parameters.
Useful Snakemake flags:
- `--ri` or `--rerun-incomplete` can be useful whenever the pipeline has to be re-started. It tells Snakemake to re-run any rules whose output might be corrupt or incomplete.
- `-k` or `--keep-going` will ensure the pipeline runs as far as possible when a job fails.
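Putting the pieces together, a restart after a failure could combine these flags with the `--rerun-triggers mtime` workaround described above. Since actually running this requires a working Snakemake installation and slurm profile, the sketch below only assembles and prints the command:

```shell
# Assemble a restart command combining the flags above (illustrative only;
# run the printed command on a system where Snakemake and the profile exist)
cmd="snakemake --profile slurm --rerun-incomplete --keep-going --rerun-triggers mtime"
log="$(date +%y%m%d)_restart_run.out"
echo "$cmd &> $log"
```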