4. How to run the pipeline

Verena Kutschera edited this page Nov 26, 2024 · 6 revisions

The pipeline must be run inside a terminal multiplexer such as tmux or screen so that the Snakemake process can be sent to the background and keeps running after you disconnect (see, for example, this introduction to tmux).
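As a minimal sketch of the tmux workflow, the commands below open a named session in which the pipeline can then be started. The session name `generode_run` is an arbitrary choice for this example, not something the pipeline requires.

```shell
# Session name is an assumption for this example; pick any name you like.
SESSION="generode_run"

# Create the session if it does not exist yet, otherwise attach to it
# (-A makes new-session attach when a session with that name already exists).
if command -v tmux >/dev/null 2>&1; then
    tmux new-session -A -s "$SESSION" \
        || echo "Could not open tmux session $SESSION" >&2
else
    echo "tmux not found; 'screen -S $SESSION' is an alternative" >&2
fi

# Inside the session: start the pipeline, then detach with Ctrl-b d.
# To return to the session later:
#   tmux attach-session -t generode_run
```

Detaching leaves Snakemake running on the login node, so the terminal can be closed without interrupting the run.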

If your cluster uses a workload manager such as slurm, the pipeline can submit jobs to the system automatically, one job per rule. Please check point 5 of the wiki page with the requirements for information on how to use a slurm profile for execution on a slurm cluster.

For more information on Snakemake, including a tutorial, check the official Snakemake documentation.

The following instructions assume that you have gone through all the pipeline requirements before attempting to start a pipeline run.

Detailed instructions on how to run GenErode on the PDC/KTH cluster Dardel can be found here.

How to run the pipeline with a slurm profile

1) Activate the conda environment

(replace "generode" with the name you chose when creating the conda environment)

conda activate generode

2) Run the pipeline in dry mode to check each step

(change the value of the --profile parameter if you called your profile anything other than slurm):

snakemake --profile slurm -n &> YYMMDD_dry_run.out

Check the log file (YYMMDD_dry_run.out) to confirm that everything works as it should.
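One possible way to check a log file is sketched below. The helper `check_log` is hypothetical (it is not part of the pipeline), and the search pattern is only an assumption about which words indicate a problem. A log that passes this check should still be read through.

```shell
# Hypothetical helper: show the end of a log and flag lines that mention errors.
check_log() {
    log="$1"
    # The last lines of a dry run usually summarise the planned jobs.
    tail -n 20 "$log"
    # Case-insensitive search; the pattern is an assumption, extend it as needed.
    if grep -i -n -E "error|exception" "$log"; then
        echo "Possible problems found in $log"
        return 1
    fi
    echo "No obvious problems found in $log"
}
```

For example, `check_log YYMMDD_dry_run.out` prints the end of the dry-run log followed by any matching lines with their line numbers.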

3) Start the main run

(change the value of the --profile parameter if you called your profile anything other than slurm):

snakemake --profile slurm &> YYMMDD_main_run.out

Check the log file (YYMMDD_main_run.out) regularly while the pipeline is running.
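The YYMMDD prefix in the log name can be filled in automatically. The following sketch assumes the profile is called slurm, as in step 3, and uses the variable names `RUN_DATE` and `LOG` purely for illustration.

```shell
# Build a log name following the YYMMDD_main_run.out pattern, e.g. 241126_main_run.out.
RUN_DATE="$(date +%y%m%d)"
LOG="${RUN_DATE}_main_run.out"

# "> file 2>&1" is the portable spelling of "&> file".
if command -v snakemake >/dev/null 2>&1; then
    snakemake --profile slurm > "$LOG" 2>&1 \
        || echo "Snakemake exited with an error, see $LOG" >&2
else
    echo "snakemake not found; activate the conda environment first" >&2
fi

# In a second tmux window, follow the log as it grows
# (Ctrl-c stops tail, not the pipeline):
#   tail -f "$LOG"
```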

Note that Snakemake changed its rerun behaviour in version 7.8 (see https://github.com/snakemake/snakemake/issues/1694). As a result, when metadata tables are changed, Snakemake will now rerun everything from the beginning, stating "Set of input files has changed since last execution". To get around this, add --rerun-triggers mtime to the Snakemake command when starting the pipeline from the command line, so that only file modification times trigger reruns. The same applies to local changes in code or other parameters, which would otherwise also trigger reruns.

Useful Snakemake flags: --rerun-incomplete (short form --ri) can be useful whenever the pipeline has to be restarted. It tells Snakemake to rerun any jobs whose output might be corrupt or incomplete. --keep-going (short form -k) lets Snakemake continue with independent jobs when a job fails, so that the pipeline runs as far as possible.
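As an illustration, a restart after an interrupted run could combine these flags with --rerun-triggers mtime. This is a sketch that assumes the profile is called slurm; the variable name `EXTRA_FLAGS` is only used for readability, and whether all three flags are wanted depends on the situation.

```shell
# Flags discussed above, collected in one place (variable name is illustrative).
EXTRA_FLAGS="--rerun-triggers mtime --rerun-incomplete --keep-going"

# $EXTRA_FLAGS is deliberately unquoted so that it splits into separate arguments.
# Replace YYMMDD with the current date, as in the commands above.
if command -v snakemake >/dev/null 2>&1; then
    snakemake --profile slurm $EXTRA_FLAGS > YYMMDD_main_run.out 2>&1 \
        || echo "Snakemake exited with an error, see YYMMDD_main_run.out" >&2
else
    echo "snakemake not found; activate the conda environment first" >&2
fi
```

Adding -n to the same command first shows which jobs would be rerun without submitting anything.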