In this example Snakemake workflow, we run a pipeline of two computational steps. The first step processes multiple input data files in parallel. The second step then collects the processing results and combines them.
The workflow can be cloned and run on the Aalto Triton cluster following the instructions below.
You can use it as a starting point for your own workflow.
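As a rough illustration of what such a scatter-gather workflow can look like in Snakemake, here is a hedged sketch. The rule names, sample names, and script paths below are made up for illustration only; see `workflow/Snakefile` in the repository for the actual logic.

```python
# Hypothetical sketch of a scatter-gather Snakefile: rule names, file names, and
# scripts are illustrative only; see workflow/Snakefile for the real workflow.
SAMPLES = ["a", "b", "c"]  # one entry per input data file

# The final target: the combined result file.
rule all:
    input:
        "results/combined.csv"

# Step 1: process each input file independently; Snakemake runs these jobs in parallel.
rule process:
    input:
        "resources/{sample}.csv"
    output:
        "results/processed/{sample}.csv"
    shell:
        "python workflow/scripts/process.py {input} {output}"

# Step 2: collect all per-file results and combine them into one file.
rule combine:
    input:
        expand("results/processed/{sample}.csv", sample=SAMPLES)
    output:
        "results/combined.csv"
    shell:
        "python workflow/scripts/combine.py {input} {output}"
```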
Instructions to set up and run Snakemake on Aalto Triton.
First, connect to Triton using SSH.
Change directory to your work directory:

```bash
cd $WRKDIR
```
Clone this repository and change directory:

```bash
git clone https://github.com/AaltoRSE/snakemake-triton-example.git
cd snakemake-triton-example/multiple-input-files
```
Load the `mamba` tool from a module and check that it works by printing its version number:

```bash
module load mamba
mamba --version
```
Use `mamba` to create a conda environment containing Snakemake and the Slurm executor plugin:

```bash
mamba env create --file snakemake.yml --prefix env/
```
Submit the Slurm job which starts Snakemake:

```bash
sbatch snakemake_slurm.sh
```
The batch script `snakemake_slurm.sh` requests resources to run Snakemake itself. We start Snakemake via a batch script so that it runs non-interactively, and we can close our connection to Triton while waiting for the actual workflow computations to finish.

The contents of the batch script:
```bash
#!/bin/bash
#SBATCH --job-name=snakemake_slurm
#SBATCH --time=01:00:00    # Runtime needs to be longer than the time it takes to complete the workflow. Modify accordingly!
#SBATCH --cpus-per-task=2  # Generally enough CPUs to run Snakemake. No need to modify.
#SBATCH --mem=2G           # Generally enough RAM to run Snakemake. Modify if the Snakemake process runs out of memory.
#SBATCH --output=snakemake_slurm-%j.out
#SBATCH --error=snakemake_slurm-%j.err

module load mamba
source activate env/

snakemake --snakefile workflow/Snakefile --profile profiles/slurm/ --software-deployment-method conda --conda-frontend mamba --cores 2
```
Here's a breakdown of the Snakemake command (the final line):
- `--snakefile workflow/Snakefile`: Explicitly state which Snakefile to run. If omitted, Snakemake will try to find a Snakefile in the project root folder or under the `workflow/` folder. Check `workflow/Snakefile` for details of the workflow logic.
- `--profile profiles/slurm/`: Explicitly state the folder from which Snakemake looks for the `config.yaml` file. The `config.yaml` defines the cluster resources to request for rules in the Snakefile (cluster partition, number of CPUs and GPUs, RAM, runtime). Check `profiles/slurm/config.yaml` for details.
- `--software-deployment-method conda`: Tell Snakemake to create and use the rule-specific conda environments defined in the Snakefile (see the sketch after this list).
- `--conda-frontend mamba`: Use `mamba` instead of `conda`.
- `--cores 2`: Tell Snakemake to use two CPUs to run itself.
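To make the conda and profile options more concrete, here is a hedged sketch of how the `process` rule from the earlier sketch could declare its own conda environment and resource needs; the Slurm profile then turns the declared resources into `sbatch` requests for that rule's jobs. The environment file name and the resource values below are made up for illustration and are not taken from the repository.

```python
# Illustrative only: the conda environment file, resource values, and script are hypothetical.
rule process:
    input:
        "resources/{sample}.csv"
    output:
        "results/processed/{sample}.csv"
    conda:
        "envs/process.yml"  # created and used because of --software-deployment-method conda
    threads: 2              # number of CPUs requested for this rule's Slurm job
    resources:
        mem_mb=4000,        # RAM in MB, forwarded to Slurm via the profile
        runtime=30          # runtime in minutes
    shell:
        "python workflow/scripts/process.py {input} {output}"
```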
The intermediate and final result files will be written to the project root folder under `results/`.
In order to adapt the project to your own workflow, do the following:

- Replace the data and scripts in folders `resources/` and `workflow/scripts` with your own, respectively.
- Replace/modify the workflow in `workflow/Snakefile` with your own.
- Add/modify the requested computational resources for the Snakefile jobs (rules) in the profile file `profiles/slurm/config.yaml`.
- Increase the runtime value in the batch script `snakemake_slurm.sh` so that it is longer than the time it takes to complete the workflow. (Start with e.g. 12 hours and increase if necessary.)
Finally, it is often convenient to develop the workflow locally on your own laptop or desktop computer before moving to Triton to run the actual computation-intensive experiments.
To run the workflow locally, first clone the repository:

```bash
git clone [email protected]:AaltoRSE/snakemake-triton-example.git
cd snakemake-triton-example/multiple-input-files
```
Create the conda environment and activate it:

```bash
mamba env create --file snakemake.yml --prefix env/
mamba activate env/
```
Then, run Snakemake with:

```bash
snakemake --snakefile workflow/Snakefile --software-deployment-method conda --conda-frontend mamba --cores 1
```
Since we now omit the `--profile` option that defines the Slurm resource configuration, all the computations will run locally instead.