kamransafi/MPCDF_HPC

Getting started: MPCDF Raven HPC

This tutorial is written for the Max Planck Computing and Data Facility (MPCDF) Raven HPC users from the MPI of Animal Behavior. It focuses on using R on Raven and provides specific instructions on how to run jobs that require popular spatio-temporal R packages. This tutorial has been put together and is maintained by the Animal-Environment Interactions research group.

The Raven user guide can be found here

Overview of the workflow

The Raven HPC system is a cluster of interconnected computers that is designed to process large amounts of data and perform complex computations. Raven's workflow involves submitting a job to the system, which is then processed in a queue.

A job is a set of instructions that tells the system what computations to perform. It is submitted using a batch system, which allows the user to specify the resources needed to run the job, such as the number of processors, the amount of memory, and the amount of time needed to complete the job. When the job is completed, the results are stored in the file system for retrieval or later analysis.

To make the whole process more enjoyable for yourself, have a look at these tutorials on directory management in Unix and basics of shell scripting if these topics are new to you.

Step 1: Create an account

  1. Go to https://selfservice.mpcdf.mpg.de/register/antrag.php?lang=en
  2. Select MPI of Animal Behavior
  3. Pick one person to approve your application

Step 2: Set up Two-factor authentication (2FA)

  1. Install a 2FA app on your mobile phone (e.g. Aegis Authenticator, andOTP, FreeOTP, etc.)
  2. When using the app for the first time, scan the QR code available here to connect to the cluster: https://selfservice.mpcdf.mpg.de/index.php?r=security

Find more info here

Step 3: Log in

If you are on the MPI network (or connected to it through a VPN, which is required from Bücklestraße or the University of Konstanz), open a terminal and type:

Enter your MPCDF password and then the token generated by the 2FA app on your phone.

If not on the MPI network, open the terminal and type:

Enter your MPCDF password and then the token generated by the 2FA app on your phone.

Then, ssh into Raven:

Enter your MPCDF password and then the token generated by the 2FA app on your phone.

Note on file systems

File System /u

This is your home directory $HOME, i.e. your default working directory when logging in to Raven and when using R. The complete path to this directory is /raven/u/username/. The default disk quota in /u is 2.5 TB.

File System /ptmp

This directory is designed for batch jobs. The complete path is /raven/ptmp/username/. Files in this directory are not backed up and will be removed if not accessed for more than 12 weeks. There is no size limit on this directory, so users can manage their data according to their needs, but they are required to remove all files that are not currently in use. A best practice is to move your output files to your /u directory and then to your local machine as soon as your batch job is complete. This is done in the example files accompanying this tutorial.
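Because files on /ptmp are removed after 12 weeks without access, it can help to list the files that are getting close to that limit. The following is a minimal sketch, assuming a find that supports -atime; the function name list_stale_files and the 70-day threshold are illustrative choices, not MPCDF tools or settings.

```shell
#!/bin/bash
# List regular files under a directory that have not been accessed for more
# than a given number of days (hypothetical helper, not an MPCDF tool).
list_stale_files() {
  local dir="$1" days="${2:-70}"   # 70 days = 10 weeks, an arbitrary warning threshold
  find "$dir" -type f -atime +"$days" -print
}

# Example (path pattern taken from the text above):
#   list_stale_files /raven/ptmp/username 70
```

Files that show up in the list and are still needed can then be moved to /u or to your local machine.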

Step 4: Transfer your files

Raven does not have access to the files on your local machine. You need to copy the files that you need for your job (e.g. input files, scripts, etc.) to your /u directory on Raven.

From your local terminal, use secure copy (scp) to copy your files to Raven. The plain cp command only copies within one machine, so it cannot reach Raven from your computer. Each time you run scp, you will be prompted to enter your MPCDF password and the 2FA token.

scp path_to_file_on_local_machine username@raven.mpcdf.mpg.de:/raven/u/username/
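Since every scp call asks for the password and a 2FA token, it can be convenient to bundle a whole project folder into one archive first and copy only that file. This is a sketch under the assumption that tar with gzip support is available locally and on Raven; the function name bundle_for_raven and the paths in the comments are hypothetical.

```shell
#!/bin/bash
# Pack a project directory into a single .tar.gz archive so that one scp call
# (and one password + 2FA prompt) is enough. Hypothetical helper.
bundle_for_raven() {
  local src_dir="$1" archive="$2"
  # -C switches to the parent directory so the archive contains relative paths
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example usage (hypothetical paths):
#   bundle_for_raven ~/projects/my_project my_project.tar.gz
#   scp my_project.tar.gz <same destination as in the scp command above>
#   # then, on Raven:
#   tar -xzf my_project.tar.gz
```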

Step 5: Load the required software packages and test your code

MPCDF uses environment modules to provide software packages and to make different software versions available. No modules are loaded automatically, so load the modules that you require using the command module load package_name/version. For example, load the R module as follows:

module load R/4.2

Some other useful module commands:

module avail # see all available modules
find-module R # locate a module and its dependencies
module avail R # see all available versions
module load R # loads the default version of R
module unload R # unload R
module purge # unload all currently loaded modules
module show R # see details of the module
module list # shows all currently loaded modules

Before going ahead and starting R, open a screen. A screen will allow you to go back to your last instance in case your connection to Raven is interrupted, your terminal window closes, etc. Here are some useful screen related commands:

screen -S my_screen_name # open a new screen
screen -r my_screen_name # open already existing (detached) screen 
screen -list # see a list of created screens 
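screen -d -r my_screen_name # reattach a screen that is still attached elsewhere (detaches it first)
screen -S my_screen_name -X quit  # kill a screen, or press Ctrl+A then K from inside it
screen -d my_screen_name # detach a screen, or press Ctrl+A then D from inside it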

Now, create a new screen and open R to test your code:

screen -S new_R_screen
R

Once in R, you can test your code to make sure that you can install the necessary libraries, read in your file, and overall make sure that your code (or a small version of it) works on the cluster. You will then have more confidence in your code before submitting a batch job using the slurm batch system.

install.packages("tidyverse")
install.packages("move")

DO NOT RUN long scripts or your entire job here! The node that you log into is only for editing and managing your data. If you run memory- or time-consuming jobs, you will get an email from the MPCDF asking you to stop the job.

NOTE: if you have issues installing R libraries such as sf, units, or ctmm, one solution is to build a container and run R inside it with Apptainer. Instructions on how to do that are in "Using_apptainer.md".

Step 6: Prepare your slurm file

Note on SLURM

SLURM (Simple Linux Utility for Resource Management) is a job scheduler and resource manager used by the Raven HPC system. It is a software system that helps manage the allocation of computing resources (such as processors, memory, and storage) on a cluster of computers, so that jobs can be run efficiently and effectively.

Based on the resources that you need, your job will be either exclusive, where all resources on the nodes are allocated to the job, or shared, where several jobs share the resources of one node. For shared jobs, the number of CPUs and the amount of memory must be specified for each job. An overview of the available per-job resources on Raven is given below. See the Raven user guide for more information.


    Job type          Max. CPUs            Number of GPUs   Max. Memory      Number     Max. Run
                      per Node               per node        per Node       of Nodes      Time
   =============================================================================================
    shared    cpu     36 / 72  in HT mode                     120 GB          < 1       24:00:00
    ............................................................................................
                      18 / 36  in HT mode        1            125 GB          < 1       24:00:00
    shared    gpu     36 / 72  in HT mode        2            250 GB          < 1       24:00:00
                      54 / 108 in HT mode        3            375 GB          < 1       24:00:00
   ---------------------------------------------------------------------------------------------
                                                              240 GB         1-360      24:00:00
    exclusive cpu     72 / 144 in HT mode                     500 GB         1-64       24:00:00
                                                             2048 GB         1-4        24:00:00
    ............................................................................................
    exclusive gpu     72 / 144 in HT mode        4            500 GB         1-80       24:00:00
    exclusive gpu bw  72 / 144 in HT mode        4            500 GB         1-16       24:00:00
   ---------------------------------------------------------------------------------------------

When a job is submitted, the right partition and job parameters are chosen automatically from the resource specification.

The SLURM script

The SLURM file is a shell script that contains the instructions for the cluster and the job that is to be run. The header contains the instructions for the cluster: the lines starting with #SBATCH are SLURM directives. Here is a detailed explanation of each element of the script (also see the example scripts):

  • specifies the shell to be used to run the script
#!/bin/bash -l 
  • Create files for the standard output and standard error. It is recommended to save these files in a dedicated folder (here called "messages"), especially if you are running many jobs or an array job, because many of these files will be created. Remember to delete them when they are no longer needed so that they do not take up space. The job.out_%j files contain information on the job duration, how much memory the job needed, and the CPU utilization. The job.err_%j files contain everything that is printed in the R console, including errors, and are very useful for debugging the code. %j is replaced by the job ID. In array jobs, use %A_%a instead, which appends jobID_arrayIndex.
#SBATCH -o ./messages/job.out_%j
#SBATCH -e ./messages/job.err_%j

## if the job is an array:
#SBATCH -o ./messages/job.out_%A_%a
#SBATCH -e ./messages/job.err_%A_%a
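SLURM does not create a missing folder for the output and error files, so the "messages" folder has to exist in the working directory before the job is submitted. A minimal sketch, with a hypothetical function name and an arbitrary 30-day clean-up age:

```shell
#!/bin/bash
# Create the folder for the SLURM output and error files and remove old
# message files. Hypothetical helper; the 30-day default is an arbitrary choice.
prepare_messages_dir() {
  local dir="${1:-./messages}" max_age_days="${2:-30}"
  # -p: no error if the folder already exists
  mkdir -p "$dir"
  # Remove job.out_* / job.err_* files last modified more than max_age_days ago
  find "$dir" -type f -name 'job.*' -mtime +"$max_age_days" -exec rm -f {} +
}

# Example: run once in the working directory before calling sbatch
#   prepare_messages_dir ./messages 30
```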
  • Initial working directory. Specify the directory that contains the R and slrm files and the folder in which the output and error files (from above) will be stored.
#SBATCH -D ./your_directory/
  • When using Apptainer, the initial working directory has to contain the .sif file, so it will most probably be the home directory.
#SBATCH -D ./
  • Give your job a name. This name will appear in the job.out and job.err files and in the list of jobs running or queuing when checking with e.g. squeue -u <user_name> (see below)
#SBATCH -J your_job_name
  • Setting the number of nodes, CPUs and memory. Depending on the type of job you run, a different combination of options is needed. The standard node has 72 CPUs and 120 GB of memory (see table above). If you ask for (or need) one entire node, the job will be queued until an entire node is available, and that can take time. An entire node is requested by setting the memory to 120 GB or by setting the CPUs per task to 72. To speed up the process, think about what is really needed rather than just asking for a high number.

Description of options:
--nodes: the number of nodes. For an array job you can request several nodes, but only if you request at least half of each node, i.e. 36 CPUs per task. Assumption: if you request all the memory or CPUs of the nodes, the waiting time will probably increase and slow down the process.
--ntasks-per-node: assumption: this value cannot be greater than 72. For an array job with very heavy input files, it might be worthwhile to state how many jobs should run on a single node. When iterations of the same job run on different nodes, all files are copied to each node, and that can take time. In all other cases, it is probably faster to let the system distribute the jobs as space becomes available.
--cpus-per-task: a single job will always run on a single core. If the R script is parallelized internally, specify the number of CPUs here. This number has to be the same as the one set in the R script via e.g. doMC::registerDoMC() or doParallel::registerDoParallel().
--array: the range of indices for which the R code will be executed, once per index. The maximum is 300 at a time. Each index is passed on as the value "i" in the R script (see example file). If you have more than 300 runs, you have to submit the next job when the previous one has finished, stating e.g. 301-600 in the second one, and so on.
--mem: (in MB) by limiting the memory of each job, the nodes can be shared by multiple jobs, from the same user or from different users. Set the memory according to what you think you will need, run tests, and check the memory used by a test run in the "job.out_jobId" file created above (then add ~20%). Remember that if you request the entire memory of a node (i.e. 120 GB) but only use 1 CPU, this will block the entire node. If you need that much memory, that is fine, but if not you will be using only a small fraction of the node's computing power. Try to optimize the memory you need, so that multiple CPUs per node can be used.
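To see how the memory and CPU requests interact, the following sketch estimates how many instances of a job fit on one standard node. It assumes the node size given above (72 CPUs, 120 GB taken as 120000 MB) and ignores any memory the system reserves for itself; the function name instances_per_node is hypothetical.

```shell
#!/bin/bash
# Estimate how many job instances fit on one standard shared node.
# Node size follows the text above; real schedulable memory may be lower.
instances_per_node() {
  local mem_mb="$1" cpus="${2:-1}"
  local node_mem_mb=120000 node_cpus=72
  local by_mem=$(( node_mem_mb / mem_mb ))
  local by_cpu=$(( node_cpus / cpus ))
  # The scarcer resource sets the limit
  if (( by_mem < by_cpu )); then echo "$by_mem"; else echo "$by_cpu"; fi
}

instances_per_node 8000 1     # prints 15 (memory is the limit)
instances_per_node 120000 1   # prints 1  (one CPU blocks the whole node)
instances_per_node 8000 12    # prints 6  (CPUs are the limit)
```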

A. single job - R code is run once: a single job always runs on one CPU, and therefore on 1 node

#SBATCH --nodes=1 ## another value does not make sense
#SBATCH --ntasks-per-node=1 ## another value does not make sense
#SBATCH --cpus-per-task=1 ## another value does not make sense
#SBATCH --mem=8000 ## whatever is required by your job

B. parallelization type 1 - array job: R script runs multiple times, each time with different input data. No parallelization in the R code.

#SBATCH --nodes=1 ## another value does not make sense
#SBATCH --ntasks-per-node=1 ## can be multiple tasks per node, but can also be omitted and the system will optimally distribute the jobs
#SBATCH --cpus-per-task=1 ## another value does not make sense
#SBATCH --array=1-300 ## this is the number of times your R code will be executed, the maximum is 300
#SBATCH --mem=8000 ## whatever is required by your job. This is the memory for each instance (each time the R script is run)
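When more than 300 runs are needed, the index ranges for the successive submissions can be worked out as in the sketch below. It assumes the limit of 300 stated above; the function name array_chunks is hypothetical.

```shell
#!/bin/bash
# Print the --array ranges needed to cover a total number of runs when at
# most 300 runs can be submitted at a time (limit as stated in the text).
array_chunks() {
  local total="$1" chunk="${2:-300}"
  local start=1 end
  while (( start <= total )); do
    end=$(( start + chunk - 1 ))
    if (( end > total )); then end=$total; fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
  done
}

array_chunks 750
# prints:
# 1-300
# 301-600
# 601-750
```

Each range then goes into its own submission once the previous one has finished, for example sbatch --array=301-600 your_slrm_script.slrm. Options given on the sbatch command line take precedence over the #SBATCH lines in the script.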

C. parallelization type 2: R script runs once, but there is parallelization in the R code

#SBATCH --nodes=1 ## another value does not make sense
#SBATCH --ntasks-per-node=1 ## another value does not make sense
#SBATCH --cpus-per-task=12 ## this number of CPUs has to be the same as the one stated in the R script
#SBATCH --mem=8000 ## whatever is required by your job

D. array job with parallelization within the R code: combination of B and C

#SBATCH --nodes=15 ## can be multiple nodes, but only when at least 50% of the CPUs are requested, i.e. when --cpus-per-task is 36 or more; otherwise job submission gives an error
#SBATCH --ntasks-per-node=10 ## can be multiple tasks per node, but can also be omitted and the system will optimally distribute the jobs
#SBATCH --cpus-per-task=36 ## this number of CPUs has to be the same as the one stated in the R script
#SBATCH --array=1-300 ## this is the number of times your R code will be executed, the maximum is 300
#SBATCH --mem=8000 ## whatever is required by your job. This is the memory for each instance (each time the R script is run)
  • Setting the maximum duration of each instance (max. is 24 hours). A single job (with or without internal parallelization) can run for at most 24 h. With array jobs, each instance can run for at most 24 h. Use the information in the job.out_jobId file of the test run to estimate the necessary time. Do not use 24 h by default; try to state the time that you estimate you actually need, +20%.
#SBATCH --time=00:05:00  
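The "+20%" rule can be turned into a --time value with a little integer arithmetic. The sketch below assumes you have read the run time of a test job in seconds from its output file; the function name time_with_margin is hypothetical.

```shell
#!/bin/bash
# Turn the run time of a test job (in seconds) into a --time value with a
# safety margin (default 20 %), capped at the 24 h limit from the text.
time_with_margin() {
  local secs="$1" margin_pct="${2:-20}"
  local total=$(( (secs * (100 + margin_pct) + 99) / 100 ))  # round up
  local max=$(( 24 * 3600 ))
  if (( total > max )); then total=$max; fi
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

time_with_margin 250    # prints 00:05:00
time_with_margin 5400   # prints 01:48:00
```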
  • Set an email alert. If mail-type=all, then you will get an email when your job starts, is completed, or has failed. This is much more convenient than having to constantly log in and check whether the job is completed.
#SBATCH --mail-type=all 
#SBATCH --mail-user=<your_email_address>

The script then continues with loading the required modules and running the R script. Load the same modules that you used in Step 5 when testing your code.

# Load compiler and modules:
module purge 
module load R/4.2

# Run the program:
R CMD BATCH your_R_script.R 2>&1 errorlog

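In this command, errorlog is the output file of R CMD BATCH: everything that R prints while running your program (including warning and error messages) is written to it. The 2>&1 redirection merges anything the command itself writes to standard error into standard output. The errorlog file is very helpful for debugging your code.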

Save your slurm script as: your_slrm_script.slrm
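Putting the pieces together, a complete script for a single job (case A) could look like the sketch below. It is written as a shell snippet that saves the script to disk; every value is a placeholder copied from the examples above and needs to be adjusted to your own job. The email lines are left out here and can be added as described above.

```shell
#!/bin/bash
# Write a minimal single-job SLURM script to disk. All values are placeholders
# taken from the examples in this tutorial.
cat > your_slrm_script.slrm <<'EOF'
#!/bin/bash -l
#SBATCH -o ./messages/job.out_%j
#SBATCH -e ./messages/job.err_%j
#SBATCH -D ./
#SBATCH -J your_job_name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=00:05:00

# Load the same modules that were used when testing the code
module purge
module load R/4.2

# Run the program; R writes its output and messages to errorlog
R CMD BATCH your_R_script.R 2>&1 errorlog
EOF
```

The quoted 'EOF' keeps the shell from expanding anything inside the script while it is written.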

If you are using Apptainer (see detailed instructions in "Using_apptainer.md"), the two sections above are slightly different:

# Load modules. Probably no other module needs to be loaded, as everything happens within the container
module purge
module load apptainer/1.2.2

# Run the program if your data is in the home directory:
srun apptainer exec geospatial_latest_updated.sif Rscript myproject/myscripts/my_R_script.r

# Run the program if your data is on ptmp, it has to be mounted:
srun apptainer exec --bind /ptmp/<your_username> geospatial_latest_updated.sif Rscript myproject/myscripts/my_R_script.r 

# Run the program if job is submitted as an array:
srun apptainer exec geospatial_latest_updated.sif Rscript myproject/myscripts/my_R_script.r $SLURM_ARRAY_TASK_ID
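In an array job, the task index usually selects which input each instance works on. One common pattern, sketched here as an assumption rather than as the method of the example files, is to keep a text file with one input path per line and let task N use line N. The names input_for_task and file_list.txt are hypothetical.

```shell
#!/bin/bash
# Pick the input file for one array task: task N uses line N of a list file.
# file_list.txt is a hypothetical text file with one input path per line.
input_for_task() {
  local list_file="$1" task_id="$2"
  sed -n "${task_id}p" "$list_file"
}

# Inside a job script, SLURM sets SLURM_ARRAY_TASK_ID for each array task:
#   input=$(input_for_task file_list.txt "$SLURM_ARRAY_TASK_ID")
#   srun apptainer exec geospatial_latest_updated.sif Rscript myproject/myscripts/my_R_script.r "$input"
```

In the R script, arguments passed after the script name can be read with commandArgs(trailingOnly = TRUE).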

Step 7: Submit your job

Make sure that you have your slrm script, R script, and any input files on Raven. It's easier if they are all in one directory, which you can specify in your slurm script as your working directory.

A user can have at most 300 jobs running or queued at the same time, regardless of whether they come from an array job or from multiple separate jobs.

sbatch your_slrm_script.slrm

Other useful commands:

squeue -u <user_name> # Check the status of your job(s)
scontrol show job <job_id> # Details of job status
scancel <job_id> # Cancel a job
sinfo # List the available batch queues (partitions).
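To check how close you are to the 300-job limit before submitting more work, the squeue output can be counted. This is a sketch that assumes the limit counts every array task separately; the function name free_job_slots is hypothetical.

```shell
#!/bin/bash
# Report how many more jobs can be submitted before reaching the 300-job
# limit mentioned above. Counts one squeue line per job or array task.
free_job_slots() {
  local limit="${1:-300}" used
  # -h: no header line, -r: one line per array task. Errors (e.g. squeue not
  # available on this machine) are silenced and count as zero jobs.
  used=$(squeue -u "${USER:-$(id -un)}" -h -r 2>/dev/null | wc -l)
  echo $(( limit - used ))
}

free_job_slots
```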

Step 8: Transfer your output files

Just as when copying files from your local machine to Raven, you can copy files from Raven to your local machine with secure copy (scp), run from your local terminal. Each time you run scp, you will be prompted to enter your MPCDF password and the 2FA token. You can copy all contents of a directory, including any subdirectories, using the recursive -r flag.

scp username@raven.mpcdf.mpg.de:/raven/u/username/file_to_copy path_to_target_directory_on_local_machine
scp -r username@raven.mpcdf.mpg.de:/raven/u/username/dir_to_copy path_to_target_directory_on_local_machine
