-
Notifications
You must be signed in to change notification settings - Fork 0
/
exercise.txt
55 lines (40 loc) · 1.81 KB
/
exercise.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
EESSI pilot scenario
====================
(based on what Radovan presented: start small, experiment, scale up)
Assume you are already using Saga for some calculations, but you need
to start using Fram. Specifically, you want to run GROMACS.
Goal: Your goal is to run GROMACS with a target payload (number of
steps: 20000) as fast as you can/should/need on Fram. You may
also think about costs (billing units) and efficiency (Amdahl's
law).
- Begin small (single core, few steps) on Saga.
- Experiment with job settings (ntasks,mem,time).
Change them in the example script saga_GROMACS.sh
Submit the script (sbatch saga_GROMACS.sh) and
verify the result in the slurm-*.out file.
- When you reached 4-8 ntasks on Saga, move over to Fram.
- Scripts available on GitHub
Example workflow
cd $USERWORK
git clone https://github.com/trz42/HPC-NTK-user-course-2021.git
cd HPC-NTK-user-course-2021
#submit saga_GROMACS.sh as is
sbatch saga_GROMACS.sh
Iterate until you use about 4-8 tasks/cores (ntasks)
1 - Edit scripts (adjust memory (+/-), adjust time (+/-),
adjust ntasks (+/-), adjust NSTEPS (target 20000))
2 - Submit (sbatch saga_GROMACS.sh or sbatch fram_GROMACS.sh)
3 - Monitor (squeue) and check results (less or cat slurm out
file, cat slurm-23458183.out)
You may encounter problems such as
- Job was cancelled because it ran out of time.
- Job was cancelled because it tried to use to much memory.
- Look for error messages in slurm-*.out files.
- Fix them by adjusting parameters (time, mem, ntasks, NSTEPS).
Move over to Fram
Get the scripts from GitHub (see above).
Use the example script fram_GROMACS.sh
Begin with the last set of parameters you used on Saga (except
memory is not necessary to specify on Fram).
Repeat the iteration until you reach the target number of
steps (20000).