forked from USCbiostats/slurmR
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgetting-started.Rmd
263 lines (196 loc) · 9.16 KB
/
getting-started.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
---
title: "Getting Started with slurmR"
author: "George G. Vega Yon"
date: "June 26, 2019 (last update Feb 4, 2020)"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 2
numbered_sections: true
vignette: >
%\VignetteIndexEntry{Getting Started with slurmR}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
# The slurmR R package
The `slurmR` package provides wrappers and tools for integrating R with the
HPC workload manager [Slurm](https://slurm.schedmd.com/). Overall, there are
two different approaches to do so, either using Socket clusters, in essence,
following the workflow of CRAN's `parallel` package, or using Job arrays, which
are a different implementation of the same idea behind the `par*apply` functions
in the `parallel` package, which, at times, can be more powerful.
## Socket clusters
Another important component of `slurmR` is `makeSlurmCluster` function. This
allow users creating multi-node PSOCKCluster class objects. The implementation
of this function, wrapper of `parallel::makePSOCKcluster`, is very simple:
1. It submits a job to Slurm requesting the desired number of tasks. Each task
will then return information regarding the node at which it is operating.
2. Once Slurm allocates the resources, the master R session (from which the
job was submitted) will read in the node names returned by each task.
3. With the full list of nodenames in usage, `makeSlurmCluster` will pass the
list of names to `parallel::makePSOCKcluster`, which ultimately creates the
`cluster` class object.
After creating the cluster object, the workflow is exactly the same as with
the `parallel` package. Here is an example from the `makeSlurmCluster`
manual
```r
# Creating a cluster with 100 workers/offpring/child R sessions
cl <- makeSlurmCluster(100)
# Computing the mean of a 100 random uniforms within each worker
# for this we can use any of the function available in the parallel package.
ans <- parSapply(cl, 1:200, function(x) mean(runif(100)))
# We simply call stopCluster as we would do with any other cluster
# object
stopCluster(cl)
```
## Job arrays using the *apply family
Whenever `Slurm_lapply`, `Slurm_sapply`, or `Slurm_Map` are called, a lot of
things happen under the hood. What the user does not see is the way in which
`slurmR` sets us a job and submits it to the queue.
Just like `rslurm`, `slurmR` has two levels of job distribution: first, Slurm
Jobs, and second, within each job via `parallel::mclapply` and `parallel::mcMap`
(task forking). In general, the function `Slurm_*` is implemented as follows:
1. List whatever R packages are loaded, including the path to the R package.
2. List all the objects passed via ellipsis (`...`), and, together with `X` and
`FUN` or `f`, save them at `[tmp_path]/[job_name]/` as `[object-name].rds`.
3. Write out the corresponding R script and Slurm bash file, and save them as
`[tmp_path]/[job_name]/00-rscript.r`, and `[tmp_path]/[job_name]/01-bash.sh`
respectively.
4. If `plan = "collect"` (the default), the job will be submitted to the queue
via `sbatch()`, and the function will wait until is flagged as completed
by Slurm.
5. Once `sbatch()` is called, a Job Array will be submitted in which each R
job will lunch up to `mc.cores` forked processes (second layer of palatalization)
Once it is done, the the results can be collected using `Slurm_collect`, which
happens automatically if the user set `plan = "collect"`.
The next section discusses some advantages of submitting jobs using socket
clusters versus job arrays.
## Sockets vs Arrays
While socket clusters, created via `makePSOCKcluster` or, in the case of slurmR,
via `makeSlurmCluster`, may be more efficient in terms of data
communication[^dataonthenet], using job arrays has some important benefits over
socket cluster:
1. The number of workers can be much higher than clusters with the parallel
package.[^sessions] Users needing to work with hundreds or thousands of
jobs/instances may need to use job arrays instead.
2. If part of the job fails due to a failure of one of the tasks in the array,
the job can be easily resubmitted. The same is not necessarily true for
socket clusters.
3. Job arrays can run independently from the main session that started the
job. This means that, if for some reason the main session crashes or
stops, the job arrays will continue working regardless, and what's more,
the results can be collected anyway.
[^dataonthenet]: Data transfering on Socket clusters is done using serialization
with the `serialize` and `unserialize` functions. This way, data is sent directly
through the connection. In the case of job arrays, data is sent using `saveRDS`
and `readRDS` which involves I/O on the disk.
[^sessions]: The current default configuration of R does not allow having more
than 128 connections simulatenously (see `?connection`). This can be changed
during installation time.
# Example simulating Pi
We would like to implement a simulation algorithm to be run in a cluster. In this
case, we have the very simple function we would like to parallelize:
```r
simpi <- function(n) {
points <- matrix(runif(n*2), ncol=2)
mean(rowSums(points^2) <= 1)*4
}
```
This simple function generates an estimate of Pi. This approximation is based on
the following observation
$$
\mbox{Area} = \pi\times r^2 \implies \frac{Area}{r^2} = \pi
$$
Since we know what $r$ is, we just need to get an estimate of the Area to obtain
an approximation of $\pi$. A rather simple way of doing so is with Monte Carlo
simulations, in particular, sampling points in a unit square. The proportion
of points that fall within the unit circle, i.e. the proportion of points whose
distance to the origin is smaller than the radius of the circle, has an expected
value equal to the area of its circumscribed circle (for more details, check out
the Wikipedia article about this topic [here](https://en.wikipedia.org/wiki/Approximations_of_%CF%80#Summing_a_circle's_area)).
## Single node (machine), multi-core simulation
Using `parallel::mclapply`, we could just type
```r
set.seed(12)
ans <- parallel::mclapply(rep(1e6, 100), simpi)
mean(unlist(ans))
```
Which estimates pi using a single node(computer).
## Multi-job submission with job arrays
In the case of job arrays, we can use the `Slurm_lapply` function implemented in
the package. Before submitting a job to the queue, we need to specify some
options that are needed to create it:
- `tmp_path`: A path to a directory to which all computing nodes of the cluster
have read+write access.
- `job_name`: The name of the job, passed to `sbatch` via the `job-name` flag.
This will also be used as the name of the folder that is created within
`tmp_path`.
Ultimately, all the objects saved by the job will be located in the path defined
by `tmp_path`/`job_name`.
```r
library(slurmR)
# Setting required parameters
opts_slurmR$set_tmp_path("/stagging/slurmr-jobs/")
opts_slurmR$set_job_name("simulating-pi")
```
Moreover, we can specify more options to be set as default options for all the
jobs submitted for the current session. For example, we can set the default
partition and account as follows:
```r
# Optional parameters are set via set_opts
opts_slurmR$set_opts(partition="conti", account="lc_dvc")
```
A comprehensive list of options can be found
[here](https://slurm.schedmd.com/sbatch.html). To see what are all the current
defaults, we can just print the `opts_slurmR` object:
```
opts_slurmR
```
```{r echo=FALSE}
suppressMessages(library(slurmR))
suppressWarnings({
opts_slurmR$set_tmp_path("/stagging/slurmr-jobs/")
opts_slurmR$set_job_name("simulating-pi")
# Optional parameters are set via set_opts
opts_slurmR$set_opts(partition="conti", account="lc_dvc")
# We can look at the setup
opts_slurmR
})
```
Once we have specified all the needed options, we can do our `Slurm_lapply` call
and submit the job to the queue as follows:
```r
job <- Slurm_lapply(rep(1e6, 100), simpi, njobs=10, mc.cores=10, plan = "wait")
```
If `plan = "wait"`, then `Slurm_lapply` will return once the job is done (or failed).
To collect the results we can use the `Slurm_collect` function:
```r
ans <- Slurm_collect(job)
mean(unlist(ans))
```
Alternatively, we could have collected the results on the fly by telling `slurmR`
that the plan is to `"collect"` the results:
```r
ans <- Slurm_lapply(rep(1e6, 100), simpi, njobs=10, mc.cores=10, plan = "collect")
mean(unlist(job))
```
This way `Slurm_lapply` will do the `Slurm_collect` call before returning.
## Multi-node cluster object
Another way to do this is using `parallel::parLapply` with a multi-node socket
cluster.[^multinodereally] To do this, we can use the `makeSlurmCluster` function
and proceed as follows:
```r
cl <- makeSlurmCluster(50)
ans <- parallel::parLapply(cl, rep(1e6, 100), simpi)
mean(unlist(ans))
```
Once we are done with the calculations, we can stop the cluster object by
simply calling the `stopCluster` function:
```r
stopCluster(cl)
```
And `slurmR` will kill the job (and thus, the socket connections) calling
`scancel`.
[^multinodereally]: In general, Slurm will try to allocate multiple tasks in the
same node (machine). But if no node with that many resources is available,
the tasks will span multiple nodes.