update readme
wrathematics committed Feb 19, 2018
1 parent 689be1b commit 72f57c8
Showing 1 changed file with 76 additions and 29 deletions.
105 changes: 76 additions & 29 deletions README.md
@@ -6,7 +6,14 @@
* **Author:** Drew Schmidt


Tools for task-based parallelism with MPI via pbdMPI. Currently we provide:

1. `crlapply()` --- a serial `lapply()` with automatic checkpoint/restart
2. `mpi_napply()` --- a distributed `lapply()` that operates on an integer sequence. Supports checkpoint/restart and non-prescheduled workloads.
3. `mpi_lapply()` --- a fully general, distributed `lapply()`.

These functions are conceptually similar to `pbdMPI::pbdLapply()`, but with some key differences.



## Installation
@@ -28,9 +35,9 @@ remotes::install_github("RBigData/tasktools")



## Examples

Complete source code for all of these examples can be found in the `inst/examples` directory of the tasktools source tree. Here we'll take a look at them in pieces. Throughout, we'll use a (fake) "expensive" function for our evaluations:

```r
costly = function(x, waittime)
  # ... (function body lines elided between diff hunks: @@ -40,55 +47,95 @@) ...

sqrt(x)
}
```
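
The diff elides the body of `costly()` between hunks, so the block above is not runnable on its own. As a stand-in, here is a minimal sketch of what such a function might look like. The name `costly_sketch` and its body are assumptions, guided only by the signature, the `sqrt(x)` return value, and the `"iteration: N"` lines in the logs; it is not the author's code.

```r
# Hypothetical reconstruction; the real body of costly() is not shown in this diff.
costly_sketch = function(x, waittime)
{
  print(paste("iteration:", x))  # assumed: produces lines like [1] "iteration: 1"
  Sys.sleep(waittime)            # assumed: stands in for the "expensive" work
  
  sqrt(x)
}

# Plain lapply() with no waiting, just to check the return values.
ret = unlist(lapply(1:4, costly_sketch, waittime=0))
```

Passing `waittime=0` keeps the check instantaneous; a positive value gives you time to interrupt a run.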

We can run a checkpointed `lapply()` in serial via `crlapply()`:

```r
ret = crlapply(1:10, costly, FILE="/tmp/cr.rdata", waittime=0.5)
unlist(ret)
```

If we save this source to the file `crlapply.r`, we can run it and kill it a few times to show its effectiveness:

```bash
$ r crlapply.r
[1] "iteration: 1"
[1] "iteration: 2"
[1] "iteration: 3"
[1] "iteration: 4"
^C
$ r crlapply.r
[1] "iteration: 4"
[1] "iteration: 5"
[1] "iteration: 6"
[1] "iteration: 7"
^C
$ r crlapply.r
[1] "iteration: 8"
[1] "iteration: 9"
[1] "iteration: 10"
[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
[9] 3.000000 3.162278
```
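
To make the restart behavior in the log above concrete, here is a minimal base R sketch of a checkpointed `lapply()`. It is not the tasktools implementation: the helper name `cr_lapply_sketch`, the use of `saveRDS()`/`readRDS()`, and the choice to delete the checkpoint on completion are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the tasktools implementation of crlapply().
cr_lapply_sketch = function(X, FUN, FILE, ...)
{
  # Resume from the checkpoint if one exists; otherwise start fresh.
  if (file.exists(FILE))
    state = readRDS(FILE)
  else
    state = list(n=0L, ret=vector("list", length(X)))
  
  if (state$n < length(X))
  {
    for (i in (state$n + 1L):length(X))
    {
      state$ret[i] = list(FUN(X[[i]], ...))  # list() wrapper keeps NULL results
      state$n = i
      saveRDS(state, FILE)                   # checkpoint after every element
    }
  }
  
  file.remove(FILE)  # assumed cleanup once all elements are done
  state$ret
}

# Simulate an interruption: the function fails the first time it sees x == 3.
calls = 0
fail_once = TRUE
flaky = function(x)
{
  calls <<- calls + 1
  if (x == 3 && fail_once)
  {
    fail_once <<- FALSE
    stop("simulated interruption")
  }
  sqrt(x)
}

ckpt = tempfile(fileext=".rds")
try(cr_lapply_sketch(1:5, flaky, FILE=ckpt), silent=TRUE)  # dies at element 3
ret = cr_lapply_sketch(1:5, flaky, FILE=ckpt)              # resumes at element 3
```

In this sketch the interrupted element is re-evaluated on restart, while the elements already checkpointed are not, which is similar to iteration 4 appearing twice in the log.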

Since we are operating on the integer sequence of values 1 to 10, we can easily parallelize this, even distributing the work across multiple nodes, with `mpi_napply()`:

```r
ret = mpi_napply(10, costly, checkpoint_path="/tmp", waittime=1)
comm.print(unlist(ret))
```

To see exactly what happens during execution, we modify the printing in the "costly" function to be:

```r
cat(paste("iter", x, "executed on rank", comm.rank(), "\n"))
```

Let's run this with 3 MPI ranks. We can again run and kill it a few times to demonstrate the checkpointing:

```bash
$ mpirun -np 3 r mpi_napply.r
iter 4 executed on rank 1
iter 7 executed on rank 2
iter 1 executed on rank 0
^Citer 2 executed on rank 0
iter 8 executed on rank 2
iter 5 executed on rank 1

$ mpirun -np 3 r mpi_napply.r
iter 9 executed on rank 2
iter 3 executed on rank 0
iter 6 executed on rank 1
iter 10 executed on rank 2

[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
[9] 3.000000 3.162278
```
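
In the log above, rank 0 handles iterations 1 to 3, rank 1 handles 4 to 6, and rank 2 handles 7 to 10. That pattern is consistent with a contiguous block partition in which leftover tasks go to the highest ranks. The following serial sketch reproduces it; the helper `block_partition` is hypothetical, and the partitioning rule is inferred from the log rather than taken from the tasktools source.

```r
# Hypothetical helper: split tasks 1..n into contiguous blocks, one per rank.
# Assumption (inferred from the log): leftover tasks go to the highest ranks.
block_partition = function(n, nranks)
{
  base = n %/% nranks
  extra = n %% nranks
  
  sizes = rep(base, nranks)
  if (extra > 0)
    sizes[(nranks - extra + 1):nranks] = base + 1
  
  ends = cumsum(sizes)
  starts = ends - sizes + 1
  
  tasks = lapply(seq_len(nranks), function(r)
  {
    if (sizes[r] > 0) starts[r]:ends[r] else integer(0)
  })
  names(tasks) = paste0("rank", 0:(nranks - 1))
  tasks
}

parts = block_partition(10, 3)
```

Because each rank's task list is fixed up front, a rank that draws slow tasks cannot hand any of them off to an idle rank.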

There is also a non-prescheduling variant. This can be useful when evaluation time varies widely across inputs and you want tasks to be executed on a "first come, first served" basis. All we have to do is set `preschedule=FALSE`:

```r
ret = mpi_napply(10, costly, preschedule=FALSE, waittime=1)
comm.print(unlist(ret))
```

Note that in this case rank 0 behaves as the manager, doling out work, so it is not used in computation:

```bash
iter 1 executed on rank 1
iter 2 executed on rank 2
iter 3 executed on rank 1
iter 4 executed on rank 2
iter 5 executed on rank 1
iter 6 executed on rank 2
iter 7 executed on rank 1
iter 8 executed on rank 2
iter 9 executed on rank 1
iter 10 executed on rank 2

[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
[9] 3.000000 3.162278
```
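
The alternating pattern in this output is what "first come, first served" scheduling produces when every task takes the same time. The serial simulation below illustrates the idea without any MPI. The function `simulate_fcfs` is hypothetical, and it models only the scheduling decision (the next task goes to whichever worker becomes idle first), not how tasktools communicates between ranks.

```r
# Hypothetical serial simulation of manager/worker scheduling; no MPI involved.
# Returns the worker rank (1..nworkers; rank 0 is the manager) for each task.
simulate_fcfs = function(durations, nworkers)
{
  free_at = rep(0, nworkers)  # time at which each worker next becomes idle
  worker = integer(length(durations))
  
  for (i in seq_along(durations))
  {
    w = which.min(free_at)    # earliest-idle worker; ties go to the lowest rank
    worker[i] = w
    free_at[w] = free_at[w] + durations[i]
  }
  
  worker
}

# Equal task times with 2 workers: tasks alternate, as in the output above.
even = simulate_fcfs(rep(1, 10), nworkers=2)

# One slow task: the other worker picks up everything else in the meantime.
uneven = simulate_fcfs(c(5, 1, 1, 1), nworkers=2)
```

Here `even` alternates between workers 1 and 2, while in `uneven` worker 1 is tied up with the slow first task and worker 2 takes the remaining three.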

This too supports checkpointing, but hopefully how that works is clear.
