update readme

RBigData · Feb 19, 2018 · 72f57c8 · 72f57c8
1 parent 689be1b
commit 72f57c8
Showing 1 changed file with 76 additions and 29 deletions.
diff --git a/README.md b/README.md
@@ -6,7 +6,14 @@
 * **Author:** Drew Schmidt
 
 
-Tools for task-based parallelism with MPI via pbdMPI.
+Tools for task-based parallelism with MPI via pbdMPI. Currently we provide:
+
+1. `crlapply()` --- a serial `lapply()` with automatic checkpoint/restart
+2. `mpi_napply()` --- a distributed `lapply()` that operates on an integer sequence. Supports checkpoint/restart and non-prescheduled workloads.
+3. `mpi_lapply()` --- a fully general, distributed `lapply()`.
+
+These functions are conceptually similar to `pbdMPI::pbdLapply()`, but with some key differences.
+
 
 
 ## Installation
@@ -28,9 +35,9 @@ remotes::install_github("RBigData/tasktools")
 
 
 
-## Package Use
+## Examples
 
-We'll take a very simple example with a fake "expensive" function:
+Complete source code for all of these examples can be found in the `inst/examples` directory of the tasktools source tree. Here we'll take a look at them in pieces. Throughout, we'll use a (fake) "expensive" function for our evaluations:
 
 ```r
 costly = function(x, waittime)
@@ -40,55 +47,95 @@ costly = function(x, waittime)
 
   sqrt(x)
 }
+```
+
+We can run a checkpointed `lapply()` in serial via `crlapply()`:
 
-crlapply::crlapply(1:10, costly, FILE="/tmp/cr.rdata", waittime=0.5)
+```r
+ret = crlapply(1:10, costly, FILE="/tmp/cr.rdata", waittime=0.5)
+unlist(ret)
 ```
 
-We can save this to the file `example.r`. We'll run it and kill it a few times:
+If we save this source to the file `crlapply.r`. We can run it and kill it a few times to show its effectiveness:
 
 ```bash
-$ r example.r
+$ r crlapply.r 
 [1] "iteration: 1"
 [1] "iteration: 2"
 [1] "iteration: 3"
-[1] "iteration: 4"
 ^C
-$ r example.r
+$ r crlapply.r 
+[1] "iteration: 4"
 [1] "iteration: 5"
 [1] "iteration: 6"
 [1] "iteration: 7"
 ^C
-$ r example.r
+$ r crlapply.r 
 [1] "iteration: 8"
 [1] "iteration: 9"
 [1] "iteration: 10"
-[[1]]
-[1] 1
 
-[[2]]
-[1] 1.414214
+ [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
+ [9] 3.000000 3.162278
+```
 
-[[3]]
-[1] 1.732051
+Since we are operating on the integer sequence of values 1 to 10, we can easily parallelize this, even distributing the work across multiple nodes, with `mpi_napply()`:
 
-[[4]]
-[1] 2
+```r
+ret = mpi_napply(10, costly, checkpoint_path="/tmp", waittime=1)
+comm.print(unlist(ret))
+```
 
-[[5]]
-[1] 2.236068
+To see exactly what happens during execution, we modify the printing in the "costly" function to be:
 
-[[6]]
-[1] 2.44949
+```r
+cat(paste("iter", i, "executed on rank", comm.rank(), "\n"))
+```
+
+Let's run this with 3 MPI ranks. We can again run and kill it a few times to demonstrate the checkpointing:
+
+```bash
+$ mpirun -np 3 r mpi_napply.r 
+iter 4 executed on rank 1 
+iter 7 executed on rank 2 
+iter 1 executed on rank 0 
+^Citer 2 executed on rank 0 
+iter 8 executed on rank 2 
+iter 5 executed on rank 1 
+
+$ mpirun -np 3 r mpi_napply.r 
+iter 9 executed on rank 2 
+iter 3 executed on rank 0 
+iter 6 executed on rank 1 
+iter 10 executed on rank 2 
+
+ [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
+ [9] 3.000000 3.162278
+```
 
-[[7]]
-[1] 2.645751
+There is also a non-prescheduling variant. This can be useful if there is a lot of variance among function evaluation for the inputs, and you want the values to be executed on a "first come, first serve" basis. All we have to do is set `preschedule=FALSE`:
 
-[[8]]
-[1] 2.828427
+```r
+ret = mpi_napply(10, costly, preschedule=FALSE, waittime=1)
+comm.print(unlist(ret))
+```
 
-[[9]]
-[1] 3
+Now, it's worth noting that in this case, rank 0 behaves as the manager, doling out work. So it is not used in computation:
 
-[[10]]
-[1] 3.162278
+```bash
+iter 1 executed on rank 1 
+iter 2 executed on rank 2 
+iter 3 executed on rank 1 
+iter 4 executed on rank 2 
+iter 5 executed on rank 1 
+iter 6 executed on rank 2 
+iter 7 executed on rank 1 
+iter 8 executed on rank 2 
+iter 9 executed on rank 1 
+iter 10 executed on rank 2 
+
+ [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
+ [9] 3.000000 3.162278
 ```
+
+This too supports checkpointing, but hopefully how that works is clear.