Update execution_modes.rst
Delaunay authored Aug 1, 2024
1 parent a590686 commit 4d2f716
Showing 1 changed file with 25 additions and 24 deletions.
49 changes: 25 additions & 24 deletions docs/execution_modes.rst
@@ -1,8 +1,30 @@
Milabench processes overview
============================

* milabench main process

  * gathers metrics from the benchmark processes and saves them to a file
  * manages the benchmarks (timeouts, etc.)

* if ``per_gpu`` is used, milabench will launch one process per GPU (sets ``CUDA_VISIBLE_DEVICES``)

  * each process logs its own GPU data
  * might spawn a monitor process

    * will init pynvml

  * the dataloader will also spawn worker processes

    * usually not using the GPU

* if ``njobs`` is used, milabench will launch a single process (torchrun)

  * torchrun in turn will spawn one process per GPU
  * RANK 0 is used for logging
  * RANK 0 might spawn a monitor process

    * will init pynvml

  * the dataloader will also spawn worker processes

    * usually not using the GPU
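
As a rough illustration of the ``njobs`` case above, the sketch below shows how a worker started by torchrun could decide whether it is the process that logs. torchrun exports ``RANK`` to every worker it spawns; the helper name ``is_logging_rank`` and the fallback to rank 0 when the variable is missing are assumptions made for this example, not milabench's actual code.

```python
import os


def is_logging_rank(env=None):
    """Return True when this process should log (RANK 0).

    Hypothetical helper: falls back to rank 0 when RANK is not set,
    for example when the script is started without torchrun.
    """
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")) == 0


if __name__ == "__main__":
    if is_logging_rank():
        print("rank 0: logging metrics")
```

Passing the environment as an argument keeps the check easy to exercise without actually launching torchrun.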

Plan
====
----

per_gpu
-------
+++++++

``per_gpu``: used for single-GPU benchmarks; spawns one process per GPU and runs the same benchmark in each

@@ -36,7 +58,7 @@ Milabench will essentially execute something akin to below.
)
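
Independently of the elided snippet above, the ``per_gpu`` idea can be sketched in plain Python: build one environment per GPU, each with its own ``CUDA_VISIBLE_DEVICES``, and start the same command once per environment. The function names ``per_gpu_envs`` and ``launch_per_gpu`` and the stand-in child command are hypothetical; this is a minimal sketch under those assumptions, not what milabench runs internally.

```python
import os
import subprocess
import sys


def per_gpu_envs(gpu_ids, base_env=None):
    """Build one environment per GPU, each restricted to a single device."""
    base_env = dict(os.environ if base_env is None else base_env)
    envs = []
    for gpu_id in gpu_ids:
        env = dict(base_env)
        # Each process only sees the one GPU assigned to it.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        envs.append(env)
    return envs


def launch_per_gpu(argv, gpu_ids):
    """Start the same command once per GPU and return the exit codes."""
    procs = [subprocess.Popen(argv, env=env) for env in per_gpu_envs(gpu_ids)]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in for a benchmark: print the device this process may use.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
    ]
    print(launch_per_gpu(child, gpu_ids=[0, 1]))
```

All processes are started before any of them is waited on, so the copies of the benchmark run concurrently rather than one after another.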
njobs
-----
+++++

``njobs``: used to launch a single job that can see all the GPUs.

@@ -64,27 +86,6 @@ Milabench will essentially execute something akin to below.
)
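
For comparison with the elided snippet above, the ``njobs`` mode amounts to starting one torchrun process and letting it fan out to the GPUs. The sketch below only assembles such a command line; ``torchrun_command``, the script name ``main.py`` and its arguments are hypothetical, while ``--nproc_per_node`` is the standard torchrun option for the number of workers per node.

```python
import shlex


def torchrun_command(script, script_args=(), nproc_per_node=1):
    """Assemble a torchrun command that spawns one worker per GPU."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # Hypothetical benchmark script and arguments, for illustration only.
    cmd = torchrun_command("main.py", ["--batch-size", "32"], nproc_per_node=4)
    print(shlex.join(cmd))
```

Unlike the ``per_gpu`` case, nothing restricts ``CUDA_VISIBLE_DEVICES`` here: the single job sees every GPU and torchrun assigns a rank to each worker.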
Milabench processes overview
----------------------------

* milabench main process

  * gathers metrics from the benchmark processes and saves them to a file
  * manages the benchmarks (timeouts, etc.)

* if ``per_gpu`` is used, milabench will launch one process per GPU (sets ``CUDA_VISIBLE_DEVICES``)

  * each process logs its own GPU data
  * might spawn a monitor process

    * will init pynvml

  * the dataloader will also spawn worker processes

    * usually not using the GPU

* if ``njobs`` is used, milabench will launch a single process (torchrun)

  * torchrun in turn will spawn one process per GPU
  * RANK 0 is used for logging
  * RANK 0 might spawn a monitor process

    * will init pynvml

  * the dataloader will also spawn worker processes

    * usually not using the GPU