REQ - Guidance on writing R code that will run in parallel #10

terrymclaughlin · 2023-03-09T14:23:12Z

For example:

- Explain single-threaded vs multi-threaded processing, and what parallel processing is
- Explore the futureverse! 🚀
  - Correctly identify the number of CPUs available to the session with {parallelly}
  - Write functions and iterate using the {furrr} package, rather than {purrr}
- Use the {multidplyr} backend with {dplyr}
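As a rough illustration of the CPU-detection point, here is a minimal sketch. It assumes {parallelly} and {future} are installed. It compares the cores reported for the whole host with the cores {parallelly} considers available to this session, then starts background workers. Leaving one core free for the main session is an assumption made for illustration, not an agreed PHS rule.

```r
# Minimal sketch: how many CPUs can this session actually use?
# Assumes {parallelly} and {future} are installed.
library(parallelly)
library(future)

# parallel::detectCores() reports every core on the host machine, which on a
# shared server can be far more than this session is allowed to use.
host_cores <- parallel::detectCores()

# parallelly::availableCores() also takes into account R options, environment
# variables and, in recent versions, Linux cgroup CPU limits, so it is the
# safer number to base a worker count on.
session_cores <- availableCores()

# Assumption: leave one core free for the main R session.
n_workers <- max(1L, session_cores - 1L)

cat("Host cores:   ", host_cores, "\n")
cat("Session cores:", session_cores, "\n")
cat("Workers:      ", n_workers, "\n")

# Start n_workers background R sessions on this machine.
plan(multisession, workers = n_workers)
nbrOfWorkers()

# Switch back to sequential processing when finished.
plan(sequential)
```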

Other useful links:


terrymclaughlin · 2023-03-09T14:25:22Z

Just alerting you to this issue. My plan is to draft some guidance, as it's clear that most R code in PHS is being written to run single-threaded, so it isn't taking advantage of the multiple CPUs available in a Posit Workbench session. Parallelising this code could result in significant performance improvements when processing large datasets.

terrymclaughlin · 2023-03-09T14:26:31Z

@CliveWG @fraserstirrat

In case you see any queries coming in requesting guidance on parallel processing, you can tell people that this is on our radar and we're developing guidance for this.

Moohan · 2023-03-09T16:03:43Z

furrr is low-hanging fruit: it requires code to already be written using purrr, but if it is, it's a super simple switch. There is a bit of overhead in 'setting up the workers', so it's a subjective call when it's worth it. I guess that applies to all of the parallelisation methods, though!
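A minimal sketch of the purrr-to-furrr switch, assuming {purrr}, {furrr} and {future} are installed. `slow_square()` is a hypothetical stand-in for real per-item work, and two workers are used purely for illustration. The iteration itself only changes from `map_dbl()` to `future_map_dbl()`; the `plan()` calls are where the worker set-up overhead comes in.

```r
# Minimal sketch: the same iteration with {purrr} and with {furrr}.
# Assumes {purrr}, {furrr} and {future} are installed.
library(purrr)
library(furrr)
library(future)

# Hypothetical slow function standing in for real per-item work.
slow_square <- function(x) {
  Sys.sleep(0.2)
  x^2
}

inputs <- 1:8

# Sequential: one item after another in the current R session.
seq_time <- system.time(
  seq_result <- map_dbl(inputs, slow_square)
)

# Parallel: set up the workers (2 here for illustration; in practice base this
# on parallelly::availableCores()), then swap map_dbl() for future_map_dbl().
plan(multisession, workers = 2)
par_time <- system.time(
  par_result <- future_map_dbl(inputs, slow_square)
)
plan(sequential)

# Results should match; elapsed times will vary from session to session.
identical(seq_result, par_result)
seq_time[["elapsed"]]
par_time[["elapsed"]]
```

Timings depend on the session, and for very quick per-item work the cost of starting workers and shipping data to them can outweigh any gain.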

jakeybob · 2023-03-09T16:26:21Z

I'm not sure what the best route is here. All the different available methods make it quite thorny.

I don't think purrr is used that widely internally at the moment. Or at least, I suspect any code that uses purrr heavily was probably written by a techy person who would be able to convert to furrr easily on their own.

And, I feel like any guidance along the lines of "here are several different ways you can do this" won't be well received.

So do we choose one way to recommend...? This would be better for consistency and support/training, but a) I'm not convinced this is the best idea and b) even if it is, I don't know which method would be best to pick...

We should probably sidestep the foreach and doParallel side of things and go with furrr or multidplyr, though, I guess? They're both tidyverse-friendly. multidplyr probably slots into existing dplyr code blocks most easily and has the smaller mental overhead, but 🤷🏻
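For comparison, a minimal sketch of the multidplyr route, assuming {dplyr} and {multidplyr} are installed. The `records` data and its `board` column are made up for illustration. The existing pipeline stays largely the same: `partition()` is added before the expensive verbs and `collect()` after them.

```r
# Minimal sketch: an existing dplyr pipeline run on a multidplyr cluster.
# Assumes {dplyr} and {multidplyr} are installed; the data are hypothetical.
library(dplyr)
library(multidplyr)

set.seed(1)
records <- tibble(
  board = rep(c("A", "B", "C", "D"), each = 250),
  value = rnorm(1000)
)

# Single-threaded version.
seq_summary <- records %>%
  group_by(board) %>%
  summarise(mean_value = mean(value), n = n())

# Parallel version: start 2 worker sessions and load dplyr on each.
cluster <- new_cluster(2)
cluster_library(cluster, "dplyr")

# partition() keeps each group's rows together on one worker; collect()
# brings the per-worker results back into the main session.
par_summary <- records %>%
  group_by(board) %>%
  partition(cluster) %>%
  summarise(mean_value = mean(value), n = n()) %>%
  collect() %>%
  arrange(board)

par_summary
```

For a small table like this the cluster overhead will dominate; the approach is aimed at large grouped datasets.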

Moohan · 2023-03-29T16:33:06Z

Thought this might be the best place to ask this question, and if no one knows it's just another thing to add to future guidance!

If I use plan(multisession), which is the one you're led to when using RStudio, will this create new nodes on PWB?

For example, if I have a session with 8 CPUs and 4GB of RAM, will this be shared among the 'sessions' or will it spawn new nodes for the new sessions, in which case what limits/specs do they have?

jakeybob · 2023-03-29T16:49:07Z

I suspect this will run in the current session only and the workers spawned will be more equivalent to "background jobs" (running as independent R processes but sharing the parent session total resources) than "workbench jobs" (starting new sessions with their own resources).

Only one way to find out for sure, though: give it a punt and see what happens? 😀
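One way to give it a punt: the sketch below, assuming {future} and {furrr} are installed, asks each multisession worker for its host name and process ID and compares them with the main session's. The {future} documentation describes multisession as evaluating futures in background R sessions on the current machine, so the expectation is the same host name with different process IDs, but running this inside a PWB session is the real test. It doesn't by itself show how the session's memory allowance is shared.

```r
# Minimal sketch: where do multisession workers actually run?
# Assumes {future} and {furrr} are installed.
library(future)
library(furrr)

plan(multisession, workers = 2)

# Each task reports the machine name and process ID it ran on.
worker_rows <- future_map(1:2, function(i) {
  data.frame(
    task     = i,
    nodename = Sys.info()[["nodename"]],
    pid      = Sys.getpid()
  )
})
worker_info <- do.call(rbind, worker_rows)

plan(sequential)

main_node <- Sys.info()[["nodename"]]
main_pid  <- Sys.getpid()

worker_info
# TRUE here means the workers run on the same host as this session.
all(worker_info$nodename == main_node)
# TRUE here means each worker is a separate R process from this session.
all(worker_info$pid != main_pid)
```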

terrymclaughlin self-assigned this Mar 9, 2023

terrymclaughlin added the enhancement New feature or request label Mar 9, 2023

rmccreath transferred this issue from Public-Health-Scotland/R-Resources Mar 22, 2023

rmccreath changed the title ~~Write guidance on writing R code that will run in parallel i.e. on multiple CPUs~~ REQ - Guidance on writing R code that will run in parallel Mar 22, 2023

rmccreath added documentation Improvements or additions to documentation and removed enhancement New feature or request labels Mar 22, 2023

terrymclaughlin linked a pull request Jun 18, 2024 that will close this issue

Writing R code that will run in Parallel #110

Draft
