Use `Threads.@spawn` instead of `@async` #86

devmotion · 2024-10-11T09:49:00Z

A very mechanical PR that goes beyond #72 and replaces every occurrence of @async with Threads.@spawn, as recommended in the Julia documentation of @async:

  │ Warning
  │
  │  It is strongly encouraged to favor Threads.@spawn over @async always even when no parallelism is required especially in publicly distributed libraries. This is
  │  because a use of @async disables the migration of the parent task across worker threads in the current implementation of Julia. Thus, seemingly innocent use of
  │  @async in a library function can have a large impact on the performance of very different parts of user applications.

Fixes #71.

fonsp · 2024-10-11T13:45:06Z

Hey @devmotion, thanks for contributing!

I would like to get some feedback on when to use Threads.@spawn vs @async, I believe the first is not always a better replacement of the latter. For example, Distributed uses both async and spawn in different places.

fonsp · 2024-10-11T13:48:54Z

I see that you also contributed to PumasAI/QuartoNotebookRunner.jl#192, I think this is a good case that we should also take into account. What is the desired behaviour with RNGs for Malt?

devmotion · 2024-10-11T13:49:05Z

Based on the docs and PRs such as JuliaLang/julia#55315 I assumed one should always use Threads.@spawn instead of @async. It would be good to know (and document) if that's not always the case 🙂

devmotion · 2024-10-11T13:53:44Z

What is the desired behaviour with RNGs for Malt?

I'm not sure actually, I think it depends on the intended use of Malt. For linear computations such as in QuartoNotebookRunner I think one would expect that state of all objects (including the RNG) is preserved between subsequent calls of remote_eval_fetch. But if the intended use are somewhat arbitrary unrelated Julia evaluations, then state does not matter. I guess the most common use is the former though?

I think this discussion is not directly related to this PR though, for that problem it doesn't seem to matter if one uses @async or @spawn.

devmotion · 2024-10-11T17:41:08Z

I assume the Mac test failures on 1.10 are unrelated since these tests also fail in other recent PRs #79 and #85.

pankgeorg · 2024-10-12T10:39:26Z

The change is not trivial without locking. The server-global objects that may be accessed by multiple threads work differently with @async vs @Spawn. This needs some investigation on Pluto internals too.

devmotion · 2024-10-12T13:17:07Z

Do you have an example for an object for which it would be problematic if the accessing task is not pinned to a thread anymore? And which Threads.@spawn might be problematic?

pankgeorg · 2024-10-14T10:15:02Z

I initially replied

Do you have an example for an object for which it would be problematic if the accessing task is not pinned to a thread anymore?

https://github.com/JuliaPluto/PlutoSliderServer.jl/blob/17f45ea66bc6b3ce2d70334efb2fc6bbbf77e17f/src/PlutoSliderServer.jl#L236

It's mainly this one.

And which Threads.@spawn might be problematic?

All of them, until proven otherwise. Julia's threads can migrate through processors, which means that they can run at the same clock time on a different processor. And if they do, they may update the same memory location from two threads, for any shared memory that is not protected by a lock. It's highly unlikely, but it's not impossible, and when it will happen, it will be super hard to debug.

See also JuliaWeb/HTTP.jl#1102 (comment). This basically means that the parent thread is pinned to thread 0, no matter what you do, because HTTP.jl uses @async which pins it.

Doing threads correctly is hard.

As a final note, it's not clear in which exact scenario that is a good idea, performance wise. Julia only writes to I/O from thread 1, so to get benefit from this approach you'd need to have some performance-heavy operations that don't write much (which is often the case for scientific notebooks). But it may be hard to actually get performance out of this, because pushing computations to another processor (from the main one), means that the data need to migrate to the other processor, and the reply needs to come back, to be written to the tcp socket to find its way back to the network interface!

That is not to say we be underwhelmed though! If you're interested, you can start by understanding the access patterns to objects in the shared address space from the HTTP handlers and introduce locking around them. It's mainly the server_session in Pluto itself.

which is wrong, because it assumes we're on PlutoSliderServer. But also that it's not 100% wrong, because there is the implicit assumption that the handler is pinned on the main thread of the Malt worker on every client that uses Malt. (the exact problem mentioned in JuliaWeb/HTTP.jl#1102 (comment))

User code can escape the main thread by doing Threads.@spawn themselves. But if we do this, we change the promises of Malt per se. It's the same reason why Julia doesn't define macro async to equal Threads.@spawn to begin with.

Use Threads.@spawn instead of @async

fe8b1a8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `Threads.@spawn` instead of `@async` #86

Use `Threads.@spawn` instead of `@async` #86

devmotion commented Oct 11, 2024

fonsp commented Oct 11, 2024

fonsp commented Oct 11, 2024

devmotion commented Oct 11, 2024 •

edited

Loading

devmotion commented Oct 11, 2024

devmotion commented Oct 11, 2024 •

edited

Loading

pankgeorg commented Oct 12, 2024

devmotion commented Oct 12, 2024

pankgeorg commented Oct 14, 2024

Use Threads.@spawn instead of @async #86

Are you sure you want to change the base?

Use Threads.@spawn instead of @async #86

Conversation

devmotion commented Oct 11, 2024

fonsp commented Oct 11, 2024

fonsp commented Oct 11, 2024

devmotion commented Oct 11, 2024 • edited Loading

devmotion commented Oct 11, 2024

devmotion commented Oct 11, 2024 • edited Loading

pankgeorg commented Oct 12, 2024

devmotion commented Oct 12, 2024

pankgeorg commented Oct 14, 2024

Use `Threads.@spawn` instead of `@async` #86

Use `Threads.@spawn` instead of `@async` #86

devmotion commented Oct 11, 2024 •

edited

Loading

devmotion commented Oct 11, 2024 •

edited

Loading