Allow using `Distributed` #300

fonsp · 2020-08-16T23:01:22Z

Pluto uses Distributed to create worker processes and to send Julia data structures between them. It works really well! And Distributed is pleasant to work with.

However, it means that Pluto notebooks cannot use Distributed themselves, because your notebook's code is executed on a worker process: you can't create processes from there, and you can accidentally control other running notebooks.

One solution is to run all your notebooks in the master process by setting the parameter:

import Pluto; Pluto.run(workspace_use_distributed=false)

But this makes the notebook server unresponsive while any notebook is running code, and the stop button is disabled.

Solutions

What would a solution be? Should Pluto implement its own Distributed? This seems silly - we would get the most robust implementation by copying Distributed's source code directly. Maybe there is a way to internally use a copy of Distributed? Copy the contents of julia/stdlib/Distributed into /tmp, rename Distributed to DistributedCopy and import that? But is the session state completely contained inside the package?

Is there a way to use Distributed, but with "multiple global sessions"?

Can we create a thin wrapper around Distributed and make sure that the Pluto process is using this instead? For example, Distributed.addprocs() would be RealDistributed.remotecall_eval(Main, 1, :(RealDistributed.addprocs())). Will that also work for the packages that you import inside your notebook, that depend on Distributed?
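To make the forwarding idea concrete, here is a minimal sketch in a plain Julia session (not inside Pluto), where process 1 is also the calling process. The helper name `forwarded_addprocs` is hypothetical, and the sketch assumes `Distributed` is loaded in `Main` on process 1. It only shows the call being routed through process 1; it does not answer whether packages that depend on Distributed would pick up such a wrapper.

```julia
using Distributed

# Hypothetical helper: ask process 1 (the owner of the cluster) to add workers.
# `Distributed.remotecall_eval` is unexported; with a single pid it evaluates
# the expression there and returns the fetched result.
forwarded_addprocs(n) =
    Distributed.remotecall_eval(Main, 1, :(Distributed.addprocs($n)))

new_pids = forwarded_addprocs(1)

# Remove the worker again so the session is left as it was.
Distributed.remotecall_eval(Main, 1, :(Distributed.rmprocs($new_pids)))
```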


marius311 · 2020-09-07T07:48:55Z

Is there a reason why Pluto's workers need to be processes and can't just be threads?

(Two other upsides of threads would be reduced memory usage and the ability to work with non-serializable objects.)

fonsp · 2020-09-07T10:01:59Z

Hmm. Can one thread interrupt another thread stuck in while true end? What happens when one thread segfaults?

fonsp · 2020-09-07T10:47:53Z

Can different threads use separate package environments? Can they load different versions of the same package?

fonsp · 2020-09-07T10:48:32Z

(Those aren't rhetorical questions 🙃 it's just that I have never used threads in Julia)

Moelf · 2020-09-13T06:25:17Z

Jupyter Notebook can do it because the kernel is one process and Jupyter Notebook itself runs in an entirely separate (Python) process.

I see two outs:

1. Mimic Jupyter: make each notebook a separate process, with the web UI tied to the main session where Pluto.run() happened (so we can send SIGINT to each notebook process).
2. Use the main session as a worker broker, i.e. when adding a worker in a notebook, add it at the main process and 'shadow' / forward it to the notebook worker via a socket file or something.

carlocab · 2020-09-13T23:24:52Z

> One solution is to run all your notebooks in the master process by setting an environment variable:
> import Pluto; withenv(Pluto.run, "PLUTO_WORKSPACE_USE_DISTRIBUTED" => "false")

This workaround doesn't work for me, unfortunately. I'm still getting the same errors like workspace3 not defined, which is what would happen if I try to run Pluto normally but insist on using Distributed.

I guess it's not a big deal since this isn't the intended usage, but I thought it might be useful for you to know.

Thanks for the work you've put into Pluto!

fonsp · 2020-10-21T21:02:09Z

Some motivational words:

https://www.youtube.com/watch?v=nwdGsz4rc3Q

lukeburns · 2021-04-22T21:31:52Z

Workaround also fails for me. I get @everywhere not defined and Distributed not defined when I try to access it via Distributed.@everywhere.

This behavior is unexpected:

Why is this happening?

lukeburns · 2021-04-24T03:14:25Z

Quick and dirty workaround for addprocs and @everywhere, in case it's helpful to others.

### A Pluto.jl notebook ###
# v0.14.3

using Markdown
using InteractiveUtils

# ╔═╡ 797267f8-c7e6-4cb3-81d9-3ccc12956f56
begin
	macro everywhere(procs, ex)
		return esc(:(Main.@everywhere $procs $ex))
	end
	workers() = filter(pid -> pid != Main.myid(), Main.workers())
	macro everywhere(ex)
		# have pluto handle evaluation on workspace process
		return esc(:(@everywhere workers() $ex; eval($(Expr(:quote, ex)))))
	end
end

# ╔═╡ cfa09121-2457-42b1-9d20-e2518e7474e0
begin
	@everywhere 1 using Distributed
	addprocs(args...; kwargs...) = @everywhere 1 addprocs($args...; $kwargs...)
	rmprocs(args...; kwargs...) = @everywhere 1 rmprocs($args...; $kwargs...)
end

# ╔═╡ 6c0e0a11-3cc5-4ebe-b6f5-8df3e409cd05
@everywhere a = 2

# ╔═╡ ef45e5cc-cd10-4ad4-811a-9b767670dbf4
a^2

# ╔═╡ Cell order:
# ╠═797267f8-c7e6-4cb3-81d9-3ccc12956f56
# ╠═cfa09121-2457-42b1-9d20-e2518e7474e0
# ╠═6c0e0a11-3cc5-4ebe-b6f5-8df3e409cd05
# ╠═ef45e5cc-cd10-4ad4-811a-9b767670dbf4

fonsp · 2021-05-18T10:53:11Z

Hey @r-acad !

Can you remove this question here and open a new Discussion?

Oblynx · 2021-12-30T00:31:14Z

I wonder why Distributed can't nest. Is there any ongoing discussion with upstream?

Oblynx · 2022-01-07T17:27:53Z

You mention that

1. you can't create processes
2. you can accidentally control other running notebooks

If (2) is undesirable, we can use https://docs.julialang.org/en/v1/manual/distributed-computing/#Specifying-Network-Topology-(Experimental) like in #1812

For (1), I wonder if it would be possible to launch independent Julia processes from each worker, instead of running the notebooks inside the process created by Distributed.addprocs. This kind of encapsulation might be possible with a custom Distributed.ClusterManager.
Pluto uses the default LocalManager ClusterManager; however, by overriding the Distributed.launch method, we could possibly do this.

Second thought, without having seen the internals of Distributed but after checking the ClusterManager API docs: there might be shared state between worker and master that contains a map pid => worker_id, maybe belonging to the ClusterManager.

In such a case, I wonder if it is simply a matter of explicitly creating a new LocalManager instance to use in the notebooks.
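For reference, the topology option mentioned for (2) looks roughly like this with stock Distributed in a plain Julia session. This is a minimal sketch, not a Pluto change: the `topology` keyword is documented as experimental, and whether it is enough to keep notebooks apart inside Pluto is the question explored in #1812.

```julia
using Distributed

# Start two local workers that connect only to the master process,
# not to each other.
pids = addprocs(2; topology=:master_worker)

# The master can still reach each worker directly.
ids = [remotecall_fetch(myid, p) for p in pids]

# Clean up the workers again.
rmprocs(pids)
```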

Oblynx · 2022-01-10T11:57:58Z

I believe this issue is important because Distributed is part of the Julia stdlib and supports a core paradigm of modern programming. It arguably has high teaching value and can help "beginner" programmers understand how to make use of modern computing infrastructure in a very high-level way. IIUC, Pluto aims to facilitate prototyping of code that can be used in practice, and spawning processes or using Dagger is usually best considered from the beginning; it's not just an optimization.

What do the Pluto maintainers think, is this still interesting?

fonsp · 2022-01-11T13:03:17Z

Hi @Oblynx ! Thanks so much for your input, we really want this fixed! I fully agree that Distributed is essential to Julia's ecosystem, and also to beginners.

We have not posted much to this issue, but we have been regularly discussing this for a long time now. @dralletje has made a prototype of Pluto without distributed, but I think the performance hit was too big.

I did not bring this up at julia itself because I am quite intimidated by the problem, since I have little experience with distributed computing. I am also worried that the API of Distributed is not designed to handle a nested tree structure.

Your approach sounds very promising! Going through the distributed codebase, I felt like a good approach would be to override some globals, simulating the PID=1 context on notebook processes. Creating a new ClusterManager sounds even better!

dralletje · 2022-01-25T19:30:50Z

😏

Oblynx · 2022-01-26T11:11:25Z

🐙 ! But how?

fonsp · 2022-06-15T16:20:18Z

Good news! We have a GSoC student working on this issue this summer! @savq

schlichtanders · 2023-02-01T08:57:34Z

Awesome to see progress with replacing Distributed!

As the GSoC is over, is there a further roadmap with next steps planned?

fonsp · 2023-02-01T15:18:28Z

Lots of progress happening in https://github.com/JuliaPluto/Malt.jl ! Take a look at https://github.com/JuliaPluto/Malt.jl/milestone/1

fonsp · 2023-09-18T13:46:03Z

We fixed it! 🎉

The fix is in #2240, thanks to @savq (GSoC), @Pangoraw, @habemus-papadum and @pankgeorg! Also thanks to @dralletje for previous work in #1854 and #1896.

Test release

The upcoming Pluto release will start a testing period where Pluto still uses Distributed by default, but you can enable Malt with:

Pluto.run(workspace_use_distributed_stdlib=false)

Please try it out (] add Pluto#main) and give us your feedback!
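For anyone curious what Malt looks like on its own, here is a minimal sketch outside of Pluto. It assumes Malt.jl is installed and that `Malt.Worker`, `Malt.remote_eval_fetch` and `Malt.stop` behave as described in the Malt.jl README; check the README if the names have changed. It is not a picture of how Pluto uses Malt internally.

```julia
import Malt

# Start a separate Julia process managed by Malt (no Distributed involved).
worker = Malt.Worker()

# Evaluate an expression on the worker and wait for the result.
result = Malt.remote_eval_fetch(worker, :(1 + 1))

# Shut the worker process down again.
Malt.stop(worker)
```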

dralletje · 2023-09-18T13:55:53Z

O MY GODDDDD

schlichtanders · 2023-10-11T08:04:32Z

Now that Distributed can be loaded, it seems to run into the serialization problem reported in #1030.

So is there still no real use of Distributed until serialization is handled?

fonsp · 2023-10-18T09:32:40Z

@schlichtanders Can you give an example?

schlichtanders · 2023-10-19T08:59:35Z

I added it to the other open issue mentioned above, #1030 (comment):

Now that Pluto supports the use of Distributed via Malt.jl, this issue appears again.

A simple remote call like
remotecall_fetch(() -> readchomp(`hostname`), pid)
already gives an error like
UndefVarError: `workspace#72` not defined
I opened an issue on julialang/julia for this: JuliaLang/Distributed.jl#1

fonsp changed the title from "Allow Distributed" to "Allow using Distributed" on Aug 16, 2020

fonsp added the backend, enhancement and help welcome labels on Aug 16, 2020

carlocab mentioned this issue on Sep 13, 2020: detect using Distributed and warn about incompatibility #420 (Open)

fonsp mentioned this issue on Sep 18, 2020: Can't interrupt on Windows #452 (Closed)

fonsp mentioned this issue on Mar 25, 2021: Cannot use process-based parallel computing inside Pluto #1023 (Closed)

pankgeorg mentioned this issue on Dec 29, 2021: Dagger @spawn can't launch workers from Pluto #1792 (Open)

Oblynx mentioned this issue on Jan 7, 2022: Restrict notebook communication topology #1812 (Closed)

fonsp linked a pull request on Jan 26, 2022 that will close this issue: Without distributed #1854 (Closed)

fonsp linked a pull request on Feb 4, 2022 that will close this issue: With Distributed 2 #1896 (Closed)

fonsp removed the help welcome label on Jun 15, 2022

savq mentioned this issue on Aug 5, 2022: Replace Distributed with Malt soon! #2240 (Merged)

fonsp closed this as completed in #2240 on Sep 18, 2023

jamblejoe mentioned this issue on Nov 1, 2023: Cannot use GaussianMixtures.jl inside Pluto because parallelization uses ALL workers davidavdav/GaussianMixtures.jl#108 (Open)
