-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeded RNGs only work reproducibly within a cell #192
Comments
I think this might be an upstream issue but I'm not too familiar with Malt so maybe it's intended and/or could be avoided: julia> using Malt
julia> w = Malt.Worker();
julia> Malt.remote_eval_fetch(w, quote
using Random
Random.seed!(42)
randn()
end)
0.7883556016042917
julia> Malt.remote_eval_fetch(w, quote
randn()
end)
0.8653602653617717
julia> Malt.remote_eval_fetch(w, quote
using Random
Random.seed!(42)
randn()
end)
0.7883556016042917
julia> Malt.remote_eval_fetch(w, quote
randn()
end)
-0.6715764280647957 |
Will investigate, thanks for the thorough issue description @penelopeysm. |
Maybe this has something to do with task-local RNG state? Could be that Malt evals in different tasks per fetch and that this has an effect on the RNG. If they were doing some random number generation themselves that changed the RNG state, then it should still lead to the same numbers if it was the same |
Hmm, the RNG object id seems to be stable, so it seems to be the same RNG? julia> Malt.remote_eval_fetch(w, quote
objectid(Random.default_rng())
end)
0xffffffff0968f0b4
julia> Malt.remote_eval_fetch(w, quote
objectid(Random.default_rng())
end)
0xffffffff0968f0b4
julia> Malt.remote_eval_fetch(w, quote
objectid(Random.default_rng())
end)
0xffffffff0968f0b4
julia> Malt.remote_eval_fetch(w, quote
objectid(Random.default_rng())
end)
0xffffffff0968f0b4 |
I think that's because it's julia> TaskLocalRNG() === TaskLocalRNG()
true The RNG state for that is stored in the task itself. |
cc @fonsp in case you happen to have any insights. |
This shows that we get different task-local RNG state in a new julia> Malt.remote_eval_fetch(w, quote
using Random
Random.seed!(123)
rngfields = filter(fieldnames(Task)) do field
startswith(string(field), "rng")
end
state = NamedTuple(map(rngfields) do field
field => getfield(Base.current_task(), field)
end)
end)
(rngState0 = 0xfefa8d41b8f5dca5, rngState1 = 0xf80cc98e147960c1, rngState2 = 0x20e2ccc17662fc1d, rngState3 = 0xea7a7dcb2e787c01, rngState4 = 0xf4e85a418b9c4f80)
julia> Malt.remote_eval_fetch(w, quote
rngfields = filter(fieldnames(Task)) do field
startswith(string(field), "rng")
end
state = NamedTuple(map(rngfields) do field
field => getfield(Base.current_task(), field)
end)
end)
(rngState0 = 0x4e291591396795b9, rngState1 = 0xc9a680b08b4c9006, rngState2 = 0x3bb5088d1c600f58, rngState3 = 0xe2bb063e7e800a67, rngState4 = 0xe264ff418d928577) Probably because each invocation is a different task? julia> Malt.remote_eval_fetch(w, quote
objectid(Base.current_task())
end)
0xb68e01fb526e2337
julia> Malt.remote_eval_fetch(w, quote
objectid(Base.current_task())
end)
0x49246d797797f2ad |
If this is hard to fix in Malt, I guess we could store the |
Maybe one could take some inspiration from how Test resets the task-local RNG: https://github.com/JuliaLang/julia/blob/1438b1578941d0f1cc8f8c958cf3bd2927fd482c/stdlib/Test/src/Test.jl#L1694-L1720 |
Interesting, it seems this looked quite different on 1.10 where |
@Pangoraw knows best about RNGs and threading in Pluto :) |
|
The more I thought about this issue, the more I got the feeling that this is an upstream issue in Malt. The behaviour seems completely surprising, in particular given that everything else is consistent between subsequent But then I tried the same example with Distributed: julia> using Distributed
julia> addprocs(1)
1-element Vector{Int64}:
2
julia> remotecall_fetch(eval, 2, quote
using Random
Random.seed!(42)
randn()
end)
0.7883556016042917
julia> remotecall_fetch(eval, 2, quote
randn()
end)
0.08641660475950964
julia> remotecall_fetch(eval, 2, quote
using Random
Random.seed!(42)
randn()
end)
0.7883556016042917
julia> remotecall_fetch(eval, 2, quote
randn()
end)
1.559594774368473
julia> remotecall_fetch(eval, 2, quote
objectid(Base.current_task())
end)
0xebc117a52ab2a7be
julia> remotecall_fetch(eval, 2, quote
objectid(Base.current_task())
end)
0xe14f2bb195cecba7 So it seems Malt is consistent with Distributed. I couldn't find any issue related to it in the Distributed repo but in the Julia repo someone had opened an issue regarding this behaviour: JuliaLang/julia#51225 It also links to a Pluto issue: fonsp/Pluto.jl#2290 So my take away is that the behaviour of Malt is consistent with Distributed but that this behaviour - both in Distributed and in Malt - is confusing and surprising for users. Based on the discussion in the Julia issue, I assume Distributed won't be changed - but since there are issues from both Pluto and Quarto users, maybe it could be addressed in Malt? Maybe Malt could expose an API for evaluations within a persistent task? |
This is an approach for Distributed (maybe it can be done more elegantly or efficiently?): julia> using Distributed
julia> addprocs(1)
1-element Vector{Int64}:
2
julia> const in = RemoteChannel(() -> Channel{Expr}(0));
julia> const out = RemoteChannel(() -> Channel{Any}(0));
julia> @everywhere function evalloop(out, in)
while true
expr = take!(in)
put!(out, eval(expr))
end
end
julia> remote_do(evalloop, 2, out, in)
julia> put!(in, quote
using Random
Random.seed!(42)
randn()
end);
julia> take!(out)
0.7883556016042917
julia> put!(in, quote
randn()
end);
julia> take!(out)
-0.8798585959543993
julia> put!(in, quote
using Random
Random.seed!(42)
randn()
end);
julia> take!(out)
0.7883556016042917
julia> put!(in, quote
randn()
end);
julia> take!(out)
-0.8798585959543993 What's the best way to do the same with Malt? I tried the following without success: julia> using Malt
julia> w = Malt.Worker();
julia> cin = Malt.worker_channel(w, :(cin = Channel{Expr}(0)));
julia> cout = Malt.worker_channel(w, :(cout = Channel{Any}(0)));
julia> Malt.remote_eval(w, quote
while true
expr = take!(cin)
put!(cout, eval(expr))
end
end);
julia> put!(cin, quote
using Random
Random.seed!(42)
randn()
end);
ERROR: Remote exception from Malt.Worker on port 9591 with PID 96591:
MethodError: Cannot `convert` an object of type Float64 to an object of type Expr
The function `convert` exists, but no method is defined for this combination of argument types.
Closest candidates are:
Expr(::Any...)
@ Core boot.jl:271
convert(::Type{T}, !Matched::T) where T
@ Base Base.jl:126
Stacktrace:
[1] put!(c::Channel{Expr}, v::Float64)
@ Base ./channels.jl:357
[2] top-level scope
@ REPL[7]:4
[3] eval
@ ./boot.jl:430 [inlined]
[4] (::var"#1#2"{Sockets.TCPSocket, UInt64, Bool, @Kwargs{}, Tuple{Module, Expr}, typeof(Core.eval)})()
@ Main ~/.julia/packages/Malt/Z3YQq/src/worker.jl:120
Stacktrace:
[1] unwrap_worker_result(worker::Malt.Worker, result::Malt.WorkerResult)
@ Malt ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:50
[2] _wait_for_response(worker::Malt.Worker, msg_id::UInt64)
@ Malt ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:325
[3] _send_receive
@ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:336 [inlined]
[4] #remote_call_wait#43
@ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:422 [inlined]
[5] remote_call_wait
@ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:421 [inlined]
[6] remote_eval_wait
@ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:491 [inlined]
[7] put!(rc::Malt.RemoteChannel{Any}, v::Expr)
@ Malt ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:532
[8] top-level scope
@ REPL[7]:1 OK, the Malt example doesn't error anymore when I remove the type parameter of the julia> using Malt
julia> w = Malt.Worker();
julia> cin = Malt.worker_channel(w, :(cin = Channel(0)));
julia> cout = Malt.worker_channel(w, :(cout = Channel(0)));
julia> put!(cin, quote
using Random
Random.seed!(42)
randn()
end);
julia> take!(cout)
0.7883556016042917
julia> put!(cin, quote
randn()
end)
julia> take!(cout)
0.10690497033454253
julia> put!(cin, quote
using Random
Random.seed!(42)
randn()
end);
julia> take!(cout)
0.7883556016042917
julia> put!(cin, quote
randn()
end)
julia> take!(cout)
-0.047221952638015874
julia> put!(cin, quote
using Random
Random.seed!(42)
randn()
end);
julia> take!(cout)
0.7883556016042917
julia> put!(cin, quote
randn()
end)
julia> take!(cout)
0.37395225126097864 Should this discrepancy with Distributed be considered a bug? Or do I use Malt incorrectly? |
Should we go with that? The reproducibility issue came up in our triage call today, and we thought we should inform users in the documentation about this problem if it takes more time to fix it. But if we can fix it right away, this might be the better option? |
Sounds fine to me 👍 |
I'll attempt a PR |
Nice! Could you also make a PR to Malt to add this consistency as a test? |
@fonsp Wait I thought I'd implement this logic in QuartoNotebookRunner, did you think it was a good fit for Malt itself? |
No my suggestion is to add a test to Malt for this behaviour, to guarantee that this will continue working in future Malt versions, since you depend on it. |
To my understanding I'm not depending on anything new in Malt with this workaround, I'm depending on Julia's channel and task behavior. If you change your implementation some day to use a stable task for all evals then the workaround might not be necessary anymore but it shouldn't break. |
Description
Calling
Random.seed!
at the top of a notebook doesn't ensure reproducibility of random number generation in other cells; it only works within the same cell thatseed!
is called.I'm not sure if this is intentional, but it's slightly counterintuitive behaviour so if it's intentional it might be good to document it somewhere.
A related issue (and perhaps an example of how this behaviour is confusing!) is that the Pumas tutorial on reproducibility says that the second batch of samples in
Random.seed!(123); rand(3); rand(3)
should always result in "the same behavior [...] every time we run the code". This is true if done in a REPL, but the Quarto source has the secondrand(3)
in a different cell and consequently the secondrand(3)
will not always return the same numbers.MWE
The following notebook works to demonstrate the behaviour. The first call to randn() should always give the same result no matter how many times the notebook is rendered. The second call however will not.
My
versioninfo
from the notebook aboveThe text was updated successfully, but these errors were encountered: