# Different results for single vs multi-thread #1766
I can confirm it crashes with the same error on 1 thread but runs on 5 threads. I'm running on an M2 ARM Mac with Julia 1.9.4.
Can you maybe reduce it further?
I'm also wondering why you set the boundary conditions the way you did.
No, without AMR things are fine. Sorry for the confusion with the BCs: originally I ran an example with outflow BCs but then simplified it.
The simulations crash (at least) at the same timestep. If you save a restart file prior to that crashing timestep and then restart the simulation, it finishes for some reason. You can check that for the current simulation, e.g., with the changes

```julia
cfl = 0.42

save_restart = SaveRestartCallback(interval = 1600,
                                   save_final_restart = false)

sol = solve(ode, SSPRK33(; thread = OrdinaryDiffEq.False());
            dt = 1.0,
            ode_default_options()..., callback = callbacks)
```
I found the place where this behaviour originates from, but I am severely confused. In the implementation of `inv_ln_mean` (Trixi.jl/src/auxiliary/math.jl, lines 76 to 84 at 3d899bc) I added a check for negative arguments:

```julia
@inline function inv_ln_mean(x, y)
    epsilon_f2 = 1.0e-4
    f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
    if f2 < epsilon_f2
        return @evalpoly(f2, 2, 2 / 3, 2 / 5, 2 / 7) / (x + y)
    else
        if y / x < 0 # New
            println("y/x = ", y / x)
            println("log should crash, evaluating log:")
            println("log(y/x) = ", log(y / x))
        end
        return log(y / x) / (y - x)
    end
end
```
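For reference, the guarded branch implements the series expansion of the reciprocal logarithmic mean: with f = (y - x)/(x + y), the expression above gives f² = (x - y)²/(x + y)², and log(y/x)/(y - x) = (2 + 2f²/3 + 2f⁴/5 + 2f⁶/7 + …)/(x + y), which is exactly the `@evalpoly` call. A quick sanity check (the values below are chosen purely for illustration):

```julia
# Sanity check: the series branch agrees with the exact formula for x ≈ y.
x, y = 1.0, 1.01                                             # illustrative values only
f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y)   # = ((x - y)/(x + y))^2
series = @evalpoly(f2, 2, 2 / 3, 2 / 5, 2 / 7) / (x + y)
exact = log(y / x) / (y - x)
@assert isapprox(series, exact; rtol = 1.0e-12)
# A DomainError can only come from the log branch, i.e., from y/x < 0,
# which means a nonphysical (negative) input value.
```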
On one thread, this gives the expected outcome:
On, e.g., 16 threads, however, I receive:
and the simulation continues, i.e., it seems like the `log(y / x)` line is skipped. What is going on here? The elixir to reproduce this is given below:

```julia
using OrdinaryDiffEq
using Trixi

###############################################################################
# semidiscretization of the compressible ideal GLM-MHD equations

equations = IdealGlmMhdEquations2D(1.4)

"""
    initial_condition_rotor(x, t, equations::IdealGlmMhdEquations2D)

The classical MHD rotor test case. Here, the setup is taken from
- Dominik Derigs, Gregor J. Gassner, Stefanie Walch & Andrew R. Winters (2018)
  Entropy Stable Finite Volume Approximations for Ideal Magnetohydrodynamics
  [doi: 10.1365/s13291-018-0178-9](https://doi.org/10.1365/s13291-018-0178-9)
"""
function initial_condition_rotor(x, t, equations::IdealGlmMhdEquations2D)
    # setup taken from Derigs et al. DMV article (2018)
    # domain must be [0, 1] x [0, 1], γ = 1.4
    dx = x[1] - 0.5
    dy = x[2] - 0.5
    r = sqrt(dx^2 + dy^2)
    f = (0.115 - r) / 0.015
    if r <= 0.1
        rho = 10.0
        v1 = -20.0 * dy
        v2 = 20.0 * dx
    elseif r >= 0.115
        rho = 1.0
        v1 = 0.0
        v2 = 0.0
    else
        rho = 1.0 + 9.0 * f
        v1 = -20.0 * f * dy
        v2 = 20.0 * f * dx
    end
    v3 = 0.0
    p = 1.0
    B1 = 5.0 / sqrt(4.0 * pi)
    B2 = 0.0
    B3 = 0.0
    psi = 0.0
    return prim2cons(SVector(rho, v1, v2, v3, p, B1, B2, B3, psi), equations)
end
initial_condition = initial_condition_rotor

surface_flux = (flux_lax_friedrichs, flux_nonconservative_powell)
volume_flux = (flux_hindenlang_gassner, flux_nonconservative_powell)
polydeg = 4
basis = LobattoLegendreBasis(polydeg)
indicator_sc = IndicatorHennemannGassner(equations, basis,
                                         alpha_max = 0.5,
                                         alpha_min = 0.001,
                                         alpha_smooth = true,
                                         variable = density_pressure)
volume_integral = VolumeIntegralShockCapturingHG(indicator_sc;
                                                 volume_flux_dg = volume_flux,
                                                 volume_flux_fv = surface_flux)
solver = DGSEM(basis, surface_flux, volume_integral)

coordinates_min = (0.0, 0.0)
coordinates_max = (1.0, 1.0)
mesh = TreeMesh(coordinates_min, coordinates_max,
                initial_refinement_level = 4,
                n_cells_max = 10_000,
                periodicity = true)

semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver)

tspan = (0.0, 0.0891)
ode = semidiscretize(semi, tspan)

###############################################################################
# ODE solvers, callbacks etc.

#=
restart_filename = "out/restart_001639.h5"
mesh = load_mesh(restart_filename)
semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
                                    boundary_conditions = boundary_conditions)
tspan = (load_time(restart_filename), 0.089)
dt = load_dt(restart_filename)
ode = semidiscretize(semi, tspan, restart_filename);
=#

summary_callback = SummaryCallback()

amr_indicator = IndicatorHennemannGassner(semi,
                                          alpha_max = 0.5,
                                          alpha_min = 0.001,
                                          alpha_smooth = false,
                                          variable = density_pressure)

# For density_pressure
amr_controller = ControllerThreeLevel(semi, amr_indicator,
                                      base_level = 3,
                                      med_level = 7, med_threshold = 0.0025,
                                      max_level = 9, max_threshold = 0.25)

amr_callback = AMRCallback(semi, amr_controller,
                           interval = 8,
                           adapt_initial_condition = true,
                           adapt_initial_condition_only_refine = true)

cfl = 0.42 # SSPRK33

save_restart = SaveRestartCallback(interval = 1639,
                                   save_final_restart = false)

stepsize_callback = StepsizeCallback(cfl = cfl)

analysis_interval = 1600
analysis_callback = AnalysisCallback(semi, interval = analysis_interval)

glm_speed_callback = GlmSpeedCallback(glm_scale = 0.5, cfl = cfl)

callbacks = CallbackSet(summary_callback,
                        amr_callback,
                        stepsize_callback,
                        analysis_callback,
                        #save_restart,
                        glm_speed_callback)

###############################################################################
# run the simulation

sol = solve(ode, SSPRK33(),
            dt = 42.0, # dummy value; overwritten by the StepsizeCallback
            save_everystep = false, callback = callbacks);
summary_callback() # print the timer summary
```
It would seem to me that the line is not necessarily skipped (that would be super-duper weird) but rather that the exception that was thrown somehow got lost and did not properly surface (which is just regular weird). Can you try running this again, but with regular Julia threads instead of Polyester.jl threads?
I.e., by replacing the `@batch` call in Trixi's `@threaded` macro with `Threads.@threads`?
Yes. You can achieve this by modifying the `@threaded` macro in Trixi.jl/src/auxiliary/auxiliary.jl (lines 227 to 235 at 3d899bc) instead of the code a few lines below that.
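For illustration, a minimal sketch of this kind of swap; the actual `@threaded` macro in auxiliary.jl differs in detail, so treat this as an assumed shape, not the real implementation:

```julia
# Hypothetical debugging variant of Trixi's @threaded macro: forward the loop
# body to base Julia threads so exceptions surface normally, instead of
# dispatching to Polyester's @batch.
macro threaded(expr)
    return esc(quote
        Threads.@threads $(expr)
    end)
end
```

Usage stays `@threaded for i in eachindex(u) ... end`; only the threading backend changes.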
Ah nice, yeah, now the program fails with:
I'll check if I can create an MWE for the lost exception.
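Such an MWE might look like the sketch below; whether the exception is actually lost depends on the Polyester.jl/ThreadingUtilities.jl versions involved, so this is an assumption about the failure mode discussed above, not a verified reproducer:

```julia
# Hypothetical reproducer: throw from inside a Polyester.@batch loop and from
# a Threads.@threads loop. In the affected package versions, the @batch error
# was reported to vanish silently, while Threads.@threads rethrows it.
using Polyester

function batch_throw()
    @batch for i in 1:100
        i == 42 && error("boom from iteration $i")
    end
    return :finished_without_visible_error
end

function threads_throw()
    Threads.@threads for i in 1:100
        i == 42 && error("boom from iteration $i")
    end
    return :unreachable_if_the_error_surfaces
end
```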
For me, using ThreadingUtilities.jl at version v0.5.1 resolves this issue, in the sense that thread errors are at least printed (although the simulation still continues).
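For anyone reproducing the workaround, pinning to that version can be done with standard Pkg commands (the version spec is taken from the comment above):

```julia
# In the Pkg REPL (press ] in the Julia REPL):
#   pkg> add ThreadingUtilities@0.5.1
#   pkg> pin ThreadingUtilities
# or equivalently from a script:
using Pkg
Pkg.add(name = "ThreadingUtilities", version = "0.5.1")
Pkg.pin("ThreadingUtilities")
```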
I'm closing this issue, as it is caused by an upstream package and we found a workaround.
Consider this example derived from the MHD rotor:
This does not run when I execute it single-threaded:

```
ERROR: DomainError with -7.2495041331637795:
log was called with a negative real argument but will only return a complex result if called with a complex argument. Try log(Complex(x)).
```

And it gives yet another message when run with 2, 3, or 4 threads:

```
ERROR: DomainError with -0.17505094228860651:
```

But it works, e.g., with 5 threads.
I was able to reproduce this on a different machine (Rocky Linux) and on Julia 1.9.3 (besides 1.9.4).
I am not sure what is going on here; most likely, something in the interaction of the CFL, GLM, and AMR callbacks is not working right.
If the AMR callback is omitted, things run fine.
Essentially the same story applies to different integrators, such as our custom ones.
Can someone check whether this is reproducible, ideally on Mac or Windows?
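For reference, when trying to reproduce this, the thread count is set when launching Julia (the elixir file name below is a placeholder):

```julia
# Launch Julia with an explicit thread count from the shell, e.g.
#   julia --threads 1 elixir_mhd_rotor_amr.jl
#   julia --threads 5 elixir_mhd_rotor_amr.jl
# and verify the count inside the session:
@show Threads.nthreads()
```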