[WIP] Offload DG method to GPUs #1485
Draft: jkravs wants to merge 96 commits into trixi-framework:main from jkravs:dg_gpu_port
Commits (all by jkravs):
096b860 Initial; Added KernelAbstraction dependency
61adfd3 Add bad initial gpu offloading for volume integral
3aa6897 Add possiblity to test correctness using CI
92b9e68 Merge branch 'main' into dg_gpu_port
fabd803 Fixed data race
bf5102c RWTH cluster CI added
5f14874 Merge branch 'main' into dg_gpu_port
4a13bde Fixed invalid typo in yml key
22cf6e4 Change Backend with trixi_include
92c9f34 Fixed gitlab ci
35f4926 Removed after scripts
a05ed8a Install OrdinaryDiffEq in CI
a848f72 Fixed ODE version because of dependency breakage
fdf3931 Added KernelAbstractions dependency in CI
54d9fc7 Fixed crash of default cpu computation
48c31b8 Removing workgroupsize autotunes kernels in CUDA.jl
651d26d Initial interface flux calculation offloaded
9b0bc7e Merge branch 'main' into dg_gpu_port
ad2763f Fixed CUDA offloading bugs
79c5286 Fixed scalar indexing issue during interface init
e79d8bb Julia 1.9 test pipeline
9e3db29 Fixed invalid yaml file
eeb04d2 Added Extensions to better deal with missing KA API calls
7e48fb7 Removed runtime downcasting bottleneck
1b3ac6c Merge branch 'main' into dg_gpu_port
b4dd4da Boundary GPU ported
382712f Fixed issues with zero element arrays and CUDA.jl
ee93f61 Prolong2mortars on gpu
baf75e2 Dummy for mortar calc
6b7a738 Surface integral computation offloaded
39cbf80 apply_jacobian offloaded
58d165e calc_sources offloaded
ba1a530 Merge branch 'main' into dg_gpu_port
f27bb8e Added suggestions
b075711 Initalize derivative_dhat on GPU
6df94be surface_flux_values initalized on gpu
8fc47fe boundary_interpolation initalized on gpu
3eb35d0 initalize inverse_jacobian on gpu
5350e78 Fixed init of derivative matrix
7e23af5 Fixed scalar indexing issue
530d10a Merge branch 'main' into dg_gpu_port
c39790b u/du initalized on gpu
7cd2045 Merge branch 'main' into dg_gpu_port
8a1de82 Better compability of StrideArrays and KA
31915a7 Fix scalar indexing on test
9a81463 Removed unesseccary allocations in rhs_gpu
dce23a9 Merge branch 'main' into dg_gpu_port
9a5a8c4 fixed scalar indexing issues on analysis callback
be39bb8 Removed allowed scalar indexing because of performance bottleneck
35843bf Benchmark CI
19ba7c8 Merge branch 'main' into dg_gpu_port
dd65936 Display benchmark results
14db899 P4est advection basic testing
1c17e35 Replace Symbols with Int
651ef17 Merge branch 'main' into dg_gpu_port
ce44e4f Typo
c243726 Change Ints to Index Enum
92736d8 Changed all Indices Symbols to Enum
f877a12 Separate internal loop for prolong2interfaces
184cf37 Inital CPU offloading possible
d428557 Merge branch 'main' into dg_gpu_port
1edb32c Test p4est elixir in CI
93daf64 Separated internal loop in calc_interface_flux
0459fdf calc_interface_flux offloaded to cpu
44343de Merge branch 'main' into dg_gpu_port
a3cd9d4 Separated shared loop of p4est weak form kernel
9d8c156 CPU offloading of p4est weak form kernel
c2eec36 Separated internal loop of surface integral calc
2084f33 Add CPU Offloading of surface integral calc
8fb7a3e Longer CI Timeout
21dfe5a apply_jacobian offloaded for p4est meshes
6eb5933 P4est advection basic on GPU
d81fe1a CI Test of p4est on GPU
85efe0a Init interfaces.u on GPU
d3b5e09 Merge branch 'main' into dg_gpu_port
38ea4bc Init data from interface container
2c6443e Data from element container init on GPU
bd2696d Removed scalar indexing with dg init on GPU
ad1b7b6 Merge branch 'main' into dg_gpu_port
8442991 New elixirs
2186fd4 Remove all symbols
1164e4c Merge branch 'main' into dg_gpu_port
a4fb500 reset du offload
3cfc529 Flux differencing kernel offload
369518f prolong2interfaces offloaded
50a3ba0 calc_interfaces in 3d p4est offloaded
0c1ea4b dummy functions for p4est 3d boundaries/mortars
6e89971 surface_integral p4est 3d offload
4ea72b1 apply jacobian p4est gpu offload
a597b10 calc source terms 3d offload
2ddb8d6 Merge branch 'main' into dg_gpu_port
3022e5c elixir_advection_basic_fd offloaded to gpu
74b61a1 p4est euler taylor green vortex elixir offloaded to gpu
8e69091 Reduced memory usage of calculate dt
a880c16 Merge branch 'main' into dg_gpu_port
c7df644 Merge branch 'main' into dg_gpu_port
@@ -0,0 +1,21 @@
stages:
  - test

.trigger-template:
  stage: test
  trigger:
    include: /.test-ci.yml
    strategy: depend
    forward:
      yaml_variables: true

julia-1.8-test:
  extends: .trigger-template
  allow_failure: true
  variables:
    JULIA_EXEC: "julia-1.8"

julia-1.9-test:
  extends: .trigger-template
  variables:
    JULIA_EXEC: "julia-1.9"
@@ -0,0 +1,84 @@
stages:
  - precompile
  - test
  - benchmark


default:
  tags: [ "downscope" ]

.julia-job:
  variables:
    SLURM_PARAM_ACCOUNT: "-A thes1464"
    SLURM_PARAM_TASKS: "-n 1"
    SLURM_PARAM_CPUS: "--cpus-per-task=24"
    SLURM_PARAM_TIME: "-t 10:00:00"
  before_script:
    - source /work/co693196/MA/julia.sh

precompile-job:
  extends: .julia-job
  stage: precompile
  script:
    - mkdir run
    - cd run
    - $JULIA_EXEC --project="." -e 'using Pkg; Pkg.develop(PackageSpec(path=".."))'

.test-job:
  extends: .julia-job
  stage: test
  before_script:
    - source /work/co693196/MA/julia.sh
    - mkdir run
    - cd run
    - $JULIA_EXEC --project="." -e 'using Pkg; Pkg.add(["OrdinaryDiffEq", "KernelAbstractions"]); Pkg.develop(PackageSpec(path=".."));'

.benchmark-job:
  extends: .julia-job
  stage: benchmark
  before_script:
    - source /work/co693196/MA/julia.sh
    - mkdir run
    - cd run
    - $JULIA_EXEC --project="." -e 'using Pkg; Pkg.add(["OrdinaryDiffEq", "KernelAbstractions", "BenchmarkTools"]); Pkg.develop(PackageSpec(path=".."));'

cpu-test-job:
  extends: .test-job
  script:
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi; trixi_include(pkgdir(Trixi, "test", "test_tree_2d_advection.jl"), offload=false)'
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi; trixi_include(pkgdir(Trixi, "test", "test_p4est_2d.jl"), offload=false)'

cpu-offload-test-job:
  extends: .test-job
  script:
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi; trixi_include(pkgdir(Trixi, "test", "test_tree_2d_advection.jl"), offload=true)'
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi; trixi_include(pkgdir(Trixi, "test", "test_p4est_2d.jl"), offload=true)'

gpu-offload-test-job:
  extends: .test-job
  variables:
    SLURM_PARAM_GPUS: "--gres=gpu:volta:1"
    SLURM_PARAM_PARTITION: "--partition=c18g"
  script:
    - $JULIA_EXEC --project="." --threads=24 -e 'using Pkg; Pkg.add("CUDA")'
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi, CUDA; using CUDA.CUDAKernels; trixi_include(pkgdir(Trixi, "test", "test_tree_2d_advection.jl"), offload=true, backend=CUDABackend())'
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi, CUDA; using CUDA.CUDAKernels; trixi_include(pkgdir(Trixi, "test", "test_p4est_2d.jl"), offload=true, backend=CUDABackend())'

cpu-benchmark-job:
  extends: .benchmark-job
  script:
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi, BenchmarkTools; show(stderr, "text/plain", @benchmark trixi_include($joinpath(examples_dir(), "tree_2d_dgsem", "elixir_advection_basic.jl"), offload=false))' 1> /dev/null

cpu-offload-benchmark-job:
  extends: .benchmark-job
  script:
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi, BenchmarkTools; show(stderr, "text/plain", @benchmark trixi_include($joinpath(examples_dir(), "tree_2d_dgsem", "elixir_advection_basic.jl"), offload=true))' 1> /dev/null

gpu-offload-benchmark-job:
  extends: .benchmark-job
  variables:
    SLURM_PARAM_GPUS: "--gres=gpu:volta:1"
    SLURM_PARAM_PARTITION: "--partition=c18g"
  script:
    - $JULIA_EXEC --project="." --threads=24 -e 'using Pkg; Pkg.add("CUDA")'
    - $JULIA_EXEC --project="." --threads=24 -e 'using Trixi, CUDA, CUDA.CUDAKernels, BenchmarkTools; show(stderr, "text/plain", @benchmark trixi_include($joinpath(examples_dir(), "tree_2d_dgsem", "elixir_advection_basic.jl"), offload=true, backend=CUDABackend()))' 1> /dev/null
@@ -0,0 +1,63 @@
using OrdinaryDiffEq
using Trixi
using KernelAbstractions

###############################################################################
# semidiscretization of the linear advection equation

backend = CPU()

advection_velocity = (0.2, -0.7, 0.5)
equations = LinearScalarAdvectionEquation3D(advection_velocity)

# Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux
solver = DGSEM(polydeg=3, surface_flux=flux_lax_friedrichs,
               volume_integral=VolumeIntegralFluxDifferencing(flux_lax_friedrichs), backend=backend)

coordinates_min = (-1.0, -1.0, -1.0) # minimum coordinates (min(x), min(y), min(z))
coordinates_max = ( 1.0,  1.0,  1.0) # maximum coordinates (max(x), max(y), max(z))

# Create P4estMesh with 8 x 8 x 8 elements (note `refinement_level=1`)
trees_per_dimension = (4, 4, 4)
mesh = P4estMesh(trees_per_dimension, polydeg=1,
                 coordinates_min=coordinates_min, coordinates_max=coordinates_max,
                 initial_refinement_level=1)

# A semidiscretization collects data structures and functions for the spatial discretization
semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition_convergence_test, solver; backend=backend)

###############################################################################
# ODE solvers, callbacks etc.

# Create ODE problem with time span from 0.0 to 1.0
tspan = (0.0, 1.0)
ode = semidiscretize(semi, tspan; offload=false, backend=backend)

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
# and resets the timers
summary_callback = SummaryCallback()

# The AnalysisCallback allows to analyse the solution in regular intervals and prints the results
analysis_callback = AnalysisCallback(semi, interval=100)

# The SaveSolutionCallback allows to save the solution to a file in regular intervals
save_solution = SaveSolutionCallback(interval=100,
                                     solution_variables=cons2prim)

# The StepsizeCallback handles the re-calculation of the maximum Δt after each time step
stepsize_callback = StepsizeCallback(cfl=1.2)

# Create a CallbackSet to collect all callbacks such that they can be passed to the ODE solver
callbacks = CallbackSet(summary_callback, analysis_callback, save_solution, stepsize_callback)


###############################################################################
# run the simulation

# OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks
sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false),
            dt=1.0, # solve needs some value here but it will be overwritten by the stepsize_callback
            save_everystep=false, callback=callbacks);

# Print the timer summary
summary_callback()
examples/p4est_3d_dgsem/elixir_euler_taylor_green_vortex.jl (80 changes: 80 additions & 0 deletions)
@@ -0,0 +1,80 @@
using OrdinaryDiffEq
using Trixi
using KernelAbstractions

###############################################################################
# semidiscretization of the compressible Euler equations

equations = CompressibleEulerEquations3D(1.4)

"""
    initial_condition_taylor_green_vortex(x, t, equations::CompressibleEulerEquations3D)

The classical inviscid Taylor-Green vortex.
"""
function initial_condition_taylor_green_vortex(x, t, equations::CompressibleEulerEquations3D)
  A  = 1.0 # magnitude of speed
  Ms = 0.1 # maximum Mach number

  rho = 1.0
  v1  =  A * sin(x[1]) * cos(x[2]) * cos(x[3])
  v2  = -A * cos(x[1]) * sin(x[2]) * cos(x[3])
  v3  = 0.0
  p   = (A / Ms)^2 * rho / equations.gamma # scaling to get Ms
  p   = p + 1.0/16.0 * A^2 * rho * (cos(2*x[1])*cos(2*x[3]) + 2*cos(2*x[2]) + 2*cos(2*x[1]) + cos(2*x[2])*cos(2*x[3]))

  return prim2cons(SVector(rho, v1, v2, v3, p), equations)
end

backend = CPU()

initial_condition = initial_condition_taylor_green_vortex

solver = DGSEM(polydeg=3, surface_flux=flux_lax_friedrichs,
               volume_integral=VolumeIntegralFluxDifferencing(flux_lax_friedrichs), backend=backend)

coordinates_min = (-1.0, -1.0, -1.0) .* pi
coordinates_max = ( 1.0,  1.0,  1.0) .* pi

# Create P4estMesh with 8 x 8 x 8 elements (note `refinement_level=1`)
trees_per_dimension = (4, 4, 4)
mesh = P4estMesh(trees_per_dimension, polydeg=1,
                 coordinates_min=coordinates_min, coordinates_max=coordinates_max,
                 initial_refinement_level=1)

semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver; backend=backend)


###############################################################################
# ODE solvers, callbacks etc.

tspan = (0.0, 5.0)
ode = semidiscretize(semi, tspan; offload=true, backend=backend)

summary_callback = SummaryCallback()

analysis_interval = 100
analysis_callback = AnalysisCallback(semi, interval=analysis_interval)

alive_callback = AliveCallback(analysis_interval=analysis_interval)

save_solution = SaveSolutionCallback(interval=100,
                                     save_initial_solution=true,
                                     save_final_solution=true,
                                     solution_variables=cons2prim)

stepsize_callback = StepsizeCallback(cfl=0.9)

callbacks = CallbackSet(summary_callback,
                        analysis_callback, alive_callback,
                        save_solution,
                        stepsize_callback)


###############################################################################
# run the simulation

sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false),
            dt=1.0, # solve needs some value here but it will be overwritten by the stepsize_callback
            save_everystep=false, callback=callbacks);
summary_callback() # print the timer summary
@@ -0,0 +1,19 @@
# Package extension for some GPGPU API calls missing in KernelAbstractions

module TrixiAMDGPUExt

using Trixi
if isdefined(Base, :get_extension)
  using AMDGPU: ROCArray
  using AMDGPU.ROCKernels: ROCBackend
else
  # Until Julia v1.9 is the minimum required version for Trixi.jl, we still support Requires.jl
  using ..AMDGPU: ROCArray
  using ..AMDGPU.ROCKernels: ROCBackend
end

function Trixi.get_array_type(backend::ROCBackend)
  return ROCArray
end

end
Review comment: Change to GPUArraysCore.jl (see discussion on Julia Slack)