-
Notifications
You must be signed in to change notification settings - Fork 9
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
769c9fd
commit 635a50f
Showing
4 changed files
with
174 additions
and
25 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[deps] | ||
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" | ||
OhMyThreads = "67456a42-1dca-4109-a031-0a68de7e3ad5" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,49 +1,70 @@ | ||
# # Trapezoidal Integration | ||
# | ||
|
||
# Function to be integrated (from 0 to 1). The analytic result is π. | ||
# In this example, we want to parallelize the computation of a simple numerical integral | ||
# via the trapezoidal rule. The latter is given by | ||
# | ||
# $\int_{a}^{b}f(x)\,dx \approx h \sum_{i=1}^{N}\frac{f(x_{i-1})+f(x_{i})}{2}.$ | ||
# | ||
# The function to be integrated is the following. | ||
|
||
f(x) = 4 * √(1 - x^2) | ||
|
||
# Integration routine | ||
# The analytic result of the definite integral (from 0 to 1) is known to be $\pi$. | ||
# | ||
# ## Sequential | ||
# | ||
# Naturally, we implement the trapezoidal rule as a straightforward, sequential `for` loop. | ||
|
||
"Evaluate definite integral (from `a` to `b`) by using the trapezoidal rule." | ||
function trapezoidal(a, b, n, h) | ||
function trapezoidal(a, b, n; h = (b - a) / n) | ||
y = (f(a) + f(b)) / 2.0 | ||
for i in 1:n-1 | ||
for i in 1:(n - 1) | ||
x = a + i * h | ||
y = y + f(x) | ||
end | ||
return y * h | ||
end | ||
|
||
# Let's compute the integral of `f` above and see if we get the expected result. | ||
# For simplicity, we choose `N`, the number of panels used to discretize the integration | ||
# interval, as a multiple of the number of available Julia threads. | ||
|
||
using Base.Threads: nthreads | ||
@show nthreads() | ||
N = nthreads() * 1_000_000 | ||
|
||
N = 10_000 | ||
# Calling `trapezoidal` we do indeed find the (approximate) value of $\pi$. | ||
|
||
function integrate_parallel(N) | ||
# compute local integration interval etc. | ||
h = 1.0 / N | ||
a_loc = [i * n_loc * h for i in 0:nthreads()-1] | ||
b_loc = a_loc + n_loc * h | ||
trapezoidal(0, 1, N) ≈ π | ||
|
||
result = tmapreduce(+, ...) do | ||
trapezoidal(a_loc, b_loc, n, h) | ||
end | ||
# ## Parallel | ||
# | ||
# Our strategy is the following: Divide the integration interval among the available | ||
# Julia threads. On each thread, use the sequential trapezoidal rule to compute the partial | ||
# integral. | ||
# It is straightforward to implement this strategy with `tmapreduce`. The `map` part | ||
# is, essentially, the application of `trapezoidal` and the reduction operator is chosen to | ||
# be `+` to sum up the local integrals. | ||
|
||
res_loc = trapezoidal(a_loc, b_loc, n_loc, h) | ||
using OhMyThreads | ||
|
||
println("π (numerical integration) ≈ ", res) | ||
function trapezoidal_parallel(a, b, N) | ||
n = N ÷ nthreads() | ||
h = (b - a) / N | ||
return tmapreduce(+, 1:nthreads()) do i | ||
local α = a + (i - 1) * n * h | ||
local β = α + n * h | ||
trapezoidal(α, β, n; h) | ||
end | ||
end | ||
|
||
# First, we check the correctness of our parallel implementation. | ||
trapezoidal_parallel(0, 1, N) ≈ π | ||
|
||
# perform local integration | ||
res_loc = trapezoidal(a_loc, b_loc, n_loc, h) | ||
# Then, we benchmark and compare the performance of the sequential and parallel versions. | ||
|
||
# parallel reduction | ||
res = MPI.Reduce(res_loc, +, comm) | ||
using BenchmarkTools | ||
@btime trapezoidal(0, 1, $N); | ||
@btime trapezoidal_parallel(0, 1, $N); | ||
|
||
# print result | ||
if 0 == rank | ||
@printf("π (numerical integration) ≈ %20.16f\\n", res) | ||
end | ||
# Because the problem is trivially parallel - all threads to the same thing and don't need | ||
# to communicate - we expect an ideal speedup of (close to) the number of available threads. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
```@meta | ||
EditURL = "integration.jl" | ||
``` | ||
|
||
# Trapezoidal Integration | ||
|
||
In this example, we want to parallelize the computation of a simple numerical integral | ||
via the trapezoidal rule. The latter is given by | ||
|
||
$\int_{a}^{b}f(x)\,dx \approx h \sum_{i=1}^{N}\frac{f(x_{i-1})+f(x_{i})}{2}.$ | ||
|
||
The function to be integrated is the following. | ||
|
||
````julia | ||
f(x) = 4 * √(1 - x^2) | ||
```` | ||
|
||
```` | ||
f (generic function with 1 method) | ||
```` | ||
|
||
The analytic result of the definite integral (from 0 to 1) is known to be $\pi$. | ||
|
||
## Sequential | ||
|
||
Naturally, we implement the trapezoidal rule as a straightforward, sequential `for` loop. | ||
|
||
````julia | ||
function trapezoidal(a, b, n; h = (b - a) / n) | ||
y = (f(a) + f(b)) / 2.0 | ||
for i in 1:(n - 1) | ||
x = a + i * h | ||
y = y + f(x) | ||
end | ||
return y * h | ||
end | ||
```` | ||
|
||
```` | ||
trapezoidal (generic function with 1 method) | ||
```` | ||
|
||
Let's compute the integral of `f` above and see if we get the expected result. | ||
For simplicity, we choose `N`, the number of panels used to discretize the integration | ||
interval, as a multiple of the number of available Julia threads. | ||
|
||
````julia | ||
using Base.Threads: nthreads | ||
@show nthreads() | ||
N = nthreads() * 1_000_000 | ||
```` | ||
|
||
```` | ||
5000000 | ||
```` | ||
|
||
Calling `trapezoidal` we do indeed find the (approximate) value of $\pi$. | ||
|
||
````julia | ||
trapezoidal(0, 1, N) ≈ π | ||
```` | ||
|
||
```` | ||
true | ||
```` | ||
|
||
## Parallel | ||
|
||
Our strategy is the following: Divide the integration interval among the available | ||
Julia threads. On each thread, use the sequential trapezoidal rule to compute the partial | ||
integral. | ||
It is straightforward to implement this strategy with `tmapreduce`. The `map` part | ||
is, essentially, the application of `trapezoidal` and the reduction operator is chosen to | ||
be `+` to sum up the local integrals. | ||
|
||
````julia | ||
using OhMyThreads | ||
|
||
function trapezoidal_parallel(a, b, N) | ||
n = N ÷ nthreads() | ||
h = (b - a) / N | ||
return tmapreduce(+, 1:nthreads()) do i | ||
local α = a + (i - 1) * n * h | ||
local β = α + n * h | ||
trapezoidal(α, β, n; h) | ||
end | ||
end | ||
```` | ||
|
||
```` | ||
trapezoidal_parallel (generic function with 1 method) | ||
```` | ||
|
||
First, we check the correctness of our parallel implementation. | ||
|
||
````julia | ||
trapezoidal_parallel(0, 1, N) ≈ π | ||
```` | ||
|
||
```` | ||
true | ||
```` | ||
|
||
Then, we benchmark and compare the performance of the sequential and parallel versions. | ||
|
||
````julia | ||
using BenchmarkTools | ||
@btime trapezoidal(0, 1, $N); | ||
@btime trapezoidal_parallel(0, 1, $N); | ||
```` | ||
|
||
```` | ||
13.871 ms (0 allocations: 0 bytes) | ||
2.781 ms (38 allocations: 3.19 KiB) | ||
```` | ||
|
||
Because the problem is trivially parallel - all threads to the same thing and don't need | ||
to communicate - we expect an ideal speedup of (close to) the number of available threads. | ||
|
||
--- | ||
|
||
*This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).* | ||
|