diff --git a/previews/PR119/.documenter-siteinfo.json b/previews/PR119/.documenter-siteinfo.json
index 44069f8..69b2254 100644
--- a/previews/PR119/.documenter-siteinfo.json
+++ b/previews/PR119/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-26T18:11:48","documenter_version":"1.7.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-26T18:21:19","documenter_version":"1.7.0"}}
\ No newline at end of file
diff --git a/previews/PR119/basics/index.html b/previews/PR119/basics/index.html
index ec12d86..b1259c8 100644
--- a/previews/PR119/basics/index.html
+++ b/previews/PR119/basics/index.html
@@ -1,2 +1,2 @@
-Basics · OhMyThreads.jl

Basics

This section is still in preparation. For now, you might want to take a look at the translation guide and the examples.

+Basics · OhMyThreads.jl

Basics

This section is still in preparation. For now, you might want to take a look at the translation guide and the examples.

diff --git a/previews/PR119/index.html b/previews/PR119/index.html
index bef2798..a045a94 100644
--- a/previews/PR119/index.html
+++ b/previews/PR119/index.html
@@ -32,4 +32,4 @@
 @btime mc_parallel($N) # using all threads
 @btime mc_parallel_macro($N) # using all threads

With 5 threads, timings might be something like this:

417.282 ms (14 allocations: 912 bytes)
 83.578 ms (38 allocations: 3.08 KiB)
-83.573 ms (38 allocations: 3.08 KiB)

(Check out the full Parallel Monte Carlo example if you like.)

No Transducers

Unlike most JuliaFolds2 packages, OhMyThreads.jl is not built off of Transducers.jl, nor is it a building block for Transducers.jl. Rather, it is meant to be a simpler, more maintainable, and more accessible alternative to high-level packages like, e.g., ThreadsX.jl or Folds.jl.

Acknowledgements

The idea for this package came from Carsten Bauer and Mason Protter. Check out the list of contributors for more information.

+83.573 ms (38 allocations: 3.08 KiB)

(Check out the full Parallel Monte Carlo example if you like.)

No Transducers

Unlike most JuliaFolds2 packages, OhMyThreads.jl is not built off of Transducers.jl, nor is it a building block for Transducers.jl. Rather, it is meant to be a simpler, more maintainable, and more accessible alternative to high-level packages like, e.g., ThreadsX.jl or Folds.jl.

Acknowledgements

The idea for this package came from Carsten Bauer and Mason Protter. Check out the list of contributors for more information.

diff --git a/previews/PR119/literate/falsesharing/falsesharing/index.html b/previews/PR119/literate/falsesharing/falsesharing/index.html
index 772d21f..c585976 100644
--- a/previews/PR119/literate/falsesharing/falsesharing/index.html
+++ b/previews/PR119/literate/falsesharing/falsesharing/index.html
@@ -47,4 +47,4 @@
 @test sum(data) ≈ treduce(+, data; ntasks = nthreads())
 @btime treduce($+, $data; ntasks = $nthreads());
  899.097 μs (68 allocations: 5.92 KiB)
-

This page was generated using Literate.jl.

+

This page was generated using Literate.jl.

diff --git a/previews/PR119/literate/integration/integration/index.html b/previews/PR119/literate/integration/integration/index.html
index 3ed2b60..a48a215 100644
--- a/previews/PR119/literate/integration/integration/index.html
+++ b/previews/PR119/literate/integration/integration/index.html
@@ -35,4 +35,4 @@
 @btime trapezoidal(0, 1, $N);
 @btime trapezoidal_parallel(0, 1, $N);
  24.348 ms (0 allocations: 0 bytes)
   2.457 ms (69 allocations: 6.05 KiB)
-

Because the problem is trivially parallel - all threads do the same thing and don't need to communicate - we expect an ideal speedup of (close to) the number of available threads.

nthreads()
10

This page was generated using Literate.jl.

+

Because the problem is trivially parallel - all threads do the same thing and don't need to communicate - we expect an ideal speedup of (close to) the number of available threads.

nthreads()
10

This page was generated using Literate.jl.

diff --git a/previews/PR119/literate/juliaset/juliaset/index.html b/previews/PR119/literate/juliaset/juliaset/index.html
index 0ceab00..067ac20 100644
--- a/previews/PR119/literate/juliaset/juliaset/index.html
+++ b/previews/PR119/literate/juliaset/juliaset/index.html
@@ -71,4 +71,4 @@

Note that while this turns out to be a bit faster, it comes at the expense of many more allocations.

To quantify the impact of load balancing we can opt out of dynamic scheduling and use the StaticScheduler instead. The latter doesn't provide any form of load balancing.

using OhMyThreads: StaticScheduler
 
 @btime compute_juliaset_parallel!($img; scheduler=:static) samples=10 evals=3;
  30.097 ms (73 allocations: 6.23 KiB)
-

This page was generated using Literate.jl.

+

This page was generated using Literate.jl.

diff --git a/previews/PR119/literate/mc/mc/index.html b/previews/PR119/literate/mc/mc/index.html
index 8e63793..c4676e3 100644
--- a/previews/PR119/literate/mc/mc/index.html
+++ b/previews/PR119/literate/mc/mc/index.html
@@ -67,4 +67,4 @@
 @btime mc($(length(idcs))) samples=10 evals=3;
  41.750 ms (0 allocations: 0 bytes)
   30.148 ms (0 allocations: 0 bytes)
-

This page was generated using Literate.jl.

+

This page was generated using Literate.jl.

diff --git a/previews/PR119/literate/tls/tls/index.html b/previews/PR119/literate/tls/tls/index.html
index 0796822..07fc9a4 100644
--- a/previews/PR119/literate/tls/tls/index.html
+++ b/previews/PR119/literate/tls/tls/index.html
@@ -194,4 +194,4 @@
 sort(res) ≈ sort(res_bumper)
 @btime matmulsums_bumper($As, $Bs);
  7.814 ms (134 allocations: 27.92 KiB)
-

Note that the benchmark is lying here about the total memory allocation, because it doesn't show the allocation of the task-local bump allocators themselves (the reason is that SlabBuffer uses malloc directly).


This page was generated using Literate.jl.

+

Note that the benchmark is lying here about the total memory allocation, because it doesn't show the allocation of the task-local bump allocators themselves (the reason is that SlabBuffer uses malloc directly).


This page was generated using Literate.jl.

diff --git a/previews/PR119/refs/api/index.html b/previews/PR119/refs/api/index.html
index 0e8162a..fb15a87 100644
--- a/previews/PR119/refs/api/index.html
+++ b/previews/PR119/refs/api/index.html
@@ -19,7 +19,7 @@
     chunksize=10
 end
 println("i=", i, " → ", threadid())
-end
source
OhMyThreads.@set (Macro)
@set name = value

This can be used inside a @tasks for ... end block to specify settings for the parallel execution of the loop.

Multiple settings are supported, either as separate @set statements or via @set begin ... end.

Settings

  • reducer (e.g. reducer=+): Indicates that a reduction should be performed with the provided binary function. See tmapreduce for more information.
  • collect (e.g. collect=true): Indicates that results should be collected (similar to map).

All other settings will be passed on to the underlying parallel functions (e.g. tmapreduce) as keyword arguments. Hence, you may provide whatever these functions accept as keyword arguments. Among others, this includes

  • scheduler (e.g. scheduler=:static): Can be either a Scheduler or a Symbol (e.g. :dynamic, :static, :serial, or :greedy).
  • init (e.g. init=0.0): Initial value to be used in a reduction (requires reducer=...).

Settings like ntasks, chunksize, and split can be used to tune the scheduling policy (if the selected scheduler supports it).

source
OhMyThreads.@local (Macro)
@local name = value
+end
source
OhMyThreads.@set (Macro)
@set name = value

This can be used inside a @tasks for ... end block to specify settings for the parallel execution of the loop.

Multiple settings are supported, either as separate @set statements or via @set begin ... end.

Settings

  • reducer (e.g. reducer=+): Indicates that a reduction should be performed with the provided binary function. See tmapreduce for more information.
  • collect (e.g. collect=true): Indicates that results should be collected (similar to map).

All other settings will be passed on to the underlying parallel functions (e.g. tmapreduce) as keyword arguments. Hence, you may provide whatever these functions accept as keyword arguments. Among others, this includes

  • scheduler (e.g. scheduler=:static): Can be either a Scheduler or a Symbol (e.g. :dynamic, :static, :serial, or :greedy).
  • init (e.g. init=0.0): Initial value to be used in a reduction (requires reducer=...).

Settings like ntasks, chunksize, and split can be used to tune the scheduling policy (if the selected scheduler supports it).
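As an illustration of how these settings compose, here is a minimal sketch (not from the docstring itself) of a parallel reduction configured entirely inside the @tasks block, using the @set begin ... end form described above:

```julia
using OhMyThreads: @tasks

# Parallel sum of squares: `reducer` turns the loop into a reduction,
# `ntasks` is forwarded to the underlying scheduler.
result = @tasks for i in 1:100
    @set begin
        reducer = +
        ntasks = 4
    end
    i^2
end

result == sum(abs2, 1:100)  # true (integer + is exact and associative)
```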

source
OhMyThreads.@local (Macro)
@local name = value
 
 @local name::T = value

Can be used inside a @tasks for ... end block to specify task-local values (TLV) via explicitly typed assignments. These values will be allocated once per task (rather than once per iteration) and can be re-used between the different iterations handled by the same task.

There can only be a single @local block in a @tasks for ... end block. To specify multiple TLVs, use @local begin ... end. Compared to regular assignments, there are some limitations though, e.g. TLVs can't reference each other.

Examples

using OhMyThreads: @tasks
 using OhMyThreads.Tools: taskid
@@ -42,7 +42,7 @@
 end

Task local variables created by @local are by default constrained to their inferred type, but if you need to, you can specify a different type during declaration:

@tasks for i in 1:10
     @local x::Vector{Float64} = some_hard_to_infer_setup_function()
     # ...
-end
source
OhMyThreads.@only_one (Macro)
@only_one begin ... end

This can be used inside a @tasks for ... end block to mark a region of code to be executed by only one of the parallel tasks (all other tasks skip over this region).

Example

using OhMyThreads: @tasks
+end
source
OhMyThreads.@only_one (Macro)
@only_one begin ... end

This can be used inside a @tasks for ... end block to mark a region of code to be executed by only one of the parallel tasks (all other tasks skip over this region).

Example

using OhMyThreads: @tasks
 
 @tasks for i in 1:10
     @set ntasks = 10
@@ -53,7 +53,7 @@
         sleep(1)
     end
     println(i, ": after")
-end
source
OhMyThreads.@one_by_one (Macro)
@one_by_one begin ... end

This can be used inside a @tasks for ... end block to mark a region of code to be executed by one parallel task at a time (i.e. exclusive access). The order may be arbitrary and non-deterministic.

Example

using OhMyThreads: @tasks
+end
source
OhMyThreads.@one_by_one (Macro)
@one_by_one begin ... end

This can be used inside a @tasks for ... end block to mark a region of code to be executed by one parallel task at a time (i.e. exclusive access). The order may be arbitrary and non-deterministic.

Example

using OhMyThreads: @tasks
 
 @tasks for i in 1:10
     @set ntasks = 10
@@ -64,21 +64,21 @@
         sleep(0.5)
     end
     println(i, ": after")
-end
source

Functions

OhMyThreads.tmapreduce (Function)
tmapreduce(f, op, A::AbstractArray...;
+end
source

Functions

OhMyThreads.tmapreduce (Function)
tmapreduce(f, op, A::AbstractArray...;
            [scheduler::Union{Scheduler, Symbol} = :dynamic],
            [outputtype::Type = Any],
            [init])

A multithreaded function like Base.mapreduce. Perform a reduction over A, applying a single-argument function f to each element, and then combining them with the two-argument function op.

Note that op must be an associative function, in the sense that op(a, op(b, c)) ≈ op(op(a, b), c). If op is not (approximately) associative, you will get undefined results.

Example:

using OhMyThreads: tmapreduce
 
-tmapreduce(√, +, [1, 2, 3, 4, 5])

is the parallelized version of sum(√, [1, 2, 3, 4, 5]) in the form

(√1 + √2) + (√3 + √4) + √5

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.
  • outputtype::Type (default Any): will work as the asserted output type of parallel calculations. We use StableTasks.jl to make setting this option unnecessary, but if you experience problems with type stability, you may be able to recover it with this keyword argument.
  • init: initial value of the reduction. Will be forwarded to mapreduce for the task-local sequential parts of the calculation.

In addition, tmapreduce accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tmapreduce(√, +, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.treduce (Function)
treduce(op, A::AbstractArray...;
+tmapreduce(√, +, [1, 2, 3, 4, 5])

is the parallelized version of sum(√, [1, 2, 3, 4, 5]) in the form

(√1 + √2) + (√3 + √4) + √5

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.
  • outputtype::Type (default Any): will work as the asserted output type of parallel calculations. We use StableTasks.jl to make setting this option unnecessary, but if you experience problems with type stability, you may be able to recover it with this keyword argument.
  • init: initial value of the reduction. Will be forwarded to mapreduce for the task-local sequential parts of the calculation.

In addition, tmapreduce accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tmapreduce(√, +, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.treduce (Function)
treduce(op, A::AbstractArray...;
         [scheduler::Union{Scheduler, Symbol} = :dynamic],
         [outputtype::Type = Any],
         [init])

A multithreaded function like Base.reduce. Perform a reduction over A using the two-argument function op.

Note that op must be an associative function, in the sense that op(a, op(b, c)) ≈ op(op(a, b), c). If op is not (approximately) associative, you will get undefined results.

Example:

using OhMyThreads: treduce
 
-treduce(+, [1, 2, 3, 4, 5])

is the parallelized version of sum([1, 2, 3, 4, 5]) in the form

(1 + 2) + (3 + 4) + 5

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.
  • outputtype::Type (default Any): will work as the asserted output type of parallel calculations. We use StableTasks.jl to make setting this option unnecessary, but if you experience problems with type stability, you may be able to recover it with this keyword argument.
  • init: initial value of the reduction. Will be forwarded to mapreduce for the task-local sequential parts of the calculation.

In addition, treduce accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

treduce(+, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tmap (Function)
tmap(f, [OutputElementType], A::AbstractArray...;
+treduce(+, [1, 2, 3, 4, 5])

is the parallelized version of sum([1, 2, 3, 4, 5]) in the form

(1 + 2) + (3 + 4) + 5

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.
  • outputtype::Type (default Any): will work as the asserted output type of parallel calculations. We use StableTasks.jl to make setting this option unnecessary, but if you experience problems with type stability, you may be able to recover it with this keyword argument.
  • init: initial value of the reduction. Will be forwarded to mapreduce for the task-local sequential parts of the calculation.

In addition, treduce accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

treduce(+, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tmap (Function)
tmap(f, [OutputElementType], A::AbstractArray...;
      [scheduler::Union{Scheduler, Symbol} = :dynamic])

A multithreaded function like Base.map. Creates a new container similar to A and fills it in parallel such that the ith element is equal to f(A[i]).

The optional argument OutputElementType will select a specific element type for the returned container, and will generally incur fewer allocations than the version where OutputElementType is not specified.

Example:

using OhMyThreads: tmap
 
-tmap(sin, 1:10)

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tmap accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tmap(sin, 1:10; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tmap! (Function)
tmap!(f, out, A::AbstractArray...;
-      [scheduler::Union{Scheduler, Symbol} = :dynamic])

A multithreaded function like Base.map!. In parallel on multiple tasks, this function assigns out[i] = f(A[i]) for each index i of A and out.

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tmap! accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tforeach (Function)
tforeach(f, A::AbstractArray...;
+tmap(sin, 1:10)

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tmap accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tmap(sin, 1:10; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tmap! (Function)
tmap!(f, out, A::AbstractArray...;
+      [scheduler::Union{Scheduler, Symbol} = :dynamic])

A multithreaded function like Base.map!. In parallel on multiple tasks, this function assigns out[i] = f(A[i]) for each index i of A and out.

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tmap! accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).
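Unlike its siblings, this docstring carries no example; a minimal usage sketch:

```julia
using OhMyThreads: tmap!

A = collect(1:10)
out = zeros(Int, 10)
tmap!(x -> x^2, out, A)  # in parallel: out[i] = A[i]^2

out == (1:10) .^ 2  # true
```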

source
OhMyThreads.tforeach (Function)
tforeach(f, A::AbstractArray...;
          [scheduler::Union{Scheduler, Symbol} = :dynamic]) :: Nothing

A multithreaded function like Base.foreach. Apply f to each element of A on multiple parallel tasks, and return nothing. I.e. it is the parallel equivalent of

for x in A
     f(x)
 end

Example:

using OhMyThreads: tforeach
@@ -87,15 +87,15 @@
     println(i^2)
 end

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tforeach accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tforeach(1:10; chunksize=2, scheduler=:static) do i
     println(i^2)
-end

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tcollect (Function)
tcollect([OutputElementType], gen::Union{AbstractArray, Generator{<:AbstractArray}};
+end

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.tcollect (Function)
tcollect([OutputElementType], gen::Union{AbstractArray, Generator{<:AbstractArray}};
          [scheduler::Union{Scheduler, Symbol} = :dynamic])

A multithreaded function like Base.collect. Essentially just calls tmap on the generator function and inputs.

The optional argument OutputElementType will select a specific element type for the returned container, and will generally incur fewer allocations than the version where OutputElementType is not specified.

Example:

using OhMyThreads: tcollect
 
-tcollect(sin(i) for i in 1:10)

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tcollect accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tcollect(sin(i) for i in 1:10; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.treducemap (Function)
treducemap(op, f, A::AbstractArray...;
+tcollect(sin(i) for i in 1:10)

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.

In addition, tcollect accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

tcollect(sin(i) for i in 1:10; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
OhMyThreads.treducemap (Function)
treducemap(op, f, A::AbstractArray...;
            [scheduler::Union{Scheduler, Symbol} = :dynamic],
            [outputtype::Type = Any],
            [init])

Like tmapreduce except the order of the f and op arguments is switched. This is sometimes convenient with do-block notation. Perform a reduction over A, applying a single-argument function f to each element, and then combining them with the two-argument function op.

Note that op must be an associative function, in the sense that op(a, op(b, c)) ≈ op(op(a, b), c). If op is not (approximately) associative, you will get undefined results.

Example:

using OhMyThreads: treducemap
 
-treducemap(+, √, [1, 2, 3, 4, 5])

is the parallelized version of sum(√, [1, 2, 3, 4, 5]) in the form

(√1 + √2) + (√3 + √4) + √5

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.
  • outputtype::Type (default Any): will work as the asserted output type of parallel calculations. We use StableTasks.jl to make setting this option unnecessary, but if you experience problems with type stability, you may be able to recover it with this keyword argument.
  • init: initial value of the reduction. Will be forwarded to mapreduce for the task-local sequential parts of the calculation.

In addition, treducemap accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

treducemap(+, √, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source

Schedulers

OhMyThreads.Schedulers.Scheduler (Type)

Supertype for all available schedulers:

source
OhMyThreads.Schedulers.DynamicScheduler (Type)
DynamicScheduler (aka :dynamic)

The default dynamic scheduler. Divides the given collection into chunks and then spawns a task per chunk to perform the requested operation in parallel. The tasks are assigned to threads by Julia's dynamic scheduler and are non-sticky, that is, they can migrate between threads.

Generally preferred since it is flexible, can provide load balancing, and is composable with other multithreaded code.

Keyword arguments:

  • nchunks::Integer or ntasks::Integer (default nthreads(threadpool)):
    • Determines the number of chunks (and thus also the number of parallel tasks).
    • Increasing nchunks can help with load balancing, but at the expense of creating more overhead. For nchunks <= nthreads() there are not enough chunks for any load balancing.
    • Setting nchunks < nthreads() is an effective way to use only a subset of the available threads.
  • chunksize::Integer (default not set)
    • Specifies the desired chunk size (instead of the number of chunks).
    • The options chunksize and nchunks/ntasks are mutually exclusive (only one may be a positive integer).
  • split::Union{Symbol, OhMyThreads.Split} (default OhMyThreads.Consecutive()):
    • Determines how the collection is divided into chunks (if chunking=true). By default, each chunk consists of contiguous elements and order is maintained.
    • See ChunkSplitters.jl for more details and available options. We also allow users to pass :consecutive in place of Consecutive(), and :roundrobin in place of RoundRobin()
    • Beware that for split=OhMyThreads.RoundRobin() the order of elements isn't maintained and a reducer function must not only be associative but also commutative!
  • chunking::Bool (default true):
    • Controls whether input elements are grouped into chunks (true) or not (false).
    • For chunking=false, the arguments nchunks/ntasks, chunksize, and split are ignored and input elements are regarded as "chunks" as is. Hence, there will be one parallel task spawned per input element. Note that, depending on the input, this might spawn many(!) tasks and can be costly!
  • threadpool::Symbol (default :default):
    • Possible options are :default and :interactive.
    • The high-priority pool :interactive should be used very carefully since tasks on this threadpool should not be allowed to run for a long time without yielding as it can interfere with heartbeat processes.
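A sketch of configuring these options on an explicit scheduler object and passing it to a parallel function. It assumes DynamicScheduler is importable directly from OhMyThreads, as StaticScheduler is elsewhere in these docs; per the API docs above, this is equivalent to passing the same options as keyword arguments together with scheduler=:dynamic:

```julia
using OhMyThreads: tmapreduce, DynamicScheduler
using Base.Threads: nthreads

# More chunks than threads gives the dynamic scheduler room to load-balance.
sched = DynamicScheduler(; nchunks = 2 * nthreads())
tmapreduce(sin, +, 1:10_000; scheduler = sched) ≈ sum(sin, 1:10_000)  # true
```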
source
OhMyThreads.Schedulers.StaticScheduler (Type)
StaticScheduler (aka :static)

A static low-overhead scheduler. Divides the given collection into chunks and then spawns a task per chunk to perform the requested operation in parallel. The tasks are statically assigned to threads up front and are made sticky, that is, they are guaranteed to stay on the assigned threads (no task migration).

Can sometimes be more performant than DynamicScheduler when the workload is (close to) uniform and, because of the lower overhead, for small workloads. It doesn't compose well with other multithreaded code, though.

Keyword arguments:

  • nchunks::Integer or ntasks::Integer (default nthreads()):
    • Determines the number of chunks (and thus also the number of parallel tasks).
    • Setting nchunks < nthreads() is an effective way to use only a subset of the available threads.
    • For nchunks > nthreads() the chunks will be distributed to the available threads in a round-robin fashion.
  • chunksize::Integer (default not set)
    • Specifies the desired chunk size (instead of the number of chunks).
    • The options chunksize and nchunks/ntasks are mutually exclusive (only one may be non-zero).
  • chunking::Bool (default true):
    • Controls whether input elements are grouped into chunks (true) or not (false).
    • For chunking=false, the arguments nchunks/ntasks, chunksize, and split are ignored and input elements are regarded as "chunks" as is. Hence, there will be one parallel task spawned per input element. Note that, depending on the input, this might spawn many(!) tasks and can be costly!
  • split::Union{Symbol, OhMyThreads.Split} (default OhMyThreads.Consecutive()):
    • Determines how the collection is divided into chunks. By default, each chunk consists of contiguous elements and order is maintained.
    • See ChunkSplitters.jl for more details and available options. We also allow users to pass :consecutive in place of Consecutive(), and :roundrobin in place of RoundRobin()
    • Beware that for split=OhMyThreads.RoundRobin() the order of elements isn't maintained and a reducer function must not only be associative but also commutative!
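A short sketch of the static scheduler on a uniform, cheap workload, importing StaticScheduler as the Julia set example above does:

```julia
using OhMyThreads: tmap, StaticScheduler

# Uniform, cheap workload: sticky, statically assigned tasks keep overhead low.
tmap(x -> x^2, 1:8; scheduler = StaticScheduler())  # == [1, 4, 9, ..., 64]
```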
source
OhMyThreads.Schedulers.GreedyScheduler (Type)
GreedyScheduler (aka :greedy)

A greedy dynamic scheduler. The elements of the collection are first put into a Channel and then dynamic, non-sticky tasks are spawned to process the channel content in parallel.

Note that elements are processed in a non-deterministic order, and thus a potential reducing function must be commutative in addition to being associative, or you could get incorrect results!

Can be a good choice for load-balancing slower, uneven computations, but does carry some additional overhead.

Keyword arguments:

  • ntasks::Int (default nthreads()):
    • Determines the number of parallel tasks to be spawned.
    • Setting ntasks < nthreads() is an effective way to use only a subset of the available threads.
  • chunking::Bool (default false):
    • Controls whether input elements are grouped into chunks (true) or not (false) before being put into the channel. This can improve performance, especially if there are many iterations, each of which is computationally cheap.
    • If nchunks or chunksize are explicitly specified, chunking will be automatically set to true.
  • nchunks::Integer (default 10 * nthreads()):
    • Determines the number of chunks (that will eventually be put into the channel).
    • Increasing nchunks can help with load balancing. For nchunks <= nthreads() there are not enough chunks for any load balancing.
  • chunksize::Integer (default not set)
    • Specifies the desired chunk size (instead of the number of chunks).
    • The options chunksize and nchunks are mutually exclusive (only one may be a positive integer).
  • split::Union{Symbol, OhMyThreads.Split} (default OhMyThreads.RoundRobin()):
    • Determines how the collection is divided into chunks (if chunking=true).
    • See ChunkSplitters.jl for more details and available options. We also allow users to pass :consecutive in place of Consecutive(), and :roundrobin in place of RoundRobin()
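A sketch with a deliberately uneven per-element cost (the work function here is made up for illustration). Because the greedy scheduler processes elements in non-deterministic order, the reducer must be commutative, as + is:

```julia
using OhMyThreads: tmapreduce, GreedyScheduler

# Per-element cost grows with i: a channel-fed greedy scheduler balances it.
work(i) = sum(abs2, 1:i)
tmapreduce(work, +, 1:100; scheduler = GreedyScheduler()) ==
    sum(work, 1:100)  # true (integer + is exact and commutative)
```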
source
OhMyThreads.Schedulers.SerialScheduler (Type)
SerialScheduler (aka :serial)

A scheduler for turning off any multithreading and running the code in serial. It aims to make parallel functions like, e.g., tmapreduce(sin, +, 1:100) behave like their serial counterparts, e.g., mapreduce(sin, +, 1:100).

source

Re-exported

OhMyThreads.chunkssee ChunkSplitters.jl
OhMyThreads.index_chunkssee ChunkSplitters.jl

Public but not exported

OhMyThreads.@spawnsee StableTasks.jl
OhMyThreads.@spawnatsee StableTasks.jl
OhMyThreads.@fetchsee StableTasks.jl
OhMyThreads.@fetchfromsee StableTasks.jl
OhMyThreads.TaskLocalValuesee TaskLocalValues.jl
OhMyThreads.Splitsee ChunkSplitters.jl
OhMyThreads.Consecutivesee ChunkSplitters.jl
OhMyThreads.RoundRobinsee ChunkSplitters.jl
OhMyThreads.WithTaskLocalsType
struct WithTaskLocals{F, TLVs <: Tuple{Vararg{TaskLocalValue}}} <: Function

This callable function-like object is meant to represent a function which closes over some TaskLocalValues. This is, if you do

TLV{T} = TaskLocalValue{T}
treducemap(+, √, [1, 2, 3, 4, 5])

is the parallelized version of sum(√, [1, 2, 3, 4, 5]) in the form

(√1 + √2) + (√3 + √4) + √5

Keyword arguments:

  • scheduler::Union{Scheduler, Symbol} (default :dynamic): determines how the computation is divided into parallel tasks and how these are scheduled. See Scheduler for more information on the available schedulers.
  • outputtype::Type (default Any): will work as the asserted output type of parallel calculations. We use StableTasks.jl to make setting this option unnecessary, but if you experience problems with type stability, you may be able to recover it with this keyword argument.
  • init: initial value of the reduction. Will be forwarded to mapreduce for the task-local sequential parts of the calculation.

In addition, treducemap accepts all keyword arguments that are supported by the selected scheduler. They will simply be passed on to the corresponding Scheduler constructor. Example:

treducemap(+, √, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

However, to avoid ambiguity, this is currently only supported for scheduler::Symbol (but not for scheduler::Scheduler).

source
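As a small sketch (assuming OhMyThreads is installed), the two calls below are equivalent ways to select a static scheduler with a chunk size of 2; note that forwarding chunksize as a keyword only works with the Symbol form:

```julia
using OhMyThreads: treducemap
using OhMyThreads.Schedulers: StaticScheduler

# scheduler kwargs forwarded to the constructor (Symbol form only)
treducemap(+, √, [1, 2, 3, 4, 5]; chunksize=2, scheduler=:static)

# equivalent: pass a fully constructed scheduler instead
treducemap(+, √, [1, 2, 3, 4, 5]; scheduler=StaticScheduler(; chunksize=2))
```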

Schedulers

OhMyThreads.Schedulers.Scheduler (Type)

Supertype for all available schedulers: DynamicScheduler, StaticScheduler, GreedyScheduler, and SerialScheduler.

source
OhMyThreads.Schedulers.DynamicScheduler (Type)
DynamicScheduler (aka :dynamic)

The default dynamic scheduler. Divides the given collection into chunks and then spawns a task per chunk to perform the requested operation in parallel. The tasks are assigned to threads by Julia's dynamic scheduler and are non-sticky, that is, they can migrate between threads.

Generally preferred since it is flexible, can provide load balancing, and is composable with other multithreaded code.

Keyword arguments:

  • nchunks::Integer or ntasks::Integer (default nthreads(threadpool)):
    • Determines the number of chunks (and thus also the number of parallel tasks).
    • Increasing nchunks can help with load balancing, but at the expense of creating more overhead. For nchunks <= nthreads() there are not enough chunks for any load balancing.
    • Setting nchunks < nthreads() is an effective way to use only a subset of the available threads.
  • chunksize::Integer (default not set)
    • Specifies the desired chunk size (instead of the number of chunks).
    • The options chunksize and nchunks/ntasks are mutually exclusive (only one may be a positive integer).
  • split::Union{Symbol, OhMyThreads.Split} (default OhMyThreads.Consecutive()):
    • Determines how the collection is divided into chunks (if chunking=true). By default, each chunk consists of contiguous elements and order is maintained.
    • See ChunkSplitters.jl for more details and available options. We also allow users to pass :consecutive in place of Consecutive(), and :roundrobin in place of RoundRobin().
    • Beware that for split=OhMyThreads.RoundRobin() the order of elements isn't maintained and a reducer function must not only be associative but also commutative!
  • chunking::Bool (default true):
    • Controls whether input elements are grouped into chunks (true) or not (false).
    • For chunking=false, the arguments nchunks/ntasks, chunksize, and split are ignored and input elements are regarded as "chunks" as is. Hence, there will be one parallel task spawned per input element. Note that, depending on the input, this might spawn many(!) tasks and can be costly!
  • threadpool::Symbol (default :default):
    • Possible options are :default and :interactive.
    • The high-priority pool :interactive should be used very carefully since tasks on this threadpool should not be allowed to run for a long time without yielding as it can interfere with heartbeat processes.
source
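As an illustrative sketch (the workload function work below is a made-up stand-in), nchunks trades per-task overhead for load balancing:

```julia
using OhMyThreads: tmapreduce
using OhMyThreads.Schedulers: DynamicScheduler
using Base.Threads: nthreads

work(i) = sum(abs2, 1:(i % 97))  # hypothetical uneven workload

# more chunks than threads -> better load balancing, slightly more overhead
tmapreduce(work, +, 1:10_000; scheduler=DynamicScheduler(; nchunks=8 * nthreads()))

# chunking=false spawns one task per element; only sensible for few elements
tmapreduce(work, +, 1:32; scheduler=DynamicScheduler(; chunking=false))
```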
OhMyThreads.Schedulers.StaticScheduler (Type)
StaticScheduler (aka :static)

A static low-overhead scheduler. Divides the given collection into chunks and then spawns a task per chunk to perform the requested operation in parallel. The tasks are statically assigned to threads up front and are made sticky, that is, they are guaranteed to stay on the assigned threads (no task migration).

Can sometimes be more performant than DynamicScheduler when the workload is (close to) uniform and, because of the lower overhead, for small workloads. However, it doesn't compose well with other multithreaded code.

Keyword arguments:

  • nchunks::Integer or ntasks::Integer (default nthreads()):
    • Determines the number of chunks (and thus also the number of parallel tasks).
    • Setting nchunks < nthreads() is an effective way to use only a subset of the available threads.
    • For nchunks > nthreads() the chunks will be distributed to the available threads in a round-robin fashion.
  • chunksize::Integer (default not set)
    • Specifies the desired chunk size (instead of the number of chunks).
    • The options chunksize and nchunks/ntasks are mutually exclusive (only one may be non-zero).
  • chunking::Bool (default true):
    • Controls whether input elements are grouped into chunks (true) or not (false).
    • For chunking=false, the arguments nchunks/ntasks, chunksize, and split are ignored and input elements are regarded as "chunks" as is. Hence, there will be one parallel task spawned per input element. Note that, depending on the input, this might spawn many(!) tasks and can be costly!
  • split::Union{Symbol, OhMyThreads.Split} (default OhMyThreads.Consecutive()):
    • Determines how the collection is divided into chunks. By default, each chunk consists of contiguous elements and order is maintained.
    • See ChunkSplitters.jl for more details and available options. We also allow users to pass :consecutive in place of Consecutive(), and :roundrobin in place of RoundRobin().
    • Beware that for split=OhMyThreads.RoundRobin() the order of elements isn't maintained and a reducer function must not only be associative but also commutative!
source
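A minimal sketch of when StaticScheduler can pay off (uniform, cheap iterations); this is illustrative, not a benchmark:

```julia
using OhMyThreads: tmap
using OhMyThreads.Schedulers: StaticScheduler
using Base.Threads: nthreads

# uniform workload: sticky, statically assigned tasks with low overhead
tmap(sin, 1:1_000; scheduler=StaticScheduler())

# restrict the computation to (roughly) half of the available threads
tmap(sin, 1:1_000; scheduler=StaticScheduler(; nchunks=max(1, nthreads() ÷ 2)))
```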
OhMyThreads.Schedulers.GreedyScheduler (Type)
GreedyScheduler (aka :greedy)

A greedy dynamic scheduler. The elements of the collection are first put into a Channel and then dynamic, non-sticky tasks are spawned to process the channel content in parallel.

Note that elements are processed in a non-deterministic order, and thus a potential reducing function must be commutative in addition to being associative, or you could get incorrect results!

Can be a good choice for load-balancing slower, uneven computations, but it does carry some additional overhead.

Keyword arguments:

  • ntasks::Int (default nthreads()):
    • Determines the number of parallel tasks to be spawned.
    • Setting ntasks < nthreads() is an effective way to use only a subset of the available threads.
  • chunking::Bool (default false):
    • Controls whether input elements are grouped into chunks (true) or not (false) before being put into the channel. This can improve performance, especially if there are many iterations, each of which is computationally cheap.
    • If nchunks or chunksize are explicitly specified, chunking will be automatically set to true.
  • nchunks::Integer (default 10 * nthreads()):
    • Determines the number of chunks (that will eventually be put into the channel).
    • Increasing nchunks can help with load balancing. For nchunks <= nthreads() there are not enough chunks for any load balancing.
  • chunksize::Integer (default not set)
    • Specifies the desired chunk size (instead of the number of chunks).
    • The options chunksize and nchunks are mutually exclusive (only one may be a positive integer).
  • split::Union{Symbol, OhMyThreads.Split} (default OhMyThreads.RoundRobin()):
    • Determines how the collection is divided into chunks (if chunking=true).
    • See ChunkSplitters.jl for more details and available options. We also allow users to pass :consecutive in place of Consecutive(), and :roundrobin in place of RoundRobin().
source
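To illustrate the commutativity requirement (a sketch, not from the original docs):

```julia
using OhMyThreads: tmapreduce

# + is commutative and associative: safe under the non-deterministic order
tmapreduce(sin, +, 1:1_000; scheduler=:greedy)

# string concatenation (*) is associative but NOT commutative, so the result
# below would depend on task timing; prefer :dynamic with the default
# Consecutive() split when element order matters
# tmapreduce(string, *, 1:10; scheduler=:greedy)  # don't rely on this order
```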
OhMyThreads.Schedulers.SerialScheduler (Type)
SerialScheduler (aka :serial)

A scheduler for turning off any multithreading and running the code in serial. It aims to make parallel functions like, e.g., tmapreduce(sin, +, 1:100) behave like their serial counterparts, e.g., mapreduce(sin, +, 1:100).

source
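A quick sketch of how SerialScheduler mirrors the serial counterpart, which is handy for debugging parallel code:

```julia
using OhMyThreads: tmapreduce

# identical semantics to the serial counterpart, no threading involved
tmapreduce(sin, +, 1:100; scheduler=:serial) ≈ mapreduce(sin, +, 1:100)  # true
```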

Re-exported

OhMyThreads.chunks: see ChunkSplitters.jl
OhMyThreads.index_chunks: see ChunkSplitters.jl

Public but not exported

OhMyThreads.@spawn: see StableTasks.jl
OhMyThreads.@spawnat: see StableTasks.jl
OhMyThreads.@fetch: see StableTasks.jl
OhMyThreads.@fetchfrom: see StableTasks.jl
OhMyThreads.TaskLocalValue: see TaskLocalValues.jl
OhMyThreads.Split: see ChunkSplitters.jl
OhMyThreads.Consecutive: see ChunkSplitters.jl
OhMyThreads.RoundRobin: see ChunkSplitters.jl
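A small sketch of the StableTasks.jl re-exports in action:

```julia
using OhMyThreads: @spawn, @fetch

t = @spawn 1 + 2   # like Threads.@spawn, but fetch(t) is type-stable
fetch(t)           # 3, inferred as Int

@fetch 1 + 2       # spawn and immediately fetch
```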
OhMyThreads.WithTaskLocals (Type)
struct WithTaskLocals{F, TLVs <: Tuple{Vararg{TaskLocalValue}}} <: Function

This callable function-like object is meant to represent a function which closes over some TaskLocalValues. That is, if you do

TLV{T} = TaskLocalValue{T}
 f = WithTaskLocals((TLV{Int}(() -> 1), TLV{Int}(() -> 2))) do (x, y)
     z -> (x + y)/z
 end

then that is equivalent to

g = let x = TLV{Int}(() -> 1), y = TLV{Int}(() -> 2)
     z -> let x = x[], y = y[]
         (x + y)/z
     end
 end

however, the main difference is that you can call promise_task_local on a WithTaskLocals closure in order to turn it into something equivalent to

let x=x[], y=y[]
     z -> (x + y)/z
end

which doesn't have the overhead of accessing the task_local_storage each time the closure is called. This of course will lose the safety advantages of TaskLocalValue, so you should never do f_local = promise_task_local(f) and then pass f_local to some unknown function, because if that unknown function calls f_local on a new task, you'll hit a race condition.

source
OhMyThreads.promise_task_local (Function)
promise_task_local(f) = f
 promise_task_local(f::WithTaskLocals) = f.inner_func(map(x -> x[], f.tasklocals))

Take a WithTaskLocals closure, grab the TaskLocalValues, and pass them to the closure. That is, it turns a WithTaskLocals closure from the equivalent of

TLV{T} = TaskLocalValue{T}
 let x = TLV{Int}(() -> 1), y = TLV{Int}(() -> 2)
     z -> let x = x[], y=y[]
         (x + y)/z
     end
 end

into the equivalent of

 let x = TLV{Int}(() -> 1), y = TLV{Int}(() -> 2)
     let x = x[], y = y[]
         z -> (x + y)/z
     end
end

which doesn't have the overhead of accessing the task_local_storage each time the closure is called. This of course will lose the safety advantages of TaskLocalValue, so you should never do f_local = promise_task_local(f) and then pass f_local to some unknown function, because if that unknown function calls f_local on a new task, you'll hit a race condition.

source
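Putting the two together, here is a hedged sketch of using WithTaskLocals to give each task its own scratch buffer with tmap; the buffer size and the function itself are made up for illustration:

```julia
using OhMyThreads: tmap, WithTaskLocals, TaskLocalValue

# each task lazily gets its own zeros(3) buffer; no data race on `buf`
f = WithTaskLocals((TaskLocalValue{Vector{Float64}}(() -> zeros(3)),)) do (buf,)
    x -> begin
        buf .= x    # safe: `buf` is task-local
        sum(buf)    # 3x
    end
end

tmap(f, 1:10)  # [3.0, 6.0, ..., 30.0]
```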
Internal

Warning

Everything on this page is internal and might be changed or dropped at any point!

References

OhMyThreads.Tools.SimpleBarrier (Type)

SimpleBarrier(n::Integer)

Simple reusable barrier for n parallel tasks.

Given b = SimpleBarrier(n) and n parallel tasks, each task that calls wait(b) will block until the other n-1 tasks have called wait(b) as well.

Example

n = nthreads()
 barrier = SimpleBarrier(n)
 @sync for i in 1:n
     @spawn begin
         println("A")
         wait(barrier) # synchronize all tasks
         println("B")
         wait(barrier) # synchronize all tasks (reusable)
         println("C")
     end
end
source
OhMyThreads.Tools.taskid (Method)
taskid() :: UInt

Return a UInt identifier for the current running Task. This identifier will be unique so long as references to the task it came from still exist.

source
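A sketch of using taskid to count how many distinct tasks executed a parallel loop; the lock is only there to make the set update safe:

```julia
using OhMyThreads: @tasks
using OhMyThreads.Tools: taskid

ids = Set{UInt}()
lk = ReentrantLock()
@tasks for i in 1:1_000
    lock(lk) do
        push!(ids, taskid())  # one id per underlying task
    end
end
length(ids)  # number of distinct tasks used for the loop
```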
OhMyThreads.Tools.try_enter! (Method)
try_enter!(f, s::OnlyOneRegion)

When called from multiple parallel tasks (on a shared s::OnlyOneRegion) only a single task will execute f.

Example

using OhMyThreads: @tasks
 using OhMyThreads.Tools: OnlyOneRegion, try_enter!
 
 only_one = OnlyOneRegion()
 @tasks for i in 1:10
     @set ntasks = 10

     println(i, ": before")
     try_enter!(only_one) do
         println(i, ": only printed by a single task")
         sleep(1)
     end
     println(i, ": after")
end
source