Skip to content

Latest commit

 

History

History
287 lines (208 loc) · 7.92 KB

techdetails.adoc

File metadata and controls

287 lines (208 loc) · 7.92 KB

Benchmarks & Technical Data

Compatibility

Copper requires to be used with a C++ compiler that supports at least the C++17 standard. If supported, Copper also accesses some features of C++20. These are tested by __cpp_concepts >= 201907L and __cpp_lib_erase_if >= 202002L.

Copper and its unit tests were successfully compiled and run on the following systems:

  • Mac OS X 10.15.7, AppleClang 12.0.0.12000032

  • Ubuntu 20.04.2, GCC 9.3.0

  • Ubuntu 20.04.2, Clang 10.0.0

  • Arch Linux 5.11.16, GCC 10.2.0

  • Arch Linux 5.11.16, Clang 11.1.0

  • Microsoft Windows Server 2019 10.0.17763, MSVC 19.28.29914.0

  • Microsoft Windows Server 2019 10.0.17763, msys2 mingw64-GCC 10.2.0 [*]

  • Microsoft Windows Server 2019 10.0.17763, msys2 mingw64-Clang 11.0.0 [*]

[*]: Mingw builds can be unstable. In Github’s containers, both GCC and Clang failed at the linker stage but only for Debug builds. There are also machines on which the same config built and ran successfully, so it’s best to just try it out. Assistance on the respective issue is appreciated.

Type Requirements

A type non-void T needs to satisfy the following requirements to be sent via a copper::channel:

  • default constructible

  • move constructible

  • move assignable

  • destructible

Memory Space

On a typical 64 bit system, types of Copper can be expected to have the following sizes in memory:

type size in bytes

copper::buffered_channel<T> for T ≠ void

296 + buffer size * sizeof(T)

copper::buffered_channel<void>

224

copper::unbuffered_channel<T>

208

copper::channel_push_iterator<copper::buffered_channel<T>>

16

copper::channel_pop_iterator<copper::buffered_channel<T>>

24

copper::channel_read_view<copper::buffered_channel<T>>

8

copper::channel_write_view<copper::buffered_channel<T>>

8

In addition, all channels take up about 40 bytes additional storage for each pending operation that is executed on them.

Operation Cost

For a buffered channel chan that contains element of a non-void type T, the following calls occur for different operations:

operation calls to T() T(const T&) T(T&&) T::operator=(T&&)

chan.push(x) (lvalue ref)

0

1

1

0

chan.push(std::move(x))

0

0

2

0

chan.push_func([&x] {return std::move(x);})

0

0

2

0

chan.pop()

1-2*

0

1-2*

1

chan.pop_func([](T&&) {})

0-1*

0

1-2*

0

[*] If a pop operation is called while a buffered channel is full and even more push operations are already pending, the pop operation will move the next pending push to the internal buffer, causing one additional call to T() and T(T&&) each.

For an unbuffered channel chan that contains element of a non-void type T, the following calls occur for different operations:

operation calls to T() T(const T&) T(T&&) T::operator=(T&&)

chan.push(x) (lvalue ref)

0

1

0

0

chan.push(std::move(x))

0

0

1

0

chan.push_func([&x] {return std::move(x);})

0

0

1

0

chan.pop()

1

0

1

1

chan.pop_func([](T&&) {})

0

0

0

0

(In some specific cases, a buffered channel might be able to skip using its buffer and reach the same performance of unbuffered channels.)

Benchmark (SPSC)

(Disclaimer: The following measurements were done on a single 6-core CPU by a simple "measure multiple times and take the average" strategy. They can vary greatly depending on the CPU architecture, the surrounding memory model, and the thread scheduler.)

In this benchmark, data is being sent from one thread to another via three different methods:

  1. std::queue with a std::mutex and a std::condition_variable

  2. copper::buffered_channel

  3. copper::unbuffered_channel

Also, a program which mimics the behavior as closely as possible was written in Go using native channel objects.

Two types of messages are compared:

  1. int, as an fast-to-move message

  2. std::array<int, 10'000>, as a large, slow-to-move message type

The table below displays the average time to send and receive one message and the resulting data rate.

int std::array<int, 10'000>

std::queue

208.9 ns
19.15 MB/s

18395.9 ns
2174 MB/s

copper::buffered_channel

214.64 ns
18.6 MB/s

24282.8 ns
1647 MB/s

copper::unbuffered_channel

6002.8 ns
0.67 MB/s

17475.1 ns
2289 MB/s

Go chan

273.4 ns
14.6 MB/s

(not doable)

Benchmark (MPSC)

(Disclaimer: The following measurements were done on a single 6-core CPU by a simple "measure multiple times and take the average" strategy. They can vary greatly depending on the CPU architecture, the surrounding memory model, and the thread scheduler.)

In this benchmark, two types of data are sent from two separate producer threads to a single consumer thread. For that, four different methods are used:

  1. std::queue<std::variant> with a std::mutex and a std::condition_variable

  2. Two std::queue with std::mutex and polling

  3. copper::buffered_channel and copper::select

  4. copper::unbuffered_channel and copper::select

Also, a program which mimics the behavior as closely as possible was written in Go using native channel objects.

In the first case, the two types of data are int and float. In the second case, the types are std::array<int, 10'000> and std::array<float, 10'000>.

The table below displays the average time to send and receive one message and the resulting data rate. Note that the messages are not processed further, which means that any overhead from dealing with std::variant is not part of the measurement.

int, float std::array<int, 10'000>, std::array<float, 10'000>

std::queue<std::variant>

283.2 ns
14.12 MB/s

37326.2 ns
1072 MB/s

Two std::queue

267.1 ns
14.98 MB/s

25244.6 ns
1584 MB/s

copper::buffered_channel and copper::select

340.1 ns
11.76 MB/s

33942.6 ns
1178 MB/s

copper::unbuffered_channel and copper::select

8441.3 ns
0.47 MB/s

20570.5 ns
1945 MB/s

Go chan

562.9 ns
7.11 MB/s

(not doable)

Risk of Deadlocks & Slowdowns

The functions channel::push_func, channel::pop_func, and select (as well as their variants for different timeout behaviors) should be used with a certain awareness of their blocking behavior.

A functor is passed to these calls which is then executed to produce or consume a message element. The channel that is operated on is locked for the entire duration of the functor execution. This can, for example, cause unexpectedly long blocking if the functor is complex:

// `chan` will be blocked for an entire second.
chan.push_func([] {
    std::this_thread::sleep_for(1s);
    return 1;
});

Even worse, you can cause deadlocks in your code by introducing cyclic dependencies among channels.

void thread_1() {
    for (;;) {
        chan_1.pop_func([&chan2](int x) {
            chan_2.push(x);
        });
    }
}

void thread_2() {
    for (;;) {
        chan_2.pop_func([&chan1](int x) {
            chan_1.push(x);
        });
    }
}

In this example, chan_1 and chan_2 send messages back and forth forever. It might happen that both channels enter their pop_func concurrently. As both channels have to be blocked to do so, neither can then push their element to the other channel and both threads are locked.

The "best" solution to prevent yourself from running into that bug is to avoid operations to channels within a functor passed to one of the affected functions. If that’s not possible, changing the code to use the non-functor variants (push, pop, and vselect) instead will also prevent any deadlocks like this.

Note that all of this is not caused by bad design of Copper itself but is a necessity due to the guarantees of push_func, pop_func, and select. push_func in particular cannot be implemented correctly without forcing a lock on the channel for the entire call.