Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Deliberately leak PTDS thread_local events in stream ordered mr (#1375)
An object with `thread_local` modifier has thread storage duration, its destructor (if it exists) will after the thread exits, which, on the main thread, is below `main` (https://eel.is/c++draft/basic.start.term). The CUDA runtime sets up (when the first call into the runtime is made) a teardown of the driver that runs `atexit`. Although [basic.start.term#5](https://eel.is/c++draft/basic.start.term#5) provides guarantees on the order in which these destructors are called (thread storage duration objects are destructed _before_ any `atexit` handlers run), it appears that gnu libstdc++ does not always implement this correctly (if not compiled with `_GLIBCXX_HAVE___CXA_THREAD_ATEXIT`). Moreover (possibly consequently) it is considered undefined behaviour to call into the CUDA runtime below `main`. Hence, we cannot call `cudaEventDestroy` to deallocate our `thread_local` events. Since there are a finite number of these event (`ndevices * nparticipating_threads`), rather than attempting to destroy them we choose to leak them, thus avoiding any sequencing problems. - Closes #1371 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Mark Harris (https://github.com/harrism) - Jake Hemstad (https://github.com/jrhemstad) URL: #1375
- Loading branch information