Why does TensorBoard's Trace Viewer show blank waiting times? #2219
Comments
This question seems to be related to the TensorFlow integration, not oneDNN itself, so you'll probably get more insight by asking on the TensorFlow forum. +@milpuz01 in case he has some insights.
Do you know if oneDNN uses synchronous or asynchronous computation?
I believe oneDNN defaults to synchronous computation, but it can use async mode when built against the SYCL backend and used with a stream that has been initialised with the appropriate flags. @mgouicem, could you confirm whether this is accurate?
In the context of TensorFlow, which seems to be the case here, computations below the oneDNN API are synchronous. This does not prevent TensorFlow from using oneDNN asynchronously, though. Hence I believe it's a TensorFlow question.
Does asynchronous computation with oneDNN have higher efficiency than synchronous computation, since the thread waiting time is shorter?
Hi all,
Yes, all backends other than SYCL/OCL are synchronous by default, including the threadpool backend used by TensorFlow. However, TensorFlow typically executes multiple ops concurrently based on the inter-op parallelism setting (these settings). Regarding the stalls you are seeing, they might well be caused by the threadpool implementation used in TensorFlow for the ARM platform, or by the default threading configuration used for each platform. So, as Vadim said, this is a TensorFlow question and would likely be better answered there. @milpuz01 @agramesh1 if you want to chime in.
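For reference, a minimal sketch of the inter-op/intra-op knobs mentioned above. TensorFlow reads these environment variables at startup; the values below are illustrative only, not recommendations for any particular machine:

```shell
# Illustrative sketch: tune TensorFlow threading before launching the process.
# Values are placeholders; pick them based on your core count and workload.
export TF_NUM_INTEROP_THREADS=4    # how many independent ops may run concurrently
export TF_NUM_INTRAOP_THREADS=16   # threads available inside each op's kernel
```

The same settings can be applied programmatically via `tf.config.threading.set_inter_op_parallelism_threads` and `tf.config.threading.set_intra_op_parallelism_threads` before any op runs.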
For GPU devices, asynchronous behavior keeps device execution from being blocked by host-side kernel launches. For CPU devices, if each op does not use all the cores, it can allow independent computations to run concurrently on different sets of cores. It is a double-edged sword on CPU, though: if each op uses all the cores, concurrent ops create resource contention and can lower performance.
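As a toy illustration of the inter-op idea above (plain Python, not TensorFlow internals): two independent "ops" submitted to a small worker pool can run concurrently when each uses only part of the machine, while shrinking the pool to one worker serializes them, analogous to ops contending for the same cores:

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent, hypothetical "ops" with no data dependency between them.
def op_a(n):
    return sum(i * i for i in range(n))

def op_b(n):
    return sum(3 * i for i in range(n))

# A pool of 2 workers mirrors an inter-op parallelism setting of 2:
# both ops can be in flight at once. With max_workers=1 they would
# run back to back instead, like a fully synchronous schedule.
with ThreadPoolExecutor(max_workers=2) as pool:
    fa = pool.submit(op_a, 1000)
    fb = pool.submit(op_b, 1000)
    result_a, result_b = fa.result(), fb.result()
```

This only models scheduling, not the contention effect: in the real system, each op's kernel may itself be multi-threaded (the intra-op pool), which is where oversubscription can hurt.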
I ran the program on an x86 machine using oneDNN as the backend library and on an ARM machine using the default library. The TensorBoard profiling data shows blank waiting times on the ARM machine, while the computations on the x86 machine are very continuous. I am currently optimizing performance on the ARM machine and would like to understand the cause of the blank gaps in TensorBoard's Trace Viewer on the ARM machine.
The TensorBoard data on the ARM machine shows a waiting gap before the two FusedMatmul operations.
computations on the x86 machine are very continuous
I would like to know the reason for the waiting gap and how to resolve it. One hypothesis is that oneDNN uses asynchronous computation, while the default library uses synchronous computation.