In both ROSE and REX, the private/shared/firstprivate clauses in a target region are first converted into map clauses. The lowering module handles only map clauses and ignores all other data clauses.
Given the following code as an example:
for (int j = 0; j < 100; j++)
#pragma omp target teams distribute parallel for map(to: x[0:200]) map(from: y[0:200]) firstprivate(a, n)
    for (int i = 0; i < 200; i++)
        y[i] += a * x[i];
REX will convert it to:
for (int j = 0; j < 100; j++)
#pragma omp target teams distribute parallel for map(to: x[0:200], a, n) map(from: y[0:200]) firstprivate(a, n)
    for (int i = 0; i < 200; i++)
        y[i] += a * x[i];
The firstprivate clause is used for kernel generation (e.g., private variable initialization) but not for data transfer. LLVM, however, transforms the original code without such a conversion.
In this example, LLVM does not map a and n but passes them directly by value, whereas REX creates a host-to-device mapping for them. As a result, LLVM performs 200 data transfers (100 for x and 100 for y), while REX performs 400 (100 each for x, y, a, and n). This does not produce incorrect results, but it can cause significant performance differences.
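The counts above come from simple arithmetic: one transfer per mapped variable per launch of the target region. A minimal sketch of that model, assuming variables passed by value cost no separate transfer (the name `count_transfers` is hypothetical and not part of REX or LLVM):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each mapped variable costs one transfer per launch.
 * Variables passed by value as kernel arguments are not counted. */
size_t count_transfers(size_t launches, size_t mapped_vars)
{
    /* Guard against overflow in the multiplication. */
    assert(mapped_vars == 0 || launches <= SIZE_MAX / mapped_vars);
    return launches * mapped_vars;
}
```

Under this model, `count_transfers(100, 2)` gives the LLVM count (x and y) and `count_transfers(100, 4)` gives the REX count (x, y, a, and n).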
In the NeoRodinia nn benchmark, each iteration of a while loop launches an omp target region. Because of the issue described above, the REX version performs 12000 data transfers, while the LLVM version performs only 4000. The data transfer time is 24 ms vs. 6.7 ms on Carina. When we manually change the data clause in the REX version from map(to) to firstprivate, the REX version also performs 4000 data transfers, which take 6.7 ms.
Therefore, we need to make significant changes to how REX handles data transfers to address this issue.
Instead of passing the address of a, we pass its value directly as void *. The mapping type is also different. The original value 33 (= 1 + 32) means copying from host to device (1) plus kernel argument (32). The new mapping type value 288 (= 256 + 32) means passing by value (256) plus kernel argument (32).
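The two mapping type values can be read as combinations of bit flags. The sketch below uses only the numeric values given above; the enum and function names are local to this example and are assumptions, not identifiers taken from REX or the offloading runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as stated in the text; names are local to this sketch. */
enum map_type_flag {
    MAP_TO           = 1,   /* copy from host to device */
    MAP_TARGET_PARAM = 32,  /* entry is a kernel argument */
    MAP_LITERAL      = 256  /* value is passed directly, no copy */
};

/* Compile-time check that the flags combine to the values in the text. */
static_assert((MAP_TO | MAP_TARGET_PARAM) == 33, "by-reference map type");
static_assert((MAP_LITERAL | MAP_TARGET_PARAM) == 288, "by-value map type");

/* Map type for a scalar handled as map(to: ...). */
int64_t map_type_by_reference(void)
{
    return MAP_TO | MAP_TARGET_PARAM;
}

/* Map type for a scalar passed by value. */
int64_t map_type_by_value(void)
{
    return MAP_LITERAL | MAP_TARGET_PARAM;
}

/* Nonzero if this entry triggers a host-to-device copy. */
int needs_host_to_device_copy(int64_t map_type)
{
    return (map_type & MAP_TO) != 0;
}
```

With these definitions, the old value 33 has the copy bit set, while the new value 288 does not, which matches the drop in host-to-device transfers.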
The code works fine now. However, the compiler warns that we are casting an integer to void *, which has a larger size. We can cast the variable to int64_t first and then to void * to eliminate the warning.
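A minimal sketch of the two-step cast, with hypothetical helper names. It uses `intptr_t` rather than `int64_t` so the intermediate integer matches the pointer width on any platform; on a 64-bit host the two have the same width. Converting an integer to a pointer and back is implementation-defined in C, so this assumes a mainstream compiler where the round trip preserves the value.

```c
#include <assert.h>
#include <stdint.h>

/* The scalar must fit in a pointer-sized integer. */
static_assert(sizeof(intptr_t) >= sizeof(int), "int must fit in intptr_t");

/* Hypothetical helper: pack a scalar into a pointer-sized argument slot. */
void *pack_int_arg(int value)
{
    /* Widen to a pointer-sized integer first to avoid the size warning. */
    return (void *)(intptr_t)value;
}

/* Hypothetical helper: recover the scalar from the argument slot. */
int unpack_int_arg(void *arg)
{
    return (int)(intptr_t)arg;
}
```

The slot never holds a valid address, so it must only be converted back to an integer and never dereferenced.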
For the kernel, when map(to) was used, a was passed by pointer as follows.
__global__ void outlined__kernel__(int *a, int *_dev_x) { ... }
With the change, it was passed by value.
__global__ void outlined__kernel__(int a, int *_dev_x) { ... }
This change was tested with gaussian in NeoRodinia on Carina. Before the change, REX spent 10.5 ms on HtoD data transfer and LLVM spent 2.8 ms. After the change, REX only spent 1.7 ms.
The computing results of the REX and LLVM versions are verified to be the same.