On the afternoon of Aug. 26th, I compared memcpy-based tuple copy and assignment-based tuple copy in terms of CPU cycles per tuple. In memcpy-based tuple copy, memcpy is used to copy a tuple from the source address to the target address. In assignment-based tuple copy, the tuple copy is converted into one or more assignments of a fixed data type. For instance, an 8-byte tuple can be copied with a single assignment of an 8-byte type such as int64_t.
The data type used in an assignment-based copy is called the assignment unit.
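The two copy styles described above can be sketched in C++ roughly as follows. This is only an illustration, not the code that was benchmarked: the function names, the `Tuple16` struct, and the template parameters are all hypothetical, and the assignment-based variant assumes the buffers are aligned for, and really contain, values of the assignment unit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// memcpy-based copy: the tuple size is a run-time argument.
inline void copy_tuple_memcpy(void* dst, const void* src, std::size_t tuple_size) {
    std::memcpy(dst, src, tuple_size);
}

// Assignment-based copy: the tuple is copied as TupleSize / sizeof(Unit)
// assignments of the assignment unit `Unit`.
// Assumption: both buffers are aligned for Unit and hold objects of type Unit;
// otherwise the casts below are not valid C++.
template <typename Unit, std::size_t TupleSize>
inline void copy_tuple_assign(void* dst, const void* src) {
    static_assert(TupleSize % sizeof(Unit) == 0,
                  "tuple size must be a multiple of the assignment unit");
    Unit* d = static_cast<Unit*>(dst);
    const Unit* s = static_cast<const Unit*>(src);
    for (std::size_t i = 0; i < TupleSize / sizeof(Unit); ++i) {
        d[i] = s[i];  // the trip count is a compile-time constant
    }
}

// "struct" assignment unit: a single assignment of a struct that is as large
// as the whole tuple (hypothetical 16-byte layout).
struct Tuple16 {
    std::int64_t a;
    std::int64_t b;
};

inline void copy_tuple_struct(Tuple16* dst, const Tuple16* src) {
    *dst = *src;
}
```

With this sketch, a 16-byte tuple copied with the int64_t assignment unit would be `copy_tuple_assign<std::int64_t, 16>(dst, src)`, which performs two 8-byte assignments.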
| tuple size (bytes) | memcpy (cycles/tuple) | int32_t (cycles/tuple) | int64_t (cycles/tuple) | struct (cycles/tuple) |
| --- | --- | --- | --- | --- |
| 4 | 17 | 6 | N/A | N/A |
| 16 | 20 | 13 | 11 | 10.5 |
| 64 | 39 | 44 | 38 | 37 |
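The results demonstrate that assignment-based tuple copy outperforms memcpy-based tuple copy, especially when the tuple is small: a 4-byte tuple takes 6 cycles with int32_t assignment versus 17 with memcpy. At 64 bytes the gap shrinks to a few cycles, and int32_t assignment is actually slower than memcpy (44 versus 39). Considering that the tuple size is usually larger than 32 bytes, assignment-based tuple copy is not required currently, as it provides only a marginal improvement in common settings.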
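The measurement setup is not described in this issue, so the following is only a minimal sketch of how a per-tuple cost could be measured. It is an assumption throughout: it uses `std::chrono::steady_clock` and reports nanoseconds per tuple rather than reading a hardware cycle counter, and the helper name `nanos_per_tuple` is hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Returns the average wall-clock nanoseconds spent per tuple when copying
// `tuple_count` tuples of `tuple_size` bytes with `copy_fn`.
// Assumption: tuple_size > 0 and tuple_count > 0.
template <typename CopyFn>
double nanos_per_tuple(CopyFn copy_fn, std::size_t tuple_size, std::size_t tuple_count) {
    std::vector<unsigned char> src(tuple_size * tuple_count, 0x5A);
    std::vector<unsigned char> dst(tuple_size * tuple_count, 0x00);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < tuple_count; ++i) {
        copy_fn(dst.data() + i * tuple_size, src.data() + i * tuple_size, tuple_size);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Touch the destination to discourage the compiler from discarding the copies.
    volatile unsigned char sink = dst[dst.size() - 1];
    (void)sink;

    const double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return total_ns / static_cast<double>(tuple_count);
}
```

Multiplying the result by the CPU clock frequency in GHz gives an approximate cycles-per-tuple figure. Numbers obtained this way depend on compiler flags and cache state, so they are not directly comparable with the table above.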
Also, I found an SSE-based memcpy enhancement and compared it with the original memcpy. The results are as follows.
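| copy length (bytes) | 4 | 16 | 64 | 128 | 256 | 512 | 1024 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| memcpy (cycles/tuple) | 17 | 20 | 39 | 47 | 83 | 156 | 306 |
| sse-memcpy (cycles/tuple) | 10 | 10 | 33 | 40 | 66 | 128 | 251 |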
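The SSE-based memcpy used for this comparison is not shown in the issue. As a rough idea of what such a routine can look like, here is a minimal sketch that copies 16 bytes at a time with unaligned SSE2 loads and stores and hands the remainder to memcpy. The function name `copy_sse` is hypothetical, and on targets without SSE2 the sketch simply falls back to memcpy.

```cpp
#include <cstddef>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Copies `len` bytes from src to dst; the buffers must not overlap.
inline void copy_sse(void* dst, const void* src, std::size_t len) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
#if defined(__SSE2__)
    // Move one 128-bit chunk per iteration; the unaligned variants accept
    // any source and destination address.
    while (len >= 16) {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(d), chunk);
        s += 16;
        d += 16;
        len -= 16;
    }
#endif
    // Copy the tail (or everything, when SSE2 is unavailable).
    if (len > 0) {
        std::memcpy(d, s, len);
    }
}
```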
Given the performance gain of the SSE-based memory copy, we can conclude that generating tuple-copy code with LLVM is unlikely to provide a significant performance improvement when the tuple is not very small, e.g., larger than 32 bytes. For attribute copy, however, where usually only 4 to 8 bytes are copied, generated copy code can be about 2x faster than memcpy-based attribute copy.
In my opinion, when the copy length is known in advance, memcpy is slower than assignment for small copies.
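One way to exploit a known copy length without a code generator is to fix the length at compile time. The sketch below is an assumption, not the LLVM-generated code discussed here: `copy_attribute_fixed` and `select_attribute_copy` are hypothetical names, and the claim that a constant-size memcpy is turned into a plain load and store holds for common optimizing compilers but is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies one attribute whose width is a compile-time constant. A constant-size
// memcpy is valid for any alignment, and optimizing compilers commonly lower
// it to a single load/store pair for small widths.
template <std::size_t Width>
inline void copy_attribute_fixed(void* dst, const void* src) {
    std::memcpy(dst, src, Width);
}

using AttributeCopyFn = void (*)(void*, const void*);

// Picks a specialised copy routine once per attribute, based on its width.
// Returns nullptr for widths without a specialisation; the caller is then
// expected to fall back to memcpy with a run-time length.
inline AttributeCopyFn select_attribute_copy(std::size_t width) {
    switch (width) {
        case 4: return &copy_attribute_fixed<4>;
        case 8: return &copy_attribute_fixed<8>;
        default: return nullptr;
    }
}
```

The indirect call through `AttributeCopyFn` has its own cost, which generated code would avoid by emitting the load and store inline.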
Test the performance improvement of memory copy in LLVM.