Generated code vs non-generated code: memory copy #39

Open
wangli1426 opened this issue Jun 2, 2014 · 1 comment

Comments

@wangli1426
Contributor

In my opinion, when the copy length is known in advance, memcpy is slower than direct assignment for small copies.

Test the performance improvement of memory copy in LLVM.

@wangli1426 wangli1426 self-assigned this Jun 2, 2014
@wangli1426 wangli1426 mentioned this issue Jun 2, 2014
@wangli1426
Contributor Author

On the afternoon of Aug. 26th, I compared memcpy-based tuple copy and assignment-based tuple copy in terms of CPU cycles per tuple. In memcpy-based tuple copy, memcpy is used to copy a tuple from the source address to the target address. In assignment-based tuple copy, the tuple copy is converted into one or more typed assignments. For instance, the tuple copy for an 8-byte tuple is as follows:

*((int32_t *)target) = *((int32_t *)src);
*((int32_t *)target + 1) = *((int32_t *)src + 1);

The data type used in the assignment-based copy is termed the assignment unit.
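As a rough illustration of the two approaches and of different assignment units, here is a minimal C sketch for a fixed 16-byte tuple. The function names (`copy16_memcpy`, `copy16_int32`, `copy16_int64`, `copy16_struct`) and the `tuple16_t` layout are hypothetical and not taken from the project. The sketch also assumes that both buffers are suitably aligned for the assignment unit and were declared with a compatible type; otherwise the casts run into alignment and strict-aliasing problems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte tuple layout, used only for the struct assignment unit. */
typedef struct {
    int64_t a;
    int64_t b;
} tuple16_t;

static_assert(sizeof(tuple16_t) == 16, "tuple16_t must be 16 bytes");

/* memcpy-based copy of one 16-byte tuple. */
static void copy16_memcpy(void *target, const void *src) {
    memcpy(target, src, 16);
}

/* Assignment-based copy, assignment unit int32_t: four assignments.
 * Assumes both pointers refer to int32_t-compatible, aligned storage. */
static void copy16_int32(void *target, const void *src) {
    *((int32_t *)target)     = *((const int32_t *)src);
    *((int32_t *)target + 1) = *((const int32_t *)src + 1);
    *((int32_t *)target + 2) = *((const int32_t *)src + 2);
    *((int32_t *)target + 3) = *((const int32_t *)src + 3);
}

/* Assignment-based copy, assignment unit int64_t: two assignments.
 * Assumes both pointers refer to int64_t-compatible, aligned storage. */
static void copy16_int64(void *target, const void *src) {
    *((int64_t *)target)     = *((const int64_t *)src);
    *((int64_t *)target + 1) = *((const int64_t *)src + 1);
}

/* Assignment-based copy, assignment unit struct: one assignment. */
static void copy16_struct(tuple16_t *target, const tuple16_t *src) {
    *target = *src;
}
```

A larger assignment unit means fewer assignments per tuple. How the compiler lowers each variant, including the fixed-length memcpy, depends on the compiler and its flags, so the sketch shows the shape of the code rather than the measured cost.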

| Tuple size (bytes) | memcpy (cycles/tuple) | int32_t (cycles/tuple) | int64_t (cycles/tuple) | struct (cycles/tuple) |
|---|---|---|---|---|
| 4 | 17 | 6 | N/A | N/A |
| 16 | 20 | 13 | 11 | 10.5 |
| 64 | 39 | 44 | 38 | 37 |

The results demonstrate that assignment-based tuple copy outperforms memcpy-based tuple copy, especially when the tuple is small. At 64 bytes the gap nearly closes: the int64_t and struct units are within 2 cycles of memcpy, and the int32_t unit is slower (44 vs. 39 cycles). Considering that the tuple size is usually larger than 32 bytes, assignment-based tuple copy is not required currently, as it provides only a marginal improvement in common settings.

Also, I found an SSE-based memcpy enhancement and compared it with the original memcpy. The results are as follows.

| Copy length (bytes) | 4 | 16 | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|---|---|
| memcpy (cycles/tuple) | 17 | 20 | 39 | 47 | 83 | 156 | 306 |
| sse-memcpy (cycles/tuple) | 10 | 10 | 33 | 40 | 66 | 128 | 251 |

Given the performance enhancement of SSE-based memory copy, we can conclude that generating code for tuple copy with LLVM is unlikely to provide a significant performance improvement when the tuple is not very small, e.g., larger than 32 bytes. For attribute copy, however, where 4 to 8 bytes are usually copied, generated copy code can be about 2x faster than memcpy-based attribute copy.
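The issue does not say which SSE-based memcpy was tested. As a sketch of the general idea only, the C function below copies 16 bytes per iteration with unaligned SSE2 loads and stores and hands the remaining tail to memcpy. The name `sse_copy` is hypothetical, and the SSE path assumes an x86 target where the compiler defines `__SSE2__`; on other targets the function simply falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical SSE-based copy; source and target must not overlap. */
static void sse_copy(void *target, const void *src, size_t size) {
    assert(target != NULL && src != NULL);
    unsigned char *t = (unsigned char *)target;
    const unsigned char *s = (const unsigned char *)src;
#if defined(__SSE2__)
    /* Copy 16-byte chunks; the unaligned variants accept any address. */
    while (size >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)t, chunk);
        s += 16;
        t += 16;
        size -= 16;
    }
#endif
    /* Remaining tail (or everything, when SSE2 is unavailable). */
    if (size > 0) {
        memcpy(t, s, size);
    }
}
```

This is not the routine that was measured above, so it should not be expected to reproduce those cycle counts.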

@wangli1426 wangli1426 removed their assignment Nov 22, 2014