Switch to a better memmove #2984
base: community
Had to check for correctness after realizing that building for Release disabled my test assertions in the benchmark/test. New implementation passed.
Apparently the multi-load/multi-store intrinsics are new to GCC 14.2 :(
Idle thought: there's not a ton of memmove use-sites, I wonder if we could guarantee a healthy (like quadword) alignment for most?
We probably could, but better to just have an agnostic algorithm that's really fast anyways, like this one, and not accidentally get caught out when something in std:: uses it.
LGTM, but we should also remember to update these when GCC is updated; this generates worse assembly than the `vld1q_u8_x4` intrinsic.
Considering we'll likely get 14.2 this week, I'll probably just rip 'em out before the merge even happens. They'll collide and error otherwise.
Currently blocked by xpack-dev-tools/arm-none-eabi-gcc-xpack#38
This swaps out the (extremely naive) memmove implementation for one that copies in progressively larger blocks until it reaches 8 quadwords.
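For readers skimming the thread, here is a rough, portable sketch of what "progressively larger blocks up to 8 quadwords" could look like. It is not the code in this PR: the names `sketch::block_memmove` and `copy_block` are made up for illustration, a quadword is assumed to be 16 bytes (so the cap is 128 bytes), and fixed-size copies through a temporary stand in for the NEON multi-load/multi-store intrinsics discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {  // hypothetical namespace, not from the PR

constexpr std::size_t kQuadword = 16;             // assumed: 128-bit quadword
constexpr std::size_t kMaxBlock = 8 * kQuadword;  // 8 quadwords = 128 bytes

// Copy one block through a temporary so every load happens before any store.
// That ordering is what keeps overlapping regions safe within a block.
inline void copy_block(unsigned char* dst, const unsigned char* src, std::size_t n) {
    assert(n <= kMaxBlock);
    unsigned char tmp[kMaxBlock];
    std::memcpy(tmp, src, n);
    std::memcpy(dst, tmp, n);
}

inline void* block_memmove(void* dst_v, const void* src_v, std::size_t len) {
    auto* dst = static_cast<unsigned char*>(dst_v);
    auto* src = static_cast<const unsigned char*>(src_v);
    if (dst == src || len == 0) return dst_v;

    // Copy front-to-back when dst is below src, back-to-front otherwise.
    const bool forward = reinterpret_cast<std::uintptr_t>(dst) <
                         reinterpret_cast<std::uintptr_t>(src);

    std::size_t done = 0;
    std::size_t block = 1;  // start small: 1, 2, 4, ... up to kMaxBlock
    while (done < len) {
        const std::size_t remaining = len - done;
        while (block > remaining) block /= 2;  // shrink again for the tail
        const std::size_t off = forward ? done : len - done - block;
        copy_block(dst + off, src + off, block);
        done += block;
        if (block < kMaxBlock) block *= 2;     // grow until 8 quadwords
    }
    return dst_v;
}

}  // namespace sketch
```

The sketch makes no alignment assumptions, which matches the "agnostic algorithm" point raised earlier in the conversation; a real implementation would presumably replace `copy_block` with sized vector loads and stores.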