-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More optimizations #101
base: master
Are you sure you want to change the base?
More optimizations #101
Conversation
for (opIndex = 0; | ||
opIndex < ((Size < ND_MAX_INSTRUCTION_LENGTH) ? Size : ND_MAX_INSTRUCTION_LENGTH); | ||
opIndex++) | ||
if (Size < ND_MAX_INSTRUCTION_LENGTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just moves the condition from the for
outside. Modern compilers should be able to deal with it as it is (for example, MSVC unrolls the entire loop, so introducing this if
actually makes things worse).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very likely GCC doing something stupid. I'll test clang too and see what happens.
@@ -3813,22 +3813,27 @@ NdGetEffectiveAddrAndOpMode( | |||
static const ND_UINT8 szLut[3] = { ND_SIZE_16BIT, ND_SIZE_32BIT, ND_SIZE_64BIT }; | |||
ND_BOOL w64, f64, d64, has66; | |||
|
|||
if ((ND_CODE_64 != Instrux->DefCode) && !!(Instrux->Attributes & ND_FLAG_IWO64)) | |||
// Branchless form of (ND_CODE_64 != Instrux->DefCode) && !!(Instrux->Attributes & ND_FLAG_IWO64) | |||
if (((ND_CODE_64 ^ Instrux->DefCode) * (Instrux->Attributes & ND_FLAG_IWO64)) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the branch predictors in modern CPUs to be quite competent at predicting the branches here, since they operate on INSTRUX
state that will very often be the same, allowing for a very high prediction rate.
I understand that these changes may remove some conditional moves or branches, but is it really worth it? At some point, one has to decide whether the (small) increase in performance is worth the less readable code. I don't think this is the case here. In addition, depending on the architecture, the multiplication may be a new source of overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiplication pipelines easier than branches. I wasn't entirely happy with this commit either but it did make it faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler may even ditch the multiplication altogether, and rely on conditional instructions (such as CMOVcc
or SETcc
), but this is not the point. The point is that it shouldn't really matter, since modern branch prediction should easily handle these branches with minimal overhead. These statements are not a performance bottleneck.
On Windows (MSVC compiler), the performance with these changes is essentially the same. |
Does your CI pipeline upload the MSVC -built binaries somewhere? |
GCC was putting a conditional move in the copy loop for some reason. The second one is less certain and seems to be highly sensitive to changes.