-
-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up configuration resolution #2905
Conversation
…ageConfigs. Package.name and Package.getAllDependencies both need to reallocate their return value for every call, which results in heavy GC pressure during the configuration resolution algorithm.
Avoids reallocating the edges array every time the configuration resolution eliminates a configuration.
✅ PR OK, no changes in deprecations or warnings Total deprecations: 0 Total warnings: 0 Build statistics: statistics (-before, +after)
-executable size=5364248 bin/dub
-rough build time=63s
+executable size=5376664 bin/dub
+rough build time=64s Full build output
|
d0bc073
to
c40ca1e
Compare
… packages. Avoids costly AA lookups and string comparisons.
c40ca1e
to
c620806
Compare
c620806
to
449d412
Compare
Uses a set to determine redundant edge entries instead of iterating through the list.
Added two more optimizations to get down to 3.1 ms. |
Now down to 2.6 ms. |
}).array; | ||
// eliminate all edges that connect to config 'i' | ||
size_t eti = 0; | ||
foreach (ei, e; edges) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this for loop is small enough to justify variable names such as eti
, ei
, and e
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop contains 8 lines of code. What's your personal threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop contains 8 lines of code. What's your personal threshold?
For one letter variables? 1 line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, a handful of lines is okay, as long variable names are not only making the code easier to understand, but also harder to read - although non-standard variables like the ones here are more critical than the standard ones, if used correctly (i, j, k , x, y, z). However, the parameter i
to removeConfig
actually violates my own style guide and should be a full word instead, which would free up the i
for the foreach
loop.
But IMO, what would really be the right way to go here would be to generalize the array filter algorithm, so that this becomes edges = edges.filterInPlace!((e) => ...)
(analogous to std.algorithm.mutation.remove
). Unfortunately, Phobos doesn't have that AFAIK and I didn't want to mix non-local refactoring (or any substantial refactoring for that matter) into the same PR that does optimization.
I have a concern about code complexity. I found the original function hard to understand; it took me a couple of hours of summarising it in English before I understood what it did well enough to be able to optimise it. This PR makes even more complicated. Having said that, I tried this PR on the pathological case that motivated my own PR and the runtime for |
I agree with that. Part of it is based on the unfortunate design choice of making
Without being able to reproduce your pathological case or more precise information, I cannot help you there. You identified this particular piece of code to be responsible for a considerable slowdown, so I went head and optimized it. But apart from that, a 20% total improvement is not what I would associate with "death by a thousand cuts". |
Yes, and thanks!
Normally, no, but the majority of the runtime is still in |
But no matter how often Just to quickly get the numbers straight here:
What seems odd here is that, speaking of the pathological test case, you both seem to have a larger payload (assuming the single-thread performance of the machine is not much lower) and a considerably worse speedup at the same time. I would usually expect the opposite to be true and it would be interesting to find out why that happens. On the other hand, if I should bet, I would say that the reality is that the majority of the runtime is actually spent somewhere else. Taking the estimation from the last bullet point, this would mean that optimizing |
This speeds up the
getPackageConfigs
call issued bygetDefaultConfiguration
from 14.8 ms to 3.6 ms for the particular project that I'm benchmarking. There is still a lot more to gain, but this is a good start.