Performance issues during resolve() of large config file #330
Ugh, I knew there was inefficient copying but I didn't know it would be so bad. Possibly there's something here that's outright silly (copying the same thing a bunch of times, or some broken algorithm), beyond the fact that HashMap isn't designed for immutability. If it's just those hash copies to avoid mutation, I'd probably try changing the hashmaps to persistent hashmaps. The only real downside of that is pasting in or developing the persistent hashmap classes. There was a pretty good looking library for it I found once but I forgot what it was called ... That fix should be mechanical: swap out the hashmap API and done. We should clear the proposed persistent hashmap implementation with Typesafe to be sure the license is fine. I don't know when I can fix this, so I would value some help finding the right persistent hashmap lib and patching it in.
https://github.com/blackdrag/pcollections may be worth a try if you want to try implementing ResolveMemos in terms of this and see what impact it has. For now I'd just add the libraryDependency in sbt and try it out to see if it helps. If it solves the problem, we'll have to find a way to get persistent maps without an external dependency. We could get clearance from Typesafe (via say @viktorklang or @jsuereth) to copy part of pcollections into our package namespace, or we could implement our own persistent map class; it's not really rocket science for what we need here.
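A persistent map sidesteps the copy entirely: `put` returns a new map that shares structure with the old one instead of duplicating it. As a rough illustration of structural sharing (not pcollections' actual implementation, which uses a hash trie for O(log n) lookups), a minimal persistent map could look like this:

```java
import java.util.Objects;

// Minimal persistent (immutable) map sketch illustrating the structural
// sharing that pcollections provides. put() returns a NEW map in O(1) by
// prepending an entry and sharing the existing chain; nothing is copied.
// Lookup here is O(n); a real hash trie keeps the copy-free put while
// making lookups O(log n).
final class PersistentMap<K, V> {
    private final K key;                     // null in the empty map
    private final V value;
    private final PersistentMap<K, V> rest;  // shared tail

    private PersistentMap(K key, V value, PersistentMap<K, V> rest) {
        this.key = key;
        this.value = value;
        this.rest = rest;
    }

    static <K, V> PersistentMap<K, V> empty() {
        return new PersistentMap<>(null, null, null);
    }

    PersistentMap<K, V> put(K k, V v) {
        return new PersistentMap<>(k, v, this); // the old map is untouched
    }

    V get(K k) {
        for (PersistentMap<K, V> m = this; m.rest != null; m = m.rest) {
            if (Objects.equals(m.key, k)) return m.value;
        }
        return null; // newest binding wins because put prepends
    }
}

public class PersistentMapDemo {
    public static void main(String[] args) {
        PersistentMap<String, Integer> m1 = PersistentMap.<String, Integer>empty().put("a", 1);
        PersistentMap<String, Integer> m2 = m1.put("b", 2);
        // m1 still has no "b": the second put did not mutate it
        System.out.println(m1.get("b") + " " + m2.get("a") + " " + m2.get("b")); // prints "null 1 2"
    }
}
```

Porting ResolveMemos to such a map would make each `put` cheap while leaving all older snapshots of the memo table valid.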
@havocp Not sure on the legal aspects of that suggestion, but using persistent structures for this purpose seems like a great fit.
@MaxLillack if you were able to re-run your benchmark with ResolveMemos ported to pcollections, that would tell us whether switching to a persistent map solves the problem or whether there's a larger algorithmic issue. @viktorklang the legal question is whether Typesafe can live with a persistent map class that isn't copyright assigned. Otherwise someone has to write one from scratch, or we would have to do it as an external dependency.
@havocp I will give it a try and get back to you soon.
@havocp: My change seems quite simple. My test is now down to less than a second, but tests are failing.
Are you using the return value of "copy.plus" as the new map? Performance results are encouraging in any case. I did not expect that copying HashMap was so super slow that it would be 1 second vs. 3 minutes! What on earth does that copy involve... If you have a branch with your experiment on it, I can check it out.
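For intuition on where the 3 minutes go: building an N-entry map through copy-on-write puts on `java.util.HashMap` does O(N²) work, because each put copies and rehashes every existing entry. A small self-contained illustration (not the library's code):

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates the quadratic cost of copy-on-write with java.util.HashMap:
// each "immutable put" copies the whole map, so N puts touch
// 1 + 2 + ... + N entries in total, and every copy rehashes them all.
public class CopyCost {
    public static Map<Integer, Integer> buildByCopying(int n) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Map<Integer, Integer> copy = new HashMap<>(m); // full copy + rehash
            copy.put(i, i);
            m = copy; // pretend-immutability: old map is abandoned, not mutated
        }
        return m;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        Map<Integer, Integer> m = buildByCopying(5_000);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("5000 copy-on-write puts took " + ms + " ms, size=" + m.size());
    }
}
```

With enough memo entries (as in a large config with many substitutions), this quadratic blow-up easily turns seconds into minutes.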
You are correct, I missed that. You can find my implementation here: MaxLillack@6593b06
This is intended to band-aid #330 with a hash table that sucks, but ought to be much faster to copy than the stock Java one. The stock Java hash table rehashes everything every time, while this hash table can often just copy an array. That said, I didn't benchmark it or anything, it may well also be super slow. This is more or less a troll commit to get someone to do better.
Thanks for trying that out - wow, I didn't expect HashMap to be that bad, but it looks like it rehashes the world when it copies itself; the copy code is not very smart. See branch https://github.com/typesafehub/config/tree/wip/bad-map for the lamest thing I could code in half an hour that might fix it, without any copyright or dependency concerns. But it's pretty gross. Would be better to do something nicer. Curious how my lame solution does on your benchmark though. I don't even know if it's faster than HashMap, I didn't write a benchmark yet. Someone should just write a real tree-based pmap probably.
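The trick the branch relies on can be sketched like this (hypothetical code in the spirit of the `wip/bad-map` idea, not the actual BadMap source): keep entries in a bucket array of immutable chains, so "copy plus one entry" is a shallow `clone()` of the array (a fast memcpy) rather than a rehash of every entry.

```java
// Immutable hash table whose "copy" is a shallow array clone instead of a
// full rehash. Bucket chains are immutable, so sharing them between the
// old and new map is safe; withPut clones only the bucket array and
// replaces a single bucket head.
final class ArrayCopyMap<K, V> {
    private static final int BUCKETS = 64; // fixed here; a real version would grow

    private static final class Node<K, V> {
        final K key;
        final V value;
        final Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ArrayCopyMap() {
        buckets = (Node<K, V>[]) new Node[BUCKETS];
    }

    private ArrayCopyMap(Node<K, V>[] buckets) {
        this.buckets = buckets;
    }

    ArrayCopyMap<K, V> withPut(K k, V v) {
        int i = Math.floorMod(k.hashCode(), BUCKETS);
        Node<K, V>[] copy = buckets.clone();    // shallow copy, no rehash
        copy[i] = new Node<>(k, v, buckets[i]); // share the old chain
        return new ArrayCopyMap<>(copy);
    }

    V get(K k) {
        int i = Math.floorMod(k.hashCode(), BUCKETS);
        // newest binding is at the chain head, so it shadows older ones
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(k)) return n.value;
        }
        return null;
    }
}

public class ArrayCopyMapDemo {
    public static void main(String[] args) {
        ArrayCopyMap<String, Integer> empty = new ArrayCopyMap<>();
        ArrayCopyMap<String, Integer> m = empty.withPut("x", 1).withPut("y", 2);
        System.out.println(m.get("x") + " " + m.get("y") + " " + empty.get("x")); // prints "1 2 null"
    }
}
```

The trade-off is memory: shadowed entries stay in the chains until a rebuild, which is why a tree-based persistent map is the nicer long-term answer.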
About 3 seconds - this will work for me.
@havocp Given the license for the pcollections code, I asked, and it looks like you have a preliminary OK from Henninger to embed the pcollections persistent hash map (if that's still something you'd like to do here).
As much as I'm tempted to troll all self-respecting devs everywhere by using my code, embedding the pcollections map seems more appropriate, so glad to hear we can do that.
pcollections does add quite a bit of bytecode I think, vs. my bad hack map. Anyway, we should put one of them in and solve the problem; both work afaik.
I've also hit this performance issue on a 100kB HOCON file with just about 100 substitutions.
We could merge my hack branch; I’m not sure if it’s bit-rotted or still works. You might try it out / review the code (with nose held - yes it’s ridiculous, but is it broken...)
I've merged these changes into 1.3.3, but that hasn't helped a lot. Profiling showed me a more significant source of performance degradation, which is merging origins. When I simply made
Interesting! That is surprising, but that's why they say to profile before optimizing, I guess :-) I wonder if it's some quirky thing like the method being too long and breaking the JIT as a result ... breaking up the method, if nothing else, could show which part of it is slow...
I think this method is called too often for my particular case. I know that
@havocp I have encountered similar performance issues parsing large files with multiple substitutions. Profiling indicates the majority of time is spent in the map copying. Both solutions decrease our resolve time from 45 seconds to about a second. Would you be open to replacing the HashMap with either of these maps? The pcollections map is probably more polished / well tested, but given the magnitude of the improvement with both maps and your understandable concern about binary size, I don't have a strong preference.
Rereading some of this and the code, I’d have no objection to reviving a fix here, either one of them really.
Sounds good. I can take a closer look at BadMap and look into adding some tests and submitting a pull request for that. |
Appreciated. @ktoso @2m @viktorklang if you have any strong preferences, let @sam-token know? I don't have a strong view other than "we should use one of these approaches and fix it".
@havocp Not sure what my preference is. Assuming that there are tests for BadMap, I'd probably prefer that.
I skimmed the BadMap; it might not be too bad for this specific case.
This issue should be reopened, as it was not resolved. Profiling indicates that the remaining cost is in building origin descriptions. I suggest lazily computing the description and trace information, so that the description is only calculated when needed to produce useful debug/error information. In normal happy-path usage the description would never be calculated.
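The lazy-computation idea can be sketched as follows (hypothetical names, not the library's actual API): wrap the expensive description building in a memoized supplier, so the work runs at most once and only on the error/debug path.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a lazily built origin description: the expensive
// string is computed only when describe() is first called, then cached.
// On the happy path describe() is never called, so the cost is never paid.
final class LazyDescription {
    private final Supplier<String> compute; // expensive description builder
    private volatile String cached;         // null until first use

    LazyDescription(Supplier<String> compute) {
        this.compute = compute;
    }

    String describe() {
        String d = cached;
        if (d == null) {
            d = compute.get(); // typically runs once; a benign race may rerun it
            cached = d;
        }
        return d;
    }
}

public class LazyDescriptionDemo {
    static int calls = 0;

    public static void main(String[] args) {
        LazyDescription desc = new LazyDescription(() -> {
            calls++;
            return "merged origin: a.conf, b.conf"; // stand-in for the real merge
        });
        System.out.println(calls); // prints 0 - nothing computed yet
        desc.describe();
        desc.describe();
        System.out.println(calls); // prints 1 - computed once, then cached
    }
}
```

This keeps the error messages just as informative while moving their cost off the resolve hot path.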
A change in substitution handling code (c8f42ef) released as part of 1.3.0 can lead to long running times for `Config.resolve()`. Performance issues with this commit are already indicated in the commit message and the comment in class `ResolveMemos`.

The following code requires more than 3 minutes on a fast machine:

Where `file` is this config file, which in turn includes this config.

Profiling indicates a hot spot in `ResolveMemos.put()`, where a copy of a HashMap is created.

Although I am a big fan of the library, I won't be able to create a fix/PR because I don't know the implementation that well. I appreciate any hint for a workaround (changes in the config file).
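Since the attached config files are not reproduced here, a hypothetical HOCON shape that triggers this behavior is one with many keys resolved via substitution: each resolved substitution adds a memo entry, and before a fix, each memo put copied the entire map.

```hocon
# Hypothetical example (not the attached file): many substitutions,
# each of which adds an entry to ResolveMemos during resolve()
base {
  host = "localhost"
  port = 9000
}
service-001.url = ${base.host}":"${base.port}"/s001"
service-002.url = ${base.host}":"${base.port}"/s002"
service-003.url = ${base.host}":"${base.port}"/s003"
# ... repeated for a few hundred services
```

With a few hundred such keys, the number of memo entries (and therefore the cost of each HashMap copy) grows with every substitution resolved.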