Miscellaneous render speedups #5225
Conversation
Force-pushed 49c992b to 0fd4aba (Compare)
Signed-off-by: Marcel Bargull <[email protected]>
Force-pushed 0fd4aba to 7355c72 (Compare)
pre-commit.ci autofix
for more information, see https://pre-commit.ci
_selector_placeholder_os_environ = object()


@lru_cache(maxsize=200)
These `maxsize=200` values are more or less arbitrarily chosen. The default is 128, IIRC, but I just wanted to specify some (reasonable) number so that the temptation to add `maxsize=None` later is lessened. We may deliberately not want to use `maxsize=None`, to avoid unbounded resource usage downstream, e.g. in case multiple recipes are rendered in the same process.
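As a rough illustration of the tradeoff (a minimal sketch, not code from this PR; the helper name is hypothetical):

```python
from functools import lru_cache

# A bounded cache caps memory even if many distinct recipes are rendered
# in one long-lived process; maxsize=None would grow without limit.
@lru_cache(maxsize=200)
def normalize_selector(expr: str) -> str:
    # Stand-in for a pure, repeatedly called helper.
    return " ".join(expr.split())

normalize_selector("linux  and  x86_64")
normalize_selector("linux  and  x86_64")  # second call is served from the cache
print(normalize_selector.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=200, currsize=1)
```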
Memoizing compiled patterns might give a slight advantage here
As mentioned by @dholth below, the `re` module already caches compiled patterns. In contrast to `re`'s functions, the wrappers here cache not only per pattern but also per matched string. This is because we have redundant `re.*(same_pattern, same_string)` calls due to the repeated re-processing mentioned in #5224 (comment). As such, the memoizations here try to sidestep not the recompilation of patterns but the actual regex searches/matches.
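To illustrate the distinction (a sketch only; `cached_re_match` is a hypothetical name, not the PR's exact wrapper): `re` caches the compiled pattern, but every call still executes the match, whereas an `lru_cache` keyed on `(pattern, string, flags)` returns the previous result outright on repeated calls.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=200)
def cached_re_match(pattern, string, flags=0):
    # re.match() reuses the compiled pattern from re's internal cache,
    # but still runs the match; lru_cache skips even that on repeated calls.
    return re.match(pattern, string, flags)

cached_re_match(r"py(\d+)", "py39")  # compiles (or reuses) the pattern, runs the match
cached_re_match(r"py(\d+)", "py39")  # identical call: answered from the lru_cache
```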
Oh this is fabulous! Thank you!
Signed-off-by: Marcel Bargull <[email protected]> Co-authored-by: Matthew R. Becker <[email protected]>
Another case of a hyper-optimization by adding a bunch of hacks, notably adding pickle into the system where there was no need before. As-is, this can't be merged.
# Use pickle.loads(pickle.dumps(...)) as a faster deepcopy alternative, if possible.
try:
    new.variant = pickle.loads(pickle.dumps(self.variant, -1))
except:
    new.variant = copy.deepcopy(self.variant)
if hasattr(self, "variants"):
    new.variants = copy.deepcopy(self.variants)
    try:
        new.variants = pickle.loads(pickle.dumps(self.variants, -1))
    except:
        new.variants = copy.deepcopy(self.variants)
Let's do this without adding pickle and a bare except
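For comparison only, one hypothetical way to keep a fast-copy helper while at least narrowing the bare `except` (this is not what the PR does, and it still uses pickle, so it only addresses half of the concern; `_fast_copy` is a made-up name):

```python
import copy
import pickle

def _fast_copy(obj):
    # Try the pickle round-trip shortcut and fall back to copy.deepcopy
    # only on the expected, specific failure modes for unpicklable objects.
    try:
        return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
    except (pickle.PicklingError, TypeError, AttributeError):
        return copy.deepcopy(obj)
```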
def re_findall(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.findall(text, flags)
    return re.findall(pattern, text, flags)


@lru_cache(maxsize=200)
def re_match(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.match(text, flags)
    return re.match(pattern, text, flags)


@lru_cache(maxsize=200)
def re_search(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.search(text, flags)
    return re.search(pattern, text, flags)
Not a fan of this repetition and the pattern of having to cache regex
Personally, I'm also triggered by unnecessary repetition, but I'm fine with it in this case since it should be a set-and-forget thing. That said, it would make sense to profile this (for majorly affected cases) again to see which of these memoization wrappers have individually significant impact. Meaning, we possibly only need a subset of them to get roughly the same results.
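If the repetition itself is the main objection, a factory along these lines could generate the wrappers instead; this is purely a hypothetical sketch, not something proposed in this PR (note that compiled patterns already carry their flags, so `flags` is only forwarded for string patterns here):

```python
import re
from functools import lru_cache

def _make_cached_re_helper(func_name, maxsize=200):
    # Build one memoized wrapper per re function instead of writing
    # three near-identical definitions by hand.
    module_level_func = getattr(re, func_name)

    @lru_cache(maxsize=maxsize)
    def helper(pattern, text, flags=0):
        if isinstance(pattern, re.Pattern):
            return getattr(pattern, func_name)(text)
        return module_level_func(pattern, text, flags)

    helper.__name__ = f"re_{func_name}"
    return helper

re_findall = _make_cached_re_helper("findall")
re_match = _make_cached_re_helper("match")
re_search = _make_cached_re_helper("search")
```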
# Try to turn namespace into a hashable representation for memoization.
try:
    namespace_copy = namespace.copy()
    if namespace_copy.get("os") is os:
        namespace_copy["os"] = _selector_placeholder_os
    if namespace_copy.get("environ") is os.environ:
        namespace_copy["environ"] = _selector_placeholder_os_environ
    if "pin_run_as_build" in namespace_copy:
        # This raises TypeError if pin_run_as_build is not a dict of dicts.
        try:
            namespace_copy["pin_run_as_build"] = tuple(
                (key, tuple((k, v) for k, v in value.items()))
                for key, value in namespace_copy["pin_run_as_build"].items()
            )
        except (AttributeError, TypeError, ValueError):
            # AttributeError: no .items method
            # TypeError: .items() not iterable of iterables
            # ValueError: .items() not iterable of only (k, v) tuples
            raise TypeError
    for k, v in namespace_copy.items():
        # Convert list/sets/tuple to tuples (of tuples if it contains
        # list/set elements). Copy any other type verbatim and rather fall
        # back to the non-memoized version to avoid wrong/lossy conversions.
        if isinstance(v, (list, set, tuple)):
            namespace_copy[k] = tuple(
                tuple(e) if isinstance(e, (list, set)) else e for e in v
            )
    namespace_tuple = tuple(namespace_copy.items())
    # Raise TypeError if anything in namespace_tuple is not hashable.
    hash(namespace_tuple)
except TypeError:
    return _select_lines(data, namespace, variants_in_place)
return _select_lines_memoized(data, namespace_tuple, variants_in_place)
This needs tests to verify that it's doing the right thing
+1 Makes sense to add/extend tests for `select_lines` which ensure repeated calls to `select_lines` and the non-memoized `_select_lines` always yield the same results.
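A sketch of what such a test could look like (the module path and the private `_select_lines` signature are assumed from the diff above, so treat the names as hypothetical):

```python
from conda_build.metadata import select_lines, _select_lines

def test_select_lines_memoized_matches_plain():
    data = "always\nlinux-only  # [linux]\nwin-only  # [win]\n"
    namespace = {"linux": True, "win": False}
    expected = _select_lines(data, namespace, False)  # non-memoized reference
    for _ in range(3):
        # Repeated calls go through the memoized path and must stay identical.
        assert select_lines(data, namespace, False) == expected
```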
Enabling CodSpeed benchmarks and adding a new `test_select_lines_battery` test in #5233 so we can better measure improvements.
Please adjust your attitude here. I will not continue to work on this given such a preface. These optimizations are hacks around the actual problem, yes (see top comment).

There would be no need for this if the underlying redundant re-processing were fixed. Analyzing the root causes and untangling the code flow paths is much more elaborate work that I am not willing to do (as it enters the realm of questionable effort, given that resources could be directed to the newer recipe format instead). These changes have not been made out of fun, but because the problem poses a non-negligible strain on resources (compute and maintainer "wait" times) for conda-forge (one that I had not been aware of before); see the case laid out in #5224. If you do want alternative implementations to the proposed ones, please profile this yourself or get someone to do so (I've used …)
Hey @jezdez! Sorry I have been MIA in the conda world for a while. It may be useful to detail the pain points on render times throughout conda-forge. This is important context behind this PR. @mbargull mentioned a few of the locations, but I can detail more.
So I strongly disagree with the characterization of this PR as hyper-optimization. I do understand that code like this brings in additional maintenance burdens. @mbargull has been a long-term contributor to conda-build and is active in the community. I think he'll be around to help on maintenance here, but he should speak for himself.
For reference, Python's built-in re cache appears to have 512 and 256 entries.
@beckermr, thanks for giving more details!
Yes, happy to do so -- but even happier to rip such workarounds out again whenever they cease to be relevant!
As context for everyone: while preparing conda 24.3.0 yesterday, I found conda/conda#13568 had been merged with little to no discussion, which introduced a custom version parser into the recently added conda deprecations system, diverging from a standard parser. This is what I referred to above as "another case" related to inconvenient tradeoffs. Generally speaking, I'm trying to encourage following engineering practices that solve the core problems rather than pile hacks onto hacks. I'll try harder to elaborate why I think some tradeoffs are not worth making in that context.
To be clear, giving feedback of requiring changes to this PR does not imply that I have to provide a better solution instead. I agree that it is worth documenting the performance bottleneck that you're seeing (which you started in #5224, thank you!), so we can make the appropriate fixes without having to compromise on code quality. You know that's how we got into this mess in the first place.
I totally agree! This is significant work and it's not helped by patching over it again.
I understand the context as I've read the ticket before making the code review, and take my responsibilities seriously, just like your contribution. I'm sorry if my review didn't live up to that.
Thanks for the advice; I'll add this to #5224, if you don't mind, since it would enable others to pick this up as well.
Hello! No apologies needed, as always :)
This is good context, which was missing in #5224; thanks for elaborating here!
I have to disagree that it's been effective for conda and conda-build to move solving fundamental problems to a later time; it's made both increasingly hard to maintain, maybe in some areas even impossible, as @mbargull alluded to in relation to the flow of the application logic.
The deepcopy code is a slow way to copy Python objects; json loads/dumps or pickle are known optimizations. We used the technique in conda-index to save time. In an old Stack Overflow post, e.g., the pickle loop is 2.7x faster than copy.deepcopy. On my machine, against the https://conda.anaconda.org/conda-forge/noarch repodata.json, …
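A rough micro-benchmark of the technique under discussion (a sketch with made-up sample data; actual numbers depend on the data shape and machine):

```python
import copy
import pickle
import timeit

# Nested dict/list structure loosely resembling rendered recipe metadata.
data = {
    "packages": [
        {"name": f"pkg-{i}", "depends": ["python >=3.8", "numpy"]} for i in range(1000)
    ]
}

deepcopy_time = timeit.timeit(lambda: copy.deepcopy(data), number=100)
pickle_time = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(data, pickle.HIGHEST_PROTOCOL)), number=100
)
print(f"copy.deepcopy: {deepcopy_time:.3f}s   pickle round-trip: {pickle_time:.3f}s")
```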
Yes, this point is totally fair. However, I don't think this PR is meant to be used instead of solving those problems. We have the tools we have, and there is some balance to be struck between only future-looking work and helping people along now with usability issues. Taken to the extreme, which you are not doing OFC, one could claim fixing bugs in old code is not warranted since that effort should be used to build something better. We will have to strike a balance. You of course know all of these things. :) Finally, even if conda-build didn't do repeated parsing, we'd likely still want some, if not all, of these optimizations. There is no reason not to use a compiled regex from a cache, as evidenced by the Python source code references above (edit: ditto for the deepcopy stuff too, apparently!). There is also no reason to excessively copy data structures. Given how fraught changing those copies to references in other spots might be right now, the proposed changes here are a pretty reasonable compromise IMHO.
@jezdez, thanks for elaborating more. To be clear from my side: none of these memoization approaches (in this PR or already elsewhere in the code base) are what I want us to have. Do note that the changes herein are intentionally structured such that they can easily be reverted:
These workarounds do, of course, unfortunately add maintenance burden. That being said, what this PR definitely needs is more explanatory comments (if we add workarounds, we should mark them as such and explain, e.g., why the heck we'd add 50 lines only to be able to memoize …
I opted to not use …
Ha, I'd even argue against this. If we'd eliminate most of the redundant repetitions, these changes might have such a small impact that I'd definitely be more on the side of "maintenance burden not worth it given the negligible impact" ;).
Thanks, everyone, for your input here. In an effort to get this discussion moving again (and to help us make data-driven decisions with benchmarks), this PR is superseded by several smaller PRs with isolated changes, for easier review and a clearer understanding of which are syntax improvements versus speed improvements:
And an alternative solution to the …
Description
Addresses parts of gh-5224.
The proposed changes do not fix the actual underlying issues with redundancy in the recipe re-processing happening while rendering.
They do treat symptoms of it, though.
This is done by avoiding a couple of obvious redundant function calls, substituting `copy.deepcopy` by `pickle.dumps/loads`, and foremost memoizing the lower-level `select_lines` and regular expression evaluations.

It is intentionally narrowly scoped and does not alter the overall rendering process, to avoid potential breaking changes (i.e., this PR is meant to be able to be "merged whenever" with no further dependencies/conflicts).
For the `conda-forge/arrow-cpp-feedstock` case from gh-5224 it cut the overall processing time to about a quarter.

Checklist - did you ...

- [ ] Add a file to the `news` directory (using the template) for the next release's release notes?
- [ ] Add / update necessary tests?
- [ ] Add / update outdated documentation?