
Miscellaneous render speedups #5225

Status: Closed (wants to merge 10 commits)

Conversation

@mbargull (Member) commented Mar 11, 2024

Description

Addresses parts of gh-5224.

The proposed changes do not fix the actual underlying issues with redundancy in the recipe re-processing happening while rendering.
They do treat symptoms of it, though.
This is done by avoiding a couple of obviously redundant function calls, substituting copy.deepcopy with pickle.dumps/loads, and, foremost, memoizing the lower-level select_lines and regular-expression evaluations.

It is intentionally narrowly scoped and does not alter the overall rendering process to avoid potential breaking changes (i.e., this PR is meant to be able to be "merged whenever" with no further dependencies/conflicts).

For the conda-forge/arrow-cpp-feedstock case from gh-5224 it cut the overall processing time to about a quarter.

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • [ ] Add / update necessary tests?
  • [ ] Add / update outdated documentation?

@conda-bot added the cla-signed label ([bot] added once the contributor has signed the CLA) on Mar 11, 2024
@mbargull force-pushed the render-misc-speedups branch from 49c992b to 0fd4aba on March 11, 2024 13:58
@mbargull force-pushed the render-misc-speedups branch from 0fd4aba to 7355c72 on March 11, 2024 13:59
@mbargull (Member Author):

pre-commit.ci autofix

_selector_placeholder_os_environ = object()


@lru_cache(maxsize=200)
@mbargull (Member Author) commented on this diff:

These maxsize=200 values are more or less arbitrarily chosen.
The default is 128 IIRC, but I just wanted to specify some (reasonable) number such that the temptation to add maxsize=None later is lessened.
We may want to deliberately avoid maxsize=None to prevent unbounded resource usage downstream, e.g., in case multiple recipes are rendered in the same process.
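For illustration, a minimal sketch of the bounded-cache behavior being discussed (the function name is hypothetical and not from this PR): with a finite maxsize, functools.lru_cache evicts the least recently used entries once the cache fills up, whereas maxsize=None grows without bound.

from functools import lru_cache

@lru_cache(maxsize=200)
def expensive(key):
    # Stand-in for an expensive computation (hypothetical example).
    return key.upper()

expensive("a"); expensive("a")
print(expensive.cache_info())  # e.g. CacheInfo(hits=1, misses=1, maxsize=200, currsize=1)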

A Member commented:

Memoizing compiled patterns might give a slight advantage here

@mbargull (Member Author) replied:

As mentioned by @dholth below, the re module already caches compiled patterns.
In contrast to re's functions, the wrappers here cache not only on the pattern but also on the matched string.
This is because we have redundant re.*(same_pattern, same_string) calls due to the repeated re-processing mentioned in #5224 (comment) .
As such, the memoizations here do not try to sidestep the recompilation of patterns but the actual regex searches/matches.
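To illustrate the distinction, a sketch with assumed names (not code from this PR): re's own cache only avoids recompiling the pattern, while an lru_cache keyed on the (pattern, string) pair also skips re-running the match when the exact same call is repeated.

import re
from functools import lru_cache

@lru_cache(maxsize=200)
def cached_search(pattern, text, flags=0):
    # Caches the match result for the (pattern, text, flags) triple;
    # plain re.search would re-scan text on every call even though the
    # compiled pattern itself is cached by the re module.
    return re.search(pattern, text, flags)

line = "skip: true  # [win]"
first = cached_search(r"#\s*\[(.*)\]", line)   # runs the regex engine
second = cached_search(r"#\s*\[(.*)\]", line)  # cache hit, no re-scan
assert first is second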

@mbargull mbargull marked this pull request as ready for review March 11, 2024 17:29
@mbargull mbargull requested a review from a team as a code owner March 11, 2024 17:29
@beckermr (Contributor):

Oh this is fabulous! Thank you!

conda_build/config.py (outdated review comment, resolved)
Signed-off-by: Marcel Bargull <[email protected]>
Co-authored-by: Matthew R. Becker <[email protected]>
@jezdez (Member) left a comment:

Another case of a hyper optimization by adding a bunch of hacks, notably adding pickle into the system where there is no need before. As-is, this can't be merged.

Comment on lines +826 to +835
        # Use pickle.loads(pickle.dumps(...)) as a faster deepcopy alternative, if possible.
        try:
            new.variant = pickle.loads(pickle.dumps(self.variant, -1))
        except:
            new.variant = copy.deepcopy(self.variant)
        if hasattr(self, "variants"):
            try:
                new.variants = pickle.loads(pickle.dumps(self.variants, -1))
            except:
                new.variants = copy.deepcopy(self.variants)
A Member commented:

Let's do this without adding pickle and a bare except

Comment on lines +118 to +135
def re_findall(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.findall(text, flags)
    return re.findall(pattern, text, flags)


@lru_cache(maxsize=200)
def re_match(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.match(text, flags)
    return re.match(pattern, text, flags)


@lru_cache(maxsize=200)
def re_search(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.search(text, flags)
    return re.search(pattern, text, flags)
A Member commented:

Not a fan of this repetition and the pattern of having to cache regex

@mbargull (Member Author) replied:

Personally, I'm also triggered by unnecessary repetitions, but I'm fine with it in this case since it should be a set-and-forget thing.

That said, it would make sense to profile this (for majorly affected cases) again to see which of these memoization wrappers individually have a significant impact.
Meaning, we possibly only need a subset of them to get roughly the same results.

Comment on lines +296 to +328
    # Try to turn namespace into a hashable representation for memoization.
    try:
        namespace_copy = namespace.copy()
        if namespace_copy.get("os") is os:
            namespace_copy["os"] = _selector_placeholder_os
        if namespace_copy.get("environ") is os.environ:
            namespace_copy["environ"] = _selector_placeholder_os_environ
        if "pin_run_as_build" in namespace_copy:
            # This raises TypeError if pin_run_as_build is not a dict of dicts.
            try:
                namespace_copy["pin_run_as_build"] = tuple(
                    (key, tuple((k, v) for k, v in value.items()))
                    for key, value in namespace_copy["pin_run_as_build"].items()
                )
            except (AttributeError, TypeError, ValueError):
                # AttributeError: no .items method
                # TypeError: .items() not iterable of iterables
                # ValueError: .items() not iterable of only (k, v) tuples
                raise TypeError
        for k, v in namespace_copy.items():
            # Convert list/sets/tuple to tuples (of tuples if it contains
            # list/set elements). Copy any other type verbatim and rather fall
            # back to the non-memoized version to avoid wrong/lossy conversions.
            if isinstance(v, (list, set, tuple)):
                namespace_copy[k] = tuple(
                    tuple(e) if isinstance(e, (list, set)) else e for e in v
                )
        namespace_tuple = tuple(namespace_copy.items())
        # Raise TypeError if anything in namespace_tuple is not hashable.
        hash(namespace_tuple)
    except TypeError:
        return _select_lines(data, namespace, variants_in_place)
    return _select_lines_memoized(data, namespace_tuple, variants_in_place)
A Member commented:

This needs tests to verify that it's doing the right thing

@mbargull (Member Author) replied:

+1. It makes sense to add/extend tests for select_lines which ensure that repeated calls to select_lines and the non-memoized _select_lines always yield the same results.
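A rough sketch of what such a test could look like (the import path, the _select_lines/select_lines names, and the sample data are assumptions based on this PR and may not match the final code):

from conda_build.metadata import _select_lines, select_lines  # assumed import path

def test_select_lines_memoized_is_consistent():
    data = "build:\n  number: 0\n  skip: true  # [win]\n"
    namespace = {"win": False, "linux": True}
    expected = _select_lines(data, namespace, False)
    # Repeated calls through the memoized wrapper must match the
    # non-memoized result every time.
    for _ in range(3):
        assert select_lines(data, namespace, False) == expected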

A Contributor commented:

Enabling CodSpeed benchmarks and adding a new test_select_lines_battery test in #5233 so we can better measure improvements

@mbargull (Member Author):

> Another case of a hyper optimization by adding a bunch of hacks, notably adding pickle into the system where there is no need before.

Please adjust your attitude here. I will not continue to work on this given such a preface.

These optimizations are hacks around the actual problem, yes (see top comment).
The actual problem has not been addressed because of hard-to-reason-about code flow with many places touching global or semi-global state, which is why this is scoped to functions that do not interact with that (semi-)global state.

> notably adding pickle into the system where there is no need before.

There would be no need for this if .config.Config.copy were not called an excessive number of times (same code flow/state problem as above). Unless the number of calls is reduced, one needs to replace deepcopy with something else like pickle. I'm not a fan of using pickle here (it still imposes high resource usage) but also not a fan of having deepcopy here at all; if you have better or at least comparable solutions to this, please do apply them.

Analyzing the root causes and untangling the code flow paths is much more elaborate work that I am not willing to do (as it enters the realm of questionable effort given that resources could be directed to the newer recipe format instead).


These changes were not made for fun, but because this poses a non-negligible strain on resources (compute and maintainer "wait" times) for conda-forge (one that I had not been aware of before); see the case laid out in #5224.

If you do want alternative implementations to the proposed ones, please profile this yourself or get someone to do so (I've used pyinstrument for sampling and yappi for tracing -- advice: only use yappi with the changes from this PR or on a smaller test case than a conda-smithy rerender of conda-forge/arrow-cpp-feedstock); it's otherwise hard to reason about the impact of such changes.

@beckermr (Contributor):

Hey @jezdez! Sorry I have been MIA in the conda world for a while.

It may be useful to detail the pain points on render times throughout conda-forge. This is important context behind this PR. @mbargull mentioned a few of the locations, but I can detail more.

  • As @mbargull mentioned, long rerenders are becoming more common as we expand our build matrices and for certain things like compilers. This stresses our automated rerendering service and causes usability issues.
  • The conda-forge autotick bot spends a significant amount of time rerendering feedstocks. The long rerenders rate-limit the bot and stress its infrastructure.
  • Every conda-forge build of an actual package also renders the recipe. Summed over all of the packages we build, even small changes in performance will have a big effect on our CI usage.

So I strongly disagree with the characterization of this PR as hyper-optimization. I do understand that code like this brings in additional maintenance burdens. @mbargull has been a long-term contributor to conda-build and is active in the community. I think he'll be around to help on maintenance here, but he should speak for himself.

@dholth (Contributor) commented Mar 12, 2024

For reference, Python's built-in re cache appears to have 512 and 256 entries.
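For anyone who wants to check, those sizes are private CPython implementation details and can be inspected directly (they may differ between Python versions):

import re

# _MAXCACHE bounds the compiled-pattern cache (512 in recent CPython);
# _MAXCACHE2, where present, bounds a second, smaller cache (256).
print(getattr(re, "_MAXCACHE", None), getattr(re, "_MAXCACHE2", None))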

@mbargull (Member Author) commented Mar 12, 2024

@beckermr, thanks for giving more details!
You are far more involved with the day-to-day workings at conda-forge and knowledgeable about what this impacts.
The motivation for this PR is mostly an empathy thing, realizing that you and the others have to deal with it...
To me personally, this is actually very low priority since I'm at most tangentially (if at all) affected by this.
Which is also to say: IMO, the decision on when to include these or similar changes (24.5 already? this year? next year?) is somewhat dependent on how the affected parties push for it.

> @mbargull has been a long-term contributor to conda-build and is active in the community. I think he'll be around to help on maintenance here, but he should speak for himself.

Yes, happy to do so -- but even happier to rip such workarounds out again whenever they cease to be relevant!

@jezdez (Member) commented Mar 12, 2024

> > Another case of a hyper optimization by adding a bunch of hacks, notably adding pickle into the system where there is no need before.
>
> Please adjust your attitude here. I will not continue to work on this given such a preface.

As context for everyone, while preparing conda 24.3.0 yesterday, I found conda/conda#13568 had been merged with little to no discussion; it introduced a custom version parser into the recently added conda deprecations system, diverging from a standard parser. This is what I referred to above as "another case" related to inconvenient tradeoffs.

Generally speaking, I'm trying to encourage following engineering practices that solve the core problems rather than piling hacks onto hacks. I'll try harder to elaborate on why I think some tradeoffs are not worth making in that context.

> These optimizations are hacks around the actual problem, yes (see top comment). The actual problem has not been addressed because of hard-to-reason-about code flow with many places touching global or semi-global state, which is why this is scoped to functions that do not interact with that (semi-)global state.
>
> > notably adding pickle into the system where there is no need before.
>
> There would be no need for this if .config.Config.copy were not called an excessive number of times (same code flow/state problem as above). Unless the number of calls is reduced, one needs to replace deepcopy with something else like pickle. I'm not a fan of using pickle here (it still imposes high resource usage) but also not a fan of having deepcopy here at all; if you have better or at least comparable solutions to this, please do apply them.

To be clear, giving feedback requiring changes to this PR does not imply that I have to provide a better solution instead. I agree that it is worth documenting the performance bottleneck that you're seeing (which you started in #5224, thank you!), so we can make the appropriate fixes without having to compromise on code quality. You know that's how we got into this mess in the first place.

> Analyzing the root causes and untangling the code flow paths is much more elaborate work that I am not willing to do (as it enters the realm of questionable effort given that resources could be directed to the newer recipe format instead).

I totally agree! This is significant work and it's not helped by patching over it again.

> These changes were not made for fun, but because this poses a non-negligible strain on resources (compute and maintainer "wait" times) for conda-forge (one that I had not been aware of before); see the case laid out in #5224.

I understand the context as I've read the ticket before making the code review, and take my responsibilities seriously, just like your contribution. I'm sorry if my review didn't live up to that.

> If you do want alternative implementations to the proposed ones, please profile this yourself or get someone to do so (I've used pyinstrument for sampling and yappi for tracing -- advice: only use yappi with the changes from this PR or on a smaller test case than a conda-smithy rerender of conda-forge/arrow-cpp-feedstock); it's otherwise hard to reason about the impact of such changes.

Thanks for the advice; I'll add this to #5224, if you don't mind, since it would enable others to pick this up as well.

@jezdez (Member) commented Mar 12, 2024

> Hey @jezdez! Sorry I have been MIA in the conda world for a while.

Hello! No apologies needed, as always :)

> It may be useful to detail the pain points on render times throughout conda-forge. This is important context behind this PR. @mbargull mentioned a few of the locations, but I can detail more.
>
>   • As @mbargull mentioned, long rerenders are becoming more common as we expand our build matrices and for certain things like compilers. This stresses our automated rerendering service and causes usability issues.
>   • The conda-forge autotick bot spends a significant amount of time rerendering feedstocks. The long rerenders rate-limit the bot and stress its infrastructure.
>   • Every conda-forge build of an actual package also renders the recipe. Summed over all of the packages we build, even small changes in performance will have a big effect on our CI usage.

This is good context, which was missing in #5224; thanks for elaborating here!

> So I strongly disagree with the characterization of this PR as hyper-optimization. I do understand that code like this brings in additional maintenance burdens. @mbargull has been a long-term contributor to conda-build and is active in the community. I think he'll be around to help on maintenance here, but he should speak for himself.

I have to disagree that it's been effective for conda and conda-build to move solving fundamental problems to a later time; it's made both increasingly hard to maintain, maybe in some areas even impossible, as @mbargull alluded to in relation to the flow of the application logic.

@dholth (Contributor) commented Mar 12, 2024

The deepcopy code is a slow way to copy Python objects; json loads/dumps or pickle are known optimizations. We used the technique in conda-index to save time. In an old Stack Overflow post, for example, the pickle loop is 2.7x faster than copy.deepcopy.

On my machine, against https://conda.anaconda.org/conda-forge/noarch repodata.json:

>>> timeit.timeit(lambda: copy.deepcopy(r), number=10)
15.68
>>> timeit.timeit(lambda: json.loads(json.dumps(r)), number=10)
7.88
>>> timeit.timeit(lambda: pickle.loads(pickle.dumps(r)), number=10)
8.26
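For context, a rough reconstruction of the setup implied by that snippet (it assumes the noarch repodata.json has already been downloaded to the working directory; absolute numbers will vary by machine):

import copy
import json
import pickle
import timeit

with open("repodata.json") as f:  # conda-forge/noarch repodata, fetched beforehand
    r = json.load(f)

print(timeit.timeit(lambda: copy.deepcopy(r), number=10))
print(timeit.timeit(lambda: json.loads(json.dumps(r)), number=10))
print(timeit.timeit(lambda: pickle.loads(pickle.dumps(r)), number=10))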

@beckermr (Contributor) commented Mar 12, 2024

> I have to disagree that it's been effective for conda and conda-build to move solving fundamental problems to a later time; it's made both increasingly hard to maintain, maybe in some areas even impossible, as @mbargull alluded to in relation to the flow of the application logic.

Yes, this point is totally fair. However, I don't think this PR is meant to be used instead of solving those problems. We have the tools we have, and there is some balance to be struck between only future-looking work and helping people along now with usability issues. Taken to the extreme, which you are not doing OFC, one could claim fixing bugs in old code is not warranted since that effort should be used to build something better. We will have to strike a balance. You of course know all of these things. :)

Finally, even if conda-build didn't do repeated parsing, we'd likely still want some, if not all, of these optimizations. There is no reason to not use a compiled regex from a cache, as evidenced by the Python source code references above. (edit: ditto for the deepcopy stuff too apparently!) There is also no reason to excessively copy data structures. Given how fraught changing those copies to references in other spots might be right now, the proposed changes here are a pretty reasonable compromise IMHO.

@mbargull (Member Author):

@jezdez, thanks for elaborating more.
To explain my adverse reaction to your initial review: the main review comment read more as a blanket statement which, to me, gave the overall issue (and how we have to tackle it) less recognition than warranted and boded a likely less productive and more strenuous discussion (which I'm not eager to invest my time in).

To be clear from my side: None of these memoization approaches (in this PR and already elsewhere in the code base) are what I want us to have.
The problem we are facing is that we would want actual fixes, but coming up with those requires an unknown amount of work (that I can't provide at the moment and I don't know who else would do/sponsor such work).
Since conda-build is continuously used, we'd have to decide whether we are OK with leaving things as is until someone provides actual fixes or whether we want to provide some aid in the meantime that can be easily reverted once it's fixed.
(This is similar to what I tell people who voice that it would be better to more or less only focus on rattler-build and not invest in conda-build anymore; until we have a solution ready for direct deployment, we need pragmatic approaches.)

Do note that the changes herein are intentionally structured such that they can easily be reverted:

  1. Different changes are put into separate commits so they can be reverted (and also cherry-picked if we want to split this),
  2. Even if single commits can't be automatically reverted, one only has to rename function names/calls (_select_lines --> select_lines; re_* --> re.*) or re-indent code (in Config.copy) and remove the then extraneous code.
  3. The actual workings are unchanged without any hard to follow restructuring (which is what makes point 2 possible).

These workarounds, of course, do unfortunately add maintenance burden.
In my opinion, due to the above points and the narrowed scope of the changes, it is an acceptable burden to take.
That is, not in the general sense, but only in light of the strain on the (conda-forge) ecosystem this is trying to alleviate.


That being said, what this PR definitely needs is more explanatory comments (if we add workarounds, we should mark them as such and explain, e.g., why the heck we'd add 50 lines only to be able to memoize select_lines) and ideally someone to reproduce/confirm the impact of these changes (since we have to gauge impact vs. added maintenance).

@mbargull (Member Author):

> The deepcopy code is a slow way to copy Python objects; json loads/dumps or pickle are known optimizations.

I opted not to use json for this since, depending on your data, it is a destructive process.
E.g., not only would it not capture instances of custom classes but also loads(dumps({1:2})) == {"1": 2} != {1: 2}.
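A quick demonstration of that lossiness (plain Python, nothing PR-specific):

import json
import pickle

d = {1: 2}
assert json.loads(json.dumps(d)) == {"1": 2}        # int key silently becomes a str key
assert pickle.loads(pickle.dumps(d, -1)) == {1: 2}  # pickle round-trip preserves key types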

@mbargull (Member Author):

> Finally, even if conda-build didn't do repeated parsing, we'd likely still want some, if not all, of these optimizations. There is no reason to not use a compiled regex from a cache, as evidenced by the Python source code references above. (edit: ditto for the deepcopy stuff too apparently!)

Ha, I'd even argue against this. If we'd eliminate most of the redundant repetitions, these changes might have such a small impact that I'd definitely be more on the side of "maintenance burden not worth it given the negligible impact" ;).

@kenodegard (Contributor):

Thanks, everyone, for your input here. In an effort to get this discussion moving again (and to help us make data-driven decisions with benchmarks), this PR is superseded by several other PRs.

Smaller PRs with isolated changes for easier review and clearer understanding of which are syntax improvements versus speed improvements:

And an alternative solution to the select_lines caching proposed here:
