Miscellaneous render speedups #5225
Conversation
Force-pushed 49c992b to 0fd4aba (Compare)
Signed-off-by: Marcel Bargull <[email protected]>
Force-pushed 0fd4aba to 7355c72 (Compare)
pre-commit.ci autofix
for more information, see https://pre-commit.ci
_selector_placeholder_os_environ = object()


@lru_cache(maxsize=200)
These `maxsize=200` values are more or less arbitrarily chosen. The default is 128, IIRC, but I just wanted to specify some (reasonable) number so that the temptation to add `maxsize=None` later is lessened. We may deliberately not want to use `maxsize=None`, to avoid unbounded resource usage downstream, e.g. in case multiple recipes are rendered in the same process.
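As a rough illustration of the tradeoff (a minimal sketch, not code from this PR; the helper name is hypothetical):

```python
from functools import lru_cache

# A bounded cache caps memory even if many distinct recipes are rendered
# in one long-lived process; maxsize=None would grow without limit.
@lru_cache(maxsize=200)
def normalize_selector(expr: str) -> str:
    # Stand-in for a pure, repeatedly called helper.
    return " ".join(expr.split())

normalize_selector("linux  and  x86_64")
normalize_selector("linux  and  x86_64")  # second call is served from the cache
print(normalize_selector.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=200, currsize=1)
```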
Memoizing compiled patterns might give a slight advantage here
As mentioned by @dholth below, the `re` module already caches compiled patterns. In contrast to `re`'s functions, the wrappers here cache not only per pattern but also per matched string. This is because we have redundant `re.*(same_pattern, same_string)` calls due to the repeated re-processing mentioned in #5224 (comment). As such, the memoizations here try to sidestep not the recompilation of patterns but the actual regex searches/matches.
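To illustrate the distinction (a sketch only; `cached_re_match` is a hypothetical name, not the PR's exact wrapper): `re` caches the compiled pattern, but every call still executes the match, whereas an `lru_cache` keyed on `(pattern, string, flags)` returns the previous result outright on repeated calls.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=200)
def cached_re_match(pattern, string, flags=0):
    # re.match() reuses the compiled pattern from re's internal cache,
    # but still runs the match; lru_cache skips even that on repeated calls.
    return re.match(pattern, string, flags)

cached_re_match(r"py(\d+)", "py39")  # compiles (or reuses) the pattern, runs the match
cached_re_match(r"py(\d+)", "py39")  # identical call: answered from the lru_cache
```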
Oh this is fabulous! Thank you!
Signed-off-by: Marcel Bargull <[email protected]> Co-authored-by: Matthew R. Becker <[email protected]>
Another case of a hyper-optimization by adding a bunch of hacks, notably adding pickle into the system where there was no need before. As-is, this can't be merged.
# Use pickle.loads(pickle.dumps(...)) as a faster deepcopy alternative, if possible.
try:
    new.variant = pickle.loads(pickle.dumps(self.variant, -1))
except:
    new.variant = copy.deepcopy(self.variant)
if hasattr(self, "variants"):
    new.variants = copy.deepcopy(self.variants)
    try:
        new.variants = pickle.loads(pickle.dumps(self.variants, -1))
    except:
        new.variants = copy.deepcopy(self.variants)
Let's do this without adding pickle and a bare except
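For comparison only, one hypothetical way to keep a fast-copy helper while at least narrowing the bare `except` (this is not what the PR does, and it still uses pickle, so it only addresses half of the concern; `_fast_copy` is a made-up name):

```python
import copy
import pickle

def _fast_copy(obj):
    # Try the pickle round-trip shortcut and fall back to copy.deepcopy
    # only on the expected, specific failure modes for unpicklable objects.
    try:
        return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
    except (pickle.PicklingError, TypeError, AttributeError):
        return copy.deepcopy(obj)
```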
def re_findall(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.findall(text, flags)
    return re.findall(pattern, text, flags)


@lru_cache(maxsize=200)
def re_match(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.match(text, flags)
    return re.match(pattern, text, flags)


@lru_cache(maxsize=200)
def re_search(pattern, text, flags=0):
    if isinstance(pattern, re.Pattern):
        return pattern.search(text, flags)
    return re.search(pattern, text, flags)
Not a fan of this repetition and the pattern of having to cache regex
Personally, I'm also triggered by unnecessary repetition, but I'm fine with it in this case since it should be a set-and-forget thing. That said, it would make sense to profile this (for majorly affected cases) again to see which of these memoization wrappers have individually significant impact. Meaning, we possibly only need a subset of them to get roughly the same results.
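If the repetition itself is the main objection, a factory along these lines could generate the wrappers instead; this is purely a hypothetical sketch, not something proposed in this PR (note that compiled patterns already carry their flags, so `flags` is only forwarded for string patterns here):

```python
import re
from functools import lru_cache

def _make_cached_re_helper(func_name, maxsize=200):
    # Build one memoized wrapper per re function instead of writing
    # three near-identical definitions by hand.
    module_level_func = getattr(re, func_name)

    @lru_cache(maxsize=maxsize)
    def helper(pattern, text, flags=0):
        if isinstance(pattern, re.Pattern):
            return getattr(pattern, func_name)(text)
        return module_level_func(pattern, text, flags)

    helper.__name__ = f"re_{func_name}"
    return helper

re_findall = _make_cached_re_helper("findall")
re_match = _make_cached_re_helper("match")
re_search = _make_cached_re_helper("search")
```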
# Try to turn namespace into a hashable representation for memoization.
try:
    namespace_copy = namespace.copy()
    if namespace_copy.get("os") is os:
        namespace_copy["os"] = _selector_placeholder_os
    if namespace_copy.get("environ") is os.environ:
        namespace_copy["environ"] = _selector_placeholder_os_environ
    if "pin_run_as_build" in namespace_copy:
        # This raises TypeError if pin_run_as_build is not a dict of dicts.
        try:
            namespace_copy["pin_run_as_build"] = tuple(
                (key, tuple((k, v) for k, v in value.items()))
                for key, value in namespace_copy["pin_run_as_build"].items()
            )
        except (AttributeError, TypeError, ValueError):
            # AttributeError: no .items method
            # TypeError: .items() not iterable of iterables
            # ValueError: .items() not iterable of only (k, v) tuples
            raise TypeError
    for k, v in namespace_copy.items():
        # Convert list/sets/tuple to tuples (of tuples if it contains
        # list/set elements). Copy any other type verbatim and rather fall
        # back to the non-memoized version to avoid wrong/lossy conversions.
        if isinstance(v, (list, set, tuple)):
            namespace_copy[k] = tuple(
                tuple(e) if isinstance(e, (list, set)) else e for e in v
            )
    namespace_tuple = tuple(namespace_copy.items())
    # Raise TypeError if anything in namespace_tuple is not hashable.
    hash(namespace_tuple)
except TypeError:
    return _select_lines(data, namespace, variants_in_place)
return _select_lines_memoized(data, namespace_tuple, variants_in_place)
This needs tests to verify that it's doing the right thing
+1 Makes sense to add/extend tests for `select_lines` which ensure repeated calls to `select_lines` and the non-memoized `_select_lines` always yield the same results.
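A sketch of what such a test could look like (the module path and the private `_select_lines` signature are assumed from the diff above, so treat the names as hypothetical):

```python
from conda_build.metadata import select_lines, _select_lines

def test_select_lines_memoized_matches_plain():
    data = "always\nlinux-only  # [linux]\nwin-only  # [win]\n"
    namespace = {"linux": True, "win": False}
    expected = _select_lines(data, namespace, False)  # non-memoized reference
    for _ in range(3):
        # Repeated calls go through the memoized path and must stay identical.
        assert select_lines(data, namespace, False) == expected
```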
Enabling CodSpeed benchmarks and adding a new `test_select_lines_battery` test in #5233 so we can better measure improvements.
Please adjust your attitude here. I will not continue to work on this given such a preface. These optimizations are hacks around the actual problem, yes (see top comment).

There would be no need for this if the underlying redundant re-processing were fixed. Analyzing the root causes and untangling the code flow paths is much more elaborate work that I am not willing to do (as it enters the realm of questionable effort, given that resources could be directed to the newer recipe format instead). These changes have not been made out of fun, but because the problem poses a non-negligible strain on resources (compute and maintainer "wait" times) for conda-forge (one that I had not been aware of before); see the case laid out in #5224. If you do want alternative implementations to the proposed ones, please profile this yourself or get someone to do so (I've used …)
Hey @jezdez! Sorry I have been MIA in the conda world for a while. It may be useful to detail the pain points on render times throughout conda-forge. This is important context behind this PR. @mbargull mentioned a few of the locations, but I can detail more.
So I strongly disagree with the characterization of this PR as hyper-optimization. I do understand that code like this brings in additional maintenance burdens. @mbargull has been a long-term contributor to conda-build and is active in the community. I think he'll be around to help on maintenance here, but he should speak for himself.
For reference, Python's built-in re cache appears to have 512 and 256 entries.
@beckermr, thanks for giving more details!
Yes, happy to do so -- but even happier to rip such workarounds out again whenever they cease to be relevant!
As context for everyone: while preparing conda 24.3.0 yesterday, I found conda/conda#13568 had been merged with little to no discussion, which introduced a custom version parser into the recently added conda deprecations system, diverging from a standard parser. This is what I referred to above as "another case" related to inconvenient tradeoffs. Generally speaking, I'm trying to encourage following engineering practices that solve the core problems rather than pile hacks onto hacks. I'll try harder to elaborate why I think some tradeoffs are not worth making in that context.
To be clear, giving feedback of requiring changes to this PR does not imply that I have to provide a better solution instead. I agree that it is worth documenting the performance bottleneck that you're seeing (which you started in #5224, thank you!), so we can make the appropriate fixes without having to compromise on code quality. You know that's how we got into this mess in the first place.
I totally agree! This is significant work and it's not helped by patching over it again.
I understand the context as I've read the ticket before making the code review, and take my responsibilities seriously, just like your contribution. I'm sorry if my review didn't live up to that.
Thanks for the advice; I'll add this to #5224, if you don't mind, since it would enable others to pick this up as well.
Hello! No apologies needed, as always :)
This is good context, which was missing in #5224; thanks for elaborating here!
I have to disagree that it's been effective for conda and conda-build to move solving fundamental problems to a later time; it's made both increasingly hard to maintain, maybe in some areas even impossible, as @mbargull alluded to in relation to the flow of the application logic.
The deepcopy code is a slow way to copy Python objects; json loads/dumps or pickle are known optimizations. We used the technique in conda-index to save time. In an old Stack Overflow post, e.g., the pickle loop is 2.7x faster than copy.deepcopy. On my machine, against the https://conda.anaconda.org/conda-forge/noarch repodata.json, …
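A rough micro-benchmark of the technique under discussion (a sketch with made-up sample data; actual numbers depend on the data shape and machine):

```python
import copy
import pickle
import timeit

# Nested dict/list structure loosely resembling rendered recipe metadata.
data = {
    "packages": [
        {"name": f"pkg-{i}", "depends": ["python >=3.8", "numpy"]} for i in range(1000)
    ]
}

deepcopy_time = timeit.timeit(lambda: copy.deepcopy(data), number=100)
pickle_time = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(data, pickle.HIGHEST_PROTOCOL)), number=100
)
print(f"copy.deepcopy: {deepcopy_time:.3f}s   pickle round-trip: {pickle_time:.3f}s")
```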
Yes, this point is totally fair. However, I don't think this PR is meant to be used instead of solving those problems. We have the tools we have, and there is some balance to be struck between only future-looking work and helping people along now with usability issues. Taken to the extreme, which you are not doing OFC, one could claim fixing bugs in old code is not warranted since that effort should be used to build something better. We will have to strike a balance. You of course know all of these things. :) Finally, even if conda-build didn't do repeated parsing, we'd likely still want some, if not all, of these optimizations. There is no reason not to use a compiled regex from a cache, as evidenced by the Python source code references above (edit: ditto for the deepcopy stuff too, apparently!). There is also no reason to excessively copy data structures. Given how fraught changing those copies to references in other spots might be right now, the proposed changes here are a pretty reasonable compromise IMHO.
@jezdez, thanks for elaborating more. To be clear from my side: none of these memoization approaches (in this PR or already elsewhere in the code base) are what I want us to have. Do note that the changes herein are intentionally structured such that they can easily be reverted:
These workarounds do, of course, unfortunately add maintenance burden. That being said, what this PR definitely needs is more explanatory comments (if we add workarounds, we should mark them as such and explain, e.g., why the heck we'd add 50 lines only to be able to memoize …
I opted to not use …
Ha, I'd even argue against this. If we'd eliminate most of the redundant repetitions, these changes might have such a small impact that I'd definitely be more on the side of "maintenance burden not worth it given the negligible impact" ;).
Thanks, everyone, for your input here. In an effort to get this discussion moving again (and to help us make data-driven decisions with benchmarks), this PR is superseded by several smaller PRs with isolated changes, for easier review and a clearer understanding of which are syntax improvements versus speed improvements:
And an alternative solution to the …
Description
Addresses parts of gh-5224.
The proposed changes do not fix the actual underlying issues with redundancy in the recipe re-processing happening while rendering.
They do treat symptoms of it, though.
This is done by avoiding a couple of obvious redundant function calls, substituting `copy.deepcopy` by `pickle.dumps/loads`, and foremost memoizing the lower-level `select_lines` and regular expression evaluations.

It is intentionally narrowly scoped and does not alter the overall rendering process, to avoid potential breaking changes (i.e., this PR is meant to be able to be "merged whenever" with no further dependencies/conflicts).
For the `conda-forge/arrow-cpp-feedstock` case from gh-5224 it cut the overall processing time to about a quarter.

Checklist - did you ...

- [ ] Add a file to the `news` directory (using the template) for the next release's release notes?
- [ ] Add / update necessary tests?
- [ ] Add / update outdated documentation?