Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cached _split_line_selector to avoid redundant parsing in select_lines #5237

Merged
merged 13 commits into from
Apr 18, 2024

Conversation

kenodegard
Copy link
Contributor

@kenodegard kenodegard commented Mar 14, 2024

Description

A continuation of #5233 with inspiration from #5225.

When rendering a recipe we end up invoking conda_build.metadata.select_lines a number of times.

Within select_lines we first iterate over every line and split the line into the line content versus the selector and second we evaluate the selector to determine whether the line should be kept.

The first process of iterating over every line and splitting it should (and can) be a one time operation. This is gained by introducing a new _split_line_selector function which caches the parsed result for a given text.

The second process of evaluating the selectors can also be cached but needs to be cached differently. Instead of caching this globally (since selectors can change between rendering passes) we cache the selectors locally, so a given selector will only be evaluated once for a single pass of the text in question (it's common for a given selector to be used multiple times within a single file, e.g., conda-forge's conda_buid_config.yaml).

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Mar 14, 2024
Copy link

codspeed-hq bot commented Mar 14, 2024

CodSpeed Performance Report

Merging #5237 will improve performances by ×9.8

Comparing kenodegard:select_lines-speedup (4035765) with main (28b51fb)

Summary

⚡ 2 improvements
✅ 1 untouched benchmarks

Benchmarks breakdown

Benchmark main kenodegard:select_lines-speedup Change
test_pin_subpackage_benchmark 82.7 s 70.5 s +17.43%
test_select_lines_battery 1,105.7 ms 112.6 ms ×9.8

@kenodegard kenodegard force-pushed the select_lines-speedup branch from e472295 to 60783ee Compare March 15, 2024 00:59
Reworks select_lines into a new cached helper function (_split_line_selector) that returns the parsed lines and selectors eliminating repeat parsing of the same file.
@kenodegard kenodegard force-pushed the select_lines-speedup branch from 60783ee to 3a1b5a7 Compare March 15, 2024 19:56
@kenodegard kenodegard force-pushed the select_lines-speedup branch from 3a1b5a7 to adef976 Compare March 15, 2024 20:01
conda_build/metadata.py Outdated Show resolved Hide resolved
@kenodegard kenodegard marked this pull request as ready for review March 18, 2024 14:50
@kenodegard kenodegard requested a review from a team as a code owner March 18, 2024 14:50
@kenodegard kenodegard changed the title [WIP] Add cached _split_line_selector to avoid redundant parsing in select_lines Add cached _split_line_selector to avoid redundant parsing in select_lines Mar 18, 2024
@kenodegard kenodegard mentioned this pull request Mar 18, 2024
2 tasks
@kenodegard kenodegard requested review from mbargull and beeankha March 18, 2024 19:02
@beckermr
Copy link
Contributor

Have you tested this with the recipe @mbargull was working on? I'm wondering if the hotspots measured here are the same as those when big rerendering happens.

@beckermr
Copy link
Contributor

Here is a timing test on rerendering

(cf-dev) beckermr@finnegan ctng-compilers-feedstock % time conda-smithy rerender 
INFO:conda_smithy.configure_feedstock:Downloading conda-forge-pinning-2024.03.19.15.37.47
INFO:conda_smithy.configure_feedstock:Extracting conda-forge-pinning to /Users/beckermr/.cache/conda-smithy
INFO:conda_smithy.configure_feedstock:__pycache__ rendering is skipped
INFO:conda_smithy.configure_feedstock:README rendering is skipped
WARNING: Setting build platform. This is only useful when pretending to be on another platform, such as for rendering necessary dependencies on a non-native platform. I trust that you know what you're doing.
WARNING: Setting build arch. This is only useful when pretending to be on another arch, such as for rendering necessary dependencies on a non-native arch. I trust that you know what you're doing.
WARNING: No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.23
Adding in variants from internal_defaults
Adding in variants from /Users/beckermr/.cache/conda-smithy/conda_build_config.yaml
Adding in variants from /Users/beckermr/Desktop/conda-forge/ctng-compilers-feedstock/recipe/conda_build_config.yaml
Adding in variants from argument_variants
INFO:conda_smithy.configure_feedstock:Re-rendered with conda-build 24.1.3.dev39, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47
INFO:conda_smithy.configure_feedstock:You can commit the changes with:

    git commit -m "MNT: Re-rendered with conda-build 24.1.3.dev39, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47"

INFO:conda_smithy.configure_feedstock:These changes need to be pushed to github!

conda-smithy rerender  475.86s user 79.66s system 88% cpu 10:27.78 total
(cf-dev) beckermr@finnegan ctng-compilers-feedstock % time conda-smithy rerender 
INFO:conda_smithy.configure_feedstock:__pycache__ rendering is skipped
INFO:conda_smithy.configure_feedstock:README rendering is skipped
WARNING: Setting build platform. This is only useful when pretending to be on another platform, such as for rendering necessary dependencies on a non-native platform. I trust that you know what you're doing.
WARNING: Setting build arch. This is only useful when pretending to be on another arch, such as for rendering necessary dependencies on a non-native arch. I trust that you know what you're doing.
WARNING: No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.23
Adding in variants from internal_defaults
Adding in variants from /Users/beckermr/.cache/conda-smithy/conda_build_config.yaml
Adding in variants from /Users/beckermr/Desktop/conda-forge/ctng-compilers-feedstock/recipe/conda_build_config.yaml
Adding in variants from argument_variants
INFO:conda_smithy.configure_feedstock:Re-rendered with conda-build 24.1.2, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47
INFO:conda_smithy.configure_feedstock:You can commit the changes with:

    git commit -m "MNT: Re-rendered with conda-build 24.1.2, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47"

INFO:conda_smithy.configure_feedstock:These changes need to be pushed to github!

conda-smithy rerender  642.32s user 95.69s system 89% cpu 13:42.06 total

So the code is about 35% faster with this change on one of our slowest rerendering tasks. That is a lot less than the 10x in the performance tests.

@mbargull
Copy link
Member

@kenodegard, thanks for taking a stab on this!
I haven't yet gotten to review this.
I could imagine this giving similar speed ups as my bolted-on approach for select_lines in gh-5225.
Having the line selection and per-variant-evaluation split (with the former in a global cache) could very well offset the ugly and costly "try to make variant hashable" dance I did in gh-5225.


@beckermr, thanks for trying this out already!
35% actually sounds good!
We have other parts that contribute much to the runtime overall.
For one, we have a big offset due to the unnecessary full-index creation (see gh-5154).
And then we do have the issue with conda-forge's large conda_build_config.yaml which is severely affected by the .config.Config.copy/copy.deepcopy issue mentioned in gh-5225.
Meaning, for select_lines alone, the timings look promising!


@kenodegard, I'll later try to strip the arrow-cpp example down to a test case we can use with your recent CodSpeed setup and update the tracking issue gh-5224 with the recent progress.

@kenodegard
Copy link
Contributor Author

@beckermr @mbargull the new benchmarks results are in: #5237 (comment)

beeankha
beeankha previously approved these changes Mar 22, 2024
beeankha
beeankha previously approved these changes Apr 15, 2024
conda_build/metadata.py Outdated Show resolved Hide resolved
@jezdez jezdez merged commit c7c69b1 into conda:main Apr 18, 2024
28 checks passed
@kenodegard kenodegard deleted the select_lines-speedup branch April 18, 2024 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants