Add cached `_split_line_selector` to avoid redundant parsing in `select_lines` #5237

kenodegard · 2024-03-14T23:08:58Z

Description

A continuation of #5233 with inspiration from #5225.

When rendering a recipe, we end up invoking `conda_build.metadata.select_lines` many times.

Within `select_lines` we do two things. First, we iterate over every line and split it into the line content and the selector. Second, we evaluate the selector to determine whether the line should be kept.

The first step, iterating over every line and splitting it, can and should be a one-time operation. We achieve this by introducing a new `_split_line_selector` function that caches the parsed result for a given text.
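To illustrate the idea only, here is a minimal sketch of a globally cached split. It is not the code from this PR: the name `split_line_selector`, the simplified regex, and the return shape are assumptions made for the example, and the real pattern in `conda_build.metadata` may differ.

```python
from __future__ import annotations

import re
from functools import lru_cache

# Hypothetical, simplified pattern for "content  # [selector]" lines.
_SELECTOR_RE = re.compile(r"^(?P<content>.*?)\s*#\s*\[(?P<selector>[^\[\]]+)\]\s*$")


@lru_cache(maxsize=None)
def split_line_selector(text: str) -> tuple[tuple[str | None, str], ...]:
    """Split each line of ``text`` into (selector, content), once per distinct text."""
    parsed = []
    for line in text.splitlines():
        match = _SELECTOR_RE.match(line)
        if match:
            parsed.append((match["selector"], match["content"]))
        else:
            # No selector: the whole line is content.
            parsed.append((None, line))
    # A tuple keeps the cached value immutable for all callers.
    return tuple(parsed)
```

Because the cache key is the text itself, calling the function again with the same recipe text returns the stored result without re-running the regex.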

The second step, evaluating the selectors, can also be cached, but it needs to be cached differently. Selectors can evaluate differently between rendering passes, so instead of caching globally we cache the results locally: a given selector is evaluated only once per pass over the text in question. It is common for a selector to be used multiple times within a single file, e.g., conda-forge's conda_build_config.yaml.
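The per-pass cache can be sketched as follows. This is a hedged illustration rather than the PR's implementation: `select_lines_sketch`, the `parsed` input shape, and the use of a bare `eval` are all assumptions for the example.

```python
from __future__ import annotations


def select_lines_sketch(
    parsed: list[tuple[str | None, str]],
    namespace: dict[str, object],
) -> str:
    """Keep lines whose selector is true, evaluating each distinct selector once."""
    evaluated: dict[str, bool] = {}  # local cache, discarded when this pass ends
    kept = []
    for selector, content in parsed:
        if selector is not None:
            if selector not in evaluated:
                # eval() is for illustration only; namespace supplies names like `win`.
                evaluated[selector] = bool(
                    eval(selector, {"__builtins__": {}}, namespace)
                )
            if not evaluated[selector]:
                continue
        kept.append(content)
    return "\n".join(kept)
```

The `evaluated` dict lives only for one call, so a later pass with a different namespace (for example, another variant) starts with a fresh cache.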

Checklist - did you ...

Add a file to the news directory (using the template) for the next release's release notes?
Add / update necessary tests?
~~Add / update outdated documentation?~~

codspeed-hq · 2024-03-14T23:12:23Z

CodSpeed Performance Report

Merging #5237 will improve performance by ×9.8

Comparing kenodegard:select_lines-speedup (4035765) with main (28b51fb)

Summary

⚡ 2 improvements
✅ 1 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `main` | `kenodegard:select_lines-speedup` | Change |
|---|---|---|---|---|
| ⚡ | `test_pin_subpackage_benchmark` | 82.7 s | 70.5 s | +17.43% |
| ⚡ | `test_select_lines_battery` | 1,105.7 ms | 112.6 ms | ×9.8 |

beckermr · 2024-03-19T00:56:58Z

Have you tested this with the recipe @mbargull was working on? I'm wondering if the hotspots measured here are the same as those when big rerendering happens.

beckermr · 2024-03-19T16:24:34Z

Here is a timing test on rerendering

(cf-dev) beckermr@finnegan ctng-compilers-feedstock % time conda-smithy rerender 
INFO:conda_smithy.configure_feedstock:Downloading conda-forge-pinning-2024.03.19.15.37.47
INFO:conda_smithy.configure_feedstock:Extracting conda-forge-pinning to /Users/beckermr/.cache/conda-smithy
INFO:conda_smithy.configure_feedstock:__pycache__ rendering is skipped
INFO:conda_smithy.configure_feedstock:README rendering is skipped
WARNING: Setting build platform. This is only useful when pretending to be on another platform, such as for rendering necessary dependencies on a non-native platform. I trust that you know what you're doing.
WARNING: Setting build arch. This is only useful when pretending to be on another arch, such as for rendering necessary dependencies on a non-native arch. I trust that you know what you're doing.
WARNING: No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.23
Adding in variants from internal_defaults
Adding in variants from /Users/beckermr/.cache/conda-smithy/conda_build_config.yaml
Adding in variants from /Users/beckermr/Desktop/conda-forge/ctng-compilers-feedstock/recipe/conda_build_config.yaml
Adding in variants from argument_variants
INFO:conda_smithy.configure_feedstock:Re-rendered with conda-build 24.1.3.dev39, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47
INFO:conda_smithy.configure_feedstock:You can commit the changes with:

    git commit -m "MNT: Re-rendered with conda-build 24.1.3.dev39, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47"

INFO:conda_smithy.configure_feedstock:These changes need to be pushed to github!

conda-smithy rerender  475.86s user 79.66s system 88% cpu 10:27.78 total
(cf-dev) beckermr@finnegan ctng-compilers-feedstock % time conda-smithy rerender 
INFO:conda_smithy.configure_feedstock:__pycache__ rendering is skipped
INFO:conda_smithy.configure_feedstock:README rendering is skipped
WARNING: Setting build platform. This is only useful when pretending to be on another platform, such as for rendering necessary dependencies on a non-native platform. I trust that you know what you're doing.
WARNING: Setting build arch. This is only useful when pretending to be on another arch, such as for rendering necessary dependencies on a non-native arch. I trust that you know what you're doing.
WARNING: No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.23
Adding in variants from internal_defaults
Adding in variants from /Users/beckermr/.cache/conda-smithy/conda_build_config.yaml
Adding in variants from /Users/beckermr/Desktop/conda-forge/ctng-compilers-feedstock/recipe/conda_build_config.yaml
Adding in variants from argument_variants
INFO:conda_smithy.configure_feedstock:Re-rendered with conda-build 24.1.2, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47
INFO:conda_smithy.configure_feedstock:You can commit the changes with:

    git commit -m "MNT: Re-rendered with conda-build 24.1.2, conda-smithy 3.32.0, and conda-forge-pinning 2024.03.19.15.37.47"

INFO:conda_smithy.configure_feedstock:These changes need to be pushed to github!

conda-smithy rerender  642.32s user 95.69s system 89% cpu 13:42.06 total

So the code is about 35% faster (measured by user CPU time) with this change on one of our slowest rerendering tasks. That is far less than the 10x seen in the performance tests.
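For reference, the ratios can be recomputed from the two `time` lines above. This is only a quick arithmetic check; it assumes the run reporting conda-build 24.1.3.dev39 is the one that includes this change.

```python
# Timings copied from the two `time conda-smithy rerender` runs above.
user_new, wall_new = 475.86, 10 * 60 + 27.78  # conda-build 24.1.3.dev39 (assumed: with this change)
user_old, wall_old = 642.32, 13 * 60 + 42.06  # conda-build 24.1.2

user_speedup = user_old / user_new
wall_speedup = wall_old / wall_new
print(f"user CPU: x{user_speedup:.2f}, wall clock: x{wall_speedup:.2f}")
```

The user CPU ratio matches the roughly 35% figure, while the wall-clock ratio is slightly lower.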

mbargull · 2024-03-20T08:08:26Z

@kenodegard, thanks for taking a stab at this!
I haven't yet gotten to review this.
I could imagine this giving similar speed ups as my bolted-on approach for select_lines in gh-5225.
Having the line selection and per-variant-evaluation split (with the former in a global cache) could very well offset the ugly and costly "try to make variant hashable" dance I did in gh-5225.

@beckermr, thanks for trying this out already!
35% actually sounds good!
We have other parts that contribute much to the runtime overall.
For one, we have a big offset due to the unnecessary full-index creation (see gh-5154).
And then we do have the issue with conda-forge's large conda_build_config.yaml which is severely affected by the .config.Config.copy/copy.deepcopy issue mentioned in gh-5225.
Meaning, for select_lines alone, the timings look promising!

@kenodegard, I'll later try to strip the arrow-cpp example down to a test case we can use with your recent CodSpeed setup and update the tracking issue gh-5224 with the recent progress.

kenodegard · 2024-03-22T04:25:37Z

@beckermr @mbargull the new benchmark results are in: #5237 (comment)

conda-bot added the cla-signed label Mar 14, 2024

kenodegard force-pushed the select_lines-speedup branch from e472295 to 60783ee March 15, 2024 00:59

kenodegard added 5 commits March 15, 2024 14:54

Indent test_select_lines & mark as benchmark

9caa1c6

Add test_select_lines_battery

0522d17

Typing get_recipe_text

cd963f3

Remove duplicate extract_package_and_build_text call

c89995d

Refactor select_lines

b57721a

Reworks select_lines into a new cached helper function (_split_line_selector) that returns the parsed lines and selectors eliminating repeat parsing of the same file.

kenodegard force-pushed the select_lines-speedup branch from 60783ee to 3a1b5a7 March 15, 2024 19:56

Comments

adef976

kenodegard force-pushed the select_lines-speedup branch from 3a1b5a7 to adef976 March 15, 2024 20:01

kenodegard commented Mar 15, 2024

conda_build/metadata.py (outdated, resolved)

Add news

7c8972a

kenodegard marked this pull request as ready for review March 18, 2024 14:50

kenodegard requested a review from a team as a code owner March 18, 2024 14:50

kenodegard changed the title ~~[WIP] Add cached _split_line_selector to avoid redundant parsing in select_lines~~ Add cached _split_line_selector to avoid redundant parsing in select_lines Mar 18, 2024

beckermr reviewed Mar 18, 2024

conda_build/metadata.py (resolved)

kenodegard mentioned this pull request Mar 18, 2024

Remove sys.exit calls #5242

Closed

2 tasks

kenodegard requested review from mbargull and beeankha March 18, 2024 19:02

Merge branch 'main' into select_lines-speedup

84b4165

mbargull mentioned this pull request Mar 21, 2024

Recipe rendering very slow in certain cases #5224

Closed

2 tasks

kenodegard mentioned this pull request Mar 22, 2024

Speed up rendering by avoiding costly/redundant function calls #5248

Closed

1 task

beeankha previously approved these changes Mar 22, 2024

kenodegard mentioned this pull request Mar 25, 2024

Remove duplicate extract_package_and_build_text call #5253

Merged

3 tasks

Revert "Remove duplicate extract_package_and_build_text call"

345eb63

This reverts commit c89995d.

mbargull and others added 2 commits April 12, 2024 12:57

Search static strings before costlier regex search

81dd087

Signed-off-by: Marcel Bargull <[email protected]>

Combine if clauses

92595c4

kenodegard dismissed beeankha’s stale review via 92595c4 April 12, 2024 18:16

kenodegard mentioned this pull request Apr 12, 2024

Miscellaneous render speedups #5225

Closed

1 task

beeankha previously approved these changes Apr 15, 2024

kenodegard mentioned this pull request Apr 16, 2024

Replace using sys.exit with higher level error/logic handling, e.g. custom exception #4209

Open

Merge remote-tracking branch 'upstream/main' into select_lines-speedup

2d2d0d3

kenodegard dismissed beeankha’s stale review via 2d2d0d3 April 18, 2024 00:11

Remove unrelated changes

4035765

beckermr approved these changes Apr 18, 2024

jezdez approved these changes Apr 18, 2024

conda_build/metadata.py (outdated, resolved)

jezdez merged commit c7c69b1 into conda:main Apr 18, 2024
28 checks passed

kenodegard deleted the select_lines-speedup branch April 18, 2024 16:00
