Remove old full-index-in-memory loading and `conda.plan`-related solver handling #5154

mbargull · 2024-01-26T20:02:26Z

In gh-5074 we removed the conda.models.dist.Dist usage from conda-build.

This included copying over the install_actions/execute_actions/display_actions functions from conda.plan
(which only continued to exist to serve these deprecated functions to conda-build).

These functions, which are as of gh-5074 now part of conda_build.environ, are reduced to have only the code actually used in conda-build.
But their behavior still remained the same, i.e., they still use the same old solver invocations that loads the whole repodata index into memory, not benefiting from the reduced resource usage via conda/conda#12050 .

To be able to work with the reduced/only partially loaded index, we need to deprecate/remove some functionality that relies on iterating over the full index.
gh-5152 marks parts related to that as deprecated.

The major part to look out for when we'll replace the current install_actions code is that the index has to stay in a consistent state throughout a recipe build.

Closes #4961

The text was updated successfully, but these errors were encountered:

mbargull · 2024-01-30T16:29:30Z

I'll look into this during the next couple of weeks.
In preparation of it, to get a sense on what memory/CPU consumption is currently happening, I stumbled upon the fact that peak memory usage (with libmamba solver) is actually dominated by a bug in conda>=4.12: conda/conda#13541 .
So, when doing comparisons of resources used before/after changes in conda-build, one should factor out the surplus coming from that bug (patching conda when testing -- or, if lucky, maybe we get a patch release in the meantime).

jaimergp · 2024-04-11T13:31:15Z

xref #4961. Copied here:

In #4431 I was investigating some differences in timings between CONDA_SOLVER=libmamba conda-build and conda mambabuild, which should be similar. However, conda-build spends a few minutes before the solver kicks in, while mambabuild does not.

My research led me to finding out that those extra minutes are spent creating the build index, which is the aggregation of all the source channels (e.g. defaults, conda-forge) and their platforms (noarch is collapsed into e.g. linux-64 🤷), plus the local channels (the CONDA_BLD_PATH cache and/or the chosen output folder).

The slow part is not the repodata fetching, but creating the million+ PackageRecord objects greedily, instead of letting conda do it lazily as needed (introduced in conda/conda#12050). This actually happens in conda, in two places, but conda-build is the sole consumer of those endpoints AFAIK.

In index.py, in the exports module, where an identity dict[PackageRecord, PackageRecord] is built by aggregating all the SubdirData.iter_records() instances.
In exports.py, where that map is processed again just to convert the keys into Dist objects.

It feels like we could do this better, as @dholth was saying #4431 (comment). The interface with the Solver could also be better; it currently overwrites Solver._index with this greedily built object, and conda-libmamba-solver won't even use that 😬

jaimergp · 2024-04-11T13:51:03Z

@zklaus, summarizing our conversation earlier today:

To note:

conda-libmamba-solver detects whether Solver._index or not has been messed with, and extracts the local channels from the package records there defined. This is how we estimate which ones need to be reloaded on each call.
conda solver=classic uses the whole index for the solve, I think. Not sure if they reduce it later or not.

Approaches:

The easy way out: detect context.solver == libmamba and only populate the local part of the _index (no need to build PackageRecords for the whole thing) so we can still get the local channels.
Same but a bit more elegant, add a class attribute to Solver to announce whether a full index is needed in memory or not.
Better even: change the solver API to have a standardized, companion Index API that handles the interfacing better (with and without conda-build) instead of relying on Solver._index manipulation. This API should mention which channels need to be reloaded after an artifact is built and how.

dholth · 2024-04-11T14:03:56Z

It would also be a great idea to make PackageRecord instantiate more quickly.

jaimergp · 2024-11-29T17:00:07Z

It would also be a great idea to make PackageRecord instantiate more quickly.

I'll create an issue for this if it doesn't exist already.

Besides that, I think this is ready to close after #5488 and conda/conda#13880, right?

jaimergp · 2024-12-05T22:59:24Z

Done: conda/conda#14426

conda-bot added this to 🧭 Planning Jan 26, 2024

github-project-automation bot moved this to 🆕 New in 🧭 Planning Jan 26, 2024

mbargull added the source::contributor created by a frequent contributor label Jan 26, 2024

mbargull mentioned this issue Mar 20, 2024

Add cached _split_line_selector to avoid redundant parsing in select_lines #5237

Merged

3 tasks

jaimergp mentioned this issue Apr 11, 2024

conda-build traverses the whole index greedily, undoing the lazy-loading optimizations #4961

Closed

zklaus mentioned this issue May 6, 2024

Add conda.core.index.Index as a faster drop-in replacement of the realized dictionary index. conda/conda#13880

Merged

3 tasks

This was referenced May 21, 2024

Cross compiling with virtual packages #5341

Closed

Remove deprecated code for conda-build 24.7 #5333

Merged

jaimergp linked a pull request Jul 31, 2024 that will close this issue

Use lazy Index in conda-build #5363

Closed

3 tasks

kenodegard mentioned this issue Sep 26, 2024

Lazy index #5488

Merged

3 tasks

jaimergp mentioned this issue Dec 5, 2024

More performant conda.models conda/conda#14426

Open

2 tasks

jaimergp closed this as completed Dec 5, 2024

github-project-automation bot moved this from 🆕 New to 🏁 Done in 🧭 Planning Dec 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove old full-index-in-memory loading and `conda.plan`-related solver handling #5154

Remove old full-index-in-memory loading and `conda.plan`-related solver handling #5154

mbargull commented Jan 26, 2024 •

edited by jaimergp

Loading

mbargull commented Jan 30, 2024

jaimergp commented Apr 11, 2024 •

edited

Loading

jaimergp commented Apr 11, 2024 •

edited

Loading

dholth commented Apr 11, 2024

jaimergp commented Nov 29, 2024

jaimergp commented Dec 5, 2024

Remove old full-index-in-memory loading and conda.plan-related solver handling #5154

Remove old full-index-in-memory loading and conda.plan-related solver handling #5154

Comments

mbargull commented Jan 26, 2024 • edited by jaimergp Loading

mbargull commented Jan 30, 2024

jaimergp commented Apr 11, 2024 • edited Loading

jaimergp commented Apr 11, 2024 • edited Loading

dholth commented Apr 11, 2024

jaimergp commented Nov 29, 2024

jaimergp commented Dec 5, 2024

Remove old full-index-in-memory loading and `conda.plan`-related solver handling #5154

Remove old full-index-in-memory loading and `conda.plan`-related solver handling #5154

mbargull commented Jan 26, 2024 •

edited by jaimergp

Loading

jaimergp commented Apr 11, 2024 •

edited

Loading

jaimergp commented Apr 11, 2024 •

edited

Loading