Invalidate cache after adding/removing packages #5140

Closed
wants to merge 3 commits

Conversation

kenodegard
Contributor

Description

Alternative to #5138

When a package is installed or removed, the modified time of $PREFIX/conda-meta is updated. We include this mtime in the cache key to invalidate the which_package cache whenever the environment is modified.
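
A minimal sketch of the idea, for illustration only (the wrapper below is an assumption about the shape of the change, not the actual diff; only the which_package name comes from conda-build):

# Illustrative sketch: fold the mtime of $PREFIX/conda-meta into the
# memoization key so cached results are discarded whenever packages are
# added to or removed from the prefix.
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def _which_package_cached(path, prefix, conda_meta_mtime):
    # conda_meta_mtime is unused in the body; it only participates in the
    # cache key, so a new mtime forces recomputation after an env change.
    # (The actual package lookup is elided in this sketch.)
    return ()

def which_package(path, prefix):
    # Hypothetical wrapper: read the directory mtime on every call and
    # pass it through as part of the cache key.
    mtime = os.path.getmtime(os.path.join(prefix, "conda-meta"))
    return _which_package_cached(path, prefix, mtime)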

Are there other environment changes we care about?

This change does not have any noticeable slowdown on which_package's benchmark test.

Resolves #5136

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@kenodegard kenodegard self-assigned this Jan 16, 2024
@conda-bot conda-bot added the cla-signed label Jan 16, 2024
@kenodegard
Contributor Author

I have successfully built conda-forge/libsecp256k1-feedstock locally with this change:

conda build -m .ci_support/linux64.yaml recipes --error-overlinking

@mbargull
Member

I don't believe we should go with a caching approach here in any case.
(Sorry, I wasn't around to review gh-5130 in time.)
Such caching only addresses the symptom rather than the cause of the problem here.

For the 3.28.x branch I suggest we revert to the previous behavior via gh-5141.

For future versions, it would make more sense to look at the places from which which_package is called.
The performance problem stems from the fact that we do (in pseudo-code):

for each library in the small set of linked libraries:
  start a new iteration over all files in the prefix:
    yield packages containing the library

where the number of files in the prefixes would usually be much larger than the number of libraries searched for.
What should actually be done is:

for each linked library:
  add "path to library" -> "list of binaries that linked to the library" to a map
for each file in the prefix:
  check if file path is in the previously constructed map

i.e., iterate once over the larger set (files in prefix) and query a small map.
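
In rough Python terms, the restructuring described above could look something like this (a sketch under assumed data shapes; linked_binaries and files_in_prefix are hypothetical inputs, not existing conda-build structures):

# Sketch of the inverted loop: build a small map keyed by library path
# first, then make a single pass over the much larger set of prefix files.
from collections import defaultdict

def find_library_providers(linked_binaries, files_in_prefix):
    # linked_binaries: {library_path: [binaries linking against it]} (assumed shape)
    # files_in_prefix: iterable of (file_path, package_record) pairs (assumed shape)
    providers = defaultdict(list)
    for file_path, package in files_in_prefix:  # single pass over the large set
        if file_path in linked_binaries:        # cheap lookup in the small map
            providers[file_path].append(package)
    return providers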

I did not review the code from post.py but only took a quick glance:
which_package is called in 3 different places therein, and the code doesn't really look straightforward.
Meaning, unfortunately, it would probably take a bit to restructure it to address the performance issues that way (which, IMO, would be the proper way to do it, though) :/ ...

@kenodegard
Copy link
Contributor Author

@mbargull yeah, I'm starting to think the same. I had noticed previously how which_package was called from _map_file_to_package, so all of this was starting to feel redundant (creating a file-to-package mapping in order to create a file-to-package mapping).

I've been wondering for a while now whether which_package actually makes more sense as something that PrefixData does instead of a helper function within conda-build, but as you mention, that would require an even larger refactor.


Closing this, since trying to invalidate the cache based on $PREFIX/conda-meta's mtime resulted in a race condition on Linux (the directory mtime appears to be cached and not immediately flushed to disk).

@kenodegard kenodegard closed this Jan 17, 2024
@kenodegard kenodegard removed this from the 24.1.x milestone Jan 18, 2024
Development

Successfully merging this pull request may close these issues.

Link checker cannot find dependencies for multi-output recipe in 3.28.3