-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
which_package
to yield unique collection of package records
#5108
Conversation
8812608
to
f955083
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gh-5041 actually had two related regressions:
- The one fixed here.
conda_build.post._lookup_in_prefix_packages
is nowprint
ing aList[PrefixRecord]
which means it printsrepr(PrefixRecord)
that have allPathData
etc. expanded (leading to very verbose logs with sometimes >1M characters per line).
_lookup_in_prefix_packages
needs to be changed to print(..., [str(record) for record in precs_in_reqs], ...)
or the like.
for file in prec["files"]: | ||
if samefile(prefix / file, path): | ||
yield prec | ||
if any(samefile(prefix / file, path) for file in prec["files"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(n * m)
? I see this version skips work if true.
Too bad to iterate over all installed files to do the search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worst case yes, same as before
any
returns on the first True
: https://docs.python.org/3/library/functions.html#any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what avoids the duplicated records to begin with, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a subsequent PR, we could see if implementing a cached path -> artifact
map would speed this up. It can be a veeery slow part of the post processing stage in conda-build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what avoids the duplicated records to begin with, right?
correct, this is essentially what was happening pre #5041:
conda-build/conda_build/inspect_pkg.py
Lines 57 to 62 in c71c4ab
for dist in fn(prefix): | |
# dfiles = set(dist.get('files', [])) | |
dfiles = dist_files(prefix, dist) | |
# TODO :: This is completely wrong when the env is on a case-sensitive FS! | |
if any(norm_ipp == normcase(w) for w in dfiles): | |
yield dist |
cached
path -> artifact
map
definitely something to consider
"{}: {} found in {}{}".format( | ||
msg_prelude, n_dso_p, [prec.name for prec in precs], and_also | ||
), | ||
f"{msg_prelude}: {n_dso_p} found in {[str(prec) for prec in precs]}{and_also}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This upgrades from name to the full str representation. i.e. from python
to defaults::python-3.12.0-h123abc_0
, I think? Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is now more detailed (but still less verbose than str(precs)
) than before
the change was inspired by the fix below on line 1214-1215, where precs_in_reqs: List[PackageRecord]
was cast to string resulting in a very verbose output, see #5108 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good that this is now more consistent with the full spec which was already output at https://github.com/conda/conda-build/pull/5108/files#diff-5073339a2e88f80714f82a001af334869aa5df9627b2f00ffd18d7a2035f537eR1190 .
Do we need to de-dup in line 1180 in precs = list(which_package(in_prefix_dso, run_prefix)) |
@jaimergp AFAIK no deduping is needed once we merge this PR, with the modifications in |
" (likely) or a missing dependency (less likely)".format( | ||
msg_prelude, [prec.name for prec in precs] | ||
), | ||
f"{msg_prelude}: .. but {[str(prec) for prec in precs]} not in reqs/run, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I find the change to use str(prec)
in the len(precs_in_reqs) == 0 and len(precs) > 0
case above a welcome change, one could argue that we might only want to output the name here so users get a more succinct "you might've missed package
in run requirements" message.
But I don't have a strong opinion on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor nit comments, but overall this LGTM, thanks!
Co-authored-by: Marcel Bargull <[email protected]>
Description
The refactor of
which_package
accidentally resulted in the same package record being returned multiple times (once for each file collision). This results in misleading build warnings.Instead we only want to return a unique set of package records.
Resolves #5106
Checklist - did you ...
news
directory (using the template) for the next release's release notes?Add / update outdated documentation?