BUG: smithy spuriously dropping valid configurations #2012
Comments
@beckermr, any thoughts about this one? Would that patch make sense in your eyes? TBH I don't see how this could affect things (in the sense that
Did you fix the pythonhashseed when debugging?
Didn't need to - every run was perfectly reproducible in itself, even though there seemed to be no rhyme or reason which specific debug change caused it.
Well as we all know computers are not magic. Deep copying a string should do nothing. Can you try a cast via "str(...)"? Are we sure that dict entry is a string and not a Path object?
The debug prints fixing things versus not also seems like a red herring.
To directly answer you, that patch makes no sense to me at all.
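For reference, a minimal sketch of why both a `deepcopy` and a `str(...)` cast would be expected to be no-ops if the entry is already a plain string, and how a path object would differ. The `forge_config` dict and the file path below are hypothetical stand-ins, not smithy's actual config object.

```python
import copy
from pathlib import PurePosixPath

# Hypothetical stand-in for the relevant forge_config entry.
forge_config = {"exclusive_config_file": "/tmp/feedstock/conda_build_config.yaml"}
value = forge_config["exclusive_config_file"]

# Strings are immutable, so deepcopy hands back the very same object.
copied = copy.deepcopy(value)
same_object = copied is value
print(same_object)

# A path object, by contrast, only becomes a str through an explicit cast.
as_path = PurePosixPath(value)
cast = str(as_path)
print(type(as_path).__name__, type(cast).__name__, cast == value)
```

If the entry really is a `str`, neither change alters what gets passed on, which is why a behaviour change after such a patch points at something else (for example ordering effects).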
To that I counter "Any sufficiently advanced technology is indistinguishable from magic." 😛 IOW, just because it's reproducible doesn't mean it isn't (close to) magic. I wish python had a way to opt into pure functions... mutating self-referential state creates mind-bending puzzles. 😑
Fair enough, I did point out in the OP that that's a distinct possibility (and that was even before I understood that the copied variable is just a string).
With

--- a/conda_smithy/configure_feedstock.py
+++ b/conda_smithy/configure_feedstock.py
@@ -983,7 +983,7 @@ def _render_ci_provider(
         config = conda_build.config.get_or_merge_config(
             None,
-            exclusive_config_file=forge_config["exclusive_config_file"],
+            exclusive_config_file=str(forge_config["exclusive_config_file"]),
             platform=platform,
             arch=arch,
         )

on top of 3.37.2, things fail again 🙃
Yeah that's just crazy town right there. Whatever is happening it is subtle. I don't think we should patch just yet only because I don't think we've actually found the bug. Hrmmmmm.
The plot thickens further... in conda-forge/ctng-compilers-feedstock#148, even the bot gets it wrong (and the above patch doesn't help). I parked a branch with that commit on my fork. The issue is that the rerender sets the
Yeah this version always fails for me on the original issue (even with the patch):

PYTHONHASHSEED=100 CONDA_SMITHY_LOGLEVEL=debug conda-smithy rerender

This version works without the patch above:

PYTHONHASHSEED=4 CONDA_SMITHY_LOGLEVEL=debug conda-smithy rerender

We should always try the hashseed with issues like this even if things appear stable.
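A small sketch of why pinning `PYTHONHASHSEED` matters even when runs look stable: the iteration order of a set of strings is fixed for a given seed but usually changes between seeds. The version strings are only illustrative, and this does not claim to show where the ordering dependence sits in smithy or conda-build.

```python
import os
import subprocess
import sys

# Tiny child program: print the iteration order of a set of strings.
CHILD = "print(','.join({'2.12', '2.17', '16.0.6', '17.0.6', '18.1.8', '19.1.0.rc1'}))"


def set_order(seed: int) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Same seed: the order is stable from run to run.
stable = set_order(4) == set_order(4)

# Different seeds: the order usually differs, so a bug that depends on
# iteration order can look perfectly reproducible under one seed.
orders = {set_order(seed) for seed in range(10)}
print(stable, len(orders))
```

Trying a handful of seeds, as in the two commands above, is a cheap way to check whether a failure depends on hash ordering.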
The value is not coming out of the function
Yeah, just got something similar. I put the following debug statements in (among others), hidden now because I was looking at the wrong variable

--- a/conda_smithy/configure_feedstock.py
+++ b/conda_smithy/configure_feedstock.py
@@ -1094,6 +1128,9 @@ def _render_ci_provider(
             if os.path.exists(_recipe_cbc + ".conda.smithy.bak"):
                 os.rename(_recipe_cbc + ".conda.smithy.bak", _recipe_cbc)
 
+    print("individual configs")
+    print(sorted(list({el["c_stdlib_version"] for meta in metas for el in meta[0].config.variants})))
+
     # render returns some download & reparsing info that we don't care about
     metas = [m for m, _, _ in metas]
@@ -1134,6 +1171,9 @@ def _render_ci_provider(
     fancy_platforms = []
     unfancy_platforms = set()
 
+    print("merged configs")
+    pprint.pprint(sorted(list({el["c_stdlib_version"] for metas in metas_list_of_lists for meta in metas for el in meta.config.variants})))
+
     configs = []
     for metas, platform, arch, enable, upload in zip(
         metas_list_of_lists,

and the result was
Or maybe I was on the wrong track there, in the sense that 2.12 should have been one of the 2.17's... 🤷
FWIW, I had tried the clang-win one with conda-build 24.5 out of curiosity (seeing that I just ran into a regression w.r.t. CBC-handling in 24.7 today), but the result was the same.
The bug is here: https://github.com/conda-forge/conda-smithy/blob/main/conda_smithy/configure_feedstock.py#L895 For whatever reason, conda-build is marking the missing variant as skipped.
The original recipe has a jinja2 if statement around the skip:
Maybe we convert that to selectors and try again?
the default jinja2 values are also suspicious:
The problem is I didn't manage to make that combination work with selectors...
why?
So for selectors, this works fine, but is ofc hard coded:
The ordering of when conda-build does the selectors vs. jinja2, and when it fills in values for jinja2 vars that are missing, is a mystery to me. We shouldn't have to set defaults and yet we have to a lot. (edit to something less speculative)
AFAIU that's primarily due to the linter, which doesn't run per target, but only once without any config, and so would fail if the jinja variables being used aren't set.
Adding a single random selector on some line like this
causes things to work again.
Also smithy fails to render without the defaults, ending in an error inside conda-build. So it is not only the linter that needs the defaults.
FWIW, the ctng-compilers recipe has no such jinja-skip. Perhaps the issues are separate though.
I think the easiest path forward is to change the CBC to zip the version parts you need into the variant variables and use them in the selectors.
One common thing is that both recipes set jinja2 default values that are also set in the variant configs. I wonder if that is causing wires to cross.
As a temporary workaround or as a longer-term fix? I'd be fine with the former, not excited about the latter (not least because this used to work, and it's a legitimate use of the various mechanisms).
Can you find a combination of current smithy and old conda-build where this does work? That would help eliminate sources of bugs.
@h-vetinari here is a minimal reproducer
This confirms the bug is in conda-build, not smithy.
Thank you so much for chasing down that bug and coming up with a fix already! 🙏😊
One thing it does confirm though is that the issue in ctng-compilers is a separate one; I tried a local install of conda/conda-build#5447, and it still incorrectly bumps the
Oh boy yay fun! We'll have to keep digging.
I tried to reduce to an example like you did in conda/conda-build#5445, but I still see 2.12 in the
So it looks like this is at least partly smithy's fault... Indeed, I had been looking at the wrong variable above; if I do the same with
Misdiagnosis
Chasing this down a bit further, we seem to be dropping
conda-smithy/conda_smithy/configure_feedstock.py Lines 203 to 253 in 66d7293
IOW, after that block
whereas before that block it looks like:
Never mind, that was a misdiagnosis, the bug happens later still. After the block I mentioned, the correct config is still found in
more aimless investigating
UGH, the solution is: the rerender works, and I managed to confuse myself in a cross-compilation recipe. What was happening is that we were doing the first rerender since we dropped cos6, and so the
Sorry about the hassle here. I think we can close this since we've identified the error for clang-win-activation, and I've realised my error for ctng-compilers. That leaves just conda/conda-build#5443 of the recent compiler-related rendering regressions unresolved 😅
This has been one of the most painful debugging sessions in a long time - the bug went away with random debug prints at random points in the code, even if those prints were long after the (presumed) point of failure.
The situation is as follows, wanting to add another configuration to clang-win-activation, like so:
 CLANG_VERSION:
   - 16.0.6
   - 17.0.6
   - 18.1.8
+  - 19.1.0.rc1

the combination of conda-smithy 3.37.2 and conda-build 24.7.1 would spuriously drop the builds for 16.0.6, rather than being purely additive. Adding a 5th line would bring back all 5 variants, but 4 somehow means only 3 variants get created. There's also some zipping involved, so I put the exact commit I was rerendering (over and over and over) here.
I ultimately managed to get this to render correctly without debug prints (on top of 3.37.2 & `main`) with the following diff:
Since this happens at the interface between smithy and conda-build, I cannot tell who is responsible here; whether conda-build incorrectly mutates state that's passed in, or whether smithy incorrectly assumes that state will not be mutated.
It's also possible that my diagnosis is still not correct, and that the `deepcopy` just happens to randomly enforce some collision/synchronization that's not there otherwise, but doesn't (fully) get rid of the bug yet (like the print statements before; sidenote, there was often - but not always - a difference that `pprint.pprint` would remove the bug, but bare `print` wouldn't).
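To illustrate the two possibilities named above (a callee mutating state that is passed in, versus a caller assuming that state stays untouched), here is a purely hypothetical sketch. The `render` function and the `variants` key are made up for illustration; they are not conda-build's or smithy's API, and this does not reproduce the actual bug.

```python
import copy


def render(config: dict) -> list:
    """Hypothetical callee that mutates its argument while working."""
    variants = config["variants"]
    variants.pop(0)  # in-place mutation leaks back to the caller
    return list(variants)


versions = ["16.0.6", "17.0.6", "18.1.8", "19.1.0.rc1"]

# Sharing the state: the caller's list loses an entry.
shared = {"variants": list(versions)}
render(shared)

# Passing a deep copy: the caller's state is left untouched.
isolated = {"variants": list(versions)}
render(copy.deepcopy(isolated))

print(len(shared["variants"]), len(isolated["variants"]))  # 3 4
```

A defensive `deepcopy` at the boundary only helps when the shared object is actually mutable; for an immutable value such as a string it cannot change the outcome.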