"No such file or directory" when upgrading from v6.5.0 to 7.x.x #23743
Comments
Some ideas to gather more information:
It'll take me a while to go through all of those suggestions, so I'll update this comment as I go, but:
1. Can you reproduce this with `--spawn_strategy=standalone` (i.e., does it also happen with sandboxing disabled?)
Yes, it appears I can:
2. Can you inspect the contents of the sandbox left over by `--sandbox_debug` and verify that the executable is present at the expected location? In particular, if it is a symlink, does the symlink dangle, or does it point to the expected file?
I think the answer here is "sometimes" (or "I don't always know how to read the sandbox debug output properly")
and then:
In fact, there seemed to be quite a few missing files and at least one broken symlink under
But on another run:
gives:
which does indeed seem to exist?
3. Since this seems to occur for at least two different rules, can you reduce it further? Say, does a minimal genrule like the one below also reproduce the issue?
I'm not able to reproduce it with a simple
However, tangentially related, I have seen some of our custom Bazel rules using
4. Can you provide the full list of flags you're using, including the ones set in bazelrcs?
I've tried removing the Java overrides (we need them in CI, but I can build without them on the development machine) but it had no effect. Similarly, enabling/disabling the remote caching also seems to change nothing. Finally, I also tried removing
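For question 2 above, one way to check a leftover sandbox tree for dangling symlinks is a small script like the following. This is only a sketch: the function name `dangling_symlinks` is made up for illustration, and the directory to scan is whatever path the `--sandbox_debug` output reports for the failed action.

```python
import os
import sys
from typing import Iterator, Tuple


def dangling_symlinks(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (link_path, target) for every symlink under root whose target is missing."""
    # followlinks=False: report symlinked directories but do not descend into them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists() follows the link, so it is False for a broken one.
            if os.path.islink(path) and not os.path.exists(path):
                yield path, os.readlink(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the sandbox directory left behind by --sandbox_debug.
    for link, target in dangling_symlinks(sys.argv[1]):
        print(f"{link} -> {target} (missing)")
```

Running it against the sandbox of a failing action and again against a passing one should show whether the "No such file or directory" error lines up with a broken link or with a file that was never staged at all.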
I couldn't get
Tentative culprit: 1b729a5
I've managed to do a clean build using
This sounds a bit like #22073. Could you try this out with a Bazel version >= 7.2.0?
I've tried on v7.0.0, v7.1.2, v7.2.1 and v7.3.1 and all of them seem to exhibit the same behaviour.
Adding another report: I encountered this issue in a Bazel 7.4.1 -> 8.0.0 upgrade: https://github.com/aspect-build/aspect-workflows-template/pull/268/files#diff-558f048066251623c05b2c02a050849a674273bc96f50a667d890074962e81c0R12
Description of the bug:
I'm trying to update a fairly complicated Bazel project from Bazel v6.5.0 to v7.x.x, but encountering strange issues. Unfortunately, I can't pinpoint exactly where the issue lies, but I believe it is in Bazel itself, rather than any of the rules being imported.
Expected behaviour: upgrading from v6.5.0 to v7.x.x "just works"
Actual behaviour: the build fails due to files inside the `linux-sandbox` not being found
More details
Currently, on Bazel v6.5.0, the build reliably passes both on local development workstations and in the CI environment. Upgrading to v7.x.x causes the build to occasionally fail on development machines and much more consistently fail in CI. Unfortunately, I've been unable to reproduce in an isolated example project, and I'm not sure exactly how to go about collecting more information on the problem.
I've tried upgrading to v7.0.0, v7.1.2, v7.2.1 and v7.3.1 but they all behave the same way.
It's not always the same target that fails, but it's always roughly for the same reason, which is that a file is not found within the sandbox.
One example of this is shown below. A `bazel clean --expunge` was run first, and then (an equivalent of) `bazel test //... --test_tag_filters=smoke`, which first failed when trying to create an `ijar` for a `java_import` for a file checked into version control:
Running the same `bazel test` command a second time also resulted in a failure, this time failing to run `java`:
A third run of `bazel test` completed successfully.
A separate example, taken from the CI, exhibits a similar mode of failure, this time while running some tests:
At first glance, this looked to be a different type of failure; however, `cat`ing the `test.log` shows:
Some observations:
`node`) also fail.
`--incompatible_sandbox_hermetic_tmp` in these builds (so I believe it is set to the default value of `true`?); however, from what I can tell, adding `common --noincompatible_sandbox_hermetic_tmp` to `.bazelrc` does not help (I've tried explicitly setting both values, and the problem occurs with both)
`--sandbox_debug` gives a lot more output, but I couldn't see anything useful in there
Currently, I'm leaning towards this being a problem with multiple processes trying to interact with the sandbox at the same time; this would explain why I'm unable to reproduce it on a small project and why it fails more consistently in CI (bigger box with more cores to run tasks in parallel). However, if I add `--jobs=1` the problem persists, which suggests this hypothesis is wrong.
Any suggestions on what could be tried in order to further triage or resolve this issue would be appreciated.
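Because the failure is intermittent, a single run with or without a flag such as `--jobs=1` says little. A rough way to compare flag sets is to repeat the clean build several times and look at the failure fraction. The sketch below assumes `bazel` is on `PATH`; the helper names `failure_rate` and `bazel_attempt` are invented for illustration, and any non-zero exit code is counted as a failure, including genuine test failures.

```python
import subprocess
from typing import Callable, Sequence


def failure_rate(run_once: Callable[[], int], attempts: int) -> float:
    """Call run_once() `attempts` times and return the fraction of non-zero exit codes."""
    failures = sum(1 for _ in range(attempts) if run_once() != 0)
    return failures / attempts


def bazel_attempt(extra_flags: Sequence[str]) -> Callable[[], int]:
    """Build a callable that does a clean smoke-test run with the given extra flags."""
    def run_once() -> int:
        # Start from a clean state each time, as in the report above.
        subprocess.run(["bazel", "clean", "--expunge"], check=False)
        cmd = ["bazel", "test", "//...", "--test_tag_filters=smoke", *extra_flags]
        return subprocess.run(cmd, check=False).returncode
    return run_once
```

For example, `failure_rate(bazel_attempt(["--jobs=1"]), 5)` could be compared against `failure_rate(bazel_attempt([]), 5)`. With so few runs the numbers are only indicative.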
Which category does this issue belong to?
Core
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
As mentioned, I can reproduce it fairly reliably on a large project; unfortunately, I've yet to find a way of reproducing it on an example project. I'll keep trying though!
Which operating system are you running Bazel on?
Linux (Fedora & CentOS)
What is the output of `bazel info release`?
7.3.1
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
n/a
What's the output of `git remote get-url origin; git rev-parse HEAD`?
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
I'm unable to run `bazelisk --bisect=6.5.0..7.0.0` because it attempts to revert back to v6.0.0, which is incompatible with most of the rules in `MODULE.bazel` :(
I'm trying to work out if I can use some of the 7.0.0 pre-release candidates to narrow down when it started failing, but so far no luck.
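Where `bazelisk --bisect` is not usable, a manual bisection over a hand-picked, ordered list of versions is one alternative. The sketch below relies on Bazelisk reading the `USE_BAZEL_VERSION` environment variable. Everything else is an assumption: the names `first_bad` and `build_fails` are made up, the version list has to be supplied by whoever runs it, and the search only works if every version after the first bad one is also bad.

```python
import os
import subprocess
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad version.

    Assumes versions[0] is known good and versions[-1] is known bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


def build_fails(version: str, attempts: int = 3) -> bool:
    """Treat a version as bad if any of `attempts` clean runs exits non-zero."""
    env = dict(os.environ, USE_BAZEL_VERSION=version)
    for _ in range(attempts):
        subprocess.run(["bazelisk", "clean", "--expunge"], env=env, check=False)
        result = subprocess.run(
            ["bazelisk", "test", "//...", "--test_tag_filters=smoke"],
            env=env,
            check=False,
        )
        if result.returncode != 0:
            return True
    return False
```

Calling `first_bad(versions, build_fails)` with a list running from 6.5.0 up to 7.0.0 would then narrow down the first failing version. Since the failure is flaky, a higher `attempts` value lowers the chance of marking a bad version as good.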
Have you found anything relevant by searching the web?
I found a similar sounding issue here: #22151; however, this affects v6.5.0 and that is the version that is working for me.
Similarly, some comments in #19943 seemed relevant, but I couldn't really turn them into useful avenues of investigation.
Any other information, logs, or outputs that you want to share?
Output of `bazel mod graph`
I've tried upgrading various rules, but with no luck (and generally bringing in other difficulties!).