"No such file or directory" when upgrading from v6.5.0 to 7.x.x #23743

Open
swarren12 opened this issue Sep 24, 2024 · 8 comments
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug untriaged

Comments

@swarren12

swarren12 commented Sep 24, 2024

Description of the bug:

I'm trying to update a fairly complicated Bazel project from Bazel v6.5.0 to v7.x.x, but I'm running into strange failures. Unfortunately, I can't pinpoint exactly where the issue lies, but I believe it is in Bazel itself rather than in any of the rules being imported.

Expected behaviour: upgrading from v6.5.0 to v7.x.x "just works"
Actual behaviour: the build fails due to files inside the linux-sandbox not being found

More details
Currently, on Bazel v6.5.0, the build reliably passes both on local development workstations and in the CI environment. Upgrading to v7.x.x causes the build to occasionally fail on development machines and much more consistently fail in CI. Unfortunately, I've been unable to reproduce in an isolated example project, and I'm not sure exactly how to go about collecting more information on the problem.

I've tried upgrading to v7.0.0, v7.1.2, v7.2.1 and v7.3.1 but they all behave the same way.

It's not always the same target that fails, but it's always roughly for the same reason, which is that a file is not found within the sandbox.

One example of this is shown below. A bazel clean --expunge was run first, and then (an equivalent of) bazel test //... --test_tag_filters=smoke, which first failed when trying to create an ijar for a java_import for a file checked into version control:

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
src/main/tools/linux-sandbox-pid1.cc:530: "execvp(external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar, 0x1d1014a0)": No such file or directory
ERROR: lib/BUILD:2782:13: Extracting interface for jar lib/3rd-party/io.netty/netty-codec-haproxy/netty-codec-haproxy-4.1.113.Final.jar failed: (Exit 1): ijar failed: error executing JavaIjar command (from target //lib:netty-codec-haproxy) 
  (cd /home/warrens/.cache/bazel/_bazel_warrens/cf05af78ffeddb63393e16c80fd92083/sandbox/linux-sandbox/2/execroot/_main && \
  exec env - \
    PATH=/bin:/usr/bin:/usr/local/bin \
  external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar lib/3rd-party/io.netty/netty-codec-haproxy/netty-codec-haproxy-4.1.113.Final.jar bazel-out/k8-fastbuild/bin/lib/_ijar/netty-codec-haproxy/lib/3rd-party/io.netty/netty-codec-haproxy/netty-codec-haproxy-4.1.113.Final-ijar.jar --target_label //lib:netty-codec-haproxy)

Running the same bazel test command a second time also resulted in a failure, this time failing to run java:

ERROR: [snip]/BUILD:31:14: Building [snip]/SomeJavaTest.jar () failed: IOException while preparing the execution environment of a worker:
...
---8<---8<--- Exception details ---8<---8<---
java.io.IOException: Cannot run program "/home/warrens/.cache/bazel/_bazel_warrens/cf05af78ffeddb63393e16c80fd92083/execroot/_main/external/_main~java_repositories~jdk11/bin/java" (in directory "/home/warrens/.cache/bazel/_bazel_warrens/cf05af78ffeddb63393e16c80fd92083/execroot/_main"): error=2, No such file or directory
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1170)
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1089)
        at com.google.devtools.build.lib.shell.JavaSubprocessFactory.start(JavaSubprocessFactory.java:152)
...
Caused by: java.io.IOException: error=2, No such file or directory
        at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
        at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:295)
        at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:225)
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1126)

A third run of bazel test completed successfully.

A separate example, taken from CI, exhibits a similar failure mode, this time while running some tests:

ERROR: [snip]/BUILD:11:10: Testing //...:some-custom-test-rule failed: (Exit 1): generate-xml.sh failed: error executing TestRunner command (from target //...:some-custom-test-rule) 
  (cd /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/34/execroot/_main && \
  exec env - \
    EXPERIMENTAL_SPLIT_XML_GENERATION=1 \
    JAVA_RUNFILES=bazel-out/k8-fastbuild/bin/.../some-custom-test-rule.runfiles \
    PATH=/bin:/usr/bin:/usr/local/bin \
    PYTHON_RUNFILES=bazel-out/k8-fastbuild/bin/.../some-custom-test-rule.runfiles \
    RUNFILES_DIR=bazel-out/k8-fastbuild/bin/.../some-custom-test-rule.runfiles \
    RUN_UNDER_RUNFILES=1 \
    TEST_BINARY=.../some-custom-test-rule \
    TEST_INFRASTRUCTURE_FAILURE_FILE=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.infrastructure_failure \
    TEST_LOGSPLITTER_OUTPUT_FILE=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.raw_splitlogs/test.splitlogs \
    TEST_NAME=//...:some-custom-test-rule \
    TEST_PREMATURE_EXIT_FILE=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.exited_prematurely \
    TEST_SHARD_INDEX=0 \
    TEST_SIZE=small \
    TEST_SRCDIR=bazel-out/k8-fastbuild/bin/.../some-custom-test-rule.runfiles \
    TEST_TARGET=//...:some-custom-test-rule \
    TEST_TIMEOUT=60 \
    TEST_TMPDIR=_tmp/ff60cd74048852c7bacd3c1d1b00a8f2 \
    TEST_TOTAL_SHARDS=0 \
    TEST_UNDECLARED_OUTPUTS_ANNOTATIONS=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.outputs_manifest/ANNOTATIONS \
    TEST_UNDECLARED_OUTPUTS_ANNOTATIONS_DIR=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.outputs_manifest \
    TEST_UNDECLARED_OUTPUTS_DIR=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.outputs \
    TEST_UNDECLARED_OUTPUTS_MANIFEST=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.outputs_manifest/MANIFEST \
    TEST_UNDECLARED_OUTPUTS_ZIP=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.outputs/outputs.zip \
    TEST_UNUSED_RUNFILES_LOG_FILE=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.unused_runfiles_log \
    TEST_WARNINGS_OUTPUT_FILE=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.warnings \
    TEST_WORKSPACE=_main \
    TZ=UTC \
    XML_OUTPUT_FILE=bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.xml \
  external/bazel_tools/tools/test/generate-xml.sh bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.log bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.xml 0 1)
# Configuration: 96d1f52e073df1fb1edb92e576742c56c7c33cdfdf7dc366cbda968896be461f
# Execution platform: @@platforms//host:host

At first glance, this looked to be a different type of failure; however, running cat on the test.log shows:

$ cat bazel-out/k8-fastbuild/testlogs/.../some-custom-test-rule/test.log
src/main/tools/linux-sandbox-pid1.cc:530: "execvp(external/bazel_tools/tools/test/test-setup.sh, 0xbd0c10)": No such file or directory

Some observations:

  • In all cases, re-running the build seems to result in different behaviour; it will either succeed or fail at a different point
  • The examples above are fairly Java-centric, because that is the vast majority of the project; however, I've seen other external tools (e.g. node) also fail.
  • We have not explicitly changed the value of --incompatible_sandbox_hermetic_tmp in these builds (so I believe it is set to the default value of true?); however, from what I can tell, adding common --noincompatible_sandbox_hermetic_tmp to .bazelrc does not help (I've tried explicitly setting both values, and the problem occurs with both)
  • turning on --sandbox_debug gives a lot more output, but I couldn't see anything useful in there

Currently, I'm leaning towards this being a problem with multiple processes trying to interact with the sandbox at the same time; this would explain why I'm unable to reproduce it on a small project and why it fails more consistently in CI (a bigger box with more cores to run tasks in parallel). However, the problem persists if I add --jobs=1, which suggests this hypothesis is wrong.

Any suggestions on what could be tried in order to further triage or resolve this issue would be appreciated.

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

As mentioned, I can reproduce it fairly reliably on a large project; unfortunately, I have yet to find a way of reproducing it on an example project. I'll keep trying though!

Which operating system are you running Bazel on?

Linux (Fedora & CentOS)

What is the output of bazel info release?

7.3.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

n/a

What's the output of git remote get-url origin; git rev-parse HEAD ?

n/a

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

I'm unable to run bazelisk --bisect=6.5.0..7.0.0 because it attempts to revert back to v6.0.0, which is incompatible with most of the rules in MODULE.bazel :(

I'm trying to work out if I can use some of the 7.0.0 pre-release candidates to narrow down when it started failing, but so far no luck.

Have you found anything relevant by searching the web?

I found a similar sounding issue here: #22151; however, this affects v6.5.0 and that is the version that is working for me.

Similarly, some comments in #19943 seemed relevant, but I couldn't really turn them into useful avenues of investigation.

Any other information, logs, or outputs that you want to share?

Output of `bazel mod graph`

<root> (monorepo@_)
(dependency tree not recoverable: every module name and version in this listing was replaced by "[email protected]" through e-mail obfuscation)

I've tried upgrading various rules, but with no luck (and generally bringing in other difficulties!).

@github-actions github-actions bot added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Sep 24, 2024
@tjgq
Contributor

tjgq commented Sep 24, 2024

Some ideas to gather more information:

  1. Can you reproduce this with --spawn_strategy=standalone (i.e., does it also happen with sandboxing disabled?)
  2. Can you inspect the contents of the sandbox left over by --sandbox_debug and verify that the executable is present at the expected location? In particular, if it is a symlink, does the symlink dangle, or does it point to the expected file?
  3. Since this seems to occur for at least two different rules, can you reduce it further? Say, does a minimal genrule like the one below also reproduce the issue?
  4. Can you provide the full list of flags you're using, including the ones set in bazelrc files?
genrule(
  name = "gen",
  outs = ["out.txt"],
  cmd = "touch $@",
)
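
For suggestion 2, a minimal sketch of one way to look for dangling symlinks in the sandbox directory that --sandbox_debug leaves behind. The function name list_dangling_symlinks is hypothetical and not part of Bazel; it relies only on find, readlink and the shell's own file tests.

```shell
# List symlinks under a directory whose targets do not exist.
# `find -type l` selects the symlinks themselves; `[ -e ]` follows the
# link, so it fails when the target is missing.
list_dangling_symlinks() {
  find "$1" -type l -exec sh -c '
    for link in "$@"; do
      [ -e "$link" ] || printf "%s -> %s\n" "$link" "$(readlink "$link")"
    done
  ' sh {} +
}
```

It could be pointed at a retained sandbox, for example list_dangling_symlinks "$(bazel info output_base)/sandbox/linux-sandbox", and prints nothing when every symlink resolves.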

@swarren12
Author

swarren12 commented Sep 24, 2024

It'll take me a while to go through all of those suggestions, so I'll update this comment as I go, but:

1. Can you reproduce this with --spawn_strategy=standalone (i.e., does it also happen with sandboxing disabled?)

Yes, it appears I can:

$ cat bazel-out/k8-fastbuild/testlogs/.../test.log
src/main/tools/process-wrapper-legacy.cc:80: "execvp(external/bazel_tools/tools/test/test-setup.sh, ...)": No such file or directory

2. Can you inspect the contents of the sandbox left over by --sandbox_debug and verify that the executable is present at the expected location? In particular, if it is a symlink, does the symlink dangle, or does it point to the expected file?

I think the answer here is "sometimes" (or "I don't always know how to read the sandbox debug output properly")

ERROR: /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/external/rules_jvm_external~/private/tools/java/com/github/bazelbuild/rules_jvm_external/zip/BUILD:1:13: Compiling Java headers external/rules_jvm_external~/private/tools/java/com/github/bazelbuild/rules_jvm_external/zip/libzip-hjar.jar (1 source file) [for tool] failed: (Exit 1): linux-sandbox failed: error executing Turbine command 
  (cd /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/62/execroot/_main && \
...
src/main/tools/linux-sandbox-pid1.cc:530: "execvp(external/_main~java_repositories~jdk11/bin/java, 0xf5ab30)": No such file or directory
ERROR: /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/external/rules_jvm_external~/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/BUILD:3:12 Building external/rules_jvm_external~/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.jar (1 source file) [for tool] failed: (Exit 1): linux-sandbox failed: error executing Turbine command 

and then:

$ ls /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/62/execroot/_main/external/rules_jvm_external~/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.jar
ls: cannot access '/var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/62/execroot/_main/external/rules_jvm_external~/private/tools/java/com/github/bazelbuild/rules_jvm_external/jar/AddJarManifestEntry.jar': No such file or directory

In fact, there seemed to be quite a few missing files and at least one broken symlink under sandbox/linux-sandbox/62/execroot/_main/external/.

But on another run:

ERROR: lib/BUILD:1498:13: Extracting interface for jar lib/3rd-party/org.hamcrest/hamcrest/hamcrest-core-1.3.jar failed: (Exit 1): linux-sandbox failed: error executing JavaIjar command 
  (cd /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/105/execroot/_main && \
  exec env - \
    PATH=/bin:/usr/bin:/usr/local/bin \
    TMPDIR=/tmp \
  /var/lib/jenkins/.cache/bazel/_bazel_jenkins/install/5d4256ba95eeafc7a3485f16e4778c0d/linux-sandbox -t 15 -w /dev/shm -w /tmp -w /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/105/execroot/_main -M /tmp -S /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/105/stats.out -N -D /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/105/debug.out -- external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar lib/3rd-party/org.hamcrest/hamcrest/hamcrest-core-1.3.jar bazel-out/k8-fastbuild/bin/lib/_ijar/hamcrest-core/lib/3rd-party/org.hamcrest/hamcrest/hamcrest-core-1.3-ijar.jar --target_label //lib:hamcrest-core)
src/main/tools/linux-sandbox-pid1.cc:530: "execvp(external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar, 0x122e710)": No such file or directory

gives:

$ ls --color -lA /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/105/execroot/_main/external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar
lrwxrwxrwx. 1 jenkins jenkins 169 Sep 24 21:23 /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/sandbox/linux-sandbox/105/execroot/_main/external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar -> /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/execroot/_main/external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar

$ ls -lA /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/execroot/_main/external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar
-r-xr-xr-x. 1 jenkins jenkins 228368 Nov 30  2023 /var/lib/jenkins/.cache/bazel/_bazel_jenkins/38b07d741dde33298ed2fff99f485394/execroot/_main/external/rules_java~~toolchains~remote_java_tools_linux/java_tools/ijar/ijar

which does indeed seem to exist?
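
As an aside, execvp can report "No such file or directory" even when the file itself exists, for instance when a script's #! interpreter (or an ELF binary's dynamic loader) is missing. The sketch below is one hypothetical way to rule out the simpler cases. The name check_exec_target is illustrative only; it covers shebang scripts but not ELF loaders, and it says nothing about whether the target is visible from inside the sandbox at the moment the action runs.

```shell
# Report the most obvious reasons an exec of "$1" could fail with ENOENT.
check_exec_target() {
  target="$1"
  if [ ! -e "$target" ]; then
    # -e follows symlinks, so a dangling link also ends up here.
    if [ -L "$target" ]; then
      echo "dangling symlink: $target -> $(readlink "$target")"
    else
      echo "missing: $target"
    fi
    return 1
  fi
  if [ ! -x "$target" ]; then
    echo "not executable: $target"
    return 1
  fi
  # For scripts, the kernel also needs the shebang interpreter to exist.
  first_two=$(head -c 2 "$target" | tr -d '\0')
  if [ "$first_two" = "#!" ]; then
    interp=$(head -n 1 "$target" | sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
    if [ ! -e "$interp" ]; then
      echo "missing interpreter: $interp"
      return 1
    fi
  fi
  echo "ok: $target"
}
```

Running it against both the path inside the sandbox and the path under execroot/_main/external would show whether the two disagree.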

3. Since this seems to occur for at least two different rules, can you reduce it further? Say, does a minimal genrule like the one below also reproduce the issue?

I'm not able to reproduce it with a simple genrule; I've tried adding the sample and we have quite a few basic rules already in the code base, but I'm yet to observe any of them fail. I'll keep an eye out, but so far it appears these rules are not affected.

However, tangentially related, I have seen some of our custom Bazel rules using ctx.actions.run() and ctx.actions.run_shell() fail.

4. Can you provide the full list of flags you're using, including the ones set in bazelrc files?

INFO: Reading 'startup' options from /var/lib/jenkins/workspace/.bazelrc: --host_jvm_args=-Djavax.net.ssl.trustStore=internal.truststore
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=184
INFO: Reading rc options for 'test' from /var/lib/jenkins/workspace/.bazelrc:
  Inherited 'common' options: --enable_bzlmod  --experimental_downloader_config=build/bazel/downloader.cfg --experimental_allow_tags_propagation --incompatible_no_implicit_file_export --java_language_version=11 --tool_java_language_version=11 --java_runtime_version=custom_11 --tool_java_runtime_version=custom_11 --announce_rc --attempt_to_print_relative_paths
INFO: Reading rc options for 'test' from /var/lib/jenkins/workspace/.bazelrc:
  Inherited 'build' options: --incompatible_strict_action_env --incompatible_enable_cc_toolchain_resolution --use_ijars --experimental_strict_java_deps=strict --explicit_java_test_deps --strategy=MakeRpm=local --nosandbox_default_allow_network --verbose_failures --show_result=50 --strategy_regexp=benchmark=standalone
INFO: Reading rc options for 'test' from /var/lib/jenkins/workspace/cache.bazelrc:
  Inherited 'build' options: --remote_cache=http://internal.cache:8081 --remote_timeout=120 --noremote_upload_local_results
INFO: Reading rc options for 'test' from /var/lib/jenkins/workspace/.bazelrc:
  'test' options: --test_output=errors --test_summary=terse
INFO: Reading rc options for 'test' from /var/lib/jenkins/workspace/local.bazelrc:
  'test' options: --test_output=summary --test_summary=short

I've tried removing the Java overrides (we need them in CI, but I can build without them on the development machine) but it had no effect. Similarly, enabling/disabling the remote caching also seems to change nothing. Finally I also tried removing --worker_quit_after_build and --worker_sandboxing locally, but the issue still persisted.

@swarren12
Author

swarren12 commented Sep 24, 2024

I couldn't get bazelisk --bisect to play nice, but I think I've narrowed it down to something that happened between 7.0.0-pre.20231011.2 and 7.0.0-pre.20231018.3. The former seems to build fine, the latter does not.
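
A rough sketch of how this kind of manual narrowing could be scripted, assuming bazelisk is installed and that it accepts these pre-release version strings through its USE_BAZEL_VERSION environment variable. The helper name try_versions is hypothetical.

```shell
# Run the same command once per Bazel version and report the outcome.
# Assumes the command invokes bazelisk, which reads USE_BAZEL_VERSION.
try_versions() {
  cmd="$1"
  shift
  for v in "$@"; do
    if USE_BAZEL_VERSION="$v" sh -c "$cmd" >/dev/null 2>&1; then
      echo "$v: pass"
    else
      echo "$v: FAIL"
    fi
  done
}
```

For example: try_versions 'bazelisk clean --expunge && bazelisk test //... --test_tag_filters=smoke' 7.0.0-pre.20231011.2 7.0.0-pre.20231018.3. Because the failure is intermittent, a single pass for a given version is not conclusive.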

Tentative culprit: 1b729a5

I've managed to do a clean build using --noexperimental_merged_skyframe_analysis_execution. I'm going to run it a few more times before saying definitively that that is the cause, but it's looking promising!
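
Since the failure is intermittent, one way to gain confidence in the flag is to repeat the same invocation and count failures. The sketch below is only an illustration; count_failures is a hypothetical helper, not a Bazel feature.

```shell
# Run a command N times and report how many runs exited non-zero.
count_failures() {
  runs="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures/$runs runs failed"
}
```

For example: count_failures 10 bazel test //... --test_tag_filters=smoke --noexperimental_merged_skyframe_analysis_execution, compared against the same call without the flag. To match the earlier reproduction, each run would also need a bazel clean --expunge beforehand, for instance by wrapping both commands in a small script.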

@fmeum
Collaborator

fmeum commented Sep 25, 2024

@joeleba

@joeleba
Member

joeleba commented Sep 25, 2024

This sounds a bit like #22073. Could you try this out with a bazel version >= 7.2.0?

@swarren12
Author

This sounds a bit like #22073. Could you try this out with a bazel version >= 7.2.0?

I've tried on v7.0.0, v7.1.2, v7.2.1 and v7.3.1 and all of them seem to exhibit the same behaviour.

@joeleba
Member

joeleba commented Oct 14, 2024

cc @turmanticant

@alexeagle
Contributor
