
Native support for remote-execution / caching #118

Open
gkousik opened this issue May 29, 2024 · 3 comments

Comments

@gkousik

gkousik commented May 29, 2024

I'm opening this issue as a discussion to gauge whether there is interest in natively supporting remote execution / remote caching (through an open-source protocol such as https://github.com/bazelbuild/remote-apis).
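For readers unfamiliar with the remote-apis protocol, its core idea is content addressing: every file or blob is identified by a Digest made of a hash and a size in bytes (SHA-256 by default), and a client asks the server which digests it is missing before uploading anything. The sketch below only illustrates that idea; `InMemoryCAS` is a hypothetical stand-in for the real ContentAddressableStorage gRPC service, and `find_missing` merely mirrors the spirit of its FindMissingBlobs call.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Digest:
    """Same shape as the remote-apis Digest message: hex hash plus blob size."""
    hash: str
    size_bytes: int


def digest_of(blob: bytes) -> Digest:
    # SHA-256 is the protocol's default digest function.
    return Digest(hashlib.sha256(blob).hexdigest(), len(blob))


class InMemoryCAS:
    """Hypothetical toy content-addressable store, for illustration only."""

    def __init__(self) -> None:
        self._blobs: dict[Digest, bytes] = {}

    def put(self, blob: bytes) -> Digest:
        d = digest_of(blob)
        self._blobs[d] = blob
        return d

    def find_missing(self, digests: list[Digest]) -> list[Digest]:
        # Only blobs the store lacks need to be uploaded.
        return [d for d in digests if d not in self._blobs]

    def get(self, d: Digest) -> bytes:
        return self._blobs[d]
```

Because blobs are keyed by content, an unchanged source file or object file never has to be uploaded twice, which is what makes sharing a cache across machines cheap.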

With Ninja, most projects I've seen use remote execution by adding a wrapper to each command line that intercepts the actual action execution and runs it remotely instead. While this model works, it is cumbersome and has some problems:

  1. A binary has to wrap every action's command line (e.g., ./<wrapper-binary> <wrapper-binary-args> -- clang++ ...). This makes reproducing an action failure locally more complicated, since you have to remove the wrapper binary and its arguments to run the action locally. Seeing the wrapper binary in the command line can also be confusing, especially for new users.

  2. Skipping the download of intermediate action results becomes very tricky. For example, when executing a graph in which a set of compile actions feeds a link action (all of which can be run and cached on a remote compiler farm), some users (or CI systems) want to skip downloading the intermediate object files, since they are never read locally, and fetch only the output of the link action. This is hard to do with wrapper-based remote execution because each individual wrapper lacks enough knowledge of the overall build graph to decide which outputs it can safely leave undownloaded.
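To illustrate why point 2 needs graph-level knowledge, here is a minimal sketch, assuming a hypothetical `Action` record with explicit inputs and outputs. A build tool that sees the whole graph can treat any output that no other action consumes as a final artifact and leave the rest in the remote cache; a per-action wrapper never sees the `consumed` set.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical build-graph node: a command plus its declared files."""
    command: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def outputs_to_download(actions: list[Action]) -> set[str]:
    """Return outputs that no other action reads, i.e. the final artifacts."""
    consumed = {p for a in actions for p in a.inputs}
    produced = {p for a in actions for p in a.outputs}
    return produced - consumed


graph = [
    Action("clang++ -c a.cc -o a.o", ["a.cc"], ["a.o"]),
    Action("clang++ -c b.cc -o b.o", ["b.cc"], ["b.o"]),
    Action("clang++ a.o b.o -o app", ["a.o", "b.o"], ["app"]),
]
print(outputs_to_download(graph))  # → {'app'}
```

A real implementation would also have to download anything read by an action that runs locally, and anything the user explicitly asks for.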

So the question is: is there interest in natively supporting the remote-apis protocol in n2?

@evmar
Owner

evmar commented May 29, 2024

Sounds cool! I don't have a use for it myself but I'd be happy to review or advise on any appropriate changes if they aren't too invasive.

One thing to be aware of is that n2 currently wants to read the mtimes of intermediate files to determine whether they're out of date. This might be more complex if you intend to leave the intermediate objects cached remotely. It's all just code of course, and fixable.
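The comparison below is a simplified, classic Ninja-style staleness rule, included only to show where remotely cached intermediates cause trouble; n2's own bookkeeping differs in detail. The key point is the `FileNotFoundError` branch: if an object file was never downloaded, the check sees it as missing and forces a rebuild.

```python
import os


def is_dirty(inputs: list[str], outputs: list[str]) -> bool:
    """Simplified mtime check: dirty if an output is missing or older than an input.

    Assumes `outputs` is non-empty. A missing *input* raises FileNotFoundError,
    which a real build tool would report as an error.
    """
    try:
        oldest_output = min(os.stat(p).st_mtime_ns for p in outputs)
    except FileNotFoundError:
        # Outputs left only in the remote cache end up here and trigger
        # a rebuild, even though a valid cached result exists.
        return True
    newest_input = max((os.stat(p).st_mtime_ns for p in inputs), default=0)
    return newest_input > oldest_output
```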

@Colecf
Contributor

Colecf commented May 29, 2024

We plan to implement this in the android fork of n2 as well.

We'd like to implement action sandboxing (ideally using nsjail) before remote execution, as that will let us easily find all the dependency issues that would block RE. And if an action works in a sandbox, no extra work would be needed for it to also work in RE.
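As a rough illustration of why sandboxing surfaces missing dependencies, the sketch below stages only an action's declared inputs into a fresh directory and runs the command there. This is not nsjail and provides no real isolation (absolute paths still reach the host filesystem); `run_sandboxed` and its parameters are hypothetical names chosen for this example. It only shows how an undeclared input turns into a hard failure, which is the same failure a remote worker would hit.

```python
import os
import shutil
import subprocess
import tempfile


def run_sandboxed(argv: list[str], declared_inputs: list[str], src_root: str) -> int:
    """Run argv in a temp dir containing only the declared inputs; return exit code."""
    with tempfile.TemporaryDirectory() as sandbox:
        for rel in declared_inputs:
            dst = os.path.join(sandbox, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(os.path.join(src_root, rel), dst)
        # Relative reads of undeclared files fail here, as they would remotely.
        result = subprocess.run(argv, cwd=sandbox, capture_output=True, text=True)
        return result.returncode
```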

Note that sandboxing and RE are incompatible with depfiles: a depfile is written by the action itself while it runs, so beforehand you wouldn't know which files to upload.
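To make the chicken-and-egg problem concrete: a depfile such as the one `gcc -MD` or `clang -MD` writes is a Makefile fragment listing the headers the compiler actually opened, and it only exists after compilation. The parser below is a deliberately simplified sketch (one target, no escaped spaces, no Windows drive letters); the file names in the example are made up.

```python
def parse_depfile(text: str) -> tuple[str, list[str]]:
    """Parse a single-target Makefile-style depfile into (target, dependencies)."""
    # Join backslash-continued lines, then split "target: dep dep ...".
    text = text.replace("\\\n", " ")
    target, _, deps = text.partition(":")
    return target.strip(), deps.split()


# Written by the compiler *during* the compile, so foo.h and bar.h are
# unknown until the action has already run once.
depfile = "foo.o: foo.c foo.h \\\n  bar.h\n"
print(parse_depfile(depfile))  # → ('foo.o', ['foo.c', 'foo.h', 'bar.h'])
```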

We'd also like to try switching to file-hash-based manifests, initially just to cut down on unnecessary local rebuilds, but eventually also to integrate with a hash-based remote filesystem (ABFS).
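A minimal sketch of a hash-based dirty check, assuming a hypothetical manifest that maps each input path to its SHA-256 from the last successful build. Unlike an mtime check, merely touching a file does not trigger a rebuild; only a content change does. Using SHA-256 here also means the same values could line up with the protocol's default digest function, though that pairing is an assumption of this sketch.

```python
import hashlib


def file_hash(path: str) -> str:
    """SHA-256 of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rebuild(inputs: list[str], recorded: dict[str, str]) -> bool:
    """Rebuild if any input's content differs from (or is absent in) the manifest."""
    return any(recorded.get(p) != file_hash(p) for p in inputs)
```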

Note that skipping intermediate actions may not be trivial; I don't think the Bazel remote execution APIs are set up well for that, which is why Bazel doesn't support it either. We are interested in that as well, though.

Edit: Oh, just realized it's Kousik :) In that case, you'd probably want to add this functionality to the Android fork of n2, since the multithreaded parsing change has stirred up the internal data structures a bit.

@gkousik
Author

gkousik commented May 30, 2024

Thanks, it's good to know that there's interest!

> One thing to be aware of is that n2 currently wants to read the mtimes of intermediate files to determine whether they're out of date. This might be more complex if you intend to leave the intermediate objects cached remotely. It's all just code of course, and fixable.

Yep, this is a challenge. I've seen some implementations create a placeholder file on disk to satisfy the mtime check, while others track this information in a custom log file.
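A minimal sketch of the "custom log file" approach, using a hypothetical JSON log format: for each output left in the remote cache, the build records its digest and the mtime it should be treated as having, and staleness checks fall back to the log when the file is absent on disk. The class and field names are invented for illustration.

```python
import json
import os


class RemoteOutputLog:
    """Hypothetical log of outputs that exist only in the remote cache."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: dict[str, dict] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def record(self, output: str, digest_hash: str, mtime_ns: int) -> None:
        # Persist immediately so a crash does not lose the bookkeeping.
        self.entries[output] = {"hash": digest_hash, "mtime_ns": mtime_ns}
        with open(self.path, "w") as f:
            json.dump(self.entries, f)

    def mtime_ns(self, output: str):
        """Real mtime if the file is on disk, else the logged one, else None."""
        try:
            return os.stat(output).st_mtime_ns
        except FileNotFoundError:
            entry = self.entries.get(output)
            return entry["mtime_ns"] if entry else None
```

The placeholder-file approach trades this extra log for empty stub files on disk, which keeps existing mtime checks working unchanged but makes it easy for a local tool to read a stub by mistake.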

> Note that skipping intermediate actions may not be trivial, I don't think the bazel remote execution APIs are set up well for that, hence why bazel doesn't support it either. Though we are interested in that as well.

I think Bazel does support Build without the Bytes? https://blog.bazel.build/2023/10/06/bwob-in-bazel-7.html

> In that case, you'd probably just want to add this functionality to the android fork of n2, as the multithreaded parsing change has stirred up the internal datastructures a bit.

Ah, interesting. I'll check out Android's n2 fork!


3 participants