Cross-platform native launchers for Python #275
@rickeylev Can you review this proposal when you have time ;)
# Abstract
Overall, I'm +1 on the core idea.

This document describes an approach for launching `py_binary` artifacts hermetically using the resolved python toolchain.
I don't think this is really Python specific and is fairly easy to generalize to other languages. Any language which has a separate "runtime executable" has the same basic problem -- Java, Ruby, JavaScript, etc.
I think there is also some overlap with debuggers, coverage tools, and test runners -- it's not uncommon for them to want to control the invocation of the program's original main.
Finally, there is a bit of conceptual overlap with the `--run_under` flag, which basically also does "here's a binary, take another binary and input and then run it".
Who would be the most likely person to accept this sort of proposal? I'm happy to expand this beyond Python (I mention in my proposal that it could be expanded). I'm just not sure if it's better to start incrementally and pick Python first, or if it's better to start broad.
Java is the most immediate one that comes to mind. I'm pretty sure they are basically doing the same thing we are doing today (shell script that does setup, an optional launcher for a native binary). I emailed our internal language-rule-owners group.
I'm primarily interested in collecting advice and assessing interest at this point. I like to keep as small a set of Deciders as possible. Not all advice must be obeyed.

# Background

Currently, `py_binary` is non-hermetic and launches inconsistently between platforms.
I think something worth mentioning is the "Python Launcher for Windows".
Basically, python.org Windows installs have a `py.exe` that tries to figure out which Python to use. This, of course, just moves the system dependency from "python.exe" to "py.exe", and it is going to do more logic to try and find the interpreter. On the whole, this is probably undesirable and not a solution -- it just introduces more complication to finding the interpreter, which the toolchain should just know upfront. I think this is also a Windows-specific feature of a CPython installation.
See
https://docs.python.org/3/using/windows.html#python-launcher-for-windows
https://peps.python.org/pep-0397/

On macos and Linux, there is a [python_stub](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt)
that is non-hermetic and requires a "bootstrap" python interpreter on the host. The "shebang" can be overridden, but
re: "requires a "bootstrap" python interpreter on the host": Well, not necessarily.
It's possible to make the stub use the same runtime as the program it ultimately invokes. This is a pretty sensible thing to do, I think. This is fine if you're already using a system interpreter (you already have an absolute path).
The problem case is "in-build" interpreters -- you have to put a relative path to the interpreter. This makes running the program sensitive to the CWD you start the program in. Transforming that relative path into an absolute path requires some sort of runtime logic.
> This is fine if you're already using a system interpreter (you already have an absolute path)

This is probably the only case where it's fine and it has other issues. It's not portable unless everyone has the same interpreter installed at the same global path.
It also goes against the premise of the proposal, in the sense that it's "non-hermetic". It requires all hosts to have a preinstalled runtime at a particular version, which isn't easy to guarantee.
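
The "runtime logic" mentioned above, turning a relative in-build interpreter path into an absolute one, might look roughly like the following. This is only a sketch: the function name `resolve_interpreter` is hypothetical, and it assumes the runfiles tree sits next to the launcher as `<launcher>.runfiles`.

```python
import os


def resolve_interpreter(launcher_path: str, interpreter_relpath: str) -> str:
    """Return an interpreter path that does not depend on the later CWD.

    Hypothetical helper: `launcher_path` is the path the launcher was
    started as, `interpreter_relpath` is the path recorded at build time.
    """
    # A system interpreter is already absolute; nothing to resolve.
    if os.path.isabs(interpreter_relpath):
        return interpreter_relpath
    # Assumption: the runfiles tree lives at "<launcher>.runfiles".
    runfiles_root = os.path.abspath(launcher_path) + ".runfiles"
    # Anchor the relative path at the runfiles root instead of the CWD.
    return os.path.normpath(os.path.join(runfiles_root, interpreter_relpath))
```

Anchoring on the launcher's own location is what removes the CWD sensitivity; it does nothing for the portability concern with absolute system paths.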
| Token | Description |
| ---------------------- | ----------- |
| env | Dictionary of key-value pairs for the environment of the process |
| runfiles-interpreter | The resolved python toolchain in runfiles |
The description here doesn't quite make sense with what the arg name implies.
The arg name sounds like a string path. The description is "the python toolchain", which is a complex object.
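
One way to read the two tokens, taking the comment above at face value and treating `runfiles-interpreter` as a plain path string, is sketched below. The class and function names are hypothetical, and the rule that token values override inherited ones is an assumption, not something the proposal states.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LauncherTokens:
    """Hypothetical container for the two tokens in the table."""

    # Key-value pairs for the environment of the launched process.
    env: Dict[str, str] = field(default_factory=dict)
    # Assumed here to be a runfiles-relative path string, not a toolchain object.
    runfiles_interpreter: str = ""


def child_environment(tokens: LauncherTokens, base_env: Dict[str, str]) -> Dict[str, str]:
    """Merge the token environment over an inherited one (assumed precedence)."""
    merged = dict(base_env)
    merged.update(tokens.env)
    return merged
```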
Once this proposal is implemented, it would enable cross-builds of hermetic `py_binary` for all major platforms. It
would also remove the complexity introduced by having so many chains of nested execution to launch a python program.

Finally, while this proposal is specific to python, this solution could perhaps be reused for `java_binary`, `sh_binary`
We should reach out to those owners to see if they have input to add.

# Backward-compatibility

This proposal could require users to setup a cc toolchain for remote execution.
Is it required to be CC? e.g., what if someone wants to write a launcher in rust?
Can a pre-built launcher be used? I guess that might be precluded; it depends on how the argv, path to interpreter, etc. is embedded or passed along. This makes me wonder if supporting code in Bazel itself would work better -- e.g. return a special provider that says "we're doing a launcher thing, use this argv etc. when running".
No, does not need to be CC. I think the launcher needs to be compiled as native in some way. So Go, Zig, Rust, CC all come to mind. Whatever has the most minimal toolchain requirements on the user and gives the functionality we require I think.
I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.
> I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.

I agree.
The part I'm trying to think through is re-use of what is essentially the same binary (the launcher itself).
For a target-config build target, I agree, yes, the launcher essentially needs to be self-contained and standalone. I don't see how to do it otherwise because the invocation of `bazel-bin/foo` has to work without any args. (I guess it could look for `$0.params` or something? But that seems kind of brittle).
For a build tool, the situation is different[1] -- stuff run during the build doesn't need the stricter isolation requirements. For example, when Bazel runs an executable in an action, it could avoid having to build the launcher entirely by doing the exec() call itself when it runs the subprocess. Maybe a target could return e.g. `LauncherInfo(runtime=..., runtime_args=..., executable=...)`, or a rule advertises something (similar to a rule's toolchains setting) about how to find the launcher to build to mix it in with the LauncherInfo.
This then leads me to think that, if a rule returned that, Bazel itself could invoke the launcher building instead of the rule having to do so. Which has a Just Works sort of appeal; but risks coupling behavior to the Bazel release (which might be more of a headache).
[1] This case is particularly on my mind because Python is often used for build tools, and building the runtime and C dependencies is pretty expensive, so reuse of that is highly beneficial.
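
To make the `LauncherInfo` idea above a little more concrete, here is a rough sketch of how a runner could turn those three fields into a command line. The real thing would be a Starlark provider; this Python dataclass and the `build_argv` helper are hypothetical stand-ins, and the argument ordering is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LauncherInfo:
    """Hypothetical stand-in for the provider floated in the comment above."""

    runtime: str  # path to the interpreter, e.g. a toolchain-provided python
    executable: str  # the program's entry point
    runtime_args: List[str] = field(default_factory=list)  # flags for the runtime


def build_argv(info: LauncherInfo, user_args: List[str]) -> List[str]:
    """Compose the command line: runtime, its flags, the program, then user args."""
    return [info.runtime, *info.runtime_args, info.executable, *user_args]
```

With this shape, whoever runs the program (a native launcher, or Bazel itself for build tools) only needs the three fields and can skip any intermediate stub.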
Co-authored-by: Richard Levasseur <[email protected]>

# Background

Currently, `py_binary` is non-hermetic and launches inconsistently between platforms.

On macos and Linux, there is a [python_stub](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt)
that is non-hermetic and requires a "bootstrap" python interpreter on the host. The "shebang" can be overridden, but
that is non-hermetic and requires a bootstrap Python interpreter on the host. The "shebang" can be overridden, but
a "shebang" is always dependent on the runtime host.

On Windows, there is a [native launcher](https://github.com/meteorcloudy/bazel/blob/master/src/tools/launcher/python_launcher.cc)
I think you meant to link to bazelbuild here, not meteorcloudy?
https://github.com/bazelbuild/bazel/blob/master/src/tools/launcher/python_launcher.cc
You know, I kinda get the impression that the bulk of the work is already done for us. A lot of the parts of how the Windows launcher works don't seem that Windows-specific. Sure, it includes windows.h and uses some windows-specific APIs/types, but none of it is that windows specific -- it's still a pretty simple C program that uses a couple strings and exec's. There's even a special action that writes out the launcher's content.
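
For illustration, the "couple strings and exec" shape described above reduces to something like the sketch below. It is not the Bazel launcher: the `launch` function is hypothetical, and it spawns a child and forwards the exit code instead of replacing the process, which a native launcher would more likely do.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def launch(
    main_args: List[str],
    interpreter: str = sys.executable,
    extra_env: Optional[Dict[str, str]] = None,
) -> int:
    """Run `interpreter` with `main_args` and return the child's exit code."""
    # Start from the inherited environment, then layer launcher settings on top.
    env = dict(os.environ)
    env.update(extra_env or {})
    completed = subprocess.run([interpreter, *main_args], env=env)
    return completed.returncode
```

The default of `sys.executable` only keeps the sketch runnable; in the proposal the interpreter would come from the resolved toolchain.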
@rickeylev Yes, porting the launcher to Linux and macOS shouldn't be too hard. There are no fundamental Windows-specific dependencies.
Thanks, that's good to hear. For posterity, the code in the py rules for this is around here:
I asked if it is possible to get the launcher code itself out of Bazel's core. If we can, then great. Otherwise, I'd be +1 on implementing it in rules_python and rewiring py_binary to, somehow, use that instead of the one built into bazel.
Moving the //src/tools/launcher to its own repo is going to be very hard, because the launcher also depends on cc_libraries under //src/main/cpp/util. But I do think it's a good idea to start a new repo (ideally under bazel-contrib) which implements a launcher that could be used by different rules in the community. This is going to be helpful for porting rules to Windows, because launching a binary on Windows is a problem for almost all rules, but the launcher in Bazel only supports Java, Python, and Bash.
meteorcloudy and I talked a bit. We both support the idea of having this code outside of Bazel itself so that it isn't coupled to the Bazel releases. Getting to that point...I'm not clear on that part. I'm willing to entertain ideas -- perhaps a new toolchain the Java impl looks up, but is implemented in rules_python in Starlark? Maybe in combination with some helpers we expose on e.g. py_common? Maybe have the rules_python macro pass something in to the rule? Anyways, I'm very willing to entertain ideas that let us decouple from Bazel core. I'm also very much OK if we put this in rules_python and deal with cross-language collaboration separately (the impression I've gotten from language owners thus far is "yeah that affects us" but not "we want to be part of the solution"). This is an internal detail of the rules, so it shouldn't be too hard to adapt if a collaboration happens. I'm also fine with modifying this launcher-code in Bazel itself; I don't own it, so I'd like to know who we send PRs to for approval.
I'm the original author of the launcher, so you can send the PRs to me.
This document describes an approach for launching `py_binary` artifacts hermetically using the resolved python toolchain.

cc @meteorcloudy @rickeylev @jesseschalken