Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross-platform native launchers for Python #275

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

groodt
Copy link

@groodt groodt commented Sep 11, 2022

This document describes an approach for launching py_binary artifacts hermetically using the resolved python toolchain.

cc @meteorcloudy @rickeylev @jesseschalken

@groodt groodt requested review from jin and lberki as code owners September 11, 2022 11:55
@meteorcloudy
Copy link
Member

@rickeylev Can you review this proposal when you have time ;)

@meteorcloudy meteorcloudy removed the request for review from jin September 16, 2022 08:34
---


# Abstract

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm +1 on the core idea.


# Abstract

This document describes an approach for launching `py_binary` artifacts hermetically using the resolved python toolchain.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is really Python specific and is fairly easy to generalize to other languages. Any language which has a separate "runtime executable" has the same basic problem -- Java, Ruby, JavaScript, etc.

I think there is also some overlap with debuggers, coverage tools, and test runners -- it's not uncommon for them to want to control the invocation of the program's original main.

Finally, there is a bit of conceptual overlap with the --run_under flag, which basically also does "here's a binary, take another binary and input and then run it".

Copy link
Author

@groodt groodt Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who would be the most likely person to accept this sort of proposal? I'm happy to expand this beyond Python (I mention in my proposal that it could be expanded). Im just not sure if it's better to start incrementally and pick Python first, or if it's better to start broad.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java is the most immediate one that comes to mind. I'm pretty sure they are basically doing the same thing we are doing today (shell script that does setup, an optional launcher for a native binary). I emailed our internal language-rule-owners group.

I'm primarily interested in collecting advice and assessing interest at this point. I like to keep as small a set of Deciders as possible. Not all advice must be obeyed.


# Background

Currently, `py_binary` is non-hermetic and launches inconsistently between platforms.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something worth mentioning is the "Python Launcher for Windows".

Basically, a python.org Windows installs have a py.exe that tries to figure out which python to use. This, of course, just moves the system-dependency from "python.exe" to "py.exe". plus it's going to try and do more logic to try and find the interpreter. On the whole, this is probably undesirable and not a solution -- it just introduces more complication to finding the interpreter, which the toolchain should just know upfront. I think this is also a windows-specific feature of a CPython installation.

See
https://docs.python.org/3/using/windows.html#python-launcher-for-windows
https://peps.python.org/pep-0397/

Currently, `py_binary` is non-hermetic and launches inconsistently between platforms.

On macos and Linux, there is a [python_stub](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt)
that is non-hermetic and requires a "bootstrap" python interpreter on the host. The "shebang" can be overridden, but

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: "requires a "bootstrap" python interpreter on the host": Well, not necessarily.

It's possible to make the stub use the same runtime as the program it ultimately invokes. This is a pretty sensible thing to do, I think. This is fine if you're already using a system interpreter (you already have an absolute path)

The problem case are "in build" interpreters -- you have to put a relative path to the interpreter. This makes running the program sensitive to the CWD you start the program in. Transforming that relative path into an absolute path requires some sort of runtime logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine if you're already using a system interpreter (you already have an absolute path)

This is probably the only case where it's fine and it has other issues. It's not portable unless everyone has the same interpreter installed at the same global path.

It also goes against the premise of the proposal, in the sense that it's "non-hermetic". It requires all hosts to have a preinstalled runtime at a particular version, which isn't easy to guarantee.

designs/2022-09-12-native-launchers-python.md Outdated Show resolved Hide resolved
| Token | Description |
| ---------------------- | ----------- |
| env | Dictionary of key-value pairs for the environment of the process |
| runfiles-interpreter | The resolved python toolchain in runfiles |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description here doesn't quite make sense with what the arg name implies.

The arg name sounds like a string path. The description is "the python toolchain", which is a complex object.

designs/2022-09-12-native-launchers-python.md Outdated Show resolved Hide resolved
designs/2022-09-12-native-launchers-python.md Outdated Show resolved Hide resolved
Once this proposal is implemented, it would enable cross-builds of hermetic `py_binary` for all major platforms. It
would also remove the complexity introduced by having so many chains of nested execution to launch a python program.

Finally, while this proposal is specific to python, this solution could perhaps be reused for `java_binary`, `sh_binary`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should reach out to those owners to see if they have input to add.


# Backward-compatibility

This proposal could require users to setup a cc toolchain for remote execution.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to be CC? e.g., what if someone wants to write a launcher in rust?
Can a pre-built launcher be used? I guess that might be precluded; it depends on how the argv, path to interpretter etc is embedded or passed along. This makes me wonder if supporting code in Bazel itself would work better -- e.g. return a special provider that says "we're doing a launcher thing, use this argv etc when running"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, does not need to be CC. I think the launcher needs to be compiled as native in some way. So Go, Zig, Rust, CC all come to mind. Whatever has the most minimal toolchain requirements on the user and gives the functionality we require I think.

I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.

I agree.

The part I'm trying to think through is re-use of what is essentially the same binary (the launcher itself).

For a target-config build target, I agree, yes, the launcher essentially needs to be self-contained and standalone. I don't see how to do it otherwise because the invocation of bazel-bin/foo has to work without any args. (I guess it could look for $0.params or something? But that seems kind of brittle).

For a build tool, the situation is different[1] -- stuff run during the build doesn't need the stricter isolation requirements. For example, when Bazel runs an executable in an action, it could avoid having to build the launcher entirely by doing the exec() call itself when it runs the subprocess. Maybe a target could return e.g. LauncherInfo(runtime=..., runtime_args=..., executable=...), or a rule advertises something (similar to a rule's toolchains setting) about how to find the launcher to build to mix it in with the LauncherInfo.

This then leads me to think that, if a rule returned that, Bazel itself could invoke the launcher building instead of the rule having to do so. Which has a Just Works sort of appeal; but risks coupling behavior to the Bazel release (which might be more of a headache).

[1] This case is particularly on my mind because Python is often used for build tools, and building the runtime and C dependencies is pretty expensive, so reuse of that is highly beneficial.

groodt and others added 12 commits September 19, 2022 08:13


# Background

Currently, `py_binary` is non-hermetic and launches inconsistently between platforms.

On macos and Linux, there is a [python_stub](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt)
that is non-hermetic and requires a "bootstrap" python interpreter on the host. The "shebang" can be overridden, but
that is non-hermetic and requires a bootstrap Python interpreter on the host. The "shebang" can be overridden, but
a "shebang" is always dependent on the runtime host.

On Windows, there is a [native launcher](https://github.com/meteorcloudy/bazel/blob/master/src/tools/launcher/python_launcher.cc)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to link to bazelbuild here, not meteorcloudy?

https://github.com/bazelbuild/bazel/blob/master/src/tools/launcher/python_launcher.cc

@rickeylev
Copy link

You know, I kinda get the impression that the bulk of the work is already done for us. A lot of the parts of how the Windows launcher works don't seem that Windows-specific. Sure, it includes windows.h and uses some windows-specific APIs/types, but none of it is that windows specific -- it's still a pretty simple C program that uses a couple strings and exec's. There's even a special action that writes out the launcher's content.

@meteorcloudy
Copy link
Member

@rickeylev Yes, porting the launcher to Linux and macOS shouldn't be too hard. There is no fundamental Windows specific dependencies.

@rickeylev
Copy link

Thanks, that's good to hear.

For posterity, the code in the py rules for this is around here:
https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/BazelPythonSemantics.java#L294

@rickeylev
Copy link

I asked if it is possible to get the launcher code itself out of Bazel's core. If we can, then great. Otherwise, I'd be +1 on implementing it in rules_python and rewiring py_binary to, somehow, use that instead of the one built into bazel.

@meteorcloudy
Copy link
Member

Moving the //src/tools/launcher to its own repo is going to be very hard, because the launcher also depends on cc_libraries under //src/main/cpp/util.

But I do think it's a good idea to start a new repo (ideally under bazel-contrib) which implements a launcher that could be used by different rules in the community. This is going to be helpful for porting rules to Windows, because launching a binary on Windows is a problem for almost all rules, but the launcher in Bazel only supports Java, Python, and Bash.

@rickeylev
Copy link

meteorcloudy and I talked a bit.

We both support the idea of having this code outside of Bazel itself so that it isn't coupled to the Bazel releases. Getting to that point...I'm not clear on that part. I'm willing to entertain ideas -- perhaps a new toolchain the Java impl looks up, but is implemented in rules_python in Starlark? Maybe in combination with some helpers we expose on e.g. py_common? Maybe have the rules_python macro pass something in to the rule? Anyways, I'm very willing to entertain ideas that let us decouple from Bazel core.

I'm also very much OK if we put this in rules_python and deal with cross-language collaboration separately (the impression I've gotten from language owners thus far is "yeah that affects us" but not "we want to be part of the solution"). This is an internal detail of the rules, so it shouldn't be too hard to adapt if a collaboration happens.

I'm also fine with modifying this launcher-code in Bazel itself; I don't own it, so I'd like to know who we send PRs to for approval.

@comius

@meteorcloudy
Copy link
Member

I'm also fine with modifying this launcher-code in Bazel itself; I don't own it, so I'd like to know who we send PRs to for approval.

I'm the original author of the launcher, so you can send the PRs to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants