Cross-platform native launchers for Python #275

Open
wants to merge 17 commits into
base: main
78 changes: 78 additions & 0 deletions designs/2022-09-12-native-launchers-python.md
@@ -0,0 +1,78 @@
---
created: 2022-09-12
last updated: 2022-09-12
status: To be reviewed
reviewers:
- TODO
title: Cross-platform native launchers for Python
authors:
- groodt
---


# Abstract

Review comment:

Overall, I'm +1 on the core idea.


This document describes an approach for launching `py_binary` artifacts hermetically using the resolved Python toolchain.


# Background

Currently, `py_binary` is non-hermetic and launches inconsistently between platforms.

Review comment:

I think something worth mentioning is the "Python Launcher for Windows".

Basically, a python.org Windows install has a `py.exe` that tries to figure out which Python to use. This, of course, just moves the system dependency from "python.exe" to "py.exe", plus it's going to do more logic to try to find the interpreter. On the whole, this is probably undesirable and not a solution -- it just introduces more complication to finding the interpreter, which the toolchain should just know upfront. I think this is also a Windows-specific feature of a CPython installation.

See
https://docs.python.org/3/using/windows.html#python-launcher-for-windows
https://peps.python.org/pep-0397/


On macOS and Linux, there is a [python_stub](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt)
that is non-hermetic and requires a bootstrap Python interpreter on the host. The "shebang" can be overridden, but
a "shebang" always depends on what is installed on the runtime host.

On Windows, there is a [native launcher](https://github.com/meteorcloudy/bazel/blob/master/src/tools/launcher/python_launcher.cc)

Review comment:

I think you meant to link to bazelbuild here, not meteorcloudy?

https://github.com/bazelbuild/bazel/blob/master/src/tools/launcher/python_launcher.cc

that launches `python.exe` on the host, which then launches the `py_binary` with the same `python_stub` as macOS and Linux.

Related issues:
* [py_binary with hermetic toolchain requires a system interpreter](https://github.com/bazelbuild/rules_python/issues/691)
* [Neither python_top nor python toolchain works with starlark actions on windows](https://github.com/bazelbuild/bazel/issues/7947#issuecomment-495265016)

This situation is undesirable because it assumes that the target platform has a bootstrapping Python interpreter
available, and it makes the hermetic Python interpreters available with `rules_python` less useful. It is also surprising to
users who expect Bazel to output self-contained binary artifacts for a target platform.

This situation exists because of bootstrapping. Ultimately, *something* needs to find the Python
interpreter in the runfiles and use that to launch the program. Currently, Bazel assumes the target platform will
be able to provide the bootstrapping functionality.


# Proposal

Extend the native launcher functionality to all platforms and use it to locate the relevant Python interpreter and
Python program in the `runfiles` tree to launch the `py_binary`. No assumptions should be made about the target platform.
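
As a rough illustration of the "locate in the `runfiles` tree" step, the lookup could be sketched as below. This is only a sketch of the logic, written in Python for readability; an actual launcher would be a native binary. The function names `find_runfiles_dir` and `resolve_runfile` are hypothetical. The sketch assumes a directory-based runfiles tree, found either through the `RUNFILES_DIR` environment variable or through a `<binary>.runfiles` directory next to the launcher, and it ignores manifest-only setups.

```python
import os
from pathlib import Path


def find_runfiles_dir(argv0, environ):
    """Return the runfiles directory for a launcher, or None if not found.

    Hypothetical helper: assumes a directory-based runfiles tree.
    """
    # RUNFILES_DIR is the variable Bazel's runfiles libraries consult first.
    from_env = environ.get("RUNFILES_DIR")
    if from_env and Path(from_env).is_dir():
        return Path(from_env)
    # Otherwise fall back to "<binary>.runfiles" next to the launcher itself.
    candidate = Path(os.path.abspath(argv0) + ".runfiles")
    return candidate if candidate.is_dir() else None


def resolve_runfile(runfiles_dir, rel_path):
    """Map a runfiles-relative path (e.g. the interpreter) to a path on disk."""
    return Path(runfiles_dir) / rel_path
```

Both the interpreter and the main program would be resolved this way, so the launcher never needs to consult `PATH` or a shebang on the host.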

In pseudo-code, the proposal is as follows:

Review comment:

This is pretty high-level pseudo-code :).

Something a little more concrete would be better. e.g., it has to find the runfiles directory to resolve the relative path names.


```
exec(env, runfiles-interpreter, ["interpreter_arg1",], "main.py", ["arg1",])
```

| Token | Description |
| ---------------------- | ----------- |
| env | Dictionary of key-value pairs for the environment of the process |

Review comment:

I don't see why env is one of the inputs? This basically implies that the launcher process may need to use a modified environment from the actual program -- what's the motivation case for this? Why would it not just inherit the existing environment?

Ah, one case I just thought of: LD_PRELOAD (or equivalent). Basically, a binary might require such a setting and we wouldn't want the launcher itself to use that (and by "might" I mean, we have this feature internally at Google).

| runfiles-interpreter   | The resolved Python toolchain in runfiles |

Review comment:

The description here doesn't quite make sense with what the arg name implies.

The arg name sounds like a string path. The description is "the python toolchain", which is a complex object.

| ["interpreter_arg1",]  | An array of arguments to provide to the Python interpreter |
| "main.py"              | The Python program to launch in runfiles |
| ["arg1",]              | An array of arguments to provide to the Python program as sys.argv[1:] |
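
To make the tokens in the table a little more concrete, here is a minimal sketch of how they might be combined into a single `exec` call. It is written in Python purely to show the logic; a real launcher would be a native binary. The names `build_launch` and `launch`, the `env_overrides` parameter, and the example paths are assumptions for illustration, not part of the proposal. The sketch treats `env` as the inherited environment plus per-binary overrides, which keeps a setting such as `LD_PRELOAD` out of the launcher's own process while still passing it to the program.

```python
import os


def build_launch(runfiles_dir, interpreter_rel, interpreter_args,
                 main_rel, program_args, env_overrides, base_env):
    """Compute (interpreter path, argv, environment) for the child process."""
    # Both the interpreter and the program are resolved inside runfiles.
    interpreter = os.path.join(runfiles_dir, interpreter_rel)
    main = os.path.join(runfiles_dir, main_rel)
    # argv[0] is the interpreter; program args end up in sys.argv[1:].
    argv = [interpreter, *interpreter_args, main, *program_args]
    # Start from the inherited environment, then apply per-binary overrides.
    child_env = dict(base_env)
    child_env.update(env_overrides)
    return interpreter, argv, child_env


def launch(runfiles_dir, interpreter_rel, interpreter_args,
           main_rel, program_args, env_overrides):
    """Replace the current process with the Python program (POSIX semantics)."""
    interpreter, argv, child_env = build_launch(
        runfiles_dir, interpreter_rel, interpreter_args,
        main_rel, program_args, env_overrides, os.environ)
    os.execve(interpreter, argv, child_env)
```

Splitting the pure computation (`build_launch`) from the process replacement (`launch`) keeps the interesting part easy to test without actually exec'ing anything.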

This native launcher idea has been proposed a few times by Bazel contributors and the community:
* [Greg Roodt (Community)](https://github.com/bazelbuild/rules_python/issues/691#issuecomment-1174935972)
* [Yun Peng (Google)](https://github.com/bazelbuild/bazel/issues/7947#issuecomment-495265016)
* [Richard Levasseur (Google)](https://github.com/bazelbuild/rules_python/issues/691#issuecomment-1186379617)

Some related work has been done that fixes Linux to Windows cross-builds of the Windows launcher. See: [Fix Linux to Windows cross compilation of py_binary, java_binary and sh_binary using MinGW](https://github.com/bazelbuild/bazel/pull/16019).
This proposal would aim to go further and have these launchers available on all platforms, including cross-builds where appropriate toolchains are in place.

Once this proposal is implemented, it would enable cross-builds of hermetic `py_binary` for all major platforms. It
would also remove the complexity introduced by having so many chains of nested execution to launch a Python program.

Finally, while this proposal is specific to Python, this solution could perhaps be reused for `java_binary`, `sh_binary`
and perhaps be made available for any custom rules that require an interpreter to launch.


# Backward-compatibility

This proposal could require users to set up a cc toolchain for remote execution.

Review comment:

Is it required to be CC? e.g., what if someone wants to write a launcher in rust?
Can a pre-built launcher be used? I guess that might be precluded; it depends on how the argv, path to interpreter, etc. are embedded or passed along. This makes me wonder if supporting code in Bazel itself would work better -- e.g. return a special provider that says "we're doing a launcher thing, use this argv etc when running"

Author reply:

No, does not need to be CC. I think the launcher needs to be compiled as native in some way. So Go, Zig, Rust, CC all come to mind. Whatever has the most minimal toolchain requirements on the user and gives the functionality we require I think.

I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.

Review comment:

> I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.

I agree.

The part I'm trying to think through is re-use of what is essentially the same binary (the launcher itself).

For a target-config build target, I agree, yes, the launcher essentially needs to be self-contained and standalone. I don't see how to do it otherwise because the invocation of bazel-bin/foo has to work without any args. (I guess it could look for $0.params or something? But that seems kind of brittle).

For a build tool, the situation is different[1] -- stuff run during the build doesn't need the stricter isolation requirements. For example, when Bazel runs an executable in an action, it could avoid having to build the launcher entirely by doing the exec() call itself when it runs the subprocess. Maybe a target could return e.g. LauncherInfo(runtime=..., runtime_args=..., executable=...), or a rule advertises something (similar to a rule's toolchains setting) about how to find the launcher to build to mix it in with the LauncherInfo.

This then leads me to think that, if a rule returned that, Bazel itself could invoke the launcher building instead of the rule having to do so. Which has a Just Works sort of appeal; but risks coupling behavior to the Bazel release (which might be more of a headache).

[1] This case is particularly on my mind because Python is often used for build tools, and building the runtime and C dependencies is pretty expensive, so reuse of that is highly beneficial.
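
The `LauncherInfo(runtime=..., runtime_args=..., executable=...)` idea floated above could be modelled roughly as follows. In Bazel this would presumably be a Starlark provider; the Python dataclass below is only a hypothetical stand-in to show what data the provider would carry and how a caller (Bazel itself, or a generated launcher) could turn it into a command line. The `command_line` method and the example paths are assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LauncherInfo:
    """Hypothetical model of the provider discussed in the review thread."""
    runtime: str          # path to the interpreter, e.g. inside runfiles
    executable: str       # path to the program's entry point
    runtime_args: List[str] = field(default_factory=list)

    def command_line(self, program_args):
        """Return the argv a caller would exec to run the program."""
        return [self.runtime, *self.runtime_args, self.executable, *program_args]


# Example: a build action could exec this argv directly, without building
# a separate launcher binary for the tool.
info = LauncherInfo(runtime="python/bin/python3",
                    runtime_args=["-B"],
                    executable="tools/gen.py")
argv = info.command_line(["--out", "gen.txt"])
```

For a target-configuration binary the same data would still need to be embedded in, or shipped next to, a standalone launcher, since `bazel-bin/foo` has to run without Bazel present.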