Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --strict_repo_env option #24404

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Silic0nS0ldier
Copy link
Contributor

This PR introduces a new flag --strict_repo_env which stops repository rules and module extensions from inheriting the client environment (making --repo_env=NAME more than just an advisory notice).

See test_execute_environment_strict_vars in src/test/shell/bazel/starlark_repository_test.sh for a demonstration.

Note that the behavior is different to the similarly named --incompatible_strict_action_env, which stops all environment variables (--action_env affects actions with use_default_shell_env = True) except those specified within the defining rule. Given the entity targeted by the new flag is a repo rule, I think this makes sense and any further strictness (like blocking implicit usage of variables without establishing a dependency) would be better addressed independently. Same goes for regular rules which can have some counterintuitive behaviors (e.g. only --[host_]action_env=X=Y is accessible, --[host_]action_env=Y will be missing, at least last I checked).

Addresses #10996

@Silic0nS0ldier Silic0nS0ldier marked this pull request as ready for review November 20, 2024 03:11
@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Nov 20, 2024
Silic0nS0ldier and others added 3 commits November 20, 2024 03:52
…AZEL_HTTP_RULES_URLS_AS_DEFAULT_CANONICAL_ID=0`)
- Get `--repo_env` fallback from `clientEnv` not `System`. Fixes issue where server process would "cache" environment variables. Broken since 3ebf658 (v5).
- Feed `--action_env=X=Y` into into strict repo env (for consistency, the general behaviour ought to be removed)
- Stray `//` marker
@Silic0nS0ldier
Copy link
Contributor Author

The CI failure flushed out a potential bug. --repo_env=NAME results in reading values from the server process with System.getenv when clientEnv.get should have been used.

With this issue, any changes to NAME in the client environment won't propagate until the server restarts.

@fmeum
Copy link
Collaborator

fmeum commented Nov 21, 2024

With this issue, any changes to NAME in the client environment won't propagate until the server restarts.

Could you split the fix into a separate PR? That would make it easier to cherry-pick into patch releases and Bazel 8.

Since --strict_repo_env will require users to allowlist env variables, I think that we should also ignore --action_env for repo rules when the flag is enabled. Otherwise users migrating to the flag may accidentally hurt their cache hit ratio and require another migration down the line.

@Silic0nS0ldier
Copy link
Contributor Author

Stale environment issue split to #24433 (this PR is dependent on the fix for tests to pass, so I'll leave it here as well and rebase once merged).

@Silic0nS0ldier
Copy link
Contributor Author

Since --strict_repo_env will require users to allowlist env variables, I think that we should also ignore --action_env for repo rules when the flag is enabled. Otherwise users migrating to the flag may accidentally hurt their cache hit ratio and require another migration down the line.

My initial thoughts were to keep this as a project preference, since currently many repository rules are heavily influenced by the host environment (sometimes intentionally). But if this change is being viewed as something to move towards, it may make sense to classify this as an incompatible change like with --incompatible_strict_action_env to signify a potential flip in a later release.


On a related note, there are a few places that use the "old" repo environment in src/main/java/com/google/devtools/build/lib/bazel/BazelRepositoryModule.java.

  1. The vendor command (used in downloader).
  2. The registry (bzlmod) factory (used in downloader).

And for sake of completion, the client environment is directly used in these places.

  1. The credential helpers. Used in many places, including remote execution/caching.
  2. The download manager (one of the instances at least...).
  3. SpawnAction and ExtraAction instances (most likely used for --action_env=NAME). Usage unfortunately allows for some ambiguity.
  4. The volatile input map (date format via SOURCE_DATE_EPOCH).
  5. The standalone test strategy (for ${inherited} which I'm not familiar with).
  6. The symlink tree strategy helper (for symlink creation via a separate process).
  7. WriteAdbArgsAction action, for user home directory (HOME).
  8. CppCompileAction spawn.
  9. JavaCompileAction spawn.
  10. The "run" command (what you'd expect and as part of configuring test environment emulation).
  11. MobileInstallCommand subprocess. Running the mobile-install deployer.
  12. Passed into SkyframeExecutor (which likely ultimately is the source for most other things listed here).
  13. The sandboxing spawn strategies (via special env provider).
  14. The local spawn strategy (as above).
  15. The worker spawn strategy (as above).

The only spot of concern for me here is the download manager. Seems like in its current state, this PR may introduce some fragmentation (though such fragmentation would have already been possible with --repo_env, the new flag just makes it much more substantial).

@Silic0nS0ldier Silic0nS0ldier requested a review from a team as a code owner November 21, 2024 11:39
@Silic0nS0ldier Silic0nS0ldier requested review from aranguyen and removed request for a team November 21, 2024 11:39
@fmeum
Copy link
Collaborator

fmeum commented Nov 21, 2024

Thanks for compiling this list, I'll have to think about this more.

Since some repo rules do depend on the environment quite heavily, have you considered making this behavior configurable on the repository_rule call instead?

@Silic0nS0ldier
Copy link
Contributor Author

Since some repo rules do depend on the environment quite heavily, have you considered making this behavior configurable on the repository_rule call instead?

No, but only because I didn't think to.

I'm not sure what such an API would look like. There is repository_rule(..., environ = [...]) (to mark variables that will trigger a refetch) however that is deprecated in favor of repository_ctx.getenv. The same cannot be said for module_extension(..., environ = [...]) strangely.

Some questions (not necessarily directed at anyone, just to help with ideation).
Should this be implemented as a binary switch that limits variables to only those explicitly listed by --repo_env? Should they be able to give a strict list of env inputs? What of the existing dynamic dependency mechanism (.getenv)? Should --strict_repo_env be a part of the design, as a way to change default environment variable inclusion behavior? Should there be any default inclusions (like PATH) and should rules be able to opt into a completely bare environment?

Looking even further beyond this, a case could even be made for sandboxing certain repository rules. The download repos created by npm_translate_lock (Rules JS) come to mind, although such repos could hypothetically be implemented with a special "download action" (to allow cache hits with few/no downloads) or more likely a slightly different design to be compatible with vendoring. This is all well outside beyond the scope here though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants