Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows container_pull to postpone image download to build phase. #1749

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

igorgatis
Copy link

Usage:

container_pull(
    name = "base",
    registry = "gcr.io",
    repository = "my-project/my-base",
    digest = "sha256:deadbeef",
    lazy_download = True, # Will postpone download to build phase.
)

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: igorgatis
To complete the pull request process, please assign davegay after the PR has been reviewed.
You can assign the PR to them by writing /assign @davegay in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@igorgatis
Copy link
Author

/assign @smukherj1
/assign @DaveGay
/assign @eytankidron

@zoidyzoidzoid
Copy link
Contributor

We have some large images we pull in in our WORKSPACE, and fetching and creating the intermediate forms even when they're not used is a huge performance cost, so this'd be amazing to see added.

@igorgatis igorgatis changed the title Allows container_push to postpone image download to build phase. Allows container_pull to postpone image download to build phase. Mar 11, 2021
@smukherj1 smukherj1 requested review from alexeagle, gravypod and pcj and removed request for DaveGay and eytankidron March 16, 2021 23:47
@pcj
Copy link
Member

pcj commented Mar 17, 2021

Unfortunately, using the network during an action is considered bad practice and IIRC network access is disabled in many execution contexts, particularly remote execution.

We can consult with the core bazel team but I don't think this is going to work as-is.

genrule(
    name = "download",
    message = "Fetching",
    outs = ["{outs}"],
    cmd = "{args}",
    tools = ["{tool}"],
)

Also, what are the security implications of this change? Is it possible for an attacker to replace content in an image by deferring download to the build phase? Is hash integrity preserved?

@ulfjack any comment here?

@@ -96,6 +96,10 @@ _container_pull_attrs = {
doc = "(optional) The tag of the image, default to 'latest' " +
"if this and 'digest' remain unspecified.",
),
"lazy_download": attr.bool(
default = False,
doc = "(optional) Indicates whether donwload is postponed to build phase.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling

@igorgatis
Copy link
Author

igorgatis commented Mar 22, 2021

Unfortunately, using the network during an action is considered bad practice and IIRC network access is disabled in many execution contexts, particularly remote execution.

I believe everybody agrees with that. However, the "best practice" makes the scenario of multiple container_pull calls in WORKSPACE impractical.

This is why the field lazy_download flag is false by default. User needs to be aware of pros-and-cons. Perhaps, user might make use of this flag only for rarely used images or HUGE images.

Also, what are the security implications of this change? Is it possible for an attacker to replace content in an image by deferring download to the build phase? Is hash integrity preserved?

We can mitigate this issue passing expected SHA to genrule call. It should be pretty easy to implement. Should I proceed with that?

@sluongng
Copy link
Contributor

@pcj
Copy link
Member

pcj commented Mar 29, 2021

Many organizations use an HTTP_CACHE or artifactory to mitigate this. Can you not use that?

@UebelAndre
Copy link
Contributor

Not every organization has the flexibility to stand something like that up and maintain it. There's still a large startup cost in bigger repos when having to download all images when one or none will be used. I think with sufficient documentation this would be a fantastic feature.

@pcj
Copy link
Member

pcj commented Apr 2, 2021

Perhaps we should discuss at next maintainer meeting. @igorgatis @sluongng @UebelAndre would you be willing to join a call in a few weeks to go over this?

@sluongng
Copy link
Contributor

sluongng commented Apr 2, 2021

Perhaps we should discuss at next maintainer meeting. @igorgatis @sluongng @UebelAndre would you be willing to join a call in a few weeks to go over this?

For sure. Please send and invite to sluongng at gmail dot com.

@thekyz
Copy link

thekyz commented Oct 7, 2021

What is the conclusion on this? It is a must have for polyglot monorepos and as stated above, has already been implemented in rules_python so there is a precedent for it. I can assess that the patch works perfectly well (thanks @igorgatis btw!) and would really love this to be upstreamed.

@mindaugasrukas
Copy link
Contributor

@igorgatis could you fix Buildifier error:

##### :bazel: buildifier: found 1 format issue in your WORKSPACE, BUILD and *.bzl files
--
  | Please download <a href="https://github.com/bazelbuild/buildtools/releases/tag/0.22.0">buildifier 0.22.0</a> and run the following command in your workspace:<br/><pre><code>buildifier container/pull.bzl</code></pre>
  | ##### :bazel: buildifier: found 1 lint issue in your WORKSPACE, BUILD and *.bzl files

message = "Fetching",
outs = ["{outs}"],
cmd = "{args}",
tools = ["{tool}"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tools = ["{tool}"],
tools = ["{tool}"],
tags = ["requires-network"],

to make sure network is allowed even in build sandbox

@dmivankov
Copy link

Another interesting thing is compatibility with private registries, for example docker-credential-gcloud helper wants writable $HOME/.config/gcloud/logs/ and probably read access for cached tokens. Maybe "no-sandbox" (and maybe also "local") tag is required to make it work for private registries with credential helpers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.