Simplify adopting chisel distroless base images (static / base / cc) #157

Open
polarathene opened this issue Sep 6, 2024 · 5 comments

@polarathene

Summary:

Multiple Canonical/Ubuntu resources refer to minimal image sizes while comparing to Google distroless image size.

  • There are no equivalent images presently published for use?
  • Proper recreation requires more effort and technical confidence from the user. More verbose and fragile than some published rocks?
  • No package/slice that could simplify creation? (along with an official chisel image published)

Are these intentionally not published for some reason? Or are there plans to simplify producing such?


In this Jan 2023 guide (prior to the Nov 2023 General Availability announcement), the first step is to build chisel. The 2nd step was then to use chisel to create a minimal image like the Google distroless ones offer.

Image from guide (click to view)

image

This page shows some comparisons to promote chisel, using published rocks that are available on DockerHub for services and language runtimes, but the last one, a minimal equivalent image suitable for Go and Rust projects, is nowhere to be found:

Images from online resources

https://ubuntu.com/containers/chiselled

image

https://ubuntu.com/blog/combining-distroless-and-ubuntu-chiselled-containers

image

https://canonical.com/blog/chiselled-ubuntu-ga

image


The benefits of that minimal base are shown above, and it's useful for projects that just need to have basic deps like glibc with /etc/passwd + /etc/group, and ca-certificates, possibly TZ data.

I don't recall any explanation, in the various resources I came across, for why chiselled images are produced for some services and language runtimes but not for the minimal glibc / distroless base images referenced above.

The project README suggests getting chisel and demonstrates an example that is simple enough. I'd assume that's not too different from how the published images are built? (like this one using this Python 3.12 slice)

Recreating the Python rock with chisel

I think the Python image might look something like this?:

# `chisel` lacks the ability to create the `--root` dir for you when it doesn't exist:
mkdir /rootfs

# Produces 40MB output:
# - `ubuntu/python:3.12-24.04_stable` image appears to roughly match contents of these slices, but is 48MB in size.
# - Missing `pebble` binary (8MB) and the `/etc/{passwd,group,localtime}` files.
chisel cut --release ubuntu-24.04 --root /rootfs \
  python3.12_standard \
  tzdata_zoneinfo \
  ca-certificates_data-with-certs \
  openssl_config \
  base-files_base

EDIT: I found the actual rockcraft.yaml used to build ubuntu/python:3.12-24.04_stable (it'd be nice if the DockerHub image README referenced that btw! I wasn't sure where or what the build source was for a while until I learned more about "rocks"). It seems I was reasonably close at guessing its internals 😎 (this will be easier to derive once that manifest feature arrives, I assume)
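
Regarding the `--root` comment in the snippet above, the `mkdir` step can be folded into a small wrapper. This is only a sketch: `cut_into` is a hypothetical helper name, not part of chisel, and it assumes `chisel` is on `PATH`.

```shell
# Hypothetical helper (not part of chisel): create the target dir, then cut.
# Usage: cut_into <root-dir> <slice>...
cut_into() {
  root="$1"
  shift
  # `mkdir -p` succeeds whether or not the directory already exists.
  mkdir -p "$root" || return 1
  chisel cut --release ubuntu-24.04 --root "$root" "$@"
}

# Example (requires chisel and network access):
#   cut_into /rootfs python3.12_standard base-files_base
```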

@polarathene
Author

polarathene commented Sep 6, 2024

Side-by-side comparison

The promotional images that compare chiselled Ubuntu images to the Google distroless images do vary in size comparisons. Out of curiosity I have compared (docker image ls output):

# Context: These are all presently published as `-debian12` base images:
REPOSITORY                     TAG      IMAGE ID       CREATED   SIZE
gcr.io/distroless/static       latest   1d74083b745f   N/A       1.99MB
gcr.io/distroless/base-nossl   latest   54dce73e4eb1   N/A       14.8MB
gcr.io/distroless/base         latest   7273f3276b21   N/A       20.7MB
gcr.io/distroless/cc           latest   fb8acc0b7e50   N/A       23.4MB

With chisel they're mostly the same:

# Images built with chisel v0.10.0 + Ubuntu 24.04:
REPOSITORY     TAG          IMAGE ID       CREATED        SIZE
local/chisel   static       98514ef8cb5b   1 minute ago   944kB
local/chisel   base-nossl   75d90a550dcb   1 minute ago   14MB
local/chisel   base         e994b20f5619   1 minute ago   20.2MB
local/chisel   cc           bffc85cf0b9f   1 minute ago   23.4MB

The slight size advantage with chisel comes from the tzdata slice omitting some content (mostly /usr/share/zoneinfo/right) and from other omitted /usr/share content. NOTE: chisel images are missing the localtime for the symlink there, but as with some other files in /etc with containers, I think that's fine as they get injected or mounted at runtime?
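
To put numbers on that difference, the per-variant deltas can be computed from the two listings above. A quick sketch: the figures are copied from the `docker image ls` output (with 944kB written as 0.944MB), and `size_deltas` is a hypothetical helper name.

```shell
# Print "<variant> <distroless MB minus chisel MB>" for each variant.
# Columns: variant, distroless size (MB), chisel size (MB).
size_deltas() {
  awk '{ printf "%s %.2f\n", $1, $2 - $3 }' <<'EOF'
static 1.99 0.944
base-nossl 14.8 14
base 20.7 20.2
cc 23.4 23.4
EOF
}

size_deltas
```

The largest gap is roughly 1MB on the `static` variant; the others differ by less than 1MB.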

image

I think the promotional material was not exactly doing a 1:1 parity comparison, but perhaps trying to show off the extra flexibility chisel offers by tailoring the deps a bit better for your individual project needs.

  • Omitting the gconv dependency, for example, is a viable reduction, while some projects still need glibc and libgcc_s, plus ca-certificates if verifying TLS certs, tzdata if needing timezone awareness, etc.
  • Supposedly you can use slices on existing slices, but I've not yet looked at how to do that with chisel.

Building chisel equivalents of Google distroless image variants

Here's a rough equivalent of each image's content. The images are iterative, each one appending new layers, so I've ordered the slices to match the order they appear in the image layers for context:

# Docs:
# https://github.com/GoogleContainerTools/distroless/blob/main/base/README.md
# https://github.com/GoogleContainerTools/distroless/blob/main/cc/README.md

SLICES_STATIC='base-files_base base-files_release-info netbase_config tzdata_zoneinfo base-passwd_data libc-bin_nsswitch ca-certificates_data'
SLICES_BASE_NOSSL="${SLICES_STATIC} libc6_libs libc6_gconv libc6_config"
SLICES_BASE="${SLICES_BASE_NOSSL} libssl3t64_libs"
SLICES_CC="${SLICES_BASE} libgomp1_libs libstdc++6_libs libgcc-s1_libs"

# Create a rootfs roughly similar to `gcr.io/distroless/cc-debian12`:
mkdir /root-fs
chisel cut --release ubuntu-24.04 --root /root-fs ${SLICES_CC}

NOTES:

  • This official chisel docs guide mentions SSL support by adding the openssl_config slice, but that would not provide libssl.so.3, and the slice/package that does provide it changes for Ubuntu 24.04 (libssl3 -> libssl3t64). That's a low-level change that users interested in such a minimal base image shouldn't really have to think about?
  • Custom slice definitions might encode those variants fairly well via essentials, but I haven't yet looked into whether supplying your own additional slices is simple. chisel doesn't necessarily need to offer 1:1 parity with the variants, but they can be useful as a reference (as per the reasoning of the linked distroless docs).
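
Until something like that exists, the variant lists above can be looked up by name with a POSIX `case` statement, which also works in shells without associative arrays. A minimal sketch; `slices_for` is a hypothetical helper name and the lists simply mirror the variables shown earlier.

```shell
# Hypothetical helper: print the slice list for a distroless-style variant.
slices_for() {
  static='base-files_base base-files_release-info netbase_config tzdata_zoneinfo base-passwd_data libc-bin_nsswitch ca-certificates_data'
  base_nossl="${static} libc6_libs libc6_gconv libc6_config"
  base="${base_nossl} libssl3t64_libs"
  cc="${base} libgomp1_libs libstdc++6_libs libgcc-s1_libs"
  case "$1" in
    static)     printf '%s\n' "$static" ;;
    base-nossl) printf '%s\n' "$base_nossl" ;;
    base)       printf '%s\n' "$base" ;;
    cc)         printf '%s\n' "$cc" ;;
    *)          printf 'unknown variant: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example: the unquoted expansion is deliberate, so the list splits into arguments.
#   chisel cut --release ubuntu-24.04 --root /root-fs $(slices_for base)
```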

As a Dockerfile:

# Supports building the equivalents of `gcr.io/distroless` variants

FROM alpine AS chiselled
ARG CHISEL_VERSION=0.10.0
ARG TARGETARCH
# NOTE: `--no-same-owner` used as `chisel` release has ownership of `1001:127`
RUN <<HEREDOC
  CHISEL_RELEASE="https://github.com/canonical/chisel/releases/download/v${CHISEL_VERSION}/chisel_v${CHISEL_VERSION}_linux_${TARGETARCH}.tar.gz"
  wget -qO - "${CHISEL_RELEASE}" | tar -xz --no-same-owner -C /usr/local/bin chisel

  # Storing into files so they can be referenced via name (bash has associative arrays and ${!var_name} features, but ash does not):
  echo 'base-files_base base-files_release-info netbase_config tzdata_zoneinfo base-passwd_data libc-bin_nsswitch ca-certificates_data' > /tmp/slices_static
  echo "$(cat /tmp/slices_static) libc6_libs libc6_gconv libc6_config" > /tmp/slices_base-nossl
  echo "$(cat /tmp/slices_base-nossl) libssl3t64_libs" > /tmp/slices_base
  echo "$(cat /tmp/slices_base) libgomp1_libs libstdc++6_libs libgcc-s1_libs" > /tmp/slices_cc
HEREDOC

# Installing packages to `--root /root-fs` (chisel expects dir to exist)
WORKDIR /root-fs
## NOTE: Chisel cache is shared across TARGETARCHs. Use TARGETARCH in `id` if needing separate cache mounts.
ARG IMAGE_VARIANT=cc
RUN --mount=type=cache,target=/root/.cache/chisel,id="chisel-cache" \
  chisel cut --release ubuntu-24.04 --root /root-fs $(cat "/tmp/slices_${IMAGE_VARIANT}")


FROM scratch
COPY --link --from=chiselled /root-fs /

docker build --build-arg IMAGE_VARIANT=cc --tag local/chisel:cc .
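
To build every variant rather than only `cc`, the build command can be wrapped in a loop. A minimal sketch, assuming the Dockerfile above is in the current directory; `build_variants` is a hypothetical helper name.

```shell
# Hypothetical helper: build one image per variant from the Dockerfile above.
build_variants() {
  for variant in static base-nossl base cc; do
    # Stop at the first failed build instead of continuing silently.
    docker build --build-arg "IMAGE_VARIANT=${variant}" \
      --tag "local/chisel:${variant}" . || return 1
  done
}

# Example:
#   build_variants && docker image ls local/chisel
```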

@polarathene
Author

Producing these types of low-level images is more verbose and requires extra effort vs the published ones for language runtimes or services. Would be nice to simplify that.

  • chisel could encourage broader adoption by offering these images too, since that makes it easier for projects/users with this common use-case to try it. Positive experiences can lead to further exploring the chisel ecosystem (with related projects like pebble and rockcraft).
  • At the very least, publishing an image with chisel to DockerHub reduces some friction / redundancy? It wasn't immediately clear if running chisel required an Ubuntu base image to operate, but it does not 👍 (it does require /etc/ssl/certs/ca-certificates.crt to exist, and this does prevent writing to --root / on a scratch base in a Dockerfile, requiring a COPY)
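
Given the CA bundle requirement noted above, a pre-flight check can make a missing bundle easier to diagnose than a failed download. This is a sketch only: `have_ca_bundle` is a hypothetical helper, and the default path is the one observed above rather than a documented chisel setting.

```shell
# Hypothetical pre-flight check: succeed only if a non-empty CA bundle exists.
# Usage: have_ca_bundle [path]
have_ca_bundle() {
  bundle="${1:-/etc/ssl/certs/ca-certificates.crt}"
  # -s: the file exists and is not empty
  if [ -s "$bundle" ]; then
    return 0
  fi
  printf 'missing CA bundle: %s\n' "$bundle" >&2
  return 1
}

# Example:
#   have_ca_bundle && chisel cut --release ubuntu-24.04 --root /root-fs base-files_base
```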

Feature Request: chisel as an official image

Ideal workflow might be:

FROM --platform=${BUILDPLATFORM} canonical/chisel:0.10 AS rootfs
ARG TARGETARCH
RUN --mount=type=cache,target=/root/.cache/chisel,id="chisel-cache" \
  chisel cut --arch "${TARGETARCH}" --release ubuntu-24.04 --root /root-fs base-distroless_cc

FROM scratch
COPY --link --from=rootfs /root-fs /

docker build --platform linux/amd64,linux/arm64 --tag local/chisel-distroless:cc-24.04 .

  • canonical/chisel or similar image published to registries such as DockerHub.
  • Implicitly create /root-fs when it doesn't exist 🙏
  • New distroless slice. Helpful if these base images won't be published separately as it would be a bit more convenient and align with the goals of chisel by keeping such maintenance upstream (such as with the libssl3 -> libssl3t64 package change).
  • Using $BUILDPLATFORM will pull/use the image for the build host's native arch, while $TARGETARCH is used to support multiple archs (arm64 + amd64). This is faster as non-native archs don't go through emulation overhead. I haven't checked how well that works with concurrent builds; the shared cache may be an issue, in which case adjusting the id to use $TARGETARCH for a separate cache mount per arch would perhaps work? The cache was nearly 100MB for each of the AMD64 + ARM64 targets.

Compared to the currently required manual approach

This is currently much noisier, adds technical knowledge/debt, and is less friendly for project maintainers to accept.

FROM --platform=${BUILDPLATFORM} alpine AS rootfs
ARG CHISEL_VERSION=0.10.0
ARG BUILDARCH
# NOTE: `--no-same-owner` used as `chisel` release has ownership of `1001:127`
RUN <<HEREDOC
  CHISEL_RELEASE="https://github.com/canonical/chisel/releases/download/v${CHISEL_VERSION}/chisel_v${CHISEL_VERSION}_linux_${BUILDARCH}.tar.gz"
  wget -qO - "${CHISEL_RELEASE}" | tar -xz --no-same-owner -C /usr/local/bin chisel
HEREDOC
# NOTE: ARG and `chisel` command separated here to share the `chisel` bin layer above when target arch changes:
ARG TARGETARCH
RUN --mount=type=cache,target=/root/.cache/chisel,id="chisel-cache" <<HEREDOC
  mkdir /root-fs
  chisel cut --arch "${TARGETARCH}" --release ubuntu-24.04 --root /root-fs \
    base-files_base base-files_release-info netbase_config tzdata_zoneinfo base-passwd_data \
    libc-bin_nsswitch ca-certificates_data libc6_libs libc6_gconv libc6_config libssl3t64_libs \
    libgomp1_libs libstdc++6_libs libgcc-s1_libs
HEREDOC

FROM scratch
COPY --link --from=rootfs /root-fs /

Both produce the rough equivalent of FROM gcr.io/distroless/cc.

  • In my experience project maintainers are less comfortable with syntax like HEREDOC for multi-line RUN, and the cache mounts.
  • However, the package/slice list is also going to throw them off if they need a verbose list like that to maintain vs gcr.io/distroless/cc. Even gcr.io/distroless/static or variants in-between would look more attractive unless the benefits like flexibility can be leveraged without the maintainer feeling it adds too much complexity.

The proposed "ideal" version would still work well enough, and could avoid the cache mount + arch/platform specific optimization.
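
For the manual approach, the download URL logic can also be pulled out for use outside a Dockerfile. A sketch under stated assumptions: the URL pattern is the one from the Dockerfile above, both function names are hypothetical, and only the two archs discussed here are mapped.

```shell
# Hypothetical helper: map `uname -m` output to the arch names used in the
# release asset file names (the same names BuildKit uses for BUILDARCH).
release_arch() {
  case "$1" in
    x86_64)        printf 'amd64\n' ;;
    aarch64|arm64) printf 'arm64\n' ;;
    *)             printf 'arch not handled in this sketch: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the release tarball URL for a version and arch.
chisel_release_url() {
  version="$1"
  arch="$2"
  printf 'https://github.com/canonical/chisel/releases/download/v%s/chisel_v%s_linux_%s.tar.gz\n' \
    "$version" "$version" "$arch"
}

# Example (requires network access):
#   wget -qO - "$(chisel_release_url 0.10.0 "$(release_arch "$(uname -m)")")" \
#     | tar -xz --no-same-owner -C /usr/local/bin chisel
```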


Variant - FROM scratch without the COPY step

# While not as nice of a DX, this would avoid the separate stage `COPY`

# NOTE: This uses the RUN "exec" form, which is rare to see.
# - `scratch` has no `/bin/sh`, so the default `SHELL ["/bin/sh", "-c"]` "shell" form of RUN cannot work here; use the "exec" form.
# - The "exec" form prevents adding `"--arch", "${TARGETARCH}"` as interpolation is not supported. Would need a separate `RUN`.
FROM scratch
RUN \
  --mount=type=cache,target=/root/.cache/chisel,id="chisel-cache" \
  --mount=type=bind,from=canonical/chisel:0.10,source=/opt/canonical,target=/canonical \
  ["/canonical/chisel", "cut", "--release", "ubuntu-24.04", "--ignore", "/canonical", "--root", "/", "base-distroless_cc"]

  • Uses a bind mount from a potential official chisel image that provides chisel and, if needed, a relative ca-certificates.crt (or a CLI option to reference one at a different location).
  • --ignore or a similar option as a way to work around the conflict detection when using --root /. /canonical wouldn't actually conflict with the installed content and the mount will be removed after the RUN. Better if it could be handled implicitly, but presently the issue seems to be the external dependency on /etc/ssl/certs/ca-certificates.crt?

Not sure how useful that is in practice 🤷‍♂️

@letFunny
Collaborator

letFunny commented Nov 1, 2024

@polarathene Thanks for your comments here, they were very insightful. Apologies for taking this long, but I am going through them now and I see a couple of different ideas that would be good additions for Chisel:

  • canonical/chisel or similar image published to registries such as DockerHub.
  • Implicitly create /root-fs when it doesn't exist 🙏
  • New distroless slice. Helpful if these base images won't be published separately as it would be a bit more convenient and align with the goals of chisel by keeping such maintenance upstream (such as with the libssl3 -> libssl3t64 package change).

it'd be nice if the DockerHub image README referenced that [rockcraft.yaml] btw! I wasn't sure where or what the build source was for a while until I learned more about "rocks"

We'll need to study all of them separately because they all require different levels of effort. I can tell you that for sure we are going to devote time during the next months to improve the UX and vastly revamp the user documentation, which can include some of your suggestions. Presently we are working on improving the UX for slice creation, so the items that we choose to pursue from the above might take some extra time.

Lastly, pull requests are welcomed, especially for simple improvements that do not require a lot of discussion.

@polarathene
Author

Lastly, pull requests are welcomed, especially for simple improvements that do not require a lot of discussion.

I did put together a slice definition for these a while back, but ran into an issue where I couldn't use cc as a slice name, as two characters was too short. I opened an issue for that, but it has not yet been resolved.

If you'd like I can open a PR, it was about 2 months ago but I had this:

slices/base-distroless.yaml: (24.04)

package: base-distroless

slices:
  # Approx 1MB
  static:
    essential:
      - base-files_base
      - base-files_release-info
      # NOTE: Ideally this would instead only represent root + non-root users like in Distroless?
      - base-passwd_data
      - libc-bin_nsswitch
      - netbase_config
      - ca-certificates_data
      - tzdata_zoneinfo

  # Approx 14MB
  base-nossl:
    essential:
      - base-distroless_static
      - libc6_libs
      - libc6_gconv
      - libc6_config

  # Approx 20MB
  base:
    essential:
      - base-distroless_base-nossl
      - libssl3t64_libs

  # Approx 24MB
  cc:
    essential:
      - base-distroless_base
      - libgomp1_libs
      - libstdc++6_libs
      - libgcc-s1_libs

I'm not sure if it makes sense to match Google Distroless with those names, or to have them as an alias for names that would be more fitting to chisel?


With that SDF one can more easily use chisel to get the equivalent Google Distroless image and add whatever other slices are needed, throw in Pebble for the entrypoint if multiple services in an image are required, etc.

I think the lack of that SDF is the bigger friction point to resolve right now for adopting chisel for base images. I can't easily get project maintainers onboard when they're presented with a Dockerfile that has a large list of slices to maintain, as that comes across as more complexity vs alternatives 😓
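
For anyone wanting to try the SDF locally in the meantime, it can be dropped into a local copy of the release definitions. A rough sketch with loud assumptions: `add_sdf` is a hypothetical helper, `./chisel-releases` is assumed to be a local checkout of the ubuntu-24.04 release definitions, and `--release` is assumed to accept a directory path. The `cc` slice would also need a longer name until the naming issue mentioned above is resolved.

```shell
# Hypothetical helper: copy a slice definition file into a local release dir.
# Usage: add_sdf <sdf-file> <release-dir>
add_sdf() {
  sdf="$1"
  release_dir="$2"
  if [ ! -f "$sdf" ]; then
    printf 'no such file: %s\n' "$sdf" >&2
    return 1
  fi
  # Slice definitions live under slices/ in the release dir.
  mkdir -p "${release_dir}/slices" || return 1
  cp "$sdf" "${release_dir}/slices/"
}

# Example (assumes a local checkout and that `--release` accepts a directory):
#   add_sdf base-distroless.yaml ./chisel-releases
#   mkdir -p /root-fs
#   chisel cut --release ./chisel-releases --root /root-fs base-distroless_base
```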

@letFunny
Collaborator

letFunny commented Nov 5, 2024

I think the distroless slice is the most controversial one here :), apologies for not saying that earlier. Right now slices are functional units of a package and this would deviate a bit from that, plus I am not sure we want to track the concept of Google distroless this way. I think a better solution would involve the other part of your comment about [...] these base images won't be published separately, which seems like a better way forward. But as I said above, we need to discuss it internally first, once we have landed the features we are working on right now. Thanks again for your comments!
