enhancement: Sparse Manifest Lists #29

Open
wants to merge 1 commit into
base: main

jonathankingfc
Contributor

@jonathankingfc jonathankingfc commented Jan 30, 2024

Enhancement proposal for sparse manifest lists

@jonathankingfc jonathankingfc changed the title enhancement: Sparse manifest lists enhancement: Sparse Manifest Jan 30, 2024
@jonathankingfc jonathankingfc changed the title enhancement: Sparse Manifest enhancement: Sparse Manifest Lists Jan 30, 2024

@sherine-k sherine-k left a comment

Hi @jonathankingfc
This enhancement is pretty interesting, as it is also one of the recurring requests from clients on disconnected clusters (who use oc-mirror). So thank you!

So far, I found only this OCI spec change that relates to sparse manifest support.

Are there any specifics you can share with us about what a push / pull of a sparse manifest will look like (at the HTTP level, for example)?

I'm not sure how all the clients implement this, but I'm interested to know whether what this enhancement proposes would be in line with what skopeo intends to do for sparse manifests: allowing the client to copy (push) only the index of the image, without the underlying manifests.

Although far from fulfilling all disconnected clusters users' needs, this means that with containers/image (at the base of oc-mirror and skopeo), one can either pull/push:

  • the underlying manifest that corresponds to the current arch/os
  • the whole index
  • only the index, which, combined with the first option, makes a sparse manifest of a single arch

cc @mtrmac


#### Story 1

A user with a large repository of container images can significantly reduce their storage footprint by using sparse manifest lists, as common layers across different images are stored only once.

@mtrmac mtrmac Feb 22, 2024

(I know ~nothing about Quay)

I can’t see how sparse manifest lists change anything about this. Removing a per-platform image instance can never add a new layer sharing opportunity.

Is this saying that, independently of accepting sparse manifest lists, the scope of layer sharing is going to be increased?

@mtrmac

mtrmac commented Feb 22, 2024

Current c/image status:

  1. When pulling an image to local storage, only the one chosen platform’s image instance must be present.
  2. When writing a multi-platform image, the caller can choose which per-platform image instances to skip (without removing the existing manifest entries); that creates a sparse image.
  3. A feature to also strip the per-platform image instances from the manifest (creating a non-sparse image with fewer platforms) is desired but does not yet exist.
  4. Perhaps relevant to Quay, reading a multi-platform image when trying to make a copy (skopeo copy --all) will fail on sparse images, unless the caller specifically and manually uses the “skip some per-platform instances” option mentioned in 2. above. IIRC skopeo copy is used for Quay’s mirroring functionality, so that might need changing (while still requiring an opt-in flag?), to allow exact mirroring of sparse multi-platform images.
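Points 1, 2, and 4 can be illustrated with a toy Python model. This is only a sketch: none of these names come from containers/image (which is written in Go). `pull_one`, `copy_all`, the `skip` parameter, the `MissingInstance` exception, and the in-memory `available` set are all hypothetical, and the digests are placeholders.

```python
# Hypothetical multi-platform index: each entry maps a platform to a child digest.
INDEX = [
    {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:bbb", "platform": {"architecture": "arm64", "os": "linux"}},
    {"digest": "sha256:ccc", "platform": {"architecture": "s390x", "os": "linux"}},
]


class MissingInstance(Exception):
    """Raised when a required per-platform instance is absent from the source."""


def pull_one(index, available, architecture, os="linux"):
    """Point 1: only the chosen platform's instance has to be present."""
    for entry in index:
        platform = entry["platform"]
        if platform["architecture"] == architecture and platform["os"] == os:
            if entry["digest"] not in available:
                raise MissingInstance(entry["digest"])
            return entry["digest"]
    raise LookupError(f"no entry for {os}/{architecture}")


def copy_all(index, available, skip=()):
    """Points 2 and 4: copy every instance except those explicitly skipped.

    The index itself is never modified, so skipped entries stay listed and
    the destination ends up sparse.
    """
    skipped = set(skip)
    copied = []
    for entry in index:
        digest = entry["digest"]
        if digest in skipped:
            continue
        if digest not in available:
            # A sparse source without an explicit opt-in is treated as an error.
            raise MissingInstance(digest)
        copied.append(digest)
    return copied
```

With a sparse source holding only `sha256:aaa` and `sha256:bbb`, `pull_one(INDEX, source, "amd64")` succeeds, `copy_all(INDEX, source)` raises `MissingInstance`, and `copy_all(INDEX, source, skip=["sha256:ccc"])` copies the two instances that are present.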


#### Story 2

In a bandwidth-constrained environment, a user can pull images from Quay more efficiently, as the sparse manifest list allows downloading only the necessary layers, reducing the data transfer volume.

Clients can pull a portion of an image already, e.g. see this comment in openshift/oc#1334 adding sparse manifest support to oc image mirror .... And I agree with Miloslav's comment about "common layers" seeming orthogonal. My understanding is that the user-story for sparse manifest lists is more like:

Many image authors publish manifest-list images with many architecture-specific children, to support their workload on all of those architectures. Some image consumers only run a subset of those architectures locally. Sparse manifests will allow users mirroring a manifest-list image into their local Quay to only push the architectures they need, while retaining the top-level manifest list. This saves the network bandwidth and local-Quay storage costs of mirroring architectures that are not needed locally. And it preserves the digest and signatures on the original manifest-list.
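The digest-preservation point can be checked with a few lines of Python. A manifest digest is the SHA-256 of the exact bytes served, so leaving the manifest list untouched keeps its digest, while rewriting it to drop entries produces a different one. The JSON body below is a made-up, abbreviated example, not a real manifest list, and the `digest` helper is just an illustrative name.

```python
import hashlib
import json


def digest(raw: bytes) -> str:
    """Content digest in the registry's 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


# Made-up, abbreviated manifest list used only for illustration.
original = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbb", "platform": {"architecture": "s390x", "os": "linux"}},
    ],
}).encode()

# Sparse mirroring: push the same bytes and simply omit some children.
sparse_copy = original

# Alternative: rewrite the list so that it only names the mirrored children.
doc = json.loads(original)
doc["manifests"] = [m for m in doc["manifests"]
                    if m["platform"]["architecture"] == "amd64"]
stripped_copy = json.dumps(doc).encode()

print(digest(sparse_copy) == digest(original))    # True
print(digest(stripped_copy) == digest(original))  # False
```

Any signature made over the original digest therefore still applies to the sparse copy, but not to the stripped one.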

So for:

$ curl -s https://quay.io/v2/openshift-release-dev/ocp-release/manifests/sha256:39aa3985a4ab715f3ea8d983b72745947249322e4fb4dbcf59b4cc749f4e9ae7 | jq -r '.manifests[] | .digest + " " + (.platform | tostring)'
sha256:49821163426f2f2cb5a2b7cb446c35440d6a5c3905397b48b795dd4bc3b5eaf6 {"architecture":"amd64","os":"linux"}
sha256:f00ca1a7bef6176803cd54ad8ae878dd48fa86215dd002b834840f01039de045 {"architecture":"ppc64le","os":"linux"}
sha256:99696da77b6982057442bdba3854ddd574e5aeba6bd1710e138b8b398b22f883 {"architecture":"s390x","os":"linux"}
sha256:a6352c78572180f0e88cbf62f80f7b45074a157d4e3d8ad172e7d77042f06724 {"architecture":"arm64","os":"linux"}

The sha256:39aa398... manifest-list would be pushed into the local Quay, along with sha256:4982116.... amd64 and sha256:a6352c78572... arm64. But sha256:f00ca1a7bef61... ppc64le and sha256:99696da77b69... s390x would not be pushed in. As far as Quay-side changes go, that's almost entirely on the what-can-we-push-into-Quay? side and not on the what-can-we-pull-from-Quay? side.
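As a rough sketch of that selection, the Python below splits the four child manifests into "push" and "skip" sets for a given list of wanted architectures. The digests and platforms are copied from the jq output above; `plan_sparse_mirror` is a hypothetical helper name, not part of Quay or any client.

```python
# Child manifests copied from the jq output above.
MANIFESTS = [
    {"digest": "sha256:49821163426f2f2cb5a2b7cb446c35440d6a5c3905397b48b795dd4bc3b5eaf6",
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:f00ca1a7bef6176803cd54ad8ae878dd48fa86215dd002b834840f01039de045",
     "platform": {"architecture": "ppc64le", "os": "linux"}},
    {"digest": "sha256:99696da77b6982057442bdba3854ddd574e5aeba6bd1710e138b8b398b22f883",
     "platform": {"architecture": "s390x", "os": "linux"}},
    {"digest": "sha256:a6352c78572180f0e88cbf62f80f7b45074a157d4e3d8ad172e7d77042f06724",
     "platform": {"architecture": "arm64", "os": "linux"}},
]


def plan_sparse_mirror(manifests, wanted_architectures):
    """Return (push, skip): child digests to mirror and child digests to leave out.

    The manifest list itself is always pushed unchanged; only the children
    are filtered.
    """
    wanted = set(wanted_architectures)
    push, skip = [], []
    for manifest in manifests:
        if manifest["platform"]["architecture"] in wanted:
            push.append(manifest["digest"])
        else:
            skip.append(manifest["digest"])
    return push, skip


push, skip = plan_sparse_mirror(MANIFESTS, ["amd64", "arm64"])
for d in push:
    print("push", d)
for d in skip:
    print("skip", d)
```

Here the amd64 and arm64 children land in `push`, and the ppc64le and s390x children land in `skip`.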

As Miloslav points out, clients who are pulling from Quay and expecting a full image but receiving a sparse manifest will fail to pull the layers ("hey, this manifest list references sha256:f00ca1a7bef61..., but that is 404ing!"). They'd have to sort that out with some kind of knob. And as Miloslav points out, Quay-to-Quay mirroring would also have to handle the source-manifest-list-is-sparse case.

6 participants