Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GHA cache image build dependencies #3583

Merged
merged 1 commit into from
Nov 2, 2024

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented Oct 21, 2024

Folllow-on to discussion in #3580

What this does:

  • enable the use of github actions cache for docker build of the integration image
  • in the Dockerfile, separate third-party dependency build stage and nerdctl stage
  • in github workflows, separate out the step that builds these dependencies (read-write to cache) from the steps that prepare the final integration image (read-only from cache)

The result is:

  • with a cold cache, instead of building everything 9 times (4 rootful, 4 rootless, ipv6) , we only build the dependencies stage 4 times
  • the dependencies stage is then cached, and subsequent runs on the same branch will not rebuild it (unless the dependencies versions have changed, or the Dockerfile has been altered)
  • note that we SHOULD also get the cache from the main branch as well, so, once it is merged and the cache is hot, new PRs should benefit from the cache as well on their first run (eg: not build)

The key benefit here is about reducing the network traffic required to produce out test images (hence reducing the opportunity for failure due to third-party server hiccups). It is similar in intent to #3580 .

Incidentally, we will also get a small speed boost for the overall run.
Obviously, GHA cache is not "free": it takes time to retrieve and time to store - so, part of the speed gains from not-building are negated by the cache r/w.

Nevertheless, this looks promising for increased reliability (and reduced transactions with docker hub / debian / ubuntu).

This PR also has a couple of minor changes to the workflow file (reduced timeouts, cosmetic comments, along with bumping the size of the arm64 instance as previously discussed). If preferable, I can split these out.

Further refactoring / changes to the Dockerfile could bring more stuff in the dependencies stage.
This PR has been conservative on that front and staid with the minimal possible changes to the Dockerfile, so that we can decide separately if we want a more in-depth restructuring of it or not.

Finally note that GHA cache is rather limited (10G), and going over the limit will prune prior entries indiscriminately - we might want to keep an eye on that and check that this proposed implementation here stays under the limit to fully benefit from it.

@apostasie apostasie force-pushed the optimize-build-time branch 7 times, most recently from 4fd4147 to ed7df0b Compare October 22, 2024 00:36
@@ -69,6 +69,7 @@ RUN xx-apt-get update -qq && xx-apt-get install -qq --no-install-recommends \
libbtrfs-dev \
libseccomp-dev \
pkg-config
RUN git config --global advice.detachedHead false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silence useless / noisy git messages.

Dockerfile Outdated

FROM build-base AS build-minimal
RUN BINDIR=/out/bin make binaries install
# We do not set CMD to `go test` here, because it requires systemd

FROM build-base AS build-full
FROM build-base AS build-deps
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that splits out third-party dependencies from nerdctl.

@apostasie apostasie force-pushed the optimize-build-time branch from ed7df0b to 3215ecc Compare October 22, 2024 00:54
@apostasie
Copy link
Contributor Author

@AkihiroSuda @djdongjin this should not go in before #3535, but this looks really promising in term of minimizing transactions with Hub / third party services.

LMK your thoughts overall and I will polish it and rebase on top of ^ after it merges

@apostasie apostasie force-pushed the optimize-build-time branch from 3215ecc to d38e4b5 Compare October 29, 2024 15:52
ARG TARGETARCH
ENV GOARCH=${TARGETARCH}
RUN BINDIR=/out/bin make binaries install
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move down nerdctl, which is the part that changes (almost) all the time.

@@ -181,13 +176,6 @@ RUN git clone https://github.com/containerd/imgcrypt.git /go/src/github.com/cont
git checkout "${IMGCRYPT_VERSION}" && \
CGO_ENABLED=0 make && DESTDIR=/out make install && \
echo "- imgcrypt: ${IMGCRYPT_VERSION}" >> /out/share/doc/nerdctl-full/README.md
ARG ROOTLESSKIT_VERSION
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move down rootless kit, as we support two versions.

@@ -237,6 +232,14 @@ RUN echo "" >> /out/share/doc/nerdctl-full/README.md && \
mv /tmp/SHA256SUMS /out/share/doc/nerdctl-full/SHA256SUMS && \
chown -R 0:0 /out

FROM build-dependencies AS build-full
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore nerdctl from above.

@@ -56,7 +99,7 @@ jobs:
run: make test-unit

test-integration:
timeout-minutes: 60
timeout-minutes: 30
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower timeouts.

@@ -262,11 +344,6 @@ jobs:
go-version: ${{ env.GO_VERSION }}
cache: true
check-latest: true
- name: "Print docker info"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debugging remnants.

steps:
- uses: actions/[email protected]
with:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harmonize

@apostasie apostasie force-pushed the optimize-build-time branch from d38e4b5 to b187986 Compare October 29, 2024 16:19
@apostasie apostasie changed the title Experimental: GHA cache image build dependencies GHA cache image build dependencies Oct 29, 2024
@apostasie apostasie marked this pull request as ready for review October 29, 2024 16:47
@apostasie
Copy link
Contributor Author

apostasie commented Oct 29, 2024

@AkihiroSuda at your convenience.

Would like to get this is in for a few PRs to better evaluate how the cache behaves.

@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Oct 29, 2024
@AkihiroSuda AkihiroSuda requested a review from ktock October 29, 2024 18:47
@AkihiroSuda AkihiroSuda added the area/ci e.g., CI failure label Nov 1, 2024
run: |
sudo mkdir -p /etc/docker
echo '{"ipv6": true, "fixed-cidr-v6": "2001:db8:1::/64", "experimental": true, "ip6tables": true}' | sudo tee /etc/docker/daemon.json
echo '{"features": {"containerd-snapshotter": true}, "ipv6": true, "fixed-cidr-v6": "2001:db8:1::/64", "experimental": true, "ip6tables": true}' | sudo tee /etc/docker/daemon.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containerd-snapshotter isn't necessary if you use docker buildx ? (probably with docker buildx create to create a standalone non-moby buildkitd instance)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.
That would certainly be better.
Let me look into that later today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the tips.

@apostasie apostasie marked this pull request as draft November 1, 2024 12:15
@apostasie apostasie force-pushed the optimize-build-time branch from aa5862c to 5d836d6 Compare November 1, 2024 12:15
@apostasie apostasie marked this pull request as ready for review November 1, 2024 12:39
@apostasie apostasie force-pushed the optimize-build-time branch from 5d836d6 to 8dc03b7 Compare November 1, 2024 12:41
fetch-depth: 1
- name: "Expose GitHub Runtime variables for gha"
uses: crazy-max/ghaction-github-runtime@v3
- name: "Enable containerd to be able to use gha cache"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not containerd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Thanks.
Removed the separate step altogether with the latest push.

@apostasie apostasie force-pushed the optimize-build-time branch from 8dc03b7 to 3b1d250 Compare November 1, 2024 13:38
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@apostasie
Copy link
Contributor Author

@AkihiroSuda failure is likely #3556
Can you poke it?

Copy link
Member

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sorry for slow review.

.github/workflows/test.yml Outdated Show resolved Hide resolved
@apostasie apostasie force-pushed the optimize-build-time branch from 3b1d250 to de193b3 Compare November 1, 2024 15:46
@apostasie
Copy link
Contributor Author

LGTM, sorry for slow review.

No problem @ktock.
Thanks!

@ktock ktock merged commit b00f8cd into containerd:main Nov 2, 2024
30 checks passed
@apostasie
Copy link
Contributor Author

While working on something else, I am quickly getting 429s from gha cache, and jobs failing / timeouting because of throttled requests.

I am not optimistic about this overall.
GHA cache does not feel like a usable / scalable product - bad API, lilliputian limitations.

Suggesting we give it a couple of weeks and consider reverting this / tweak it / going back to other ideas (eg: proxy caching).

@AkihiroSuda @ktock

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci e.g., CI failure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants