[SYSE-398] Delete v5 tag and do not push v{{Major}} tags #357

alephnull · 2024-09-16T09:57:49Z

polarathene · 2024-09-16T12:06:03Z

FWIW, I don't really see that as the right move (TykTechnologies/tyk#6517 (comment)). You still have a v5 tag for ECR if that's relevant?

alephnull · 2024-09-16T13:12:45Z

Can you explain the usecase for using v5 instead of v5.0 (for example)?
There are two LTS versions (5.0 and 5.3) on v5. It is better to use v5.0 or v5.3 explicitly in this scenario.

polarathene · 2024-09-16T22:38:06Z

Can you explain the usecase for using v5 instead of v5.0 (for example)?

I thought I did that early on in the linked comment?

If you are using the 5.0 LTS, you would use the v5.0 tag and get the patch releases.
Someone who instead uses v5 tag would get the same benefits but with minor versions which in semver should not be breaking changes.

I personally use <major>.<minor> myself or have CI manage the version pinning. Those that need the exact same image will pin by digest.

For context, look at your Github Actions workflows, many of your actions are only using the major version tag, you could use minor or even point releases, or have these versions bumped by a PR from dependabot like many projects do. You automatically get the latest minor + patch release of the action as a result. Where it's not so good is your workflows are using the same actions with different major versions as the maintenance of this is ad-hoc.

There are two LTS versions (5.0 and 5.3) on v5. It is better to use v5.0 or v5.3 explicitly in this scenario.

I didn't have LTS in mind tbh. You have v5.2 and each other minor also tagged on DockerHub.

That's part of the release workflow (tagging the published images, distinct from git tagged commits).
On git you only have the explicit full semver as tags, while you maintain major minor release branches.

I don't disagree with you for those that are using LTS.

Some users have tooling to manage their images (either via automated updates or notifications/triggers) that monitor these partial semver tags on the registry for when the associated image digest is changed. I have written these docs for a project I maintain which might provide more insights into that for you?

There is value in having these tags, so long as they're properly updated and not regressing like your CI is publishing. On the project I maintain we have not had this issue as there are no separate release branches, otherwise I'm certain we'd run into that same concern.

alephnull · 2024-09-17T10:31:02Z

Re-usable workflows were considered and discarded as they did not help with our particular problems. The design uses templates (in this repo) to implement releng in about 7-8 repos. The choice was dictated by,

not wanting a single point of failure
variations required in the build process maintained for years
minimising the blast radius of bugs
restructure debt

The problem you mentioned of keeping all the actions versions in sync was a long-standing niggle and the hack implemented by the update-actions-versions make target works surprisingly well and has already been deployed to production.

There is value in having these tags, so long as they're properly updated and not regressing like your CI is publishing

I agree with the spirit of what you are saying. The tags are not managed by hand and the result is purely due to operational choices. I don't think that matters are as dire as you imply :) This has been production for about 2 years and we have shaken out the more egregious bugs. There is certainly room for improvement but it does require understanding of the sometimes incongruous reality that has driven this implementation. 👋

polarathene · 2024-09-17T22:55:09Z

Re-usable workflows were considered and discarded as they did not help with our particular problems.

Could please you elaborate on what went wrong there? (for my own benefit, but could be good context for the PR too)

Responses to each point you provided (collapsed for brevity)

The design uses templates (in this repo) to implement releng in about 7-8 repos.

A central template is not really all that different from the re-usable workflows beyond risk of falling out of sync after distribution?

If you viewed the ones I linked you to as an example (like this one), they modularize and parameterize the workflows:

jobs:
  build-image:
    name: 'Build AMD64 Image'
    uses: docker-mailserver/docker-mailserver/.github/workflows/generic_build.yml@master

  run-tests:
    name: 'Test AMD64 Image'
    needs: build-image
    uses: docker-mailserver/docker-mailserver/.github/workflows/generic_test.yml@master
    with:
      cache-key: ${{ needs.build-image.outputs.build-cache-key }}

  publish-images:
    name: 'Publish AMD64 and ARM64 Image'
    needs: [build-image, run-tests]
    uses: docker-mailserver/docker-mailserver/.github/workflows/generic_publish.yml@master
    with:
      cache-key: ${{ needs.build-image.outputs.build-cache-key }}
    secrets: inherit

You get a much clearer overview of the core jobs being run and dependencies from that as your entrypoint instead of a large single release workflow. A similar scheduled builds workflow reuses the build job, does a security scan and publishes an update of the image, while another similar workflow performs builds and runs tests for PRs.

All three are high-level and doing copy/paste (even if generated via a tool) with adjustments to accommodate those small variations would not be helpful. Those differences are far more visible across all 3 top-level workflows, that is easy to grok. Meanwhile they are sharing common "generic" workflows which again helps ensure they are easy to grok and maintain.

Regardless of what branch those top-level workflows run from, they'll call the same master branch maintained generic workflows, there's really no value I can think of where you benefit from managing CI with copies of the re-usable workflows on the release branches instead, that is just going to add risk for accidents.

not wanting a single point of failure

In what way is it a single point of failure to have your workflows in a repo managed on a specific branch vs another? That really does not make much sense to me sorry?

It's not like Tyk isn't already adopting what I've proposed in some places of CI:

uses: TykTechnologies/github-actions/.github/workflows/godoc.yml@main

If anything that takes it a step further and has centralized an external repo to reference not just a branch of the same repo.

variations required in the build process maintained for years

Without much context, this is difficult to comment on. I've clearly shown that you can have cleaner workflow management and support variations.

If the variation is wildly different, it could still be maintained alongside it's variations rather than spread across branches?

minimising the blast radius of bugs

I would be really interested in how well that's working out vs the approach you seem to disagree taking?

I already did the work to find out why v5.3.4 was never actually published and released properly. It was just v5.3.3, which is misleading to users expecting to get the fixes. Which in the case of v5.3.4 isn't significant, but if this were a vulnerability fix that would not look good. This comes down to the CI not knowing any better and just trusting the correct commit was tagged. Presumably a rare event so I won't harp on about this concern.

With the v5 tag, I clarified why that was regressing and how you could resolve it. Instead your decision is to "delete" the tag. I'm not sure if that is just to stop publishing it, or actually follow-up with removal on the registries. Recently a colleague of mine expressed frustration from another project where they dropped images for an already published tag.

Anyone pulling on the v5 tag and depending on it will either find that to no longer work, or that it ceases to update. Not that the regression with publishing the tag is any better as rolling back to earlier minor releases can be a breaking change for them, or introduce other risks. Ideally you'd favor fixing the cause rather than removal without any deprecation notice to your users.

Similarly you have the latest tag. If you do delete v5 from the registries you should delete latest too and update your docs accordingly.

restructure debt

I think the prior points you made compound with this one.

I'm a little confused how your approach is working out better when it involves PRs to individual release branches to apply the changes, yet still introduces bugs (as was shown here regarding the distroless image for 5.3.4 PR in May that was reverted back to std image](TykTechnologies/tyk#6517 (comment)))?:

EDIT: Apologies, I see that wasn't a merge error. Your template here explicitly does this:

gromit/policy/templates/releng/.github/workflows/release.yml.d/goreleaser.gotmpl

Lines 184 to 188 in 8b4e314

    
           {{- if has "distroless" .Branchvals.Features }} 
        
           file: ci/Dockerfile.distroless 
        
           {{- else }} 
        
           file: ci/Dockerfile.std 
        
           {{ end }}

gromit/config/config.yaml

Lines 104 to 107 in 8b4e314

    
           release-5.3: 
        
             buildenv: 1.22-bullseye 
        
             features: 
        
               - distroless

Which is true the 5.3 series release branch does use Dockerfile.distroless.

The June PR (merged July) that I mentioned switching back to Dockerfile.std from Dockerfile.distroless was for the previous s prefix release you had mentioned earlier to me. My mistake 😓

Later on July 17th a PR labeled [SYSE-370 release-5.3] June template application was merged on August 20th. That applied the above template change to switch back to Dockerfile.distroless. Since the v5.3.4 tag was actually intended for a commit around late August and not July 25, it was meant to be published with the distroless base and whatever additional commits that entails.

I see where the problem with v5.3.4 went wrong CI wise. I thought that was due to a merge mishap with the release branch PR, but that was due to the tagged commit running on an older workflow than the latest version of release-5.3.

If release-5.3 instead was a lightweight workflow like I showed in the collapsed section, it could have referenced @release-5.3 or @main (with a workflow variant name). The older commit (while still building v5.3.3 instead of commit meant for v5.3.4 Tyk Gateway release), would then at least have used the expected workflow for release-5.3 / v5.3.4 instead and published with the expected Distroless image.

I don't think that matters are as dire as you imply :)
This has been production for about 2 years and we have shaken out the more egregious bugs.

You are probably right! I just get OCD about correctness at times 😓

Your template approach here seems good, but the workflow suggestions I provided would have still minimized what happened with v5.3.4, along with resolving the v5 tag regression. I think they can be complimentary.

polarathene · 2024-09-17T22:56:06Z

You still have a v5 tag for ECR if that's relevant?

I didn't get a response to that initial query above. Just a reminder if it matters:

gromit/policy/templates/releng/.github/workflows/release.yml.d/goreleaser.gotmpl

Line 174 in 2690f89

type=semver,pattern={{`{{major}}`}},prefix=v

Additional observation:

You have this part of the template:

gromit/policy/templates/releng/.github/workflows/release.yml.d/goreleaser.gotmpl

Lines 217 to 219 in 2690f89

    
           - name: push image to prod 
        
             if: {{`${{ matrix.golang_cross == '` }}{{.Branchvals.Buildenv}}{{`' }}`}} {{/* push only main build variation */}} 
        
             uses: docker/build-push-action@v6

It was introduced on July 25
On the main Tyk repo (master branch) that update landed August 20 (PR opened July 17)
On the main Tyk repo (master branch) that change was reverted on September 3 (PR opened Aug 7).

Reverting the action version and step name seems unintentional? It's still the newer values on this repo with the template as shown above, so I'm not sure what went wrong with that process. Perhaps since the Go 1.22 PR references a prior attempt in May (which is still open for some reason?), it had the outdated release workflow, and conflict resolution (if any) preferred what appeared to be newer lines/diff?

It'll presumably be fixed again once you do another template update... but I think it highlights another subtle bug that slips by unnoticed?

For release-5.3 it seems to be correct for the time being.

TykTechnologies/tyk#6517

ermirizio

looks good!

alephnull requested a review from a team as a code owner September 16, 2024 09:57

polarathene mentioned this pull request Sep 18, 2024

Release process is misleading for images published TykTechnologies/tyk#6517

Open

Do not push v{{Major}} tags

7a7a3b4

TykTechnologies/tyk#6517

alephnull force-pushed the as/major branch from 8b4e314 to 7a7a3b4 Compare September 19, 2024 08:45

alephnull enabled auto-merge (squash) September 19, 2024 08:50

ermirizio approved these changes Sep 19, 2024

View reviewed changes

alephnull merged commit b0b61d3 into master Sep 19, 2024
11 checks passed

alephnull deleted the as/major branch September 19, 2024 15:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYSE-398] Delete v5 tag and do not push v{{Major}} tags #357

[SYSE-398] Delete v5 tag and do not push v{{Major}} tags #357

alephnull commented Sep 16, 2024

polarathene commented Sep 16, 2024

alephnull commented Sep 16, 2024

polarathene commented Sep 16, 2024

alephnull commented Sep 17, 2024

polarathene commented Sep 17, 2024

polarathene commented Sep 17, 2024 •

edited

Loading

ermirizio left a comment

[SYSE-398] Delete v5 tag and do not push v{{Major}} tags #357

[SYSE-398] Delete v5 tag and do not push v{{Major}} tags #357

Conversation

alephnull commented Sep 16, 2024

polarathene commented Sep 16, 2024

alephnull commented Sep 16, 2024

polarathene commented Sep 16, 2024

alephnull commented Sep 17, 2024

polarathene commented Sep 17, 2024

polarathene commented Sep 17, 2024 • edited Loading

ermirizio left a comment

Choose a reason for hiding this comment

polarathene commented Sep 17, 2024 •

edited

Loading