Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added AMI cleanup tool #214

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Added AMI cleanup tool #214

wants to merge 17 commits into from

Conversation

fheinecke
Copy link
Contributor

@fheinecke fheinecke commented Jan 11, 2024

This PR adds a tool to cleanup old dev tag AMIs. Last time this was ran manually it reduced our annual AWS spend by about $400k.

This PR is the first one for this repo that conforms to RFD 1. As such, a lot of the changes are repo-wide tooling needed to meet these requirements. These features will make it much easier to add future projects to this repo.

The following features have been added:

Post merge, the following work needs to be done:

  • The GHA environment for this repo needs to be configured for Renovate.
  • The action needs to be used somewhere (probably teleport.e) to actually prune images.

The tool has been tested locally on one of the dev accounts, but the rest of this (GHA workflows, project-specific Renovate configuration) have not been. I expect that there are some problems here that I won't be able to fix until the PR lands and they actually run.

Updates gravitational/teleport.e#3187

@fheinecke fheinecke requested review from a team January 11, 2024 08:24
Copy link
Contributor

@wadells wadells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My big question:

Did you run earthly by @camscale, @tcsc and @r0mant?

From what I've read of it, I'm partial to the tool for isolation and reproducibility -- not to mention "better than the 1980s Makefile syntax". However it is a notable new tool that isn't in our toolchain yet.

Other feedback:

  • I might avoid adding "ami" to the name of this tool. I think it is likely we'll see cleanup workflows expand to other artifact types. Knowing how things go, there is a good chance that e.g. S3 or ECR cleanup could get added to this tool.
  • Including renovate in the same PR is a lot. I might recommend we include it in a separate PR. It looks like it is mostly for Earthfiles currently, but it'd be good to discuss the long term goal for renovate vs dependabot in this repo, and I'm not sure we want to sidtrack the cleanup script discussion with that.

.github/renovate.json5 Outdated Show resolved Hide resolved
.github/renovate.json5 Outdated Show resolved Hide resolved
.github/workflows/ami-cleanup-ci.yaml Outdated Show resolved Hide resolved
tools/ami-cleanup/LICENSE Outdated Show resolved Hide resolved
tools/ami-cleanup/workflows/cd.yaml Outdated Show resolved Hide resolved
@fheinecke
Copy link
Contributor Author

Did you run earthly by @camscale, @tcsc and @r0mant?

I talked with @r0mant about it briefly during our 1x1 last week. While using it here specifically was not discussed, I figured it would be a good starting place to evaluate the tool. If we don't like it, then only ~150 lines of code are lost (not a huge deal IMO).

I personally think there are some huge benefits to using it over our more traditional makefile/dockerfile/shell script/etc. spagetti. I can go into more details here if desired, but I put this item as a proposed quarterly goal for our team which should cover discussing the merits and flaws of the solution.

I might avoid adding "ami" to the name of this tool. I think it is likely we'll see cleanup workflows expand to other artifact types. Knowing how things go, there is a good chance that e.g. S3 or ECR cleanup could get added to this tool.

I thought about this, and I was even thinking about adding public ECR cleanup (ECR lifecycles not supported here) next. However I thought that it might be better to pull common code out into a lib and build a different tool, instead of adding ECR support to this one. Personally I am a fan of the "every tool does one specific job" Unix philosophy (paraphrasing).

What do you think?

Including renovate in the same PR is a lot. I might recommend we include it in a separate PR. It looks like it is mostly for Earthfiles currently, but it'd be good to discuss the long term goal for renovate vs dependabot in this repo, and I'm not sure we want to sidtrack the cleanup script discussion with that.

I thought about this pretty heavily, and I'm still not sure which way is better. On one hand:

  • All the code (workflow) and 90% of the config is already written/tested/in use, as it was copy/pasted from cloud-terraform
  • Renovate is basically useless if there is no tool/dependencies to run it against
  • It's a little easier for me to manage/merge one PR in this repo rather than several (PRs here usually sit open for quite a while)
  • It's needed for this tool per the RFD

On the other hand, separating it out in another PR would:

  • Allow for a more dependency management specific discussion about these changes
  • Make this PR a little easier to review

Do you (or other reviewers) have a strong preference one way of another?

@camscale
Copy link
Contributor

Do you (or other reviewers) have a strong preference one way of another?

Anything you can do to break up a +2,500 line PR is appreciated. If the renovate stuff is essentially independent of the functionality of the tool to clean up the AMIs, then I suggest splitting it out. Layering PRs to build up from a small essential kernel and adding new orthogonal concerns is much easier to review than all in one big bang. At first blush, the bullet point list at the top of the PR description listing the features looks like they could almost be separate PRs too. It may mean adding stuff that does not initially conform to the RFD, but as long as you are heading there I think that's fine.

@fheinecke fheinecke requested review from a team as code owners March 28, 2024 21:20
go.work Outdated Show resolved Hide resolved
tools/ami-cleanup/Earthfile Outdated Show resolved Hide resolved
tools/ami-cleanup/src/cmd/main.go Show resolved Hide resolved
tools/ami-cleanup/src/cmd/main_test.go Show resolved Hide resolved

// TODO move this under /libs at some point, however it is nowhere near
// large enough to justify separate module today
package internal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal is not a very good package name - it doesn't say anything about what the code in the package does.

https://go.dev/blog/package-names

Good package names make code better. A package’s name provides context for its contents, making it easier for clients to understand what the package is for and how to use it. The name also helps package maintainers determine what does and does not belong in the package as it evolves. Well-named packages make it easier to find the code you need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decision was based on the Go standard project layout here, which states that for smaller projects extra structure is not required, and that this code can live under a package named internal. One of the examples provided in the standard uses this naming scheme.

Do you want me to change this? If so, how do you want me to structure the project?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I remember when that repo was published there was a whole buzz about it because it is not in any way a "Go standard project layout." This is random internet people who slapped the term "standard" on their project.

I'd be much more inclined to adopt the device in the official Go blog (linked above).

In any case, the point of internal packages is that they reside in the subtree of a directory named internal, not that the package itself is necessarily named internal. This prevents them from being imported outside of the module but still gives you the opportunity to select a package name that better describes its contents.

tools/ami-cleanup/src/internal/aws_test.go Outdated Show resolved Hide resolved
tools/ami-cleanup/src/internal/cleanup.go Outdated Show resolved Hide resolved

// Cleans up the images in the given region. Returns the total amount of space recovered, number of images deleted,
// and any error.
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mix int32 and int here? Do you actually care about the width of an integer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't, but the AWS API uses int32 for this value. Using int32 keeps the type consistent. Additionally, if I were to switch to int via something like int(<aws API call int32 value>), then the value could overflow on 32 bit systems if AWS ever changes the type to int64 (which would not cause a compilation error and would be unlikely to be manually caught).

tools/ami-cleanup/src/internal/cleanup.go Outdated Show resolved Hide resolved
}

func (ai *ApplicationInstance) cleanupImageIfOld(ctx context.Context, client IEc2Api, image ec2Types.Image) (int32, error) {
creationDate, err := iso8601.ParseString(*image.CreationDate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a 3rd party dependency for this? Can time.Parse do this for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, yes. time does not fully support iso8601, just a subset. According to the docs the returned value is an iso8601 timestamp string.

tools/ami-cleanup/src/internal/aws.go Outdated Show resolved Hide resolved
tools/ami-cleanup/src/internal/cleanup.go Outdated Show resolved Hide resolved
tools/ami-cleanup/src/internal/cleanup.go Outdated Show resolved Hide resolved
tools/ami-cleanup/src/internal/cleanup.go Outdated Show resolved Hide resolved
for _, devImage := range devImages {
imageSpace, err := ai.cleanupImageIfOld(ctx, ec2Client, devImage)
if err != nil {
return 0, 0, trace.Wrap(err, "failed to cleanup dev image %q", devImage.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we continue if we failed to cleanup one image instead of returning on first failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so - if one fails for whatever reason, then it is highly likely that the remaining will fail. The most common failures include invalid AWS credentials, API rate limits, bugs resulting in invalid queries.

return trace.Wrap(err, "failed to load AWS credentials")
}

enabledRegions, err := ai.getEnabledRegions(ctx, ai.accountClientGenerator(&cfg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would just allow the tool to accept regions as a CLI argument for simplicity and avoided an extra API call but no strong opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is a better approach because:

  • If we start publishing images to additional regions (such as when AWS adds additional regions) then we don't have to remember to update some cleanup list somewhere.
  • Moving this upwards to a CLI parameter means more shell code (granted this is very little). I'm trying to move as much as reasonably possible via a testable, rather than add additional shell code.

// and any error.
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) {
// A new client must be created for each region
cfg.Region = regionName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions with side effects like this that modify the passed in argument are usually an anti-pattern, can we avoid doing that? E.g. just have a helper method that returns a client for a specified region.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the struct by value does create a new copy of the config. See https://go.dev/play/p/yS0-otWZ4KA as an example. The original cfg value that is passed upon call to cleanupRegion is not modified.

tools/ami-cleanup/src/internal/cleanup.go Outdated Show resolved Hide resolved
ImageId: image.ImageId,
DryRun: &ai.shouldDoDryRun,
})
if err != nil && (!IsDryRunError(err) || !ai.shouldDoDryRun) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking for !ai.shouldDoDryRun here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS API will return a "dry run error" upon success, which needs special handling to ignore.

@fheinecke
Copy link
Contributor Author

@r0mant and @zmb3 I addressed your review comments. I plan on changing everything I did not comment on, but how I change them may depend on the outcome of your responses to my comments. Please take another look when you get a chance.

@fheinecke fheinecke requested review from r0mant and zmb3 April 4, 2024 20:30
Copy link
Contributor

@zmb3 zmb3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. This is looking a lot more like an idiomatic Go project now! 😄

@@ -0,0 +1,7 @@
{
"recommendations": [
"earthly.earthfile-syntax-highlighting",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary anymore.

```

## Building:
This tool requires [Earthly](https://earthly.dev/) to build, used as an alternative to Makefiles. Installation instructions for Earthly are available [here](https://earthly.dev/get-earthly).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this?

@@ -0,0 +1,19 @@
---
name: Cleanup old AMI images
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: Cleanup old AMI images
name: Clean up old AMI images

Comment on lines +17 to +19
// TODO move this under /libs at some point, however it is nowhere near
// large enough to justify separate module today
package cleanup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO move this under /libs at some point, however it is nowhere near
// large enough to justify separate module today
package cleanup
package cleanup

In addition to not being necessary, this will also make sure godoc doesn't interpret your TODO as package documentation.

// TODO consider writing a `go generate` program for this. This will quickly get out
// of hand if used frequently.

// region Account API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// region Account API

In addition to not being common in Go code, this comment will be interpreted as godoc.


// Cleans up the images in the given region. Returns the total amount of space recovered, number of images deleted,
// and any error.
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) {
func (ai *ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) {

It's a bit odd to have a mix of pointer and non-pointer receivers on the same type.

return 0, trace.Wrap(err, "failed to parse image %q creation timestamp %q as an ISO 8601 value", *image.Name, *image.CreationDate)
}

// If the image is less than a month old, don't do anything
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the image is less than a month old, don't do anything
// don't delete newer images

This comment won't go out of date if we change maxImageKeepAge

Comment on lines +107 to +109
if test.checkError == nil {
test.checkError = require.NoError
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd make sure to set checkError in all test cases above and then do away with this if. Less conditionals are easier to reason about and it makes your test cases more explicit.

}

t.Run(test.desc, func(t *testing.T) {
// Setup the application instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Setup the application instance
// Set up the application instance

"result 4",
},
}
allResults := make([]string, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allResults := make([]string, 0)
allResults := make([]string, 0, len(actionSuppliedResults))

@wadells wadells requested review from wadells and removed request for wadells April 11, 2024 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants