-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added AMI cleanup tool #214
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My big question:
Did you run earthly by @camscale, @tcsc and @r0mant?
From what I've read of it, I'm partial to the tool for isolation and reproducibility -- not to mention "better than the 1980s Makefile syntax". However it is a notable new tool that isn't in our toolchain yet.
Other feedback:
- I might avoid adding "ami" to the name of this tool. I think it is likely we'll see cleanup workflows expand to other artifact types. Knowing how things go, there is a good chance that e.g. S3 or ECR cleanup could get added to this tool.
- Including renovate in the same PR is a lot. I might recommend we include it in a separate PR. It looks like it is mostly for Earthfiles currently, but it'd be good to discuss the long term goal for renovate vs dependabot in this repo, and I'm not sure we want to sidtrack the cleanup script discussion with that.
I talked with @r0mant about it briefly during our 1x1 last week. While using it here specifically was not discussed, I figured it would be a good starting place to evaluate the tool. If we don't like it, then only ~150 lines of code are lost (not a huge deal IMO). I personally think there are some huge benefits to using it over our more traditional makefile/dockerfile/shell script/etc. spagetti. I can go into more details here if desired, but I put this item as a proposed quarterly goal for our team which should cover discussing the merits and flaws of the solution.
I thought about this, and I was even thinking about adding public ECR cleanup (ECR lifecycles not supported here) next. However I thought that it might be better to pull common code out into a lib and build a different tool, instead of adding ECR support to this one. Personally I am a fan of the "every tool does one specific job" Unix philosophy (paraphrasing). What do you think?
I thought about this pretty heavily, and I'm still not sure which way is better. On one hand:
On the other hand, separating it out in another PR would:
Do you (or other reviewers) have a strong preference one way of another? |
Anything you can do to break up a +2,500 line PR is appreciated. If the renovate stuff is essentially independent of the functionality of the tool to clean up the AMIs, then I suggest splitting it out. Layering PRs to build up from a small essential kernel and adding new orthogonal concerns is much easier to review than all in one big bang. At first blush, the bullet point list at the top of the PR description listing the features looks like they could almost be separate PRs too. It may mean adding stuff that does not initially conform to the RFD, but as long as you are heading there I think that's fine. |
|
||
// TODO move this under /libs at some point, however it is nowhere near | ||
// large enough to justify separate module today | ||
package internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal
is not a very good package name - it doesn't say anything about what the code in the package does.
https://go.dev/blog/package-names
Good package names make code better. A package’s name provides context for its contents, making it easier for clients to understand what the package is for and how to use it. The name also helps package maintainers determine what does and does not belong in the package as it evolves. Well-named packages make it easier to find the code you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This decision was based on the Go standard project layout here, which states that for smaller projects extra structure is not required, and that this code can live under a package named internal
. One of the examples provided in the standard uses this naming scheme.
Do you want me to change this? If so, how do you want me to structure the project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I remember when that repo was published there was a whole buzz about it because it is not in any way a "Go standard project layout." This is random internet people who slapped the term "standard" on their project.
I'd be much more inclined to adopt the device in the official Go blog (linked above).
In any case, the point of internal packages is that they reside in the subtree of a directory named internal, not that the package itself is necessarily named internal. This prevents them from being imported outside of the module but still gives you the opportunity to select a package name that better describes its contents.
|
||
// Cleans up the images in the given region. Returns the total amount of space recovered, number of images deleted, | ||
// and any error. | ||
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why mix int32 and int here? Do you actually care about the width of an integer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't, but the AWS API uses int32 for this value. Using int32 keeps the type consistent. Additionally, if I were to switch to int
via something like int(<aws API call int32 value>)
, then the value could overflow on 32 bit systems if AWS ever changes the type to int64
(which would not cause a compilation error and would be unlikely to be manually caught).
} | ||
|
||
func (ai *ApplicationInstance) cleanupImageIfOld(ctx context.Context, client IEc2Api, image ec2Types.Image) (int32, error) { | ||
creationDate, err := iso8601.ParseString(*image.CreationDate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a 3rd party dependency for this? Can time.Parse
do this for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, yes. time
does not fully support iso8601, just a subset. According to the docs the returned value is an iso8601 timestamp string.
for _, devImage := range devImages { | ||
imageSpace, err := ai.cleanupImageIfOld(ctx, ec2Client, devImage) | ||
if err != nil { | ||
return 0, 0, trace.Wrap(err, "failed to cleanup dev image %q", devImage.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we continue if we failed to cleanup one image instead of returning on first failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so - if one fails for whatever reason, then it is highly likely that the remaining will fail. The most common failures include invalid AWS credentials, API rate limits, bugs resulting in invalid queries.
return trace.Wrap(err, "failed to load AWS credentials") | ||
} | ||
|
||
enabledRegions, err := ai.getEnabledRegions(ctx, ai.accountClientGenerator(&cfg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would just allow the tool to accept regions as a CLI argument for simplicity and avoided an extra API call but no strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is a better approach because:
- If we start publishing images to additional regions (such as when AWS adds additional regions) then we don't have to remember to update some cleanup list somewhere.
- Moving this upwards to a CLI parameter means more shell code (granted this is very little). I'm trying to move as much as reasonably possible via a testable, rather than add additional shell code.
// and any error. | ||
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) { | ||
// A new client must be created for each region | ||
cfg.Region = regionName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions with side effects like this that modify the passed in argument are usually an anti-pattern, can we avoid doing that? E.g. just have a helper method that returns a client for a specified region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing the struct by value does create a new copy of the config. See https://go.dev/play/p/yS0-otWZ4KA as an example. The original cfg
value that is passed upon call to cleanupRegion
is not modified.
ImageId: image.ImageId, | ||
DryRun: &ai.shouldDoDryRun, | ||
}) | ||
if err != nil && (!IsDryRunError(err) || !ai.shouldDoDryRun) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we checking for !ai.shouldDoDryRun
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AWS API will return a "dry run error" upon success, which needs special handling to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. This is looking a lot more like an idiomatic Go project now! 😄
@@ -0,0 +1,7 @@ | |||
{ | |||
"recommendations": [ | |||
"earthly.earthfile-syntax-highlighting", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem necessary anymore.
``` | ||
|
||
## Building: | ||
This tool requires [Earthly](https://earthly.dev/) to build, used as an alternative to Makefiles. Installation instructions for Earthly are available [here](https://earthly.dev/get-earthly). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this?
@@ -0,0 +1,19 @@ | |||
--- | |||
name: Cleanup old AMI images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: Cleanup old AMI images | |
name: Clean up old AMI images |
// TODO move this under /libs at some point, however it is nowhere near | ||
// large enough to justify separate module today | ||
package cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO move this under /libs at some point, however it is nowhere near | |
// large enough to justify separate module today | |
package cleanup | |
package cleanup |
In addition to not being necessary, this will also make sure godoc doesn't interpret your TODO as package documentation.
// TODO consider writing a `go generate` program for this. This will quickly get out | ||
// of hand if used frequently. | ||
|
||
// region Account API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// region Account API |
In addition to not being common in Go code, this comment will be interpreted as godoc.
|
||
// Cleans up the images in the given region. Returns the total amount of space recovered, number of images deleted, | ||
// and any error. | ||
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (ai ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) { | |
func (ai *ApplicationInstance) cleanupRegion(ctx context.Context, cfg aws.Config, regionName string) (int32, int, error) { |
It's a bit odd to have a mix of pointer and non-pointer receivers on the same type.
return 0, trace.Wrap(err, "failed to parse image %q creation timestamp %q as an ISO 8601 value", *image.Name, *image.CreationDate) | ||
} | ||
|
||
// If the image is less than a month old, don't do anything |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the image is less than a month old, don't do anything | |
// don't delete newer images |
This comment won't go out of date if we change maxImageKeepAge
if test.checkError == nil { | ||
test.checkError = require.NoError | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd make sure to set checkError
in all test cases above and then do away with this if. Less conditionals are easier to reason about and it makes your test cases more explicit.
} | ||
|
||
t.Run(test.desc, func(t *testing.T) { | ||
// Setup the application instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Setup the application instance | |
// Set up the application instance |
"result 4", | ||
}, | ||
} | ||
allResults := make([]string, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allResults := make([]string, 0) | |
allResults := make([]string, 0, len(actionSuppliedResults)) |
This PR adds a tool to cleanup old dev tag AMIs. Last time this was ran manually it reduced our annual AWS spend by about $400k.
This PR is the first one for this repo that conforms to RFD 1. As such, a lot of the changes are repo-wide tooling needed to meet these requirements. These features will make it much easier to add future projects to this repo.
The following features have been added:
Post merge, the following work needs to be done:
The tool has been tested locally on one of the dev accounts, but the rest of this (GHA workflows, project-specific Renovate configuration) have not been. I expect that there are some problems here that I won't be able to fix until the PR lands and they actually run.
Updates gravitational/teleport.e#3187