GC data residuals when with large amount of artifact trash #20711

Open
chlins opened this issue Jul 8, 2024 · 2 comments · May be fixed by #20735
chlins commented Jul 8, 2024

How can we help you?

Scenario

When Harbor holds a large amount of artifact trash data and an external database is in use, filtering the artifact trash can take a long time. If a user deletes an artifact while that filtering is still running, the blob of that artifact is deleted early, and the residual artifact trash record can no longer be deleted. As a result, the artifact_blob records and the distribution manifest and its revisions cannot be cleaned up.

These resources will be left behind:

  • artifact_trash
  • artifact_blob
  • manifest and revisions (distribution)

Explanation

arts, err := gc.deletedArt(ctx)
This step can take a long time. If a user deletes an artifact during this period, a new record is written to artifact_trash, but it is not included in the result returned by this call. The blob referenced by that artifact, however, is still captured in the following step:
blobs, err := gc.uselessBlobs(ctx)
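Because the blob belonging to that artifact has already been deleted, there is no later chance to delete its artifact_trash record, since the cleanup only happens when the manifest blob is matched against the trashed artifacts in: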
if _, exist := gc.trashedArts[blob.Digest]; exist && blob.IsManifest() {
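
To make the ordering problem easier to follow, here is a minimal, self-contained sketch of the race. It is only a toy model under my own assumptions: the `store` type, `deleteArtifact`, `runGC` and the in-memory maps are hypothetical stand-ins and not Harbor's real code. The three steps simply mirror the order of gc.deletedArt, gc.uselessBlobs and the gc.trashedArts check quoted above.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the tables involved,
// keyed by manifest digest. It is not Harbor's real data model.
type store struct {
	trash map[string]bool // artifact_trash records
	blobs map[string]bool // blobs no longer referenced by any artifact
}

func newStore() *store {
	return &store{trash: map[string]bool{}, blobs: map[string]bool{}}
}

// deleteArtifact models a user deleting an artifact: a trash record is
// written and the manifest blob becomes unreferenced.
func (s *store) deleteArtifact(digest string) {
	s.trash[digest] = true
	s.blobs[digest] = true
}

// runGC models one GC run. duringScan is called after the trash snapshot
// is taken and before the useless blobs are collected, which is the
// window described in this issue.
func (s *store) runGC(duringScan func()) {
	// Step 1: snapshot of artifact_trash (stand-in for gc.deletedArt).
	trashed := map[string]bool{}
	for d := range s.trash {
		trashed[d] = true
	}
	if duringScan != nil {
		duringScan() // a user deletes an artifact while step 1 is slow
	}
	// Step 2: collect useless blobs (stand-in for gc.uselessBlobs).
	var useless []string
	for d := range s.blobs {
		useless = append(useless, d)
	}
	// Step 3: sweep. The trash record is only cleaned when the blob's
	// digest was present in the snapshot taken in step 1.
	for _, d := range useless {
		if trashed[d] {
			delete(s.trash, d)
		}
		delete(s.blobs, d)
	}
}

func main() {
	s := newStore()
	s.deleteArtifact("sha256:aaa")
	// "sha256:bbb" is deleted while the first run is still scanning the trash.
	s.runGC(func() { s.deleteArtifact("sha256:bbb") })
	// A later run no longer sees the blob, so it cannot repair the record.
	s.runGC(nil)
	fmt.Println("residual trash records:", len(s.trash)) // prints 1
}
```

In this model the record for "sha256:bbb" stays in the trash map after both runs, because its blob was swept in the first run without a matching snapshot entry, and the second run never visits that digest again.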


github-actions bot commented Sep 6, 2024

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Sep 6, 2024
@chlins chlins added never-stale Do not stale and removed Stale labels Sep 6, 2024

DB-Vincent commented Nov 4, 2024

@chlins Is there any update on this? We're seeing a ~1TB difference between what's being reported on Harbor and what we can see on the filesystem.
