[GCP] Garbage collection #668

Open · wants to merge 2 commits into main

Conversation

@AlCutter (Collaborator) commented May 27, 2025

This PR adds a "garbage collector" to the GCP storage implementation which periodically removes partial tiles & entry bundles that have been made obsolete by their corresponding full versions as the log has grown.

See #672 for corresponding changes to the client code.

Towards #670
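For context, partial objects in tlog-tiles style layouts live at a path with a ".p/<width>" suffix alongside the full object's path, which is what makes them garbage once the full version exists. A toy sketch of that naming (tilePath is a hypothetical helper for illustration, not Tessera's actual layout API):

```go
package main

import "fmt"

// tilePath mimics the tlog-tiles naming scheme: a full tile lives at
// "tile/<level>/<index>", while a partial tile of the same coordinates
// carries a ".p/<width>" suffix. Once the full tile is written, the
// partials under "<path>.p/" are garbage the collector can remove.
func tilePath(level, index uint64, width uint8) string {
	p := fmt.Sprintf("tile/%d/%d", level, index)
	if width > 0 { // width == 0 here means "full tile"
		p = fmt.Sprintf("%s.p/%d", p, width)
	}
	return p
}

func main() {
	// While the log grows, a partial tile...
	fmt.Println(tilePath(0, 17, 3)) // tile/0/17.p/3
	// ...is eventually superseded by its full counterpart, stranding
	// the partial object until GC deletes everything under "tile/0/17.p/".
	fmt.Println(tilePath(0, 17, 0)) // tile/0/17
}
```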

@AlCutter force-pushed the garbage_collections branch 6 times, most recently from 57d4b4a to 11baf2d on May 28, 2025 11:47
@AlCutter changed the title from "WIP garbage collection" to "[GCP] Garbage collection" on May 28, 2025
@AlCutter added the "enhancement" (New feature or request) label on May 28, 2025
@AlCutter force-pushed the garbage_collections branch from 11baf2d to d1ca3d1 on May 28, 2025 14:39
@AlCutter marked this pull request as ready for review on May 28, 2025 14:47
@AlCutter requested a review from a team as a code owner on May 28, 2025 14:47
@AlCutter requested a review from mhutchinson on May 28, 2025 14:47
@mhutchinson (Contributor) commented:

We agreed that this would be a post-beta feature. I think the order of operations should be:

  • client code change should go in
  • tag beta
  • add non-blocking features like this

My concern is that a) this is new code that risks performance degradation, and b) we'll end up delaying the beta release until we've written and tested an equivalent feature for the other storage implementations.

Thoughts?

@AlCutter (Collaborator, Author) replied:

We agreed that this would be a post-beta feature. I think the order of operations should be:

  • client code change should go in
  • tag beta
  • add non-blocking features like this

My understanding is that beta is going to be tagged ~this week.
I'm ok to delay merging this until after tagging beta if that is the case, but I'd appreciate a review in case there are changes to make in the meantime.

The client code is already in #672, which should of course go in first so that clients are ready to cope with logs which GC before this change is merged.

My concern is that a) this is new code that risks performance degradation, and b) we'll end up delaying beta release until we've written and tested an equivalent feature for the other storage implementations.

Since this isn't a beta-blocking feature, I'm not sure I understand why you'd want to block beta for the other storage implementations?

Personally, I'm not overly concerned about the potential for a performance regression (all PRs have that potential). I can, however, see an argument for delaying the merge until post-beta for correctness purposes: I believe the code is correct, and it's fairly well covered by fsck tests, but it is, after all, making structural changes to the log it manages.

@mhutchinson (Contributor) replied:

My understanding that beta is going to be tagged ~this week. I'm ok to delay merging this until after tagging beta if that is the case, but I'd appreciate a review in case there are changes to make in the meantime.

Sure, sounds good. I'll take a look at it, but we're aligned that we won't merge until beta is tagged.

	d := uint(0)
	eg := errgroup.Group{}
done:
	for l, f, x := uint64(0), fromSize, treeSize; x > 0; l, f, x = l+1, f>>layout.TileHeight, x>>layout.TileHeight {
@mhutchinson (Contributor) commented:

I've been deciphering this loop for over 10 minutes now. I think some docs would be beneficial:

  • Describe the strategy: this outer loop starts at the bottom level (l = 0) and works its way up the levels until either we delete the max number of tiles (handled inside the loop, by breaking), or we climb past the top of the tree (the x > 0 condition fails). Within each level, we traverse left to right, clearing out garbage.
  • Briefly describe l, f, x and their invariants.
  • Note that this loop handles both the entries paths and the tile paths at the same time, but really treats entries as a special case at level 0 of the tile paths.
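A hedged sketch of how that loop header and its invariants might be documented (gcLevels and tileHeight are illustrative names, not from the PR; the per-level sweep body is elided):

```go
package main

import "fmt"

const tileHeight = 8 // stand-in for layout.TileHeight (2^8 = 256 hashes per full tile)

// gcLevels walks the tile levels bottom-up. At each level:
//   - l is the current tile level (0 = entry bundles / leaf tiles),
//   - f is the first index at level l not yet garbage-collected
//     (fromSize shifted right by l*tileHeight bits),
//   - x is the tree size expressed in level-l tile units; once x reaches 0
//     we have climbed past the top of the tree and the loop terminates.
func gcLevels(fromSize, treeSize uint64) []uint64 {
	var levels []uint64
	for l, f, x := uint64(0), fromSize, treeSize; x > 0; l, f, x = l+1, f>>tileHeight, x>>tileHeight {
		_ = f // the real code sweeps tiles in [f, x) at this level
		levels = append(levels, l)
	}
	return levels
}

func main() {
	// A tree of 70,000 leaves spans three tile levels (0, 1, 2).
	fmt.Println(gcLevels(0, 70000)) // [0 1 2]
}
```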

	if l == 0 {
		eg.Go(func() error { return deleteWithPrefix(ctx, layout.EntriesPath(ri.Index, 0)+".p/") })
		d++
		fromSize += uint64(ri.N)
@mhutchinson (Contributor) commented:

I can't convince myself that this loop doesn't leave garbage in the upper layers while still advancing fromSize in the coordinator. If true, this means that garbage could be permanently left up there. I think it stems from the relationship between this increment of fromSize and the break on d > maxDeletes above.

Imagine maxDeletes is set to 1. We will always stay at the bottom level of the tree, deleting one thing from entries, and one thing from tile paths (which is 2, technically breaking the "max" deletes, but I think that's fine as long as we're honest in the docs that "max" is a guideline, not a rule). Then we will break the loop, above, and then continue to increment the coordinator. After a sufficient number of repeats of this, we will have bumped fromSize to treeSize and no work will be performed. But we never went to the upper floors.

Have I missed something?

I think it will also skip the upper layers as soon as it finds a tile where ri.Partial != 0, which is almost guaranteed after traversing all the way to the right.

My intuition is that the iteration should prioritize traversing up the tree before fully GCing the lower levels. i.e. keep fromSize from being updated until all partial tiles that could have been generated in all levels of the tree have been found and collected.

@AlCutter (Collaborator, Author) replied:

I've switched it to a "vertical" implementation - I'd originally discounted that approach in order to avoid repeatedly considering higher-level tiles, but avoiding that is far easier than I assumed it would be.
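For illustration only, a minimal sketch of what a "vertical" traversal could look like: for each newly-full entry bundle we climb towards the root, and we only revisit a parent tile when its last child has just become full. All names here (gcVertical, tileHeight, deletePrefix) are hypothetical, not the PR's actual code:

```go
package main

import "fmt"

const tileHeight = 8 // 2^8 = 256 children per tile

// gcVertical sweeps bundles [fromBundle, toBundle). For each bundle it
// deletes the level-0 partials, then climbs: a parent tile's partials become
// garbage exactly when the child just processed is the final (256th) child
// of that parent, so higher levels are touched at most once per completion.
// deletePrefix stands in for the real delete-objects-with-prefix callback.
func gcVertical(fromBundle, toBundle uint64, deletePrefix func(level, index uint64)) {
	for b := fromBundle; b < toBundle; b++ {
		// Level 0: the entry bundle (and matching leaf tile) partials.
		deletePrefix(0, b)
		// Climb while this child is the last one in its parent, meaning
		// the parent tile has just become full.
		for l, i := uint64(1), b; i%(1<<tileHeight) == (1<<tileHeight)-1; l, i = l+1, i>>tileHeight {
			deletePrefix(l, i>>tileHeight)
		}
	}
}

func main() {
	// Crossing bundle 255 completes level-1 tile 0, so it is GC'd once.
	gcVertical(254, 257, func(level, index uint64) {
		fmt.Printf("gc level=%d index=%d\n", level, index)
	})
}
```

Note how fromSize (here fromBundle) only advances past bundles whose entire ancestor chain has been considered, addressing the stranded-garbage concern above.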

// needlessly attempt to GC over regions which have already been cleaned.
//
// Returns true if we've "caught up" with the current state of the tree.
func (s *spannerCoordinator) garbageCollect(ctx context.Context, treeSize uint64, maxDeletes uint, deleteWithPrefix func(ctx context.Context, prefix string) error) error {
@mhutchinson (Contributor) commented:

I discuss this in the comments below, but maxDeletes required me to update my intuition. In addition to its non-strict nature, my expectations about what counted as a deletion were wrong.

What I expected was that this would control the maximum number of partial tiles that would be deleted. It actually controls the number of calls to deleteWithPrefix, each of which could delete up to 255 partial tiles. Nothing wrong with this, but it's worth having a doc line to make it clear.
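As a rough illustration of those semantics (the 255 figure assumes the tlog-tiles layout, where a full tile obsoletes partials ".p/1" through ".p/255"):

```go
package main

import "fmt"

// worstCaseObjects shows why the distinction matters: maxDeletes bounds
// calls to deleteWithPrefix, not objects removed. Each prefix can hold up
// to 255 partial tiles, so the number of objects actually deleted in one
// GC pass can be far larger than maxDeletes itself.
func worstCaseObjects(maxDeletes uint) uint {
	const partialsPerTile = 255 // ".p/1" .. ".p/255"
	return maxDeletes * partialsPerTile
}

func main() {
	fmt.Println(worstCaseObjects(10)) // up to 2550 objects in one pass
}
```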

@AlCutter (Collaborator, Author) replied:

Changed it to mean "how many bundles should be considered for GC before exiting".

@@ -372,6 +428,7 @@ func (a *Appender) publishCheckpoint(ctx context.Context, size uint64, root []by
type objStore interface {
getObject(ctx context.Context, obj string) ([]byte, int64, error)
setObject(ctx context.Context, obj string, data []byte, cond *gcs.Conditions, contType string, cacheCtl string) error
deleteObjectsWithPrefix(ctx context.Context, prefix string) error
@mhutchinson (Contributor) commented:

It's an internal interface, so I'm not overly worried, but this looks like a strange cousin next to the other two get/set methods.

@AlCutter (Collaborator, Author) replied:

Open to alternative suggestions, but the thinking was that this lets each infrastructure do this specific task in the best way it can (e.g. POSIX could potentially do an rmdir()) rather than forcing all of them to conform to a lowest-common-denominator API like findObjectsWithPrefix and deleteObject.

@AlCutter force-pushed the garbage_collections branch from d1ca3d1 to 3b8434f on May 29, 2025 11:00
@AlCutter force-pushed the garbage_collections branch from 3b8434f to 8ff4fed on May 29, 2025 11:20
@AlCutter force-pushed the garbage_collections branch from 8ff4fed to 053d6cf on May 29, 2025 11:24
Labels
enhancement New feature or request
2 participants