[GCP] Garbage collection #668
Conversation
We agreed that this would be a post-beta feature. I think the order of operations should be:
My concern is that a) this is new code that risks performance degradation, and b) we'll end up delaying the beta release until we've written and tested an equivalent feature for the other storage implementations. Thoughts?
My understanding is that beta is going to be tagged ~this week. The client code is already in #672, which should of course go in first so that clients are ready to cope with logs which GC before this change is merged.
Since this isn't a beta-blocking feature, I'm not sure I understand why you'd want to block beta for the other storage implementations? Personally, I'm not overly concerned about the potential for performance regression (all PRs have that potential), but I can see an argument for delaying merging this until post-beta for correctness purposes; I believe it's correct, and it's fairly well covered with tests.
Sure, sounds good. I'll take a look at it, but we're aligned that we won't merge until beta is tagged.
storage/gcp/gcp.go (Outdated)
d := uint(0)
eg := errgroup.Group{}
done:
for l, f, x := uint64(0), fromSize, treeSize; x > 0; l, f, x = l+1, f>>layout.TileHeight, x>>layout.TileHeight {
I've been deciphering this loop for over 10 minutes now. I think some docs would be beneficial:
- describe the strategy: this outer loop starts at the bottom level (`l = 0`) and works its way up the levels until either we delete the max number of tiles (handled inside the loop, by breaking), or we climb past the top of the tree (`x > 0`). Within each level, we traverse left to right, clearing out garbage.
- briefly describe `l`, `f`, `x` and their invariants
- note that this loop handles both the entries paths and tile paths at the same time, but really treats entries as a special case at level 0 of the tile paths
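For illustration, a sketch of what such comments might look like on the loop header quoted above; the readings of `l`, `f` and `x` below are an interpretation of the snippet, not wording from the PR:

```go
// Sketch only: possible documentation for the loop header quoted above; the
// meanings given for l, f and x are an assumption, not the PR author's.
//
//   l is the tile level currently being considered (0 = entry bundle level).
//   f is fromSize expressed in level-l units, i.e. the first index at level l
//     which may still have stale partial objects.
//   x is treeSize expressed in level-l units; once x reaches 0 we have climbed
//     past the top of the tree and there is nothing left to consider.
for l, f, x := uint64(0), fromSize, treeSize; x > 0; l, f, x = l+1, f>>layout.TileHeight, x>>layout.TileHeight {
	// Within each level, traverse left to right, clearing out garbage, and
	// break out early once the delete budget (maxDeletes) has been spent.
}
```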
storage/gcp/gcp.go (Outdated)
if l == 0 {
	eg.Go(func() error { return deleteWithPrefix(ctx, layout.EntriesPath(ri.Index, 0)+".p/") })
	d++
fromSize += uint64(ri.N)
I can't convince myself that this loop doesn't leave garbage in the upper layers, but still move along the from size in the coordinator. If true, this means that garbage could be permanently left up there. I think it stems from the relationship between this increment of `fromSize` and the breaking based on `d > maxDeletes` above.

Imagine `maxDeletes` is set to 1. We will always stay at the bottom level of the tree, deleting one thing from entries, and one thing from tile paths (which is 2, technically breaking the "max" deletes, but I think that's fine as long as we're honest in the docs that "max" is a guideline, not a rule). Then we will break the loop above, and then continue to increment the coordinator. After a sufficient number of repeats of this, we will have bumped `fromSize` to `treeSize` and no work will be performed. But we never went to the upper floors.

Have I missed something?
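To make the scenario concrete, here is a small self-contained toy model of the bookkeeping described above (this is not the PR's code; it simply assumes that each bottom-level iteration advances the coordinator by one bundle and that the budget is checked before the loop can climb a level):

```go
package main

import "fmt"

func main() {
	// Toy model only: maxDeletes == 1, every bottom-level iteration advances
	// fromSize by one bundle, and the budget is checked before climbing.
	const maxDeletes = uint(1)
	fromSize, treeSize := uint64(0), uint64(8)
	highestLevelVisited := uint64(0)

	for fromSize < treeSize {
		deletes := uint(0)
		for level := uint64(0); ; level++ {
			if level > highestLevelVisited {
				highestLevelVisited = level
			}
			deletes++ // one delete issued at this level
			if level == 0 {
				fromSize++ // coordinator advanced at the bottom level
			}
			if deletes >= maxDeletes {
				break // budget spent before we ever climb higher
			}
		}
	}
	// Prints: fromSize=8 treeSize=8 highestLevelVisited=0 - the coordinator
	// catches up with the tree, yet the upper levels were never revisited.
	fmt.Printf("fromSize=%d treeSize=%d highestLevelVisited=%d\n", fromSize, treeSize, highestLevelVisited)
}
```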
I think it will also skip the upper layers as soon as it finds a tile where `ri.Partial != 0`, which is almost guaranteed after traversing all the way to the right.

My intuition is that the iteration should prioritize traversing up the tree before fully GCing the lower levels, i.e. keep `fromSize` from being updated until all partial tiles that could have been generated in all levels of the tree have been found and collected.
I've switched it to a "vertical" implementation - I'd originally discounted that approach in order to avoid repeatedly considering higher-level tiles, but avoiding that is far easier than I assumed it would be.
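For readers following the thread, a rough sketch of what a "vertical" pass could look like; this is not the implementation in this PR, and `entriesPrefix`/`tilePrefix` are hypothetical stand-ins for the real `layout` path helpers:

```go
package gcsketch

import "context"

// gcVertical is a hypothetical sketch of a "vertical" GC pass: for each entry
// bundle which has become full, clear its partials and then climb the tiles
// directly above it before advancing fromSize, so nothing is stranded in the
// upper levels. It is illustrative only, not the code in this PR.
func gcVertical(ctx context.Context, fromSize, treeSize uint64, maxBundles uint,
	deleteWithPrefix func(ctx context.Context, prefix string) error,
	entriesPrefix func(idx uint64) string, // stand-in for a layout helper
	tilePrefix func(level, idx uint64) string, // stand-in for a layout helper
) (uint64, error) {
	const width = uint64(256) // 2^layout.TileHeight
	done := uint(0)
	for idx := fromSize / width; (idx+1)*width <= treeSize && done < maxBundles; idx++ {
		// Any partial entry bundles at this index are now obsolete.
		if err := deleteWithPrefix(ctx, entriesPrefix(idx)); err != nil {
			return fromSize, err
		}
		// Climb the levels above this bundle, clearing partial tiles, and stop
		// at the first tile which isn't yet full (its partials may still be live).
		for level, n, sz := uint64(0), idx, treeSize/width; sz > 0; level, n, sz = level+1, n/width, sz/width {
			if n+1 > sz {
				break
			}
			if err := deleteWithPrefix(ctx, tilePrefix(level, n)); err != nil {
				return fromSize, err
			}
		}
		done++
		fromSize = (idx + 1) * width // only advance once everything overhead was considered
	}
	return fromSize, nil
}
```

A real implementation would additionally avoid re-visiting the same upper-level tile for every bundle beneath it (e.g. only climbing when `idx` crosses a tile boundary), which is presumably the refinement referred to above.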
storage/gcp/gcp.go (Outdated)
// needlessly attempt to GC over regions which have already been cleaned.
//
// Returns true if we've "caught up" with the current state of the tree.
func (s *spannerCoordinator) garbageCollect(ctx context.Context, treeSize uint64, maxDeletes uint, deleteWithPrefix func(ctx context.Context, prefix string) error) error {
I discuss this in the comments below, but `maxDeletes` required me to update my intuition. In addition to the non-strict nature of it, my expectations about what it counted as a deletion were wrong.

What I expected was that this would control the maximum number of partial tiles that would be deleted. It actually controls the number of calls to `deleteWithPrefix`, each of which could delete up to 255 partial tiles. Nothing wrong with this, but worth having a doc line to make it clear.
Changed to be "how many bundles should be considered for GC before exiting"
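For example, one possible doc line along those lines (suggested wording only; the parameter name `maxBundles` is a stand-in, not taken from the PR):

```go
// maxBundles bounds how many entry bundles are considered for GC in a single
// pass, not how many objects are removed: each bundle considered may result in
// deleteWithPrefix calls, each of which can remove many partial objects
// (up to 255) sharing that prefix.
```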
@@ -372,6 +428,7 @@ func (a *Appender) publishCheckpoint(ctx context.Context, size uint64, root []by
type objStore interface {
	getObject(ctx context.Context, obj string) ([]byte, int64, error)
	setObject(ctx context.Context, obj string, data []byte, cond *gcs.Conditions, contType string, cacheCtl string) error
	deleteObjectsWithPrefix(ctx context.Context, prefix string) error
It's an internal interface, so I'm not overly worried, but this looks like a strange cousin next to these other 2 get/set methods.
Open to alternative suggestions, but the thinking was that this enables infra to do this specific task in the best way it can (e.g. POSIX could potentially do an `rmdir()`) rather than force all to conform to an LCD like `findObjectsWithPrefix` and `deleteObject`.
This PR adds a "garbage collector" to the GCP storage implementation which periodically removes partial tiles & entry bundles that have been made obsolete by their corresponding full versions as the log has grown.
See #672 for corresponding changes to the client code.
Towards #670