Skip to content

Add a new GetNewestUserDefinedTimestamp API #13547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jowlyzhang
Copy link
Contributor

This PR adds a DB::GetNewestUserDefinedTimestamp API to get the newest timestamp of the column family. This is only for when the column family enables user defined timestamp.
It checks the mutable memtable, the immutable memtable and the SST files, and returns the first newest user defined timestamp found. When user defined timestamp is not persisted in SST files, there is metadata in MANIFEST tracking upperbound of flushed timestamps, so the newest timestamp in SST files can be found. If user defined timestamps are
persisted in SST files, currently no timestamp metadata info is persisted. A NotSupported status will be returned if SST files need to be checked in that case.

Test plan:
Added tests

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jowlyzhang jowlyzhang requested a review from anand1976 April 16, 2025 18:24
InstrumentedMutexLock l(&mutex_);
bool enter_write_thread = sv->mem == cfd->mem();
WriteThread::Writer w;
// Enter write thread to read the mutable memtable to avoid racing access
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to support concurrent access to newest_udt_ from memtable instead of blocking writes for each call to GetNewestUserDefinedTimestamp()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a more performant direction. Right now, the newest_udt_ is a Slice so concurrent access is not available. For uint64_t timestamp, we can consider using std::atomic<uint64_t>.

Since the internal user for this API currently only need to call this API after DB::Open before any writes start, to get the timestamp to recover from. I think entering write thread to avoid the concurrent issue isn't a bad idea. If we need this API to be more performant, I can revise the implementation. We introduce this API for them because the GetFullHistoryTsLow API doesn't feel generic enough.

Copy link
Member

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// There is currently no metadata in SST files or MANIFEST to track
// timestamps for this case.
// TODO(yuzhangyu): add support for this case.
status = Status::NotSupported(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on data being in memtable feels not reliable, maybe it's better to just not support persist_user_defined_timestamps at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Our internal users currently indeed doesn't persist user defined timestamps. I think it's a good idea to not support it at all if it is not fully supported. Let me update the PR.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang merged this pull request in 476a98c.

ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request Apr 27, 2025
Summary:
This PR adds a DB::GetNewestUserDefinedTimestamp API to get the newest timestamp of the column family. This is only for when the column family enables user defined timestamp.
It checks the mutable memtable, the immutable memtable and the SST files, and returns the first newest user defined timestamp found. When user defined timestamp is not persisted in SST files, there is metadata in MANIFEST tracking upperbound of flushed timestamps, so the newest timestamp in SST files can be found. If user defined timestamps are
persisted in SST files, currently no timestamp metadata info is persisted. A NotSupported status will be returned if SST files need to be checked in that case.

Pull Request resolved: facebook#13547

Test Plan: Added tests

Reviewed By: cbi42

Differential Revision: D73123575

Pulled By: jowlyzhang

fbshipit-source-id: 460ac4f9c96926d3c8fcf7944edab8dc0feae1dd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants