Skip to content

Change FieldSummary {upper,lower}_bound to ByteBuf #1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 28, 2025

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented May 22, 2025

Which issue does this PR close?

I would like to invite everyone to roast my Rust-skills in order for me to improve myself :)

Unblocks #1328

This aligns closely with PyIceberg and Java and greatly simplifies the use of the Avro readers in PyIceberg. Otherwise, we would need to update public APIs.

What changes are included in this PR?

Are these changes tested?

@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch 2 times, most recently from 62dcdd3 to 601ef29 Compare May 22, 2025 12:24
@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch from 601ef29 to 1da092a Compare May 22, 2025 12:26
@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch 2 times, most recently from bd4691f to d215831 Compare May 22, 2025 16:31
@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch from d215831 to ccd4632 Compare May 22, 2025 20:50
Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
Some(_) => ROWS_MIGHT_MATCH,
Some(bound_bytes) => {
let bound = ManifestFilterVisitor::bytes_to_datum(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liurenjie1024 what's the cost of datum_from_bytes and bytes_to_datum? Do we need to introduce a new type in between?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost dependends on the type, but I think it's fine for now since it's not supposed to be quite expensive.

@@ -577,7 +575,7 @@ pub struct ManifestFile {
/// A list of field summaries for each partition field in the spec. Each
/// field in the list corresponds to a field in the manifest file’s
/// partition spec.
pub partitions: Vec<FieldSummary>,
pub partitions: Option<Vec<FieldSummary>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this change, it aligns with the spec definition

Screenshot 2025-05-23 at 8 43 56 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a slight semantic difference; when it is omitted, we don't know anything about the partitions. When it is an empty Vec, then it indicates that it is unpartitioned.

@@ -414,6 +444,11 @@ impl ManifestFilterVisitor<'_> {
};
Ok(bound)
}

fn bytes_to_datum(bytes: &ByteBuf, t: Type) -> Datum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit should this be in Datum alongside try_from_bytes?

Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
@kevinjqliu kevinjqliu requested a review from liurenjie1024 May 25, 2025 21:48
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me and can unlock integration from pyiceberg.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @Fokko for this fix!

@liurenjie1024 liurenjie1024 merged commit 91a1e3a into apache:main May 28, 2025
17 checks passed
@Fokko Fokko deleted the fd-change-datum-into-bytes branch May 29, 2025 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants