-
Notifications
You must be signed in to change notification settings - Fork 275
Change FieldSummary
{upper,lower}_bound
to ByteBuf
#1369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
62dcdd3
to
601ef29
Compare
601ef29
to
1da092a
Compare
bd4691f
to
d215831
Compare
d215831
to
ccd4632
Compare
Some(bound) if datum <= bound => ROWS_CANNOT_MATCH, | ||
Some(_) => ROWS_MIGHT_MATCH, | ||
Some(bound_bytes) => { | ||
let bound = ManifestFilterVisitor::bytes_to_datum( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liurenjie1024 what's the cost of datum_from_bytes
and bytes_to_datum
? Do we need to introduce a new type in between?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost dependends on the type, but I think it's fine for now since it's not supposed to be quite expensive.
@@ -577,7 +575,7 @@ pub struct ManifestFile { | |||
/// A list of field summaries for each partition field in the spec. Each | |||
/// field in the list corresponds to a field in the manifest file’s | |||
/// partition spec. | |||
pub partitions: Vec<FieldSummary>, | |||
pub partitions: Option<Vec<FieldSummary>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a slight semantic difference; when it is omitted, we don't know anything about the partitions. When it is an empty Vec
, then it indicates that it is unpartitioned.
@@ -414,6 +444,11 @@ impl ManifestFilterVisitor<'_> { | |||
}; | |||
Ok(bound) | |||
} | |||
|
|||
fn bytes_to_datum(bytes: &ByteBuf, t: Type) -> Datum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit should this be in Datum
alongside try_from_bytes
?
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good to me and can unlock integration from pyiceberg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @Fokko for this fix!
Which issue does this PR close?
I would like to invite everyone to roast my Rust-skills in order for me to improve myself :)
Unblocks #1328
This aligns closely with PyIceberg and Java and greatly simplifies the use of the Avro readers in PyIceberg. Otherwise, we would need to update public APIs.
What changes are included in this PR?
Are these changes tested?