Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ixdtf to handle unbound fraction length #6036

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

nekevss
Copy link
Contributor

@nekevss nekevss commented Jan 25, 2025

This PR updates ixdtf's fraction handling to parse fraction values beyond 9 digits in length per feedback from #6004.

It introduces a new enum Fraction in place of the previous nanosecond field that allows parsing fraction values beyond 9 digits while preserving the precision in the enum.

It also adds a variety of tests to test the new behavior while adjusting tests from the old behavior to the new behavior

@nekevss nekevss requested a review from a team as a code owner January 25, 2025 06:15
@nekevss nekevss changed the title Update fraction to handle unbound fraction length Update ixdtf to handle unbound fraction length Jan 25, 2025
Comment on lines 221 to 230
pub enum Fraction {
/// Parsed nanoseconds value (A fraction value from 1-9 digits length)
Nanoseconds(u64),
/// Parsed picoseconds value (A fraction value from 10-12 digits length)
Picoseconds(u64),
/// Parsed femtoseconds value (A fraction value from 12-15 digits length)
Femtoseconds(u64),
/// A parsed value truncated to nanoseconds (A fraction value of 15+ digits length)
Truncated(u64), // An unbound fraction value truncated to nanoseconds
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: change this to

pub struct Fraction {
    fraction_digits: u8,
    value: u64,
}

and then give it functions such as

impl Fraction {
    pub fn try_to_nanoseconds(self) -> Option<u64> {
        if self.fraction_digits <= 9 {
            // compute the pow and return
        } else {
            None
        }
    }
    pub fn to_truncated_nanoseconds(self) -> u64 {
        // compute pow, either + or -
    }
}

Another future advantage of this model is that the Fraction can directly return a FixedDecimal, which could make fractional second formatting more efficient in icu4x.

components/timezone/src/ixdtf.rs Outdated Show resolved Hide resolved
components/timezone/src/ixdtf.rs Outdated Show resolved Hide resolved
@@ -46,6 +46,8 @@ pub enum ParseError {
TimeSecond,
#[displaydoc("Invalid character while parsing fraction part value.")]
FractionPart,
#[displaydoc("Fraction part value exceeds a representable range.")]
InvalidFractionRange,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Invalid" is not very descriptive. This is an error, that already tells me that something is not valid.

Suggested change
InvalidFractionRange,
ExcessiveSecondPrecision,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was double checking this and the error is specific to a Duration hour fraction and minute fraction potentially exceeding the range of u64::MAX when calculated. Maybe DurationFractionExceededRange (this would then be moved down into the Duration errors).

components/timezone/src/ixdtf.rs Outdated Show resolved Hide resolved
components/timezone/src/ixdtf.rs Outdated Show resolved Hide resolved
utils/ixdtf/src/error.rs Outdated Show resolved Hide resolved
nekevss and others added 2 commits February 11, 2025 11:33
Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
Comment on lines 239 to 257
pub fn to_nanoseconds(&self) -> Option<u32> {
if self.digits <= 9 {
10u32
.checked_pow(9 - u32::from(self.digits))
.map(|x| x * self.value as u32)
} else {
None
}
}

/// Returns a `u64` representing the `Fraction` as it's computed
/// nanosecond value, truncating any value beyond 9 digits to
/// nanoseconds
pub fn to_truncated_nanoseconds(&self) -> u64 {
if self.digits <= 9 {
self.value
} else {
self.value / 10u64.pow(u32::from(self.digits - 9))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Something here doesn't look right. The code for the self.digits <= 9 case should be the same in both functions I think. Please make sure this is tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's true. I think when I initially wrote that, it was with the idea of truncating on all cases, but that doesn't align with the function name.

I've updated the behavior and added some doc tests and tests for potentially malformed Fractions since the fields are public.

sffc
sffc previously approved these changes Feb 17, 2025
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: kind-of wish the Fraction type didn't have so many weird invariants. It is also a bit sad that we need to do so much checked arithmetic to get the nanoseconds out of it. There's probably still room to iterate on its design.

Note: the TC decided to start using the language "subsecond" instead of "fractional second" in icu_datetime, but I don't know if that also applies to the ixdtf crate, and even if it does, it shouldn't block merging this PR.

@nekevss
Copy link
Contributor Author

nekevss commented Feb 17, 2025

Agreed on the amount of checking that's needed for arithmetic. I wasn't super happy with it. It could probably be avoided by making Fraction non-exhaustive or just removed in favor of GIGO alongside some debug_asserts as most of the checks are primarily for somebody constructing a Fraction of their own. Most of the arithmetic checks would not affect Fractions output from parsing in ixdtf.

I'm more inclined to refer to this as Fraction as RFC3339 defines the fraction in ABNF as time-secfrac and refers to it elsewhere as "fractional second digits". At a quick glance, the language subsecond does not exist in either RFC3339 or RFC9557.

@sffc
Copy link
Member

sffc commented Feb 17, 2025

Maybe make Fraction have private fields so we can mess with the implementation in the future.

And maybe keep to_truncated_nanoseconds() returning a non-Option.

Copy link
Contributor Author

@nekevss nekevss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some changes to Fraction by making the fields private (pub(crate)) and updating digits to be a NonZeroU8.

I think this is cleaner than the previous version.

General question: should there be getters for digit and value?

.map(|fraction| {
fraction
.to_nanoseconds()
.ok_or(ParseError::ExcessivePrecision)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ExcessivePrecision error could be remove here in favor of calling to_truncated_nanoseconds if that were preferred.

Would also remove the transpose below.

@nekevss nekevss requested a review from sffc February 18, 2025 00:39
Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, however I'm not happy with the Fraction naming. I think Subsecond would be clearer, and it's a term we're introducing in icu_time and icu_datetime right now as well (#5999).

@sffc
Copy link
Member

sffc commented Feb 18, 2025

What terminology does the RFC use for fractional second / subsecond?

@nekevss
Copy link
Contributor Author

nekevss commented Feb 18, 2025

Subsecond isn't a totally accurate term when applied to the TimeDurationRecord (which actually now that I think about it needs to be adjusted for these changes even more than they currently are). Also, I mentioned it above, the language Subsecond does not exist in any of the RFCs, which uses "fractional second digits". I don't disagree with the change being made in reference to #5999, but I do think the context here is a tad different.

Essentially the above would mean that "T1.5H" is parsed as "T" HOUR_VALUE "." SUBSECOND "H", which to me makes less sense than "T" HOUR_VALUE "." FRACTIONAL_DIGITS "H"

To be fair, the name could be changed to FractionalDigits to more align with the spec.

@sffc
Copy link
Member

sffc commented Feb 18, 2025

Good point about this field being used for things other than seconds in the ixdtf crate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants