new binaryNumberRep values #36
Comments
Issue #7 specifies a new binaryNumberRep, 'offsetBinary'. Per the email discussion of Jun 8, 2022, titled "binaryNumberRep limitations, xs:decimal and binaryDecimalVirtualPoint", several others are also needed.

This table comes from a format specification we use:

Ignore the 'Logical' column above; that is about enums. Ignore the "*", which only marks the suggested value to reserve when an in-band null indicator is needed.

What is called 'Mod Twos Complement' here is what our existing proposed DFDL 2.0 feature (Issue #7) calls 'offsetBinary'.

So this table suggests the need for 'unsignedBinary' (already mentioned), but also two others: 'signPlusMagnitudeBinary' and 'onesComplementBinary'.

Google Protocol Buffers has popularized zigZag, a signed integer representation:

Binary Value    Zig Zag

The above can be summarized to:
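As a hedged illustration (not the summary or tables from the original comment), the three representations named above can be sketched in Python. The function names are hypothetical, bit strings are written most-significant-bit first, and the zigzag mapping shown is the commonly described one in which non-negative n maps to 2n and negative n maps to -2n - 1.

```python
def decode_sign_plus_magnitude(bits: str) -> int:
    """First bit is the sign, the remaining bits are an unsigned magnitude."""
    magnitude = int(bits[1:], 2) if len(bits) > 1 else 0
    return -magnitude if bits[0] == "1" else magnitude


def decode_ones_complement(bits: str) -> int:
    """A negative value is stored as the bitwise inverse of its magnitude."""
    value = int(bits, 2)
    if bits[0] == "1":
        mask = (1 << len(bits)) - 1
        return -(~value & mask)
    return value


def decode_zigzag(bits: str) -> int:
    """Unsigned 0, 1, 2, 3, 4 ... decode to 0, -1, 1, -2, 2 ..."""
    n = int(bits, 2)
    return (n >> 1) ^ -(n & 1)


def encode_zigzag(value: int, width: int) -> str:
    """Inverse of decode_zigzag for values that fit in `width` bits."""
    n = (value << 1) ^ (value >> (width - 1))
    return format(n & ((1 << width) - 1), f"0{width}b")
```

Note that sign-plus-magnitude and ones' complement both have two bit patterns for zero (for 3 bits: 000 and 100, or 000 and 111 respectively), which a real implementation would have to decide how to unparse.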
Those are about how the bit string is interpreted once it is assembled. There is also the issue of variable-length integers. Numerous formats exist where the integer consists of a number of bytes, but there is no stored length to tell us how many bytes. Rather, the most-significant bit of each byte is used as a flag: 0 means "last byte", 1 means "there is another byte". The 7-bit contributions from each byte are concatenated (taking dfdl:bitOrder into account), and the resulting bits are then interpreted per one of the applicable schemes above and dfdl:byteOrder.

This notion of variable length, where the most-significant (or least-significant) bit of a byte is used as a flag, is effectively a new dfdl:lengthKind (perhaps called flagBitPerByte) which can be combined with many of the above binaryNumberRep values.

Since the length is variable, binaryNumberReps which depend on the most-significant bit being a sign bit are problematic. For those, an extra byte of 0x80 (0b10000000) must be added if the MSB of the integer would have been 1, as it would otherwise be interpreted as a sign. (ASN.1 BER uses this convention.)

Of the above suggested binaryNumberReps, only offsetBinary makes no sense for variable-length representation, because the mid-point of the potential integer range must be known.
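The flag-bit-per-byte scheme described above can be sketched as follows. This is a minimal illustration, not a proposed DFDL implementation: the function name and the `byte_order` parameter values are hypothetical, the flag is assumed to be the most-significant bit of each byte, and the assembled bits are returned as an unsigned integer (interpreting them under another binaryNumberRep would be a separate step).

```python
def decode_flag_bit_per_byte(data: bytes, byte_order: str = "bigEndian"):
    """Return (unsigned value, number of bytes consumed).

    Each byte carries 7 payload bits; a set top bit means another byte follows.
    """
    groups = []
    for consumed, byte in enumerate(data, start=1):
        groups.append(byte & 0x7F)      # keep the 7 payload bits
        if byte & 0x80 == 0:            # flag clear: this was the last byte
            break
    else:
        raise ValueError("ran out of data before a byte with the flag bit clear")

    if byte_order == "littleEndian":    # first byte holds the least-significant group
        groups.reverse()

    value = 0
    for group in groups:                # concatenate groups, most significant first
        value = (value << 7) | group
    return value, consumed
```

For example, the two bytes 0xAC 0x02 decode to 300 when the first byte is taken as least significant, which is the ordering used by Protocol Buffers varints.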
The next step would be to create experimental implementations and an experimental-features document to propose for DFDL v2.0 inclusion.
Another number rep, though this one is for decimal, not integer: EXI stores decimal numbers as two integers, one for the integer part and one for the fraction part. The fraction-part integer is created by taking the base-10 digits of the fraction part, reversing their order, then converting to a binary integer. This preserves the exact number of leading zeros in the fraction part. Trailing zeros in the fraction part are not captured.
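A small sketch of the reversed-fraction idea, assuming non-negative values written as plain decimal strings (sign handling is left out, and the function names are hypothetical):

```python
def encode_reversed_fraction(text: str):
    """Split '12.0034' into (12, 4300): the fraction digits are reversed."""
    integer_digits, _, fraction_digits = text.partition(".")
    reversed_fraction = fraction_digits[::-1] or "0"
    return int(integer_digits or "0"), int(reversed_fraction)


def decode_reversed_fraction(integer_part: int, reversed_fraction: int) -> str:
    """Reverse the fraction integer's digits again to recover the fraction."""
    fraction_digits = str(reversed_fraction)[::-1]
    return f"{integer_part}.{fraction_digits}"
```

Leading zeros of the fraction become trailing zeros of the stored integer, so "12.0034" survives a round trip, while the trailing zero of "1.50" becomes a leading zero of the stored integer and is lost, giving "1.5".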
This feature is "in use" in that dfdl:inputValueCalc and dfdl:outputValueCalc are used to synthesize the proper integers from these different representations.
Closing #7 as duplicate. This was the description in #7:

We have found a number of places that use offset-binary numeric representation. This is also called excess-K, or biased, but I think offset binary is a better description of it. In this representation you take an unsigned binary value and just subtract an offset. E.g., for a 3-bit number, mostSignificantBitFirst:

bits  unsigned  twos-comp  offsetBinary
000   0         0          -4

So we suggest that the next revision of DFDL include dfdl:binaryNumberRep="offsetBinary" as a required feature.
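To make the comparison concrete, here is a short sketch that computes all three interpretations for every 3-bit pattern. It assumes the offset is half the unsigned range, 2^(width-1), which is consistent with the 000 row above; the function name is hypothetical.

```python
def interpretations(bits: str):
    """Return (unsigned, twos-complement, offset-binary) for an MSB-first bit string."""
    width = len(bits)
    unsigned = int(bits, 2)
    # Two's complement: a set top bit subtracts 2**width.
    twos_complement = unsigned - (1 << width) if bits[0] == "1" else unsigned
    # Offset binary: subtract a fixed offset, assumed here to be 2**(width - 1).
    offset_binary = unsigned - (1 << (width - 1))
    return unsigned, twos_complement, offset_binary


for n in range(8):
    bits = format(n, "03b")
    print(bits, *interpretations(bits))
```

With this choice of offset, offset binary and two's complement differ only in the top bit: 100 is 0 in offset binary and -4 in two's complement.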
New binaryNumberRep values are needed, e.g. for Google zigzag integers, offset binary, etc.