Skip to content

feat(rust/cbork): deterministic map decoding helper #360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 59 commits into
base: main
Choose a base branch
from

Conversation

cong-or
Copy link
Contributor

@cong-or cong-or commented Jun 10, 2025

The code satisfies RFC 8949's deterministic CBOR map requirements by ensuring:

  1. Key Ordering

    • Shorter keys before longer ones
    • Lexicographical ordering for equal-length keys
  2. Key Uniqueness

    • No duplicate keys allowed
  3. Encoding Rules

    • Uses minimal length encoding
    • No indefinite-length items

All requirements are properly validated and tested

example usage:

use minicbor::Decoder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Example of a valid CBOR map with 2 entries
    let cbor_bytes = vec![
        0xA2,                   // Map with 2 pairs
        0x41, 0x01,            // Key 1: single byte string "1"
        0x41, 0x30,            // Value 1: single byte string "0"
        0x41, 0x02,            // Key 2: single byte string "2"
        0x41, 0x31,            // Value 2: single byte string "1"
    ];

    // Create a decoder with our CBOR bytes
    let mut decoder = Decoder::new(&cbor_bytes);
    
    // Decode and validate the map
    match decode_map_deterministically(&mut decoder) {
        Ok(validated_bytes) => {
            println!("Valid deterministic map! Bytes: {:?}", validated_bytes);
            // validated_bytes will contain the original CBOR bytes since they were valid
        },
        Err(e) => {
            println!("Invalid deterministic encoding: {}", e);
        }
    }

    Ok(())
}

p.s will do separate PR for abstracted cbor decoder with optional deterministic decoding w logging.

cong-or added 30 commits May 25, 2025 18:32
Adds validation for minimal length encoding of string types (Str and Bytes) in the
DeterministicDecoder according to RFC 8949 Section 4.2. This ensures that string
lengths are encoded using the minimal number of bytes required. For example, strings
of length 0-23 must use direct encoding, length 24-255 must use one byte, etc.

The changes:
- Add length validation for Type::Str and Type::Bytes
- Check for indefinite length strings
- Validate minimal length encoding using check_minimal_length function
Adds comprehensive test coverage for RFC 8949 Section 4.2 deterministic encoding
requirements. The new tests verify:

- Minimal length integer encoding rules for values 0-23, 24-255, etc.
- Floating point value requirements including shortest form and non-finite prohibition
- String/array/map length encoding rules and indefinite length checks
- Map key ordering rules with length-first canonical ordering

Each test includes detailed comments explaining:
- The specific RFC requirement being tested
- Byte-level breakdown of CBOR encodings
- Why each test case is valid or invalid
- References to relevant RFC sections

This ensures proper validation of all deterministic encoding rules and helps
maintainers understand the requirements.
Add detailed test cases for deterministic CBOR encoding rules as specified in
RFC 8949 section 4.2. The new tests cover:

- Integer boundary conditions and minimal encoding requirements
- Negative integer encoding across different ranges
- Map key ordering (length-first, then lexicographic)
- Floating point encoding with different precision requirements
- String comparison ordering including UTF-8 handling
- Nested structure validation
- Array length encoding rules
- Duplicate map key detection

The tests are extensively documented with RFC requirements and include TODOs
for future validation improvements, particularly for floating point handling
where additional checks for non-finite values and minimal encoding could be
added.

Includes commented-out test cases that can be enabled once support for
validating non-finite floating point values is implemented.

RFC: https://datatracker.ietf.org/doc/html/rfc8949#section-4.2
Refactor test cases to fix clippy warnings:
- Use simpler iterator chaining in array length test
- Remove redundant  calls
- Replace explicit type annotations with inferred types
- Fix collect() with redundant map operations

Also simplify floating point test cases to match current implementation
and improve RFC 8949 compliance documentation. The floating point tests
now focus on valid encodings while keeping commented-out future test
cases for non-finite values validation.

Tests still verify the same RFC requirements but with more idiomatic
Rust code.
Improve documentation and refactor validate_next() to align with RFC 8949 § 4.2
specification for deterministically encoded CBOR. Split validation logic into
smaller, focused functions for better maintainability.

- Split validate_next into specialized validation functions:
  * validate_integer() - Handles minimal integer encoding
  * validate_array() - Validates definite-length arrays
  * validate_string() - Checks string/bytes encoding
  * validate_map() - Ensures proper key ordering

- Add comprehensive documentation referencing RFC 8949:
  * Detail core deterministic encoding requirements
  * Document rules for integer minimality
  * Explain length field constraints
  * Specify map key ordering rules
  * Include examples of valid/invalid encodings

This refactoring improves code organization while maintaining full compliance
with the CBOR deterministic encoding specification. The enhanced documentation
helps developers understand both implementation details and RFC requirements.
Improve documentation and refactor validate_next() to align with RFC 8949 § 4.2
specification for deterministically encoded CBOR. Split validation logic into
smaller, focused functions for better maintainability.

- Split validate_next into specialized validation functions:
  * validate_integer() - Handles minimal integer encoding
  * validate_array() - Validates definite-length arrays
  * validate_string() - Checks string/bytes encoding
  * validate_map() - Ensures proper key ordering

- Add comprehensive documentation referencing RFC 8949:
  * Detail core deterministic encoding requirements
  * Document rules for integer minimality
  * Explain length field constraints
  * Specify map key ordering rules
  * Include examples of valid/invalid encodings

This refactoring improves code organization while maintaining full compliance
with the CBOR deterministic encoding specification. The enhanced documentation
helps developers understand both implementation details and RFC requirements.
@cong-or cong-or self-assigned this Jun 10, 2025
@cong-or cong-or changed the title feat(rust/cbork-utils): deterministic map decoding helper feat(rust/cbork): deterministic map decoding helper Jun 10, 2025
@cong-or cong-or added the review me PR is ready for review label Jun 10, 2025
@cong-or cong-or marked this pull request as ready for review June 10, 2025 20:59
Copy link
Contributor

github-actions bot commented Jun 10, 2025

Test Report | ${\color{lightgreen}Pass: 331/331}$ | ${\color{red}Fail: 0/331}$ |

@cong-or cong-or requested review from stevenj, no30bit and bkioshn June 10, 2025 21:25
Copy link
Contributor

@no30bit no30bit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I've left some comments mostly on the error types.

On that topic, I am a bit hesitant about introducing DetermenisticError. Some of its variants have parallels in minicbor::decode::Error. For example, DetermenisticError::UnexpectedEof is the same as Error::end_of_input() from minicbor. It gets a bit confusing, considering the later is also wrapped under one of the variants.

If I were to implement this, I'd rather wrap the other way around: our local error within minicbor's Error::custom. Our error would then be extractable via std::error::Error::source, and we'd get the benefits of integration inside of minicbor::Decode implementations seamlessly and Error::at.

d.skip()?;
let value_end = d.position();

// Extract the raw bytes for both key and value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the Decoder::skip skips over a CBOR value, which can't be represented by less than 1 byte, wouldn't it be safe to assume the ranges extracted below to be ok by definition?

Or are we testing the Decoder::skip implementation itself here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressing this now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@no30bit error handling has been addressed along with the cbor header helper which can be used for various types.

@cong-or cong-or requested a review from no30bit June 14, 2025 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
review me PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants