Skip to content

[UB] Access out-of-bounds data when parsing invalid metadata items #208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
henrygab opened this issue Feb 24, 2025 · 9 comments
Open

[UB] Access out-of-bounds data when parsing invalid metadata items #208

henrygab opened this issue Feb 24, 2025 · 9 comments
Milestone

Comments

@henrygab
Copy link
Contributor

henrygab commented Feb 24, 2025

This code effectively accesses the data at i and at i+1:

picotool/bintool/metadata.h

Lines 257 to 261 in de8ae5a

if (new_p.flags & PICOBIN_PARTITION_FLAGS_HAS_ID_BITS) {
uint32_t low = data[i++];
uint32_t high = data[i++];
new_p.id = (uint64_t)low | ((uint64_t)high << 32);
}

However, the code does not ensure that data[1] will be valid:

picotool/bintool/metadata.h

Lines 237 to 242 in de8ae5a

std::vector<uint32_t> data;
for (unsigned int i=2; i < size; i++) {
data.push_back(*it++);
}
size_t i=0;
while (i < data.size()) {

From https://en.cppreference.com/w/cpp/container/vector/operator_at:

No bounds checking is performed.

Therefore, no guarantee that this will throw an exception (which might be expected), and as a result, clearly getting into undefined behavior.....

@henrygab
Copy link
Contributor Author

henrygab commented Feb 25, 2025

See next post....

old thoughts

To fix, verify that data.size() is sufficient before setting high.

For example:

if (new_p.flags & PICOBIN_PARTITION_FLAGS_HAS_ID_BITS) { 
    if ((data.size() - 1u) < i) {
        // handle this error condition properly ... don't access memory that's not ensured to be valid!
    }
    uint32_t low = data[i++]; 
    uint32_t high = data[i++]; 
    new_p.id = (uint64_t)low | ((uint64_t)high << 32); 
 } 

@henrygab
Copy link
Contributor Author

Actually, the code is much, MUCH worse that first anticipated. It's constantly incrementing i throughout, without then validating that data.size() < i before later access of data. This code is not following best practices, and seems worth having a senior developer fix many data-access bugs.

@ndabas
Copy link

ndabas commented Feb 25, 2025

(Disclaimer: I am not a maintainer or author of this repo)

The code is checking the appropriate flags to ensure that the data accesses are valid.

I think it would be constructive for you to:

  • Demonstrate a failure case (perhaps with invalid data input) where this code would actually do the wrong thing;
  • Provide a PR fixing those issues.

Note that even if the code fails on invalid data -- a program crash would be completely expected, and also completely harmless.

@henrygab
Copy link
Contributor Author

Note that even if the code fails on invalid data -- a program crash would be completely expected, and also completely harmless.

I must respectfully disagree with you on multiple levels.

I see you're a seasoned developer. At the same time, you appear to presume UB means the program will crash. Therefore, it's possible that you've been spared the agony of debugging something caused by compilers optimizing around UB....

Unfortunately, C and C++ are much more dangerous, and explicitly allow that, when UB occurs, the compiler can literally choose to do anything it wants. If you're not familiar with the perils of UB, ... welcome to the deep rabbit hole of debugging issues that appear to make no sense! Take some time to enjoy the horror stories, and learn without paying the terrible costs that UB optimizations have inflicted upon many....

As to this template: It does not place restrictions upon its template arguments, not perform any validation of those arguments (and thus ensure at least a clean exception is thrown). Let's see what happens with the following template arguments:

// Iterator points to following uint32_t values:
{ 0xXXXXXXXXu, 0xFFFFFFFFu, 0xFFFFFFFFu } // first value doesn't even matter for these examples....
uint32_t header = 0x00000300u; // aka 0x3u << 8

I'll walk through the code. Since we're both seasoned developers, I'm hoping it's enough to show a few areas of UB. Feel free to point out errors, as this is currently via code review:

// lines 229-235
uint32_t size = 0x00000003u; // result of decode_size(0x00000300u)
uint8_t singleton_count = 0x00u; // (header >> 24)
bool singleton = false;  // header & 0x80000000u;
uint8_t partition_count = 0x00u; // (header >> 24) & 0xFu;
uint32_t unpartitioned_flags = 0xXXXXXXXXu;
// Iterator now points to second element, value 0xFFFFFFFFu
pt = shared_pointer(0xXXXXXXXXu, singleton)

// lines 237-240 
// * here, if header was 0x00007700u, 0x75u values (access of most of which are UB) into the vector
// * here, since size was 3u, vector will initialize with the single next value from iterator parameter
std::vector<uint32_t> data = { 0xFFFFFFFFu }; 

// then, because data.size() == 1, and i is initialized to zero, the while() loop at 242 is entered.
i = 0
// line 244
//  ... = data[i++]; 
uint32_t permissions_locations = 0xFFFFFFFFu;
i = 1;
// line 249 results in UNDEFINED BEHAVIOR
//          per the C++ standard, no bounds checking on std::vector<>
// ... = data[i++]; // OOPS!  Vector doesn't have enough data....
permissions_flags = UNDEFINED BEHAVIOR

The UB potential continues from there. If you come to the understanding that UB is never acceptable, then at best the code here is extremely fragile with many unwritten and untested assumptions. If the code used a container that did bounds checking, then at least there would be the assurance of a proper exception being thrown. As it is, there is no such guarantee.

How long do you think it would take an average developer, when first presented with this function, to understand and explain what this code is actually doing in the normal case? How long until that developer could describe the function's unwritten assumptions? Can you enumerate those assumptions?

Fundamentally, C++ is designed to use exceptions. UB is not an exception. Code that allows for UB is arguably still OK in one-off, unsupported code that is discarded. Where that code is shared and forked and potentially used in other places ... then UB is strictly persona non grata, yes?

@ndabas
Copy link

ndabas commented Feb 27, 2025

Thanks for the detailed write-up and explanation.

I understand what UB is. All I was trying to say was that this isn't a case of UB:

  • The bounds check and access using data[..] are function calls, not direct array accesses.
  • So it would be UB if the actual index access is out of range, which I'm asserting is not actually the case.

In any case -- it's trivial to add the required bounds checks -- so why not create a PR and do that? (Again, I'm not an author or maintainer on this repo, so I cannot say whether they would consider merging it.)

@henrygab
Copy link
Contributor Author

henrygab commented Feb 27, 2025

  • So it would be UB if the actual index access is out of range, which I'm asserting is not actually the case.

Can you help me understand your assertion?

  • Above, the std::vector<> has a single element.
  • A single element means that the only valid index is zero.
  • Thus, data[0] is valid, while data[N] is invalid unless N is zero.
  • The variable i is used to access data multiple times as data[i++].
  • Specifically, i is shown to be non-zero while executing data[i++].
  • Therefore, those are index accesses that are out-of-range.
  • [EDIT] and ... From https://en.cppreference.com/w/cpp/container/vector/operator_at, std:vector<>'s [] operator: "Accessing a nonexistent element through this operator is undefined behavior."

Therefore, can you help me understand how you assert that the index access is NOT out of range?

@will-v-pi
Copy link
Contributor

@henrygab The scenario you describe would not be a valid partition table item - see section 5.9.4 of the RP2350 datasheet for more details on the structure of the partition table item. So you can currently expect undefined behaviour if you've created an incorrectly formatted partition table item, and the same goes for the other metadata items.

If this is causing you issues, then feel free to raise a PR to prevent the out-of-bounds accesses with incorrectly formatted metadata blocks (this can occur with some of the other metadata items too) - I will modify the title of this issue to reflect that this issue is only with incorrectly formatted metadata blocks.

@will-v-pi will-v-pi changed the title Access out-of-bounds data Access out-of-bounds data when parsing invalid metadata items Mar 4, 2025
@will-v-pi will-v-pi added this to the 2.2.0 milestone Mar 4, 2025
@henrygab henrygab changed the title Access out-of-bounds data when parsing invalid metadata items [UB] Access out-of-bounds data when parsing invalid metadata items Mar 6, 2025
@henrygab
Copy link
Contributor Author

henrygab commented Mar 6, 2025

So you can currently expect undefined behaviour if you've created an incorrectly formatted partition table item

Thank you for explaining the official position. Without trying to change this official position, can you confirm that I am understanding it correctly? This could help prioritize my efforts to those that are considered in-scope.

My understanding at the moment:

  1. picotool is allowed to presume that any metadata item provided to it is correctly formatted and valid.
  2. Therefore, hardening picotool against invalid / corrupt metadata items would be out-of-scope.
  3. Accordingly, if picotool parses invalid / corrupt metadata, undefined behavior is an acceptable result.

Thank you!

@will-v-pi
Copy link
Contributor

Thank you for explaining the official position. Without trying to change this official position, can you confirm that I am understanding it correctly? This could help prioritize my efforts to those that are considered in-scope.

Not sure what you mean by “official position”, I was just explaining how it currently works, and how your issue will only affect invalid binaries.

Obviously, it would be better to throw an error in these situations than go to undefined behaviour, but it’s not a high priority to fix as it only affects invalid binaries. If you want it fixed right now then feel free to raise a PR to fix it, otherwise we’ll try to fix it for the 2.2.0 release (you can see I’ve added it to that milestone).

Also, note that in most cases of corrupted metadata (eg truncated block, incorrect sizes of items), picotool will fail earlier during the block search, so it would throw an error there rather than reaching undefined behaviour here. To reach undefined behaviour here you’d probably need to create an invalid metadata block with correct sizes but incorrect flags (eg say your partition has a name of length 10 but you only write a name of length 3).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants