Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert application close frame to transport close frame during handshake. #4169

Merged
merged 29 commits into from
Apr 15, 2024

Conversation

rzikm
Copy link
Member

@rzikm rzikm commented Mar 1, 2024

Fixes #4166.

Description

This PR ensures we convert application CONNECTION_CLOSE to transport CONNECTION_CLOSE when the handshake is not confirmed yet (and the peer may not be able to process the frame yet).

Testing

New tests added.

Documentation

No change needed.

@rzikm rzikm requested a review from a team as a code owner March 1, 2024 14:06
@rzikm rzikm changed the title 4166-app-to-transport-close Convert application close frame to transport close frame during handshake. Mar 1, 2024
Copy link

codecov bot commented Mar 1, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 84.04%. Comparing base (7d64cd2) to head (53e6200).
Report is 1 commits behind head on main.

Files Patch % Lines
src/core/packet_builder.c 83.33% 2 Missing ⚠️
src/core/crypto.c 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4169      +/-   ##
==========================================
- Coverage   85.59%   84.04%   -1.56%     
==========================================
  Files          56       56              
  Lines       15358    15384      +26     
==========================================
- Hits        13146    12929     -217     
- Misses       2212     2455     +243     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nibanks nibanks added the Area: Core Related to the shared, core protocol logic label Mar 2, 2024
@rzikm
Copy link
Member Author

rzikm commented Mar 4, 2024

I had some more time to spend on this, and it turns out to be a slightly bigger rabbit hole than I originally thought:

The way TLS abstraction on top of the CRYPTO stream works, is that when there is a pending validation, we keep "checked out" part of the buffer, and if we attempt to process future CRYPTO frame before the async validation is completed, an assert segfaults, which is a good thing I guess. However, this means that we have to pause processing CRYPTO data until the user sets the result for the async operation.

This is inconvenient especially on client, as current MsQuic implementation sends the CONNECTION_CLOSE on 1-RTT level, because it has the keys, but client derives the keys only once the (asynchronous) cert validation completes (on Linux at least). So the client will defer the CONNECTION_CLOSE until the async validation completes. We should implement sending CC on "Current and previous levels" as RFC suggests to get the desired behavior, but I would prefer to do that as a separate change.

@nibanks
Copy link
Member

nibanks commented Mar 4, 2024

Looks like we have down-level test failures, indicating we have changed behavior unexpectedly.

@rzikm rzikm force-pushed the 4166-app-to-transport-close branch from 13eb5b8 to d9d4bd4 Compare March 5, 2024 14:05
@rzikm
Copy link
Member Author

rzikm commented Mar 5, 2024

There seem to be some test failures in the kernel mode runs, @nibanks Do you have any suggestions how to debug those?

For the others, there seems to be some nondeterminism in the added tests. And the best way to fix it seemed to me to bite the bullet and implement the sending of CC on both current and lower encryption levels, but that is going to break down-level tests because it introduces a behavioral change (the client is going to process CC in Handshake packets earlier than CC in 1-RTT packets, where previously we did not send CC in Handshake packets). What do you think?

@nibanks
Copy link
Member

nibanks commented Mar 12, 2024

There seem to be some test failures in the kernel mode runs, @nibanks Do you have any suggestions how to debug those?

For the others, there seems to be some nondeterminism in the added tests. And the best way to fix it seemed to me to bite the bullet and implement the sending of CC on both current and lower encryption levels, but that is going to break down-level tests because it introduces a behavioral change (the client is going to process CC in Handshake packets earlier than CC in 1-RTT packets, where previously we did not send CC in Handshake packets). What do you think?

I'm not sure. I haven't had the time to dedicate to thinking about this recently, but I do suspect we will need to send both, and deal with fixing the down-level tests.

@rzikm rzikm force-pushed the 4166-app-to-transport-close branch from 400449b to 4c69b4e Compare March 25, 2024 12:48
@rzikm
Copy link
Member Author

rzikm commented Mar 26, 2024

Looking at CI errors, there are many failures on different cases of Handshake/WithHandshakeArgs1.Connect

 Handshake/WithHandshakeArgs1.Connect/9
D:\a\msquic\msquic\src\test\lib\HandshakeTest.cpp(427): error: Server->GetPeerClosed() not true

I think this PR creates a race condition. The Handshake.Connect test initiates a connection, asserts some properties and then closes the connection at the client side. However, based on the timing (whether client received HANDSHAKE_COMPLETED frame from Server), it would either send App close (test passes), or transport close followed by App close (test fails because server reports transport close).

An idea occurred to me, if we encounter transport close with APPLICATION_ERROR, we could delay the closing of the connection and check if there is a 1-RTT packet in the same datagram with an App close frame and report that close error instead. This would keep the same behavior as before (at least when running against other MsQuic servers of the same version). What do you think, @nibanks?

@nibanks
Copy link
Member

nibanks commented Mar 26, 2024

An idea occurred to me, if we encounter transport close with APPLICATION_ERROR, we could delay the closing of the connection and check if there is a 1-RTT packet in the same datagram with an App close frame and report that close error instead. This would keep the same behavior as before (at least when running against other MsQuic servers of the same version). What do you think, @nibanks?

I like the idea. I wonder if instead of delay, we treat it more like we allow the 1-RTT app close to overwrite the transport close. The change likely will just be that we need to keep processing the packets (and frames).

@rzikm rzikm force-pushed the 4166-app-to-transport-close branch from 770d900 to 7db08bb Compare April 4, 2024 14:58
@rzikm
Copy link
Member Author

rzikm commented Apr 4, 2024

@nibanks There are some test failures, but there are no logs attached in the downloadable artifacts, they might be the same as in #4169 (comment) but I can't say for sure.

@rzikm
Copy link
Member Author

rzikm commented Apr 9, 2024

I think I see what is going on. In the failed runs, server locally confirms the handshake (since TLS handshake completed), and calls
QuicBindingOnConnectionHandshakeConfirmed, which removes the connection from remote hash-based lookup.

If client sends CC before receiving the HANDSHAKE_DONE frame, it should according to RFC send it in Handshake packet as well. For Long-header packets MsQuic does lookup based on the Remote hash

msquic/src/core/binding.c

Lines 1449 to 1463 in 226138d

QUIC_CONNECTION* Connection;
if (!Binding->ServerOwned || Packets->IsShortHeader) {
Connection =
QuicLookupFindConnectionByLocalCid(
&Binding->Lookup,
Packets->DestCid,
Packets->DestCidLen);
} else {
Connection =
QuicLookupFindConnectionByRemoteHash(
&Binding->Lookup,
&Packets->Route->RemoteAddress,
Packets->SourceCidLen,
Packets->SourceCid);
}

but the connection was already removed from said lookup so both Handshake and 1-RTT packets with CC are discarded.

@rzikm
Copy link
Member Author

rzikm commented Apr 9, 2024

The comment above the mentioned code suggests that at server-side, Destination CIDs are not under server control for long-header packets. I needed to refresh my memory by reading the RFC

A client MUST change the Destination Connection ID it uses for sending packets in response to only the first received Initial or Retry packet. A server MUST set the Destination Connection ID it uses for sending packets based on the first received Initial packet. Any further changes to the Destination Connection ID are only permitted if the values are taken from NEW_CONNECTION_ID frames; if subsequent Initial packets include a different Source Connection ID, they MUST be discarded. This avoids unpredictable outcomes that might otherwise result from stateless processing of multiple Initial packets with different Source Connection IDs.

So it seems to me that RFC-conforming client can only choose the DCID of the first Initial packet, so it seems safe to lookup Handshake packets by DCID as well. what do you think?

@rzikm
Copy link
Member Author

rzikm commented Apr 9, 2024

Following quick and dirty attempt fixes the issue for my local repro

-   if (!Binding->ServerOwned || Packets->IsShortHeader) {
+   if (!Binding->ServerOwned || Packets->IsShortHeader || Packets->LH->Type == QUIC_HANDSHAKE_V1) {
        Connection =
            QuicLookupFindConnectionByLocalCid(
                &Binding->Lookup,
                Packets->DestCid,
                Packets->DestCidLen);
    } else {

Proper fix would have to take QUIC Version-dependent header Type values to account ofc.

What do you think, @nibanks?

@rzikm rzikm force-pushed the 4166-app-to-transport-close branch from 55147c2 to 97b770f Compare April 9, 2024 14:47
@nibanks
Copy link
Member

nibanks commented Apr 9, 2024

So it seems to me that RFC-conforming client can only choose the DCID of the first Initial packet, so it seems safe to lookup Handshake packets by DCID as well. what do you think?

Client can (and do I believe) continue to use the initial DCID beyond the first round trip. AFAIK, there is nothing that prevents that. So, while it might work with MsQuic as a client, it doesn't necessarily work with all clients.

You also need to consider if there is any (probably not) attack implication by allowing these types of packets to be routed.

What do you think, @nibanks?

You also need to make sure those fields have been validated already. Another thought I had though, is should we delay removing the DCID from the lookup table on client, until we have an ACK for the handshake confirmed frame? Would that fix this too?

@rzikm
Copy link
Member Author

rzikm commented Apr 9, 2024

Client can (and do I believe) continue to use the initial DCID beyond the first round trip.

I thought that is against the very first sentence of the linked paragraph.

A client MUST change the Destination Connection ID it uses for sending packets in response to only the first received Initial or Retry packet.

As for possible attacks, I don't see any, third parties should not have access to necessary handshake keys, and packets failing decryption/integrity check shouldn't affect the connection.

You also need to make sure those fields have been validated already.

The fields are luckily already validated at this point (assert at the beginning of the containing function).

Another thought I had though, is should we delay removing the DCID from the lookup table on client, until we have an ACK for the handshake confirmed frame? Would that fix this too?

That would work as well, but the fix would be slightly more complicated. I can change it if you think this would be a better fix.

@rzikm rzikm force-pushed the 4166-app-to-transport-close branch from 76aa2c1 to 53e6200 Compare April 9, 2024 18:12
@rzikm
Copy link
Member Author

rzikm commented Apr 9, 2024

Another thought I had though, is should we delay removing the DCID from the lookup table on client, until we have an ACK for the handshake confirmed frame? Would that fix this too?

My fix broke 0RTT tests. Looks like your idea works better, so let's go with that :D

@rzikm rzikm requested a review from nibanks April 9, 2024 20:13
QUIC_PATH* Path = &Connection->Paths[0];
CXPLAT_DBG_ASSERT(Path->Binding != NULL);
QuicBindingOnConnectionHandshakeConfirmed(Path->Binding, Connection);
}

QuicCryptoDiscardKeys(Crypto, QUIC_PACKET_KEY_HANDSHAKE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to throw away the handshake key still? Or should this whole function just be delayed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to keep them, when TLS handshake finishes at server side, client already must have 1-RTT keys. The issue this fixes is only about getting the coalesced datagram routed to the connection.

@nibanks nibanks merged commit 34117e4 into microsoft:main Apr 15, 2024
349 of 351 checks passed
@nibanks nibanks mentioned this pull request Apr 15, 2024
4 tasks
@nibanks nibanks added the Partner: .NET By or For the .NET team label Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Core Related to the shared, core protocol logic Partner: .NET By or For the .NET team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConnectionShutdown with pending (async) CertificateValidation sends Application CONNECTION_CLOSE
2 participants