Convert application close frame to transport close frame during handshake. #4169

rzikm · 2024-03-01T14:06:21Z

Description

This PR ensures we convert application CONNECTION_CLOSE to transport CONNECTION_CLOSE when the handshake is not confirmed yet (and the peer may not be able to process the frame yet).

Testing

New tests added.

Documentation

No change needed.

src/core/connection.c

codecov · 2024-03-01T14:27:06Z

Codecov Report

Attention: Patch coverage is 93.33333% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 84.04%. Comparing base (7d64cd2) to head (53e6200).
Report is 1 commits behind head on main.

Files	Patch %	Lines
src/core/packet_builder.c	83.33%	2 Missing ⚠️
src/core/crypto.c	90.90%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4169      +/-   ##
==========================================
- Coverage   85.59%   84.04%   -1.56%     
==========================================
  Files          56       56              
  Lines       15358    15384      +26     
==========================================
- Hits        13146    12929     -217     
- Misses       2212     2455     +243

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/core/connection.c

src/platform/tls_schannel.c

rzikm · 2024-03-04T16:42:11Z

I had some more time to spend on this, and it turns out to be a slightly bigger rabbit hole than I originally thought:

The way TLS abstraction on top of the CRYPTO stream works, is that when there is a pending validation, we keep "checked out" part of the buffer, and if we attempt to process future CRYPTO frame before the async validation is completed, an assert segfaults, which is a good thing I guess. However, this means that we have to pause processing CRYPTO data until the user sets the result for the async operation.

This is inconvenient especially on client, as current MsQuic implementation sends the CONNECTION_CLOSE on 1-RTT level, because it has the keys, but client derives the keys only once the (asynchronous) cert validation completes (on Linux at least). So the client will defer the CONNECTION_CLOSE until the async validation completes. We should implement sending CC on "Current and previous levels" as RFC suggests to get the desired behavior, but I would prefer to do that as a separate change.

src/core/crypto.c

nibanks · 2024-03-04T18:10:35Z

Looks like we have down-level test failures, indicating we have changed behavior unexpectedly.

rzikm · 2024-03-05T18:50:09Z

There seem to be some test failures in the kernel mode runs, @nibanks Do you have any suggestions how to debug those?

For the others, there seems to be some nondeterminism in the added tests. And the best way to fix it seemed to me to bite the bullet and implement the sending of CC on both current and lower encryption levels, but that is going to break down-level tests because it introduces a behavioral change (the client is going to process CC in Handshake packets earlier than CC in 1-RTT packets, where previously we did not send CC in Handshake packets). What do you think?

nibanks · 2024-03-12T13:13:55Z

There seem to be some test failures in the kernel mode runs, @nibanks Do you have any suggestions how to debug those?

For the others, there seems to be some nondeterminism in the added tests. And the best way to fix it seemed to me to bite the bullet and implement the sending of CC on both current and lower encryption levels, but that is going to break down-level tests because it introduces a behavioral change (the client is going to process CC in Handshake packets earlier than CC in 1-RTT packets, where previously we did not send CC in Handshake packets). What do you think?

I'm not sure. I haven't had the time to dedicate to thinking about this recently, but I do suspect we will need to send both, and deal with fixing the down-level tests.

src/core/connection.c

src/core/crypto.c

src/core/packet_builder.c

src/core/send.c

src/test/lib/HandshakeTest.cpp

src/test/lib/EventTest.cpp

src/core/crypto.c

src/core/packet_builder.c

src/core/send.c

rzikm · 2024-03-26T11:48:42Z

Looking at CI errors, there are many failures on different cases of Handshake/WithHandshakeArgs1.Connect

 Handshake/WithHandshakeArgs1.Connect/9
D:\a\msquic\msquic\src\test\lib\HandshakeTest.cpp(427): error: Server->GetPeerClosed() not true

I think this PR creates a race condition. The Handshake.Connect test initiates a connection, asserts some properties and then closes the connection at the client side. However, based on the timing (whether client received HANDSHAKE_COMPLETED frame from Server), it would either send App close (test passes), or transport close followed by App close (test fails because server reports transport close).

An idea occurred to me, if we encounter transport close with APPLICATION_ERROR, we could delay the closing of the connection and check if there is a 1-RTT packet in the same datagram with an App close frame and report that close error instead. This would keep the same behavior as before (at least when running against other MsQuic servers of the same version). What do you think, @nibanks?

nibanks · 2024-03-26T13:16:49Z

An idea occurred to me, if we encounter transport close with APPLICATION_ERROR, we could delay the closing of the connection and check if there is a 1-RTT packet in the same datagram with an App close frame and report that close error instead. This would keep the same behavior as before (at least when running against other MsQuic servers of the same version). What do you think, @nibanks?

I like the idea. I wonder if instead of delay, we treat it more like we allow the 1-RTT app close to overwrite the transport close. The change likely will just be that we need to keep processing the packets (and frames).

This reverts commit 02ae60b.

rzikm · 2024-04-04T17:32:56Z

@nibanks There are some test failures, but there are no logs attached in the downloadable artifacts, they might be the same as in #4169 (comment) but I can't say for sure.

…rt-close

rzikm · 2024-04-09T13:45:17Z

I think I see what is going on. In the failed runs, server locally confirms the handshake (since TLS handshake completed), and calls
QuicBindingOnConnectionHandshakeConfirmed, which removes the connection from remote hash-based lookup.

If client sends CC before receiving the HANDSHAKE_DONE frame, it should according to RFC send it in Handshake packet as well. For Long-header packets MsQuic does lookup based on the Remote hash

msquic/src/core/binding.c

Lines 1449 to 1463 in 226138d

    
           QUIC_CONNECTION* Connection; 
        
           if (!Binding->ServerOwned || Packets->IsShortHeader) { 
        
               Connection = 
        
                   QuicLookupFindConnectionByLocalCid( 
        
                       &Binding->Lookup, 
        
                       Packets->DestCid, 
        
                       Packets->DestCidLen); 
        
           } else { 
        
               Connection = 
        
                   QuicLookupFindConnectionByRemoteHash( 
        
                       &Binding->Lookup, 
        
                       &Packets->Route->RemoteAddress, 
        
                       Packets->SourceCidLen, 
        
                       Packets->SourceCid); 
        
           }

but the connection was already removed from said lookup so both Handshake and 1-RTT packets with CC are discarded.

rzikm · 2024-04-09T14:04:45Z

The comment above the mentioned code suggests that at server-side, Destination CIDs are not under server control for long-header packets. I needed to refresh my memory by reading the RFC

A client MUST change the Destination Connection ID it uses for sending packets in response to only the first received Initial or Retry packet. A server MUST set the Destination Connection ID it uses for sending packets based on the first received Initial packet. Any further changes to the Destination Connection ID are only permitted if the values are taken from NEW_CONNECTION_ID frames; if subsequent Initial packets include a different Source Connection ID, they MUST be discarded. This avoids unpredictable outcomes that might otherwise result from stateless processing of multiple Initial packets with different Source Connection IDs.

So it seems to me that RFC-conforming client can only choose the DCID of the first Initial packet, so it seems safe to lookup Handshake packets by DCID as well. what do you think?

rzikm · 2024-04-09T14:12:03Z

Following quick and dirty attempt fixes the issue for my local repro

-   if (!Binding->ServerOwned || Packets->IsShortHeader) {
+   if (!Binding->ServerOwned || Packets->IsShortHeader || Packets->LH->Type == QUIC_HANDSHAKE_V1) {
        Connection =
            QuicLookupFindConnectionByLocalCid(
                &Binding->Lookup,
                Packets->DestCid,
                Packets->DestCidLen);
    } else {

Proper fix would have to take QUIC Version-dependent header Type values to account ofc.

What do you think, @nibanks?

…ake.

nibanks · 2024-04-09T15:05:38Z

So it seems to me that RFC-conforming client can only choose the DCID of the first Initial packet, so it seems safe to lookup Handshake packets by DCID as well. what do you think?

Client can (and do I believe) continue to use the initial DCID beyond the first round trip. AFAIK, there is nothing that prevents that. So, while it might work with MsQuic as a client, it doesn't necessarily work with all clients.

You also need to consider if there is any (probably not) attack implication by allowing these types of packets to be routed.

What do you think, @nibanks?

You also need to make sure those fields have been validated already. Another thought I had though, is should we delay removing the DCID from the lookup table on client, until we have an ACK for the handshake confirmed frame? Would that fix this too?

rzikm · 2024-04-09T15:16:11Z

Client can (and do I believe) continue to use the initial DCID beyond the first round trip.

I thought that is against the very first sentence of the linked paragraph.

A client MUST change the Destination Connection ID it uses for sending packets in response to only the first received Initial or Retry packet.

As for possible attacks, I don't see any, third parties should not have access to necessary handshake keys, and packets failing decryption/integrity check shouldn't affect the connection.

You also need to make sure those fields have been validated already.

The fields are luckily already validated at this point (assert at the beginning of the containing function).

Another thought I had though, is should we delay removing the DCID from the lookup table on client, until we have an ACK for the handshake confirmed frame? Would that fix this too?

That would work as well, but the fix would be slightly more complicated. I can change it if you think this would be a better fix.

…g handshake." This reverts commit 97b770f.

…ake. (Attempt 2)

rzikm · 2024-04-09T19:47:45Z

Another thought I had though, is should we delay removing the DCID from the lookup table on client, until we have an ACK for the handshake confirmed frame? Would that fix this too?

My fix broke 0RTT tests. Looks like your idea works better, so let's go with that :D

nibanks · 2024-04-09T20:18:08Z

src/core/crypto.c

+        QUIC_PATH* Path = &Connection->Paths[0];
+        CXPLAT_DBG_ASSERT(Path->Binding != NULL);
+        QuicBindingOnConnectionHandshakeConfirmed(Path->Binding, Connection);
+    }

    QuicCryptoDiscardKeys(Crypto, QUIC_PACKET_KEY_HANDSHAKE);


Does it make sense to throw away the handshake key still? Or should this whole function just be delayed?

I don't think it makes sense to keep them, when TLS handshake finishes at server side, client already must have 1-RTT keys. The issue this fixes is only about getting the coalesced datagram routed to the connection.

rzikm requested a review from a team as a code owner March 1, 2024 14:06

rzikm changed the title ~~4166-app-to-transport-close~~ Convert application close frame to transport close frame during handshake. Mar 1, 2024

rzikm commented Mar 1, 2024

View reviewed changes

src/core/connection.c Show resolved Hide resolved

rzikm commented Mar 1, 2024

View reviewed changes

src/core/connection.c Outdated Show resolved Hide resolved

nibanks added the Area: Core Related to the shared, core protocol logic label Mar 2, 2024

nibanks reviewed Mar 4, 2024

View reviewed changes

src/core/connection.c Show resolved Hide resolved

nibanks reviewed Mar 4, 2024

View reviewed changes

src/platform/tls_schannel.c Outdated Show resolved Hide resolved

rzikm commented Mar 4, 2024

View reviewed changes

src/core/crypto.c Outdated Show resolved Hide resolved

rzikm force-pushed the 4166-app-to-transport-close branch from 13eb5b8 to d9d4bd4 Compare March 5, 2024 14:05

rzikm force-pushed the 4166-app-to-transport-close branch from 400449b to 4c69b4e Compare March 25, 2024 12:48