[Crash]: quinn-udp-0.5.10 mod.rs:54:37: no control buffer space remaining #2155

Open
JayZhao opened this issue Feb 16, 2025 · 18 comments

JayZhao commented Feb 16, 2025

Happened when I switched my VPN on.
iOS 18.2

thread 'tokio-runtime-worker' panicked at .cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-udp-0.5.10/src/cmsg/mod.rs:54:37:
**no control buffer space remaining**
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at /Users/zhaojianyin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/**//**
stack backtrace:
   0:        0x105141764 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h8eb5a5821b46260c
   1:        0x105180c3c - core::fmt::write::h6bf9cb4c4c47a54e
   2:        0x10513693c - std::io::Write::write_fmt::hd0a641e11c218eb6
   3:        0x105141618 - std::sys::backtrace::BacktraceLock::print::hf76ad77a8c09e37f
   4:        0x1051439d0 - std::panicking::default_hook::{{closure}}::h78ac8bb3d495a97f
   5:        0x105143808 - std::panicking::default_hook::h503bfd0313c845e1
   6:        0x105144320 - std::panicking::rust_panic_with_hook::hd3b0c8170d9b695a
   7:        0x105143ee8 - std::panicking::begin_panic_handler::{{closure}}::hdf3edd6896825dfd
   8:        0x105141c28 - std::sys::backtrace::__rust_end_short_backtrace::h82d56a36f86d1a7e
   9:        0x105143ba0 - _rust_begin_unwind
  10:        0x1051a7d08 - core::panicking::panic_fmt::ha93717efe1225db8
  11:        0x1051a80f8 - core::result::unwrap_failed::h1372364e47fb3337
  12:        0x104d8a770 - <quinn::endpoint::EndpointDriver as core::ops::drop::Drop>::drop::hd25c89a8ba52a62e
  13:        0x104d9ef64 - core::ptr::drop_in_place<quinn::endpoint::EndpointDriver>::h4f1ee359b046d2c7
  14:        0x104da0374 - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::he02d48720c04f9be
  15:        0x104da3a78 - tokio::runtime::task::core::Core<T,S>::poll::h8409a17260f02e21
  16:        0x104d9ac58 - tokio::runtime::task::harness::Harness<T,S>::poll::h08a7f8343a1281b8
  17:        0x1050ee8e0 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h3583143c49ce879a
  18:        0x1050eda68 - tokio::runtime::scheduler::multi_thread::worker::Context::run::h23aaf4c821426176
  19:        0x1050e0abc - tokio::runtime::context::runtime::enter_runtime::hbfd27983cd72a475
  20:        0x1050ed828 - tokio::runtime::scheduler::multi_thread::worker::run::h1b1befe57b05a227
  21:        0x1050d4e44 - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::h97f3aa66a0fc8f29
  22:        0x1050ad350 - tokio::runtime::task::core::Core<T,S>::poll::h10efa442f7a54dce
  23:        0x1050a63b0 - tokio::runtime::task::harness::Harness<T,S>::poll::hee9b81001259b38e
  24:        0x1050c16c8 - tokio::runtime::blocking::pool::Inner::run::h329147d67ff9681d
  25:        0x1050b014c - std::sys::backtrace::__rust_begin_short_backtrace::hcb0d7ef47e8282ee
  26:        0x1050b0b54 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h8b9619598a317b52
  27:        0x10514c41c - std::sys::pal::unix::thread::Thread::new::thread_start::he2247ebe94a346b7
  28:        0x102bca9ac - __pthread_start
thread 'tokio-runtime-worker' panicked at /Users/zhaojianyin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-0.11.6/src/endpoint.rs:709:50:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
stack backtrace:
   0:        0x105141764 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h8eb5a5821b46260c
   1:        0x105180c3c - core::fmt::write::h6bf9cb4c4c47a54e
   2:        0x10513693c - std::io::Write::write_fmt::hd0a641e11c218eb6
   3:        0x105141618 - std::sys::backtrace::BacktraceLock::print::hf76ad77a8c09e37f
   4:        0x1051439d0 - std::panicking::default_hook::{{closure}}::h78ac8bb3d495a97f
   5:        0x105143808 - std::panicking::default_hook::h503bfd0313c845e1
   6:        0x105144320 - std::panicking::rust_panic_with_hook::hd3b0c8170d9b695a
   7:        0x105143ee8 - std::panicking::begin_panic_handler::{{closure}}::hdf3edd6896825dfd
   8:        0x105141c28 - std::sys::backtrace::__rust_end_short_backtrace::h82d56a36f86d1a7e
   9:        0x105143ba0 - _rust_begin_unwind
  10:        0x1051a7d08 - core::panicking::panic_fmt::ha93717efe1225db8
  11:        0x1051a80f8 - core::result::unwrap_failed::h1372364e47fb3337
  12:        0x104d8bc84 - <quinn::endpoint::EndpointRef as core::ops::drop::Drop>::drop::h15ebd6db6b1c82a0
  13:        0x104d9eee4 - core::ptr::drop_in_place<quinn::endpoint::EndpointRef>::hc689b2034db35641
  14:        0x104d9efe0 - core::ptr::drop_in_place<quinn::endpoint::EndpointDriver>::h4f1ee359b046d2c7
  15:        0x104da0374 - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::he02d48720c04f9be
  16:        0x104da3a78 - tokio::runtime::task::core::Core<T,S>::poll::h8409a17260f02e21
  17:        0x104d9ac58 - tokio::runtime::task::harness::Harness<T,S>::poll::h08a7f8343a1281b8
  18:        0x1050ee8e0 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h3583143c49ce879a
  19:        0x1050eda68 - tokio::runtime::scheduler::multi_thread::worker::Context::run::h23aaf4c821426176
  20:        0x1050e0abc - tokio::runtime::context::runtime::enter_runtime::hbfd27983cd72a475
  21:        0x1050ed828 - tokio::runtime::scheduler::multi_thread::worker::run::h1b1befe57b05a227
  22:        0x1050d4e44 - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::h97f3aa66a0fc8f29
  23:        0x1050ad350 - tokio::runtime::task::core::Core<T,S>::poll::h10efa442f7a54dce
  24:        0x1050a63b0 - tokio::runtime::task::harness::Harness<T,S>::poll::hee9b81001259b38e
  25:        0x1050c16c8 - tokio::runtime::blocking::pool::Inner::run::h329147d67ff9681d
  26:        0x1050b014c - std::sys::backtrace::__rust_begin_short_backtrace::hcb0d7ef47e8282ee
  27:        0x1050b0b54 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h8b9619598a317b52
  28:        0x10514c41c - std::sys::pal::unix::thread::Thread::new::thread_start::he2247ebe94a346b7
  29:        0x102bca9ac - __pthread_start
thread 'tokio-runtime-worker' panicked at core/src/panicking.rs:231:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Member

djc commented Feb 16, 2025

Are you implying this is a regression in 0.5.10?

Author

JayZhao commented Feb 16, 2025

I'm not sure if it's a regression, but it did happen in 0.5.10 per this line:

thread 'tokio-runtime-worker' panicked at .cargo/registry/src/index.crates.io-6f17d22bba15001f/**quinn-udp-0.5.10/**src/cmsg/mod.rs:54:37

Member

djc commented Feb 16, 2025

Yes, I'm asking: did you run your VPN software (what software is this?) with earlier versions of quinn-udp without running into this crash?

Author

JayZhao commented Feb 16, 2025

This is indeed the first time I've encountered this crash. However, that isn't because earlier versions were crash-free and 0.5.10 suddenly introduced the issue. Rather, I only started using this library with version 0.5.10.

The crash happened right at the moment the VPN (still in development) was switched on. I managed to reproduce it several times, but it doesn't happen every time.

Author

JayZhao commented Feb 16, 2025

And in my toml:

quinn = { version = "0.11.6" }
quinn-proto = { version = "0.11.9" }
quinn-udp = { version = "0.5.10", features = ["fast-apple-datapath"] }

Member

djc commented Feb 16, 2025

cc @mxinden

Member

djc commented Feb 16, 2025

It would be useful to test with 0.5.9 (and perhaps earlier versions that have the fast-apple-datapath).

Collaborator

mxinden commented Feb 17, 2025

@JayZhao in addition to djc's ask above, can you also try to reproduce the issue with the following patch applied?

diff --git a/quinn-udp/src/unix.rs b/quinn-udp/src/unix.rs
index c39941d5..26fcce2d 100644
--- a/quinn-udp/src/unix.rs
+++ b/quinn-udp/src/unix.rs
@@ -543,7 +543,7 @@ fn recv(io: SockRef<'_>, bufs: &mut [IoSliceMut<'_>], meta: &mut [RecvMeta]) ->
     Ok(1)
 }
 
-const CMSG_LEN: usize = 88;
+const CMSG_LEN: usize = 176;
 
 fn prepare_msg(
     transmit: &Transmit<'_>,

Author

JayZhao commented Feb 17, 2025

@mxinden I failed to reproduce this crash even on 0.5.10 after several attempts, which is weird. I will try this patch too, thanks.

Collaborator

Ralith commented Feb 17, 2025

This panic is from writing a cmsg buffer, which in theory we should be able to both manually compute and empirically test a worst-case for.
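As a rough illustration of the manual computation mentioned above, here is a minimal sketch in plain Rust. It assumes Darwin's values (a 12-byte `cmsghdr` and 4-byte cmsg alignment) and assumes that the control messages in play are one ECN value plus one pktinfo structure, as the later comments in this thread suggest. The constant and function names are made up for this example and are not quinn-udp API.

```rust
// Assumed Darwin values; other platforms differ (Linux, for example,
// aligns control messages to the size of usize).
const CMSGHDR_LEN: usize = 12; // cmsg_len + cmsg_level + cmsg_type, 4 bytes each
const ALIGN: usize = 4;

// Assumed payload sizes for the control messages discussed in this thread.
const ECN: usize = 4; // int carrying the TOS / traffic class value
const IN_PKTINFO: usize = 12; // interface index + two IPv4 addresses
const IN6_PKTINFO: usize = 20; // IPv6 address + interface index

/// Round `len` up to the next multiple of ALIGN.
const fn cmsg_align(len: usize) -> usize {
    (len + ALIGN - 1) & !(ALIGN - 1)
}

/// Space one control message occupies: aligned header plus aligned payload.
const fn cmsg_space(payload: usize) -> usize {
    cmsg_align(CMSGHDR_LEN) + cmsg_align(payload)
}

/// Worst case under the assumption above: ECN plus the larger pktinfo,
/// since a datagram is either IPv4 or IPv6.
fn worst_case() -> usize {
    cmsg_space(ECN) + cmsg_space(IN6_PKTINFO.max(IN_PKTINFO))
}

fn main() {
    println!("ECN message: {} bytes", cmsg_space(ECN));
    println!("worst case:  {} bytes", worst_case());
}
```

Under these assumptions the worst case stays well below the 88-byte CMSG_LEN from the patch above, so an empirical test could assert the same bound against the real encoder.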

Author

JayZhao commented Feb 18, 2025

As a complete newbie to Rust, I was wondering if you might consider replacing expect() with if let Some pattern matching, so that the function returns when no buffer space remains instead of panicking. It seems to me that the library's role is similar to that of a system API, which typically favors returning errors, even in extreme cases, rather than panicking/crashing.
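To make the suggestion concrete, here is a minimal sketch of the two styles side by side. `Encoder`, `try_push`, `push` and `NoSpace` are hypothetical names invented for this example; they do not mirror the real quinn-udp encoder, which works on raw cmsg headers rather than a `Vec<u8>`.

```rust
/// Hypothetical error describing a full control buffer.
#[derive(Debug, PartialEq)]
struct NoSpace {
    needed: usize,
    remaining: usize,
}

/// Hypothetical fixed-capacity encoder standing in for a cmsg buffer.
struct Encoder {
    buf: Vec<u8>,
    capacity: usize,
}

impl Encoder {
    fn new(capacity: usize) -> Self {
        Self { buf: Vec::with_capacity(capacity), capacity }
    }

    /// Fallible variant: reports the shortfall to the caller.
    fn try_push(&mut self, payload: &[u8]) -> Result<(), NoSpace> {
        // buf.len() never exceeds capacity, so this cannot underflow.
        let remaining = self.capacity - self.buf.len();
        if payload.len() > remaining {
            return Err(NoSpace { needed: payload.len(), remaining });
        }
        self.buf.extend_from_slice(payload);
        Ok(())
    }

    /// Panicking variant: treats a full buffer as a programming error.
    fn push(&mut self, payload: &[u8]) {
        self.try_push(payload)
            .expect("no control buffer space remaining");
    }
}

fn main() {
    let mut enc = Encoder::new(16);
    enc.push(&[0u8; 16]); // fits exactly
    match enc.try_push(&[0u8; 4]) {
        Ok(()) => println!("pushed"),
        Err(e) => println!("recoverable error: {e:?}"),
    }
}
```

The trade-off is whether callers can do anything useful with the error; if the condition is deterministic for a given build, a panic surfaces it earlier.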

Collaborator

mxinden commented Feb 18, 2025

  • This should be a persistent error, i.e. it should always happen on consecutive calls. In other words, a user cannot act on the error, other than by not using quinn-udp at all.
  • I expect there to be an easy fix, e.g. increasing CMSG_LEN, once we get to the root of the issue.
  • I expect this to be a rare issue.

With the above in mind, I think a panic instead of returning an error is the way to go.

Author

JayZhao commented Mar 3, 2025

Hey, guys, with the help of AI and some luck I may have found the source of this crash:

Bug Report: Control Message Buffer Space Issue on macOS with apple_fast Feature

Summary

When using the apple_fast feature on macOS, quinn-udp is only able to add a single control message to each UDP message, despite having sufficient buffer space available. This leads to error messages like "No control buffer space remaining" and prevents important functionalities such as ECN marking and source address specification from working together.

Reproduction

This issue can be consistently reproduced on macOS when:

  1. The apple_fast feature is enabled
  2. The application attempts to add more than one control message to a UDP message
  3. Using the current implementation of MsgHdr for msghdr_x

Root Cause Analysis

The root cause is an API mismatch when handling control messages on macOS with apple_fast enabled:

  1. msghdr_x is a private Apple API structure used for sendmsg_x and recvmsg_x

  2. The current implementation casts this structure to the standard POSIX msghdr structure:

    let selfp = self as *const _ as *mut libc::msghdr;
    unsafe { libc::CMSG_NXTHDR(selfp, cmsg) }
  3. These structures have different memory layouts:

    • msghdr_x size: 56 bytes
    • msghdr size: 48 bytes
    • Difference: 8 bytes (due to the additional msg_datalen field in msghdr_x)
  4. This size discrepancy causes CMSG_NXTHDR to calculate the next control message position incorrectly, resulting in an invalid control message header with cmsg_len = 0.

  5. The current code interprets this as "no more space", while the actual issue is incompatible API usage.

Evidence

Logging the structure sizes confirms the mismatch:

[Structure Size Analysis] msghdr_x size: 56 bytes, standard msghdr size: 48 bytes, difference: 8 bytes

The error occurs when trying to get the next control message header after successfully adding the first one:

[Control Message-apple_fast] macOS returned an invalid control message header - cmsg_len value (0) is less than minimum requirement (12), returning null pointer

Only 16 bytes of the 88-byte buffer are used, confirming that this is not a true "out of space" issue:

[Control Message] Encoder destroyed - Total messages added: 1, buffer space used: 16 bytes

Impact and Limitations

  1. Functional Impact:

    • Applications requiring multiple control messages (e.g., both ECN and source address selection) cannot fully utilize apple_fast optimizations
    • The error message "No control buffer space remaining" is misleading
  2. Current Behavior:

    • Only the first control message can be successfully added
    • Subsequent control messages fail with an error message
    • The application continues to function, but without all intended control message functionality

Conclusion

This issue is not a bug in macOS, but a fundamental API incompatibility when mixing Apple's private msghdr_x structure with standard POSIX control message functions. When using apple_fast feature on macOS, there is an inherent limitation of one control message per UDP message due to the structural differences between the API types being used.

Author

JayZhao commented Mar 3, 2025

Also:

/*
 * Extended version for sendmsg_x() and recvmsg_x() calls
 * For recvmsg_x(), the size of the data received is given by the field
 * msg_datalen.
 * For sendmsg_x(), the size of the data to send is given by the length of
 * the iovec array -- like sendmsg(). The field msg_datalen is ignored.
 */
struct msghdr_x {
    void *__sized_by(msg_namelen) msg_name;       /* optional address */
    socklen_t msg_namelen;                        /* size of address */
    struct iovec *msg_iov;                        /* scatter/gather array */
    int msg_iovlen;                               /* # elements in msg_iov */
    void *__sized_by(msg_controllen) msg_control; /* ancillary data, see below */
    socklen_t msg_controllen;                     /* ancillary data buffer len */
    int msg_flags;                                /* flags on received message */
    size_t msg_datalen;                           /* byte length of buffer in msg_iov */
};

/*
 * The field "msg_flags" and "msg_datalen" must be set to zero on input.
 */
ssize_t sendmsg_x(int s, const struct msghdr_x *msgp, u_int cnt, int flags);

Collaborator

Ralith commented Mar 4, 2025

Hey, guys, with the help of AI and some luck I may have found the source of this crash:

Which parts of the remaining text can be trusted? Did you write and run the code to produce the evidence yourself?

If msghdr and msghdr_x do indeed have different layouts, then impl MsgHdr for crate::imp::msghdr_x is buggy and should be fixed (cc @mxinden who may be using this path in production). There's no reason you shouldn't be able to send/receive multiple messages.

Also:

I'm not sure what you're trying to illustrate here. Please use code blocks.

Collaborator

mxinden commented Mar 5, 2025

I am assuming you are referring to the code below:

let selfp = self as *const _ as *mut libc::msghdr;
let next = unsafe { libc::CMSG_NXTHDR(selfp, cmsg) };

As far as I can tell, the cast to msghdr_x is fine and matches the Apple sample code:

https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/tests/recvmsg_x_test.c#L138-L151

Thanks for the additional work @JayZhao. The use of AI for debugging is a great idea. That said, can you please try to confirm your hypothesis above without it?

Author

JayZhao commented Mar 5, 2025

I looked deeper into this later; the root of the crash is this:

When sending a packet using sendmsg_x, after the first control message (16 bytes) has been added and 88 - 16 = 72 bytes are still available, you always trigger this bug:
https://github.com/quinn-rs/quinn/blob/8d6e48c20b71f7e915c13c33b66e2a09b6d59888/quinn-udp/src/cmsg/unix.rs#L48C9-L55C10

"no control buffer space remaining" is then reported even though 72 bytes remain in the buffer, which causes the panic.

I don't know much about low-level networking, so I just disabled all control messages when using sendmsg_x, and it worked fine.
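The arithmetic above can be checked against a simplified, offset-based model of the bounds check that a CMSG_NXTHDR-style walk performs. This is only a sketch: it assumes Darwin's 12-byte `cmsghdr` and 4-byte alignment, works on offsets instead of raw pointers, and `next_header_offset` is a name made up for this example.

```rust
/// Assumed size of `struct cmsghdr` on Darwin (three 4-byte fields).
const CMSGHDR_LEN: usize = 12;

/// Round up to the assumed 4-byte cmsg alignment.
const fn align4(n: usize) -> usize {
    (n + 3) & !3
}

/// Given the offset and cmsg_len of the current header and the total control
/// buffer length, return the offset of the next header, or None if another
/// header would not fit.
fn next_header_offset(cur_off: usize, cur_len: usize, controllen: usize) -> Option<usize> {
    let next = cur_off + align4(cur_len);
    if next + align4(CMSGHDR_LEN) > controllen {
        None
    } else {
        Some(next)
    }
}

fn main() {
    // One 16-byte message at offset 0 in an 88-byte buffer: a next header fits.
    println!("{:?}", next_header_offset(0, 16, 88)); // Some(16)
    // The same message when the walk only sees 16 bytes of buffer: no room.
    println!("{:?}", next_header_offset(0, 16, 16)); // None
}
```

In this model a walk that sees the full 88 bytes finds room for a second header, so a null result after one 16-byte message would point at the inputs to the walk (the buffer length or the header fields it reads) rather than at the buffer size itself.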

Collaborator

Ralith commented Mar 6, 2025

As far as I can tell, the cast to msghdr_x is fine and matches the Apple sample code:

Not quite equivalent, since that's a macro that accesses fields by name, but AFAICT the types have the same layout up to those fields anyway, so yeah, it should be fine in practice.
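The layout claim can be checked mechanically. The sketch below uses two `#[repr(C)]` structs that mirror the Darwin field order quoted earlier in this thread; `MsgHdr` and `MsgHdrX` are stand-ins written for this example, not the `libc` or quinn-udp definitions, and the pointer fields are simplified to `*mut c_void`.

```rust
use std::ffi::c_void;
use std::mem::{offset_of, size_of};

/// Stand-in for Darwin's `struct msghdr` (assumed field order and types).
#[allow(dead_code)]
#[repr(C)]
struct MsgHdr {
    msg_name: *mut c_void,
    msg_namelen: u32,
    msg_iov: *mut c_void,
    msg_iovlen: i32,
    msg_control: *mut c_void,
    msg_controllen: u32,
    msg_flags: i32,
}

/// Stand-in for `struct msghdr_x`: same fields plus a trailing msg_datalen.
#[allow(dead_code)]
#[repr(C)]
struct MsgHdrX {
    msg_name: *mut c_void,
    msg_namelen: u32,
    msg_iov: *mut c_void,
    msg_iovlen: i32,
    msg_control: *mut c_void,
    msg_controllen: u32,
    msg_flags: i32,
    msg_datalen: usize,
}

fn main() {
    println!("sizes: {} vs {}", size_of::<MsgHdr>(), size_of::<MsgHdrX>());
    println!(
        "msg_control offsets: {} vs {}",
        offset_of!(MsgHdr, msg_control),
        offset_of!(MsgHdrX, msg_control)
    );
    println!(
        "msg_controllen offsets: {} vs {}",
        offset_of!(MsgHdr, msg_controllen),
        offset_of!(MsgHdrX, msg_controllen)
    );
}
```

With these definitions the two structs differ only in the trailing field, so the fields a cmsg walk reads (`msg_control` and `msg_controllen`) sit at the same offsets in both, even though the total sizes differ.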
