Skip to content

substream/fix: Allow empty payloads with 0-length frame #395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 26, 2025

Conversation

lexnv
Copy link
Collaborator

@lexnv lexnv commented May 22, 2025

This PR ensures that zero-length frames are propagated to higher levels.

An issue was noted with the compatibility between litep2p and smoldot, which resulted in smoldot not being accepted by substrate-based chains.

The issue stems from the litep2p substream (high level) implementation, which did not handle zero-length frames.

Smoldot sends during the notification handshake a zero-length frame (ie, frame [0]).
However, full nodes (litep2p and libp2p) send the node role Role::Full (ie, frame [1, 1]).
Substrate nodes expected the node Role, however zero-length frames were never punished.

This behavior causes the zero-length frames to go unnoticed and further call into poll_read.
In the case of smoldot, the connection is terminated after the handshake timeout of 10 seconds.

Testing Done

Closes: smol-dot/smoldot#2128

Thanks @josepot and @tomaka for repro-case and details 🙏

lexnv added 3 commits May 22, 2025 16:09
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv self-assigned this May 22, 2025
@lexnv lexnv added the bug Something isn't working label May 22, 2025
@tomaka
Copy link

tomaka commented May 22, 2025

Smoldot sends during the notification handshake a zero-length frame (ie, frame [0]).
However, full nodes (litep2p and libp2p) send the node role Role::Full (ie, frame [1, 1]).

Ah, I see.
If I recall correctly, my intention as the designer of the protocol was that notifications have an empty handshake. In Substrate, however, at the time the code written there was no way to differentiate between transactions and GrandPa in the code that was sending the handshake, so I simply sent a Role all the time. I guess I overlooked this when writing smoldot.

I'm not really opinionated on this, for what it's worth, and I can also modify smoldot to send a Role if desired.

Copy link

@sistemd sistemd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

lexnv added a commit that referenced this pull request May 23, 2025
Squashed commit of the following:

commit afc0f0b
Author: Alexandru Vasile <alexandru.vasile@parity.io>
Date:   Thu May 22 16:17:47 2025 +0000

    tests: Ensure empty with nonempty frames can interleave

    Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

commit ec76929
Author: Alexandru Vasile <alexandru.vasile@parity.io>
Date:   Thu May 22 16:15:02 2025 +0000

    tests: Ensure empty handshakes are propagated

    Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

commit a717663
Author: Alexandru Vasile <alexandru.vasile@parity.io>
Date:   Thu May 22 16:09:20 2025 +0000

    substream/fix: Handle empty payloads with 0-length frames

    Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv
Copy link
Collaborator Author

lexnv commented May 26, 2025

Sorry for the late response here🙏

Indeed this was a miss on the litep2p side. Assuming it's a minor change, sending the role of the light-client (https://github.com/paritytech/polkadot-sdk/blob/6ac204d52beab2e5c22588519f639cc03781ec26/substrate/client/network/common/src/role.rs#L93) would still benefit Kusama, which uses litep2p entirely. We'll follow up with a release shortly, but I expect a window of 2-3 weeks until enough nodes update to the fix.

@lexnv lexnv merged commit db7cbb0 into master May 26, 2025
2 checks passed
@lexnv lexnv deleted the lexnv/fix-smoldot branch May 26, 2025 09:54
lexnv added a commit that referenced this pull request May 26, 2025
## [0.9.5] - 2025-05-26

This release primarily focuses on strengthening the stability of the
websocket transport. We've resolved an issue where higher-level
buffering was causing the Noise protocol to fail when decoding messages.

We've also significantly improved connectivity between litep2p and
Smoldot (the Substrate-based light client). Empty frames are now handled
correctly, preventing handshake timeouts and ensuring smoother
communication.

Finally, we've carried out several dependency updates to keep the
library current with the latest versions of its underlying components.

### Fixed

- substream/fix: Allow empty payloads with 0-length frame
([#395](#395))
- websocket: Fix connection stability on decrypt messages
([#393](#393))

### Changed

- crypto/noise: Show peerIDs that fail to decode
([#392](#392))
- cargo: Bump yamux to 0.13.5 and tokio to 1.45.0
([#396](#396))
- ci: Enforce and apply clippy rules
([#388](#388))
- build(deps): bump ring from 0.16.20 to 0.17.14
([#389](#389))
- Update hickory-resolver 0.24.2 -> 0.25.2
([#386](#386))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Jun 3, 2025
# Litep2p Becomes the Default Network Backend

This PR finalizes the [litep2p](https://github.com/paritytech/litep2p)
integration and makes it the default network backend for substrate-based
chains.

## Litep2p Improvements

After the stabilization, a forum post will follow with up to date
information and more accurate measurements of the live impact of
litep2p.

### CPU Usage Reduction

**Litep2p consumes roughly 2x less CPU than the libp2p alternative**.
This frees up resources for other usecases (subsystems) and enables
running nodes on more cost-efficient hardware.

This metric has been collected by the `networking::libp2p-node` metric
of a live Kusama validator. This represents the CPU time spent on
polling the networking task. Litep2p CPU consumption is on the left,
using roughtly 1.3x CPUs, while libp2p on the right uses roughly 2.9-3x
CPUs:

![Screenshot 2025-05-26 at 15 23
22](https://github.com/user-attachments/assets/17bf1ed8-b887-423e-b131-f0bbf146919e)


This metric has been collected by the NodeExporter of a live Kusama
validator. Litep2p CPU consumption is on the left, using roughtly 230
CPU units, while libp2p on the right uses roughly 350 CPU units. This
makes litep2p ~1.52 times more effiecient:

![Screenshot 2025-05-26 at 15 24
33](https://github.com/user-attachments/assets/8923cb56-241d-4e1d-9593-33c5def2ff4d)



### DHT Improvements and Authority Discovery

Litep2p is able to discover peers faster via the Kademlia protocol than
libp2p. This behavior manifests in faster discovery times for
validators. For context, libp2p discovers 1K DHT records (authority
records) in approximately 10 minutes, while litep2p discovers them in
just 2.5 minutes (for more info see
#7077 (comment)).

This will improve issues we've seen with libp2p that causes validators
to not receive rewards:
- #8548

### Stable Sync Peers

Litep2p presents a more stable peer count in comparison with the libp2p
backend. This ensures we can sync up faster than libp2p to the tip of
the chain. In an older experiment, litep2p syncs to the tip of the chain
in 526s, compared to 803s for libp2p. The stability of connections shows
improvements for other protocols as well:

![Screenshot 2025-05-26 at 15 01
59](https://github.com/user-attachments/assets/ac3607ba-a551-49e5-9a50-f5150a6b619f)

The previous image shows on the left the litep2p version and on the
right the libp2p version.


### Revert Kusama Enablement
This PR reverts #7866.
Litep2p is now enabled by default, we don't need to selectively enable
it on different chains.

### Litep2p 0.9.5

This release primarily focuses on strengthening the stability of the
websocket transport. We've resolved an issue where higher-level
buffering was causing the Noise protocol to fail when decoding messages.

We've also significantly improved connectivity between litep2p and
Smoldot (the Substrate-based light client). Empty frames are now handled
correctly, preventing handshake timeouts and ensuring smoother
communication.

Finally, we've carried out several dependency updates to keep the
library current with the latest versions of its underlying components.

Fixed:
- substream/fix: Allow empty payloads with 0-length frame
([#395](paritytech/litep2p#395))
- websocket: Fix connection stability on decrypt messages
([#393](paritytech/litep2p#393))

Changed:
- crypto/noise: Show peerIDs that fail to decode
([#392](paritytech/litep2p#392))
- cargo: Bump yamux to 0.13.5 and tokio to 1.45.0
([#396](paritytech/litep2p#396))
- ci: Enforce and apply clippy rules
([#388](paritytech/litep2p#388))
- build(deps): bump ring from 0.16.20 to 0.17.14
([#389](paritytech/litep2p#389))
- Update hickory-resolver 0.24.2 -> 0.25.2
([#386](paritytech/litep2p#386))


### Fix peerset reserve only mode

This has been move in PR:
#8650 for ease of
reviewing.
The PR rejects non-reserved peers in the reserved-only mode of the
litep2p notification peerset.

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
pgherveou pushed a commit to paritytech/polkadot-sdk that referenced this pull request Jun 11, 2025
# Litep2p Becomes the Default Network Backend

This PR finalizes the [litep2p](https://github.com/paritytech/litep2p)
integration and makes it the default network backend for substrate-based
chains.

## Litep2p Improvements

After the stabilization, a forum post will follow with up to date
information and more accurate measurements of the live impact of
litep2p.

### CPU Usage Reduction

**Litep2p consumes roughly 2x less CPU than the libp2p alternative**.
This frees up resources for other usecases (subsystems) and enables
running nodes on more cost-efficient hardware.

This metric has been collected by the `networking::libp2p-node` metric
of a live Kusama validator. This represents the CPU time spent on
polling the networking task. Litep2p CPU consumption is on the left,
using roughtly 1.3x CPUs, while libp2p on the right uses roughly 2.9-3x
CPUs:

![Screenshot 2025-05-26 at 15 23
22](https://github.com/user-attachments/assets/17bf1ed8-b887-423e-b131-f0bbf146919e)


This metric has been collected by the NodeExporter of a live Kusama
validator. Litep2p CPU consumption is on the left, using roughtly 230
CPU units, while libp2p on the right uses roughly 350 CPU units. This
makes litep2p ~1.52 times more effiecient:

![Screenshot 2025-05-26 at 15 24
33](https://github.com/user-attachments/assets/8923cb56-241d-4e1d-9593-33c5def2ff4d)



### DHT Improvements and Authority Discovery

Litep2p is able to discover peers faster via the Kademlia protocol than
libp2p. This behavior manifests in faster discovery times for
validators. For context, libp2p discovers 1K DHT records (authority
records) in approximately 10 minutes, while litep2p discovers them in
just 2.5 minutes (for more info see
#7077 (comment)).

This will improve issues we've seen with libp2p that causes validators
to not receive rewards:
- #8548

### Stable Sync Peers

Litep2p presents a more stable peer count in comparison with the libp2p
backend. This ensures we can sync up faster than libp2p to the tip of
the chain. In an older experiment, litep2p syncs to the tip of the chain
in 526s, compared to 803s for libp2p. The stability of connections shows
improvements for other protocols as well:

![Screenshot 2025-05-26 at 15 01
59](https://github.com/user-attachments/assets/ac3607ba-a551-49e5-9a50-f5150a6b619f)

The previous image shows on the left the litep2p version and on the
right the libp2p version.


### Revert Kusama Enablement
This PR reverts #7866.
Litep2p is now enabled by default, we don't need to selectively enable
it on different chains.

### Litep2p 0.9.5

This release primarily focuses on strengthening the stability of the
websocket transport. We've resolved an issue where higher-level
buffering was causing the Noise protocol to fail when decoding messages.

We've also significantly improved connectivity between litep2p and
Smoldot (the Substrate-based light client). Empty frames are now handled
correctly, preventing handshake timeouts and ensuring smoother
communication.

Finally, we've carried out several dependency updates to keep the
library current with the latest versions of its underlying components.

Fixed:
- substream/fix: Allow empty payloads with 0-length frame
([#395](paritytech/litep2p#395))
- websocket: Fix connection stability on decrypt messages
([#393](paritytech/litep2p#393))

Changed:
- crypto/noise: Show peerIDs that fail to decode
([#392](paritytech/litep2p#392))
- cargo: Bump yamux to 0.13.5 and tokio to 1.45.0
([#396](paritytech/litep2p#396))
- ci: Enforce and apply clippy rules
([#388](paritytech/litep2p#388))
- build(deps): bump ring from 0.16.20 to 0.17.14
([#389](paritytech/litep2p#389))
- Update hickory-resolver 0.24.2 -> 0.25.2
([#386](paritytech/litep2p#386))


### Fix peerset reserve only mode

This has been move in PR:
#8650 for ease of
reviewing.
The PR rejects non-reserved peers in the reserved-only mode of the
litep2p notification peerset.

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transactions broadcast via smoldot on Kusama are not being included
4 participants