RFC: Simplified Mempool Design

Authors: Milos Zivkovic (@zivkovicmilos), Dragan Milosevic (@Milosevic02)
Date: May 7, 2025
Status: Draft


Disclaimer: This design assumes transaction sequences (nonces) can be extracted from a Tendermint transaction, which is not the case today.

Abstract

This RFC is a continuation of issue #1830, which briefly outlined a few problems with the current Tendermint2 mempool design. It expands on those problems here, with concrete examples of how they can be resolved.

The issues are listed in the following order:

  • user-facing issues, which immediately degrade UX
  • developer-facing issues, which hinder maintenance, performance, and stability

Out-of-order transaction inclusion

The current implementation of the mempool module does not consider account sequence ordering when returning “ready” transactions for block building, causing transactions to fail later, when executed as part of a block.

The mempool currently returns transactions ordered only by their appearance (FIFO), not by account sequence first. To make matters worse, the gas values by which transactions are included in a block are user-defined and can be skewed, as discussed later.

This leads to a scenario where the mempool can return transactions for block building that are not in the correct order of sending for an account. For example:

  1. TX 1 (nonce 0) sends a transfer to account X, initializing it in the process (the account was uninitialized)
  2. TX 2 (nonce 1), by account X, deploys a Realm
  3. These 2 transactions are prepared offline and sent to the node for processing, where TX 2 happens to arrive first
  4. When asked for ready transactions during block building, the mempool returns [TX2, TX1], since transactions are ordered by appearance
  5. TX 2 fails because the account is uninitialized
  6. TX 1 succeeds
  7. The user (dApp / client) now has to send TX2 again, to get the desired state

TX 2 (the Realm deploy, which also costs more gas) appeared first, so the mempool returns it first and it executes before TX 1, failing because the account is uninitialized. TX 1 (the cheaper balance transfer) appeared later, so it executes later. Arrival order alone decided the outcome.

Regardless of the block-inclusion algorithm (discussed below), the transactions for a block should be picked by their “executability” first: the ones whose nonce matches the next state nonce of the account. Otherwise, nonce-mismatched txs will always fail (we call this out-of-order inclusion) and require a resubmission by the user / dApp / client.

With this in mind, the only way to guarantee a path to valid transaction execution is to sort transactions for a block by their nonce first, and only then by some other parameter relevant to the TM2 application. Nonce-ordered transactions are at least potentially executable, since the nonce check won't fail.
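
As a rough illustration, nonce-first selection could look like the following sketch (the tx struct and maps are simplified stand-ins, not actual TM2 types); any fee or app-defined ordering is then applied across accounts, never within one:

import "sort"

type tx struct {
	nonce uint64
}

// executable returns, per sender, the contiguous run of txs starting at
// the sender's next expected nonce; txs after a nonce gap are skipped,
// since they are guaranteed to fail the sequence check.
func executable(nextNonce map[string]uint64, pool map[string][]tx) map[string][]tx {
	out := make(map[string][]tx)
	for sender, txs := range pool {
		// process each sender's txs in nonce order
		sort.Slice(txs, func(i, j int) bool { return txs[i].nonce < txs[j].nonce })
		want := nextNonce[sender]
		for _, t := range txs {
			if t.nonce != want {
				break // nonce gap: the remaining txs cannot execute yet
			}
			out[sender] = append(out[sender], t)
			want++
		}
	}
	return out
}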

Block building uses user-defined gas for transactions

PR #2838 introduced an EIP-1559-style fee model in which validator nodes (who propose blocks) can curate transactions based on their specified gas fee, or rather a “tip” meant to signal the sender's willingness to pay for valuable block space.

In a “functioning” EIP-1559 market (Ethereum, for example), the validator's block building process sorts its candidate transactions by effective fee (= base fee + tip), so that users who attach a higher tip (pay more for their tx) see their transactions committed to the chain earlier. That priority is the entire point of tipping, and of minimum transaction fee markets.

In practice, TM2 does not honor that contract. Although the proposer does deduct the declared fee and route it to the coinbase (as part of the AnteHandler), the selection algorithm that fills the block ignores the fee signal entirely. Instead, it takes transactions in order of appearance, counting their user-defined gas against the block limit until it is hit. This effectively means a small (by gas) high-tip payment can be displaced by a much larger (by gas) low-tip transaction, even though the former pays a higher effective fee per unit of gas. Users therefore receive none of the latency guarantees they believed they were buying.

	// ReapMaxBytesMaxGas reaps transactions from the mempool up to maxDataBytes
	// bytes total with the condition that the total gasWanted must be less than
	// maxGas.
	// If both maxes are negative, there is no cap on the size of all returned
	// transactions (~ all available transactions).
	ReapMaxBytesMaxGas(maxDataBytes, maxGas int64) types.Txs

The current implementation of the block building process does not consider the actual gas used by a transaction when trying to figure out if it can fit into a block. Instead, the block building process simply reads the (user defined) gas limit, and sees if the block has enough space to fit the transaction, even though the transaction actually uses less gas than what was specified.

This design eliminates the incentives EIP-1559 introduces. Users who attach higher tips receive no latency advantage, so rational senders will adapt: minimize the tip and overstate gas instead. The network might as well run on a static minimum-fee rule if all that matters is fitting into the block's gas limit.
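
For contrast, a fee-honoring proposer under EIP-1559 semantics might order candidates roughly like this (the field and function names are illustrative, not existing TM2 APIs):

import "sort"

type candidate struct {
	gasWanted uint64 // user-declared gas limit
	feeCap    uint64 // max total fee per gas the sender will pay
	tip       uint64 // priority fee per gas, on top of the base fee
}

// effectiveFeePerGas is what the sender actually pays per unit of gas:
// base fee plus tip, capped by the sender's fee cap.
func effectiveFeePerGas(c candidate, baseFee uint64) uint64 {
	fee := baseFee + c.tip
	if fee > c.feeCap {
		fee = c.feeCap
	}
	return fee
}

// orderByEffectiveFee considers higher-paying transactions for block
// space first, which is the priority users pay for.
func orderByEffectiveFee(cands []candidate, baseFee uint64) {
	sort.SliceStable(cands, func(i, j int) bool {
		return effectiveFeePerGas(cands[i], baseFee) > effectiveFeePerGas(cands[j], baseFee)
	})
}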

In practice, transaction block inclusion has proven to be an application-level choice, not a Tendermint Core one. Applications should be the ones to decide how their transactions are prioritized in blocks. This concept is already adopted and live in Cosmos (via CometBFT) in the form of mempool lanes.

No support for transaction “swapping”

Building off the previous point, the Tendermint2 mempool currently lacks the ability to “swap transactions” inside the mempool.

This is a QoL feature that allows users to “speed up” their transactions in case they initially specified a fee too low for timely block inclusion. Modern EVM chains let a sender “swap” a pending transaction for a new one with the same nonce but a higher effective gas price, which lets wallets offer the user the option to cancel or accelerate a transaction.

In this scenario, validators (block builders) simply keep the highest-priced version of the tx and discard the rest, so the user can un-stick a transaction or raise its priority without waiting for time-outs. It is worth noting this mechanism only makes sense if the previously discussed fee ordering is present.
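
A minimal sketch of such replace-by-fee admission logic (the types and the +10% bump threshold are assumptions for illustration, not existing TM2 rules):

import "errors"

type pooledTx struct {
	nonce uint64
	price uint64 // effective fee; illustrative
}

// replaceByFee keeps only the highest-priced version of a same-nonce tx,
// requiring a meaningful price bump so peers cannot churn the pool with
// negligible increments.
func replaceByFee(pending []pooledTx, incoming pooledTx) ([]pooledTx, error) {
	for i, existing := range pending {
		if existing.nonce != incoming.nonce {
			continue
		}
		// require at least a 10% bump over the existing price
		if incoming.price < existing.price+existing.price/10 {
			return pending, errors.New("replacement underpriced")
		}
		pending[i] = incoming
		return pending, nil
	}
	return append(pending, incoming), nil
}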

Mempool interface heavily leaks implementation details

The current mempool interface is defined as follows:

// Mempool defines the mempool interface.
//
// Updates to the mempool need to be synchronized with committing a block so
// apps can reset their transient state on Commit.
type Mempool interface {
	// CheckTx executes a new transaction against the application to determine
	// its validity and whether it should be added to the mempool.
	CheckTx(tx types.Tx, callback func(abci.Response)) error

	// CheckTxWithInfo performs the same operation as CheckTx, but with extra
	// meta data about the tx.
	// Currently this metadata is the peer who sent it, used to prevent the tx
	// from being gossiped back to them.
	CheckTxWithInfo(tx types.Tx, callback func(abci.Response), txInfo TxInfo) error

	// ReapMaxBytesMaxGas reaps transactions from the mempool up to maxDataBytes
	// bytes total with the condition that the total gasWanted must be less than
	// maxGas.
	// If both maxes are negative, there is no cap on the size of all returned
	// transactions (~ all available transactions).
	ReapMaxBytesMaxGas(maxDataBytes, maxGas int64) types.Txs

	// ReapMaxTxs reaps up to max transactions from the mempool.
	// If max is negative, there is no cap on the size of all returned
	// transactions (~ all available transactions).
	ReapMaxTxs(maxVal int) types.Txs

	// Lock locks the mempool. The consensus must be able to hold lock to safely update.
	Lock()

	// Unlock unlocks the mempool.
	Unlock()

	// Update informs the mempool that the given txs were committed and can be discarded.
	// NOTE: this should be called *after* block is committed by consensus.
	// NOTE: unsafe; Lock/Unlock must be managed by caller
	Update(blockHeight int64, blockTxs types.Txs, deliverTxResponses []abci.ResponseDeliverTx, newPreFn PreCheckFunc, maxTxBytes int64) error

	// FlushAppConn flushes the mempool connection to ensure async reqResCb calls are
	// done. E.g. from CheckTx.
	FlushAppConn() error

	// Flush removes all transactions from the mempool and cache
	Flush()

	// TxsAvailable returns a channel which fires once for every height,
	// and only when transactions are available in the mempool.
	// NOTE: the returned channel may be nil if EnableTxsAvailable was not called.
	TxsAvailable() <-chan struct{}

	// EnableTxsAvailable initializes the TxsAvailable channel, ensuring it will
	// trigger once every height when transactions are available.
	EnableTxsAvailable()

	// Size returns the number of transactions in the mempool.
	Size() int

	// TxsBytes returns the total size of all txs in the mempool.
	TxsBytes() int64

	// Maximum allowable tx size.
	MaxTxBytes() int64

	// InitWAL creates a directory for the WAL file and opens a file itself.
	InitWAL()

	// CloseWAL closes and discards the underlying WAL file.
	// Any further writes will not be relayed to disk.
	CloseWAL()
}

The Mempool interface is referenced from modules such as consensus, state, and RPC, but it heavily leaks implementation details and forces the mempool implementation to function in a specific way that is perhaps not optimal.

Locking / Unlocking

	// Lock locks the mempool. The consensus must be able to hold lock to safely update.
	Lock()

	// Unlock unlocks the mempool.
	Unlock()

The mempool should protect its own invariants, and not expose its atomicity to its callers (who now need to be aware of which operations on the mempool aren't atomic).

Methods such as CheckTxWithInfo lock the entire mempool internally, without expecting the caller to use Lock / Unlock:

func (mem *CListMempool) CheckTxWithInfo(tx types.Tx, cb func(abci.Response), txInfo TxInfo) (err error) {
	mem.mtx.Lock()
	// use defer to unlock mutex because application (*local client*) might panic
	defer mem.mtx.Unlock()

WAL management

	// InitWAL creates a directory for the WAL file and opens a file itself.
	InitWAL()

	// CloseWAL closes and discards the underlying WAL file.
	// Any further writes will not be relayed to disk.
	CloseWAL()

WAL initialization and closing are exposed to the caller / user of the mempool. In fact, this initialization is done once by the node, during OnStart / OnStop, so it does not belong in the public API for all mempool users.

Forced FIFO priority

	// ReapMaxBytesMaxGas reaps transactions from the mempool up to maxDataBytes
	// bytes total with the condition that the total gasWanted must be less than
	// maxGas.
	// If both maxes are negative, there is no cap on the size of all returned
	// transactions (~ all available transactions).
	ReapMaxBytesMaxGas(maxDataBytes, maxGas int64) types.Txs

The block building process calls ReapMaxBytesMaxGas to get the list of transactions to include in a new block. ReapMaxBytesMaxGas returns the transactions that fit into the gas range on a first-come-first-served basis:

	var totalBytes int64
	var totalGas int64
	// TODO: we will get a performance boost if we have a good estimate of avg
	// size per tx, and set the initial capacity based off of that.
	// txs := make([]types.Tx, 0, min(mem.txs.Len(), max/mem.avgTxSize))
	txs := make([]types.Tx, 0, mem.txs.Len())
	for e := mem.txs.Front(); e != nil; e = e.Next() {
		memTx := e.Value.(*mempoolTx)
		// Check total size requirement
		if maxDataBytes > -1 && totalBytes+int64(len(memTx.tx)) > maxDataBytes {
			return txs
		}
		totalBytes += int64(len(memTx.tx))
		// Check total gas requirement.
		// If maxGas is negative, skip this check.
		// Since newTotalGas < maxGas, which
		// must be non-negative, it follows that this won't overflow.
		newTotalGas := totalGas + memTx.gasWanted
		if maxGas > -1 && newTotalGas > maxGas {
			return txs
		}
		totalGas = newTotalGas
		txs = append(txs, memTx.tx)
	}
	return txs

Notice that inclusion is based on the user-defined gasWanted, not on empirical data (the gas actually consumed when executing and validating the transaction before block inclusion).

Flushing app-connection

The mempool exposes an app-conn flush API that the mempool caller can invoke:

// FlushAppConn flushes the mempool connection to ensure async reqResCb calls are
// done. E.g. from CheckTx.
FlushAppConn() error

Following the implementation flow of this method leads to a no-op:

func (app *localClient) FlushSync() error {
	return nil
}

The user of the mempool should not need to know how the mempool communicates with the application, but this API exposes that design decision.

Unnecessary subscription API

The mempool exposes a top-level subscription API:

	// TxsAvailable returns a channel which fires once for every height,
	// and only when transactions are available in the mempool.
	// NOTE: the returned channel may be nil if EnableTxsAvailable was not called.
	TxsAvailable() <-chan struct{}

	// EnableTxsAvailable initializes the TxsAvailable channel, ensuring it will
	// trigger once every height when transactions are available.
	EnableTxsAvailable()

TxsAvailable returns a shared channel, which means multiple callers (listeners) can starve each other of events. It is also coupled with EnableTxsAvailable, forcing the caller to be aware of the underlying implementation.
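
A small self-contained example of the starvation problem: each signal on a shared channel wakes exactly one receiver, so a second listener silently misses it:

import "fmt"

func main() {
	available := make(chan struct{}, 1)
	available <- struct{}{} // the mempool fires once for this height

	consume := func(name string) {
		select {
		case <-available:
			fmt.Println(name, "woke up")
		default:
			fmt.Println(name, "starved: signal already consumed")
		}
	}
	consume("consensus") // woke up
	consume("metrics")   // starved: signal already consumed
}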

Changing this segment of the interface is a larger change that also affects the consensus process (block building), and as such is left for discussion.

C-List performance bottlenecks

As mentioned before, the mempool implementation has funky locking mechanics that lock down the entire mempool, even for accounts whose transactions are not being pruned / updated. The larger the pool grows, the greater this lock contention becomes.

The mempool is implemented as a doubly linked list. The most common operations on the mempool are:

  • “right” transaction fetching (with criteria)
  • new transaction adding (either through gossip, or RPC)
  • updating (pruning specific txs) after a block commit

Perhaps the most disturbing fact is that following a block commit, TM2 locks the pool and rechecks every single tx before unlocking and allowing other parts of the system to interact with the mempool (e.g. the block building process waiting for the recheck to finish). In reality, this shouldn't be the case: only a subset of the pool's accounts / transactions is typically impacted by a block commit (and needs to be updated), rather than the entire pool being processed all over again.

It is worth noting that given the c-list mempool is FIFO, implementing custom ordering on it is not feasible, so fee-priority transactions (as well as in-order txs) cannot be honored.

The Cosmos community outlined more problems with the C-list implementation here, which is worth reading into. These issues are still happening in 2025.

The mempool currently present in CometBFT and in TM2 is flood-based, meaning transactions are gossiped out to peers regardless of their internal state. We discuss a DDoS protection mechanism below, in the proposal section, that aims to eliminate node strain even in a gossip-based pool. Future mempool implementations should look into Bullshark, and how a DAG-based consensus layer in the mempool unlocks throughput.

Motivation

Given all that was mentioned, the intention of moving away from the c-list mempool implementation is the following:

  • Allow proper transaction processing
    • Significantly boosts UX by requiring fewer retries, enabling batch sends.
  • Honor pricier transactions in block-space
    • The EIP-1559-style change TM2 adopted a while back was only part of the picture, and incomplete without giving block space to the user who paid for it.
  • Allow for faster block building, by reducing contention moments in the mempool and increasing throughput
    • This will have a direct effect on TPS and consensus stability, as the aim is to eliminate drag and contention in the mempool.
  • Enable scalable DDoS protection, per account, and generally in the entire mempool — despite the flooding mechanism

Approaches

There have been more than a few attempts to improve the mempool implementation in Cosmos, with each attempt introducing a new spin. The scope of these changes is outside the realm of this RFC, but we leave the resource links as reference for additional study:

Proposal

The proposal will cover the high-level architecture of a new mempool design, and focus on how it solves the outlined problems mentioned in the abstract.

High-level API

// Tx defines the transaction abstraction
type Tx interface {
	// Hash returns the transaction hash (unique)
	Hash() []byte

	// Sender returns the sender's address
	Sender() crypto.Address

	// Sequence returns the transaction's nonce
	Sequence() uint64

	// Price returns the price of the tx, used for prioritization.
	// TODO: this can be changed to something more general and app-controlled,
	// such as a callback that returns a priority. The mempool can ask the
	// AppConn for the priority, in which case this method can be dropped entirely
	Price() std.Coin

	// Size returns the size of the transaction in bytes
	Size() uint64
}

// Mempool defines the slot-based in-memory mempool abstraction
type Mempool interface {
	// AddTx adds a new transaction to the mempool.
	// The transaction is checked against local mempool state and the AppConn
	AddTx(tx Tx) error

	// Pending returns the next executable transactions from the mempool,
	// if any, grouped by the transaction sender (sorted by nonce).
	// Does not remove transactions from the mempool
	Pending() map[crypto.Address][]Tx

	// Update updates the mempool's internal state after a block commit,
	// pruning transactions and updating accounts
	Update(committed []Tx)

	// Content returns the full content of the mempool
	Content() []Tx

	// Flush removes all transactions from the mempool
	Flush()
}
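
A hypothetical caller driving this API across one block cycle might look as follows (error handling and cross-account ordering elided):

// buildAndCommit sketches one block cycle against the proposed API.
func buildAndCommit(pool Mempool, incoming []Tx) {
	for _, tx := range incoming {
		_ = pool.AddTx(tx) // checked against local state and the AppConn
	}

	// Non-destructive fetch: executable txs, grouped by sender and
	// sorted by nonce. Nothing is removed from the pool here.
	var block []Tx
	for _, txs := range pool.Pending() {
		block = append(block, txs...)
	}

	// ... execute the block, commit it ...

	// Realign the pool: prune committed / stale txs and promote queued
	// txs whose nonce gaps were filled.
	pool.Update(block)
}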

Pending vs. Queued Txs

There needs to be a distinction between “pending” and “queued” transactions in the mempool.

Pending transactions are transactions which are immediately “executable”, meaning they are next-in-order for the account to be executed (they match the next account sequence).

In contrast, queued transactions are transactions which cannot be executed currently, since there is a transaction with a lower sequence (nonce) value still missing from the mempool. Queued transactions can eventually become pending transactions if the nonce gap is filled; otherwise, they are “shelved” for later use and inclusion, depending on available memory capacity.

Examples:

Account A:

On‑chain sequence = 5

Tx1 {nonce:5} -> pending	(ready to execute)
Tx2 {nonce:6} -> pending	(contiguous)
Tx3 {nonce:8} -> queued		(gap: missing nonce 7)

If a tx with nonce 7 arrives, tx3 will be promoted to pending.

---

Account B:

On‑chain sequence = 12

Tx4 {nonce:14} -> queued	(waiting for nonce 13)

---

Account C:

On‑chain sequence = 0

Tx5 {nonce:0} -> pending	(ready to execute)
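
A sketch of the promotion step, assuming the account layout proposed in the atomicity section below (queued is kept sorted by nonce):

// promote moves queued txs to pending while they form a contiguous
// nonce run continuing from the existing pending txs.
func promote(acct *account) {
	next := acct.nonce + uint64(len(acct.pending)) // next contiguous nonce
	for len(acct.queued) > 0 && acct.queued[0].Sequence() == next {
		acct.pending = append(acct.pending, acct.queued[0])
		acct.queued = acct.queued[1:]
		next++
	}
}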

Block Building

The block building process is straightforward with the proposed new API.
Since the mempool supports fetching executable transactions through Pending, the block building process follows this flow:

flowchart TD
    A([Start the <br/>block building process])
    B["txs := mempool.Pending()"]
    C{"len(txs)" != 0?}
    D([Seal &<br/>broadcast the block])
    E["Take next<br/>tx from txs"]
    F{Fits in<br/>remaining gas?}
    G[[ideally apply tx<br/>↳ update block]]
    I{More txs<br/>in this list?}

    A --> B
    B --> C
    C -- No --> D
    C -- Yes  --> E
    E --> F
    F -- No --> D
    F -- Yes --> G --> I
    I -- Yes --> E         
    I -- No  --> D      

Notice that at no point does the block building process lock the entire pool to pick a transaction, meaning transactions can be coming in asynchronously into the pool.

Additionally, note that Pending does not remove the transactions from the pool, but just fetches executable transactions. The mempool is realigned when a block is committed and the caller invokes Update with the executed transactions.

Upon updating the mempool, all “stale” transactions (transactions whose nonce is now less than the account state nonce) are pruned, as well as all committed transactions. At this point, transactions for a specific account can also move from “queued” to “pending”.
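
The flowchart translates to roughly the following sketch, where gasOf is a caller-supplied estimator (ideally the measured gas from applying the tx, not the user-declared limit); note that the loop never locks the pool:

// fillBlock picks executable txs into a block until gas runs out.
func fillBlock(pool Mempool, maxGas uint64, gasOf func(Tx) uint64) []Tx {
	var block []Tx
	gasLeft := maxGas
	for _, txs := range pool.Pending() { // grouped by sender, sorted by nonce
		for _, tx := range txs {
			if gasOf(tx) > gasLeft {
				return block // doesn't fit in remaining gas: seal the block
			}
			gasLeft -= gasOf(tx) // ideally apply the tx and update the block
			block = append(block, tx)
		}
	}
	return block // no more txs: seal the block
}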

Pending() vs. Pop()

There is a nuance in the proposed Mempool API that's worth discussing.

The proposed API doesn't include a Pop() Tx method, but instead relies on the Pending() []Tx & Update([]Tx) combination (where Pending is non-destructive).

The reason for this choice is to avoid a situation where a transaction is popped from the pool during the block building process and cannot be re-added (because the pool's capacity filled up in the meantime, for example).

In most blockchain clients, a transaction is dropped from the mempool only when a block is committed to the canonical chain. The only exception seems to be the re-injection of transactions into the mempool from orphaned blocks -- a situation that doesn't exist for instant-finality chains like TM2.

DDoS protection

The mempool is entirely an in-memory structure. This means the space it occupies as part of the node process is limited, and should be tightly controlled.

We propose 3 memory limits for the mempool, so that memory is properly controlled and does not degrade node performance:

  • maximum pool capacity, in bytes — controls the overall memory footprint of the mempool, not allowing it to grow beyond a specific limit. This number depends on the forecasted usage of the node’s mempool, but it should be large enough to accommodate for uninterrupted throughput (considering the mempool is constantly filled and emptied through block building).
  • maximum number of transactions, per account — controls the total number of transactions a specific account (sender) can have in the mempool (which includes both pending and queued transactions). This number should be moderate, around 60.
  • maximum number of queued transactions, per account — controls the total number of queued transactions a specific account (sender) can have in the mempool. This number should usually be low, around 20.

The mempool should hold a “seen” cache / index, so it knows exactly which transactions to reject when it receives duplicates (from the RPC layer, or the p2p layer).
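
Putting the three limits and the “seen” cache together, admission could look like this sketch (the constants, the seen / totalBytes fields, and the accountFor helper are all illustrative assumptions; byte accounting on success is elided):

import "errors"

const (
	maxPoolBytes  = 64 << 20 // overall memory footprint cap (example value)
	maxAccountTxs = 60       // pending + queued, per sender
	maxQueuedTxs  = 20       // queued only, per sender
)

// admit rejects a tx that would breach any of the mempool limits.
func (mp *Mempool) admit(tx Tx) error {
	if mp.seen[string(tx.Hash())] {
		return errors.New("duplicate tx") // RPC / p2p duplicate
	}
	if mp.totalBytes+tx.Size() > maxPoolBytes {
		return errors.New("mempool full")
	}
	acct := mp.accountFor(tx.Sender()) // hypothetical lookup helper
	if len(acct.pending)+len(acct.queued) >= maxAccountTxs {
		return errors.New("account tx limit reached")
	}
	gapped := tx.Sequence() > acct.nonce+uint64(len(acct.pending))
	if gapped && len(acct.queued) >= maxQueuedTxs {
		return errors.New("account queued tx limit reached")
	}
	mp.seen[string(tx.Hash())] = true
	return nil
}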

Mempool atomicity

The proposed mempool moves away from the doubly linked list model and instead adopts a simple lookup table.

The mempool’s state can essentially be:

// account wraps a sender's mempool txs
type account struct {
	queued	[]Tx	// sorted by nonce
	pending	[]Tx	// sorted by nonce
	nonce	uint64	// latest state nonce. Updated during addition and updating the mempool (block commits)

	mux		sync.RWMutex
}

type Mempool struct {
	accounts sync.Map // crypto.Address -> account object
	
	// other required fields...
}

By having a separate lock per account, the mempool avoids contention when:

  • a new tx is asynchronously added
  • a tx is being replaced (see “swapping”, mentioned before)
  • a tx is being removed / mempool is being flushed
  • the mempool is being updated after a block commit
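
For illustration, AddTx over this layout could lock only the sender's bucket, leaving concurrent traffic for other accounts untouched (a sketch: queued sorting, promotion, and replacement / swapping handling are elided):

import "errors"

// AddTx locks only the sender's account bucket, not the whole pool.
func (mp *Mempool) AddTx(tx Tx) error {
	v, _ := mp.accounts.LoadOrStore(tx.Sender(), &account{})
	acct := v.(*account)

	acct.mux.Lock()
	defer acct.mux.Unlock()

	switch {
	case tx.Sequence() < acct.nonce:
		return errors.New("stale nonce") // already executed on-chain
	case tx.Sequence() == acct.nonce+uint64(len(acct.pending)):
		acct.pending = append(acct.pending, tx) // contiguous: pending
	default:
		acct.queued = append(acct.queued, tx) // nonce gap: queued
	}
	return nil
}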