Original Article Title: On the limits of encrypted mempools
Original Article Authors: Pranav Garimidi, Joseph Bonneau, Lioba Heimbach, a16z
Original Article Translation: Saoirse, Foresight News
In blockchains, maximum extractable value (MEV) is the most value that can be earned by deciding which transactions to include in a block, which to exclude, and how to order them. MEV is prevalent in most blockchains and has been widely discussed in the industry.
Note: This article assumes readers have a basic understanding of MEV. Some readers may wish to read our MEV explainer article first.
Many researchers observing MEV have asked a specific question: can cryptography solve the problem? One proposed solution is an encrypted mempool: users broadcast encrypted transactions that are decrypted and revealed only after ordering is complete. The consensus protocol must therefore choose the transaction order "blindly," which seemingly prevents MEV extraction during the ordering phase.
Unfortunately, for both practical and theoretical reasons, encrypted mempools cannot provide a universal solution to the MEV problem. This article outlines the challenges and explores feasible design directions for encrypted mempools.
There have been many proposals regarding encrypted mempools, but the general framework is as follows:
1. Users broadcast encrypted transactions.
2. Encrypted transactions are submitted to the chain (in some proposals, transactions need to undergo a verifiable random shuffling first).
3. When the block containing these transactions is eventually confirmed, the transactions are decrypted.
4. Finally, the transactions are executed.
Step 3 (transaction decryption) raises a key issue: who is responsible for decryption, and what happens if decryption never completes? A simple idea is to let users decrypt their own transactions (in which case encryption is not even necessary; a hiding commitment suffices). However, this approach has a vulnerability: attackers may engage in speculative MEV.
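The self-decryption idea above can be sketched with a simple hash commitment. This is a minimal illustration, not a scheme from any specific proposal: the function names `commit` and `verify_opening` and the choice of SHA-256 with a 32-byte random nonce are assumptions made for the example.

```python
import hashlib
import secrets

NONCE_LEN = 32  # assumed nonce size; the random nonce is what makes the commitment hiding


def commit(tx: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce). Only the commitment is broadcast and ordered."""
    nonce = secrets.token_bytes(NONCE_LEN)
    commitment = hashlib.sha256(nonce + tx).digest()
    return commitment, nonce


def verify_opening(commitment: bytes, tx: bytes, nonce: bytes) -> bool:
    """Check that (tx, nonce) opens a previously ordered commitment."""
    expected = hashlib.sha256(nonce + tx).digest()
    return secrets.compare_digest(commitment, expected)


# The user commits, waits for ordering, then reveals (tx, nonce).
commitment, nonce = commit(b"swap 10 ETH for DAI")
print(verify_opening(commitment, b"swap 10 ETH for DAI", nonce))
```

Nothing in this sketch forces the user to reveal the opening after ordering, which is exactly the gap that speculative MEV exploits.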
In speculative MEV, an attacker speculates that a particular encrypted transaction contains an MEV opportunity, then encrypts their own transaction and attempts to insert it into a favorable position (such as before or after the target transaction). If the transactions are ordered as expected, the attacker will decrypt and extract MEV through their own transaction; if not, they will refuse to decrypt, and their transaction will not be included in the final blockchain.
One possible approach is to penalize users who fail to decrypt, but such a mechanism is extremely hard to implement. The penalty must be the same for all encrypted transactions (since transactions are indistinguishable once encrypted), and it must be severe enough to deter speculative MEV even against high-value targets. This would lock up a large amount of capital, and that capital would need to remain anonymous to avoid revealing the link between transactions and users. Furthermore, honest users who are unable to decrypt their transactions because of software bugs or network failures would also suffer losses.
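To see why the penalty has to be large, consider a deliberately simplified model (an assumption for illustration, not part of the original analysis): the attacker's transaction lands in a favorable position with probability `p_success` and then earns `mev_value`; otherwise the attacker refuses to decrypt and forfeits `penalty`.

```python
def speculative_ev(p_success: float, mev_value: float, penalty: float) -> float:
    """Expected profit of one speculative attempt under the simplified model."""
    return p_success * mev_value - (1.0 - p_success) * penalty


def min_deterrent_penalty(p_success: float, max_mev_value: float) -> float:
    """Smallest penalty that makes the expected profit non-positive."""
    if not 0.0 <= p_success < 1.0:
        raise ValueError("p_success must be in [0, 1)")
    return p_success * max_mev_value / (1.0 - p_success)


# Hypothetical numbers: a coin-flip ordering against a target worth 100 units.
print(min_deterrent_penalty(0.5, 100.0))
```

Because every encrypted transaction looks the same, the bond must be sized for the most valuable plausible target, so in this model it grows linearly with that value and without bound as the attacker's success probability approaches one.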
Therefore, most solutions suggest that when encrypting transactions, it must be ensured that they can be decrypted at some point in the future, even if the initiating user is offline or uncooperative. This goal can be achieved through the following methods:
Trusted Execution Environments (TEEs): Users can encrypt transactions to a key held inside the secure enclave of a Trusted Execution Environment (TEE). In basic versions, the TEE is used only to decrypt transactions after a specific time (which requires the TEE to have a notion of time). More complex schemes have the TEE decrypt transactions and build the block, ordering transactions by arrival time, fees, and other criteria. The advantage of TEEs over other encrypted mempool solutions is that they can operate directly on plaintext transactions, which reduces on-chain clutter by filtering out transactions that would revert. However, this method depends on trusting the hardware.
Secret-sharing and threshold encryption: In this scheme, users encrypt transactions to a key held by a specific committee (usually a subset of validators) collectively. Decryption requires meeting a certain threshold condition (e.g., two-thirds of the committee members agreeing).
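The threshold condition can be illustrated with Shamir secret sharing, a standard building block for such committees. The sketch below is an assumption-laden toy: the field prime, the function names, and the idea of reconstructing the key in one place are chosen for brevity. Practical threshold encryption schemes typically have members publish decryption shares so that the key itself is never reassembled.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used here as the field modulus


def split_secret(secret: int, threshold: int, n_shares: int) -> list[tuple[int, int]]:
    """Split `secret` so that any `threshold` of the `n_shares` shares recover it."""
    if not 1 <= threshold <= n_shares:
        raise ValueError("need 1 <= threshold <= n_shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree threshold-1 with constant term = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Six committee members, any four of whom can recover the key.
shares = split_secret(123456789, threshold=4, n_shares=6)
print(recover_secret(shares[:4]))
```

Fewer than `threshold` shares reveal nothing about the secret, which is why premature decryption requires collusion among that many members.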
With threshold decryption, the trusted entity shifts from hardware to the committee. Proponents argue that since most protocols implicitly assume validators possess the "honest majority" property in their consensus mechanism, a similar assumption can be made that the majority of validators will remain honest and not decrypt transactions prematurely.
However, it is important to note a key distinction here: these two trust assumptions are not the same concept. Forks in blockchain and other consensus failures have public visibility (a form of "weak trust assumption"), while malicious committees privately decrypting transactions prematurely leave no public evidence, making such attacks undetectable and unpunishable (a form of "strong trust assumption"). Therefore, although on the surface the security assumptions of consensus mechanisms and encrypted committees may seem aligned, the credibility of the assumption that "committees will not collude" is much lower in practice.
Time-Lock and Delay Encryption: As an alternative to threshold encryption, delay encryption works as follows: a user encrypts a transaction to a public key whose corresponding private key is hidden inside a time-lock puzzle. A time-lock puzzle is a cryptographic puzzle that seals a secret so that it can be revealed only after a preset amount of time; specifically, opening it requires repeatedly performing a computation that cannot be parallelized. Anyone can solve the puzzle to obtain the key and decrypt the transaction, but only after completing a slow, inherently sequential computation designed to take long enough that the transaction cannot be decrypted before it is finalized. The strongest form of this primitive, delay encryption, generates such puzzles publicly; alternatively, a trusted committee can approximate it using time-lock encryption, although at that point its advantage over threshold encryption is debatable.
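A classic way to build a time-lock puzzle is repeated modular squaring, in the style of Rivest, Shamir, and Wagner. The sketch below is only a toy under stated assumptions: the modulus is built from two small Mersenne primes (far too small for real use), the puzzle hides a symmetric key of up to 32 bytes rather than a private key, and `make_puzzle` and `solve_puzzle` are hypothetical names.

```python
import hashlib

# Toy factors (both Mersenne primes). A real puzzle needs a large modulus
# whose factorization is unknown to the solver.
P, Q = 2**61 - 1, 2**89 - 1
BASE = 2


def _mask(value: int, data: bytes) -> bytes:
    """XOR `data` with a hash of `value` (a one-time pad derived from the puzzle output)."""
    pad = hashlib.sha256(value.to_bytes(32, "big")).digest()
    return bytes(a ^ b for a, b in zip(data, pad))


def make_puzzle(key: bytes, t: int) -> tuple[int, int, bytes]:
    """Lock `key` behind t sequential squarings; fast because the creator knows phi(n)."""
    if len(key) > 32:
        raise ValueError("key must be at most 32 bytes")
    n, phi = P * Q, (P - 1) * (Q - 1)
    exponent = pow(2, t, phi)          # shortcut available only with the factors
    unlocked = pow(BASE, exponent, n)  # equals BASE^(2^t) mod n
    return n, t, _mask(unlocked, key)


def solve_puzzle(n: int, t: int, locked: bytes) -> bytes:
    """Recover the key the slow way: t squarings, each depending on the previous one."""
    value = BASE
    for _ in range(t):
        value = value * value % n
    return _mask(value, locked)


n, t, locked = make_puzzle(b"k" * 32, t=10_000)
print(solve_puzzle(n, t, locked) == b"k" * 32)
```

The delay is set by the number of squarings `t`, so the wall-clock unlock time depends on how fast the solver's hardware performs each squaring.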
Whether using delay encryption or having a trusted committee perform computations, such schemes face various practical challenges: firstly, as delay inherently depends on a computation process, ensuring the accuracy of decryption time is difficult; secondly, these schemes rely on specific entities running high-performance hardware to efficiently solve puzzles, and while anyone can take on this role, how to incentivize that entity to participate remains unclear; finally, in such designs, all broadcasted transactions will be decrypted, including those transactions that were never finally written into a block. In contrast, threshold-based (or witness) encryption schemes may only decrypt transactions that have been successfully included.
Witness Encryption: The most advanced cryptographic approach is "witness encryption." In theory, witness encryption works as follows: information is encrypted so that it can be decrypted only by someone who knows a "witness" for a specific NP relation. For example, information can be encrypted so that only someone who can solve a certain Sudoku puzzle, or who can provide a preimage of a specific hash value, can decrypt it.
(Note: An NP relation is a correspondence between a "problem" and an "answer that can be quickly verified.")
SNARKs exist for any NP relation. In essence, witness encryption encrypts data so that only an entity able to prove, via a SNARK, that specific conditions are satisfied can decrypt it. In the encrypted mempool setting, a typical example of such a condition is that transactions can be decrypted only after the block is finalized.
This is a highly promising theoretical primitive. In fact, it is a generalization of which committee-based and delay-based approaches are merely special cases. Unfortunately, we currently have no practically deployable witness encryption scheme. Furthermore, even if such a scheme existed, it is hard to say it would be better than committee-based approaches on a proof-of-stake chain. Even if witness encryption is configured so that a transaction can be decrypted only once it has been ordered in a finalized block, a malicious committee can still privately simulate the consensus protocol to fake the transaction's finality and then use this private chain as the "witness" to decrypt it. At that point, threshold decryption by the same committee achieves equivalent security and is far simpler.
However, in a Proof of Work consensus protocol, the benefits of witness encryption are more significant. Even if the committee is entirely malicious, it cannot privately mine multiple new blocks on top of the current blockchain head to forge the final settlement state.
Several practical challenges limit the ability of encrypted mempools to prevent MEV. In general, keeping information confidential is hard. Notably, encryption has not been widely used in Web3 systems, but decades of experience deploying encryption on the web (such as TLS/HTTPS) and in private communication (from PGP to modern encrypted messaging platforms like Signal and WhatsApp) have fully exposed the difficulties involved: encryption is a tool for protecting confidentiality, but it cannot guarantee it.
First, some entities may have direct access to the plaintext of users' transactions. Typically, users do not encrypt transactions themselves but delegate this task to their wallet provider. The wallet provider can therefore see the transaction plaintext and may even use or sell this information to extract MEV. The security of encryption always depends on every entity that can access the keys; the set of parties that control the key defines the security boundary.
Moreover, the biggest issue lies in metadata, the unencrypted data surrounding the encrypted payload (transaction). Searchers can use this metadata to infer transaction intent and then engage in speculative MEV. It is important to note that searchers do not need to fully understand the transaction content or guess correctly every time. For example, as long as they can reasonably determine that a transaction is a buy order from a specific decentralized exchange (DEX), it is enough to launch an attack.
We can categorize metadata into several types: some are classic problems inherent to encryption, while others are specific to encrypted mempools.
· Transaction Size: Encryption by itself does not hide the size of the plaintext (notably, the formal definition of semantic security explicitly excludes hiding plaintext size). This is a common attack vector in encrypted communication; a typical example is that, even with encryption, eavesdroppers can determine in real time what is being played on Netflix by analyzing the size of each packet in the video stream. In encrypted mempools, specific types of transactions may have distinctive sizes and thereby leak information.
· Broadcast Time: Encryption also cannot conceal timing information (another classic attack vector). In the Web3 setting, some senders (for example, in a structured sell-off) may send transactions at fixed intervals. Transaction timing may also be correlated with other information, such as activity on external exchanges or news events. A subtler use of timing information is arbitrage between centralized exchanges (CEXs) and decentralized exchanges (DEXs): a sequencer can exploit the latest CEX price by inserting a transaction created as late as possible, while excluding all other transactions (even encrypted ones) broadcast after a certain point in time, ensuring that its own transaction alone benefits from the latest price.
· Source IP Address: Observers can infer the transaction sender's identity by monitoring the peer-to-peer network and tracking the source IP address. This issue was discovered in the early days of Bitcoin (over a decade ago). If a specific sender has a predictable behavior pattern, this is highly valuable to observers. For example, knowing the sender's identity allows linking encrypted transactions to previously decrypted transactions.
· Transaction Sender and Fee / Gas Information: Transaction fees are a type of metadata specific to encrypted mempools. In Ethereum, a standard transaction includes the on-chain sender address (used to pay fees), the maximum gas limit, and the gas price the sender is willing to pay per unit. Like the source network address, the sender address can be used to link multiple transactions to each other and to real-world entities; the gas limit can hint at the transaction's intent. For example, interacting with a specific DEX may require a recognizable fixed amount of gas.
Sophisticated observers may combine multiple metadata types mentioned above to predict transaction content.
In theory, this information can all be concealed, but at the cost of performance and complexity. For instance, padding a transaction to a standard length can hide size but wastes bandwidth and on-chain space; adding pre-send delay can hide timing but increases latency; submitting transactions via anonymous networks like Tor can hide IP addresses but introduces new challenges.
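The size-hiding trade-off can be made concrete with a small padding sketch. The bucket sizes, the 4-byte length prefix, and the function names are assumptions for illustration; the padding would be applied to the plaintext before encryption so that all ciphertexts in a bucket have the same length.

```python
BUCKET_SIZES = (256, 1024, 4096)  # hypothetical standard sizes, in bytes
LENGTH_PREFIX = 4                 # bytes used to record the true payload length


def pad_to_bucket(payload: bytes) -> bytes:
    """Prefix the true length, then zero-pad up to the smallest bucket that fits."""
    framed = len(payload).to_bytes(LENGTH_PREFIX, "big") + payload
    for size in BUCKET_SIZES:
        if len(framed) <= size:
            return framed + bytes(size - len(framed))
    raise ValueError("payload too large for the largest bucket")


def unpad(padded: bytes) -> bytes:
    """Recover the original payload from a padded frame."""
    length = int.from_bytes(padded[:LENGTH_PREFIX], "big")
    if length > len(padded) - LENGTH_PREFIX:
        raise ValueError("corrupt length prefix")
    return padded[LENGTH_PREFIX:LENGTH_PREFIX + length]


tx = b"transfer 5 tokens to alice"
padded = pad_to_bucket(tx)
print(len(tx), len(padded), unpad(padded) == tx)
```

In this example a 26-byte payload occupies 256 bytes, which shows the bandwidth cost directly: coarser buckets leak less about size but waste more space.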
The hardest metadata to hide is transaction fee information. Encrypting fee data creates a series of problems for block builders. The first is spam: if fee data is encrypted, anyone can broadcast malformed encrypted transactions that get included in the ordering but cannot pay fees and cannot be executed once decrypted, and no one can be held accountable. This might be addressed with SNARKs that prove a transaction is well formed and sufficiently funded, but only at significant overhead.
The second is the efficiency of block building and fee auctions. Builders rely on fee information to construct profit-maximizing blocks and to determine the current market price of on-chain resources. Encrypting fee data disrupts this process. One solution is to set a fixed fee for each block, but this is economically inefficient and may foster secondary markets for transaction inclusion, defeating the purpose of the encrypted mempool. Another approach is to run the fee auction inside secure multi-party computation or trusted hardware, both of which are extremely costly.
Lastly, a secure encrypted mempool increases system overhead in several ways: encryption adds latency, computation, and bandwidth consumption to the chain; it is currently unclear how it would integrate with important future goals such as sharding or parallel execution; it may introduce new points of failure for liveness (such as the decryption committee in threshold schemes or the delay function solver); and both design and implementation complexity increase significantly.
Many of the challenges facing the encrypted mempool are similar to those faced by privacy-focused blockchains (such as Zcash and Monero). If there is any silver lining, it is this: solving all the challenges of encryption technology in MEV mitigation will also help clear obstacles for transaction privacy.
Lastly, the encrypted mempool also faces economic challenges. Unlike the technical challenges, which can be gradually mitigated through sufficient engineering investment, these economic challenges are fundamental constraints that are extremely difficult to address.
The core issue of MEV stems from the information asymmetry between transaction initiators (users) and MEV opportunity seekers (searchers and block builders). Users are usually unaware of how much extractable value is embedded in their transactions, so even with a perfect encrypted mempool, they may still be induced to reveal decryption keys in exchange for a reward lower than the actual MEV value, a phenomenon known as "incentivized decryption."
This scenario is not hard to imagine, as similar mechanisms such as MEV-Share already exist. MEV-Share is an order flow auction that allows users to selectively submit transaction information to a pool, where searchers compete for the right to exploit the transaction's MEV opportunity. The winning bidder, after extracting the MEV, returns a portion of the proceeds (either the bid amount or a certain percentage) to the user.
This model can be directly adapted to the encrypted mempool: users need to disclose decryption keys (or some information) to participate. However, most users are unaware of the opportunity cost of participating in such a mechanism; they only see the immediate return and are willing to disclose information. Similar cases exist in traditional finance, such as the zero-commission trading platform Robinhood, whose profit model relies on "payment-for-order-flow," selling user order flow to third parties.
Another possible scenario is that large builders force users to disclose transaction content (or information about it) on censorship grounds. Censorship resistance is an important and contentious topic in the Web3 space, but if large validators or builders are legally obliged (for example, under regulations of the US Office of Foreign Assets Control, OFAC) to enforce a censorship list, they may refuse to process any encrypted transaction. Technically, users may be able to prove in zero knowledge that their encrypted transactions comply with the censorship requirements, but this would add cost and complexity. Even if the blockchain has strong censorship resistance (ensuring that encrypted transactions are eventually included), builders may still place plaintext transactions at the front of the block and encrypted transactions at the end. Users who need guaranteed execution priority may therefore ultimately be forced to disclose content to builders.
The encrypted mempool adds system overhead in several obvious ways. Users must encrypt transactions, and the system must somehow decrypt them, which increases computational cost and may enlarge transactions. As mentioned earlier, handling metadata further increases these costs. However, there are also efficiency costs that are less obvious. In finance, a market is considered efficient when prices reflect all available information; delays and information asymmetry make markets inefficient. This is precisely the inevitable result of the encrypted mempool.
One direct consequence of this inefficiency is greater price uncertainty, a byproduct of the additional delay introduced by the encrypted mempool. As a result, more transactions may fail because the price moved beyond their slippage tolerance, wasting on-chain space.
Likewise, this price uncertainty may give rise to speculative MEV transactions that attempt to profit from on-chain arbitrage. Notably, the encrypted mempool may make such opportunities more common: because execution is delayed, the current state of decentralized exchanges (DEXs) becomes more uncertain, which likely reduces market efficiency and creates price discrepancies across trading venues. These speculative MEV transactions also waste block space, since they often abort when they find no arbitrage opportunity.
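The failure mode described above can be sketched with a fee-free constant-product pool. This is an illustrative assumption: the article does not name a specific DEX design, and the reserve numbers and function names below are hypothetical. A swap is quoted against the pool state the user sees, but it executes later against whatever state exists after ordering and decryption.

```python
def swap_output(x_reserve: float, y_reserve: float, dx: float) -> float:
    """Amount of Y received for dx of X in a fee-free constant-product pool."""
    return y_reserve * dx / (x_reserve + dx)


def apply_swap(x_reserve: float, y_reserve: float, dx: float) -> tuple[float, float]:
    """Return the pool reserves after swapping dx of X for Y."""
    dy = swap_output(x_reserve, y_reserve, dx)
    return x_reserve + dx, y_reserve - dy


def executes(quoted_out: float, actual_out: float, tolerance: float) -> bool:
    """A swap reverts if it would return less than (1 - tolerance) of the quote."""
    return actual_out >= quoted_out * (1.0 - tolerance)


# The user quotes a 10-unit swap against the pool state visible at signing time.
quoted = swap_output(1000.0, 1000.0, 10.0)

# While the transaction waits to be decrypted, another 50-unit swap executes first.
x_after, y_after = apply_swap(1000.0, 1000.0, 50.0)
actual = swap_output(x_after, y_after, 10.0)

print(executes(quoted, actual, tolerance=0.01))
```

In this hypothetical run the later execution returns roughly 9% less than quoted, so a 1% tolerance causes a revert; the longer the delay before execution, the more intervening trades can move the price this way.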
The purpose of this article is to outline the challenges facing the encrypted mempool so that people can focus their efforts on developing other solutions, although the encrypted mempool may still be one part of MEV mitigation.
One feasible approach is a hybrid design: some transactions go through the encrypted mempool for "blind ordering," while others use a different ordering scheme. For certain types of transactions (for example, buy or sell orders from large market participants who are able to carefully encrypt and pad their transactions and are willing to pay a higher cost to avoid MEV), a hybrid design may be a suitable choice. Such a design also makes practical sense for highly sensitive transactions (such as fixes for vulnerable smart contracts).
However, due to technical limitations, high engineering complexity, and performance costs, the encrypted mempool is unlikely to become the anticipated "universal MEV solution." The community needs to develop other solutions, including MEV auctions, application-layer defense mechanisms, and shortening final confirmation times, among others. MEV will remain a challenge in the near future, and a balance of various solutions needs to be found through in-depth research to counter its negative impacts.