RSCH-1005
Smart Consistency
Omer Mishael, KagemniKarimu
Date Compiled: 1/25/24
Overview
Lava implements Smart Consistency mechanisms designed to ensure that neither mempool inconsistency nor block inconsistency negatively affect the Quality of Service available from providers on the network. Smart consistency is a novel solution for mitigating the impact of differences in latest block and nonce state across nodes that does not rely on commonly used forms of stickiness which are ubiquitous in web3.
Smart Consistency solves both block inconsistency and mempool inconsistency by ensuring data freshness without loss of Provider diversity.
Rationale
A blockchain is a distributed storage system. As such, it lives under the theoretical constraints of the CAP theorem. The CAP Theorem, also known as Brewer's Theorem, posits that any distributed storage system can only provide two of the following three guarantees: consistency, availability, or partition tolerance. Importantly, partition tolerance is defined as the degree of system continuity despite dropped or delayed messages between nodes.
In the event of a network partition, requests must either be dropped (i.e. unavailability) or stale data must be presented (i.e. inconsistency). At least one trade-off must be made to maintain network continuity. Even so, both inconsistency and unavailability are highly undesirable traits for blockchains - therefore, the trade-off must be made as intelligently as possible.
Across web3 data access layers, this partitioning issue is most commonly solved through stickiness. Stickiness is the tendency for a data consumer to receive relay responses from the same node. The drawback is that stickiness, as most often implemented, enforces consistency at the expense of partition tolerance. It is a less than ideal trade-off.
Lava protocol avoids stickiness by using a newly developed paradigm called Smart Consistency. Smart Consistency mitigates blockchain inconsistencies by utilizing accountability measures, message parsing, and metadata analysis on a relay-by-relay basis to predict which providers are most likely to provide the appropriate response. The end result is that a given consumer's usage of paired Providers can be inclusive of more providers without stickiness.
In a blockchain, there are two types of inconsistency to deal with:
- Block Inconsistency - Block inconsistency refers to the situation where multiple nodes have reached different states in the blockchain. The situation can result from network delays, soft forks, software issues, or malicious activities. Working across multiple nodes that may be synced to different blocks on the blockchain will produce inconsistent or unavailable answers (imaged below).
- Mempool Inconsistency - Mempool inconsistency refers to the situation wherein the local state of a node (i.e. its memory pool, or mempool) has not been correctly or fully propagated to other nodes in the network. The situation can result from network delays, transaction propagation strategies and configurations, or malicious activities. A key aspect of mempool inconsistency is confusion around nonces. A nonce is typically a number associated with a transaction that represents its sequence in the list of transactions from a particular sender, and it is often tracked locally in a given node's mempool. Nodes holding differing views of nonce order can cause transaction collisions or propagation of invalid transactions (imaged below).
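The nonce problem above can be made concrete with a short sketch. The following Go snippet is illustrative only - the Tx and Mempool types are hypothetical, not Lava code - and shows how two nodes with divergent mempool views treat the same transaction differently:

```go
package main

import "fmt"

// Tx is a hypothetical transaction carrying a sender nonce.
type Tx struct {
	Sender string
	Nonce  uint64
}

// Mempool holds the next expected nonce per sender (local node state).
type Mempool struct {
	nextNonce map[string]uint64
}

// Accept returns an error when the transaction's nonce does not match
// this node's locally expected next nonce for that sender.
func (m *Mempool) Accept(tx Tx) error {
	expected := m.nextNonce[tx.Sender]
	if tx.Nonce != expected {
		return fmt.Errorf("invalid nonce: got %d, node expects %d", tx.Nonce, expected)
	}
	m.nextNonce[tx.Sender] = expected + 1
	return nil
}

func main() {
	// Node A has seen alice's transactions up to nonce 4; node B only up to nonce 2.
	nodeA := &Mempool{nextNonce: map[string]uint64{"alice": 5}}
	nodeB := &Mempool{nextNonce: map[string]uint64{"alice": 3}}

	tx := Tx{Sender: "alice", Nonce: 5}
	fmt.Println("node A:", nodeA.Accept(tx)) // accepted (nil)
	fmt.Println("node B:", nodeB.Accept(tx)) // rejected: nonce gap
}
```

The same transaction is valid on one node and invalid on another purely because their mempool states diverged.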
A paradigm is needed to ensure that neither block inconsistency nor mempool inconsistency negatively affect the functioning of the protocol.
Findings & Considerations
Stickiness
Stickiness refers to the characteristic of a system or process where certain elements, such as users, tasks, or data, tend to remain within a particular node or server. This concept is often utilized in load balancing and resource allocation strategies. In relation to blockchain, it is the tendency for a consumer to be routed back to the same provider across successive relays. Using Stickiness solves both mempool and block inconsistency. However, Stickiness carries trade-offs that fall short of the optimum. Its advantages and disadvantages are explained below.
Advantages
- High Availability: Stickiness ensures that consumers consistently interact with the same node. This can lead to improved performance and reliability, as the protocol knows whether a given node is reachable and can fail over to a different node in the event that a node becomes unreachable or stops supplying responses.
- Ease of Implementation: Implementing stickiness in a blockchain is generally straightforward. It often involves simple rules or algorithms that bind a consumer or procedure call to a specific node, making the implementation process less complex compared to more dynamic node selection strategies.
Disadvantages
- Partition Intolerance: The primary issue with stickiness is that it introduces partition intolerance. That is, if a given consumer's selected provider has problems, that consumer will inherit them too: under stickiness, a consumer relies solely on the provider it is stuck to for accurate and fresh data. If the data is somehow corrupted, the consumer has little recourse.
- Limited Scalability: One of the major drawbacks of stickiness is that it can hinder scalability. As specific nodes become bound to certain consumers, it becomes challenging to distribute load evenly across nodes as demand increases. Under stickiness, growth will not inherently result in more supply. This approach of "picking favorites" can lead to bottlenecks and introduce a degree of centralization.
- Low Utilization: Stickiness can lead to uneven utilization of resources across the network. Some nodes may be overburdened with a high volume of consumer relays, while others remain underutilized - receiving only the bare minimum service requests. This imbalance, like scalability challenges, can lead to decreased overall network performance.
Smart Consistency
Instead of relying on Stickiness, consistency mechanisms are employed which perform relay-by-relay predictions, allowing a greater diversity of Providers to be included. These mechanisms are referred to as "Smart Consistency." Because Smart Consistency fundamentally deals with two different types of inconsistency (mempool and block), it implements a different route for solving each issue. In what follows, we examine each solution in detail.
Block Consistency
As noted, block inconsistency is the state of nodes being synced to different latest blocks. Lava works by selecting providers from a pairing list which is determined through provider stake, geolocation, and Quality of Service Excellence scores. Pairing is completed by "a sophisticated algorithmic mechanism designed to match consumers with the most appropriate service providers, considering a wide array of inputs and constraints." Pairing is beyond the scope of this research paper, but should be understood to produce a list of viable Providers on the network to whom relay requests from a given Consumer are routed within a given Epoch. Understanding pairing is essential to comprehending the strategy used to establish block consistency - so it may be worthwhile for the reader to review.
To solve block inconsistency, three steps are taken:
- the Consumer calculates the chance a provider has the needed block and algorithmically determines the Provider most likely to be synced to the correct spot,
- the Provider calculates the likelihood of serving the correct data and disqualifies itself if it cannot reasonably do so,
- the Consumer verifies the finalization proof in the headers of the provider response to ensure that the data is indeed correct and has not been misrepresented or corrupted.
A section is dedicated to each step.
Consumer Mechanism: Provider Optimization
First, consumers use the Provider Optimizer to work out which provider to use after receiving a pairing list. It is important to note that the last seen block of a consumer is cached in shared state amongst consumers. The last seen block of a consumer is a vital piece of information for determining a provider's optimality. Because this value is nil prior to a consumer's first relay, the first relay does not have enforceable consistency. However, consistency is reinforced with each additional seen block and thus increases with each relay made.
The combination of the consumerās last seen block, most recently requested block, as well as the selected providerās latest block, and the latest block of all available providers are used to create a score.
// Pseudo-code snippet of consumer-side calculation
probabilityBlockError := po.CalculateProbabilityOfBlockError(requestedBlock, providerData)
probabilityOfTimeout := po.CalculateProbabilityOfTimeout(providerData.Availability)
probabilityOfSuccess := (1 - probabilityBlockError) * (1 - probabilityOfTimeout)
cost := probabilityBlockError*costBlockError + probabilityOfTimeout*costTimeout + probabilityOfSuccess*costSuccess
Based upon this score, the consumer is able to select a provider who they reasonably believe will be able to answer their relays with the freshest block.
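The scoring above can be sketched end-to-end. The following Go snippet is a minimal illustration, not the actual Provider Optimizer: the cost weights and the probability helpers are hypothetical stand-ins, but the expected-cost formula mirrors the pseudo-code:

```go
package main

import (
	"fmt"
	"math"
)

// ProviderData is a hypothetical summary of what a consumer knows
// about one paired provider.
type ProviderData struct {
	Name         string
	LatestBlock  int64
	Availability float64 // fraction of recent relays answered in time
}

// Illustrative cost weights (the real optimizer's constants differ).
const (
	costBlockError = 3.0
	costTimeout    = 1.0
	costSuccess    = 0.5
)

// probabilityOfBlockError is a toy stand-in: the further the provider's
// latest block lags the requested block, the likelier a block error.
func probabilityOfBlockError(requestedBlock int64, p ProviderData) float64 {
	gap := float64(requestedBlock - p.LatestBlock)
	if gap <= 0 {
		return 0
	}
	return 1 - math.Exp(-gap) // grows toward 1 as the gap widens
}

// expectedCost mirrors the pseudo-code above: weight each outcome's
// cost by its probability.
func expectedCost(requestedBlock int64, p ProviderData) float64 {
	pErr := probabilityOfBlockError(requestedBlock, p)
	pTimeout := 1 - p.Availability
	pSuccess := (1 - pErr) * (1 - pTimeout)
	return pErr*costBlockError + pTimeout*costTimeout + pSuccess*costSuccess
}

// pickProvider returns the paired provider with the lowest expected cost.
func pickProvider(requestedBlock int64, pairing []ProviderData) ProviderData {
	best := pairing[0]
	for _, p := range pairing[1:] {
		if expectedCost(requestedBlock, p) < expectedCost(requestedBlock, best) {
			best = p
		}
	}
	return best
}

func main() {
	pairing := []ProviderData{
		{Name: "lagging", LatestBlock: 95, Availability: 0.99},
		{Name: "fresh", LatestBlock: 100, Availability: 0.95},
	}
	// The provider synced to the requested block wins despite slightly
	// lower availability, because a block error is costlier than a timeout.
	fmt.Println(pickProvider(100, pairing).Name) // fresh
}
```

The design point is that freshness and availability are traded off through explicit cost weights rather than by pinning the consumer to one node.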
Provider Mechanism: Handle Consistency Function
Next, the providers receive requests from consumers. The provider process is hard-coded to evaluate whether it can plausibly return the necessary data before supplying the consumer with a response. The provider determines this, roughly, by the following logic wrapped in the handleConsistency function:
If what a Provider has as latest block is greater than or equal to what a Consumer has seen: send response
If what a Consumer has seen is greater than or equal to what a Provider has as latest block, but the requested block is lower than the last block a Consumer has seen: send response
If a Consumer requested and has seen a block height higher than what a Provider returns:
- Bail + New Provider
- Wait + Try Again
As can be seen, a provider listens to the request made and evaluates whether its latest block is high enough to satisfy what a consumer has recently seen and/or recently requested. The provider will not send a response if it determines that its response would be less fresh than what a consumer needs. Once it has determined that its response would be inconsistent, it must decide between bailing (allowing another provider to service the request) and waiting to catch up to the requested block before trying again. To do so, it must calculate the chance of catching up to the necessary latest block before the timeout period has elapsed. The calculation for this is fairly sophisticated and merits further explanation.
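The decision rules above can be sketched roughly as follows. This Go snippet is an illustrative reading of the handleConsistency logic, not the actual implementation; catchUpLikely stands in for the catch-up probability estimate:

```go
package main

import "fmt"

// Action is what the provider decides to do with an incoming relay.
type Action int

const (
	Serve Action = iota // respond immediately
	Bail                // error out so another provider serves the relay
	Wait                // sleep, sync, then try again
)

// handleConsistency sketches the three rules described above.
func handleConsistency(latestBlock, seenBlock, requestedBlock int64, catchUpLikely bool) Action {
	// Rule 1: the provider is at least as fresh as anything the consumer has seen.
	if latestBlock >= seenBlock {
		return Serve
	}
	// Rule 2: the consumer has seen newer blocks, but the request targets
	// an older block the provider already holds.
	if requestedBlock <= latestBlock {
		return Serve
	}
	// Rule 3: the request outruns the provider's state. Wait only if
	// catching up before the timeout is plausible; otherwise bail.
	if catchUpLikely {
		return Wait
	}
	return Bail
}

func main() {
	fmt.Println(handleConsistency(100, 98, 100, false)) // Serve: provider is fresh enough
	fmt.Println(handleConsistency(95, 100, 90, false))  // Serve: old block requested
	fmt.Println(handleConsistency(95, 100, 100, true))  // Wait: catch-up is likely
	fmt.Println(handleConsistency(95, 100, 100, false)) // Bail: hand off to another provider
}
```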
The average block time for a blockchain is defined in each spec; it is used as the initial value to determine the speed at which blocks advance on a given chain. Note that block inconsistency is also a function of the speed at which blocks advance (called here the block rate). As long as blocks continue to advance, a fast blockchain - that is, one with a high block rate - does not suffer block inconsistency for long. Of course, blocks can advance at any rate regardless of historical movement, so there can be significant deviation between the predefined rate set in the spec and real-world speed. An equation is used to determine the block rate which approximates the cumulative distribution function (CDF) of a Poisson-distributed random variable (i.e., blocks for a blockchain):
Typical cumulative probability for a Poisson distribution:
P(X <= k) = e^(-lambda) * sum_{i=0}^{k} lambda^i / i! = Q(k + 1, lambda)
Inverse upper incomplete gamma function:
Q(s, x) = Gamma(s, x) / Gamma(s), where Gamma(s, x) is the upper incomplete gamma function; inverting Q with respect to x recovers the block rate lambda from an observed cumulative probability.
To calculate the chance of success, an inverse regularized upper incomplete gamma function is used. This function is far more precise than taking an arithmetic average - yielding a precise moving average over the last 25 blocks. The actual result of this equation is interpreted optimistically: it is rounded down to postulate that blocks may move faster than the data implies. The equation requires a minimum of 25 blocks to make a calculation; if fewer than 25 blocks have been seen, the average block time (as defined in the spec) is used. The provider estimates the probability of a new block appearing before the timeout. Based on the aforementioned calculations, the provider returns an error if the block gap - that is, the distance between the consumer's requested block and the provider's latest block - is too great to be overcome. It is expressed in pseudo-code here:
if (blockGap > blockLagForQosSync*2 || (blockGap > 1 && probabilityBlockError > 0.4)) && (seenBlock >= latestBlock) {
return latestBlock & ConsistencyError
}
Otherwise, the provider sleeps for the necessary time until it can sync its latest block and provide an accurate response.
Consumer Mechanism: Finalization Verification
Whenever a relay is made, each provider response is accompanied by a finalization proof which keeps the provider cryptographically accountable. This finalization proof (contained in the headers of a relay response) also expresses to the consumer what the provider's latest block is. A consumer has the ability to check the hashes and ensure that a provider is telling the truth. This verifies the consistency of a provider's response, and it is the final step to ensuring block consistency.
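A simplified shape of this check might look as follows. The FinalizationProof structure here is hypothetical - the real proof format and signature scheme are Lava-specific - but it illustrates comparing a provider's claimed block hashes against hashes the consumer already trusts:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FinalizationProof is a hypothetical stand-in for the data a provider
// attaches in its response headers: finalized block heights mapped to
// their hashes, plus the provider's claimed latest block.
type FinalizationProof struct {
	FinalizedHashes map[int64]string
	LatestBlock     int64
}

// verifyFinalization checks the provider's claimed hashes against
// hashes the consumer already trusts (e.g. from earlier relays or
// other providers). Any mismatch means the provider misrepresented data.
func verifyFinalization(proof FinalizationProof, trusted map[int64]string) error {
	for height, hash := range proof.FinalizedHashes {
		want, ok := trusted[height]
		if !ok {
			continue // no reference for this height yet
		}
		if hash != want {
			return fmt.Errorf("hash mismatch at height %d", height)
		}
	}
	return nil
}

// blockHash is a toy hash over block contents for the example.
func blockHash(data string) string {
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	trusted := map[int64]string{100: blockHash("block-100")}
	honest := FinalizationProof{FinalizedHashes: map[int64]string{100: blockHash("block-100")}, LatestBlock: 101}
	forged := FinalizationProof{FinalizedHashes: map[int64]string{100: blockHash("forged")}, LatestBlock: 101}
	fmt.Println(verifyFinalization(honest, trusted)) // nil: hashes agree
	fmt.Println(verifyFinalization(forged, trusted)) // error: provider lied
}
```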
Cumulatively, this approach yields consistency without sacrificing the diversity of providers queried.
Mempool Consistency
As noted, mempool inconsistency is the state of differing nonces that can result in confused Provider responses. This is particularly noteworthy for requests that affect nonces and the visibility of transactions. Such requests easily fail when accessing the mempool of a different Provider than expected. Mempool consistency is established using request propagation.
Establishing mempool consistency is considerably easier than block consistency. To solve mempool inconsistency, changes in the local state of an altered Provider's mempool are propagated out to the Providers in the Consumer's pairing list for a given epoch. Propagation ensures that all the providers that a consumer accesses within an epoch will share the same mempool information. Whenever a transaction or query based on the local state is encountered, the provider sends the transaction or query to all the active/available providers in the same pairing list of that epoch. This ensures that each consumer's usable providers experience the same state transformation at the same time and eliminates possible discrepancies. This also brings many benefits to consumers as externalities:
- Forcefully propagating mempool data increases the chance that a transaction gets to a validator faster and thus creates latency resistance for the network.
- Enforcing duplicated state across multiple independent actors prevents the possibility of censorship.
This approach is a computationally inexpensive operation which retains high availability and consistency.
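The propagation step can be sketched as a concurrent fan-out. This Go snippet is illustrative only - the Provider type and its in-memory mempool stand in for real network relays to paired providers:

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a hypothetical endpoint holding its own mempool state.
type Provider struct {
	Name    string
	mu      sync.Mutex
	mempool map[string]bool // tx hash -> present
}

// AddTx records a transaction in this provider's local mempool.
func (p *Provider) AddTx(txHash string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.mempool == nil {
		p.mempool = map[string]bool{}
	}
	p.mempool[txHash] = true
}

// HasTx reports whether the transaction is in this provider's mempool.
func (p *Provider) HasTx(txHash string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.mempool[txHash]
}

// propagateTx fans a state-altering transaction out to every provider
// in the epoch's pairing list so their mempools converge.
func propagateTx(txHash string, pairing []*Provider) {
	var wg sync.WaitGroup
	for _, p := range pairing {
		wg.Add(1)
		go func(p *Provider) {
			defer wg.Done()
			p.AddTx(txHash) // in practice, a network relay to the provider
		}(p)
	}
	wg.Wait()
}

func main() {
	pairing := []*Provider{{Name: "p1"}, {Name: "p2"}, {Name: "p3"}}
	propagateTx("0xabc", pairing)
	for _, p := range pairing {
		fmt.Println(p.Name, p.HasTx("0xabc")) // every provider now holds the tx
	}
}
```

After the fan-out, any provider in the pairing list can answer nonce-dependent queries consistently within the epoch.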
Future Considerations
QoS Excellence (RSCH-1001) - This research avoids explanation of reputational rating and provider optimization. Quality of Service Excellence, as a factor in determining pairing and provider optimization, is a significant topic of discussion. Quality of Service Excellence is cumulative and affects reputation and pairing for Providers. This is to be explained in future research.
Fraud Detection (RSCH-1002) - This research outlines how a consumer determines the likelihood of a provider providing the desired response, but it does not explain the markers of honesty mentioned, such as finalization proofs or other data reliability measures. This is to be explained in future research.
REFERENCE: N/A