RSCH-1005: Smart Consistency


Omer Mishael, KagemniKarimu
Date Compiled: 1/25/24


Overview :telescope:

Lava implements Smart Consistency mechanisms designed to ensure that neither mempool inconsistency nor block inconsistency negatively affect the Quality of Service available from providers on the network. Smart consistency is a novel solution for mitigating the impact of differences in latest block and nonce state across nodes that does not rely on commonly used forms of stickiness which are ubiquitous in web3.

Smart Consistency solves both block inconsistency and mempool inconsistency by ensuring data freshness without loss of Provider diversity.

Rationale :dna:

A blockchain is a distributed storage system. As such, it lives under the theoretical constraints of the CAP theorem. The CAP Theorem, also known as Brewer’s Theorem, posits that any distributed storage system can only provide two of the three following guarantees: consistency, availability, or partition tolerance. Importantly, partition tolerance is defined as the degree of system continuity despite dropped or delayed messages between nodes.

In the event of a network partition, requests must either be dropped (i.e. unavailability) or data must be presented stale (i.e. inconsistency). There must be at least one trade-off to maintain network continuity. Be that as it may, both inconsistency and unavailability are highly undesirable traits for blockchains - therefore, an intelligent tradeoff must be maximized.

Across web3 data access layers, this partitioning issue is most commonly solved through stickiness. Stickiness is the tendency for a data consumer to receive relay responses from the same node. The drawback is that stickiness, as most often implemented, enforces consistency at the expense of partition tolerance. It is a less than ideal trade-off.

Lava protocol avoids stickiness by using a newly developed paradigm called Smart Consistency. Smart Consistency mitigates blockchain inconsistencies by utilizing accountability measures, message parsing, and metadata analysis on a relay-by-relay basis to do predictions on which providers are most likely to provide the appropriate response. The end result is that a given consumer’s usage of paired Providers can be inclusive of more providers without stickiness.

In a blockchain, there are two types of inconsistency to deal with:

  1. Block Inconsistency - Block inconsistency refers to the situation where multiple nodes have reached different states in the blockchain. The situation can result from network delays, soft forks, software issues, or malicious activities. Working across multiple nodes that may be synced to different blocks on the blockchain will produce inconsistent or unavailable answers (imaged below).

  2. Mempool Inconsistency - Mempool inconsistency refers to the situation wherein the local state of a node (i.e. its memory pool) has not been correctly or fully propagated to other nodes in the network. The situation can result from network delays, transaction propagation strategies and configurations, or malicious activities. A key aspect of mempool inconsistency is confusion around nonces. A nonce is typically a number associated with a transaction that represents its sequence in the list of transactions from a particular sender, and it is often stored locally in a given node's memory pool (mempool). Nodes holding differing nonce orderings can cause transaction collisions or propagation of invalid transactions (imaged below).
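To make the first failure mode concrete, here is a minimal Python sketch (all names are illustrative, not from the Lava codebase) of two nodes synced to different heights answering the same query inconsistently:

```python
# Illustrative sketch: two nodes at different block heights give
# different answers to the same "latest" query (block inconsistency).

class Node:
    def __init__(self, latest_block, balances_by_block):
        self.latest_block = latest_block
        self.balances_by_block = balances_by_block  # block height -> balance

    def get_balance(self, requested_block="latest"):
        block = self.latest_block if requested_block == "latest" else requested_block
        if block > self.latest_block:
            # The node has not reached this block yet: unavailability.
            raise LookupError(f"block {block} not yet synced")
        return self.balances_by_block[block]

# node_a is one block ahead of node_b (e.g. due to network delay).
node_a = Node(latest_block=101, balances_by_block={100: 50, 101: 75})
node_b = Node(latest_block=100, balances_by_block={100: 50})

# The same "latest" query yields two different answers:
print(node_a.get_balance())  # 75
print(node_b.get_balance())  # 50
```

Asking node_b for block 101 instead raises an error, illustrating the CAP trade-off from the previous section: the answer is either stale or unavailable.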

A paradigm is needed to ensure that neither block inconsistency nor mempool inconsistency negatively affect the functioning of the protocol.

Findings & Considerations :alembic:

Stickiness

Stickiness refers to the characteristic of a system or process where certain elements, such as users, tasks, or data, tend to remain within a particular node or server. This concept is often utilized in load balancing and resource allocation strategies. In the blockchain context, it is the likelihood that a consumer returns to the same provider for each successive relay. Stickiness does solve both mempool and block inconsistency; however, the trade-offs it makes are less than optimal. Its advantages and disadvantages are explained below.

Advantages

  1. High Availability: Stickiness ensures that consumers consistently interact with the same node. This can lead to improved performance and reliability, as the protocol knows whether a given node is reachable and can fail over to a different node in the event that a node becomes unreachable or stops supplying responses.
  2. Ease of Implementation: Implementing stickiness in a blockchain is generally straightforward. It often involves simple rules or algorithms that bind a consumer or procedure call to a specific node, making the implementation process less complex compared to more dynamic node selection strategies.

Disadvantages

  1. Partition Intolerance: The primary issue with stickiness is that it introduces partition intolerance. That is, if a given consumer's selected provider has problems, that consumer inherits them too - under stickiness, a consumer relies solely on the provider it is stuck to for accurate and fresh data. If that data is somehow corrupted, the consumer has little recourse.
  2. Limited Scalability: One of the major drawbacks of stickiness is that it can hinder scalability. As specific nodes become bound to certain consumers, it becomes challenging to distribute load evenly across nodes as demand increases. Under stickiness, growth will not inherently result in more supply. This approach of ‘picking favorites’ can lead to bottlenecks and introduce a degree of centralization.
  3. Low Utilization: Stickiness can lead to uneven utilization of resources across the network. Some nodes may be overburdened with a high volume of consumer relays, while others remain underutilized - receiving only the bare minimum service requests. This imbalance, like scalability challenges, can lead to decreased overall network performance.

Smart Consistency

Instead of relying on Stickiness, consistency mechanisms are employed which perform relay-by-relay predictions, allowing a more diverse set of Providers to be included. These mechanisms are referred to as 'Smart Consistency.' Because Smart Consistency deals with two fundamentally different types of inconsistency (mempool and block), it implements a different route for solving each issue. In what follows, we examine each solution in detail.

Block Consistency

As noted, block inconsistency is the state of nodes being synced to different latest blocks. Lava works by selecting providers from a pairing list which is determined through provider stake, geolocation, and Quality of Service Excellence scores. Pairing is completed by "a sophisticated algorithmic mechanism designed to match consumers with the most appropriate service providers, considering a wide array of inputs and constraints." Pairing is beyond the scope of this research paper, but should be understood to produce a list of viable Providers on the network to which relay requests from a given Consumer are routed within a given Epoch. Understanding pairing is essential to comprehending the strategy used to establish block consistency - so it may be worthwhile for the reader to review.

To solve block inconsistency, three steps are taken:

  1. the Consumer calculates the chance a provider has the needed block and algorithmically determines the Provider most likely to be synced to the correct spot,
  2. the Provider calculates the likelihood of serving the correct data and disqualifies itself if it cannot reasonably do so,
  3. the Consumer verifies the finalization proof in the headers of the provider response to ensure that the data is indeed correct and has not been misrepresented or corrupted.

A section is dedicated to each step.

Consumer Mechanism: Provider Optimization

First, consumers use the Provider Optimizer to work out which provider to use after receiving a pairing list. It is important to note that the last seen block of a consumer is cached in shared state amongst consumers. The last seen block of a consumer is a vital piece of information for determining the provider’s optimality. Because this value is nil prior to a consumer’s first relay, the first relay does not have enforceable consistency. However, consistency is reinforced with each additional seen block and thus increases with each relay made.

The combination of the consumer’s last seen block, most recently requested block, as well as the selected provider’s latest block, and the latest block of all available providers are used to create a score.

// Pseudo-code snippet of the consumer-side calculation
probabilityBlockError := po.CalculateProbabilityOfBlockError(requestedBlock, providerData)
probabilityOfTimeout := po.CalculateProbabilityOfTimeout(providerData.Availability)
probabilityOfSuccess := (1 - probabilityBlockError) * (1 - probabilityOfTimeout)

// The provider's score: the expected cost of relaying to it
score := probabilityBlockError*costBlockError + probabilityOfTimeout*costTimeout + probabilityOfSuccess*costSuccess

Based upon this score, the consumer is able to select a provider who they reasonably believe will be able to answer their relays with the freshest block.
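The score computation above can be sketched as a runnable Python model. This is a simplified illustration: the cost weights and probability inputs below are assumed values, not those of Lava's actual Provider Optimizer, which derives them from observed provider data.

```python
# Simplified sketch of the consumer-side expected-cost score:
# the provider with the lowest expected cost is selected.

COST_BLOCK_ERROR = 3.0   # assumed penalty for serving a stale block
COST_TIMEOUT = 3.0       # assumed penalty for timing out
COST_SUCCESS = 1.0       # baseline cost of a successful relay

def provider_cost(p_block_error, p_timeout):
    """Expected cost of relaying to a provider; lower is better."""
    p_success = (1 - p_block_error) * (1 - p_timeout)
    return (p_block_error * COST_BLOCK_ERROR
            + p_timeout * COST_TIMEOUT
            + p_success * COST_SUCCESS)

def pick_provider(providers):
    """Select the provider with the lowest expected cost."""
    return min(providers, key=lambda p: provider_cost(p["p_block_error"],
                                                      p["p_timeout"]))

providers = [
    {"name": "p1", "p_block_error": 0.30, "p_timeout": 0.05},
    {"name": "p2", "p_block_error": 0.05, "p_timeout": 0.05},
]
print(pick_provider(providers)["name"])  # p2
```

Here p2 wins because its much lower chance of a block error outweighs identical timeout behavior, mirroring how a consumer favors the provider most likely to hold the freshest block.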

Provider Mechanism: Handle Consistency Function

Next, the providers receive requests from consumers. The provider process is hard-coded to evaluate whether it can plausibly return the necessary data before supplying the consumer with a response. The provider determines this, roughly, by the following logic wrapped in the handleConsistency function:

If what a Provider has as latest block is greater than or equal to what a Consumer has seen: :white_check_mark: send response

If what a Consumer has seen is greater than or equal to what a Provider has as latest block, but the requested block is lower than the last block a Consumer has seen: :white_check_mark: send response

If a Consumer requested and has seen a block height higher than what a Provider returns:

  1. Bail + New Provider :negative_squared_cross_mark:
  2. Wait + Try Again :hourglass_flowing_sand::repeat_one:

As can be seen, a provider listens to the request made and evaluates whether its latest block is high enough to satisfy what a consumer has recently seen and/or recently requested. The provider will not send a response if it determines that its response would be less fresh than what the consumer needs. Once it has determined that its response would be inconsistent, it must decide between bailing - allowing another provider to service the request - and waiting to catch up to the requested block before trying again. To do so, it must calculate the chance of catching up to the necessary latest block before the timeout period has elapsed. The calculation for this is fairly sophisticated and warrants further explanation.
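The three cases above can be sketched as a single decision function. This is a paraphrase of the logic described in the text; the function name, labels, and the 0.5 decision threshold are illustrative assumptions, not the actual handleConsistency implementation.

```python
# Sketch of the provider-side consistency decision described above.
# Labels and threshold are illustrative, not Lava's actual API.

def handle_consistency(provider_latest, seen_block, requested_block,
                       catch_up_probability):
    """Decide whether a provider should respond, wait, or bail."""
    # Case 1: provider is at least as fresh as anything the consumer has seen.
    if provider_latest >= seen_block:
        return "respond"
    # Case 2: consumer is ahead overall, but the requested block is old
    # enough that the provider already has it.
    if requested_block <= provider_latest:
        return "respond"
    # Case 3: consumer requested a block the provider has not reached.
    # Wait if catching up before the timeout looks likely, otherwise bail.
    return "wait" if catch_up_probability > 0.5 else "bail"

print(handle_consistency(105, 100, 105, 0.0))  # respond (case 1)
print(handle_consistency(100, 105, 99, 0.0))   # respond (case 2)
print(handle_consistency(100, 105, 105, 0.9))  # wait (case 3)
print(handle_consistency(100, 105, 105, 0.1))  # bail (case 3)
```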

The average block time for a blockchain is defined in each spec; it is used as the initial value to determine the speed at which blocks advance on a given chain. Note that block inconsistency is also a function of the speed at which blocks advance (called here the block rate). As long as blocks continue to advance, a fast blockchain - that is, one with a high block rate - does not suffer block inconsistency for long. Of course, blocks can advance at any rate regardless of historical movement, so there can be significant deviation between the predefined rate set in the spec and the real-world speed. An equation is used to determine the block rate which approximates the cumulative distribution function (CDF) of a Poisson-distributed random variable (i.e., blocks for a blockchain):

Equation 1 - Cumulative probability for a Poisson distribution:

  P(X ≤ k) = e^(−λ) · Σ_{i=0..⌊k⌋} λ^i / i!

Equation 2 - Regularized upper incomplete gamma function (whose inverse is used to recover λ):

  Q(s, x) = Γ(s, x) / Γ(s),  where  Γ(s, x) = ∫_x^∞ t^(s−1) e^(−t) dt

The two are linked by the standard identity P(X ≤ k) = Q(⌊k⌋ + 1, λ).

To calculate the chance of success, an inverse regularized upper incomplete gamma function is used. This function is far more precise than taking an arithmetic average - yielding a precise moving average over the last 25 blocks. The actual result of this equation is interpreted optimistically: it is rounded down to postulate that blocks may move faster than the data implies. The equation requires a minimum of 25 blocks to make a calculation; if fewer than 25 blocks have been seen, the average block time (as defined in the spec) is used. The provider estimates the probability of a new block appearing before the timeout. Based on the aforementioned calculations, the provider returns an error if the block gap - that is, the distance between the consumer's requested block and the provider's latest block - is too great to be overcome. It is expressed in pseudo-code here:

if (blockGap > blockLagForQosSync*2 || (blockGap > 1 && probabilityBlockError > 0.4)) && (seenBlock >= latestBlock) {
	return latestBlock, ConsistencyError
}

Otherwise, the provider sleeps for the necessary time until it can sync its latest block and provide an accurate response.
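The wait-or-bail decision hinges on the Poisson estimate described above. The following Python sketch (names and parameters are illustrative) computes the probability of closing a block gap before the timeout, using the Poisson CDF with the expected number of blocks in the timeout window as the rate:

```python
# Sketch of the Poisson-based catch-up estimate: the probability that at
# least `block_gap` new blocks arrive before the timeout, given an
# average block time. Names are illustrative, not Lava's actual code.
import math

def catch_up_probability(block_gap, timeout_s, avg_block_time_s):
    """P(at least block_gap blocks arrive within timeout_s), Poisson model."""
    lam = timeout_s / avg_block_time_s  # expected blocks before timeout
    # P(X >= k) = 1 - P(X <= k-1) = 1 - e^-lam * sum_{i<k} lam^i / i!
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i)
              for i in range(block_gap))
    return 1.0 - cdf

# A 2-block gap with a 10s timeout on a 2s chain is very likely closed;
# the same gap with only a 2s timeout is not.
print(catch_up_probability(2, 10.0, 2.0))
print(catch_up_probability(2, 2.0, 2.0))
```

With a 10-second timeout the provider expects about five new blocks and catches up with probability around 0.96; with a 2-second timeout the probability drops to roughly 0.26, so bailing and letting another provider serve the relay is the better choice.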

Consumer Mechanism: Finalization Verification

Whenever a relay is made, each provider response is accompanied by a finalization proof which keeps the provider cryptographically accountable. This finalization proof (contained in the headers of a relay response) also expresses to the consumer what the provider’s latest block is. A consumer has the ability to check the hashes and ensure that a provider is telling the truth. This verifies the consistency of a provider’s response and it is the final step to ensuring block consistency.
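The verification step can be illustrated with a toy hash check. This is only a sketch of the idea - the hashing scheme and data layout below are assumptions for illustration, not Lava's actual finalization-proof format:

```python
# Toy sketch of finalization verification: the consumer recomputes the
# hash of a block it knows and compares it against the hash the provider
# committed to in its response headers. Scheme is illustrative only.
import hashlib

def block_hash(height, payload):
    """Deterministic toy block hash over height and contents."""
    return hashlib.sha256(f"{height}:{payload}".encode()).hexdigest()

def verify_finalization(claimed_hashes, local_blocks):
    """Cross-check provider-claimed hashes against locally known blocks."""
    return all(block_hash(h, local_blocks[h]) == claimed
               for h, claimed in claimed_hashes.items()
               if h in local_blocks)

local = {100: "tx-set-a", 101: "tx-set-b"}
honest = {100: block_hash(100, "tx-set-a")}
lying  = {100: block_hash(100, "forged-tx-set")}

print(verify_finalization(honest, local))  # True
print(verify_finalization(lying, local))   # False
```

A provider claiming a forged history fails the check, which is what keeps providers cryptographically accountable for the freshness they report.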

Cumulatively, this approach yields consistency without sacrificing the diversity of providers queried.

Mempool Consistency

As noted, mempool inconsistency is the state of differing nonces, which can result in confused Provider responses. This is particularly noteworthy for requests that affect nonces and the visibility of transactions. Such requests easily fail when accessing the mempool of a different Provider than expected. Mempool consistency is established using request propagation.

Establishing mempool consistency is considerably easier than block consistency. To solve mempool inconsistency, changes in the local state of an altered Provider's mempool are propagated out to the Providers in the Consumer's pairing list for a given epoch. Propagation ensures that all the providers that a consumer accesses within an epoch will share the same mempool information. Whenever a transaction or query based on the local state is encountered, the provider sends the transaction or query to all the active/available providers in the same pairing list of that epoch. This ensures that each consumer's usable providers experience the same state transformation at the same time and eliminates possible discrepancies. This also brings consumers many benefits as positive externalities:

  • Forcefully propagating mempool data increases the chance that a transaction gets to a validator faster and thus creates latency resistance for the network.
  • Enforcing duplicated state across multiple independent actors prevents the possibility of censorship.

This approach is a computationally inexpensive operation which retains high availability and consistency.
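The propagation step above reduces to a simple fan-out, sketched here in Python (class and function names are illustrative, not from the Lava codebase):

```python
# Sketch of mempool propagation: a transaction submitted to one provider
# is rebroadcast to every provider in the epoch's pairing list, so all
# of them share the same mempool view. Names are illustrative.

class Provider:
    def __init__(self, name):
        self.name = name
        self.mempool = set()  # pending transactions this provider knows about

    def submit(self, tx):
        self.mempool.add(tx)

def propagate(tx, pairing_list):
    """Send tx to every provider in the consumer's pairing list."""
    for provider in pairing_list:
        provider.submit(tx)

pairing = [Provider("p1"), Provider("p2"), Provider("p3")]
propagate("tx-abc", pairing)

# Every paired provider now sees the same pending transaction, so a
# follow-up nonce or visibility query succeeds regardless of which
# provider answers it.
print(all("tx-abc" in p.mempool for p in pairing))  # True
```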

Future Considerations :test_tube:

QoS Excellence (RSCH-1001) - This research avoids explanation of reputational rating and provider optimization. Quality of Service Excellence, as a factor in determining pairing and provider optimization, is a significant topic of discussion. Quality of Service Excellence is cumulative and affects reputation and pairing for Providers. This is to be explained in future research.

Fraud Detection (RSCH-1002) - This research outlines how a consumer determines the likelihood of a provider providing the desired response, but it does not explain mentioned markers of honesty such as finalization proofs or other data reliability measures. This is to be explained in future research.


REFERENCE: N/A

3 Likes

Here are my thoughts after using AI to try to condense some of this:

  • I find it cool that the system will mathematically try to decide which provider has the latest data by comparing it to a group of providers before picking one.
  • So a provider will either wait to get the latest data or proxy it to another provider in a fallback if the client picks a bad node (the consumer side fails?).
  • I assume Finalization is the key thing that makes Lava stand out (trustlessness).
  • I’m not a blockchain expert, so it’s a stupid question, but the mempool consistency seems like selective gossip. I am unsure how that’s different from BTC with broadcasting (blockchain basics in general).
2 Likes

these are great points and i am glad you took the time to dive deep into this

  • providers cannot respond with an inconsistent response (yes finalization proofs + signatures are what enables that accountability), so they must respond with an error, and the client triggers a retry
  • definitely! keep to the basics when you can. turns out broadcasting your transaction directly is a very efficient way of solving mempool inconsistency and it lowers the time until your tx is accepted by a validator so why not :slight_smile: , we aren’t doing something that wasn’t done before, just leveraging our reach to many nodes, and making it widely available to lava users without a hassle, that’s a small advantage over using a single source for your rpc
2 Likes

It’s Amazing work guys

Ensuring consumer trust through finalization proofs and consistent mempool data propagation fosters network reliability and resistance to latency and censorship.

2 Likes

Thank you @APPIE sir! :smile:

Wouldn’t be possible without Master Providers like yourself! :heart:

2 Likes

We’re really into how the Lava protocol tackles the stickiness issue. It suggests a solution that not only amps up data consistency but also sidesteps the compromises tied to partition intolerance, limited scalability, and uneven resource usage. Plus, the way it handles changes in the mempool to ensure consistency is a brilliant example of efficiently tackling challenges. Can’t wait to see how this cool Smart Consistency implementation will step up the game and improve service quality in the blockchain! Exciting stuff ahead! We look forward to learning more about the QoS and Fraud Detection system.

5 Elements Nodes

1 Like