RSCH-1000
Passable Quality of Service
Omer Mishael, KagemniKarimu
Date Compiled: 09/25/23
Overview
Lava implements a Passable Quality-of-Service (QoS) assurance. Network consumers score providers per relay on metrics of latency, sync, and availability. Scores on each metric are averaged over relays per session and presented to providers during the session. Providers report scores on-chain by taking the geometric mean of the three quality of service scores (latency, sync, availability) received, to come up with a single coefficient representing a final score. Quality of Service is assured by attenuating provider rewards in response to these final scores; a maximum score of 1 reaps full rewards whereas a minimum score of 0 reaps half rewards (50%).
Passable QoS differs from QoS Excellence, which is to be discussed in a later research paper.
Rationale
A system is needed to ensure that providers on the network are up to an operable standard. Passable QoS ensures that:
- providers reach a minimum or ‘passable’ threshold to maintain active status on the network
- providers are rewarded in direct proportion to the quality of services rendered
- neither consumers nor providers are able to profitably give false quality reports
The system uses consumer-side scoring as the mechanism. Other systems considered were:
- Fisherman - Fisherman involves empowering an authoritative inspector which performs surprise reports on relays to unsuspecting providers. This is an approach widely used throughout web3, but it is neither trustless nor permissionless. Discovering or predicting the identity of the fisherman provides an attack surface for malicious actors. Additionally, the fisherman must be paid for this burden of scoring/inspecting: a smart attacker can manipulate the inspection or collude with the fisherman. Finally, and problematically, under this arrangement there needs to be a verification mechanism to ensure that the fisherman actually made these tests. Efficient solutions are complicated and require more computation and relays. By contrast, our consumer-side scoring mechanism uses existing relays to convey scores.
- Elective Fisherman - Elective Fisherman means electing, at random, and by mandate of the protocol, a provider or consumer who performs scoring/reporting and whose compensation for doing so is continued participation in the network. This approach is akin to jury duty or conscription, whereby being a participant on the network makes one eligible for fisherman duty. In this example, no further compensation or reward would be given to the fisherman. We toyed with this approach but ultimately decided it was less efficient than consumer-side reporting. As in the previously mentioned Fisherman approach, it also requires complicated proofs and verifications.
- Provider Performance Meta-analysis - Provider Meta-analysis involves collecting provider data on-chain for direct insights and analysis on quality of services. Barring an extremely clever provider learning how to falsify reports, this approach gives a very direct view into the relays received and serviced by a provider. However, it is highly data intensive and does not scale well. While this would conceivably work for a smaller network of consumers and providers, a network at-scale can be easily overloaded by the sheer volume of data providers would report. Again, the consumer-side scoring provides a more efficient implementation.
Note that Passable QoS does not ensure that 1) the best provider is selected for each consumer, or 2) the best providers receive the most rewards. Both of these assurances are characteristics of QoS Excellence, which is implemented separately from Passable QoS. Passable QoS is only a threshold below which data is considered unusable and by which payments (specific to the unusable data) are impacted.
Findings & Considerations
Passable QoS is collected as a consumer-side score reported by providers on-chain to attenuate payments. To be fair and ungameable, payments must be proportionate to the effort expended by providers to complete a call. Every Lava call is measured in compute units (CUs), an abstraction which quantifies the compute intensiveness of a given request. Rewards are highly correlated with CUs, so it is easiest to think of compute units as a proxy for the ‘cost’ of a request. CUs have other implications related to Passable QoS, as will be highlighted below.
Score
In its simplest form, a Passable QoS score is the geometric mean of three sub-scores (latency, sync, availability). Each of the three areas is calculated algorithmically without direction from the consumer. They are taken together to make a Passable QoS Score that will be 0 for complete failure, 1 for perfect performance, or, most likely, some number between 0 and 1. This is sent on-chain at the completion of each peer-to-peer session.
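Written as a formula (the symbol names here are illustrative), the per-session score is the geometric mean of the three session-averaged sub-scores:

$$\text{QoS}_{\text{passable}} = \sqrt[3]{S_{\text{latency}} \cdot S_{\text{sync}} \cdot S_{\text{availability}}}$$

where each $S$ lies in $[0, 1]$.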
Latency
Latency is the amount of time elapsed before a request is returned. It is measured in ms. Passable QoS for Latency is hard-coded in the client code on the `rpcconsumer`. Latency > x ms produces a score of 0 and latency ≤ x ms produces a score of 1, where x is defined as the latency threshold for a given call. The formula for computing the latency threshold is below:

$$x = \text{Extra Relay Timeout} + (\text{Time Per CU} \times \text{CU}) + \text{Average World Latency}$$

The values in this equation are currently hardcoded as follows and liable to change:
| Variable | Value |
|---|---|
| Extra Relay Timeout | 0 ms for regular APIs; average block time for hanging APIs |
| Time Per CU | 100 ms per CU |
| Average World Latency | 300 ms |
What this demonstrates is that the latency threshold scales linearly with CUs: as the number of CUs for a given call increases, the allowable elapsed time grows with it. For example, with the hardcoded values above, a 10 CU call on a regular API has a threshold of 0 + (100 ms × 10) + 300 ms = 1300 ms. Some calls are knowingly expensive; in some cases, as with hanging APIs, allowable latency is even greater, with the Extra Relay Timeout reaching up to the average block time. This makes latency scores more reflective of actual performance differences instead of mere computational difficulty.
Importantly, while each latency calculation is an absolute threshold, the latency scores are averaged over many relays per session. This means that the latency score for a fulfilled session is likely some number between 0 and 1.
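The following Go sketch shows this per-relay latency scoring under the formula above; the function and constant names are illustrative, not the actual `rpcconsumer` implementation:

```go
package main

import (
	"fmt"
	"time"
)

const (
	timePerCU           = 100 * time.Millisecond // Time Per CU
	averageWorldLatency = 300 * time.Millisecond // Average World Latency
)

// latencyThreshold computes the allowable latency x for a call:
// x = Extra Relay Timeout + (Time Per CU × CU) + Average World Latency.
// extraRelayTimeout is 0 for regular APIs and the average block time
// for hanging APIs.
func latencyThreshold(cu uint64, extraRelayTimeout time.Duration) time.Duration {
	return extraRelayTimeout + time.Duration(cu)*timePerCU + averageWorldLatency
}

// latencyScore is the binary per-relay score: 1 if the measured latency
// is within the threshold, 0 otherwise. Per-relay scores are averaged
// over the session to produce the latency sub-score.
func latencyScore(measured, threshold time.Duration) float64 {
	if measured > threshold {
		return 0
	}
	return 1
}

func main() {
	threshold := latencyThreshold(10, 0) // a 10 CU call on a regular API
	fmt.Println(threshold)               // 1.3s
	fmt.Println(latencyScore(900*time.Millisecond, threshold)) // 1
}
```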
Sync
Sync is the provider’s proximity to the latest block on the serviced chain. It is measured by taking the median latest block (with block interpolation based on time) of all providers on a given chain, then demanding that a provider does not fall too far behind. The distance a provider can lag behind is defined in the specification of a supported chain as a specific number of blocks. The current `rpcconsumer` and `LavaSDK` code reads this specific number from the blockchain and derives a sync score based on the provider’s position relative to it. If the provider’s distance is greater than the number in the spec, the provider receives a score of 0; if the provider’s distance is less than or equal to the number in the spec, they receive a score of 1. Just as latency calculations are averaged over many relays per session, so are sync scores. This means that the value sent over a session (aggregated) will likely be some number between 0 and 1.
We mentioned that block interpolation is used for determining the median latest block. The basis of this is that, since we don’t have the latest measurements from all providers at any one point, we derive a measurement based upon the average block time and the latest measurement time and height. Our median latest block measurement is capped at the last seen block, in case the chain was halted or a block arrives substantially slower than expected. Additionally, the benefit of the doubt always goes to providers: if not enough providers are available to perform interpolation, the provider is optimistically assigned a score of 1. The Go algorithm summarizing the sync calculation is provided below:
```go
for providerAddress, providerDataContainer := range providerLatestBlocks {
	// estimate how many blocks were produced since this provider's last report
	interpolation := InterpolateBlocks(now, providerDataContainer.LatestBlockTime, averageBlockTime_ms)
	expected := providerDataContainer.LatestFinalizedBlock + interpolation
	// limit the interpolation to the highest seen block height,
	// in case the chain halted or a block is substantially late
	if expected > highestBlockNumber {
		expected = highestBlockNumber
	}
	mapExpectedBlockHeights[providerAddress] = expected
}
medianOfExpectedBlocks := median(mapExpectedBlockHeights)
```
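For completeness, here is a hedged sketch of the helpers the snippet relies on; these are illustrative reconstructions consistent with the description above, not the exact implementations:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// InterpolateBlocks estimates how many blocks have elapsed since a
// provider's last reported block, assuming the chain's average block time.
func InterpolateBlocks(now time.Time, latestBlockTime time.Time, averageBlockTime_ms int64) int64 {
	if now.Before(latestBlockTime) {
		return 0
	}
	return now.Sub(latestBlockTime).Milliseconds() / averageBlockTime_ms
}

// median returns the median expected block height; with an even count it
// takes the lower middle element (an illustrative choice). Assumes a
// non-empty map.
func median(expectedHeights map[string]int64) int64 {
	heights := make([]int64, 0, len(expectedHeights))
	for _, h := range expectedHeights {
		heights = append(heights, h)
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	return heights[len(heights)/2]
}

// syncScore applies the spec rule per relay: 0 if the provider lags more
// than allowedLag blocks behind the median, 1 otherwise.
func syncScore(providerBlock, medianBlock, allowedLag int64) float64 {
	if medianBlock-providerBlock > allowedLag {
		return 0
	}
	return 1
}

func main() {
	// a provider 2 blocks behind the median, with a 5-block allowed lag
	fmt.Println(syncScore(98, 100, 5)) // 1
}
```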
Availability
Availability is the tendency of a provider to respond to requests received. In a given session, a provider must respond to more than 90% of requests. If a provider responds to 90% or less of requests, they receive a score of 0. However, if they pass the minimum threshold of 90%, they receive a score scaling linearly from 0 at 90% all the way up to 1 at 100%. Thus, Passable QoS Availability is a scaled average of the per-relay response scores. The two formulas below detail how this is calculated:

$$\text{Availability} = \frac{\text{relays answered}}{\text{relays sent}}$$

$$\text{Availability Score} = \begin{cases} 0 & \text{if Availability} \le 0.9 \\ \dfrac{\text{Availability} - 0.9}{1 - 0.9} & \text{if Availability} > 0.9 \end{cases}$$
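A minimal Go sketch of this rescaling (the function name and the empty-session behavior are illustrative):

```go
// availabilityScore rescales the session's answered/sent ratio:
// at or below the 90% threshold the score is 0, and the remaining
// 90%-100% band scales linearly up to 1.
func availabilityScore(answered, sent uint64) float64 {
	const threshold = 0.9
	if sent == 0 {
		return 1 // illustrative choice for the empty-session edge case
	}
	ratio := float64(answered) / float64(sent)
	if ratio <= threshold {
		return 0
	}
	return (ratio - threshold) / (1 - threshold)
}
```

For example, a provider answering 95 of 100 relays gets (0.95 − 0.9) / 0.1 = 0.5.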
Total
As seen, each of these respective scores is an aggregate over a session and can be 0, 1, or any number between 0 and 1. Because the Passable QoS Score is calculated as the geometric mean of the sub-scores, a 0 in any sub-score will lead to an overall score of 0. This is reasonable: under this schema, a provider which has unreasonable latency, whose blocks are wildly out of sync, or who is unavailable to answer requests cannot possibly be considered passable quality of service. Naturally, this penalizes the worst-performing providers on the network.
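The zero-propagation property falls directly out of the geometric mean, as a short sketch shows (function name illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// passableQoSScore combines the three session-averaged sub-scores into a
// single coefficient via the geometric mean; a 0 in any sub-score zeroes
// the whole result.
func passableQoSScore(latency, sync, availability float64) float64 {
	return math.Cbrt(latency * sync * availability)
}

func main() {
	fmt.Println(passableQoSScore(1, 1, 0))     // 0
	fmt.Println(passableQoSScore(0.9, 1, 0.8)) // ~0.896
}
```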
Rewards & Reporting
One of the stated goals of Passable QoS is to make sure API providers are up to a certain standard of performance. The current method of ensuring that is by directly affecting payments. Payments are degraded in response to low Passable QoS scores. Providers are guaranteed at least 50% of payment for a serviced relay, irrespective of their Passable QoS score. In other words, regardless of Passable QoS score, at least half of the value of a serviced relay must be paid for. The remaining rewards which are not given to providers due to poor quality of service are burned.
Rewards
To calculate how many rewardable CU per session a provider may get paid for, we use the following equation:

$$\text{Rewardable CU} = \text{Total CU} \times \underbrace{\frac{1 + \text{QoS Score}}{2}}_{\text{reward modifier}}$$

Total CU is the total CU a consumer used in a session, as given in the cryptographic signature in their submitted relays. The Total CU is multiplied by the reward modifier (calculated above) to get the Rewardable CU. As mentioned earlier, CU are a good proxy for the price of a transaction and are submitted by providers on-chain as part of a reward proof to claim rewards.
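A minimal sketch of this computation in Go (names illustrative; the modifier matches the chart below):

```go
// rewardableCU applies the reward modifier to the session's total CU:
// a Passable QoS score of 1 keeps 100% of CU, a score of 0 keeps 50%.
func rewardableCU(totalCU uint64, qosScore float64) float64 {
	modifier := (1 + qosScore) / 2
	return float64(totalCU) * modifier
}
```

For example, rewardableCU(1000, 0.5) returns 750, matching the 75% row in the chart below.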
Chart
To understand how this plays out in transactions, we can see the percentage of rewards retained. A simplified equation for calculating the reward percentage is f(x) = 50%·x + 50%, where x is the total Passable QoS score. This results in the payout schedule below:

| Passable QoS Score | Reward |
|---|---|
| 0 | 50% |
| 0.125 | 56.25% |
| 0.25 | 62.5% |
| 0.5 | 75% |
| 0.625 | 81.25% |
| 0.75 | 87.5% |
| 1 | 100% |
This schema meets the goal of rewarding providers directly in proportion to the quality of their service on the network. Additionally, it ensures neither consumers nor providers can profitably give false reports. Because providers are guaranteed at least 50% of their reward, a consumer who falsely rates providers low still pays at least half the cost of every relay; this recurring cost works as a strong disincentive against wrongfully rating providers low.
Reports
Reports (containing Rewardable CUs) must be submitted by providers to the Lava chain in order to receive rewards. Passable QoS Reports are discrete for each session; what a provider received in a past session has no bearing on potential earnings or pairings. Passable QoS Scores are self-reported by providers on-chain as part of the reward proof. They are cryptographically protected and tamper resistant. The scores only affect the provider-consumer payment for a specific session. This creates the following expectations:
- For consumers, there are no rebates or refunds. Every relay has a minimum cost of 50% of the potential reward. A score has no lasting effect on the likelihood that a consumer is paired with the same provider again, good or bad, and no punishment carries into future sessions.
- For providers, bad quality of service decreases actual profitability (rewards earned) without affecting potential profitability (rewards-to-be-earned). Future sessions are unaffected by past sessions. This is enough to function as a disincentive for poor service without having any reputational qualities or affecting the volume of traffic coming to a provider.
The system works virtuously with consumers bearing the cost of their relays and providers bearing the cost of their service.
Future Considerations
- QoS Excellence (RSCH-1001) - This research avoids explanation of reputational rating and provider optimization. Quality of Service Excellence, as a counterpart to Passable QoS, is the means by which the best provider is selected and the best providers are rewarded most often. Quality of Service Excellence is cumulative and lives beyond a specific session. This is to be explained in future research.
- Geolocation (RSCH-1004) - This research specifically mentions dynamic measures of latency in response to the compute intensiveness of a relay. However, it does not address how geographical distance affects the rate at which responses are returned or the impact that this has upon quality of service. Geolocation is accounted for in Quality of Service Excellence using a specific modifying mechanism to be explained in future research.
- Fraud Detection (RSCH-1002) - This research outlines how a provider must be available, timely, and fresh, but does not explain how or why a provider’s responses should be considered honest. Honesty is a guaranteed feature in the Lava protocol that is not assured by Passable QoS. This is to be explained in future research.
- Availability Jailing (RSCH-1003) - This research explains the penalty that a provider can receive when less than 90% available. However, it does not explain what happens when a provider is totally unresponsive and unable to submit proofs of lower availability on-chain. Availability jailing is a mechanism which takes unavailable providers offline. This feature is to be explained in depth in future research.
REFERENCE: N/A