KYVE Fundamentals Article 1: The Hazards of Poor Infrastructure and Lack of Trustless Data

KYVE Team · Published in KYVE · 6 min read · Oct 17, 2022


With the development of decentralization, open-source software, trustlessness, and more, there’s no denying that Web3 has introduced numerous advancements toward solving significant Web2 roadblocks. However, even these innovations have overlooked pain points, most notably that many projects do not rely entirely on trustless, verified data, leaving data feeds vulnerable to malicious attacks.

Today, anyone can bring data on-chain and claim it’s true, creating a significant risk for those who might implement it into their project. The idea of a trustless environment emerged from the need to eliminate these untrustworthy sources, instead relying on network participants with distributed trust and incentivization to bring forward only valid data. In theory, this is a great process; in practice, however, improper implementation of the node infrastructure and overlooked details can introduce risk factors of their own.

Take the Ronin Bridge attack of March 2022, for example. A hacker compromised five of the network’s nine validator nodes, gaining the majority needed to approve a withdrawal of over 173.6k ETH and 25.5M USDC from the bridge. The Ronin Bridge team did not suspect anything, as there was no process in place to raise suspicion about the true incentives of each node.

This begs the question: how exactly can data be brought into future Web3 builds in a truly trustless way? And once on-chain, how can we validate this data in a decentralized way, assuring no further risk of tampering or manipulation?

Vulnerabilities When Introducing Data

When projects source off-chain data, oracles are their go-to tool, as they provide an easy access point to deterministic Web2 data. However, whether or not an oracle claims to verify the data, an underlying issue remains: a lack of trustlessness. How can we be sure that the data the oracle fetched came from the correct source and hasn’t been tampered with along the way?

If a project automatically implements incorrect data provided by an oracle without double-checking, it can become highly vulnerable to attacks, losses, improper pricing, and/or data-processing discrepancies, as it opens itself up to an attack vector from a single source.

Just last week, for example, Mango Markets, a trading platform on Solana, suffered a targeted attack via oracle discrepancies. Two malicious accounts took an outsized position in MNGO-PERP, driving a 5–10x swing in the MNGO price. Two oracles then updated their MNGO benchmark accordingly, causing a mark-to-market increase in the position’s value, all from unrealized profit.

As the Mango Markets team noted, “…neither oracle providers have any fault here. The oracle price reporting worked as it should have.” This clearly shows how limited oracles can be in providing “valid” data.

Another typical attack occurs when a hacker takes control of the majority of validator nodes in a network to approve certain actions, AKA a 51% attack. This can happen if, for example, a network’s node infrastructure lacks sufficient security or simply doesn’t have enough nodes. With all projects vulnerable to this type of attack, it’s crucial that they ensure proper decentralization within their node infrastructure.

There are many ways to reduce the risk of a 51% attack. Today, KYVE is heavily focusing on this topic by implementing the right recipe of incentivization, high stake, weighted power, and more to create a secure, fully trustless environment for introducing data.

Once this is achieved, the next hurdle comes: How can we make sure the data introduced into the space is truly correct?

Validation in a Decentralized Way

Since anyone can upload data and claim it’s true, multiple competing sources of truth are a probable outcome. How can we weed through all the data and check its accuracy in a trustless way? Bring in true decentralization.

Decentralization is the key pillar in the ethos of Web3, distributing power, trust, and responsibility among stakeholders and network participants. In general, there is no single generic solution for determining whether a piece of data is valid; developers must create custom validation methods per data set. What’s lacking is a way to manage these different runtimes and ensure that all data sets are properly sourced and validated quickly and efficiently.

Enter KYVE, the data lake built to ensure all types of on- and off-chain data are validated, truly decentralized, and continuously updated, providing the tooling developers need to write these custom solutions. Let’s break down how…

KYVE’s data lake organizes fetched data sets within dedicated storage pools. Each pool executes the code responsible for relaying the data, AKA a runtime, which also includes an abstract implementation of a validation function. This function simply returns true or false depending on whether the data is valid.
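To make the idea concrete, here is a minimal sketch of what such a runtime with a validation hook could look like. The names (`Runtime`, `DataItem`, `validate`) are illustrative assumptions, not KYVE’s actual API:

```typescript
// Hypothetical pool runtime shape: fetch a data item, then validate a
// proposed item against an independently fetched local copy.
interface DataItem {
  key: string;    // e.g. a block height
  value: unknown; // the fetched payload
}

interface Runtime {
  // Fetch the data item for a given key from the source.
  getDataItem(key: string): Promise<DataItem>;
  // Return true if the proposed item matches the locally fetched one.
  validate(proposed: DataItem, local: DataItem): boolean;
}

// A toy runtime that validates by deep (JSON) equality:
const jsonRuntime: Runtime = {
  async getDataItem(key: string): Promise<DataItem> {
    // A real runtime would query an RPC endpoint or API here.
    return { key, value: { height: Number(key) } };
  },
  validate(proposed: DataItem, local: DataItem): boolean {
    return JSON.stringify(proposed.value) === JSON.stringify(local.value);
  },
};
```

Because the validation function is abstract, each pool can plug in whatever comparison makes sense for its data set while the surrounding upload-and-vote machinery stays the same.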

The chain then computes the result for each bundle of data, marking it valid, invalid, or dropped, and keeps track only of valid data bundles so that access is provided to correct data alone. There are multiple reasons why a data bundle might be dropped: for example, no quorum was reached, or the endpoint was unavailable. This screening process weeds incorrect data out of a developer’s view.

In each pool on the KYVE data lake, one node is responsible for uploading the data, with the rest accountable for voting on whether that data is valid. Once the vote is final, the responsibility of uploading is handed to another randomly selected node. This combats the risk of centralization: if only one node uploaded data at all times, it would present a much larger attack surface.
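The rotation step can be sketched as follows; the selection rule (uniform random over all nodes except the current uploader) is an assumption for illustration, not KYVE’s actual code:

```typescript
// Pick the next uploader at random, excluding the current one so that
// no single node uploads twice in a row.
function nextUploader(
  nodes: string[],
  current: string,
  random: () => number = Math.random
): string {
  const candidates = nodes.filter((n) => n !== current);
  return candidates[Math.floor(random() * candidates.length)];
}
```

Injecting the `random` function makes the selection testable and mirrors how a chain would substitute a deterministic, consensus-derived source of randomness.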

KYVE’s chain evaluates the vote distribution across pool participants to decide each bundle’s outcome.
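A minimal sketch of how such a weighted vote evaluation might work, producing the valid/invalid/dropped outcomes described above (the majority threshold and field names are assumptions, not KYVE’s actual logic):

```typescript
// Evaluate a stake-weighted vote tally into a bundle outcome.
type BundleStatus = "valid" | "invalid" | "dropped";

interface VoteTally {
  valid: number;   // total stake voting "valid"
  invalid: number; // total stake voting "invalid"
  abstain: number; // stake that did not vote
}

function evaluateVotes(tally: VoteTally): BundleStatus {
  const total = tally.valid + tally.invalid + tally.abstain;
  if (total === 0) return "dropped"; // no participants, no quorum
  // Require a strict majority of total stake for a definitive outcome.
  if (tally.valid > total / 2) return "valid";
  if (tally.invalid > total / 2) return "invalid";
  return "dropped"; // no quorum reached
}
```

Counting abstentions against the quorum means an unavailable endpoint or absent voters drops the bundle rather than letting a small minority decide.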

Lastly, to incentivize good node behavior and maintain a steady flow of valid data, we introduced specific pool economics. Put simply, those who require direct, easy access to trustless data act as “funders,” supplying $KYVE tokens as rewards for well-behaved pool participants. There are also “delegators,” who delegate their tokens to support nodes in exchange for token rewards. If a node misbehaves, however, its tokens get slashed.
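A toy model of these economics, with funders topping up a reward pool, rewards split pro rata by stake, and a slash burning part of a misbehaving node’s stake. All numbers, rates, and names here are illustrative assumptions, not KYVE’s actual parameters:

```typescript
// Toy pool economics: funders supply rewards, stakers earn pro rata,
// misbehavior is punished by slashing a fraction of stake.
interface Staker {
  address: string;
  stake: number; // $KYVE at risk
}

function payoutReward(
  funderPool: number,
  stakers: Staker[], // assumed non-empty with positive total stake
  rewardPerRound: number
): { funderPool: number; rewards: Map<string, number> } {
  const reward = Math.min(rewardPerRound, funderPool);
  const totalStake = stakers.reduce((sum, s) => sum + s.stake, 0);
  const rewards = new Map<string, number>();
  for (const s of stakers) {
    // Each participant's share is proportional to their stake.
    rewards.set(s.address, (reward * s.stake) / totalStake);
  }
  return { funderPool: funderPool - reward, rewards };
}

function slash(staker: Staker, fraction: number): Staker {
  // Burn a fraction of a misbehaving node's stake.
  return { ...staker, stake: staker.stake * (1 - fraction) };
}
```

The key property is that honest participation is the profitable strategy: rewards flow to well-behaved nodes and their delegators, while misbehavior costs real stake.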

Our team is constantly investigating new ways to ensure the KYVE Network and its overall infrastructure are fully decentralized ahead of the upcoming mainnet launch, fulfilling KYVE’s mission of providing truly trustless data for building secure and scalable Web3 projects.

Looking Forward

With the current global Web3 adoption rate at 4.2%, according to TechStory, there’s no doubt we have a long way to go. But that doesn’t mean it isn’t time to start building a secure baseline for the projects to come.

Building with trustless data validated in a decentralized way is a necessity for developers sourcing data for their dApps and blockchains. Doing so decreases a project’s risk of data manipulation and attacks and helps improve Web3’s data foundation.

Follow KYVE’s journey in taking a lead role in this movement, enabling everyone to easily access trustless data validated in a decentralized way via our data lake protocol, eliminating data doubt and heavy lifting for builders, node runners, and more. Want to go deeper into the details? Read our docs.

About KYVE

KYVE, the Web3 data lake solution, is a protocol that enables data providers to standardize, validate, and permanently store blockchain data streams. By leveraging permanent data storage solutions like Arweave, KYVE’s Cosmos SDK chain creates permanent backups and ensures the scalability, immutability, and availability of these resources over time.

KYVE’s network is powered by decentralized uploaders and validators funded by $KYVE tokens and aims to operate as a DAO in the near future. This past year, KYVE has gained major support, currently backed by top VCs, including Hypersphere Ventures, Coinbase Ventures, Distributed Global, Mechanism Capital, CMS Holdings, IOSG Ventures, and blockchains such as Arweave, Avalanche, Solana, Interchain, and NEAR.

Join KYVE’s community: Twitter | Discord | Telegram

KYVE, the Web3 data lake solution, enables data providers to standardize, validate, and permanently store blockchain data streams. https://kyve.network