The Biggest Missing Piece in Web3 - Decentralized Databases (Blockchains are Not Enough)
No one wants to pay gas fees to send a like
The one technology still missing to make the web3 dream a reality is the decentralized database, yet I practically never see anyone talking about it, since many seem to be under the delusion that blockchains will store all the world’s data.
Blockchains will probably only ever store 1-2% of the data in web3, since they’re too slow and expensive for anything but the most important data. People will not pay gas fees to send a like, message their friends, post on social media, or update their profile description. Even if layer 2 scaling reduces gas fees to fractions of a penny, people will not pay for things they don’t have to, because they are too used to websites being free. Any business covering the cost in the short term would inevitably look to monetize through other means at the expense of the user (e.g., selling user data, polluting content with ads), bringing us full circle back to the problems of web2.
The other 98% of data in web3 (note: a private PostgreSQL server is not web3) will be stored on immutable file stores like IPFS and in decentralized databases. The problem is that very little attention is being paid to decentralized databases, and the technology is not yet ready for production. Sure, a company can call itself “web3” while using private databases, but that isn’t web3 in any meaningful sense, because users don’t actually own their data.
In fact, I have not found a single decentralized database that is remotely mature enough to be used in a truly decentralized application.
I was looking to build a decentralized social media application where users own their data (and could thus reuse that data in other applications), but quickly gave up after realizing there are no practical decentralized database solutions that make such an app possible. That’s why there are no live examples of truly decentralized applications beyond a rudimentary level of complexity (e.g., a social media site like Twitter or Reddit where users can log in across various devices).
Note: The real test of whether an application is truly decentralized is this: do you actually own your data? Can you access it without anyone’s permission? Can you view your data outside the application? Can other applications be built on this same data? (Having your data in someone else’s public API does not mean you “own” it, since they could shut the API down at any second.)
This is not to knock the groundbreaking work being done by some projects. My goal here is simply to bring more attention to this problem so that we can create solutions more quickly.
Web3 will only happen if developers can actually build decentralized apps, and in the current state of the tooling that’s just not practical outside of blockchain apps, which again will only ever store 1-2% of web data. Anyone preaching about web3 and the benefits of decentralization while being OK with all our data being owned by Twitter, Discord, Reddit, YouTube, etc. is, at best, naïve if they think the blockchain is going to displace these platforms and, at worst, a hypocrite.
Decentralized Architectural Considerations
Before diving into the current state of decentralized databases, it’s worth outlining our goals here and some architectural implications:
The goal of web3 is for users to own their data, and be able to reuse their data across various platforms. We probably want:
Authentication via public/private key cryptography instead of usernames/passwords stored on private servers
The ability to associate one’s data with a global ID that can be reused across various applications and that one has full control over (i.e., not OAuth / “login with Google/Facebook”). Global IDs let a user’s identity be reused across multiple applications rather than being tied to just one.
Global data models / schemas that can be reused across applications (e.g., imagine being able to port profile info, followers, and posts across apps as a user or developer).
Crypto wallets solve the first two, and thus naturally make sense for authentication and global ID.
Applications should be modular/composable (utilize protocols), and code should be open source.
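To make the first two points concrete, here is a minimal sketch of password-less challenge-response login in TypeScript. It uses Node’s built-in `crypto` module. The server stores no password. It only checks that the user can sign a fresh random challenge, and the user’s ID is derived from their public key. The sketch rests on stated assumptions. Ed25519 keys stand in for a crypto wallet, although Ethereum wallets actually sign with secp256k1. The `id:` format and all helper names are hypothetical.

```typescript
import { createHash, generateKeyPairSync, randomBytes, sign, verify, KeyObject } from "crypto";

// Hypothetical global ID: a truncated SHA-256 of the public key (illustrative, not a standard).
function globalIdFor(pub: KeyObject): string {
  const der = pub.export({ type: "spki", format: "der" });
  return "id:" + createHash("sha256").update(der).digest("hex").slice(0, 40);
}

// Server: issue a one-time random challenge (a real server would also expire it).
function issueChallenge(): Buffer {
  return randomBytes(32);
}

// Client: sign the challenge. In practice this runs inside the wallet,
// so the private key never leaves the user's device.
function signChallenge(challenge: Buffer, priv: KeyObject): Buffer {
  return sign(null, challenge, priv); // Ed25519 takes no separate digest algorithm, hence null
}

// Server: accept the login only if the signature verifies against the claimed public key.
function verifyLogin(challenge: Buffer, signature: Buffer, pub: KeyObject): string | null {
  return verify(null, challenge, pub, signature) ? globalIdFor(pub) : null;
}

// Usage: this key pair stands in for a crypto wallet (assumption: Ed25519, not secp256k1).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const challenge = issueChallenge();
const signature = signChallenge(challenge, privateKey);
const userId = verifyLogin(challenge, signature, publicKey);
console.log(userId !== null); // true
```

The same ID comes out no matter which app runs `verifyLogin`. That property is what lets a key pair double as a global ID.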
Decentralization of course brings additional challenges that are easier to deal with in traditional server-client architectures, and may require a degree of centralization or democratic governance to combat:
How does one discover content and users beyond the people one already follows? This would require some sort of registry, which would probably require some degree of centralization to weed out spam and disturbing or illegal content. That said, anyone could create their own separate registry (similar to how anyone can run a Mastodon server).
How does one combat spam? This is by no means unique to web3, but it will require new systems (e.g., decentralized governance, electing moderators).
Performance - This is easily the Achilles’ heel of decentralized applications right now; IPFS is relatively slow. How can an application improve performance while still maintaining decentralization? It’s not an insurmountable problem; it just requires resources dedicated to solving it.
Filtering out spam and disturbing or illegal content will inevitably entail a degree of centralization. Fortunately, decentralized governance (e.g., DAOs) could enable democratic management of spam and disturbing content (e.g., by electing moderators and/or through liquid democracy). And since the data is decentralized and the code open source, anyone could run their own version of the app with its own moderation system.
Current Decentralized Database Solutions
Here are a few potential solutions I briefly investigated, ordered from lower to higher levels of abstraction. In their current state, I ultimately found each of them too immature to build anything of value on without a significant amount of work (note: this is not a technical deep dive, more a cursory “first impressions” and “things to look into”).
IPFS (and IPNS, IPFS PubSub, Libp2p)
Although IPFS is best known as an immutable file store, it also has IPNS (the InterPlanetary Name System), which can be used to create mutable records. Perhaps this could serve as the basis of a decentralized database.
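A toy model shows how a mutable record can sit on top of immutable, content-addressed storage. Content is stored under the hash of its bytes. A stable name, derived from the owner’s public key, points to the latest hash through a signed record with an increasing sequence number. This is a simplified illustration written against Node’s `crypto` module, not IPFS code. Real CIDs use multihash encoding, real IPNS records carry more fields (such as an expiry), and every name and function below is hypothetical.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "crypto";

// Immutable layer: content is stored under the hash of its bytes (a simplified "CID").
const blocks = new Map<string, string>();
function addContent(content: string): string {
  const cid = "sha256-" + createHash("sha256").update(content).digest("hex");
  blocks.set(cid, content);
  return cid;
}

// Mutable layer: a signed record saying "this name currently points to this CID".
interface NameRecord {
  value: string;     // CID the name points to
  sequence: number;  // higher sequence = newer record
  signature: Buffer; // owner's Ed25519 signature over value and sequence
}

// The name is derived from the owner's public key, so it never changes between updates.
function nameFor(pub: KeyObject): string {
  const der = pub.export({ type: "spki", format: "der" });
  return "name-" + createHash("sha256").update(der).digest("hex").slice(0, 16);
}

function signedBytes(value: string, sequence: number): Buffer {
  return Buffer.from(`${value}|${sequence}`);
}

function makeRecord(cid: string, sequence: number, priv: KeyObject): NameRecord {
  return { value: cid, sequence, signature: sign(null, signedBytes(cid, sequence), priv) };
}

// A resolver keeps only the newest validly signed record it has seen for each name.
const records = new Map<string, NameRecord>();
function accept(pub: KeyObject, rec: NameRecord): boolean {
  const name = nameFor(pub);
  const current = records.get(name);
  const valid = verify(null, signedBytes(rec.value, rec.sequence), pub, rec.signature);
  if (!valid || (current && rec.sequence <= current.sequence)) return false;
  records.set(name, rec);
  return true;
}

function resolve(name: string): string | undefined {
  const rec = records.get(name);
  return rec ? blocks.get(rec.value) : undefined;
}

// Usage: publish two versions of a profile under one stable name.
const owner = generateKeyPairSync("ed25519");
accept(owner.publicKey, makeRecord(addContent('{"bio":"v1"}'), 1, owner.privateKey));
accept(owner.publicKey, makeRecord(addContent('{"bio":"v2"}'), 2, owner.privateKey));
console.log(resolve(nameFor(owner.publicKey))); // {"bio":"v2"}
```

Because the name depends on the key rather than the content, links to the name keep working across updates. Old versions also remain retrievable by their content hashes.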
OrbitDB is a database built on top of IPFS that uses IPFS PubSub for database syncing and CRDTs (conflict-free replicated data types) for conflict resolution. It seems that authentication via an Ethereum wallet could be set up (using an identity provider or access controller). I’m not sure whether one’s Ethereum public key (or ENS name) could be linked back to an OrbitDB database ID, though (i.e., one’s Ethereum address serving as a Global ID). That would be enormously useful for finding other users without having to keep track of yet another user ID.
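The property CRDTs provide can be shown with one of the simplest examples, a state-based last-writer-wins (LWW) map: replicas accept writes independently and converge to the same state once they exchange updates, in any order. This is not OrbitDB’s implementation, which builds its stores on an append-only operation log. The class and method names here are hypothetical.

```typescript
// One stored value plus the metadata needed to decide which concurrent write wins.
interface Entry {
  value: string;
  clock: number;   // Lamport-style logical clock
  replica: string; // writer ID, used only to break clock ties deterministically
}

class LwwMap {
  private entries = new Map<string, Entry>();
  private clock = 0;
  private readonly replica: string;

  constructor(replica: string) {
    this.replica = replica;
  }

  set(key: string, value: string): void {
    this.clock += 1;
    this.entries.set(key, { value, clock: this.clock, replica: this.replica });
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  // True if entry a should win over entry b: higher clock first, then higher replica ID.
  private static wins(a: Entry, b: Entry): boolean {
    return a.clock > b.clock || (a.clock === b.clock && a.replica > b.replica);
  }

  // Merge another replica's state. The rule is commutative, associative and idempotent,
  // so replicas converge regardless of merge order or repeated merges.
  merge(other: LwwMap): void {
    for (const [key, theirs] of other.entries) {
      const mine = this.entries.get(key);
      if (!mine || LwwMap.wins(theirs, mine)) this.entries.set(key, theirs);
      this.clock = Math.max(this.clock, theirs.clock);
    }
  }

  snapshot(): Record<string, string> {
    return Object.fromEntries([...this.entries].map(([k, e]) => [k, e.value]));
  }
}

// Usage: the same user edits a profile on two devices while offline, then they sync.
const laptop = new LwwMap("laptop");
const phone = new LwwMap("phone");
laptop.set("bio", "hello");
phone.set("bio", "hi there");
phone.set("name", "Sam");
laptop.merge(phone);
phone.merge(laptop);
console.log(JSON.stringify(laptop.snapshot()) === JSON.stringify(phone.snapshot())); // true
```

Breaking ties by replica ID is what makes merge order irrelevant. The trade-off is that one of two concurrent writes is silently discarded, which is why richer CRDTs or operation logs are used when every write must be kept.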
One flaw is that write access to a database cannot currently be removed without changing the database address (the readme claims a future version will support this).
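One way to see why removing a writer would change the address: if a database address is a content hash of a manifest that embeds the list of writers, then any edit to that list yields a different hash. The toy model below illustrates that mechanism under exactly this assumption. The `/db/` prefix, the `Manifest` fields and `addressOf` are hypothetical, not OrbitDB’s actual manifest format.

```typescript
import { createHash } from "crypto";

// Hypothetical manifest: the writer list is part of the hashed content.
interface Manifest {
  name: string;
  type: string;
  writers: string[]; // IDs allowed to write
}

function addressOf(m: Manifest): string {
  // Sort writers so the address depends on who may write, not on list order.
  const canonical = JSON.stringify({ name: m.name, type: m.type, writers: [...m.writers].sort() });
  return "/db/" + createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

const original: Manifest = { name: "posts", type: "eventlog", writers: ["alice", "bob"] };
const revoked: Manifest = { ...original, writers: ["alice"] };

console.log(addressOf(original) === addressOf(revoked)); // false: removing bob yields a new address
```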
Ceramic enables one to store data attached to a global user identity (e.g., an Ethereum address) on IPFS in custom data model schemas. It adds an extra layer: apps connect to a Ceramic node (which anyone can run), and these nodes communicate via libp2p (which originated in IPFS) and are responsible for execution.
The easiest way to see Ceramic in action is to connect your crypto wallet to create a profile on https://self.id/, then go to https://dns.xyz and see your profile information already populated there.
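The core idea behind that demo can be sketched in plain TypeScript. Data is keyed by a global ID and shaped by a shared schema, so a second app can read what the first one wrote. Everything here is hypothetical and in-memory: the `Profile` model, the `network` map standing in for Ceramic nodes, and the helper functions are illustrations, not Ceramic’s API. Only the `did:pkh:eip155:<chainId>:<address>` ID pattern is a real convention for Ethereum accounts. Lowercasing the address is an assumption made here for a canonical key.

```typescript
// Hypothetical shared data model: any app that agrees on it can read and write the same profile.
interface Profile {
  name: string;
  description?: string;
}

// Runtime check standing in for schema validation (not Ceramic's validation API).
function isProfile(x: unknown): x is Profile {
  if (typeof x !== "object" || x === null) return false;
  const p = x as Record<string, unknown>;
  return typeof p.name === "string" && (p.description === undefined || typeof p.description === "string");
}

// Global ID for an Ethereum account in the did:pkh style (chain 1 = Ethereum mainnet).
function globalId(address: string, chainId = 1): string {
  return `did:pkh:eip155:${chainId}:${address.toLowerCase()}`;
}

// Stand-in for the shared network: records are keyed by (global ID, model), not by app.
const network = new Map<string, unknown>();

function writeProfile(id: string, profile: unknown): void {
  if (!isProfile(profile)) throw new Error("document does not match the Profile model");
  network.set(`${id}/Profile`, profile);
}

function readProfile(id: string): Profile | undefined {
  const doc = network.get(`${id}/Profile`);
  return isProfile(doc) ? doc : undefined;
}

// Usage: "app A" (a profile editor) writes, and "app B" (a different site) reads the same record.
const address = "0xAbC0000000000000000000000000000000000001";
const me = globalId(address);
writeProfile(me, { name: "Sam", description: "building decentralized apps" });
console.log(readProfile(me)?.name); // Sam
```

Neither app owns the record. Both depend only on the user’s ID and the agreed model, which is the kind of portability the global schemas point above calls for.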
I was really excited to build out a prototype with Ceramic, but found the documentation to be extremely lacking, much of it totally outdated (in their defense they’re in the process of revamping it).
There are no repositories demonstrating basic CRUD functionality with custom data models (I find the fastest way to get going is to work off an existing repo rather than trudging through coding tutorials and wasting time with boilerplate). The Discord chat also doesn’t seem to be very active.
That said, the following repo seems to demonstrate CRUD functionality with a custom model. This is probably where I’d start if I were looking to build something on Ceramic.
I’m very excited about the decentralized future. Hopefully this article brings attention to the need for decentralized databases, Global IDs, schemas, and better resources to enable the development of decentralized applications. The initial prototypes will be slow and janky, but when people see the possibilities that data ownership enables, more users and developers will be drawn in, and they will then be motivated to iron out the performance issues.
It’s time to stop treating web3 and decentralization as synonyms for blockchain, and to admit that blockchains will only ever serve the most valuable data, because nobody wants to pay gas fees to send a like.