Deconstructing web3: Blockchain as a database
A dive into web3 from a technological standpoint, exploring the benefits and downsides compared to web2
Simple Instagram
Let’s try to understand the fundamentals of web3 and decentralization by designing a simple version of Instagram, in the style of a typical systems architecture interview question at a tech company. In this simple version:
- Users can upload photos
- Users can follow other users
- Users can see a feed of photos from users they follow
- All photos and follow relationships are public by default; anyone can read anyone’s feed
Web2 architecture
Database: There are two fundamental pieces of data that we need to store per user:
- List of photos they have uploaded
- List of users they follow
Let’s not worry about performance for now. In its simplest form, the data can be stored as a key-value mapping, with the key being a userId and the value being the following JSON object.
{
  "photos": [
    "0xa61b94…", // Binary of photo1
    "0xcf1230…" // Binary of photo2
  ],
  "following": [
    userId1,
    userId2
  ]
}
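A minimal Python sketch of this key-value layout, assuming an in-memory dictionary stands in for the database. The function names `upload_photo`, `follow`, and `feed` are illustrative and not taken from any real service, and the feed is simply concatenated in follow order rather than sorted by time.

```python
from collections import defaultdict

# In-memory stand-in for the key-value database: userId -> record.
db = defaultdict(lambda: {"photos": [], "following": []})


def upload_photo(user_id, photo: bytes) -> None:
    """Append a photo's raw bytes to the user's record."""
    db[user_id]["photos"].append(photo)


def follow(user_id, target_id) -> None:
    """Record that user_id follows target_id, ignoring duplicates."""
    if target_id not in db[user_id]["following"]:
        db[user_id]["following"].append(target_id)


def feed(user_id) -> list:
    """Collect the photos of every user that user_id follows."""
    photos = []
    for followed in db[user_id]["following"]:
        photos.extend(db[followed]["photos"])
    return photos
```

Because everything is public in this simple version, `feed` takes no caller identity: anyone can compute anyone’s feed.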
UserId generation: Let’s build an authentication / identity management service that takes in a user secret (a password, a two-factor code, or anything else) and outputs an integer, which is the userId.
Data validation: We can’t let users write any random binary blob; it needs to be a valid JPEG encoding. Additionally, we don’t want to allow all photos on our service (e.g. photos containing violence / nudity …), so we also build content moderation logic into data validation.
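As a rough sketch of what such a validation step could look like in Python: the check below only looks at the JPEG start and end markers, which is a simplification (a real service would fully decode the image, and some valid files carry trailing bytes after the end marker). `passes_moderation` is a hypothetical placeholder, not a real moderation API.

```python
JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the 0xFF that begins the next marker
JPEG_END = b"\xff\xd9"        # EOI marker


def looks_like_jpeg(blob: bytes) -> bool:
    """Cheap structural check: the blob starts and ends with JPEG markers."""
    return len(blob) > 4 and blob.startswith(JPEG_START) and blob.endswith(JPEG_END)


def passes_moderation(blob: bytes) -> bool:
    """Hypothetical placeholder: a real service would call a classifier
    or a human review queue here. This stub accepts everything."""
    return True


def validate_photo(blob: bytes) -> bool:
    """Run format validation first, then content moderation."""
    return looks_like_jpeg(blob) and passes_moderation(blob)
```

In the web2 design this function lives inside the company’s own service, so the company alone decides what `passes_moderation` does.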
Putting these all together we get:
Web3 architecture
Database: Now let’s see how all these components can be built on top of a blockchain, say Ethereum. Imagine Ethereum to be akin to an AWS DynamoDB instance. The only difference is that instead of accessing it through an AWS proxy, you interact (read/write) with it through a decentralized network of nodes. These nodes can be run by anyone; you can spin up your own node if you want. All these different nodes maintain a consistent view of the database through a secret sauce called a consensus mechanism (e.g. Proof of Work / Proof of Stake). The exact consensus mechanism is irrelevant to this design; it’s an internal detail of the blockchain. The database component now looks like:
UserId generation: There’s no authentication / identity management service. Instead of handing over their secret, users now sign the data they send with their secret (a.k.a. private key), which is a trustless way of proving that they hold the secret without revealing it. So the userId, instead of being an integer, becomes their public key.
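To make the sign-then-verify idea concrete, here is a small Python sketch. Note the substitution: Ethereum uses ECDSA over the secp256k1 curve, whereas this sketch uses a Lamport one-time signature, chosen only because it can be written with the standard library. A Lamport key pair may safely sign just one message, since each signature reveals half of the private values. Deriving `user_id` as a hash of the public key is an assumption made for readability.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def generate_keypair():
    """Private key: 256 pairs of random values. Public key: their hashes."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(256)]
    public_key = [(_h(zero), _h(one)) for zero, one in private_key]
    return private_key, public_key


def sign(private_key, message: bytes) -> list:
    """Reveal one private value per message bit (one-time use only)."""
    return [private_key[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Anyone holding the public key can check the signature."""
    if len(signature) != 256:
        return False
    return all(_h(value) == public_key[i][bit]
               for i, (value, bit) in enumerate(zip(signature, _bits(message))))


def user_id(public_key) -> str:
    """Assumed convention: the userId is a hash of the public key."""
    return _h(b"".join(zero + one for zero, one in public_key)).hex()
```

A node receiving a write would call `verify` with the claimed public key before accepting it; no central service ever sees the private key.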
Data validation: The data validation logic is hardcoded and shared among all Ethereum nodes, and the protocol runs it before every write. Typically the code behind the logic is open-sourced on GitHub so anyone can read and scrutinize it. There’s no need for a separate validation service.
The governance of the validation logic (i.e. the evolution of content moderation policies) can be configured on the blockchain. It can be one of:
- Constant: Once written, set in stone forever with no scope for change
- Centrally Controlled: A single entity, e.g. Instagram can unilaterally upgrade the validation logic
- Some form of democratic governance: Anyone can propose changes and users must vote to implement any change
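The three governance options above can be sketched as a single upgrade check in Python. This is an illustration only: the `Governance` enum, the `upgrade_allowed` function, and the simple-majority rule for the voting case are all assumptions, not how any particular blockchain implements governance.

```python
from enum import Enum


class Governance(Enum):
    CONSTANT = "constant"  # validation logic can never change
    CENTRAL = "central"    # a single admin can change it
    VOTE = "vote"          # users vote on proposed changes


def upgrade_allowed(mode, proposer, admin=None, votes_for=0, votes_cast=0) -> bool:
    """Decide whether a proposed change to the validation logic may be applied."""
    if mode is Governance.CONSTANT:
        return False
    if mode is Governance.CENTRAL:
        return proposer == admin
    # VOTE: assumed rule, a strict majority of the votes cast must be in favor.
    return votes_cast > 0 and votes_for * 2 > votes_cast
```

For example, `upgrade_allowed(Governance.CENTRAL, "instagram", admin="instagram")` is allowed, while the same proposal under `Governance.CONSTANT` is always rejected.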
Combining these components, the overall architecture becomes:
Tradeoffs
Both architectures provide all the user functionality we want from the simple Instagram application. However, there are some key differences:
Data ownership: In the web2 design, all data is owned by Instagram. Users ask Instagram to update or read their data, but they have no direct access, and Instagram can modify data en route if it wants to. Web3 is a fundamentally different way of thinking about data ownership: there’s no intermediary, and users have full ownership of their data. They are the only ones who can modify it, and they can define rules on who can read it. The question of data privacy does not even arise in the web3 world.
Note: Technically anyone can read the data on a blockchain, but users can write encrypted data to restrict access and define custom views.
Trustless identity management: Web3 is a fundamentally different way of thinking about online identities: there’s no database containing userIds. Users have full control over their identities; they can create new ones whenever they want, and they don’t need to rely on anyone to give them access to their identity. On the flip side, there’s no account recovery process. If you lose your secret, no one can help you recover your account.
Transparency: In web3, the validation logic is completely transparent to everyone; there are no hidden algorithms. Additionally, you can configure it so that no single entity decides what’s allowed and what’s not.
System cost: In web2, each photo upload will probably cost a few cents, whereas on Ethereum it will probably cost thousands of dollars. Blockchains are orders of magnitude behind with respect to cost.
Performance: Blockchains perform poorly. It can take minutes to get confirmation for a write, and you can only do a handful of writes at once. Web2 systems, by comparison, confirm writes in a few milliseconds and can handle millions of writes per second.
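A back-of-the-envelope calculation shows where a figure in the thousands of dollars can come from. The Python sketch below assumes the photo is written directly into contract storage at roughly 20,000 gas per new 32-byte slot (an approximation that ignores other transaction overhead); the photo size, gas price, and ETH price are purely illustrative inputs, not measurements.

```python
GAS_PER_NEW_SLOT = 20_000  # approximate gas to write one new 32-byte storage slot
SLOT_BYTES = 32
GWEI_PER_ETH = 1_000_000_000


def storage_cost_usd(photo_bytes: int, gas_price_gwei: float, eth_price_usd: float) -> float:
    """Estimate the cost of storing a photo on-chain under the assumptions above."""
    slots = -(-photo_bytes // SLOT_BYTES)  # ceiling division
    gas = slots * GAS_PER_NEW_SLOT
    eth = gas * gas_price_gwei / GWEI_PER_ETH
    return eth * eth_price_usd


# Illustrative inputs: a 100 KB photo, 50 gwei gas price, $2,000 per ETH.
estimate = storage_cost_usd(100_000, gas_price_gwei=50, eth_price_usd=2_000)
```

With these assumed inputs the photo needs 3,125 slots and 62.5 million gas, which is 3.125 ETH, or about $6,250. Different gas and ETH prices move the number a lot, but not by enough to approach web2 storage costs.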
Which one is better?
Say you want to build a competitor to Instagram. Which design should you use today?
The notions of data ownership, trustless identity and transparency sound good in theory and could solve a lot of the hot issues that big tech is facing today (data privacy, content moderation policies …). However, it still remains to be validated that users really value these and would be ready to overcome the inertia of moving to new systems en masse. Practically, the cost and performance concerns today make it impossible to build a scaled application based completely on a blockchain database.
However, there are two things that make blockchains attractive:
- They are scaling at an exponential rate, and many people (including me) believe that in a few years blockchains will scale to a point where cost is no longer a concern.
- Some domains, particularly finance (DeFi) and collectibles (NFTs), have already seen massive user adoption where the benefits have outweighed the existing costs.
P.S. If you have any thoughts or feedback please comment! I’m fairly new to the blockchain world.