Ideas for a Faster/Cheaper Tezos
Introduction
Preliminary benchmarks (done at Nomadic in Jan 2019) led us to think that the biggest time cost of validating blocks is hard-disk access. Logically, this makes big_map instructions and fetching contract code from disk very expensive in terms of gas.
This post proposes several ideas for improving access to big_maps and other Tezos storage. Through this, I hope to gauge interest in some of these ideas, particularly whether it makes sense to include them in a future amendment proposal or shell upgrade.
Hash / Merkle Commitments
Removing Disk Access
If hard-disk accesses are indeed what takes longest when validating blocks, then one solution is to remove them entirely. Tezos currently uses Irmin for storage, which already builds a Merkle tree of the whole state (because it is based on Git). Through Merkle proofs, it is thus possible to assert that something is part of the storage without having to actually retrieve it!
Writes to the storage are not as much of an issue, for two reasons:
- Most operations are reads. This is trivially true, because there is no write-only operation. It is more interestingly true because the code of smart contracts can never be rewritten, and a lot of smart-contract code is about validation (which does not involve writes).
- Whereas reads have to be done immediately when a value is needed, writes can be pooled, parallelized and done later.
Mechanism
This would work like this: when sending a Tx, you would have the opportunity (which should be handled by libraries like Taquito rather than exposed directly at the UX layer) to commit to what you plan to read (such as contract code, storage, big map keys, etc.) with Merkle proofs. Done this way, bakers would not need to retrieve this information from the hard-disk, so the operation would only cost as much as checking the Merkle proofs. A similar thing can be done with big maps.
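To illustrate the principle, here is a minimal sketch of proof verification, assuming a binary Merkle tree and SHA-256 via the Digestif library. This is not the actual Irmin or Plebeia proof format, only the general mechanism a baker would run instead of a disk read:

```ocaml
(* Minimal sketch of Merkle proof verification, assuming a binary Merkle
   tree and SHA-256 via the Digestif library. This is not the actual
   Irmin or Plebeia proof format, only an illustration of the principle:
   the baker recomputes a root hash from the committed value and compares
   it with the state root it already holds, with no disk access. *)

type side = Left | Right

(* A proof is the list of sibling hashes on the path from the leaf up to
   the root, with the side on which each sibling sits. *)
type proof = (side * string) list

let hash s = Digestif.SHA256.(to_raw_string (digest_string s))

let verify ~root ~value (proof : proof) =
  let computed =
    List.fold_left
      (fun acc (side, sibling) ->
        match side with
        | Left -> hash (sibling ^ acc)   (* sibling is on the left  *)
        | Right -> hash (acc ^ sibling)) (* sibling is on the right *)
      (hash value) proof
  in
  String.equal computed root
```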
A problem with this approach is that if the thing you are building a Merkle proof for (say, the big map of a contract where you check a balance) changes in the meantime, the Tx is not valid anymore. However:
- The code of a smart contract never changes.
- It is possible to cache big map writes that happened during the last few blocks. Given that big map accesses have a gas cost, this yields a bound on the cache size.
- The part of a contract's storage that is modified (and that is not a big map) seems harder to deal with.
It is also possible to mitigate this with ad-hoc caching mechanisms.
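As a rough sketch of the second mitigation (all types and names here are hypothetical), a bounded cache of recent big map writes could be consulted before relying on a proof built against a slightly older context:

```ocaml
(* Rough sketch, with hypothetical types: a bounded cache of big map writes
   from the last [n_blocks] blocks. When validating an operation that carries
   a Merkle proof built against an older context, the validator first checks
   whether the key was overwritten since; only otherwise does it rely on the
   proof. The gas price of big map writes bounds how large this cache can grow. *)

module Key = struct
  type t = { big_map_id : int; key_hash : string }
  let compare = compare
end

module KeyMap = Map.Make (Key)

type cache = {
  n_blocks : int;                   (* how many recent blocks we keep *)
  writes : (int * bytes) KeyMap.t;  (* key -> (block level, new value) *)
}

(* Record a write made at [level]. *)
let record cache key level value =
  { cache with writes = KeyMap.add key (level, value) cache.writes }

(* Drop writes older than the retention window. *)
let prune cache ~current_level =
  let keep (level, _) = current_level - level < cache.n_blocks in
  { cache with writes = KeyMap.filter (fun _ v -> keep v) cache.writes }

(* [Some value] if the key was written recently, [None] if the validator
   must rely on the Merkle proof supplied with the operation. *)
let recent_write cache key =
  Option.map snd (KeyMap.find_opt key cache.writes)
```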
We of course have to mention Plebeia, the name of both a standard and an implementation of an alternative backend to Irmin. It has first-class support for Merkle proofs and was designed with proof efficiency in mind, as we will see.
Preliminary numbers for the size of Merkle Proofs
It is important to realize that this method trades space for speed. Indeed, the Merkle proofs can be very large.
Here are some estimates, assuming 1 GB of data with a uniform distribution of keys:
- For Irmin 2, which roughly has 256-ary trees and 32-byte hashes, this yields approximately 1000 hashes per proof, which in turn yields an average proof size of 32 kilobytes. Caching the first two layers only divides the size of the proofs by two.
- For Plebeia, which roughly has binary trees and 32-byte hashes, this yields approximately 60 hashes per proof, which in turn yields an average proof size of 2 kilobytes.
A typical transaction with a contract might involve retrieving its script, its storage, and reading two big map keys (in an NFT transfer, the balances of the sender and the recipient). This would require 4 Merkle proofs.
In comparison, the maximum size of an operation is given by max_operation_data_length, which is currently 16 kB.
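To make the comparison concrete, here is the arithmetic behind those figures, using the rough hash counts quoted above (an illustrative calculation, not a measurement):

```ocaml
(* Quick arithmetic from the rough estimates above: per-proof size, and the
   total proof bytes for a typical contract call needing 4 proofs (script,
   storage, two big map keys). Figures are the post's estimates, not
   measurements. *)

let hash_bytes = 32
let irmin_proof_hashes = 1000
let plebeia_proof_hashes = 60

let () =
  let irmin = irmin_proof_hashes * hash_bytes in      (* ~32_000 bytes *)
  let plebeia = plebeia_proof_hashes * hash_bytes in  (* ~1_920 bytes  *)
  Printf.printf "Irmin 2: %d bytes per proof, %d bytes for 4 proofs\n"
    irmin (4 * irmin);
  Printf.printf "Plebeia: %d bytes per proof, %d bytes for 4 proofs\n"
    plebeia (4 * plebeia);
  (* With max_operation_data_length at ~16 kB, only the Plebeia figures
     leave a comfortable margin for the rest of the operation. *)
  print_endline "Operation size limit: ~16 kB"
```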
Hot and Cold Storage
Another solution to the problem of storage is to simply cache things. Data would be split between “hot” storage (usually RAM) and “cold” storage (usually a hard-disk). The idea is for regularly accessed data to always be hot, so that accessing it is faster.
The Naive Approach
The naive way to solve this problem is to implement the caching strategies in “the backend”. “The backend” is the database Tezos uses to store the data required by the protocol. The main Tezos backend (as of now) is Irmin 2. Irmin 2 already implements some caching strategies, so that its access times are acceptable.
However, gas cannot be made cheaper just because caching strategies have been implemented. Even though accesses are faster on average, in a blockchain we do not care about the average case but about the worst case! Indeed, blockchains are adversarial environments, and someone meaning to halt the chain could craft a sequence of operations that triggers the worst cases for Irmin 2. Validating a block composed of those operations could then take too long, halting the chain.
So far, Tezos gas has been priced on the worst case (fresh reads and writes), as if every access hit the disk.
The Right Way
What the previous paragraph tells us is that caching should not be opaque to the protocol, because that prevents the protocol from pricing it accordingly. If caching is implemented in the backend and we want to make gas cheaper thanks to it, the protocol must be aware of it, and access costs must reflect it.
An example would be to implement a FIFO policy at the protocol layer, and have it communicate with the backend to ensure that data is cached according to this policy.
Done this way, we successfully avoid those attacks!
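As an illustration, here is a rough sketch of such a protocol-level FIFO cache (the types and gas numbers are made up). Because eviction is deterministic and the size is bounded, every node agrees on what is cached and can price each access accordingly:

```ocaml
(* Rough sketch of a protocol-level FIFO cache with a hard size bound, so
   the protocol can price each access deterministically: a cheap gas cost
   if the key is known to be cached, the full disk cost otherwise.
   Types and gas costs are hypothetical, for illustration only. *)

type key = string

type cache = {
  capacity : int;              (* bound in entries, known to all nodes *)
  queue : key Queue.t;         (* FIFO order of insertion *)
  mutable members : key list;  (* membership; a real version would use a set *)
}

let create capacity = { capacity; queue = Queue.create (); members = [] }

let mem cache k = List.mem k cache.members

(* Insert a key, evicting the oldest entry when the cache is full. The
   eviction order is deterministic, so every node agrees on cache contents. *)
let insert cache k =
  if not (mem cache k) then begin
    if Queue.length cache.queue >= cache.capacity
       && not (Queue.is_empty cache.queue)
    then begin
      let evicted = Queue.pop cache.queue in
      cache.members <- List.filter (fun k' -> k' <> evicted) cache.members
    end;
    Queue.push k cache.queue;
    cache.members <- k :: cache.members
  end

(* Hypothetical gas costs: cached reads priced as RAM accesses,
   uncached reads priced as disk accesses. *)
let read_cost cache k = if mem cache k then 50 else 5_000
```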
There are two minor concerns with regard to caching:
- The size of the cache directly constrains the hardware requirements of bakers. If the cache is 16 GB, then it is not possible to run a baking node with less than 16 GB of RAM. So there is a direct trade-off between decentralization and cache efficiency.
- There would be two pricing levels for transactions on smart contracts: one for regularly accessed contracts, and one for the others. This might create a small barrier to entry for new dapps. Given the history of smart contracts on blockchains, it should not be a real problem for quite a while, but it is something to keep in mind.
Dynamic Burn Pricing
Right now, there is a fixed fee for storing bytes. However, bakers might want to modulate this fee depending on how much storage has been consumed so far.
Typically, bakers might want to make the first gigabytes much cheaper, in order to incentivize early activity on the network.
Whole Storage in RAM
With dynamic burn pricing, it becomes possible to keep the whole storage in RAM. Bakers could simply set prohibitive prices as the size of the storage gets closer to the limit.
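One possible shape for such a policy, as a purely illustrative sketch (the cap, base price, and formula are assumptions, not a proposal): a per-byte burn that stays low while the context is small, and diverges as total storage approaches a hard cap:

```ocaml
(* Sketch of a dynamic per-byte burn: cheap while total storage is far from
   a hard cap, and increasing sharply as utilization approaches the cap,
   which makes storage effectively prohibitive near the limit. The cap,
   base price and curve are all made-up parameters for illustration. *)

let storage_cap_bytes = 32_000_000_000.  (* e.g. a 32 GB RAM budget *)
let base_burn_per_byte = 0.000_25        (* illustrative base price, in tez per byte *)

let burn_per_byte ~used_bytes =
  let utilization = used_bytes /. storage_cap_bytes in
  (* The price diverges as utilization approaches 1, so the cap is never reached. *)
  base_burn_per_byte /. (1. -. utilization)

let () =
  List.iter
    (fun used ->
      Printf.printf "at %.0f%% full: %.6f tez/byte\n"
        (100. *. used /. storage_cap_bytes)
        (burn_per_byte ~used_bytes:used))
    [ 0.; 8e9; 16e9; 24e9; 30e9 ]
```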
A preliminary benchmark shows that holding 80 million accounts (roughly the same number as Ethereum) would take 24 GB. This is:
- On a toy in-memory environment, without any kind of optimization
- Not counting smart contracts
As such, that strategy is not necessarily out of the realm of possibilities.
(I will add more to this article as more benchmarks are done.)
Storage Markets Weirdness
Having a dynamic burn price would mean that people will buy storage now in order to sell it later when it has a higher price. While this in itself is sound and useful, problems might arise from it:
- Market manipulations
- Hoarding through Monopolies / Oligopolies
- Higher price than needed because of perceived moneyness/currencyness
- Useless storage use in order to keep your title to your storage space
It is not clear yet how all of these problems can be solved, and I am interested in potential solutions.
Dynamic Ownership of Memory
Solutions based on rents (fixed amount per month, Harberger taxes, etc.) have to deal with dynamic ownership of memory. Concretely, who should pay for the storage?
The naive approach is to have smart contracts pay for their own storage, with anyone being able to credit a contract with Tezzies to pay for its storage. However, that would leave the way open to a massive attack. Let's consider an NFT contract:
- A user buys a lot of tokens (ten thousand)
- The contract has to have enough funds to afford the storage space of all of those tokens
- No one funds it because of a textbook tragedy of the commons
- The contract gets destroyed and all users lose their NFTs
The other approach would be to have the contract itself deal with the storage logic. For instance, it could check that each user pays for their NFTs, and destroy a user's NFTs if they do not pay, thus freeing storage space.
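As a sketch of what that contract-side logic could look like (written in plain OCaml for illustration, not actual Michelson or LIGO; all names are hypothetical): each owner keeps a deposit alongside their tokens, and anyone can trigger a collection step that evicts owners whose deposit no longer covers their storage:

```ocaml
(* Hedged sketch (plain OCaml, not a real contract) of an NFT contract that
   manages its own storage rent: every owner maintains a deposit, and anyone
   can evict an owner whose deposit no longer covers the rent for the tokens
   they hold, which frees the corresponding storage. *)

module OwnerMap = Map.Make (String)

type account = {
  tokens : int list;  (* token ids held by this owner *)
  deposit : float;    (* tez deposited to pay for their storage *)
}

type state = { ledger : account OwnerMap.t; rent_per_token : float }

(* Rent owed by an owner for one period. *)
let rent_due state account =
  float_of_int (List.length account.tokens) *. state.rent_per_token

(* Anyone may call this: owners whose deposit no longer covers their rent
   lose their tokens, and the associated storage is freed. *)
let collect_rent state =
  let ledger =
    OwnerMap.filter_map
      (fun _owner account ->
        let due = rent_due state account in
        if account.deposit >= due then
          Some { account with deposit = account.deposit -. due }
        else None (* evicted: tokens destroyed, storage freed *))
      state.ledger
  in
  { state with ledger }
```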
While this is theoretically sound, it would make for a bad developer experience (having to think about an ownership model) as well as a terrible user experience (having to think about paying for the “storage” you consume in each Tezos-based app you interact with).
We see that there is a need to find an approach that works reasonably for most simple cases, yet is generic enough so that we can customize it in more complex cases, when it is required. This design problem is similar to dealing with Concurrency.
If you are interested in such things, please make yourself known! I would love to read ideas from other people.
Also, if you know people who are passionate or talented in those domains, please have them reach out too!