Tarides continues to develop fast, reliable storage components for Tezos, so this report contains our progress on the Tezos storage projects during July and August 2021. We’re excited to showcase our advancements to both the OCaml and Tezos communities.
As the chain’s activity keeps growing, it’s of the utmost importance to continue improving the overall performance and scalability of the Tezos storage. This summer, we mainly focused on `index`, a component used to index Tezos’ context elements, which is very I/O intensive. We are therefore trying to (1) reduce the number of indexed objects and (2) explore alternative index implementations with better performance characteristics. We’ve also been working on optimising how data is organised in the context; hence, our focus—in collaboration with DaiLambda—was on flattening the Tezos data model, now part of the Hangzhou proposal. Finally, to better understand the performance impact of these changes, we continued to make progress on the record/replay benchmarks for the bootstrap trace, and we’re setting up Irmin/Tezos benchmarks on the Raspberry Pi. Read more details about each of these in the report below.
Before continuing, we’d like to extend special thanks to the engineers who contributed to this report and all their hard work. If you are new to Irmin, please read a short introduction in our Irmin 2021 Update and on our website Irmin.org.
Index is a scalable component of `irmin-pack`—the Irmin backend used in the Tezos storage layer. `irmin-pack` writes data to an append-only file called a pack file. In order to efficiently retrieve the data, it uses the Index library, which maps hashes of objects to their location in the pack file.
The index implementation in `irmin-pack` has a hard job to do: the Tezos context stores many millions of individual objects—all contained in the pack file—and each of them must be addressable by its hash. Not only does the index need to be compact on disk, it also needs to support very fast lookups, since one is performed for every object read.
The index was originally optimised for speedy read performance at the cost of potentially-slow writes on very large stores. However, this initial design choice is becoming less tenable as the Tezos blockchain grows and the index used in Octez increases in size. We’re working on improving the indexing mechanism to better meet the requirements of a modern Tezos node.
In the search for an alternative way to store data in the index, we’re exploring a method based on structured keys. This consists of adding more information to the store keys so that the index need not be accessed on every read. This new method would lead to fewer objects stored in the index, at the cost of potentially duplicating objects on disk, so it’s still a work in progress. We prepared the code for this feature by refactoring it to use `Schema`s, a more compact way of instantiating the Irmin `Make` modules. Moving forward, we’ll continue this work on the structured keys approach, which should reduce the number of objects stored in the index and therefore improve its performance.
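The idea behind structured keys can be sketched as follows: a key may carry the object's location directly, in which case the read goes straight to the pack file, or it may fall back to a plain hash that requires an index lookup. The type and constructor names below are illustrative, not the final design:

```ocaml
(* Illustrative sketch of a "structured key": the key itself may carry
   the object's location in the pack file, so reads can skip the index.
   Names are ours, not the actual implementation's. *)
type hash = string

type key =
  | Direct of { hash : hash; offset : int64; length : int }
      (* location known: read straight from the pack file *)
  | Indexed of hash
      (* location unknown: fall back to an index lookup *)

(* Only [Indexed] keys cost an index lookup on read. *)
let needs_index_lookup = function
  | Direct _ -> false
  | Indexed _ -> true
```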
Additionally, we experimented with using `mmap` instead of `pwrite` for the I/O calls, and we benchmarked it with a small operation trace corresponding to the first 100k commits in Tezos. These initial benchmarks showed around a 10% performance improvement, but the gain was less significant when calling `msync` regularly. See more details at index/350. While the change to `mmap` still seems promising, it will probably have only a small impact on performance.
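For context, writes through a memory mapping are plain stores into the page cache, flushed lazily by the kernel (or explicitly with `msync`, which is what reduced the speedup), whereas `pwrite` issues a system call per write. A small sketch of the mmap side using the standard `Unix.map_file`; the file path and helper names are arbitrary:

```ocaml
(* Sketch: write bytes through a memory-mapped file instead of pwrite.
   Stores into [map] hit the page cache directly, with no syscall per
   write; durability then depends on when the kernel (or msync) flushes. *)
let write_via_mmap path (data : string) =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o644 in
  let len = String.length data in
  let map =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout true [| len |])
  in
  (* Plain memory stores: no write(2)/pwrite(2) calls here. *)
  String.iteri (fun i c -> map.{i} <- c) data;
  Unix.close fd

(* Read the file back through the ordinary channel API, to check the
   mapped writes are visible. *)
let read_file path =
  let ic = open_in_bin path in
  let n = in_channel_length ic in
  let s = really_input_string ic n in
  close_in ic;
  s
```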
Another possible way to improve `irmin-pack`'s indexing performance would be to change the design to one better suited to the huge contexts of the modern Tezos chain. Along these lines, we’ve been experimenting with using B-trees as part of an index implementation.
Over the summer, we added data-integrity support to our implementation of B-trees, so it now allows recovery from a crash. Although we’re sad to report that Gabriel Belouze’s internship on B-trees finished this month, we’re thrilled with his work cleaning up the code, releasing Cactus 1.0.0, and writing a report.
We also implemented a simpler version of B-trees—mini-btree—to produce some baseline benchmarks. We created a simple log using `mmap` to compare it with the `pwrite` calls used in index.
Flatten Tezos Data Model
The proposed Hangzhou protocol brings with it a change to the structure of the context: flattening internal paths in order to improve the efficiency of the storage layer. This flattening requires that nodes undergo an automatic migration.
In collaboration with DaiLambda, we’ve been working on reducing the memory usage of this migration process, so even nodes with limited available RAM (<= 8GB) are able to upgrade seamlessly.
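To give a flavour of what flattening means, intermediate fan-out segments between a known prefix and the leaf are removed, so the leaf sits directly under the prefix. The paths and the `prefix_len` parameter below are made up for illustration; the real migration operates on the Tezos context:

```ocaml
(* Simplified illustration of path flattening: drop the intermediate
   fan-out segments between a fixed prefix and the leaf. The path
   shapes here are illustrative, not the actual Tezos context layout. *)
let flatten ~prefix_len (path : string list) =
  if List.length path <= prefix_len + 1 then path
  else
    let rec take n l =
      if n = 0 then []
      else match l with [] -> [] | x :: tl -> x :: take (n - 1) tl
    in
    (* keep the prefix, drop the fan-out segments, keep the leaf *)
    take prefix_len path @ [ List.nth path (List.length path - 1) ]
```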
Record/Replay of the Tezos Bootstrap Trace
We continued to make progress on the new record/replay benchmarks. In July, we finished implementing the record phase and the conversion from a raw trace to a replayable one, and in August we started implementing the replay phase.
These new record/replay benchmarks can be used to record traces of live nodes and also include more metrics. We’ve also almost finished implementing the summary computation for all the stats gathered during the record and replay phases.
Stay tuned for a blog post covering the technical details of these benchmarks and showing how to reproduce them.
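In essence, a recorded trace is a sequence of store operations that the replay phase re-executes against a fresh store while gathering statistics. A minimal sketch of the idea, with made-up operation names and an in-memory stand-in for the store:

```ocaml
(* Minimal sketch of record/replay: a trace is a list of recorded store
   operations; replay re-executes them against a fresh in-memory "store"
   and counts operations and commits. Names are illustrative. *)
type op =
  | Add of string * string  (* key, value *)
  | Find of string
  | Commit

let replay (trace : op list) =
  let store = Hashtbl.create 16 in
  let ops = ref 0 and commits = ref 0 in
  List.iter
    (fun op ->
      incr ops;
      match op with
      | Add (k, v) -> Hashtbl.replace store k v
      | Find k -> ignore (Hashtbl.find_opt store k)
      | Commit -> incr commits)
    trace;
  (!ops, !commits)
```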
Run Irmin/Tezos Benchmarks on a Raspberry Pi 4
We’re in the process of setting up several Raspberry Pis with different configurations to use for our monthly benchmarks. This is a fun project, so we look forward to reporting the outcome in one of our next public reports.
We continue maintaining `current-bench`, fixing bugs, and refactoring the `docker-compose` component of the pipeline. We published a blog post explaining the benchmarks infrastructure.
General Irmin Maintenance
Part of the maintenance work in July 2021 included refactoring the `Config` module, which resulted in more uniform configuration options across backends.
As the CI added support for IBM Z `s390x` machines, we found several spots in Irmin that weren’t working on big-endian systems and were causing stack overflows. We detected a discrepancy between the memory usage reported by our memtrace filters and by memtrace-viewer, which we are investigating. We’re also investigating bugs in the traversals of an Irmin object graph, using Gospel (see more details below).
Recover and Debug Corrupted Stores
We concentrated most of our efforts this month on understanding and repairing an occurrence of a corrupted context caused by the node crashing unexpectedly. The issue, including more details on how it occurred, was tracked in irmin/1476. While investigating the issue, we developed several tools:
- a `diff` tool for commits, to highlight the differences between the same commit in a corrupted and a normal store
- a brute-force integrity check tool that traverses the entire store and checks for all types of inconsistencies
- a light version of this tool that only checks for missing entries in the index (the source of the corruption), now integrated into the `storage` subcommand of `tezos-node`
- a script that launches, kills, and restarts nodes in a loop to look for a reproduction of the issue
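The brute-force check boils down to traversing every entry reachable from the store and verifying that it can still be resolved through the index. A schematic sketch; the real tools walk the on-disk pack file and index, whereas everything here is an in-memory stand-in:

```ocaml
(* Schematic integrity check: every entry reachable from the store must
   resolve through the index; collect the hashes that do not. Both
   [index] and [entries] are stand-ins for the real on-disk structures. *)
let missing_entries ~(index : (string, unit) Hashtbl.t) ~entries =
  List.filter (fun hash -> not (Hashtbl.mem index hash)) entries
```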
The corruption was caused by the `merge` threads (which run concurrently with the main thread and update the on-disk index with in-memory data) being killed by an out-of-memory failure and then restarted when the node recovered. When a `merge` thread restarted, all data added to the store before the failure was lost. We fixed the issue and released index versions 1.4.1 and 1.3.2 containing the fix.
We merged the new releases of index and Irmin that contained the bug fixes, which also included the light version of the integrity-checking commands mentioned above.
Respond to and Track Issues Reported by Tezos Maintainers
We released Irmin 2.7.1, which contains a small bug fix for the `reconstruct-index` subcommand of `./tezos-node storage`, and we responded to a bug reported by Nomadic Labs regarding its configuration.
Sometimes, nodes have to list all contracts in response to an RPC call. With the flattened store, such a listing can contain millions of entries, so the request can freeze the node for 20 minutes. We are working on adding pagination to the “get_all” RPCs to avoid listing all contracts at once.
Relevant PRs and issues: tezos/3421.
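Pagination along these lines amounts to returning one bounded slice per request instead of the full listing. A hedged sketch; the `offset`/`limit` parameter names are illustrative, not the actual RPC interface:

```ocaml
(* Illustrative pagination: return one bounded page of contracts instead
   of the full multi-million-entry listing. [offset] and [limit] are
   made-up parameter names, not the actual RPC's. *)
let page ~offset ~limit (contracts : string list) =
  List.filteri (fun i _ -> i >= offset && i < offset + limit) contracts
```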
Verify Existing Bits of the Stack using Gospel
We are developing Ortac, a framework for runtime assertion checking of OCaml programs based on Gospel behavioural specifications. It provides a flexible solution for traditional assertion checking, monitoring misbehaviours, and automated fuzzing of OCaml programs.
We applied the tool to two projects used by Irmin and Tezos: the `optint` library and Irmin’s `Object_graph`. For the latter, we managed to write and check a specification against the original implementation, using some tricks to overcome the tool’s limitations.
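For a flavour of what such specifications look like, here is a small hypothetical Gospel annotation on an interface; it is not taken from `optint` or `Object_graph`:

```ocaml
(* Hypothetical Gospel-annotated interface (.mli): Ortac can turn such
   specifications into runtime assertion checks or fuzzing harnesses. *)
val abs : int -> int
(*@ r = abs x
    ensures r >= 0
    ensures r = x \/ r = -x *)
```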
C Bindings for Irmin
We wrote libirmin to generate a C library using `ctypes` inverted stubs. This can be used by C clients to interface directly with the Tezos storage.
We are continuing to experiment with having a separate storage daemon (`irmin-server`) handle the full Tezos storage. This will help with operating the storage and with handling concurrency better.
We started outlining future work on `irmin-server` and possible extensions to it.
We are also implementing the `msgpack` serialisation format in `irmin-server` and adapting it to work with `Repr`. We’re maintaining the current implementations of `irmin-rpc`, and we hope to release them with Irmin 3.0.
Improve Irmin Documentation
We are working on integrating the tutorial into the release process and getting it ready for the Irmin 3.0 release.
Follow the Tarides blog for future Irmin updates.