We’re thrilled to release our second Irmin public report! We’ll strive to publish these regularly, hopefully every month, for the benefit of the Tezos community. This month’s update is a summary of the work done by Tarides’s engineering teams on Irmin and on the storage components of Tezos, in collaboration with Nomadic Labs, DaiLambda, and the rest of the Tezos community.
Improve irmin-pack’s inode performance for large directories. We decided not to change the inode configuration yet, because the benchmarks do not indicate a significant performance gain. We are still investigating this, but we lack two pieces to move forward: more realistic benchmarks (replaying the trace from Edo to Florence, see details below) and additional metrics. For the latter, we are introducing a new subcommand for `./tezos-node storage` that reports the size of the largest Merkle proof.
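The metric behind this subcommand can be pictured as a walk over the object graph. Here is a minimal sketch, assuming a simplified inode tree; the `node` type and the byte sizes are hypothetical and do not reflect irmin-pack’s actual representation:

```ocaml
(* Hedged sketch: a proof for a leaf must include every node on the path
   from the root, so the largest Merkle proof corresponds to the heaviest
   root-to-leaf path. Sizes are illustrative serialized byte counts. *)
type node =
  | Leaf of int                (* serialized size in bytes *)
  | Inode of int * node list   (* own size, then children *)

let rec largest_proof = function
  | Leaf s -> s
  | Inode (s, children) ->
      s + List.fold_left (fun acc c -> max acc (largest_proof c)) 0 children

let example =
  largest_proof (Inode (10, [ Leaf 5; Inode (8, [ Leaf 20; Leaf 3 ]) ]))
```

Reporting this maximum over the whole store gives a single number to watch while experimenting with inode configurations.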
Improve index performance. Our current efforts consist of storing fewer hashes in `index`. To do this, we’d like backend keys to hold more metadata via the use of structured keys, but these are currently not easy to handle. To simplify this, we have tried to make the API simpler for backends by defining schemas.
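To illustrate the idea, here is a hedged sketch of what a structured key could look like. The field names (`offset`, `length`) are illustrative assumptions, not irmin-pack’s actual schema:

```ocaml
(* Hedged sketch: instead of a bare hash, a backend key can carry enough
   metadata (e.g. the value's position in the pack file) that a read no
   longer needs the hash -> offset lookup in index. *)
type hash = string

type structured_key = {
  hash : hash;     (* content hash, still available for lookups *)
  offset : int64;  (* hypothetical: where the value lives in the pack file *)
  length : int;    (* hypothetical: how many bytes to read *)
}

(* With the position embedded in the key, a read can go straight to the
   pack file, bypassing index entirely. *)
let direct_read ~read_at key = read_at key.offset key.length
```

The fewer lookups go through `index`, the fewer hashes it needs to store, which is the motivation described above.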
Flattening the Tezos data model. We’re continuing our work with DaiLambda on merging the store flattening. We reviewed MR tezos/2771, tested the migration, and inspected the store to check for undetected flattening patterns. We also merged the `irmin-pack.mem` package and released Irmin 2.7 to unblock MR tezos/2771.
Alternative index implementation. We implemented the `Btree.remove` operation and a simpler, faster flushing strategy. We’ve benchmarked btrees with the new flush strategy and obtained performance comparable to `index`, but with smaller tail latencies for commits. We also implemented record/replay benchmarks for btrees, so that we can investigate their performance and memory issues separately. One remaining issue is memory consumption: it is as expected (and adjustable through configuration) when running the btrees alone, but considerably worse when running the irmin-pack benchmarks. We are investigating this, but it is not blocking, since the overall memory usage remains governed by the adjustable btree memory settings.
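The flushing strategy can be sketched as a write buffer that spills to disk once a size threshold is crossed. This mirrors the idea only; names and types are illustrative, not the actual btree implementation:

```ocaml
(* Hedged sketch of a simple flushing strategy: buffer key/value writes in
   memory and flush them as one batch once a byte threshold is reached.
   Batching keeps individual commits cheap, which helps tail latency. *)
type t = {
  mutable pending : (string * string) list;  (* buffered writes, newest first *)
  mutable pending_bytes : int;
  threshold : int;                           (* flush once this many bytes *)
  flush : (string * string) list -> unit;    (* write a batch to disk *)
}

let create ~threshold ~flush =
  { pending = []; pending_bytes = 0; threshold; flush }

let add t k v =
  t.pending <- (k, v) :: t.pending;
  t.pending_bytes <- t.pending_bytes + String.length k + String.length v;
  if t.pending_bytes >= t.threshold then begin
    t.flush (List.rev t.pending);  (* flush in insertion order *)
    t.pending <- [];
    t.pending_bytes <- 0
  end
```

A fixed, predictable flush point like this is one way a strategy can trade a little throughput for smaller commit tail latencies.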
Publish Irmin/Tezos performance benchmarks. We ran the June benchmarks on `index` on a Raspberry Pi 4. We’re working on automating the monthly benchmarks.
Record/replay of the Tezos bootstrap trace. We are making progress on extending our benchmarks, which so far only replayed the bootstrap trace up to Edo, to include the trace from Edo onwards. Edo introduced a different Irmin API that is more difficult to record. We also changed the format of the trace: instead of recording only the Irmin API that Tezos uses, we now record the `lib_storage` operations, i.e. the Tezos operations that call the Irmin API. This way we get more accurate benchmarks and can fix potential performance issues in `lib_storage`, not only in `irmin`.
Apart from recording the operations, we added some statistics about the blocks, in particular the number of transactions per block, which will allow us to compute the TPS while replaying the trace. To add this information to the trace, we distinguished between the raw action trace and the replayable trace; the latter contains the stats.
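With per-block transaction counts in the replayable trace, TPS over a replay is just total transactions divided by elapsed time. A minimal sketch, with an illustrative `block_stats` record (not the actual trace format):

```ocaml
(* Hedged sketch: compute transactions-per-second from the per-block stats
   recorded in the replayable trace. The record fields are hypothetical. *)
type block_stats = { level : int; tx_count : int }

let tps ~blocks ~elapsed_s =
  let total = List.fold_left (fun acc b -> acc + b.tx_count) 0 blocks in
  float_of_int total /. elapsed_s
```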
Continuous benchmarking. We are maintaining the CB framework so that it is usable by `irmin`, and we’re continuing our work on better monitoring and testing of the CB. We’ve added two features: the frontend no longer shows repeated commits (which made the graphs harder to read), and it shows an in-progress status while waiting for the benchmark results.
Experimental integration of the non-blocking layered store. The layered store’s performance issues are due to the gc thread, which runs concurrently with the main thread that commits to the store; the two threads do not yield to each other often enough, which delays the commits. To fix this, we experimented with `Lwt.pause`, which calls the Lwt scheduler and allows for more cooperation between the threads. The commits still block, but the blocking time has been reduced to a few seconds. However, this performance issue completely disappears on machines with plenty of memory, which suggests that reducing memory usage will have a positive impact on performance. We’re investigating the memory usage using `memtrace` and by tracking the maxrss.
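Lwt concurrency is cooperative: a long gc step only lets the commit code run when it yields, which is what each `Lwt.pause` does. The effect can be simulated with a tiny round-robin scheduler over step lists (a sketch of the scheduling idea only, not of Lwt itself):

```ocaml
(* Hedged sketch: each task is a named list of steps; the scheduler gives
   each task one step per turn, modelling what Lwt does at each pause
   point. Without pause points, all of a task's steps would run as one
   uninterrupted block, delaying the other task. *)
let run tasks =
  let log = ref [] in
  let rec loop = function
    | [] -> List.rev !log
    | (_, []) :: rest -> loop rest          (* task finished *)
    | (name, step :: steps) :: rest ->
        log := (name, step) :: !log;        (* run one step *)
        loop (rest @ [ (name, steps) ])     (* then yield to the others *)
  in
  loop tasks

(* Because "gc" yields after every step, the commit runs between gc steps
   instead of waiting for the whole gc pass to finish. *)
let schedule = run [ ("gc", [ 1; 2 ]); ("commit", [ 1 ]) ]
```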
General Irmin maintenance. We are simplifying the configuration modules for the different Irmin backends, and we started investigating the semantics of file/directory merges in cases of conflict. We set up a Tezos testnet baker on AWS in order to better understand the pain points and real-world behaviour of the storage stack. Lastly, we’re fixing an issue with file descriptors that aren’t closed during aborted merges.
Verify existing bits of the stack using Gospel. We are continuing our work on Ortac to automatically generate fuzz testing for OCaml libraries, and we’ve started experimenting with it on mirage/optint. This revealed some blocking issues in Gospel.
Respond to and track issues reported by Tezos maintainers. We worked on fixing corruption in a store that crashed with an out-of-memory error, as explained below. We also diagnosed and fixed a separate bug in `index` found by a Tezos user.
Recover and debug corrupted stores. We started debugging the corruption in stores that crashed with an out-of-memory error. We implemented a tool that traverses the whole store and looks for all possible inconsistencies.
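One class of inconsistency such a traversal can surface is a dangling reference: an object points at a hash that resolves to nothing. A minimal sketch, with an illustrative `obj` type and `lookup` function (not the actual tool):

```ocaml
(* Hedged sketch of an integrity check: walk every object reachable from
   the root and collect hashes that do not resolve to a stored value. *)
type obj = { children : string list }

let check_dangling ~lookup ~root =
  let rec go bad visited h =
    if List.mem h visited then (bad, visited)
    else
      match lookup h with
      | None -> (h :: bad, h :: visited)     (* dangling reference *)
      | Some o ->
          List.fold_left
            (fun (bad, visited) c -> go bad visited c)
            (bad, h :: visited)
            o.children
  in
  fst (go [] [] root)
```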
When the inconsistencies are due to `index`, a possible fix is the `reconstruct-index` command. We worked on making it faster by optimising the deserialisation of pack values. This required changes to `irmin`, after which we released the three libraries.