Tezos Storage / Irmin: June 2021

We’re thrilled to release our second Irmin public report! We’ll strive to publish these regularly, hopefully every month, for the benefit of the Tezos community. This month’s update is a summary of the work done by Tarides’s engineering teams on Irmin and on the storage components of Tezos, in collaboration with Nomadic Labs, DaiLambda and the rest of the Tezos community.

If you are new to Irmin, please read a short introduction in our Irmin 2021 Update or on irmin.org.

Improve the Performance

  • Improve irmin-pack’s inode performance for large directories. We decided not to change the inode configuration yet because the benchmarks do not show a significant performance gain. We are still investigating this, but we lack two pieces to move forward: more realistic benchmarks (replaying the trace from Edo to Florence, see details below) and additional metrics. For the latter, we are introducing an additional subcommand for ./tezos-node storage that reports the size of the largest Merkle proof.

  • Improve index performance. Our current efforts consist of storing fewer hashes in index. To do this, we’d like backend keys to hold more metadata via structured keys, but this is currently not easy to handle. To simplify it, we have tried to make the API simpler for backends by defining schemas.

  • Flattening the Tezos data model. We’re continuing our work with DaiLambda for merging the store flattening. We reviewed MR tezos/2771, tested the migration, and inspected the store to check for undetected flattening patterns. We also merged the irmin-pack.mem package and released Irmin 2.7 to unblock MR tezos/2771.

  • Alternative index implementation. We implemented the Btree.remove operation and a simpler, faster flushing strategy. We’ve benchmarked btrees with the new flush strategy and see performance comparable to index, but with smaller tail latencies for commits. We also implemented record/replay benchmarks for btrees so that we can investigate their performance and memory issues separately. One remaining issue is memory consumption. It is as expected (and adjustable during configuration) when running the btrees alone, but it is considerably worse when running the irmin-pack benchmarks. We are investigating this, but it’s not blocking because the overall memory is dependent on the adjustable btree memory usage.

  • Publish Irmin/Tezos performance benchmarks. We ran the June benchmarks on irmin and index on a Raspberry Pi 4. We’re working on automating the monthly benchmarks.

  • Record/replay of the Tezos bootstrap trace. We are making progress on extending our benchmarks, which only replayed the bootstrap trace up to Edo, to include the Edo trace moving forward. Edo introduced a different Irmin API that is more difficult to record. We also changed the format of the trace. Instead of recording only the Irmin API that Tezos uses, we are now recording the lib_storage operations, which are the Tezos operations that call the Irmin API. This way we can have more accurate benchmarks and fix potential performance issues present in lib_storage, not only in irmin.

    Apart from recording the operations, we added some statistics about the blocks—in particular the transactions done per block, which will allow us to compute the TPS while replaying the trace. To add this info in the trace, we distinguished between the raw action trace and the replayable trace; it is the second one that will contain the stats.

  • Continuous benchmarking. We are maintaining the CB framework so it’s usable by index and irmin, and we’re continuing to add better monitoring and testing of the CB. We’ve added two features: the frontend should no longer show repeated commits (which make the graphs harder to read) and the frontend should show an in-progress status while waiting for the benchmark results.
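To illustrate the record/replay item above, here is a minimal OCaml sketch of how per-block transaction counts could be turned into a TPS figure during a replay. The types block_stats and replayed_block, their fields, and the function tps are hypothetical names chosen for this example; they are not the actual trace format or benchmark code. The sketch only assumes that each replayed block carries its transaction count and that the replay time of each block is measured.

```ocaml
(* Hypothetical per-block statistics stored in the replayable trace. *)
type block_stats = {
  level : int;         (* block level *)
  transactions : int;  (* number of transactions in the block *)
}

(* Hypothetical result of replaying one block: its statistics plus the
   time, in seconds, that the replay of that block took. *)
type replayed_block = {
  stats : block_stats;
  replay_seconds : float;
}

(* Transactions per second over a whole replay: total transactions
   divided by total replay time. Returns [None] when no time was spent,
   for example on an empty trace. *)
let tps (blocks : replayed_block list) : float option =
  let txs, secs =
    List.fold_left
      (fun (t, s) b -> (t + b.stats.transactions, s +. b.replay_seconds))
      (0, 0.0) blocks
  in
  if secs > 0.0 then Some (float_of_int txs /. secs) else None
```

For example, two blocks with 100 and 300 transactions replayed in 0.5 and 1.5 seconds give 400 transactions in 2 seconds. Keeping the statistics next to the replayable actions means the figure can be computed without consulting the raw trace again.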

Improve the Space Usage

  • Experimental integration of the non-blocking layered store. The layered store’s performance issues are due to the gc thread, which runs concurrently to the main thread that commits to the store, but the two threads aren’t blocking each other for long enough, causing the delayed commits. To fix this, we experimented with Lwt.pause, which yields to the Lwt scheduler and allows for more cooperation between the threads. The commits are still blocking, but the time has been reduced to a few seconds. However, this performance issue completely disappears on machines with plenty of memory. This suggests that reducing memory usage will have a positive impact on the performance. We’re investigating the memory usage using memtrace and tracking the maxrss.
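The effect of Lwt.pause described above can be sketched as follows. This is a toy illustration, not the layered store code: the worker function and the "gc" and "commit" labels are hypothetical stand-ins for the two threads, and the example assumes the lwt and lwt.unix libraries are available. Each worker yields to the scheduler after every step, so the two make progress in turns instead of one running to completion first.

```ocaml
(* Build with the lwt and lwt.unix libraries. *)

(* Event log, most recent first, used only to observe the interleaving. *)
let events : string list ref = ref []
let log e = events := e :: !events

(* Run [n] steps of work labelled [name], yielding to the Lwt scheduler
   after each step so that other promises get a chance to progress. *)
let rec worker name n =
  if n = 0 then Lwt.return_unit
  else begin
    log name;
    Lwt.bind (Lwt.pause ()) (fun () -> worker name (n - 1))
  end

(* Start a hypothetical "gc" worker and a "commit" worker, wait for both
   to finish, and return the logged events in chronological order. *)
let run_demo () =
  events := [];
  let gc = worker "gc" 3 in
  let commit = worker "commit" 3 in
  Lwt_main.run (Lwt.join [ gc; commit ]);
  List.rev !events
```

If the Lwt.pause () call were replaced by Lwt.return_unit, the "gc" worker would log all of its steps before the "commit" worker even started, which is the kind of starvation that yielding is meant to reduce.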


  • General Irmin maintenance. We are simplifying the configuration modules for the different Irmin backends, and we started investigating the semantics of file/directory merges in cases of conflict. We set up a Tezos testnet baker on AWS in order to better understand the pain-points / real-world behaviour of the storage stack. Lastly, we’re fixing an issue with file descriptors that aren’t closed during aborted merges.

  • Verify existing bits of the stack using Gospel. We are continuing our work on Ortac to automatically generate fuzzing for OCaml libraries, and we’ve started experimenting with it for mirage/optint. This revealed some blocking issues in Gospel.

  • Respond to and track issues reported by Tezos maintainers. We worked on fixing corruption in a store that crashed after running out of memory, as explained below. We also diagnosed and fixed a separate bug in index found by a Tezos user.

  • Recover and debug corrupted stores. We started debugging the corruption in stores that crashed after running out of memory. We implemented a tool to traverse the whole store and look for all possible inconsistencies.

    When the inconsistencies are due to index, a possible fix is the reconstruct-index command. We worked on making it faster by optimizing the deserialisation of pack values. This required changes to repr, index, and irmin, and we then released the three libraries.


Is Irmin still used in Octez 10?

Yes, the context is still using Irmin. I guess you are referring to the store, which is switching from lmdb to index (the same index that we are already using in Irmin and are continuously optimising).