Tezos Storage / Irmin: May 2022

In May 2022, Tarides began work to integrate the layered store/GC prototype into Irmin to make it ready for a production release, and we’ll have it fully integrated soon. In the meantime, we found solutions for several issues arising from the interaction between the snapshot feature and the layered store. Our progress last month marked an important milestone for this evolving tool.

Our goal for the GC feature is to ensure bakers can run rolling nodes with constant disk usage (at most 5 GB) while keeping the 12x improvement in operation responsiveness that we obtained the month before. We’re working to ensure the GC has zero impact on current node performance while delivering that constant disk usage.

We’ve written a blog post about optimising Irmin’s Merkle proof API, and published it on the Tarides blog. Follow us on Twitter for notifications on new blog posts, or subscribe to our RSS feed.

If you’re new to Irmin, the Octez storage component, please read more about it on Irmin.org.

Scale Irmin’s Performance for Large Commercial Adoption

Integration of Layered Store/GC

This month was, in retrospect, a significant one for the layered store (“layers/GC”). Various issues were discovered related to the interaction of the existing snapshot feature with the layered store, but workarounds were developed. We were then able to bootstrap testnet and mainnet successfully to completion, using existing snapshots from several weeks prior. This marks a significant point in the growing maturity of the layers/GC code.

The other major development is that we started integrating layers/GC into Irmin proper. The plan involves refactoring some of the Irmin internals before proceeding with the integration itself. Some of this refactoring is done, but an additional, larger refactoring of irmin-pack is still needed: splitting the current I/O code into a simpler I/O interface and introducing a control file to handle the synchronisation between the different files on disk.

As for next steps in the layers/GC integration plan: support for the read-only (RO) instances of Irmin is not well tested in the prototype, as we can only properly test and benchmark it once the layered store is fully integrated into Tezos. Another open decision is whether part of the GC should run in a separate process created by Lwt_unix.fork or as a separate executable, as it currently does in the prototype.
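To illustrate the fork-based option under discussion, here is a minimal sketch using the stdlib Unix module. This is not Irmin's actual GC code: run_gc is a hypothetical placeholder for the work the child process would do.

```ocaml
(* Minimal sketch of the "fork a worker process" option, using the
   stdlib Unix module rather than Irmin's actual code. *)

let run_gc () =
  (* Hypothetical placeholder: a real GC would copy live objects
     into a fresh file while the parent keeps serving requests. *)
  print_endline "gc: done"

let spawn_gc_worker () =
  match Unix.fork () with
  | 0 ->
      (* Child: perform the GC, then exit without running at_exit
         handlers inherited from the parent. *)
      run_gc ();
      Unix._exit 0
  | pid ->
      (* Parent: continue running; the child is reaped later. *)
      pid

let () =
  let pid = spawn_gc_worker () in
  let _, status = Unix.waitpid [] pid in
  match status with
  | Unix.WEXITED 0 -> print_endline "parent: gc worker finished"
  | _ -> prerr_endline "parent: gc worker failed"
```

The separate-executable alternative trades the convenience of sharing the parent's state for cleaner isolation, which is part of what the decision above weighs.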

Relevant links: mirage/irmin#1826, mirage/irmin#1830, mirage/irmin#1831, mirage/irmin#1832, mirage/irmin#1835, mirage/irmin#1840, mirage/irmin#1841, mirage/irmin#1844, mirage/irmin#1848, mirage/irmin#1795.

Add Monitoring for Irmin Stats to a Tezos Node

Tezos nodes can run with an attached Prometheus server that monitors and reports stats about node execution. The Prometheus server is now integrated into the Tezos codebase; it’s launched by the node when requested by the users. We plan to add Irmin-specific stats to this monitoring system, so users can also inspect them while running a Prometheus server.

We have a prototype on this branch, but further discussions are necessary to understand how stats gathering works with the multiple processes running Irmin in a Tezos node.

This work is scheduled for Q2.

Record/Replay of the Tezos Bootstrap Trace

We are working on a benchmarking system to be integrated in Octez’s lib_context.

We are maintaining a branch for the record and replay benchmarks, available here, rebased over tezos#master. This allows us to run the lib_context benchmarks every month and at every release.

Provide Support to Tezos for MirageOS Dependencies

Tools to Upload and Debug Corrupted Stores

While analysing the contents of a store, we noticed that some internal nodes (inodes) are very unbalanced: an inode tree of 1 million entries contains 2 million subtrees. In Irmin, inode trees are 32-ary trees, so for a fully balanced inode tree of 1 million entries, we expect roughly 1,000,000 / 32 = 31,250 internal nodes (inode trees) and 1 million leaves (inode values).

To investigate this, we computed statistics on the structure of an inode tree as a function of the number of entries in the tree. It appears that our internal node data structure has a cyclical behaviour depending on the number of entries in the node. This is an expected trade-off: in return, inserting new entries minimises rebalancing, avoiding the creation of many new nodes (and thus unnecessary I/O).
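As a sanity check on the numbers above, the following back-of-the-envelope snippet counts the internal nodes at every level of a perfectly balanced 32-ary tree. It is plain arithmetic, not Irmin's inode implementation:

```ocaml
(* Count internal nodes at every level of a balanced [branching]-ary
   tree holding [leaves] leaf entries. Plain arithmetic, not Irmin's
   actual inode data structure. *)

let branching = 32

let internal_nodes leaves =
  let rec go n acc =
    if n <= 1 then acc
    else
      let parents = (n + branching - 1) / branching in
      go parents (acc + parents)
  in
  go leaves 0

let () =
  Printf.printf "internal nodes for 1M entries: %d\n"
    (internal_nodes 1_000_000)
```

The bottom level alone contributes 1,000,000 / 32 = 31,250 nodes, and the full tree stays close to that figure, so the 2 million subtrees observed in the unbalanced store are roughly 60x the balanced expectation.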

Relevant links: mirage/irmin#1806.

Respond to and Track Issues Reported by Tezos Maintainers

DaiLambda tried to reproduce the benchmarks for Irmin 3.0 using the tezos-node replay command, but they did not obtain the same numbers as our replay benchmarks. There are two possible sources of divergence: (i) tezos-node replay does not commit blocks (so it performs no I/O writes), while our benchmarks are specialised for I/O; (ii) tezos-node replay does more than just the Irmin operations associated with a block validation. In collaboration with DaiLambda, we will investigate this further by recording a trace for tezos-node replay and replaying it.

Improve Tezos Interoperability

Storage Appliance

Using an Irmin daemon, instead of using Irmin as a library as Tezos does now, potentially reduces the memory usage of the node. Instead of two processes interacting with Irmin while caching things in memory, a single process handles all interactions with the Irmin context. This also simplifies the semantics of interactions between read-write and read-only instances, as we no longer have to use the filesystem as an inefficient interprocess communication mechanism. The server can also be used to access a remote context, for example when not enough storage is available locally. Overall, this work aims to improve Tezos interoperability by defining clearer semantics for Irmin interactions and avoiding memory issues caused by excessive caching or data races.

The storage appliance work aims to provide Irmin with better management of concurrent accesses to the storage. We are currently implementing and optimising an Irmin store interface written using irmin-client, which allows irmin-client to act as a drop-in replacement for irmin-pack in lib_context. Since irmin-server supports all Irmin backends, some ongoing work is needed to support irmin-pack-specific functionality such as snapshots and stats. This month we also set up a branch of the Tezos repository that uses irmin-client, but some incompatibilities introduced by using a remote server are still being resolved.

This part of storage appliances work is scheduled to be available in Q3 2022.

Relevant links: mirage/irmin-server#42, mirage/irmin-server#44.

Irmin in the Browser

The internship project consists of combining irmin-indexeddb and irmin-server to produce an offline-first application with fast and efficient synchronisation. Offline-first applications work just as well offline as they do online. Thanks to Irmin’s mergeable replicated data types, it becomes much easier to build applications that can transform state offline and resynchronise it later. For Tezos, this could be useful for writing web applications that interact with the context state directly.

irmin-server is a server for Irmin using a custom wire protocol designed to have minimal overhead. Clients connect to irmin-server over a byte stream such as a TCP connection or a Unix domain socket, which restricts clients to Unix-like systems and means we cannot put a client in the browser to interact with a remote server efficiently. In May, we extended both the client and the server to support WebSockets, a message-oriented protocol with good support in modern browsers. With this new communication channel, and with the appropriate abstractions, irmin-server clients can now be compiled for the browser. This allows for building browser-based, offline-first applications that can efficiently synchronise with a remote Irmin store.

The offline-first Irmin application is scheduled to be available in Q3 2022.

Relevant links: mirage/irmin-server#47, mirage/irmin-server#48, mirage/irmin#1843.

Maintain and Improve Irmin

General Irmin Maintenance

We fixed a bug in the snapshot export, as the export was failing for blocks committed before Irmin 3.0. The fix is part of the Irmin 3.2.2 release and is in tezos#master. We are also looking into multicore and eio.

We are working on integrating the Irmin pretty-printers in utop.

Relevant links: mirage/irmin#1845, tezos/opam-repository#299.

Memory-Efficient and Formally Verified Caches

Last month, an internship started with the goal of efficiently implementing various caching policies (e.g., LFU, LIRS). Replacement policies can be complex, and since missing entries is part of a cache’s expected behaviour, misunderstanding or incorrectly implementing them often goes unnoticed. It can, however, dramatically affect performance or lead to memory leaks. We plan to evaluate the performance of the different policies in a real-world setting within Irmin and Tezos. Moreover, we aim to formalise cache behaviour and specify the cache replacement policies using Gospel.

Last month’s development included the creation of cache-through, an interface and a functor for applying a caching strategy in different configurations (e.g., cache-aside, cache-through). We added Alcotest-based tests for cache-through and further explored LFU (Least Frequently Used) implementations with linear complexity.
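To illustrate the kind of policy involved, here is a small, self-contained LFU cache sketch. It is illustrative only, not the cachecache implementation: eviction scans for the minimum hit count, so it is linear in the number of cached entries.

```ocaml
(* Illustrative LFU (Least Frequently Used) cache sketch; not the
   cachecache implementation. Eviction is a linear scan for the
   entry with the fewest hits. *)

type ('k, 'v) t = {
  capacity : int;
  table : ('k, 'v * int ref) Hashtbl.t;  (* value and hit counter *)
}

let create capacity = { capacity; table = Hashtbl.create capacity }

let find t k =
  match Hashtbl.find_opt t.table k with
  | Some (v, hits) -> incr hits; Some v
  | None -> None

(* Remove the entry with the fewest hits; ties broken arbitrarily. *)
let evict t =
  let victim =
    Hashtbl.fold
      (fun k (_, hits) acc ->
        match acc with
        | Some (_, best) when best <= !hits -> acc
        | _ -> Some (k, !hits))
      t.table None
  in
  match victim with Some (k, _) -> Hashtbl.remove t.table k | None -> ()

let add t k v =
  if (not (Hashtbl.mem t.table k))
     && Hashtbl.length t.table >= t.capacity
  then evict t;
  Hashtbl.replace t.table k (v, ref 0)
```

For example, in a two-entry cache holding "a" and "b" where only "a" has been read, adding "c" evicts "b". The subtlety mentioned above shows here: a buggy eviction would still return valid cached values for every hit, so the mistake only surfaces as degraded hit rates or unbounded memory, not as wrong answers.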

This work is scheduled to be finished in Q2.

Relevant links: cachecache#6, cachecache#8.