Tezos Storage / Irmin: January 2022

We are thrilled to continue our monthly report on the Tezos storage update. In January 2022, Tarides made steady progress on improving Irmin’s performance for the Tezos layered stores. As the release of Irmin 3.0 approaches, we remain on track to deliver it alongside Octez v13 at the beginning of March. Among several other enhancements, Irmin 3.0 will bring a dramatic reduction in the node's I/O usage!

This month, our efforts focused mainly on three areas. First, we fixed the latest Irmin 3.0 bugs and cleaned up the code. Second, we wrote a new design document for a GC that will improve rolling nodes' disk usage and archive nodes' performance. Third, we forward-ported the Merkle proofs needed for the rollups: they shipped in Irmin 2.10 and must be included in Irmin 3.0 as well.

If you’re new to Irmin, you can read more about it on our website, Irmin.org.

Improve the Performance

Improve I/O Performance
In January, we focused on finishing up Irmin 3.0. We added more tests and fixed a bug that occurred while importing a full Tezos snapshot. We also monitored the disk usage of a Tezos store running the latest protocol, and it appears comparable to previous disk usage. The resulting speed-up is impressive: TPS doubled when replaying the bootstrap trace! Note that this measurement only covers storage operations; network operations may slow things down during a normal bootstrap. Stay tuned for a full blog post with more details next month!

Relevant links: mirage/irmin#1695

GC for rolling nodes
We worked on a design document for implementing a GC to improve the disk usage of rolling nodes and the performance of archive nodes. We spent January making sure the specifications are clear and that we have a solution for every part of the implementation. As part of the new design, we implemented a new type of I/O abstraction, a sparse file, which will be used during the freeze operation.

Relevant links: mirage/irmin#1701, tomjridge/sparse-file.

Improve Snapshot Import/Export
To solve the memory issues that Tezos users currently face with snapshot import/export, we started exposing Irmin's internal nodes in the snapshot format. We benchmarked our initial prototype on rolling snapshots, and the results are promising: memory usage is down significantly.

Record/Replay of the Tezos Bootstrap Trace
We’re making progress on the record/replay benchmarks so that we can (1) replay traces from arbitrary points in time, starting from a snapshot; (2) record more information; and (3) replay the latest protocol instead of focusing on the early bootstrap. We maintain a WIP branch against tezos/tezos that includes many new stats and works with Irmin 3.0. Our initial attempt to replay a trace with the new mechanism uncovered some bugs, which we’re now addressing.

Publish Irmin/Tezos Performance Benchmarks
Apart from our monthly work of running and publishing benchmarks, we also ran benchmarks with the three different GC strategies. We made considerable progress in automating the monthly benchmarks. This includes a script that runs on the first day of every month: it tags each benchmarked repo, rents a Packet machine, and launches the benchmarks. The Equinoxe library, a CLI tool for deploying Equinix machines from the command line, also went through some redesign so that it could be integrated into this workflow.

Relevant links: maiste/equinoxe#68, maiste/equinoxe#71.

Continuous Benchmarking Infrastructure
We are continuing to develop a continuous benchmarking infrastructure to detect performance regressions in Irmin. We are adding support for benchmarking with multiple compiler variants, in preparation for experiments with Multicore OCaml. The UI now supports overlaying graphs for easier comparison and visualisation. We also added a new HTTP API endpoint to current-bench for uploading benchmark results. With this endpoint, results can be shown in the frontend without the benchmarks having to be built and run by the pipeline. This means it can display the results generated while replaying the Tezos bootstrap traces.

Relevant links: ocurrent/current-bench#273, ocurrent/current-bench#288, ocurrent/current-bench#287.

Merkle Proofs

As part of the upcoming rollup projects for Tezos, we’ve been working with Nomadic Labs, DaiLambda and Trilitech on an API for Merkle proofs in Tezos. As a result, we added Merkle proofs to Irmin and released them in Irmin 2.10. We then ported Octez to Irmin 2.10 so that the rollup projects can start using them.

In parallel, since we are developing Irmin 3.0 on a separate branch focused on improving I/O usage, this month we started forward-porting the Merkle proofs to that branch. While doing so, we found a few bugs in the original Merkle proofs implementation. These bugs are now fixed and released in Irmin 2.10.1 and 2.10.2.

Relevant links: tezos/tezos#4138, tezos/tezos#4086, tezos/tezos#4145, mirage/irmin#1712, mirage/irmin#1716, mirage/irmin#1720, mirage/irmin#1741.

Support

Irmin didn’t support IBM s390x machines due to an OCaml restriction on functions with many arguments. The proper fix is to use OCaml 4.14, but until then, we added a workaround in Irmin.

While testing the Ithaca protocol, we observed an increase in memory usage caused by Irmin’s LRUs. To avoid delaying the Ithaca release, we integrated a simple fix in Tezos that reduces the size of the LRUs. We haven’t observed a significant effect on performance, as it seems that only values added to the LRUs very recently are reused. A better fix would be to bound each LRU by the total size of its elements rather than by the number of elements it contains, as it does now. This is a work in progress, since it requires more benchmarks.

Relevant links: ocaml/ocaml#10857, mirage/irmin#1693, tezos/tezos#2376.
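The better LRU fix described above, bounding the cache by the total size of its elements instead of their count, can be illustrated with a short sketch. This is a hypothetical Python example, not Irmin’s LRU, which is written in OCaml. The `WeightedLRU` class and the caller-supplied `weight` function are assumptions made for illustration. The sketch only shows the core idea: evict least-recently-used entries until the combined weight of the cached values fits within a budget.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class WeightedLRU(Generic[K, V]):
    """LRU cache bounded by the total weight of its values, not their count.

    Hypothetical sketch: `weight` is any caller-supplied size function
    (for example, a serialized byte length). This is not Irmin's LRU.
    """

    def __init__(self, capacity: int, weight: Callable[[V], int]) -> None:
        self.capacity = capacity  # maximum total weight allowed
        self.weight = weight      # size of a single value
        self.total = 0            # current total weight of cached values
        self._entries: "OrderedDict[K, V]" = OrderedDict()  # oldest first

    def find(self, key: K) -> Optional[V]:
        """Return the cached value (or None) and mark it most recently used."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def add(self, key: K, value: V) -> None:
        """Insert or replace a value, then evict until the budget is met."""
        if key in self._entries:
            # Replacing an entry: drop the old value's weight first.
            self.total -= self.weight(self._entries.pop(key))
        self._entries[key] = value
        self.total += self.weight(value)
        # Evict least-recently-used entries while over the weight budget.
        while self.total > self.capacity and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.total -= self.weight(evicted)

    def __contains__(self, key: object) -> bool:
        return key in self._entries
```

For example, with `capacity=10` and `weight=len`, adding two 4-character strings and then reading the first one makes the second entry the least recently used, so adding a 3-character string evicts it. In a Tezos store, `weight` would presumably be something like a node’s serialized size, though that is also an assumption here. The advantage of bounding by weight is that a handful of very large values can no longer push memory usage up the way they can under a count-based bound.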

C, Rust and Python bindings for Tezos Storage

The C bindings for Irmin are now part of the main repo and will be available as an Irmin subpackage. These bindings don’t work on ARM64 yet, which we’re still investigating. We also cleaned up the documentation and tests, and we’re preparing to integrate the Rust and Python bindings into the Irmin repo.

Relevant links: mirage/irmin#1713

Maintenance

General Irmin Maintenance
We added code coverage checking to the Irmin repository. We also made sure opam-monorepo works with Irmin, and it will be supported once Irmin 3.0 is released. In preparation for the next release, we’re also fixing some known bugs that haven’t occurred in Tezos but may affect other Irmin users. Finally, we’re developing a better Irmin CLI that could be used to inspect and manipulate Tezos stores.

We released Irmin 2.10.1 and Irmin 2.9.1, which contain fixes for s390x machines.

Relevant links: mirage/irmin#1702

Verify Existing Bits of the Stack Using Gospel
Gospel is a specification language that we want to use to automatically fuzz-test parts of the Irmin codebase. We identified limitations in the Gospel specs and decided to fix them. This includes refactoring the Gospel AST to improve its maintainability and extensibility.

Relevant links: Gospel AST refactoring.
