TezEdge node: A preview of what’s coming up

jurajselep · April 20, 2021, 3:32pm

Before we begin our article on phase 3 of the TezEdge node, we have an announcement to make: our company is rebranding itself as Viable Systems. The name comes from the Viable System Model, a theoretical model of self-sustaining systems described by the British scientist Stafford Beer in his seminal book “Brain of the Firm” (1972). In essence, it is a model of a system organised so that it can meet the demands of surviving in a changing environment. We will describe the Viable System Model in an upcoming article on the blog we will soon launch on our website.

Please note that our Medium publication is now named TezEdge, our Twitter handle is now @TezEdge, and our GitHub organization has also been renamed to TezEdge.

In the next phase of the project, we will focus on the following areas:

1. Security — improving safety and stability

Our primary goal is to make the Rust-based TezEdge node as stable and secure as possible. We want to polish the node from a security perspective, checking for any possible vulnerabilities and removing them, minimizing its attack surface and making sure that every module is airtight.

2. Core — increasing determinism and stability

We want to increase the node’s stability and determinism by reducing the complexity of its architecture. We also want to improve performance and add support for the OCaml implementation of the baker, endorser and accuser.

3. Storage — improving performance

We will also improve the storage by adding crash recovery and garbage collection, increasing performance in low-CPU environments, and adding other features.

4. Community — adding more developer tools

We want to continue improving support for developers by increasing the inner visibility of the node, creating more developer tools, and building interfaces for monitoring the node’s performance and low-level security.

Let’s now take a closer look at what we want to accomplish in these areas:

Security — improving safety and stability

Security is the paramount attribute of any blockchain. Nodes must be secure and stable because they constitute critical infrastructure that holds significant financial value, so we want to make the TezEdge node as secure and stable as possible.

Fuzzing and formal verification

As a system grows more complex, so does the room for mistakes and errors. We need to constantly look for the best methods of identifying such vulnerabilities, and fuzzing is one of them. Fuzzing, which feeds a program large numbers of automatically generated, often malformed inputs and watches for crashes or failed checks, finds vulnerabilities that static program analysis and manual code inspection often miss. It is an effective way to find security bugs in software and is rapidly becoming standard practice for critical enterprise systems.

In the development of complex systems, a sound process often matters even more than manual code inspection. Introducing automated fuzz tests will improve our development process and minimize the number of errors in our codebase.

Through fuzzing, we want to minimize the attack surface of the node. We will use it to explore a wider state space and seek out any possible weaknesses. We believe that a machine is better suited than a human to testing a system for vulnerabilities.
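To make the idea concrete, here is a minimal sketch of random fuzzing in plain Rust. The `decode` function and its wire format are hypothetical stand-ins, not TezEdge code. Production fuzzing would typically use a coverage-guided engine such as libFuzzer via `cargo fuzz`, which mutates inputs based on the code paths they reach; this sketch only generates inputs at random and reports the first one that makes the decoder panic.

```rust
use std::panic;

/// Hypothetical wire format, for illustration only:
/// [tag: u8][len: u8][payload: len bytes]
#[derive(Debug, PartialEq)]
enum Message {
    Ping,
    Data(Vec<u8>),
}

/// Decoder under test. It must return Err on bad input, never panic.
fn decode(input: &[u8]) -> Result<Message, String> {
    let (&tag, rest) = input.split_first().ok_or("empty input")?;
    match tag {
        0 => Ok(Message::Ping),
        1 => {
            let (&len, body) = rest.split_first().ok_or("missing length")?;
            let len = len as usize;
            // Without this check, `body[..len]` would panic on short input,
            // which is exactly the kind of bug fuzzing is meant to find.
            if body.len() < len {
                return Err("truncated payload".to_string());
            }
            Ok(Message::Data(body[..len].to_vec()))
        }
        other => Err(format!("unknown tag {}", other)),
    }
}

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Feeds `iterations` random inputs to `decode` and returns the first
/// input that makes it panic, if any.
fn fuzz(seed: u64, iterations: usize) -> Option<Vec<u8>> {
    let mut rng = XorShift(seed | 1); // xorshift must not start at zero
    for _ in 0..iterations {
        let len = (rng.next() % 8) as usize;
        // A small byte alphabet makes valid tags and short lengths common.
        let input: Vec<u8> = (0..len).map(|_| (rng.next() % 4) as u8).collect();
        if panic::catch_unwind(|| decode(&input)).is_err() {
            return Some(input);
        }
    }
    None
}

fn main() {
    match fuzz(0x9E37_79B9_7F4A_7C15, 100_000) {
        Some(input) => println!("decoder panicked on {:?}", input),
        None => println!("no panics in 100000 random inputs"),
    }
}
```

Removing the bounds check in `decode` turns a short `Data` message into a panic, and the fuzz loop reports the offending input so it can be turned into a regression test.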

While fuzzing is good at finding errors introduced while implementing code, it is not suitable for verifying algorithms or for removing errors in the logic of specifications. For these purposes, we will use formal verification.

Formal verification is the process of mathematically proving that a program or algorithm satisfies its formal specification, i.e. that it is free from the errors the specification rules out.

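We want to use formal verification to check two kinds of properties: safety (something bad never happens) and liveness (something good eventually happens). Safety properties are typically expressed as invariants. An invariant is a property of a mathematical object that remains unchanged after operations are applied to the object.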
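One common form of formal verification is explicit-state model checking: enumerate every reachable state of a finite model and check the invariant in each one (TLA+’s TLC model checker works on this principle). The sketch below applies it to a hypothetical model of peer connections, where `MAX_PEERS`, `MAX_PENDING` and the transitions are assumptions for illustration. It only checks a safety invariant; liveness needs reasoning about infinite executions and is out of scope here.

```rust
use std::collections::{HashSet, VecDeque};

const MAX_PEERS: u8 = 3; // assumed peer limit in the model
const MAX_PENDING: u8 = 2; // bound that keeps the state space finite

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct State {
    pending: u8,   // handshakes in progress
    connected: u8, // fully connected peers
}

/// Transition relation: every state reachable from `s` in one step.
fn next_states(s: State) -> Vec<State> {
    let mut out = Vec::new();
    // A new peer starts a handshake.
    if s.pending < MAX_PENDING {
        out.push(State { pending: s.pending + 1, ..s });
    }
    if s.pending > 0 {
        // Handshake succeeds, but is accepted only below the peer limit.
        if s.connected < MAX_PEERS {
            out.push(State { pending: s.pending - 1, connected: s.connected + 1 });
        }
        // Handshake fails or is rejected.
        out.push(State { pending: s.pending - 1, ..s });
    }
    // A connected peer disconnects.
    if s.connected > 0 {
        out.push(State { connected: s.connected - 1, ..s });
    }
    out
}

/// Safety invariant: the node never exceeds its peer limit.
fn invariant(s: State) -> bool {
    s.connected <= MAX_PEERS
}

/// Breadth-first search over all reachable states. Returns the number of
/// states explored, or the first state that violates the invariant.
fn check(init: State) -> Result<usize, State> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(init);
    queue.push_back(init);
    while let Some(s) = queue.pop_front() {
        if !invariant(s) {
            return Err(s);
        }
        for t in next_states(s) {
            if seen.insert(t) {
                queue.push_back(t);
            }
        }
    }
    Ok(seen.len())
}

fn main() {
    match check(State { pending: 0, connected: 0 }) {
        Ok(n) => println!("invariant holds in all {} reachable states", n),
        Err(s) => println!("counterexample: {:?}", s),
    }
}
```

Dropping the `s.connected < MAX_PEERS` guard makes the search return a counterexample state instead of a count, which is the kind of logic error that testing individual executions can miss.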

Using eBPF to increase inner visibility

While fuzzing and formal verification are useful during development, once the node is running we want to be able to see what is going on inside its environment, including the kernel. For this purpose, we want to use extended Berkeley Packet Filter (eBPF) programs to ensure that the node is running securely and performing well. As some of you may recall, we have already used eBPF to build a firewall that filters the node’s incoming packets.

eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Because eBPF makes the Linux kernel programmable, the Rust node can leverage existing layers and make them more intelligent and feature-rich without adding further layers of complexity to the system.

eBPF allows us to develop a new generation of tooling for networking, security, application profiling and tracing, and performance troubleshooting. Instead of relying only on existing kernel functionality, such tools can reprogram runtime behavior without compromising execution efficiency or safety. The information eBPF provides also shows us where the node can be improved in the future.
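As a rough illustration of the kind of per-packet decision an eBPF firewall makes, the sketch below models a hypothetical blocklist rule in ordinary userspace Rust. It is not eBPF code and not the rules of the TezEdge firewall: a real XDP program runs in the kernel, returns codes such as `XDP_PASS` or `XDP_DROP`, and would keep the blocklist in a BPF map that the userspace node updates. The `Packet` type and the port number are assumptions.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Verdicts modelled on the kernel's XDP_PASS / XDP_DROP return codes.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Drop,
}

/// Hypothetical, already-parsed view of an incoming packet.
struct Packet {
    src: Ipv4Addr,
    dst_port: u16,
}

struct Firewall {
    p2p_port: u16,
    /// In eBPF this set would live in a BPF map shared with userspace.
    blocked: HashSet<Ipv4Addr>,
}

impl Firewall {
    fn new(p2p_port: u16) -> Self {
        Firewall { p2p_port, blocked: HashSet::new() }
    }

    /// Userspace side: the node blocks a misbehaving peer.
    fn block(&mut self, ip: Ipv4Addr) {
        self.blocked.insert(ip);
    }

    /// Kernel side: decide before the packet ever reaches the node's socket.
    fn filter(&self, p: &Packet) -> Verdict {
        if p.dst_port == self.p2p_port && self.blocked.contains(&p.src) {
            Verdict::Drop
        } else {
            Verdict::Pass
        }
    }
}

fn main() {
    let mut fw = Firewall::new(9732); // assumed P2P port
    let bad = Ipv4Addr::new(203, 0, 113, 7);
    fw.block(bad);
    println!("{:?}", fw.filter(&Packet { src: bad, dst_port: 9732 }));
}
```

Dropping packets in the kernel means a blocked peer costs the node almost nothing, since its traffic never reaches the node’s own networking code.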

Core — increasing determinism and stability

The node has to be as stable as possible before it is ready for use by the Tezos community. We want to achieve this primarily by smoothing out technical kinks, removing sources of randomness (increasing determinism) and improving the support we give to our users.

Stability

We want to be able to identify and replicate bugs in the node. To achieve that, we need to make the node as deterministic as possible. A deterministic system is a system in which no randomness is involved in its behavior. Such a system will always produce the same output from a given starting condition or initial state.

First, we need to ensure that our node behaves deterministically when it receives messages from a single peer (a P2P peer or an RPC user). We need to audit the node and identify every part of it that does not act deterministically when handshaking or bootstrapping with a peer. We also want to identify non-deterministic behavior when the node validates and processes mempool operations, applies operations to the protocol and stores the blockchain state in the storage. Next, we want to do the same for situations in which the node communicates with multiple peers at the same time.
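A common way to make bugs reproducible is record and replay: log the inputs in the order the node actually processed them, then feed the same log to the same logic to get the same state. The sketch below assumes a hypothetical `Event` format and a toy state; it also uses `BTreeMap` rather than `HashMap`, because the standard `HashMap` is randomly seeded by default and its iteration order can differ between runs.

```rust
use std::collections::BTreeMap;

/// One recorded input event (hypothetical format).
#[derive(Clone, Debug, PartialEq)]
struct Event {
    peer: u32,
    op: String, // e.g. an operation hash announced by the peer
}

/// Toy node state: the peer that first announced each operation.
/// BTreeMap iterates in key order, independent of any hashing seed.
#[derive(Default, Debug, PartialEq)]
struct State {
    first_seen: BTreeMap<String, u32>,
}

impl State {
    fn apply(&mut self, e: &Event) {
        self.first_seen.entry(e.op.clone()).or_insert(e.peer);
    }

    fn summary(&self) -> Vec<(String, u32)> {
        self.first_seen.iter().map(|(op, peer)| (op.clone(), *peer)).collect()
    }
}

/// With several peers, arrival order varies between runs, so the node
/// records the order it processed events in. Replaying that log is
/// deterministic: the same log always yields the same state.
fn replay(log: &[Event]) -> State {
    let mut state = State::default();
    for e in log {
        state.apply(e);
    }
    state
}

fn main() {
    let log = vec![
        Event { peer: 1, op: "opA".into() },
        Event { peer: 2, op: "opA".into() },
        Event { peer: 2, op: "opB".into() },
    ];
    println!("{:?}", replay(&log).summary());
}
```

Swapping the first two events changes which peer is credited with `opA`, which is why the interleaving of messages from multiple peers has to be part of the recorded input.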

Baking

First, we want our node to support the OCaml implementation of the Tezos baker, endorser and accuser. Currently, the OCaml baker uses the node’s RPCs and directly accesses the OCaml storage, but it cannot access the Rust storage. We will add support for the Rust storage to the OCaml baker, so users will be able to bake with our node even before our own baker implementation is ready. Once this is stable, we will begin implementing our own baker.

Storage — improving performance

In a blockchain node, the storage module is one of the most important components, as it holds the blockchain state. We want to improve the performance of the TezEdge storage, whether that means lowering memory usage, speeding up reads or running the storage in a low-CPU environment.

Improving locality

We want to continue improving our implementation of the storage, beginning with support for keeping the entire Merkle tree in memory for rolling nodes. A Merkle tree (also known as a hash tree) is a tree in which each leaf is labelled with the hash of its data and each inner node with the hash of its children. Because the root hash commits to the entire content, a large body of information can be verified quickly and efficiently by comparing a single hash.
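The verification property can be shown with a small binary Merkle tree. This is a simplified sketch: the Tezos context is organised as a tree of named directories rather than a binary tree, and `DefaultHasher` stands in for a cryptographic hash (a real node would use one such as BLAKE2b) only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash function. DefaultHasher is NOT cryptographic.
fn h<T: Hash + ?Sized>(data: &T) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

/// Computes a binary Merkle root: hash every leaf, then repeatedly hash
/// pairs of child hashes until a single root remains. A node without a
/// sibling is paired with itself.
fn merkle_root(leaves: &[&[u8]]) -> Option<u64> {
    if leaves.is_empty() {
        return None;
    }
    let mut level: Vec<u64> = leaves.iter().map(|leaf| h(*leaf)).collect();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let right = *pair.get(1).unwrap_or(&pair[0]);
                h(&(pair[0], right))
            })
            .collect();
    }
    Some(level[0])
}

fn main() {
    let leaves: [&[u8]; 3] = [b"a", b"b", b"c"];
    println!("root = {:?}", merkle_root(&leaves));
}
```

Changing any single leaf changes the root, so two parties holding the same root know their data agrees without comparing it item by item.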

The goal of this implementation is to improve data locality so that we can make better use of the CPU cache. To support a rolling version of the node, we need to implement garbage collection, ensuring that the storage only keeps data from the last several cycles. This will significantly reduce the volume of data and could allow us to hold all of it in RAM.
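Because context versions share unchanged subtrees, garbage collection cannot simply delete everything written in old cycles. One standard technique is mark-and-sweep: mark every node reachable from the roots that are retained, then delete the rest. The sketch below assumes a hypothetical store with integer node ids and one root per cycle; it illustrates the technique and is not the TezEdge design.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical node id (a real content-addressed store would use a hash).
type NodeId = u64;

/// Simplified store: every node lists its children, and identical
/// subtrees are shared between context versions.
struct Store {
    nodes: HashMap<NodeId, Vec<NodeId>>,
    /// (cycle, root of the context committed in that cycle), oldest first.
    /// One root per cycle is a simplification for this sketch.
    roots: Vec<(u32, NodeId)>,
}

impl Store {
    /// Keeps only the last `keep` cycles and returns the number of nodes freed.
    fn collect(&mut self, keep: usize) -> usize {
        let drop_n = self.roots.len().saturating_sub(keep);
        self.roots = self.roots.split_off(drop_n);

        // Mark: every node reachable from a retained root is live.
        let mut live = HashSet::new();
        let mut stack: Vec<NodeId> = self.roots.iter().map(|&(_, root)| root).collect();
        while let Some(id) = stack.pop() {
            if live.insert(id) {
                if let Some(children) = self.nodes.get(&id) {
                    stack.extend(children.iter().copied());
                }
            }
        }

        // Sweep: unmarked nodes are unreachable from every retained cycle.
        let before = self.nodes.len();
        self.nodes.retain(|id, _| live.contains(id));
        before - self.nodes.len()
    }
}

/// Three cycles; cycles 1 and 2 share leaf 10, cycles 2 and 3 share leaf 12.
fn example_store() -> Store {
    Store {
        nodes: HashMap::from([
            (1, vec![10, 11]),
            (2, vec![10, 12]),
            (3, vec![12, 13]),
            (10, vec![]),
            (11, vec![]),
            (12, vec![]),
            (13, vec![]),
        ]),
        roots: vec![(1, 1), (2, 2), (3, 3)],
    }
}

fn main() {
    let mut store = example_store();
    let freed = store.collect(2);
    println!("freed {} nodes, {} remain", freed, store.nodes.len());
}
```

Keeping the last two cycles frees root 1 and leaf 11, while leaf 10 survives because cycle 2 still references it.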

We want to implement a method for flushing data from RAM to disk that takes Tezos business logic into account. For data that persists on disk, we want to implement compression and compaction. Finally, we want to implement snapshots, which allow us to easily move storage from one node to another.

We also want to implement crash recovery: in the event of a catastrophic failure, we want to be able to recover to the last valid state.
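A common building block for recovering to the last valid state is an append-only log whose records carry a length and a checksum. After a crash, the log is scanned and every record up to the first truncated or corrupted one is kept. The record layout and function names below are hypothetical, and FNV-1a is used only for brevity; a real store would more likely use a stronger checksum such as CRC32C.

```rust
/// FNV-1a (32-bit) checksum.
fn checksum(data: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for &b in data {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Appends one record: [len: u32 LE][checksum: u32 LE][payload].
fn append(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Scans the log after a crash and returns all records up to the first
/// truncated or corrupted one, i.e. the last valid state.
fn recover(log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let len = read_u32(log, pos) as usize;
        let sum = read_u32(log, pos + 4);
        let start = pos + 8;
        let payload = match log.get(start..start + len) {
            Some(p) => p,
            None => break, // truncated: the crash happened mid-write
        };
        if checksum(payload) != sum {
            break; // corrupted: nothing after this point can be trusted
        }
        records.push(payload.to_vec());
        pos = start + len;
    }
    records
}

fn main() {
    let mut log = Vec::new();
    append(&mut log, b"block 1");
    append(&mut log, b"block 2");
    // Simulate a crash that cut the last record short.
    let crashed = &log[..log.len() - 3];
    println!("recovered {} record(s)", recover(crashed).len());
}
```

Stopping at the first bad record, rather than skipping it, keeps the recovered state a prefix of what was written, so it never contains a later update without the earlier ones it depends on.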

Rewriting the key-value store

We also want to rewrite the key-value store. While RocksDB, the current implementation, is an excellent database, it is not suitable for our use case because it is written in C++. We want to rewrite in Rust all of the RocksDB components that are relevant to storing Merkle trees.

We also want to optimize the database for reads, which is difficult with RocksDB because we keep running into its limitations.

RocksDB is designed to rely mostly on disk storage and only a small amount of memory. That suits low-resource use cases, which is good for decentralization, and we plan to continue supporting them. However, for bakers and owners of high-performance machines, RocksDB does not provide a good way to make efficient use of memory.

By default, RocksDB uses a small amount of memory for its cache and memtables and writes everything else to disk. We want to develop a database that sits somewhere between RocksDB and an in-memory database, so that large amounts of RAM can be put to better use.
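One way to picture a store “in between” is a single memory budget: small budgets behave like a mostly on-disk database, and budgets larger than the data set behave like an in-memory one. The sketch below is hypothetical throughout. It assumes content-addressed keys (a key always maps to the same value), uses a `BTreeMap` as a stand-in for the on-disk tier, and spills the oldest entries first purely for simplicity; a real design might use LRU or a Tezos-aware policy.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

/// Hypothetical two-tier store with a configurable memory budget (bytes).
struct HybridStore {
    memory_budget: usize,
    mem_bytes: usize,
    mem: HashMap<Vec<u8>, Vec<u8>>,
    /// Keys currently in memory, oldest first; the oldest are spilled first.
    order: VecDeque<Vec<u8>>,
    /// Stand-in for an on-disk structure.
    disk: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl HybridStore {
    fn new(memory_budget: usize) -> Self {
        HybridStore {
            memory_budget,
            mem_bytes: 0,
            mem: HashMap::new(),
            order: VecDeque::new(),
            disk: BTreeMap::new(),
        }
    }

    /// Content addressing is assumed, so inserting an existing key is a no-op.
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        if self.mem.contains_key(&key) || self.disk.contains_key(&key) {
            return;
        }
        self.mem_bytes += key.len() + value.len();
        self.order.push_back(key.clone());
        self.mem.insert(key, value);
        // Spill the oldest entries to "disk" until we are within budget.
        while self.mem_bytes > self.memory_budget {
            match self.order.pop_front() {
                Some(old) => {
                    let v = self.mem.remove(&old).expect("tracked key is in memory");
                    self.mem_bytes -= old.len() + v.len();
                    self.disk.insert(old, v);
                }
                None => break,
            }
        }
    }

    /// Reads check memory first, then fall back to the disk tier.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.mem.get(key).or_else(|| self.disk.get(key)).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = HybridStore::new(10);
    store.put(b"k1".to_vec(), b"aaa".to_vec());
    store.put(b"k2".to_vec(), b"bbb".to_vec());
    store.put(b"k3".to_vec(), b"ccc".to_vec());
    println!("in memory: {}, on disk: {}", store.mem.len(), store.disk.len());
    println!("k1 = {:?}", store.get(b"k1"));
}
```

A baker with plenty of RAM would simply raise `memory_budget`, while a low-resource node keeps it small; the read path is the same in both cases.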

To reduce the cost of flushing and compaction and to lower write and read amplification, we want to use Tezos business logic to perform flushes and compaction at specific times, taking the current resource usage into account.

Because RocksDB sits behind a clean abstraction and has no information about how the storage is used, the opportunities for optimization are limited. For example, if we know that a period of high load (the end of a cycle) is imminent, we want to schedule aggressive compaction during the preceding low-load period to speed up future reads. And if we know which context hash will be used next, we can prefetch it or otherwise optimize for it.
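The scheduling idea boils down to a decision function that combines signals a generic key-value store never sees, such as the position within the current cycle. In the sketch below, the `Signals` fields, the thresholds and the three policies are all assumptions chosen for illustration; real values would have to come from measurement.

```rust
/// What the storage should do in the next maintenance window (hypothetical).
#[derive(Debug, PartialEq)]
enum Compaction {
    Aggressive,
    Light,
    Defer,
}

/// Inputs the node knows about but a generic key-value store does not.
struct Signals {
    cpu_load: f64,            // current utilisation, 0.0..=1.0
    blocks_to_cycle_end: u32, // derived from the block level within the cycle
    pending_bytes: u64,       // data waiting to be compacted
}

// Assumed thresholds.
const LOW_LOAD: f64 = 0.3;
const HIGH_LOAD: f64 = 0.8;
const CYCLE_END_WINDOW: u32 = 64;
const MAX_BACKLOG: u64 = 512 * 1024 * 1024;

fn schedule(s: &Signals) -> Compaction {
    let cycle_end_near = s.blocks_to_cycle_end <= CYCLE_END_WINDOW;
    if s.cpu_load < LOW_LOAD && cycle_end_near {
        // Quiet now, busy soon: compact hard so end-of-cycle reads are fast.
        Compaction::Aggressive
    } else if s.cpu_load > HIGH_LOAD && s.pending_bytes < MAX_BACKLOG {
        // Busy and the backlog is tolerable: stay out of the way.
        Compaction::Defer
    } else {
        // Otherwise make steady progress without hurting foreground work.
        Compaction::Light
    }
}

fn main() {
    let now = Signals { cpu_load: 0.1, blocks_to_cycle_end: 10, pending_bytes: 0 };
    println!("{:?}", schedule(&now));
}
```

The same kind of signal could drive prefetching: if the node knows which context hash it will need next, it can load it before the request arrives.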

Community — adding more developer tools

A blockchain is only as strong as the people in its community. We want to help the Tezos community grow by attracting developers from other ecosystems, which will increase the value of the entire network. Additionally, increasing user diversity will also increase the Tezos network’s resiliency and improve its ability to respond to new challenges.

We have already created a number of tools for developers, including the TezEdge Node Explorer, which increases the inner visibility of the node. This is achieved through a real-time visualization of all the data flowing through the node’s various components, including the network, storage and RPC layers, as well as logs.

We want to continue creating tools for developers, increasing their productivity and helping them implement cost-effective smart contracts.

Developer tools

In this regard, our primary focus is to develop tools that allow developers to debug and profile smart contracts, improve visibility into the storage, and monitor security and performance at a low level. Developers need to know as much as possible about what happens inside the node in order to build sound, well-functioning solutions.

Baking tools

We also want to create a number of tools that let bakers easily run, manage and monitor the Rust node, including a simple baking manager for handling baking rewards. This will save time and simplify the work of bakers, who are currently forced into tedious setups involving RPCs or the command line.

We appreciate your time in reading this article and we value the support the Tezos community has given us throughout the years. We’re very excited about entering phase 3 of the TezEdge node’s development, and we look forward to hearing your thoughts on our work thus far and on our plans for the future.

Feedback on the project is always welcome, so feel free to contact me directly by email. To read more about Tezos and the TezEdge node, please subscribe to our Medium, view our documentation or visit our GitHub.