Concurrency is a notoriously hard problem in Computer Science. In a concurrent system, you usually want to run multiple pieces of a program in a dynamic order, which have access to some shared memory. Those conditions violate many assumptions that are made when reasoning about classical programs. Reasoning about the evaluation order and memory ownership (who can modify some piece of memory) require careful analysis, rather than relying on simple patterns.
But however hard concurrency in regular systems is, blockchains make it much harder:
- Other programs are adversarial! With regular concurrency, you are the one writing all the pieces that will be called together. In blockchains, people will actively try to break your contract by writing contracts that interact in unexpected ways with it. This makes writing a safe smart-contract more similar to a scheduler than a concurrent program.
- Programs have to be right from the get go and errors are unforgiving. You can not rely on tests, looking for errors and patching them as they come.
Because of this, concurrency is expectedly a pain point for most blockchains, That’s why I believe it is worth its own discussion in Tezos.
Here, I am
Message Passing vs Direct Call
In Ethereum, contracts can call each other’s methods, like a regular function call. This infamously led to a lot of problems, to which different mitigations (rather than full solutions, due to the not-self-improving nature of Ethereum) have been developed.
Tezos aimed to avoid this by using a message passing architecture. Concretely, it means that a contract can not directly call another contract, unboundedly yielding its control flow. This makes moot a lot of Ethereum mitigations by default, such as:
- Finishing your state updated before calling another contract
- Adding a self-lock so that other contracts can not unexpectedly call you while you call them
On top of this simple message passing architecture, Tezos has a few more features/quirks related to the interaction between smart-contracts.
Breadth First Search (BFS)
Tezos’ ordering of contract calls is quite unusual. Let’s consider this following chain of execution:
|Current Call||Next Calls|
|…||[ A ]|
|Call to A (which call B and C)||[ B ; C ]|
|Call to B (which call B1 and B2)||[ C ; B1 ; B2 ]|
|Call to C (which call C1)||[ B1 ; B2 ; C1 ]|
|Call to B1 (which call nothing)||[ B2 ; C1 ]|
|Call to B2 (which call nothing)||[ C1 ]|
|Call to C1 (which call nothing)||[ ]|
If we conceive
B2 as being part of the execution of
C2 being part of the execution of
C, then we can see that the execution of
C are interweaved. This makes reasoning about their interactions quite hard, as you can not enforce the complete execution of
B before that of
C. This guarantee of the order of execution is usually a given in paradigms based on regular function/method calls.
SOURCE / SENDER
Tezos’ has two ways to authenticate transfers’ emitters in smart-contracts:
SENDER, which returns the address that called the current-smart contract
SOURCE, which returns the address that created the initial operation that ultimately led to the smart-contract call
I do not know when you would use
SOURCE, and my recommendation has usually been to always use
SENDER. Indeed, relying on
SOURCE for authentication makes for a ripe target of usurpation (the source calls
A, which then calls
B and makes it seem like the source meant to call
Theoretically, all the results of the execution of a smart-contract are known in advance. But practically, it is much easier to check who you directly call than the whole list of called contracts.
Closed World Assumption (CWA)
The first way to reason about smart-contracts is to use some kind of Closed-World Assumption (CWA). Concretely, we could reason about writing smart-contracts imagining that only the contracts of interest are running. This lets use reuse patterns from Cooperative Multitasking.
Cyclic Call Graph
In the example from the BFS section, reasoning about what happens is fairly easy. Each contract processes its inputs, changes its state, and then calls some other contract. There are no cycles (ie, a path where
However, recursie calls happen quite regularly. For instance, when a contract queries some information from another contract. Let’s consider the example where a contract
B holds the balances of contracts, and contract
A needs to know its balance. It will often look like this:
There, it’s not so good anymore. There is a cycle (
A). Everything has become much more complicated:
Ahas to maintain internal state before
Bcalls it back
Ahas to authenticate each caller. For instance, making sure that
Bis the one calling it back. This is further complicated by the fact that
Bcan be decided dynamically (which means it has to be part of the internal state)
We can see here that even with a Closed World Assumption, reasoning about multiple smart-contracts that you control is not straightforward. This makes such interactions error-prone.
Open World Assumption
With an Open World Assumption, we are getting into the adversarial realm. We have seen that even with the CWA, interactions can be hard to reason about. However, in an open world, everything becomes much harder. If we continue with our previous example (
B), then, we must also take into account that:
- Other contracts can call
Abetween the moment
Call with Balance. This means
Ashould manage multiple pieces of internal state (one for each call to
Main Entry Point). Authentication also becomes harder if
Bis dynamic and changes between calls to
Main Entry Point.
- Other contracts can call
Bbetween the moment
Call with Balance. This means that the balance of
Acould change between the moment
Aasks for it and
Areceives an answer. Depending on this needed invariant, this might cause small issues (
Ashould keep track of this internally) to big ones (
Acan not keep track of this and
Bdoes not maintain the invariant
Acares about). Or even crucial ones: all of this is too complicated for some smart-contract writer, who then messes up something.
The Open World Assumption is actually even more unforgiving. For instance, if a contract gives 1000tz to someone revealing a hash-preimage, that contract should factor a non-trivial commit mechanism. Else, bakers could simply censor the reveal operation and take the 1000tz themselves.
More attacks (on Ethereum) are described here. The one we mentioned is also common with Ethereum and is called Front Running.
Having such a document that people have to learn by heart, and to which more is added to as attacks are discovered, is exactly where I do not want Tezos to end. Whenever you are thinking with an Open World Assumption, you are actually never really safe. You only operate following guidelines, hoping that they cover all attacks.
In upgradable systems, it’s actually not that big of a problem (except for 0-days). You can just upgrade your software whenever new attacks are discovered. This is not possible with blockchains.
If part of this complication stems from using a Message Passing Style and BFS, then why not move Direct Call Style and DFS? The answer is two-fold:
- They hid and hide a lot of the complexity inherent to concurrency. This complexity however does not disappear, and makes it susceptible to bugs. It would be similar to moving to an untyped language, because the typer is too merciless.
- They are likely totally incompatible with sharding. While Direct Call Style and DFS rely on strong synchronicity assumptions, Message Passing Style make calls asynchronous and BFS can be seen as a way to deterministically randomize the order of arrivals of messages. As such, the latters are more compatible with the assumptions of sharding than the formers.
I will discuss here some solutions. They are not necessarily mutually exclusive, and none I believe is a silver bullet.
Going The Ethereum Way
It was discussed before, but we could copy Ethereum.
Concretely, this would mean using a Direct Call Style and DFS. However, it is still easy to shoot yourself in the foot (and thus requires following a lot of ad-hoc rules, added over time), and is not compatible with sharding.
Read Only Calls
A simple mitigation of the problem would be to use Direct Call Style only in a limited setting.
I have identified one limited setting that seemed both safe and useful. It is described at length in Read-Only Calls. It consists of a direct call to another contract in order to get some data out of it (no mutation being possible).
I am waiting for feedback from other core developers before merging it or ditching it.
Another mitigation would be to expand on the
SENDER mechanism. This gets us something close to Capability Based Systems. It is been discussed on the Contracts Signature post (which I prefer to call “stamps”). I have worked on a WIP MR (which already features a prototype and a test). I am waiting for more feedback before putting more deep work into it.
I will likely make another post to explain the precise problems it solves, the design space, and the choices that I have made.
A proper solution would be to design a sound framework in which one can confidently write concurrent smart-contracts. Whether it should be implemented as an amendment, in high-level languages, or both, remains to be decided.
I have knowledge of different approaches that could be used to design such a framework:
- Academic literature. Beyond some basic courses, I do not know much about PI calculus and Join calculus. At first sight, they do not seem amenable to blockchain environments. I am open to changing my mind in light of evidence that they actually are. I have heard that an INRIA researcher was working with Nomadic Labs on this.
- Programming Languages. Some programming languages tackled similar problems (lockless concurrency, capability based security, etc.). I had a look in the past at Pony, which is about a completely different setting. More recently, I have read about ERights, which introductory material is incomplete and has not been updated in a while.
- Layer 2 solutions. Some layer 2 faced similar problems. A fully general framework must account for the fact that different operations on a smart-contract might or might not be incompatible. This is linked to ownership of memory pieces, which is a main theme in some Layer 2 solution.
I by no means consider this research exhaustive. I do not have time to do a comprehensive overview, but follow pointers people link to me.
What would be good would be to have more people collaborate and research on this issue.
I hope this served as some introductory material about various aspects of concurrency on Tezos.
In the meanwhile, before better solutions are developed, I recommend to smart-contract developers to avoid complex interactions between smart-contracts as much as possible.
If you happen to necessarily need complex interactions between smart-contracts, please reach out here.
If you want to participate in an effort to alleviate one of these problems, please reach out! I would love more people working on core matters.