Tezos calling convention migrating from Breadth-First to Depth-First Order (BFS to DFS)

Summary: If the Florence proposal is adopted, we recommend you do not deploy new Michelson contracts that are dependent on the BFS calling convention. We do not expect this to be a problem in practice. However, those planning on deploying contracts in the near term should check that their contract’s correctness is unaffected by the change in calling convention.

The current calling convention for intercontract calls in Tezos is that they are added to a “first-in, first-out” queue, also called a “BFS” (or “breadth-first”) approach.

The proposed Florence protocol update includes a switch to a “first-in, last-out” convention, also called a “DFS” (“depth-first”) approach.

Early in the history of Tezos, the decision was made to use a BFS calling convention for intercontract calls. This was motivated by theoretical work that appeared to show that BFS would be superior to DFS for this purpose.

However, experience indicates the opposite:

  • the BFS calling convention can confuse some developers and cause errors, and
  • it complicates porting contracts from other chains, where DFS calling conventions dominate.

A number of mechanisms were considered for the Florence proposal to add a DFS calling convention in addition to BFS. However, when mechanisms for backwards compatibility with legacy contracts were considered, subtle bugs were frequently found that would impact the correctness of existing contracts, and which could also render future contracts unsafe.

In practice reasoning about a mixed calling convention is hard, and accordingly the Florence proposal contains a straightforward change to switch the Tezos intercontract calling convention from BFS to DFS.

Replays of on-chain history indicate that this migration does not break the ordinary functionality of any existing live contracts. Because the BFS convention typically provides few useful guarantees, it appears that current contracts deployed on the chain are insensitive to calling order (we speculate that authors just found reasoning about BFS calling order too difficult, so avoided depending on it).

However, new contracts with a calling order dependency might get added to the chain between the injection of Florence and its adoption. Therefore, we recommend you do not add new contracts that depend on BFS calling conventions to the chain – if Florence is adopted, such contracts would break.

Note that, unless you explicitly built a contract to depend on the BFS calling convention, it probably doesn’t.

We will continue to monitor the chain for new contracts that might depend on calling convention ordering, and attempt to contact their authors.

Although this upgrade path is imperfect, it seems the best available option for improving both the smart contract developer experience and the future safety of contracts deployed on the Tezos network. The alternatives would have increased complexity and created unacceptable scope for error.

The BFS calling convention for smart-contract interactions was an unfortunate design flaw. However, Tezos can self-amend and is always evolving, so we in the Tezos community can solve this issue by adopting the Florence amendment through an on-chain vote.

5 Likes

Contacting the authors of all affected contracts might not be an easy route to take once this network grows to a massive size. There may always be core changes that break smart contracts. Therefore shouldn’t Tezos have an official messaging network built within? The only downside is more gigabytes to the blockchain but every message can have a fee. Or build the official messaging platform on layer 2 and have each user store their own chat history (off chain).

Considering the response to this ToB researcher, I’m very happy to see this change of perspective and calling convention coming to Tezos.

This brings a similar execution pattern to ETH, and while there will be issues around reentrancy they seem like a price worth paying in order to avoid the issues outlined in the original post from ToB. I can tell you that in Kolibri we tried as hard as we could to avoid calling between contracts due to fears of security issues and a lack of ease when it came to reasoning about them.

In that vein, are there any plans for reentrancy protections built into the protocol itself (possibly a reentrancy lock michelson verb)? Some mitigation controls built in IMO would go a long way towards ensuring that reentrancy issues don’t become systemic.

Replays of on-chain history indicate that this migration does not break the ordinary functionality of any existing live contracts. Because the BFS convention typically provides few useful guarantees, it appears that current contracts deployed on the chain are insensitive to calling order (we speculate that authors just found reasoning about BFS calling order too difficult, so avoided depending on it).

One thing I am concerned with, specifically to Kolibri, is that we have contracts that have functionality we’ve never executed on mainnet (like a global pause button) - have you run this same analysis against delphinet (or any other testnets)?

If not, how can we have any assurances that if (heaven forbid) we do need to hit that button it’ll actually work?

2 Likes

We would like to express our concerns with respect to the latest Florence proposal. Particularly, we are worried about changing the execution order from BFS to DFS.

Note that:

  • We agree that the switch is necessary: it will open up the possibility of having more complex behaviors and interactions between the contracts, especially the ones developed by different teams. Such emergent behaviors are necessary for the ecosystem to progress.
  • We agree that just doing the switch is superior to combining the two operation orders in any way. The latter has a lot of dark corner cases, and developing contracts in this hybrid setting would be akin to playing 5D chess with multiverse time travel.
  • We have checked a number of mainnet contracts (with a special focus on DeFi contracts) and have not identified any contracts that become vulnerable due to the BFS→DFS switch.

However, we still have some concerns regarding how the switch is executed:

  1. While our research has not revealed any vulnerabilities in the reviewed contracts, we have identified some potentially unintended side-effects caused by the switch.
  2. Florence proposal’s campaign seems to understate the possible risks. It is impossible to carefully review all the contracts on mainnet and in development, some of the vulnerabilities or side effects may pass unnoticed. The proposal description, in turn, mentions the switch like any other entry in the changelog, which may be misleading for the voters.
  3. A simpler execution order may incentivize people to blindly use more inter-contract communication. There are currently no comprehensive tutorials or best practices for inter-contract calls – it may lead to false assumptions, misunderstandings, and vulnerabilities, especially with CPS view entrypoints.

1. Unintended side effects

Changing the execution order may lead to issues even if no single contract becomes directly vulnerable. In DeFi, subtle differences in the behavior of a set of contracts can lead to attacks like the famous Uniswap and Lendf.me breaches. When we analyzed the Tezos ecosystem for vulnerabilities of this kind, we concluded that such attacks were not possible, partially due to BFS execution order.

We have identified two unintended side-effects of the BFS → DFS switch in CFMM (Uniswap-like) contracts:

  1. By abusing tokenToToken function in Dexter or ctez CFMM, one can get a flash loan in tokens from a CFMM contract.
  2. A consequence of (1) – if ctez CFMM contract is a price oracle in another contract, it becomes possible to change the returned price of the asset by putting the oracle call in between loaning and returning.

These can’t be treated as vulnerabilities per se – as of now, it seems it’s not possible to use these side-effects to steal funds or buy expensive assets cheaply. However, in certain conditions, e.g., if some contract relies on CFMM as a trustworthy price oracle, this might become a problem.

But the greater concern is that one can’t catch such subtle differences in behavior by only “checking if a contract assumes a certain execution order”. A more in-depth analysis and collaboration with smart contract developers are required.

2. More inter-contract communications

BFS execution order provides almost no guarantees to contract developers, which makes them use inter-contract communication as rarely as possible. The switch to DFS will incentivize people to split their contracts and, likely, make mistakes. There is currently no comprehensive guide explaining how to prevent reentrancy attacks in the DFS world, what bad patterns should be avoided and why, and which ones should be adopted instead.

One of the stated reasons for the switch is that CPS-style views become reliable in the DFS world. Namely, with DFS you can be sure that when you call a CPS view, you will get the result in the first reentrant call. We may assume that if Florence is accepted, there will be more people who rely on CPS-style view calls. This should be a concern due to two reasons:

  1. It’s hard to use CPS-style views correctly: you need to use synchronization primitives like mutexes or semaphores, ensure that the contract never ends up in an intermediate state, check the source of the data, etc. High-level languages may, in theory, try to encapsulate this, but currently such communication is still manual and error-prone.
  2. If a contract implements a view, one can trick it to issue an operation. Currently, Tezos contracts mostly use sender value for authorization – according to our review of the mainnet contracts, tickets are not quite popular yet. The security of the existing contracts now relies on the fact that the existing CPS views return types don’t match the types of entrypoints like FA2 transfer. If CPS views become more common, the type signatures would likely match eventually, e.g.:
    1. Imagine there’s a multisig contract with (pair address (contract bool)) %isOwner CPS view.
    2. Let’s also say this multisig contract controls a token with bool %pause entrypoint. This entrypoint checks sender == multisig, and pauses the token if the condition holds true.
    3. We can then bypass the signatures check by calling %isOwner with the callback address equal to KT1TokenAddress %pause.

3. Campaign

Introducing the switch as a regular “protocol proposal” without providing an in-depth analysis of the possible risks is irresponsible. When stakeholders vote for such a proposal, they don’t have all the information – the change in the execution order looks like a regular entry in “Florence changelog”, and the trust in the authors of the proposal (“a joint effort from Nomadic Labs, Marigold, DaiLambda, and Tarides”) makes voters lean towards accepting the proposal without researching the possible consequences.

This creates an alarming precedent: ”trustworthy” teams may, in theory, make a harmful proposal succeed, counting on voters who are accustomed to accepting proposals that resemble technical changelogs without researching their implications.

Conclusion

The Florence proposal may lead to unintended side-effects and increased complexity of inter-contract communication. Moreover, it downplays the possible risks by saying that the change in the execution order should not affect the existing contracts. We suggest providing a risk disclaimer in the proposal so that this matter can get the attention that it deserves. That being said, we still consider the proposal beneficial for the ecosystem and recommend accepting it.

8 Likes

Forwarded this to the Telegram: Contact @TezosGovernance room to raise awareness with the bakers in Telegram.