Concurrency, BFS vs DFS (and a proposal)

I would like to continue the discussion about concurrency/BFS vs DFS, as described
in https://forum.tezosagora.org/t/problems-with-concurrency/1771

  1. I understand that moving to message passing (all effects are final
    before operations are executed) solves the problems of
    control-flow in Ethereum. An alternative would be to have
    call-return without effects (only fail/no-fail) or limited effects
    after the return. This limited call-return enables useful patterns
    like defensive programming using monitors/runtime-verification as
    pointed out in https://forum.tezosagora.org/t/monitors/1982

    In a nutshell, one can write “checkers” (even written by independent
    developers as a wrapper) that inspect the effect of the whole smart
    contract invocation and decide to accept or abort.

  2. Is there any reason beyond sharding to prefer BFS over DFS? Again,
    DFS is more natural to think about and it would another way to
    implement monitors.

    I would like to suggest an alternative to sharding that can be used
    both in DFS and in BFS (effectively making DFS efficient). Ideally,
    if one knew the effects of all sub-operation generated by a
    contract many call sequences (whose semantics are sequential) could
    be run in parallel. However, Without formally proven contracts this
    is not feasible.

    But, inspired by optimistic concurrency control, one can just
    assume that calls are independent and run them in parallel. Since
    the effects are of course known after the execution, one can check
    after the facto whether the parallelization was to optimistic, and
    (if necessary) re-run the whole tree (this time with more knowledge).

    By the way, this method of run-and-check can be used with
    BFS+tickets. In my view, as discussed in
    Contract signatures since
    tickets have an immediate global effect (e.g PUNCH) this would
    jeopardize sharding (as an operation can depend on whether a
    previous operation PUNCHED a shared ticket).

What do you think?

2 Likes

An alternative would be to have
call-return without effects (only fail/no-fail)

See Adding Read-Only Calls for a discussion on this topic.

I think BFS will become an “obvious” choice, if we think about Tezos operations in terms of ACID transactions. Tezos already provides isolation (each new operation and its spawned operations are executed in isolation from the other operations in the pool) and durability (once a block is baked, the operation result cannot be lost).

Tezos solves atomicity for the following cases:

  • Individual contract call (individual transaction) atomically updates state of the single contract
  • A contract can spawn a list of operations to invoke other contracts. Such a list can be considered a composite transaction (composed of a sequence of individual transactions). BFS order guarantees that the composite transaction will not be interleaved with the operations from other composite transactions.

However, each operation can depend on the state of exactly one contract. What if we want to express calculation that still updates the state of the called contract, but depends on data from other contracts on the chain? Currently, the only available option is continuation passing style (CPS) entry points; Read-Only Calls are also proposed.
But those techniques introduce their own problems that violate composite transaction consistency. There is a detailed analysis and some proposed solutions here.

Monitors are an interesting feature, but it looks like they do not address multi-contract interactions.

I think it is an interesting idea, very similar to software transactional memory (STM).

I am not sure I understand. In my understanding BFS breaks this notion of atomicity (DFS does not). If an operation A spawns two further operations B and C, BFS does not guarantee that B (and its suboperations say B1 B2) execute ACID with respect to C and its suboperations (say C1 and C2). If B, B1, B2, C, C1, C2 punch tickets (for example, which is a global shared resource, and therefore has visible effect across oprations), the BFS order is B,C,B1,B2,C1,C2. Therefore the atomicity of B+subops with respect to C+subops os broken. The DFS order would have been B,B1,B2,C,C1,C2 which does preserve atomicity.

I want to understand why BFS is preferred over DFS (is it only efficiencty/sharding?) because there is a price to pay in BFS: atomicity, etc.

Am I missing something?

I am not sure I understand. In my understanding BFS breaks this notion of atomicity (DFS does not).

In your example, if operation A spawns operations B and C, [B,C] become a composite transaction. BFS guarantees that [B, C] will not be interleaved with any other operations (like [B, B1, B2, C]). I think about a composite transaction as a sequence of updates of multiple contracts that transitions one global consistent state to another global consistent state. The global state may be inconsistent in the middle of the composite transaction, thus injecting operations in the middle of a composite transaction will make them operate on inconsistent state.

Let’s imagine some kind of accounting application where each account contract maintains its own balance and supports debit and credit operations. There is also a banker contract with transfer operation. To fulfill a transfer, banker contract spawns two operations to debit and credit two accounts:
[transfer_op, debit_a, credit_b]. When executed, the global state (total balance of all accounts) is consistent before the operation debit_a and after the operation credit_b, but it is inconsistent in between. If contract A, spawns some operations depending on the state of the contract B using DFS order ([transfer_op, debit_a, a1, a2, credit_b]), those operations will access inconsistent global state.
If a1 or a2 operations involve balance of the contract B or a combination of balances A and B, they will operate on inconsistent state.

Taking a bit of a sidestep to outline what introducing DFS in Tezos could look like and different potential strategies. This list a few strategies with various degrees of simplicity vs integration with the existing system.

  1. Yolo

All calls are now DFS calls, contracts that relied on BFS behavior may be broken. This does not seem like a good option. It might have been possible to rip that bandaid at one point in history but we’re past that.

  1. Parallel universe

Introduce KT2 contracts which are very much like KT1 contracts except that they can only call other KT2 contracts and that their calls are BFS. In this approach, there are two incompatible universes of smart-contract, one based on BFS, the other on DFS, may the most useful one win. This may be a bit of a naive approach but it is very easy to reason about the fact that this does not break what currently exists. It clearly dominates the yolo approach.

  1. Communicating universes

Like parallel universe, except KT1 and KT2 contracts can call each other. Calls from KT1 to KT2 are executed first and on a stack, calls from KT2 to KT1 it gets placed on a queue. There could be a single queue but it’s likely more natural to have a queue per level on the stack.

  1. Contracts choose

In this mode there is no new contract types but rather a michelson opcode called “DFS : operation -> operation” which tags an operation as needed to be placed on a stack. When the operations are returned, the ones that need to go on a stack are extracted and executed on a stack. This is similar to “communicating universes” but the call type is determined by the calling contract’s script and not by the kind of contract being called. This has the downside that a contract cannot require to be called a certain way, it’s not clear how much this matters.

  1. Direct calls with KT2

Either like “parallel universe” or like “communicating universes”, but KT2 contracts return a value and can be called from inside the script as opposed to being explicitly passed as an effect.

4 Likes
  1. If we plan for them to be separate forever, isn’t that like either a fork or other explicit duplication of the network? This seems to punt the issue too much.
  2. This allows for the most control, but then the choice of BFS vs DFS might as well be non-deterministic
  3. Explicit return types (where current contracts are grandfathered in as having an empty return type) makes the most sense to me: then we can add primitives like DFS : operation -> operation with more interesting types to control the execution flow
2 Likes

They aren’t fully separate even in this proposal as tez can be transmitted through implicit accounts. So not, it’s nothing like a fork. Balances are 99% of the “state”. And of coure, there’s no reason to say they have to be separate forever.

No, if you want to make sure you are called back immediately as soon as a contract processes your request, you get that. That’s the main benefit of DFS.

Most likely. We lose the purity but I think that is what DFS points towards anyway.

Continuing with the discussion of BFS/DFS… I have been working with Margarita Capretto @mcapretto about Monitoring (see post Monitors), in particular studying new mechanisms, opcodes, etc that allow to effectively implement monitors. We already have some negative results (impossibility) and positive results.

One of the topics we explored is the influence of the scheduler order (BFS, DFS, etc) and we studied different ways to mix BFS and DFS.

We always assume that it is crucial to guarantee legacy (old contracts keep running as usual, assuming that internal operations they invoke run as usual), so no Yolo.

Names and implementation:

  • The Tezos scheduler allows to insert operations after the execution of a contract, and to extract the next operation to execute.
  • Internally the scheduler maintains a buffer with the pending operations.
  • The runtime system uses the scheduler to run a contract until either the buffer is empty or the gas is exhausted or some contract fails (explicitly or balance errors).
  • BFS corresponds to the scheduler adding the new operations at the end of the buffer (aka FIFO or queue). This is what Tezos does now.
  • DFS would correspond to adding operations at the beginning of the buffer (aka LIFO or stack).

Why not exploring different or mixed ways to insert and extract from the buffer?

  1. Mixed First Search (MFS). Operations can generate two lists (with the second lists empty by default for legacy). The operations in the first lists are added at the end of the buffer (BFS) while the operations from the second list are added at the beginning (DFS).

  2. Double Buffer First Search (DBFS). The scheduler maintains two buffers, and iterates over one until exhaustion (in a BFS, DFS or MFS, to be discussed) before starting with the second. In our study of monitoring this was explored to run monitors at the end of the external operation (the second buffer would be for monitoring only). This is very useful when combined with capabilities, for example limiting the operations from the second buffer to NOT generate further operations (or even change the storage, etc).

  3. Contexts. The idea here is that when an operation is created, it can be labeled to be “framed” with its own buffer of sub-operations (like a push so-to-speak). The semantics of this local buffer of operations is again an orthogonal issue (BFS or DFS or MFS…) but this local buffer will be emptied before continuing with further operations generated by the father operation. This mechanism eases the separation between generated operations, for example to guarantee legacy or to guarantee isolation between operations.

Note that all the ideas above (6. 7. And 8.) still maintain that operations are functions that terminate generating buffer(s) of operations to be scheduled (no call-return in the middle of the function).

What I was describing in 4 is I think similar to what you describe in “Contexts” with the difference that every context can have its own BFS queue. It may be simpler to treat contexts as pure DFS which is I think what you’re describing.

I understand that your proposals maintain the property that operations are only returned at the end of a contract. As you may have seen in my post, I’m wondering if it might make sense to go further and simply treat them as function calls within the code. This would pair well with the concept of contexts. Execute like function calls within one context and then output operations. Yes, this means people might be a bit more likely to create reentrancy bugs, but the same is true with offering DFS anyway. It’s also likely that high level languages would abstract away a callback mechanism and let you write the calls as if they happened immediately, so why not…

I think Contexts (as proposed above) are a bit more general than pure DFS, because a Context may for example, invoke a legacy contract (designed a verified in BFS) to be run using its local BFS buffer. We can craft strong proofs that this notion of isolation guarantees legacy behaviour. Other uses of isolation can be equally useful, for example to guarantee that the contract balance is used and subtracted before moving on to the next child operation.

Of course we could design functional style call-return (DFS), but I think it is wise to restrict the semantics simple with a functional semantics for local operations and a DFS at the level of remove/local invocations. Note that there is effect here (storage gets updated after each operation), so pure/effect gets cleanly separated. Also, monitors can easily abstract away the functional part and inspect only the parameters/storage-changes. Reentrancy bugs can easily detected/eliminated with monitors (that for example can fail if a reantrancy operation occurs within a “Context”).

Does this make sense?

1 Like

I think it would be useful to illustrate this with pseudocode examples for clarity.

My name is Eli Guenzburger I am researching with Prof. Cesar Sanchez at the IMDEA Software Institute.

When an operation is marked as Contextual, all of its resulting operations are executed before subsequent operations in the queue. In contrast to the dfs proposal (Point 4 in earlier post), contexts preserve the legacy bfs calling pattern of existing contracts on the blockchain. In this way, if I have an existing contract that I verified as safe assuming a bfs calling pattern, I do not have to worry that anything will break as a result of another contract calling my contract as dfs.

We will mark a Contextual Function (CF) as Contextual.
By default, a function is a Non-Contextual Function (NCF).
An operation that triggers a CF is a Contextual Operation (CO)
And one that triggers a NCF is a Non-Contextual Operation (NCO)

BFS (Status quo)

//exec function takes an operation, executes it and returns the list of generated operations. 
exec : operation -> operation list 

API: BFS([s])

BFS(Queue Q):
While (Q is not empty)
  op = Q.dequeue()
  Q.enqueue(exec(op))

BFS + DFS proposal

API: BFS([s])

BFS(Queue Q):
While (Q is not empty)
  op = Q.dequeue()
  if op is NCO
     Q.enqueue(exec(op))
  else (* op is DFS *)
     DFS([op])

DFS(Stack S):
While (S is not empty)
  op = S.pop()
  S.push(exec(op))

BFS + Context proposal

API: BFS([s])

BFS(Queue Q):
While (Q is not empty)
  op = Q.dequeue()
  if op is NCO
     Q.enqueue(exec(op))
  else (* op is CO *)
     Q'= new Queue()
     BFS(Q'.enqueue(exec(op)))

BFS vs BFS + DFS vs BFS + Context Examples

  1. The first table below describes a set of operations, and the operations they generate respectively. The three tables below it describe the different order of operations that result when operation A is executed using 3 different scheduler algorithms (while marking operation B2 as DFS or Contextual in tables 3 and 4 respectively).
Operation Emissions
A B1 B2
B1 C1 C2
B2 C3 C4
C1 D1
C2 D2
C3 D3
C4 D4
D1
D2
D3
D4
  1. Current Order of Operations (Pure BFS)

Order: A, B1, B2, C1, C2, C3, C4, D1, D2, D3, D4

Executing Queue Q (Post ex)
A B1 B2
B1 B2 C1 C2
B2 C1 C2 C3 C4
C1 C2 C3 C4 D1
C2 C3 C4 D1 D2
C3 C4 D1 D2 D3
C4 D1 D2 D3 D4
D1 D2 D3 D4
D2 D3 D4
D3 D4
D4
  1. B2 labelled Contextual Operation

Order: A, B1, B2, C3, C4, D3, D4, C1, C2, D1, D2

Executing Queue Q (Post ex) Queue Q’ (Post ex)
A B1 B2
B1 B2 C1 C2
B2 C1 C2 C3 C4
C3 C1 C2 C4 D3
C4 C1 C2 D3 D4
D3 C1 C2 D4
D4 C1 C2
C1 C2 D1
C2 D1 D2
D1 D2
D2
  1. B2 Labelled DFS

Order: A, B1, B2, C3, D3, C4, D4, C1, C2, D1, D2

Executing Queue Q (Post ex) Stack S (Post ex)
A B1 B2
B1 B2 C1 C2
B2 C1 C2 C3 C4
C3 C1 C2 D3 C4
D3 C1 C2 C4
C4 C1 C2 D4
D4 C1 C2
C1 C2 D1
C2 D1 D2
D1 D2
D2
4 Likes

Thank you, this is super helpful and the pseudo-code makes it very clear what the proposals do. These two options are basically what I had in mind. In the BFS + Context proposal it’s natural to return all operations at the end of a contract but, in the case of the BFS + DFS proposal, I wonder if it might be worthwhile to directly integrate the DFS calls inside the contracts and have them return a value. This makes the code definitely less functional but it might make it more usable.

It’s also quite possible that users end up relying on high level languages that emulate this direct, effectful, function call style by saving the program point in the storage and using callbacks, so it might be worth just giving it to them? Or perhaps, as a middleground, make views (read-only calls) accessible this way.

2 Likes

Thanks a lot for this presentation, @EGuenz.

What seems absolutely clear to me is that the vast majority of smart contract developers would be much happier with DFS + return values.

DFS alone would already be great for some patterns but it feels like return values would be so much closer to the intuition expressed by everyone, with the rare exception of (some) core devs who, if I understand well, value the (slight) advantage of BFS for sharding (personally not convinced about it).

Return values are so useful because they act like a specialized entry point with a perfect knowledge about context (versus a callback that needs to check if it gets an answer by the right contract, for the right question, etc. <- maybe tickets are the alternative for the context part, but they require discipline).

Encoding the context in tickets is only part of the answer, with return values you get the additional element that you know that you can do something after your call. The alternative without return values is the more cumbersome encoding: put something in your storage and call yourself back again after every call to check after the fact that you were indeed called.

Last element, I think we could/should support a call without reentrancy (this may be a parameter when you call another contract).
Without reentrancy means that yourself (not any contract) cannot be called back during this call.
And I would even put this option by default with DFS + return values.