Deriving FA token balance_updates from big_map_diff

This is a technical proposal (plus a proof of concept) for how indexers could do FA token accounting.

Reasons

There are two separate issues while indexing token operations:

  1. We need some basic metadata to display balances: at least the symbol and the number of decimals;
  2. In the FA1.2 and upcoming FA2 standards the transfer method is standardized, but there are other methods that alter token balances, such as mint and burn; thus we cannot do accounting based on transfer call parameters only.
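To illustrate why the decimals field in item 1 matters: ledgers store plain integers, and a display layer has to shift them by the token's decimals. A minimal sketch (the function name is illustrative, not from any existing library):

```python
from decimal import Decimal


def format_token_amount(raw_amount: int, decimals: int) -> str:
    """Render an integer ledger amount using the token's decimals metadata."""
    # Shifting the exponent by -decimals gives the human-readable amount
    # without floating-point rounding errors.
    return f"{Decimal(raw_amount).scaleb(-decimals):f}"
```

For example, format_token_amount(100500, 8) returns "0.00100500".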

What we need

A lightweight, preferably stateless solution that could derive token balance updates from just the RPC output (/chains/main/blocks/{block_id}). This would allow adding generic token support to the standard indexer workflow at minimal cost. Ideally, it should work like the existing balance_updates receipts.
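As a sketch of the "just RPC output" requirement, the following walks a block response and collects every big_map_diff entry from applied operations, including internal (contract-to-contract) ones. It assumes the receipt layout operations → contents → metadata → operation_result / internal_operation_results with a big_map_diff list; field names may differ across protocol versions.

```python
from typing import Any, Dict, Iterator


def iter_big_map_diffs(block: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every big_map_diff entry of applied operations in a block response."""
    for validation_pass in block.get("operations", []):
        for operation in validation_pass:
            for content in operation.get("contents", []):
                metadata = content.get("metadata", {})
                results = [metadata.get("operation_result", {})]
                # Contract-to-contract calls report their own results separately.
                results += [
                    internal.get("result", {})
                    for internal in metadata.get("internal_operation_results", [])
                ]
                for result in results:
                    # Failed or backtracked operations did not change storage.
                    if result.get("status") == "applied":
                        yield from result.get("big_map_diff", [])
```

No extra context or database calls are needed: everything comes from the single block response.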

Big Map Diff

It can be argued that the vast majority of contracts implementing FA-family standards store (or will store) their balance ledgers in lazy structures, i.e. Big_map. Thus, the big_map_diff data (the list of changed Big_map keys with their new values) attached to every operation is exactly what we need. The only problem is that different contracts have different storage types, and we need to know exactly how to extract the data we need.
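For the simplest possible ledger, a hypothetical big_map address nat, extracting the data is straightforward. The sketch below assumes keys arrive as readable address strings (not packed bytes) and that the contract deletes a holder's key when the balance drops to zero; real contracts such as TZBTC need more elaborate decoding, which is exactly why per-contract parsers are needed.

```python
from typing import Any, Dict, Optional, Tuple


def parse_ledger_entry(entry: Dict[str, Any], ledger_id: str) -> Optional[Tuple[str, int]]:
    """Decode one big_map_diff entry of a ledger typed as big_map address nat.

    Returns (holder, new_balance), or None if the entry is not a key update
    of the given big map.
    """
    if entry.get("action") != "update" or str(entry.get("big_map")) != ledger_id:
        return None
    # Assumption: the key is a readable address string, not packed bytes.
    holder = entry["key"]["string"]
    value = entry.get("value")
    # Assumption: a missing value means the key was removed, i.e. zero balance.
    balance = int(value["int"]) if value is not None else 0
    return holder, balance
```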

Agnostic to standards

Using big_map_diff data as the basis for calculation gives us relative independence from the token standards. It doesn't matter whether transfer, mint, burn or some other method was called: in the end we get just a list of balances for each affected token holder.
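Because big_map_diff carries new values rather than deltas, already-decoded (holder, new_balance) pairs can be turned into balance updates without knowing which method produced them. A minimal sketch, assuming the indexer keeps the previously known balances:

```python
from typing import Dict, Iterable, Tuple


def apply_balance_updates(
    previous: Dict[str, int], updates: Iterable[Tuple[str, int]]
) -> Dict[str, Tuple[int, int]]:
    """Turn decoded (holder, new_balance) pairs into {holder: (new_balance, delta)}.

    The entrypoint that produced each update is irrelevant: diffs carry new
    values, so the last update per holder wins.
    """
    latest: Dict[str, int] = {}
    for holder, balance in updates:
        latest[holder] = balance
    return {
        holder: (balance, balance - previous.get(holder, 0))
        for holder, balance in latest.items()
    }
```

A transfer of 3 from alice to bob and a mint of 5 to carol are handled identically, e.g. previous = {"alice": 10}, updates = [("alice", 7), ("bob", 3), ("carol", 5)].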

Custom handlers

As mentioned above, in addition to the FA-standardized methods there may be other methods that modify user balances, and it is impossible to standardize them all without sacrificing flexibility.
This means that indexer developers have to implement a custom handler for each new FA token (this is how it works at the moment). This approach does not scale, and there should be another solution.

Michelson scripts/plugins

The idea of using Michelson as a generic scripting language has been voiced several times [1], [2] and seems like a way out. But the following question arises: how should these scripts be executed? Of course, you can use the standard RPC run_code endpoint, but it is costly and suboptimal.
Luckily, there is ongoing work on encapsulating the Michelson script interpreter, and hopefully we will be able to link it as a standalone library in the future. Moreover, there are several other Michelson implementations (in Haskell by Serokell, in Python by Baking Bad, probably more), and since we need only a relatively small subset of instructions (no blockchain bindings, which are the most costly to implement), writing our own interpreter or making a collaborative effort is quite doable.
NOTE: You can actually write a parser script in any high-level language you want (LIGO, SmartPy, Lorentz, SCaml, etc.) since it’s a valid Tezos contract and can be compiled down.

Possible Shell integration

Another alternative to writing our own interpreter is to integrate this plugin system into the Tezos shell. It would require the node operator to register custom big_map_diff parsers, e.g. like this:

tezos-node register big_map_diff handler "tzbtc_parser.tz" for big_map 31

From a developer's point of view, it could look like an RPC response extension: for each supported Big_map they would receive an extra operation receipt:

...
"operation_result": {
  "balance_updates": [
    {
      "kind": "token",
      "address": "tz1d75oB6T4zUMexzkr5WscGktZ1Nss1JrT7",
      "balance": 100500,
      "symbol": "TZBTC",
      "decimals": 8
    },
    ...
  ]
},
...
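The registration step and the receipt format above can be sketched together: a hypothetical registry maps a big_map id to a parser and token metadata, and each decoded entry becomes a receipt of kind "token". The class and method names are illustrative only; no such API exists in the node today, and a real TZBTC parser would be far more involved than the toy one used in the usage example.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# A parser turns one big_map_diff entry into (address, balance), or None.
EntryParser = Callable[[Dict[str, Any]], Optional[Tuple[str, int]]]


class TokenReceiptBuilder:
    """Hypothetical registry: big_map id -> (parser, token metadata)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Tuple[EntryParser, Dict[str, Any]]] = {}

    def register(self, big_map_id: str, parser: EntryParser, symbol: str, decimals: int) -> None:
        self._handlers[big_map_id] = (parser, {"symbol": symbol, "decimals": decimals})

    def balance_updates(self, diffs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
        receipts = []
        for entry in diffs:
            handler = self._handlers.get(str(entry.get("big_map")))
            if handler is None:
                continue  # no parser registered for this big map
            parser, metadata = handler
            parsed = parser(entry)
            if parsed is None:
                continue  # entry is not a balance change (e.g. alloc/copy/remove)
            address, balance = parsed
            receipts.append({"kind": "token", "address": address, "balance": balance, **metadata})
        return receipts
```

Usage, with a toy parser for a plain address-to-nat ledger: builder.register("31", parser, "TZBTC", 8), then builder.balance_updates(diffs) yields entries shaped like the receipt above.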

Other questions

Who will write parser scripts

Token developers. Many scripts would likely be reusable (especially in case of contract factories).

Where to store parser scripts

It seems logical to keep them on-chain since they are actually valid contracts. FA token contracts can keep the parser address (pointer) in the storage.

Could there be multiple parsers

The approach can be generalized to extract any other data, and not just from big_map_diff but also from contract storage and operation parameters. You can basically move off-chain all view methods that are not used by other contracts, making them external (as suggested in [1]).

What if it’s impossible to calculate balances using just big_map_diff?

In cases when the holders’ balances are calculated dynamically (or otherwise not stored directly), this approach won’t work and one would need full-fledged external views (see [1], [2]). However, big_map_diff should still be considered a source of information about which particular accounts were altered.
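In that fallback mode the diff only tells the indexer whom to re-query. A minimal sketch, where query_balance is a hypothetical stand-in for a full-fledged external view call and ledger keys are assumed to be readable address strings:

```python
from typing import Any, Callable, Dict, Iterable, Set


def altered_accounts(diffs: Iterable[Dict[str, Any]], ledger_id: str) -> Set[str]:
    """Collect holders whose ledger entries changed, ignoring the new values."""
    return {
        entry["key"]["string"]
        for entry in diffs
        if entry.get("action") == "update" and str(entry.get("big_map")) == ledger_id
    }


def refresh_balances(
    diffs: Iterable[Dict[str, Any]],
    ledger_id: str,
    query_balance: Callable[[str], int],
) -> Dict[str, int]:
    """Call the (hypothetical) external balance view only for altered accounts."""
    return {holder: query_balance(holder) for holder in sorted(altered_accounts(diffs, ledger_id))}
```

This keeps the number of view calls proportional to the accounts actually touched, rather than to all holders.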

Proof-of-concept

A parser script for the TZBTC contract.

Check out the step-by-step tutorial:

NOTE: This is probably the hardest case one could imagine; most scripts would be much simpler.

A balance_updates derivation demo

Implemented using the TZBTC parser script and PyTezos library:

Inspired by

[1] External Views by Gabriel Alfour
[2] https://smondet.gitlab.io/fa2-smartpy/tutorial.html#get-balance-off-chain by Sebastien Mondet

For what it’s worth, I have real code for a real use case where some balances depend on a big map but also on the current timestamp (e.g. some decay continuously over time, some accrue). You wouldn’t be able to infer balances just from the stored data structure.
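To make the objection concrete, here is a hypothetical linearly decaying balance (not the commenter's actual code): the stored pair (amount, last_update) only shows up in big_map_diff when it is written, yet the effective balance keeps changing with the current time.

```python
def decayed_balance(amount: int, last_update: int, rate_per_second: int, now: int) -> int:
    """Hypothetical decay model: stored data stays fixed, the result depends on now."""
    elapsed = max(0, now - last_update)
    # Balance shrinks linearly over time and never goes below zero.
    return max(0, amount - rate_per_second * elapsed)
```

With the same stored entry (1000, 0) and a rate of 2 per second, the balance is 800 at t = 100 and 0 at t = 600, while no new diff is produced in between.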

Can one assume this is an “unrealized” balance, and it will become “realized” when it is claimed in some way (by transfer or otherwise)?

Yes, in theory there are use cases with dynamic balances. I can even imagine a use case with random balances. However, in practice the number of “classic” assets with non-dynamic balances is far larger than those hypothetical use cases.
At the moment the only working approach is writing custom handlers for each FA token. Obviously, this approach doesn’t scale, so by extending the existing FA standards to support deriving token balance updates from big_map_diff, we can make life easier for 95% of token developers and 100% of indexer and wallet developers.
Let FA1.2 be generic, and let FA1.2s (for example) be compatible with deriving token balance updates.

I wouldn’t call it an “assumption” so much as a categorization… ultimately it’s a matter of UX… if you’re not displaying the unrealized total to the user, they are missing something; if you are, then you’re doing the calculation and all is well…

You’re right. For your particular case one would need to apply a full-fledged external view for each balance request, either single or batched.
Big_map_diff actually gives us independence from standards, because it doesn’t matter whether it’s a transfer, a multi-transfer, a mint or a burn: in the end we get a list of changed balances for a particular holder/token.
In addition, it gives a huge performance boost since we don’t have to make extra context/db calls. But this approach doesn’t cover all the cases, I agree. [updated the post]

Since a Michelson script is specific to a token contract and most likely needs to be tailored to the contract’s storage structure and big_map type, can we think of it as an external view for the token contract? Token contract developers could implement dynamic balance calculation as part of the Michelson script that extracts balance data from the big_map.

Sure it is. Basically, one can consider our approach a special case of an external view that does not require the whole context, only the contract storage plus the changed big map values (big_map_diff), and it’s actually a “truly” pure function.
What I am trying to convey is the following:
Imagine a typical (simplified) indexer process that consumes one block after another and parses operation contents, among other things. Let’s say it encounters a transfer call to a known FA token. We know that by the standard there are to and from fields holding the altered account addresses, and we can apply an external view to that token in order to query the actual balances of those accounts. But what if it encounters mint, burn or another method altering token balances? The standard says nothing about them, so the indexer will just skip them.
Why is this needed? In order to do consistent accounting like this: https://tzkt.io/tz1VvWQ93JYFY1bw8QC6SrV2KdJCkDsRnVVu/operations
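The indexer scenario can be sketched side by side. The operation dicts here are a hypothetical simplified shape (entrypoint, decoded parameters with from/to fields, and the big_map_diff list), not a real RPC layout: parameter-based accounting only understands transfer, while diff-based accounting picks up any method that touched the ledger.

```python
from typing import Any, Dict, Set


def accounts_from_parameters(operation: Dict[str, Any]) -> Set[str]:
    """Parameter-based accounting: only the standardized transfer is understood."""
    if operation["entrypoint"] != "transfer":
        return set()  # mint, burn and custom methods are silently skipped
    params = operation["parameters"]
    return {params["from"], params["to"]}


def accounts_from_diffs(operation: Dict[str, Any], ledger_id: str) -> Set[str]:
    """Diff-based accounting: any method that touched the ledger is picked up."""
    return {
        entry["key"]["string"]
        for entry in operation.get("big_map_diff", [])
        if entry.get("action") == "update" and str(entry.get("big_map")) == ledger_id
    }
```

For a mint to tz1a, accounts_from_parameters returns an empty set, while accounts_from_diffs returns {"tz1a"}.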