Why not shard the bakers?

Regarding sharding a chain of blocks: Tezos blocks appear to be relatively linear and shardable.

There are two hurdles to spending Tezos once received: a number of confirmations c, typically around 30, and a 1:1 block hierarchy that runs all blocks in one dimension (x being the one-dimensional depth axis hereafter).

Why not add a y axis so that shards can bake along multiple hash radixes?

Just as if we were sharding a key-value store, this would enable a block to confirm against a much larger spread of bakers, along a progression of nodes, to arrive at the one true ledger (the x depth dimension).

What changes?

We want to divide and conquer block baking and endorsement, as if merging multiple chains processed in parallel, using sharding as a force multiplier while continuing to deliver the facade of a blockchain based on depth (x index).

The result would function and comply with existing blockchain methods and algorithms as a single chain of blocks with a single x index. The delegation of baking and endorsement work would occur along boundaries of key-value radixes, a y axis, sized in proportion to the pool of active bakers (a heuristic rolling average) and presumably determined as a mempool tweak. Baking and endorsement should accommodate a minimum of 2 shards, and should likewise use local and provable key values to maximize the variance in which 2+ chains are assigned to any pool of bakers.

So we are discussing how to fairly bake all the transactions at depth=x by dividing the load across a width of y bakers. We then distribute all the work over the topology that exists today, effectively dropping the y axis once a block is complete, because the mempool exposes the processing agenda, knowledge that should exist identically in all connected nodes.
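To make the x/y idea concrete, here is a minimal Python sketch of splitting the pending transactions for one depth into shards by hash and then merging them back into a single ordered block. Everything in it is an assumption for illustration: the function names, the use of SHA-256, the modulo mapping, and the hash-sorted canonical order are hypothetical and not taken from the Tezos codebase.

```python
import hashlib


def tx_digest(tx: str) -> str:
    """Hex digest used both for shard selection and for canonical ordering."""
    return hashlib.sha256(tx.encode()).hexdigest()


def partition(transactions, shard_count):
    """Split the transactions for depth x into shard_count buckets (the y axis)."""
    shards = [[] for _ in range(shard_count)]
    for tx in transactions:
        shards[int(tx_digest(tx), 16) % shard_count].append(tx)
    return shards


def merge(shards):
    """Drop the y axis: recombine shard outputs into one block for depth x.

    Sorting by digest gives every node the same order regardless of how
    many shards were used (assumes transactions are distinct).
    """
    return sorted((tx for shard in shards for tx in shard), key=tx_digest)


pending = [f"transfer-{i}" for i in range(10)]
block = merge(partition(pending, shard_count=4))
```

The point of the sketch is only that the merged block is independent of the shard width: two nodes that agree on the transaction set arrive at the same block whether they used 2 shards or 4.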

Proposed features

Non-zero load balancing fairness:
We can draw on the semantics of cuckoo hashing (Cuckoo hashing - Wikipedia) by creating y buckets into which baking nodes are hashed. When a given hash bucket is full, the cuckoo hash algorithm designates an alternative bucket, and so on, giving a reasonable and performant way of balancing work more or less fairly.
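A rough Python sketch of that bucket assignment follows. It is a generic bucketized cuckoo placement, not anything from the Tezos protocol: the two salted SHA-256 hashes, the `capacity` and `max_kicks` parameters, and the oldest-first eviction rule are all assumptions chosen for illustration.

```python
import hashlib


def _h(value: str, salt: str, buckets: int) -> int:
    """One of two salted hash functions mapping a baker id to a bucket."""
    digest = hashlib.sha256((salt + value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def candidates(baker: str, buckets: int):
    """The two buckets a baker is allowed to occupy."""
    return _h(baker, "a", buckets), _h(baker, "b", buckets)


def assign_bakers(bakers, buckets, capacity, max_kicks=64):
    """Place every baker in one of its two candidate buckets (capacity >= 1)."""
    table = [[] for _ in range(buckets)]
    for baker in bakers:
        current, avoid = baker, None
        for _ in range(max_kicks):
            first, second = candidates(current, buckets)
            free = [b for b in (first, second) if len(table[b]) < capacity]
            if free:
                table[free[0]].append(current)
                break
            # Both candidates are full: evict the oldest occupant of one of
            # them (not the bucket we were just evicted from) and retry
            # with the evicted baker.
            victim_bucket = second if first == avoid else first
            victim = table[victim_bucket].pop(0)
            table[victim_bucket].append(current)
            current, avoid = victim, victim_bucket
        else:
            raise RuntimeError("no placement found; add buckets or capacity")
    return table


table = assign_bakers([f"baker-{i}" for i in range(20)], buckets=4, capacity=10)
```

Capping the bucket size is what keeps the distribution roughly even; the eviction loop is the "and so on" step, and the `max_kicks` limit is where a real design would have to decide between resizing and rejecting.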

Transparent baking-only load balancing factors:
The goal here is to spark discussion of whether it is possible to operate on every transaction purely on the basis of transactions published in prior blocks, with no validation failure conditions that depend on sibling transactions at the same depth. Once a shard is designated in the mempool and the processing nodes are determined, they perform the work and publish one or more blocks with x and y correspondence, as if we were sharding a directory tree or a NoSQL key-value store (Shard (database architecture) - Wikipedia). We then want to deliver the now smaller blocks and reconstruct the chain as the previous depth-only, one-dimensional chain. For all intents and purposes, a correct choice of shard determination should introduce no divergence from the contract and blockchain processing semantics in play in any given revision.

Safety factors:
We can assume the baking node agenda is known in advance, as is presently done with Tezos (Edo), but the transactions and keys contained therein are relatively difficult to predict.

Roughly speaking:

Minimum mean shard size
Determined by the rolling average of bakers and a reasonable minimum overlap, to regulate the bucket depth of the cuckoo hash mentioned above. This is a feature of the mempool, performed with the queue on hand. The aim is to reduce the attack surface for exploiting empty, waiting bakers via spoofed seed values, or for engineering grossly uneven queue depths to desync the network.
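One way to read this is that the shard count follows a rolling average of active bakers, floored so that no shard drops below a minimum number of bakers. The Python sketch below illustrates that reading only; the class name, the window length, and the `min_bakers_per_shard` parameter are hypothetical, and the floor of 2 shards comes from the minimum stated earlier in this post.

```python
from collections import deque


class ShardWidth:
    """Derive the shard count (y width) from a rolling average of bakers."""

    def __init__(self, window=8, min_bakers_per_shard=4, min_shards=2):
        self.samples = deque(maxlen=window)  # most recent baker counts
        self.min_bakers_per_shard = min_bakers_per_shard
        self.min_shards = min_shards

    def observe(self, active_bakers: int) -> None:
        """Record the number of active bakers seen for the latest block."""
        self.samples.append(active_bakers)

    def shard_count(self) -> int:
        """Shards to use next, never fewer than min_shards."""
        if not self.samples:
            return self.min_shards
        mean = sum(self.samples) / len(self.samples)
        return max(self.min_shards, int(mean // self.min_bakers_per_shard))


width = ShardWidth(window=3, min_bakers_per_shard=4)
for count in (40, 44, 48):
    width.observe(count)
shards = width.shard_count()  # mean of 44 bakers, 4 per shard
```

Averaging over a window means a sudden drop or spoofed spike in the baker count moves the shard width gradually rather than at once, which is the property the paragraph above is after.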

Impractical Long Range Attacks on Shards:
Shard delegation would correspond to: Baker’s Node id * Baker’s Published Key * (Transaction hash * n ) * (hash of x + block(x-1…5).merkle)
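As a sketch of how such a delegation value might be computed, the Python below treats each `*` in the expression above as "mix into one hash", which is my assumption and not a stated part of the proposal. The field names, SHA-256, the length-prefixed encoding, and passing the prior Merkle roots as a caller-supplied list are likewise hypothetical; `n` is carried through as an opaque integer.

```python
import hashlib


def _field(data: bytes) -> bytes:
    """Length-prefix a field so adjacent fields cannot be confused."""
    return len(data).to_bytes(4, "big") + data


def delegation_digest(node_id: bytes, published_key: bytes, tx_hash: bytes,
                      n: int, x: int, prior_merkle_roots) -> bytes:
    """Mix baker identity, transaction, depth x and prior block roots."""
    h = hashlib.sha256()
    h.update(_field(node_id))
    h.update(_field(published_key))
    h.update(_field(tx_hash))
    h.update(_field(n.to_bytes(8, "big")))
    h.update(_field(x.to_bytes(8, "big")))
    for root in prior_merkle_roots:  # roots of the preceding blocks
        h.update(_field(root))
    return h.digest()


def delegated_shard(digest: bytes, shard_count: int) -> int:
    """Reduce the digest to a shard index on the y axis."""
    return int.from_bytes(digest, "big") % shard_count


roots = [b"root-a", b"root-b"]
d = delegation_digest(b"node-1", b"key-1", b"tx-1", n=1, x=100,
                      prior_merkle_roots=roots)
shard = delegated_shard(d, shard_count=8)
```

Because the prior Merkle roots feed the digest, anyone rewriting history would also change every downstream delegation, which is the intuition behind calling long-range attacks on shards impractical.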

Redistribution of transactions on contested blocks
We would expect that rewinding chain transactions, or any change in the number of transactions, will create completely unrelated y correspondences to baking nodes compared with the contested blocks. This should be well afforded by the force-multiplier effect of using a greater number of processing homes to verify smaller chunks and arrive at earlier completion, amortized.

Towards the evolution of more scalable smart contracts:

We assume that a successful network effect needs, at a minimum, an effective and self-organizing heuristic for load balancing and handling usage spikes. We should assume that gas price bidding wars are an unfavorable outcome, one that can be addressed by acceptance of the Tezos utility, enticing an uptick in the knowledgeable baking community, and virtuous cycles all around from a simple mechanic of force multiplying and reducing the intervals between baking and endorsing as needed.

Miscellaneous questions:
Luck: to what degree would this change the baking luck, payouts, and staking requirements? I’m the least qualified individual I know to comment on this matter.

With the help of the Tezos Discord I was pointed in the direction of some relevant points about the state of sharding, the mempool’s present and future expectations, and its ideal situation as a block delegator and load-balancing agent.

I make no claim to know the specifics of the OCaml codebase or to have experience with the topology; my background is in database and NoSQL architecture and performance tuning. I would greatly appreciate additional topical pointers to continue refining and improving my understanding (this is representative of what we’re doing, but adding a few safety features to guard against bad actors: The Dynamo Paper | DynamoDB, explained.)

(apologies for mixing rich-text with markdown in advance)


1st of all, welcome! I saw your initial thoughts in the Discord as well. I think I understand a little better what you are talking about. But I don’t see the need.

The only reasoning I could come up with was if block creation/validation was actually a high-effort action, which it really isn’t on Tezos. Most bakeries are containerized in the cloud or run on small physical servers. It cannot be overstated how little effort it takes your computer to validate a block lol. Far less than any PoW mining.

Since this is the case, why distribute? Personally I don’t see the need. If we needed faster consensus, that should happen at the consensus layer instead of distributing baking rights. Tezos is built for this; it’s why we have on-chain governance and upgrades, so that there’s no need to resort to sharding the chain. I’m pretty sure distributed baking rights would have little effect on the overall economy of baking anyways, since it would still be randomly assigned according to how many rolls you have.

Speaking from a “load balancing” standpoint, this is kinda already baked into the protocol. If you stake a roll, you are telling the network to give you some of the validation load, right? If the protocol gives you validation rights, and you don’t actually have the power to validate it within 1 minute, then someone else picks up the block and locked bonds are slashed. PoS is financial incentives all the way down. I’m just not sure you need to load balance a load-balanced system.

Also, in practice no one waits 30 confirmations. Either way it’s going down to ~6 with Emmy* and hopefully down to ~2 with Tenderbake. If you need any info on those let me know.

Depth First Execution Order: Previously, intercontract calls were executed in a so-called “breadth first” ordering. This was believed to be the correct choice when the Tezos protocol was initially designed, but it has turned out to significantly complicate the lives of smart contract developers. If Florence is adopted, the calling convention will change to a “depth first” execution order. This will make it far easier to reason about intercontract calls.

As the Florence protocol replaces BFS with DFS, I believe these Agora posts are relevant to your sharding question.

I’m unaware of the boundless capacity available through the baking load balancing system; as a baker I was not aware my blocks are only a part of the load.

If Tezos can in any way get behind a high capacity potential, without standing up a closed datacenter and shooting fish in a barrel laying claim to millions of tx/s, that ought to happen and be made known, particularly when we’ve been hearing about a certain #2 cap token working on sharding since well before the Tezos ICO.

Again, scaling horizontally is better for the ecosystem overall than any outcome that encourages a gas bidding war.