RFC for Table of Constants TZIP

The corresponding MR: https://gitlab.com/tezos/tezos/-/merge_requests/2474

Several things about names:

First, I’ve gotten the feedback that “constants” might confuse users, who might not associate things like lambdas with the word “constant”. An alternative is “global table of values”. I think I use “global constants” in the code. I’m not particular - we just need to pick something and I’ll make it universal in the code.

Second, about the names of the constants themselves (that is, their keys in the tables). For this initial pass I’ve allowed arbitrary strings, but obviously this is too lax. The choice before us is: do we allow users to pick relatively arbitrary names (like package names in opam or npm)? Or do we derive some kind of name from the content of the value?

Allowing users to name things means we have to deal with several social problems like name squatting and name spacing.

Hashing the content is a really cool idea. Gabriel has pointed out that this locks us into whatever hashing algorithm we use - this is a serious drawback to be considered. However, overlooking that for a moment, there are some superpowers to be gained if we can find code on the chain based on its content. Unison lang is doing some really interesting stuff with content-addressed code, as is Formality lang/Moonad and its more recent incarnations. In short, if you have a guaranteed hashing scheme, you never have to publish the same code twice to the chain. Compilers could take advantage of this and hash-cons everything automatically, reducing the size of contracts substantially. The most used addresses could be heavily optimized (e.g., converted to native OCaml).
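To make the "never publish the same code twice" point concrete, here is a minimal sketch of a content-addressed table. Everything in it is an assumption for illustration: the names address_of and ConstantTable are hypothetical, and BLAKE2b-256 over raw bytes merely stands in for whatever hashing and serialization scheme would actually be chosen.

```python
import hashlib
from typing import Dict, Tuple


def address_of(value: bytes) -> str:
    """Hypothetical address: hex BLAKE2b-256 digest of the serialized value."""
    return hashlib.blake2b(value, digest_size=32).hexdigest()


class ConstantTable:
    """Toy in-memory stand-in for the on-chain table of constants."""

    def __init__(self) -> None:
        self._values: Dict[str, bytes] = {}

    def register(self, value: bytes) -> Tuple[str, bool]:
        """Store the value under its hash; return (address, was_newly_stored)."""
        addr = address_of(value)
        is_new = addr not in self._values
        if is_new:
            self._values[addr] = value
        return addr, is_new

    def lookup(self, addr: str) -> bytes:
        return self._values[addr]


table = ConstantTable()
lam = b"{ DUP ; ADD }"  # placeholder bytes, not a real Michelson encoding
addr1, new1 = table.register(lam)
addr2, new2 = table.register(lam)  # same content, same address, nothing stored
```

The second register call finds the value already present, which is the deduplication a compiler could rely on when hash-consing.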

I’m sure there are drawbacks even beyond what Gabriel pointed out, otherwise everyone would be doing this already and Unison wouldn’t have any buzz about it. In fact, Arthur explicitly said we may want to avoid providing any guarantee of a mapping from value to address - I’d be interested to hear him elaborate on his reasons.

@murbard, in your original post, you suggested a Michelson instruction that registers a global constant. What is the purpose of such an instruction? Why would contracts need to create libraries, rather than managers simply publishing them (especially since you cannot dynamically construct lambdas from within Michelson)?

Isn’t this proposal less generic than the one about views (Views TZIP)? Assuming we have views, I can register a constant c named n of type ty by originating the contract: parameter never; storage unit; code {CAR; NEVER}; view n unit ty {DROP; PUSH ty c}. I can then access it by calling: PUSH address "KT1.."; UNIT; VIEW n unit ty. Or am I missing something?

Indeed, it is less generic; however, it incurs less overhead because the whole script doesn’t need to be type checked. If we could ever move to type checking contracts only once, we might indeed be able to replace this command with Views. In the meantime, smart contract developers sorely miss being able to produce larger contracts, and global constants provide instant relief.

@murbard, as the original proposer of this, I’m interested in your thoughts: can we just use views in place of global constants?

The only difference I see would be locality in the context tree. Once the protocol starts handling cursors, caching one on top of the constants subtree would likely be a good strategy. With the method proposed, using views, it would be all over the place.

I offer no judgment on which of the two features should be added, if any. But, thoughts:

For the gas overhead question here, I would like to know how much extra gas it’s going to cost currently. Conceivably the overhead could be insignificant…

VIEW already seems to alleviate the problems of the origination op size limit and sharing storage burn costs.

However, constants could be nice in their own right… There is a static assurance that it’s just a constant. Loading the constant is not going to FAIL or diverge or bump the internal operation nonce counter or whatever (though it might cause gas exhaustion.) Loading the constant multiple times is going to give the same value, even in different ops (up to protocol amenders’ whims, at least.)

You can arrange for similar behavior with VIEW, but it is not statically evident. You have to know the VIEW’d contract’s view and go analyze it.

Also, constants can be content-addressable, which seems very nice to me… More on that later.

Globals will lead to a collection of problems related to the representation and origination of contract scripts.

The familiar contract script, with parameter, storage, and code, will now be incomplete. A complete deployable contract will generally require additional data to describe the value of the globals. This problem has different facets for different tools: compilers, IDEs, clients… It might be nice to have a standard representation for a contract script together with definitions for some or all of its globals.

We must also imagine how origination of contract scripts using globals will actually work.

The draft impl, with string names for globals, seems particularly troublesome. Assigning “originated” addresses to globals (like how addresses are assigned to contracts: hash of op_hash+counter) seems somewhat better, but still problematic.

Content-addressable globals (the address is a hash of the value) seem very nice, comparatively. We can do a batch op which registers globals together with an origination which uses them. Registering a global which already exists can be a noop. Clients can easily omit globals which were already registered, by just checking whether they already exist. We don’t need to worry about front-runners or squatters or accidental collisions or typos (unlike string names.) A contract can be originated in mockup or test chains with exactly the same script and storage as on mainnet. You never need to update the script or storage to use claimed global names or originated global addresses. It is trivial for different instances of a contract to share references to the same globals.
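The batch flow described above could be sketched roughly as follows. This is only an illustration under assumptions: build_batch, the operation tuples, and the BLAKE2b-256 address function are all hypothetical, and a real client would query the chain rather than receive a set of known addresses.

```python
import hashlib


def address_of(value: bytes) -> str:
    # Assumption: BLAKE2b-256 over the serialized value; no scheme is fixed here.
    return hashlib.blake2b(value, digest_size=32).hexdigest()


def build_batch(script_globals, already_registered):
    """Build a batch: register only the missing globals, then originate.

    script_globals: values the contract refers to (bytes, in order of use).
    already_registered: addresses the client found on chain.
    """
    ops = []
    seen = set(already_registered)
    for value in script_globals:
        addr = address_of(value)
        if addr not in seen:  # omit globals that already exist
            ops.append(("register_global", value))
            seen.add(addr)
    # The origination only mentions addresses, never the values themselves.
    ops.append(("originate", [address_of(v) for v in script_globals]))
    return ops


add = b"{ DUP ; ADD }"  # placeholder bytes, not real Michelson encodings
mul = b"{ DUP ; MUL }"
on_chain = {address_of(add)}
batch = build_batch([add, mul, mul], on_chain)
```

Because the addresses depend only on content, the same batch would produce the same script on a mockup chain, a test chain, or mainnet; only the set of omitted registrations would differ.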

There is still a question of how to represent a complete deployable contract. A contract script mentioning globals is still incomplete. However, its meaning is determined by its content. All that is missing is the pre-images for the global addresses mentioned.

Just making the globals content-addressable does not fully answer my two questions, and there could be satisfactory answers for other implementations. But I claim 1) string names are unacceptable, and 2) content-addressable globals should lead to easier, better answers to my questions.

I agree with @tom on his points. I think content-addressing values offers a wide range of benefits. The only downside I can think of is that it locks us into a particular hashing scheme; however, we can always do a migration if we ever adopt a different scheme.

For representing contracts, what do we think of this:

  • Add a new globals field to the usual parameter, storage, code representation that is a list of key-value pairs.
  • Have a macro system that replaces placeholders with the hashes of their values at origination.
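As a rough sketch of what those two bullets could look like together: the $name placeholder syntax, the GLOBAL pseudo-instruction, the dictionary layout, and the hash function below are all made up for illustration, not part of the TZIP or the MR.

```python
import hashlib
import re


def address_of(value: str) -> str:
    # Assumption: BLAKE2b-256 of the UTF-8 text stands in for the real scheme.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=32).hexdigest()


def expand_placeholders(contract: dict) -> str:
    """Replace each $name in the code with the hash of the named global."""
    addresses = {name: address_of(v) for name, v in contract["globals"].items()}

    def substitute(match):
        # Raises KeyError if the code mentions a global that is not defined.
        return '"' + addresses[match.group(1)] + '"'

    return re.sub(r"\$(\w+)", substitute, contract["code"])


contract = {
    "parameter": "int",
    "storage": "int",
    # GLOBAL is a hypothetical instruction name used only in this sketch.
    "code": "{ CAR ; GLOBAL $double ; NIL operation ; PAIR }",
    "globals": {"double": "{ DUP ; ADD }"},
}
expanded = expand_placeholders(contract)
```

The globals field then doubles as the list of pre-images a client needs in order to register whatever is missing before origination.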

Another problem with hashes for addresses is that protocol migrations might rewrite the code (e.g. when some instruction is deprecated) and then either the hash needs to be rewritten too, or it will no longer actually be the hash of the value.

It seems OK to me to guarantee that the address is either the hash of the value, or the hash of an old version of the value when the value was rewritten by a protocol migration…

If we’re rewriting code, and the hash of a value changes from X to Y, couldn’t we rewrite all references of X to references of Y? The value at X would become undefined (unless you could validly construct X in the new version and someone re-originated it).

I don’t see why we couldn’t.
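For what it’s worth, the X-to-Y rewrite could be sketched like this. The function migrate, the flat list-of-references model of a script, and the hash function are assumptions for illustration; the sketch also ignores constants that refer to other constants, which a real migration would have to handle.

```python
import hashlib


def address_of(value: bytes) -> str:
    # Assumption: BLAKE2b-256 over the serialized value.
    return hashlib.blake2b(value, digest_size=32).hexdigest()


def migrate(table, scripts, rewrite):
    """Rewrite every stored value and re-point references at the new hashes.

    table: dict mapping address -> value.
    scripts: list of scripts, each modelled as a list of referenced addresses.
    rewrite: function applied to each value by the protocol migration.
    """
    renamed = {}
    new_table = {}
    for old_addr, value in table.items():
        new_value = rewrite(value)
        new_addr = address_of(new_value)
        new_table[new_addr] = new_value
        if new_addr != old_addr:
            renamed[old_addr] = new_addr  # X -> Y
    new_scripts = [[renamed.get(a, a) for a in refs] for refs in scripts]
    return new_table, new_scripts, renamed


old_value = b"{ OLD_OP ; DROP }"  # placeholder bytes, not real Michelson
kept_value = b"{ DUP ; ADD }"
table = {address_of(v): v for v in (old_value, kept_value)}
x = address_of(old_value)
scripts = [[x, address_of(kept_value)]]
new_table, new_scripts, renamed = migrate(
    table, scripts, lambda v: v.replace(b"OLD_OP", b"NEW_OP")
)
```

After the migration, X is no longer a key in the table, matching the point that the value at X becomes undefined unless someone re-originates it.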

All, I’ve reworked the design based on the feedback here and in discussions within Nomadic and Marigold. I’ve updated the TZIP and made a new MR.