Proposal Idea: Add the Possibility to Verify Source Code for Tezos Smart Contracts

Current Status:
Project XY deploys its smart contracts (be they written in Ligo, SmartPy or Archetype). Everything works great. Users can see the compiled Michelson on Better Call Dev and can even fork it via the website.

But the option to verify the contract source code (with the ABI and bytecode), as Etherscan offers, so that the Solidity code is displayed for everyone after verification, does not exist on Tezos.

Quote from Alexander Eichhorn: @Alex
This level of verification does not (yet) exist in Tezos. Although the Michelson bytecode can have type annotations there is still a huge gap between actual source (smartpy, etc) and Michelson. I’m not aware of any Tezos compiler that adds version info into the bytecode (there is a more or less well used field in the tzip 16 metadata, but few contracts actually use tzip16 or add the compiler version there). As explorer we’d be interested to link original source to a published contract, but there’s no easy way for us to locate/identify the original source or automate the verification process.
Only thing you have for free in Tezos is the standardized ABI interface and type info for the contract’s storage. Both are explicitly published as Micheline structures together with the code of the contract (see parameters and storage sections in contract scripts)

What users can’t do (especially non-coders and non-Michelson-savvy users) is verify the contract source before interacting with it. Why is this so important?

Well because…

  • the advertised contract code may be different from what runs on the blockchain.
  • smart contract verification enables investigating what a contract does through the higher-level language it is written in, without having to read Michelson, which is a stack-based language.
  • it provides additional transparency.
  • it ultimately adds to the “trustless” ethos - smart contracts are designed to be “trustless”, so a user shouldn’t have to trust but should be able to verify the smart contract source code.
  • verified smart contracts can be more easily reused or adapted by others (because not only the Michelson is available) = this is a benefit for the whole ecosystem.

Why is Michelson alone not sufficient for this?
Michelson forms the lowest level of a Tezos smart contract. The Michelson code is what gets deployed on the Tezos network. However, while reading or writing Michelson code is accessible for small smart contracts, it can become quite tedious and complicated to read or use for more complex smart contracts.

Most people know this feature from the Ethereum blockchain. This is an excerpt from the Ethereum documentation describing how the process works on Ethereum:

Deploying a smart contract on Ethereum requires sending a transaction with a data payload (compiled bytecode) to a special address. The data payload is generated by compiling the source code, plus the constructor arguments of the contract instance appended to the data payload in the transaction. Compilation is deterministic, meaning it always produces the same output (i.e., contract bytecode) if the same source files, and compilation settings (e.g. compiler version, optimizer) are used.

Verifying a smart contract basically involves the following steps:

1. Input the source files and compilation settings to a compiler.
2. Compiler outputs the bytecode of the contract
3. Get the bytecode of the deployed contract at a given address
4. Compare the deployed bytecode with the recompiled bytecode. If the codes match, the contract gets verified with the given source code and compilation settings.
5. Additionally, if the metadata hashes at the end of the bytecode match, it will be a full match.

So the process basically looks like this:

[diagram: source files + compilation settings → compiler → bytecode, compared against the bytecode deployed at the contract address]
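To make those steps concrete, here is a minimal Python sketch of the comparison logic only; the helper names in the comments are made up for illustration, not a real library API:

```python
def verify_contract(recompiled: bytes, deployed: bytes, metadata_len: int = 0) -> str:
    """Compare recompiled bytecode against the bytecode deployed on-chain."""
    if recompiled == deployed:
        return "full match"      # code and trailing metadata agree
    if metadata_len and recompiled[:-metadata_len] == deployed[:-metadata_len]:
        return "partial match"   # code matches, only the appended metadata differs
    return "no match"

# Usage with made-up helpers (compile_source and fetch_deployed_bytecode are not real APIs):
# recompiled = compile_source(source_files, compiler_version="x.y.z", optimizer=True)
# deployed   = fetch_deployed_bytecode("0xContractAddress")
# print(verify_contract(recompiled, deployed))
```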

I am (unfortunately) not competent enough to judge whether this is a good way to do it, but I suppose the core developers might know if this is a feasible solution to enable on Tezos too, or whether another approach is needed. It would be great if the core developers would take this on, or someone else, like for example @G_B_Fefe.

Once this possibility is added to Tezos, block explorers like TzStats or Better Call Dev can add this source verification option so users can easily verify a contract. Maybe other tools will also emerge from this possibility, like there are on Ethereum (e.g. Sourcify or Tenderly).

5 Likes

It’s a good idea.

It’s more of an ecosystem coordination issue than a dev issue. You need some sort of standardized pipeline where the inputs are:

  • source code
  • a compiler (smartpy, ligo…), potentially in docker container format
  • some metadata, such as the options passed to the compiler

and the output is some Michelson code and a Micheline structure that can be mapped to a KT1 address.

You also need a standardized storage format for verified source code and metadata, maybe in git. It would also take care of deduplication (store the source for similar contracts only once).
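To illustrate, here is a minimal sketch of what such a standardized pipeline input could look like, written as a Python dict; every field name here is an assumption for illustration, not an existing standard:

```python
# Hypothetical verification request that an indexer or verification service could accept.
verification_request = {
    "contract_address": "KT1...",                         # target contract (placeholder)
    "source": {
        "repository": "https://example.com/project.git",  # placeholder URL
        "revision": "deadbeef",                            # commit hash of the sources
        "entrypoint_file": "contract.py",                  # main SmartPy/Ligo/Archetype file
    },
    "compiler": {
        "name": "smartpy",                                 # or "ligo", "archetype", ...
        "docker_image": "example/smartpy-cli:0.0.0",       # pinned container image (placeholder)
        "options": [],                                     # flags passed to the compiler, if any
    },
}
# A verifier would run the pinned compiler on the sources and check that the resulting
# Michelson/Micheline matches the script stored at the KT1 address.
```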

3 Likes

Thank you for your comment. An ecosystem coordination issue sounds like a bigger hurdle than a dev issue, at least it seems so to me, but I am hopeful we will see this feature on Tezos in the coming months :slight_smile:

1 Like

So this can be achieved via a new TZIP and not necessarily a change/update in the protocol itself?

1 Like

Again, thank you for the comment, nicholasochem. I am still interested in hearing the thoughts of other people who are for sure more clever than me :smile:

Please share your opinions and thoughts, people :slight_smile: @trolleps @tulpenhaendler @simonmcl @codecrafting @rognierbenoit @Jordan @tom @germanD @iguerNL @rafoo and also others, please!
Thank you very much :slight_smile:

2 Likes

I haven’t thought a lot about it. But I guess this could be achieved, e.g., via a dedicated section in Michelson (let’s call it source, origin, info, or whatever), like the parameter, storage, and code sections.

Such a section could contain the hash of the source code, information about the compilation toolchain, … as well as the hash of the Michelson smart contract (without this new section).

But I guess that this should also be applied to global constants.
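Purely as an illustration of that idea (no such section exists in Michelson today, and all names below are invented), the content could look roughly like this, written as a Micheline-style JSON structure in Python:

```python
# Hypothetical "source" section in Micheline JSON form; this only sketches the idea.
source_section = {
    "prim": "source",
    "args": [
        {"string": "sha256:<hash of the original source files>"},
        {"string": "ligo 1.0.0"},  # compilation toolchain and version (example value)
        {"string": "sha256:<hash of the Michelson script without this section>"},
    ],
}
```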

2 Likes

We plan to implement something like this for SmartPy. To reproduce the Michelson code deterministically, the following elements will have to be stored:

  • the compiler version
  • any flags given to the compiler
  • all the SmartPy source files (there may be several, due to imports)

Then there is the question of how to archive and actually run different compiler versions. Tools like Nix and Docker come to mind, though we would probably take into account indexers’ suggestions (@Alex?).

What I’m not clear about is whether TZIP-16 metadata is the right tool for this. A contract’s metadata may change, so anyone able to modify it could replace the SmartPy code, as long as it compiles to the same Michelson code. For example, one could rename local variables inside an entrypoint or refer to a different compiler version. Should this be considered a problem? Certainly something to be aware of. On the other hand, an on-chain source element, as suggested above by @iguerNL, would presumably be immutable.

A slightly different approach would be not to have any link from the contract at all, but to submit the source code to indexers directly. The relationship with the deployed contract would be established by comparing (hashes of) the compilation result and the deployed contract. An advantage of this approach is that source code could be added post hoc, including for contracts already deployed today.
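A rough sketch of how that comparison could work on an indexer’s side; run_archived_compiler and fetch_script are hypothetical helpers, not an existing API:

```python
import hashlib

def script_hash(michelson: str) -> str:
    """Hash a Michelson script in a (naively) normalized form."""
    canonical = " ".join(michelson.split())  # naive whitespace normalization
    return hashlib.sha256(canonical.encode()).hexdigest()

def matches_deployed(recompiled: str, deployed: str) -> bool:
    """Compare the recompiled script with the one stored at the KT1 address."""
    return script_hash(recompiled) == script_hash(deployed)

# Usage (hypothetical helpers and values):
# recompiled = run_archived_compiler(sources, compiler_version="x.y.z", flags=[])
# deployed   = fetch_script("KT1...")  # e.g. via a node RPC or an indexer API
# print(matches_deployed(recompiled, deployed))
```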

2 Likes

I don’t understand the use case. If I have the hash of the source code, I still need the source code to check the hash, so how does that differ from compiling the code and checking that the output is the given Michelson script?

1 Like

Thank you for taking the time! This sounds good, although my technical background is a bit limited: with the hash of the source code, one could then add the source code later and the hashes would be compared to see if they match?

That’s great news!

I agree, I think this should be the end goal and the preferred way? But as I said, you guys have way more technical knowledge!

That’s also an interesting approach. I don’t know which one fits best as a final solution for implementation.

Ok, I guess I understood what @iguerNL proposed a bit wrong? :slight_smile: Thanks for taking the time here, rafoo and roland!

1 Like

To go a bit further, I would also add:

  • sourceCodeHash + compiler version + sourceCodeSignature
  • michelsonSignature

With this I can verify that:

  • the compiled source code hash matches the Michelson code hash <= to trust the sourceCode <-> Michelson correspondence
  • the sourceCode is signed by a known tz address <= to trust the editor
  • the deployed Michelson code is deployed by a known tz address <= to trust the deployer
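A small sketch of the checks this would enable; verify_signature stands in for whatever signature-checking primitive a wallet or library provides and is an assumption here:

```python
import hashlib

def trust_correspondence(source_code: bytes, deployed_michelson: bytes,
                         source_code_hash: str, michelson_hash: str) -> bool:
    """Check the published hashes to trust the sourceCode <-> Michelson correspondence."""
    return (hashlib.sha256(source_code).hexdigest() == source_code_hash and
            hashlib.sha256(deployed_michelson).hexdigest() == michelson_hash)

# Hypothetical signature checks (verify_signature is assumed, e.g. ed25519 verification
# against the public key behind the editor's / deployer's tz address):
# assert verify_signature(editor_tz_address, source_code_signature, source_code_hash)
# assert verify_signature(deployer_tz_address, michelson_signature, michelson_hash)
```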
3 Likes

The process of giving users the ability to do this would indeed be great. As someone else mentioned, there would need to be a lot of agreement and standards between dev teams to make something like this happen.

… then, what everyone forgets is tooling. There are still many things that rely on third parties and hacky code to achieve. If you want to forge / parse operations outside of the CLI, you have to rely on something like Taquito, which is built with TypeScript. That doesn’t work well for native desktop / mobile apps. I see something like the above feature being quite challenging for every team to implement until bits and pieces of the CLI / Octez are provided as standalone binaries in something like C++ or Rust, so that they can be imported into any language.

3 Likes

Any chance to get this in the next few months?

1 Like

To have truly trustless verification, you’d need to have the compiler on-chain. This is a bad idea right now - the teams behind Ligo, SmartPy, etc. need agility, and coordinating with protocol development would be a needless burden.

What I think you guys are proposing instead is simply to make the process of compiling the source code and checking the code yourself more reproducible. Once it can be done reproducibly and at scale, indexers could attest: “yes, we checked this contract, and indeed the linked source code matches the Michelson code on-chain”. This serves as an attestation, but the user needs to decide whether they trust the indexer to accurately run and report on the pipeline. More circumspect users can just run the pipeline themselves.

Defining the interface for this and getting that interface widely adopted might be tricky. I think Nix is a good fit - they’ve already done the work of defining a “universal API” for building things. But you’d have to get smart contract developers to opt in to maintaining Nix builds for their contracts, and Nix has a steep learning curve.

As much as I long for the day when everyone uses Nix and all builds are reproducible in principle, I suspect it would be 100x easier to validate a particular contract that interests you than to impose a standard for validating all contracts. If the source code is not available, if the CI is not green, or if the build fails for you locally, you should raise a complaint with the contract devs.

3 Likes

Is this like how it works with the source code verification on Ethereum?

I guess a Michelson decompiler (that works accurately) would also be a great addition if it’s not feasible to implement the proposed process. What do you think, @d4hines?

1 Like

I talked to an engineer working on code search for the Ethereum ecosystem. He helped me realize that reproducible smart contract compilation pipelines aren’t actually that hard. There’s never any linking with native code, no custom preprocessors, no cross-compilation, etc., so, e.g., Nix isn’t really needed.

I would say just a Makefile that defines a target `contract.tz`, but you still need to specify the compiler version somewhere. Perhaps we want something like Docker, with an interface like this:

docker run --rm -v "$PWD:/output" $your_docker_image

where it produces the file `contract.tz` in the current directory.
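For illustration, a small wrapper around that interface could look like the following sketch; the image name is a placeholder, and the on-chain script is assumed to be fetched elsewhere:

```python
import subprocess
from pathlib import Path

def rebuild_and_compare(docker_image: str, deployed_script: str, workdir: str = ".") -> bool:
    """Run the contract's pinned build image and compare its contract.tz output
    with the Michelson deployed on-chain."""
    out_dir = Path(workdir).resolve()
    subprocess.run(
        ["docker", "run", "--rm", "-v", f"{out_dir}:/output", docker_image],
        check=True,
    )
    rebuilt = (out_dir / "contract.tz").read_text()
    normalize = lambda s: " ".join(s.split())  # naive whitespace normalization
    return normalize(rebuilt) == normalize(deployed_script)

# Usage (placeholder image; deployed_script would come from a node RPC or indexer):
# print(rebuild_and_compare("example/my-contract-build:1.0", deployed_script))
```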

I don’t think decompilation is the right approach in general. It’s much easier to keep demanding source code from people, as is the status quo. Decompilation tools are likely only necessary for professionals doing bespoke analysis of weird edge cases.

2 Likes

There is a similar need for rollup source code, in fact. How do you map a deployed kernel on mainnet to known source code in WASM or a higher-level language?

2 Likes

For this, allow me to suggest a scheme with Nix, which I think is a great fit for this use case (I know, I said I wouldn’t above, but I can’t help it). It’s a bigger payoff, since compiling to WASM involves a more baroque toolchain.

A convention off the top of my head:

  • Each kernel repo exposes a default package that builds ./result/bin/kernel.wasm
  • Store a Nix-compatible URI at the path /revision in persistent storage, e.g. github:d4hines/my-kernel/20316ae1c5248a5051c2724c1b1f48e4874dc9b8

Then anyone can build the kernel like so:

nix build github:d4hines/my-kernel/20316ae1c5248a5051c2724c1b1f48e4874dc9b8

This gets you as close to bit-for-bit reproducibility as we know how to achieve at scale as an industry.

Then you can compare the shasums of ./result/bin/kernel.wasm with the result of e.g. curl http://yourrollup/storage/boot/kernel.wasm (I can’t remember the exact API, but it’s something like that).
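In code, the comparison is just a shasum check; the sketch below uses the (approximate, remembered) URL from above as a placeholder:

```python
import hashlib
import urllib.request

def sha256_of_file(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def sha256_of_url(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

# Placeholders: the locally built kernel and the approximate rollup node endpoint.
local = sha256_of_file("./result/bin/kernel.wasm")
remote = sha256_of_url("http://yourrollup/storage/boot/kernel.wasm")
print("match" if local == remote else "mismatch")
```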

There, I’ve made my peace - we don’t actually have to embrace Nix, but one can dream.

2 Likes

Thanks for inquiring about this. I’m a dapp dev interested in developing on Tezos, but this is definitely a dealbreaker for me. Has any progress been made with regard to code verification? The current explorers seem to still only show Michelson code. I will have to stick to EVM chains because of this limitation. Users need to be able to easily trust the code as it was written. Why should they have to blindly trust any random dev’s deployed smart contract?

2 Likes