Athens to Babylon — Part 2: Testing Framework and Test Driven Development


Testing Improvements

Many of these ideas are now industry standards when developing software, especially in the financial markets, and we would recommend reading Craig Larman, “Agile & Iterative Development”. In this respect a test-driven software approach is the key to providing high quality software deliverables. In this paper we have focused on test-driven development but agile and extreme programming techniques should also be considered. The internal processes used by the development teams may already incorporate many of these techniques but they need to be exposed to the wider community.

Regression Testing Pack

Developers need to write all their unit tests and do their own testing but an independent testing team can add value by verifying changes and ensuring existing code has not broken. Also this independent testing team would be responsible for building, maintaining and running the regression packs with help from the development teams.

The results of the test runs should be available for viewing in a central place with regression packs available to the community. The community could either develop suitable tools itself or it could adopt some of the test-driven tools that are available on the market. This central resource would show all regression runs, unit tests, chain status, etc. It is vital for anyone involved in testing to know i) what they have tested and ii) what version of the chain they are running on. This is something that has been very confusing for the Tezos community, as no-one has really known what version of the alphanet or zeronet was running. Knowing the exact version of these testnets means that wallet providers and application developers can confidently sign off knowing that they have tested against the correct version.

Every time a commit is made to the code repository or test cases are changed, all the regression test packs and unit tests would automatically be run. Any code version that breaks the tests can be viewed and interested users could even delve into the detailed logs and error stacks if they wanted to. It is likely that you would have at least 3 development streams: production, UAT and development.

From these test packs a software release process can be built. So, as an example, a hotfix on the production chain is urgently required. The first step is to build the unit test/regression test and put it into the pack. This will result in a test failure, so now the next step is to build a fix and apply it to the production branch. The fix is committed, the test pack is automatically run and if all tests are green then the chain is ready to be deployed to a testnet. Fixes will also need to be applied to the UAT and DEV streams.

Four regression packs have been identified and should be automatically run every time a new change is committed to any of the branches.

  1. A full startup and teardown of the chain which runs a number of transactions and tests the final state. This chain would have a deterministic snapshot so the exact end state can be tested and this can be achieved by having a known seed fed into the random number generator. This means the snapshot random generating mechanism won’t be tested in the way it was intended but the result will be a deterministic chain. Part of this regression pack would take the previous protocol and the new protocol as inputs meaning the chain can be tested through the entire on-chain governance flow. The pack of tests would be executed on each stage of the on-chain governance process and would also make sure transactions in the mem pool cross the software upgrade boundary. The final step would be to tear down the chain and run any final checks. As an absolute minimum every RPC node call needs at least one test case.
  2. A full startup and teardown of the chain but only testing the parts that were not tested in the first pack, so generating the snapshots using the random number generator. Even though the outcome will be nondeterministic (hopefully), it can be tested to check it was constructed correctly. There may be a need for a few more packs to test very specific things on a chain that are difficult to cover in the first regression pack.
  3. A regression pack that can run on the testnet. This would actually be a subset of the regression pack in step 1, made up of the tests that are complete in their own right. Most tests will fall into this category and this pack will be constantly updated and extended. The pack would mainly be a set of transactions that can be applied to a chain, followed by checks on the results. For example, moving all funds from a KT address to a TZ address and then checking the funds (minus fees) are now in the TZ address. This is a test that would have passed in Athens but failed in Babylon.
  4. Dapps regression tests. Wallet providers etc. could push their own test cases into a general applications regression pack. This would allow the community to check that the applications they use on a daily basis don’t break on the new release. This could even be published to the community to show who is ready for the release. All test runs would be published to a known place with complete breakdown of the tests, execution times, results and if they passed or failed. Anyone from the community could run the tests on their own machines and even encourage their favorite app vendors to add plenty of test coverage. The main difference with this pack is that broken tests wouldn’t stop a software release but would be visible for all to see.
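
As a rough illustration of the first and third packs, the sketch below uses a toy in-memory chain rather than a real Tezos node. The `ToyChain` class, the account names and the amounts are all assumptions made for the example. It shows the two ideas from the list: a known seed makes the run repeatable, and a transfer test checks the final state minus fees.

```python
import random


class ToyChain:
    """Toy in-memory stand-in for a test chain; it is not a Tezos node."""

    def __init__(self, seed, balances):
        # A known seed makes every "random" choice repeatable between runs.
        self.rng = random.Random(seed)
        self.balances = dict(balances)

    def pick_snapshot(self, candidates):
        # Stand-in for the snapshot selection that is normally random.
        return self.rng.choice(candidates)

    def transfer_all(self, source, destination, fee):
        # Move the whole balance of source, minus the fee, to destination.
        amount = self.balances[source] - fee
        if amount < 0:
            raise ValueError("balance does not cover the fee")
        self.balances[source] = 0
        self.balances[destination] = self.balances.get(destination, 0) + amount
        return amount


def run_pack(seed):
    # Start up the chain, run the transactions, return the final state.
    chain = ToyChain(seed, {"KT1-example": 1_000_000, "tz1-example": 0})
    snapshot = chain.pick_snapshot(list(range(256)))
    chain.transfer_all("KT1-example", "tz1-example", fee=1_500)
    return snapshot, chain.balances


first_run = run_pack(seed=42)
second_run = run_pack(seed=42)
```

Two runs with the same seed produce the same snapshot choice and the same balances, so the end state can be compared exactly. The second pack in the list would drop the fixed seed and check only structural properties of the result.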

It would be really interesting to have a testing framework that runs on a decentralised network and therefore nodes are rewarded for running the regression packs on their machines. This would be a great project for a possible Tezos Foundation grant.

Continuous Regression Testing

Continuous regression testing is key, and it means your system gets better tested over time because the regression pack is constantly being revised and expanded. Any bugs found must not only be fixed but must also have appropriate tests added to the pack to ensure they cannot happen again. In the case of the bug with the Athens transactions not being identified in the Babylon mem pool, it seemed to us that the flow was not tested prior to the final code release. Once a regression test pack is in place, then the Tezos community is in a much better position to allow some last-minute fixes with a much higher level of confidence (although this would not be recommended). The regression pack gives the community the ability to test the latest code base and the testing team will make available the results of all their test runs, in real-time, on a Tezos testing dashboard.

The regression pack would over time become a very valuable resource to the community as anyone could run it. It wouldn’t be trivial to create as it would have to start up a chain, install bakers, run many transactions, take the chain through the on-chain governance process, test the final state and then tear down the chain. This resource would be very valuable as without it developers will be scared to make more significant changes to the code base. Developers need to make brave changes in the future, so there must be a testing framework available to allow changes to be made with confidence.


A number of different testnets are required to provide full coverage of the on-chain governance process. Block explorers would require investment in order to support all the different testnets. In the last few weeks (Nov 2019) we have seen improvements in the communication of testnets and we hope this will continue to improve.

  1. Rapid development testnet;
  2. Governance testnet;
  3. Full testnet;
  4. Abandon the spawned testnet.

1. Rapid development testnet

Not all testnets have to use the current consensus as the algorithm is overly complicated for basic testing of contracts running on the execution engine. Implementing a different consensus algorithm such as proof-of-authority into Tezos will make it far easier to do quick testing as the chain no longer requires a baking infrastructure.

Pluggable consensus would be far easier to deploy and doesn’t need snapshots and bakers to get it running. This testnet will allow most dapps to be run but any that rely on properties of the consensus algorithm may no longer work. By having a proof-of-authority chain, a chain can be rapidly deployed as there is no delay in respect of the baking rights. This chain would run the latest code and would be the “go to” place to see the release before it’s locked down.

The main objective of this testnet is for testing contracts at a very early stage before the software has been locked down. Before a proposal is made to the mainchain, it must have been implemented on this testnet for a defined period of time. The testnet would activate a new version at a defined block height and, because it doesn’t rely on bakers upgrading the software, it is guaranteed to be running the correct version.

Once the changes have been running successfully in this testnet then the code is ready to take the next step and be packaged up for the on-chain governance. Any bugs or issues identified at this stage are managed by the independent testing team. The development teams would decide whether a fix is required and each fix must come with appropriate test cases (or regression tests or both), where it can be clearly demonstrated that the tests failed before the change but now pass. Once bugs have been fixed in this testnet, all test cases have passed and the regression pack runs green, a new release in this testnet is proposed to activate at a specific block height. To get past this stage, all known bugs must have been resolved and the testnet must have been running without issue for a defined period of time.
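
The exit criteria for this stage could be expressed as a simple gate. The sketch below is an assumption about how such a check might look. The `Fix` record, the field names and the block-count threshold are hypothetical, and the "defined period of time" is modelled here as a number of blocks.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    bug_id: str
    failed_before: bool  # the accompanying test failed on the code before the fix
    passes_after: bool   # the same test passes on the code with the fix


def ready_to_propose(fixes, open_bugs, pack_green, blocks_without_issue, required_blocks):
    """Return (ready, reasons); reasons lists every criterion that is not met."""
    reasons = []
    for fix in fixes:
        if not (fix.failed_before and fix.passes_after):
            reasons.append(f"fix for {fix.bug_id} has no test that failed before and passes now")
    if open_bugs:
        reasons.append(f"{len(open_bugs)} known bug(s) still open")
    if not pack_green:
        reasons.append("regression pack is not green")
    if blocks_without_issue < required_blocks:
        reasons.append("testnet has not yet run cleanly for the defined period")
    return (not reasons, reasons)


ok, why_not = ready_to_propose(
    fixes=[Fix("BUG-1", failed_before=True, passes_after=True)],
    open_bugs=[],
    pack_green=True,
    blocks_without_issue=5_000,
    required_blocks=4_000,
)
```

Returning the list of unmet criteria, rather than a bare boolean, lets a testing dashboard show exactly why a release is being held back.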

2. Governance testnet

The next testnet would be a chain for anyone interested in testing the on-chain governance and would go through all the various stages in a week and reset. Every week (or defined time-line) the chain would be reset and started again and additionally it would be reset when a new release is ready to be tested. The testing website would be the source for the exact code version running on each testnet, something that in the past has caused the community a lot of confusion. It is vital that the community knows exactly what code base it is testing against.

This chain would be mainly for the developers but would be useful if you are developing a dapp which is using information from the on-chain governance process.

3. Full testnet

This testnet will be the default testnet for Tezos and the one that is currently labeled Carthagenet. When the proposal has been locked down and is ready to start the on-chain governance, two things need to happen:

  1. Publish hash for first vote to kick off the on-chain process;
  2. The default test chain is updated and the block height of the software switch is published to the community.

The difference with this chain is that it uses the standard consensus algorithm and will need many of the bakers to participate in order for the chain to run smoothly. To date the bakers were not required to be involved but this is where their input would be invaluable. Bakers could be incentivised to bake on the testnet by being rewarded on the mainchain, so for example, every 1000 testnet XTZ could be rewarded with 1 XTZ on the mainchain. Bakers could even decide to share their extra rewards with their delegates to encourage the whole community to get involved. This would incentivise everyone in the community as delegates would want their baker to participate. Incentivised testing is key to improving the release process.
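
The arithmetic of the example ratio above can be sketched as follows. The 1000:1 ratio is the figure used in the text; the function names and the baker share percentage are assumptions for illustration. Amounts are held in mutez (1 XTZ = 1,000,000 mutez) so that integer arithmetic avoids rounding surprises.

```python
MUTEZ_PER_XTZ = 1_000_000
TESTNET_TO_MAINCHAIN_RATIO = 1000  # example ratio from the text: 1000 testnet XTZ -> 1 XTZ


def mainchain_reward_mutez(testnet_reward_mutez):
    """Convert testnet baking rewards into a mainchain reward, rounding down."""
    return testnet_reward_mutez // TESTNET_TO_MAINCHAIN_RATIO


def split_with_delegates(reward_mutez, baker_share_percent):
    """Split a reward between the baker and its delegates (hypothetical policy)."""
    baker_part = reward_mutez * baker_share_percent // 100
    return baker_part, reward_mutez - baker_part


# A baker that earned 2,500 testnet XTZ during the test cycle.
reward = mainchain_reward_mutez(2_500 * MUTEZ_PER_XTZ)
baker_part, delegates_part = split_with_delegates(reward, baker_share_percent=10)
```

In this example the baker would receive 2.5 XTZ on the mainchain and, with a 10% share, keep 0.25 XTZ and pass 2.25 XTZ on to delegates.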

It is paramount to make this as easy as possible for bakers to get involved. Every time this testnet is reset then the genesis block should pre-fund the bakers’ test accounts with the same XTZ they have on the mainchain. The test account is registered on the mainchain by each baker so the process can be completely automatic and block explorers can also use this information to label up the accounts correctly. This means that a baker would no longer need to keep changing their address and go through the painful process of getting enough test XTZ. One possible strategy to store the baker’s testing account key is discussed in the section On-chain Governance by Adoption, below.
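
A minimal sketch of the pre-funding step, assuming the mainchain exposes each baker's balance and the test address it has registered. Both inputs, and all the addresses shown, are hypothetical placeholders.

```python
def build_genesis_balances(mainchain_balances, registered_test_accounts):
    """Build the testnet genesis allocation.

    mainchain_balances: baker's mainchain address -> balance
    registered_test_accounts: baker's mainchain address -> registered test address
    """
    genesis = {}
    for baker, test_address in registered_test_accounts.items():
        # Each test account starts with the balance the baker holds on the mainchain.
        genesis[test_address] = mainchain_balances.get(baker, 0)
    return genesis


def explorer_labels(registered_test_accounts, baker_names):
    """Map test addresses to baker names so block explorers can label accounts."""
    return {
        test_address: baker_names.get(baker, "unknown baker")
        for baker, test_address in registered_test_accounts.items()
    }


balances = {"tz1-baker-a": 50_000, "tz1-baker-b": 8_000}
test_accounts = {"tz1-baker-a": "tz1-test-a", "tz1-baker-b": "tz1-test-b"}
names = {"tz1-baker-a": "Baker A", "tz1-baker-b": "Baker B"}

genesis = build_genesis_balances(balances, test_accounts)
labels = explorer_labels(test_accounts, names)
```

Because both mappings are derived from data already on the mainchain, the genesis block can be regenerated automatically on every reset.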

Getting bakers’ engagement is vital at this stage so it needs to be backed with incentives. The Tezos Foundation could initially sponsor any testing rewards in order to get the process started. In the longer term, the rewards system could be changed so that the mainchain can mint the extra rewards for participation on the test chain. This would certainly be an interesting research project as the mainchain would effectively pay for its own testing and would take decentralisation to another level. Proof of the work done on the testnet could be submitted to the mainchain for payment. This would slightly increase the inflation rate of Tezos but the testing bakers and their delegates would reap the benefit from the extra rewards.

At this stage, if any new bugs are discovered that need a code fix then the entire process should start again. The vote is abandoned and the process goes right back to the beginning. Tezos has already suffered reputational damage from the last release; we can repair this damage with our loyal users but this won’t be possible when Tezos is running financial contracts. If full regression testing was adopted, however, it may make sense to reduce the elapsed time for the on-chain voting process.

4. Abandon the spawned testnet

As part of the current on-chain governance process a new chain is spawned, which is the chain the bakers are supposed to test. The “Testing Period” step should now be removed as this chain is no longer needed and has a security issue in that it requires bakers to use their production key on a testnet. For security reasons, most bakers will refuse to do this, and doing so could also lead to double baking or replay attacks. Most of the larger bakers use HSMs (Hardware Security Modules) and they would likely refuse to use production keys to sign on a testnet.

Upgradable Functionality

Many small incremental releases are better than one big bang approach. If a release has multiple features in the software it doesn’t mean these have to be activated at once. It would make more sense to get the code into the codebase but then activate the various features at appropriate times, so you don’t have to fire fight all the possible issues at once. If one piece of functionality doesn’t perform as expected then the rest of the features can be delayed until appropriate fixes have been applied. Hotfixes can be applied outside of the on-chain governance process assuming comprehensive regression test packs are in place, although hotfixes should only be used for urgent issues and should never be part of the normal release process.

Each piece of functionality can be activated by an offset from the block height at the point the software is rolled out. Alternatively, the on-chain governance could be extended so each feature can be voted on, giving the community the ability to stop features being rolled out if there were problems with the previous functionality. In fact all new features could default to “off” then, after a successful release, features could be switched on using the same voting mechanism. This process would make testing more complicated but such an approach is often used in finance as software updates are often on a different schedule from the business changes.
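
Activation by block-height offset, with features that can be held back, might look like the sketch below. The `Feature` record, the feature names and the heights are hypothetical; the `enabled` flag stands in for the outcome of a per-feature vote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    offset: int           # blocks after the rollout height at which the feature activates
    enabled: bool = True  # could be set to False by a vote to hold the feature back


def active_features(features, rollout_height, current_height):
    """Return the names of features that are live at current_height."""
    return [
        feature.name
        for feature in features
        if feature.enabled and current_height >= rollout_height + feature.offset
    ]


release = [
    Feature("feature_a", offset=0),
    Feature("feature_b", offset=4_096),
    Feature("feature_c", offset=8_192, enabled=False),  # delayed after problems elsewhere
]

at_rollout = active_features(release, rollout_height=100_000, current_height=100_000)
later = active_features(release, rollout_height=100_000, current_height=110_000)
```

Here `feature_c` stays off even after its offset has passed, which is how a problem with earlier functionality could delay the rest of the release without a new code deployment.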

On-chain Governance by Adoption


This is not directly related to testing but it is an issue that needs to be considered to encourage more bakers to vote. Voting is a security issue for bakers because the same key is also used for validating and withdrawing funds. One option would be to use a smart contract for the baking address, containing baker information (name, fee, etc.) and made up of five signing keys:

  • Baking key for signing blocks and endorsements
  • Withdraw key to withdraw funds to a predefined address
  • Voting key
  • Testing key for signing on the testnet. Address can be pre-filled with XTZ in testnet Genesis block
  • Admin key to reset any of the keys above

This means a baker would not have to use their baking key for voting and withdrawing, making the process far easier and far more secure. This would certainly encourage more bakers to vote as they are not compromising their security. Companies that run large funds such as Polychain or Coinbase could give designated staff access to the voting key without compromising the funds. The downside of not doing this is that large bakers won’t participate in on-chain governance because the security risk would be perceived to be too high.
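
The separation of duties between the five keys can be modelled as a small data structure. This is plain Python for illustration, not a Michelson contract. The action names, key identifiers and the `BakerAccount` class are assumptions, and real signature checking is replaced by a simple comparison of key identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    BAKING = "baking"
    WITHDRAW = "withdraw"
    VOTING = "voting"
    TESTING = "testing"
    ADMIN = "admin"


# Which role's key is required for each action (hypothetical action names).
REQUIRED_ROLE = {
    "sign_block": Role.BAKING,
    "endorse": Role.BAKING,
    "withdraw": Role.WITHDRAW,
    "vote": Role.VOTING,
    "sign_on_testnet": Role.TESTING,
    "reset_key": Role.ADMIN,
}


@dataclass
class BakerAccount:
    name: str
    fee_percent: float
    keys: dict  # Role -> key identifier

    def authorised(self, action, key_id):
        # An action is allowed only when signed by the key assigned to its role.
        return self.keys.get(REQUIRED_ROLE[action]) == key_id

    def reset_key(self, admin_key_id, role, new_key_id):
        if not self.authorised("reset_key", admin_key_id):
            raise PermissionError("only the admin key can reset keys")
        if role is Role.ADMIN:
            # In this sketch the admin key resets the other four keys only.
            raise ValueError("the admin key cannot be reset here")
        self.keys[role] = new_key_id


account = BakerAccount(
    name="Example Baker",
    fee_percent=10.0,
    keys={
        Role.BAKING: "key-bake",
        Role.WITHDRAW: "key-withdraw",
        Role.VOTING: "key-vote",
        Role.TESTING: "key-test",
        Role.ADMIN: "key-admin",
    },
)
```

With this layout, a member of staff holding only `key-vote` can vote but cannot withdraw funds or sign blocks.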

On a separate point, presently it’s advantageous to hold off voting until the last minute to see how everyone else has voted. This should be changed to use a commit and reveal scheme. That process is outside the scope of this paper, but an Ethereum example can be seen here.
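
For readers unfamiliar with the idea, a commit and reveal scheme can be sketched in a few lines. This is a generic illustration using SHA-256 and a random salt. It is not the Ethereum example referred to above, nor a proposal for how Tezos should implement it, and the vote labels are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(vote: str, salt: bytes) -> str:
    """Commit phase: publish only the hash of the salt and the vote."""
    return hashlib.sha256(salt + vote.encode("utf-8")).hexdigest()


def verify_reveal(commitment: str, vote: str, salt: bytes) -> bool:
    """Reveal phase: anyone can check the revealed vote against the commitment."""
    return hmac.compare_digest(commitment, commit(vote, salt))


# During the voting period the baker publishes only the commitment.
salt = secrets.token_bytes(32)  # fixed-length salt, kept secret until the reveal
commitment = commit("yay", salt)

# After the voting period closes, the baker reveals the vote and the salt.
honest_reveal = verify_reveal(commitment, "yay", salt)
changed_vote = verify_reveal(commitment, "nay", salt)
```

Because only the hash is visible during the voting period, nobody can wait to see how others have voted, and the salt prevents the small set of possible votes from being guessed by hashing each one.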

Testing Period and Promote Vote Period — Only activate when ready

With the changes suggested above, the testing phase is no longer required as originally defined, although the testing phase should become the preparation stage for the software release. The preparation stage is simple and no longer needs bakers to explicitly vote for promotion of the software release; the new version would activate once validating nodes holding 90% of the voting power signal that they are ready. Signalling would occur if a baker is running both protocols at a defined block height. This guarantees that the release is only activated if sufficient bakers have actually made the upgrade. Incentives could also be considered so that every baker that signals could be rewarded (or maybe their bond could be slashed if that is more effective). When the software moved to Athens the chain stalled for around 22 minutes, which is unacceptable if the chain is being fully utilised; it could create a backlog of transactions that would take days to clear. If Ethereum was down for 22 minutes it would take many weeks to catch back up and in the meantime would cause a spike in the gas price. Early mistakes in the chain’s life are inevitable. Whilst there are no serious applications running on the network, Tezos just suffers reputational damage, but this needs to be eliminated going forward.

The community can’t blame bakers for not upgrading as this will never solve the problem long term. Changes are required in the process so that the on-chain governance process never activates a software release unless 90% of the baker voting rights are signalling.
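
The activation rule reduces to a weighted threshold check. The sketch below assumes voting power is available per baker and treats exactly 90% as sufficient. Whether the boundary should be inclusive is a design decision the protocol would need to make; the baker names and numbers are hypothetical.

```python
def release_can_activate(voting_power, signalling_bakers, threshold_percent=90):
    """Return True when the signalling bakers hold at least threshold_percent of the power.

    voting_power: baker -> voting power (for example, rolls)
    signalling_bakers: set of bakers running both protocols at the defined height
    """
    total = sum(voting_power.values())
    if total == 0:
        return False
    signalled = sum(
        power for baker, power in voting_power.items() if baker in signalling_bakers
    )
    # Integer comparison avoids floating point error right at the threshold.
    return signalled * 100 >= total * threshold_percent


power = {"baker_a": 500, "baker_b": 400, "baker_c": 100}

not_yet = release_can_activate(power, {"baker_a"})
ready = release_can_activate(power, {"baker_a", "baker_b"})
```

The check is weighted by voting power rather than by number of bakers, so a handful of small bakers that have not upgraded cannot stall a release, while a single large one can.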


A transparent test-driven approach should be considered and all changes going forward should start with a test. For many people developing software in the financial services industry this is now common practice along with extreme programming and agile iterative development techniques. To truly learn from your mistakes requires you to make significant changes to the process to prevent developers making common mistakes. Developers will be juggling many balls during the development process but this process should support them to make brave changes regardless of their experience. Quality of code is significantly improved when introducing such techniques as pair programming and using a test-driven approach. Extreme programming techniques certainly improve the quality of the codebase and will result in far fewer errors reaching the mainchain.

A complete overhaul of the way testing is currently done is required, as there are too many gaps where code is not being tested. In developing software, if you leave a gap in testing then inevitably people will keep falling into it. The quote often attributed to Albert Einstein comes to mind: “The definition of insanity is doing the same thing over and over again, but expecting different results.” Unfortunately, without significant change in testing, chaos will ensue on each software release and will result in tremendous reputational damage.

Not only are changes required in the development process but tools need to be developed to support this process. An independent test team would maintain the regression test packs and a decentralised set of resources could execute these tests. Testnets should be easily identifiable and available for use by the community, and software releases should not force all functionality to be activated at once. Dapp developers should find it significantly easier to run their test pack on the latest testnet version and be confident it won’t break when the mainnet is updated. Incentives could be used to encourage engagement on the testnet, and we have outlined how these rewards could be funded.

Changes to the on-chain governance process are required and the final promotion vote stage should be removed. The chain will only upgrade when over 90% of the vote is signalling the new release. Using this approach the upgrade will be much smoother as it will only happen when the chain is truly ready.

Clearly you can never guarantee bug free code but all the code must be covered by unit tests and regression tests. Many of the bugs from the last release could have been caught if tests were written first. Improving the process means you need to take action so that if the same error is repeated, something in the process would subsequently break. Relying on developers doing the right thing next time is not good enough as this is not a process improvement and will result in the same outcome. Bugs should be viewed as an opportunity to improve the test coverage.

Action Points

A summary of the action points discussed in the paper:

  • Independent testing team to coordinate testing of new proposals and maintain the regression test packs;
  • Decentralised test-regression infrastructure which runs the test packs and rewards the nodes running them;
  • Pluggable consensus algorithm so it becomes easier to create testnets that don’t need the baking infrastructure;
  • Testnets built from pre-funded baker testing accounts in the genesis block;
  • Reward bakers for running the main testnet; they could even pass these rewards to delegates. A protocol change could be made to enable a payment to be claimed from the mainchain by submitting a proof of staking from the official testnet;
  • Ability to switch on new features independently so no big bang software releases;
  • Improve security by using a smart contract for a baking address and therefore maintaining separate keys for different functions;
  • A testing web-site to see the current state of the testing and testnets;
  • Remove voting on promotion step and replace it by nodes signalling when they are ready.


Craig Larman, “Agile & Iterative Development. A Manager’s Guide”, Addison Wesley
