Athens to Babylon and some thoughts on testing improvements

We initially posted this on Reddit, and it was suggested we also post it here.

We plan to put more detail into a Medium article, but we are keen to get feedback on some of our thoughts. We want to see some real action points from this release, otherwise we don’t think lessons will be learnt. We want to put a few thoughts out there to drive the conversation forward and get some real improvements into the process.

Very little action on the testnet. The foundation should provide incentives for running a test network. We are one of the very few bakers running on the Babylon testnet, but we need more bakers to get involved. We would like to see some mainnet rewards for baking on the testnet. This would encourage more bakers to get involved and, more importantly, bakers would be better prepared, as they would have been running the new code for 3 months. For example, bakers could be paid 1 XTZ for every 1000 XTZ baked on the testnet, paid on the mainnet to the testing baking address. It doesn’t have to be a lot, but it should be at least some incentive for bakers to invest in extra hardware to support the testnet.
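To make the example rate concrete, here is a minimal sketch of the arithmetic. It assumes the 1 XTZ per 1000 XTZ rate from the example above and works in mutez (1 XTZ = 1,000,000 mutez) to avoid floating-point rounding. The function name and the idea of rounding down are our own assumptions, and what exactly counts as "baked on the testnet" would still need to be agreed.

```python
from fractions import Fraction

# Assumed rate, taken from the example in the post:
# 1 XTZ paid on mainnet for every 1000 XTZ baked on the testnet.
REWARD_RATE = Fraction(1, 1000)
MUTEZ_PER_XTZ = 1_000_000


def mainnet_reward_mutez(testnet_baked_mutez: int) -> int:
    """Hypothetical helper: mainnet reward in mutez, rounded down."""
    if testnet_baked_mutez < 0:
        raise ValueError("baked amount cannot be negative")
    # Exact rational arithmetic, then truncate to whole mutez.
    return int(testnet_baked_mutez * REWARD_RATE)


if __name__ == "__main__":
    baked = 2500 * MUTEZ_PER_XTZ
    reward = mainnet_reward_mutez(baked)
    print(f"{reward / MUTEZ_PER_XTZ} XTZ")
```

At this rate a baker who baked 2500 XTZ on the testnet would receive 2.5 XTZ on the mainnet, so the cost to the foundation stays small while still covering some hardware.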

Some bakers have still not upgraded, so engagement is vital.

Are we really doing on-chain governance? We have never been big fans of the on-chain governance mechanism, but not for the reasons many of you may think. We have experience in developing large financial systems using agile techniques, and one question we always ask is how you test any new functionality. If you can’t easily automate the tests for the design, it’s very unlikely the design will make it into code. On-chain governance is difficult to test, and it seems to us that the only time the process is regression tested is in production, which is not acceptable.

If you have a great design that is very difficult to test, then it’s not a great design! Assuming we want to go ahead with on-chain governance, then as a community we need to invest in testing this complex process. The testing will be expensive, and the community needs to decide if the expense is worth it. On-chain governance is not a free ride and needs to be resourced. As a rough guide, a testing team should be 10% of the size of the core development teams. These resources need to be funded by the foundation or some decentralised pool, but the important point is that the testing team must be independent from the development teams.

We were going to vote Nay, for the simple reason that a hotfix on an update raises red flags. A hotfix on the production release makes complete sense, and that should not go through the on-chain governance mechanism (especially urgent security fixes), but we should never apply a fix to an update without significant automated testing resources. We should live by the sword, die by the sword. If we are going to do on-chain governance, then let’s start the process again, even if it’s one line of code! What is the point of having a process if we don’t follow it? Currently we don’t have on-line governance, as we were asked to patch our binary minutes after the release and anything could have been thrown into that code. This is not a blame game but about being honest with ourselves in realising we don’t have on-line governance currently, as no-one voted for the code that is in production. We need to test the entire on-chain governance process or remove it altogether. We also think too much stuff was pushed into this release unnecessarily, probably due to the length of the process.

Regression Testing. We need full regression testing independent from the developers. This team would initially need to be funded by the foundation, and we would suggest it should be around 10% of the size of the core development teams. The bulk of the team would be professional testers/developers, but it would need key development resources, which could be provided on a rota system from the development teams. A regression pack would be designed and implemented to test the process flow going from the old version to the new. As time goes by, more and more tests would be added to this pack and it would become a very valuable community resource. If the regression pack takes a few hours to run, it would be run many times a day (whenever the code changes or the pack is added to), and so by the time we go live it has run thousands of times. If there were issues after the release, then tests for these should be added to the regression pack, so at least we know they won’t hit us again.

Continuous regression testing is key, and it means your system gets better tested with time because the regression pack is constantly being revised and expanded. Any bugs found must not only be fixed but also have appropriate tests added to the pack so they can’t happen again. The fact that the bug with the Athens transactions in the Babylon mempool was not identified suggests to us that the flow was not tested even once. Once we have such a regression test pack in place, the community is in a much better position to allow some last-minute fixes with a much higher level of confidence. The regression pack gives the community the ability to test the latest code base, and the testing team will make all their test run results available, in real time, on a Tezos testing dashboard.

The regression pack would over time become a very valuable resource to the community, as anyone could run it. It wouldn’t be trivial to create, as it would have to start up a chain, install bakers, run many transactions, take the chain through the on-chain governance process, test the final state and then tear down the chain. Without this resource, developers will be scared to make more significant changes to the code base. If we want developers to make brave changes in the future, then as a community we need to provide them with the resources to make changes with confidence.
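The sequence of steps described above could be sketched roughly as follows. This is a toy, not a Tezos API: `ToyChain` is a hypothetical in-memory stand-in for a sandboxed chain, and `activate` collapses the whole voting and activation flow into one call. A real pack would drive real nodes and bakers. The single case shown checks that a transaction still pending in the mempool when the protocol changes is included afterwards, which is the kind of old-to-new flow we think should be exercised.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChain:
    """Hypothetical in-memory stand-in for a sandboxed chain."""
    protocol: str
    bakers: list = field(default_factory=list)
    mempool: list = field(default_factory=list)
    included: list = field(default_factory=list)
    running: bool = False

    def start(self):
        self.running = True

    def install_baker(self, name):
        self.bakers.append(name)

    def inject(self, tx):
        # Transactions wait in the mempool until a block is baked.
        self.mempool.append(tx)

    def bake_block(self):
        if not self.running or not self.bakers:
            raise RuntimeError("chain is not running or has no bakers")
        # Move every pending transaction into the new block.
        self.included.extend(self.mempool)
        self.mempool.clear()

    def activate(self, new_protocol):
        # Stand-in for the full on-chain governance flow.
        self.protocol = new_protocol

    def stop(self):
        self.running = False


def run_upgrade_regression(old="Athens", new="Babylon"):
    """Start a chain, upgrade it with a pending transaction, report final state."""
    chain = ToyChain(protocol=old)
    chain.start()
    try:
        chain.install_baker("baker-1")
        chain.inject("tx-before-1")
        chain.bake_block()
        chain.inject("tx-pending-at-upgrade")  # still in the mempool at the switch
        chain.activate(new)
        chain.bake_block()
        return {
            "protocol": chain.protocol,
            "included": list(chain.included),
            "mempool_empty": not chain.mempool,
        }
    finally:
        chain.stop()  # always tear the chain down, even if a step fails
```

The final-state check then becomes a handful of assertions on the returned dictionary, and each new issue found after a release would add another function of this shape to the pack.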


These are all great points and as a community we need to have an ongoing discussion on these issues. We never focused on creating and discussing a comprehensive “lessons learned” review of Athens. Babylon shows just how much this is needed.

One thing I would like to add is that we should expect proposers to provide an initial impact analysis. The community would then be tasked with determining whether enough effort was put into explaining the pros/cons and migration costs before approving a proposal.

One minor quibble with your phrasing as I think it is important to be accurate in describing our system. The Tezos governance currently only serves to determine the protocol code; we do not vote on the shell. So it is a bit misleading to say that the patch to the shell minutes after release was “realising we don’t have on-line governance currently, as no-one voted for the code that is in production”.

The terminology can be confusing, but from our point of view we voted for one version and that is not the version that is currently running. If the shell was changed because of a change in the protocol, then I think this is a moot point: the code we voted for and tested is not the same as what is currently running.