Will Tezos freeze during the Hangzhou migration?

I tried to trigger a Hangzhou migration on a mainnet context using the yes-node method described here. On a 4-year-old laptop, it ran for 2 hours before I had to interrupt it. Someone else reported a 90-minute completion time. This was with rolling storage.

The blocking step is the storage flattening described in the release notes. It is indicated by the following log line:

flattening the context storage: this operation may take several minutes

It is concerning to see such a long migration time with rolling storage. Full and archive storage will likely take even longer. During this time, no one will be producing blocks.

Storage migration has happened before in Tezos (Athens to Babylon), but we are a different network now, with a rolling snapshot being 3 GiB in size. It will hopefully be even bigger in a couple of months when the migration happens.

What happens if this migration is interrupted while it runs?

Is there a better way? For example, have the node create a shadow context and update it just in time, and have the new context ready to switch over to during the actual proto migration?

6 Likes

Several minutes on testnet is hours on mainnet. I hope we don’t have to test the limits of the Emmy* consensus in a painful way during the context migration.

2 Likes

The Hangzhou migration flattens the context: the file system of the Tezos blockchain state. It removes nested directories like /12/ab/3c/4d, which are no longer required, and speeds up file system access by 30% (therefore we think this is a favorable change for Tezos). With today's Mainnet context, flattening the context and then validating it is expected to take about 10 minutes.

The context flattening in the current shell requires a significant amount of memory. If your computer has around 8 GB of memory, it may run into problems: the flattening may take much longer due to memory swapping, or the process may be killed by the kernel due to running out of memory.

We are aware of this issue and are working on it: a new shell will be released before Hangzhou. With it, machines with 8 GB of memory should migrate to Hangzhou smoothly, and the migration time will also be halved.

Meanwhile, with the current shell, you can try the --singleprocess option of tezos-node. This should reduce the memory usage of the migration significantly. (But it may still fail depending on the data size.)
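For instance, the flag can simply be added to the node's run command. This is a sketch based on the yes-node replication setup described later in this thread; adjust the data directory and other options to your own setup:

# run the node with --singleprocess so block validation (and hence the
# migration) happens inside the node process itself, instead of in a
# separate validator process
./tezos-node run --singleprocess --connections 0 --data-dir ~/.tezos-yes-node --rpc-addr localhost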

You can find more technical details at Draft: Context: reduces memory consumption at the context flattening (#1682) (!3475) · Merge requests · Tezos / tezos · GitLab

I also wrote a small personal memo on how to reduce the memory usage of a Tezos node: Running Tezos node with small memory (i.e. 8GB) - HackMD

13 Likes

To complete @jun’s excellent answer, we are also preparing three new changes to improve the migration experience:

  • a new release of Irmin (2.7.3) which will fix a performance issue when exporting large context trees to disk (which happens during the migration).
  • a new release of tezos-context where we will be adding regular flushing to the disk for large blocks (such as migrations).
  • a new release of the shell to avoid both the (read-only) validator process and the (read-write) node process having to do the migration.

With the first two changes, the read-write process goes down from 7 GB to 2 GB in our early benchmarks. With the third change, the migration is only done by the read-write process, and so won't need twice the memory.

These three changes will be part of the stable v11 release (and ideally in v11-rc2 so you could test it on your hardware!). Many thanks for testing the migration so early, this is super helpful to make sure we do not regress on the envelope of supported hardware. To make it clearer: our goal is that Tezos won’t freeze during Hangzhou migration.

6 Likes

Thank you for the reply!

I am testing on my laptop, which is not a perfect benchmarking tool :slight_smile:

I ran it again today after switching off my browser and all other processes, and I got a better time of 22 minutes. I confirm my machine was swapping during this process, even though I have 16 GiB of RAM.

Node logs:

Oct  7 12:00:38.737 - node.protocol: 011-PtHangzH: flattening the context storage: this operation may take several minutes
Oct  7 12:08:36.126 - node.protocol: 011-PtHangzH: context storage flattening completed
Oct  7 12:08:36.612 - validation: initializing protocol PtHangzHogok...
Oct  7 12:08:36.614 - node.protocol: 011-PtHangzH: flattening the context storage: this operation may take several minutes
Oct  7 12:22:56.429 - node.protocol: 011-PtHangzH: context storage flattening completed
Oct  7 12:22:56.458 - validator.block: block BM39PsHtZVbc4vm1Ez5EWuEtuuTjutbMYVcMaiy8oWkmwJDvfT5 successfully validated
Oct  7 12:22:56.458 - validator.block: Request pushed on 2021-10-07T19:08:36.402-00:00, treated in 13.840us, completed in 14min20s 
Oct  7 12:25:11.495 - node.store: the protocol table was updated: protocol PtHangzHogok (level 11) was
Oct  7 12:25:11.496 - node.store:   activated on block BM39PsHtZVbc4vm1Ez5EWuEtuuTjutbMYVcMaiy8oWkmwJDvfT5
Oct  7 12:25:11.496 - node.store:   (level 1752598)
Oct  7 12:25:11.500 - validator.chain: Update current head to BM39PsHtZVbc4vm1Ez5EWuEtuuTjutbMYVcMaiy8oWkmwJDvfT5 (level 1752598, timestamp 2021-10-06T03:00:02-00:00, fitness 01::000000000010be16), same branch
Oct  7 12:25:11.500 - validator.chain: Request pushed on 2021-10-07T19:22:56.452-00:00, treated in 6.435ms, completed in 2min15s 

Baker logs:

nochem@peck ~/workspace/tezos () $ time ./tezos-client -d /tmp/yes-wallet bake for foundation1 --minimal-timestamp
Disclaimer:
  The  Tezos  network  is  a  new  blockchain technology.
  Users are  solely responsible  for any risks associated
  with usage of the Tezos network.  Users should do their
  own  research to determine  if Tezos is the appropriate
  platform for their needs and should apply judgement and
  care in their network interactions.

Oct  7 12:08:36.137 - 010-PtGRANAD.delegate.baking_forge: found 0 valid operations (0 refused) for timestamp
Oct  7 12:08:36.137 - 010-PtGRANAD.delegate.baking_forge:   2021-10-06T03:00:02.000-00:00 (fitness 01::000000000010be16)
Injected block BM39PsHtZVbc

real    22m18.379s
user    0m0.589s
sys     0m0.085s

Machine specifications:

  • Intel(R) Core™ i7-7600U CPU @ 2.80GHz
  • 16 GiB RAM
  • NVMe SSD

Here is how to replicate the problem:

# Download a rolling snapshot
wget https://mainnet.xtz-shots.io/rolling
# I downloaded the file tezos-mainnet-1752596.rolling

# check out tezos master branch and build
make build-deps
eval $(opam env)
make

# force upgrade of the protocol 2 blocks after the snapshot level (1752596+2 = 1752598)
./scripts/user_activated_upgrade.sh src/proto_011_PtHangzH 1752598

# create yes-node and yes-wallet
patch -p1 < scripts/yes-node.patch
dune exec scripts/yes-wallet/yes_wallet.exe -- create minimal in /tmp/yes-wallet

# recompile
make

# create temp data dir for yes-node
mkdir ~/.tezos-yes-node

# import snapshot
./tezos-node snapshot import tezos-mainnet-1752596.rolling --data-dir ~/.tezos-yes-node/

# back it up in case you do this several times
cp -r ~/.tezos-yes-node ~/.tezos-yes-node-orig

# start node without connections
./tezos-node run --connections 0 --data-dir ~/.tezos-yes-node --rpc-addr localhost

# bake twice (if foundation1 does not work, try foundation2, foundation3 etc... until it works)
./tezos-client -d /tmp/yes-wallet bake for foundation1 --minimal-timestamp

# on second bake, it will perform hangzhou storage flattening
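
If you want to re-run the migration from scratch, restore the backed-up data directory before starting the node again. A small sketch, assuming the same paths as above:

# restore a pristine data dir from the backup taken earlier
rm -rf ~/.tezos-yes-node
cp -r ~/.tezos-yes-node-orig ~/.tezos-yes-node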

Could you please address my two other questions: (1) what happens if the node shuts down in the middle of this, and (2) is there any way to do this flattening at another time so the proto migration is quicker?

1 Like

@nicolasochem,

  1. what happens if the node shuts down in the middle of this?
    I think @samoht can give you a more detailed answer about what happens when the node is killed while writing context changes to disk, but basically the restarted node should restore the previous state without problems.
  2. is there any way to do this flattening at another time so the proto migration is quicker?
    The context must be flattened between Granada and Hangzhou, since Hangzhou assumes the files in the context are in the flattened directory structure (and Granada assumes the non-flattened structure). I am afraid we cannot move it…

The context must be flattened between Granada and Hangzhou, since Hangzhou assumes the files in the context are in the flattened directory structure (and Granada assumes the non-flattened structure). I am afraid we cannot move it…

Can you have two contexts side by side for some time (a week?) before the protocol change, and then just drop the unflattened one after the change?

(I forgot to post this message as a reply. Reposting.)

It is an interesting idea, but I think it is very hard to implement correctly.

If it were done in the protocol, it would have to create a flattened copy of the context gradually over several hundred blocks, which would make the migration very complex and much harder to test. We could separate the flattening into several stages (5 or 6, since there are 5 big directories to flatten) and perform one stage at each protocol change, but it would take more than a year to complete the flattening. In any case, Hangzhou is already injected and cannot be modified.

If the shell did it, it would be a hacky optimization independent of the protocol. It would have to create another version of the context for the flattened data, and the protocol would still need to validate the flattened one at the migration anyway.

The MR Draft: Context: reduces memory consumption at the context flattening (#1682) (!3475) · Merge requests · Tezos / tezos · GitLab reduces the memory usage of the migration a lot, so that an 8 GB machine can migrate to Hangzhou smoothly. By default, its peak memory usage at the migration is 7 GB. With --singleprocess, it is around 4.1 GB.

Yeah, I meant that the shell would do it. Maybe there's a way to only verify a proof of correct flattening at migration time (with the flattening itself done ahead of time), rather than check the whole thing at migration time?

We performed some tests with a recent mainnet context on a 12-core, 64 GB bare-metal server with 2x NVMe drives in RAID-1. Our Tezos node ran inside a Docker container with Alpine Linux 3.14, without CPU or memory limits, and with data volumes mounted via LVM.

Our findings:

  • an archive node took 1h 40m to migrate
  • a full node freshly imported from a snapshot took 11m 25s to migrate

Peak memory usage as reported by docker stats was quite high at 26G (archive) and 20G (full).
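
If you want to watch the memory consumption yourself during the migration, docker stats can be polled while the node runs. A minimal sketch, assuming the container is named tezos-node (adjust the name to your setup):

# watch the container's memory usage while the migration runs
docker stats tezos-node --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"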

Comprehensive DIY guide is here in case someone wants to replicate on other stack/hardware: Testing Tezos Protocol Migrations

Thanks to Alex’s guide, I was able to replicate the process on a Raspberry Pi 4B 8 GB using a microSD card as storage.

It took 1h 26m to complete the migration using a fresh rolling snapshot.

More details have been posted here.

2 Likes