Post-Mortem: Critical TezPay Payment Failure After Tezos Tallinn Activation

:warning: Note: All TezPay users must update to version 0.24.4 immediately! :warning:

Date: January 25, 2026
Authors: Tez.Capital Team
Status: Resolved
Severity: Critical

Executive Summary

On Saturday, January 24, 2026, following the activation of the Tezos Tallinn protocol, a critical defect was discovered in TezPay version 0.24.2. This defect caused the application to crash immediately after broadcasting payout transactions but before recording the successful payout in the internal database.

This specific crash vector created a “zombie state” where funds were sent, but the system had no record of the transfer, leading to a high risk of double payouts if the process was simply restarted.

The root cause was identified as a nil pointer dereference in the upstream TzGO SDK (v1.24.0), which occurred when handling Tezos transfers under the new Tallinn protocol rules.

Impact Assessment

  • Data Integrity: Two (2) bakers reported crashes resulting in data inconsistencies. In both cases, funds were sent on-chain, but the local database was not updated.
  • Financial Impact: No funds were permanently lost. The affected bakers identified the discrepancy before re-triggering payouts. The affected batches were small and reconciled manually.
  • User Base: The impact was greatly reduced by an unrelated infrastructure failure. A defect in the ami-tezpay release pipeline (introduced Jan 21) prevented the broken version (0.24.2) from being deployed to TezPay users operating through ami and tezbake. Consequently, only bakers running standalone TezPay instances were affected.
  • Service Availability: Payouts were halted globally for approximately 3 hours while a fix was verified and deployed.

Root Cause Analysis

The incident was the result of a collision between two independent failures:

  1. Primary Technical Failure (The Crash):
    TezPay 0.24.2 was built on the newly released TzGO SDK 1.24.0 (released Jan 21) to support the Tallinn protocol. This SDK version contained a bug that caused a segmentation violation (nil pointer dereference) during transaction construction/signing under Tallinn protocol parameters. The crash occurred in the most critical execution window: Post-Broadcast / Pre-Commit.

  2. Secondary Infrastructure Failure (The Containment):
    On Jan 21, an update to the ami-tezpay release pipeline introduced a bug that prevented it from fetching the latest application version. This effectively froze AMI users on the stable, older version (0.24.1). While this was an operational failure, it acted as an accidental firewall, preventing the flawed 0.24.2 build from reaching the majority of our user base.
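
In Go, a nil pointer dereference surfaces as a runtime panic that terminates the process unless it is recovered. The following is a minimal sketch of turning such a panic into an ordinary error at a library boundary. The names `opParams`, `buildTransfer`, and `safeBuildTransfer` are hypothetical stand-ins for illustration and are not taken from TzGO or TezPay.

```go
package main

import (
	"fmt"
)

// opParams is a hypothetical stand-in for protocol-dependent parameters
// that a library might leave unset (nil) for a protocol it does not handle.
type opParams struct {
	GasLimit int64
}

// buildTransfer dereferences params without a nil check, which mirrors the
// class of bug described above. It is NOT the actual TzGO code.
func buildTransfer(params *opParams, amount int64) string {
	return fmt.Sprintf("transfer amount=%d gas=%d", amount, params.GasLimit)
}

// safeBuildTransfer converts a panic raised inside buildTransfer into an
// error, so the caller can stop cleanly instead of crashing mid-run.
func safeBuildTransfer(params *opParams, amount int64) (op string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return buildTransfer(params, amount), nil
}

func main() {
	// Passing nil triggers the nil pointer dereference inside buildTransfer.
	if _, err := safeBuildTransfer(nil, 100); err != nil {
		fmt.Println("build failed:", err)
	}
}
```

Note that `recover` only catches panics raised in the same goroutine, and recovering does not by itself close the Post-Broadcast / Pre-Commit window: the outcome of a broadcast still has to be recorded durably.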

Detailed Timeline (UTC)

Pre-Incident Context

  • Jan 21 19:40: ami-tezpay pipeline updated. A bug is introduced that prevents correct version updating, freezing AMI builds at v0.24.1.
  • Jan 21 22:36: Trilitech releases TzGO 1.24.0, containing the latent nil pointer dereference bug in its handling of the Tallinn protocol.
  • Jan 22 17:51: Tez.Capital releases TezPay 0.24.2, built on TzGO 1.24.0.

Incident: Saturday, January 24, 2026

  • 16:06: Tezos Tallinn Protocol activates at block 11,640,288.
  • 16:07: TezPay instances correctly detect the new protocol and halt operations (standard safeguard).
  • 16:59: Incident Trigger: The first two standalone bakers attempt payouts. TezPay crashes immediately after broadcasting funds. Reports indicate “Segmentation Violation.”
  • 17:02: TzC (Tez.Capital) team begins investigation.
  • 17:15: Analysis points to the upstream TzGO library as the source of the panic.
  • 17:18: Red Alert: TzC contacts Trilitech for urgent assistance regarding the operational flaw in TzGO.
  • 17:19: Containment: TzC broadcasts “STOP PAYOUTS IMMEDIATELY” via all community channels.
  • 17:49: First official response received from Trilitech.
  • 18:05: Trilitech TzGO development team joins the war room.
  • 18:15: Developers request reproduction steps. TzC opts to prioritize an emergency “kill-switch” release over immediate repro.
  • 18:24: Mitigation: TezPay releases an emergency update disabling the payout functionality entirely to prevent further data corruption for users attempting to run the binary.
  • 18:29: TzC begins working on isolation and reproduction.
  • 18:37: Issue confirmed to be caused by tez transfers within the TzGO SDK.
  • 18:39: Trilitech developers confirm they can reproduce the issue.
  • 19:04: Fix provided by Trilitech; verification requested.
  • 19:20: Resolution: All fixes verified. Patched TezPay version deployed.
  • 21:00: Report received that ami-tezpay users are not seeing the new version.
  • 21:50: Confirmed that ami-tezpay had been frozen on v0.24.1 since Jan 21, inadvertently saving those users from the crash.

Lessons Learned & Next Steps

This was the most serious technical issue in TezPay's history. While the financial impact was negligible, the potential for catastrophic double payouts was high. We are writing this post-mortem not just to document the fix, but to highlight that “luck” is not a valid reliability strategy.

We were fortunate that a pipeline failure prevented widespread adoption of the broken build. However, relying on accidental failures to prevent critical ones is unacceptable.

Action Items

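  1. Transaction Safety and Atomicity:
    We are investigating ways to make broadcasting a payout and recording it behave atomically, so that a crash between the two steps cannot leave a sent payout unrecorded.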

  2. Pipeline Repair:
    The ami-tezpay pipeline has been fixed to ensure future updates are delivered reliably.

  3. Community Collaboration:
    We hope this incident provides valuable data to help improve the TzGO SDK and its test suite, preventing similar issues in the future.
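
One common way to narrow the Post-Broadcast / Pre-Commit window described in this post-mortem is to record the intent to pay before broadcasting, and to refuse a blind retry while that intent is unresolved. The sketch below illustrates the idea only; it is not the approach the team has chosen. `Journal`, `Pay`, and the batch IDs are hypothetical, and the in-memory map stands in for storage that would have to be flushed to disk in practice.

```go
package main

import (
	"errors"
	"fmt"
)

// Status of a payout batch in the hypothetical journal.
type Status string

const (
	StatusPending   Status = "pending"   // intent recorded, outcome unknown
	StatusConfirmed Status = "confirmed" // broadcast succeeded and was recorded
)

// Journal is an in-memory stand-in for durable storage. A real
// implementation would need to persist each write before returning.
type Journal struct {
	entries map[string]Status
}

// NewJournal returns an empty journal.
func NewJournal() *Journal {
	return &Journal{entries: make(map[string]Status)}
}

// ErrNeedsReconciliation signals that a batch may already be on-chain.
var ErrNeedsReconciliation = errors.New("batch is pending; check the chain before resending")

// Pay records the intent first, then broadcasts, then marks the batch
// confirmed. If the process dies during broadcast, the pending entry
// remains and blocks a blind retry.
func (j *Journal) Pay(batchID string, broadcast func() error) error {
	switch j.entries[batchID] {
	case StatusConfirmed:
		return nil // already paid and recorded; nothing to do
	case StatusPending:
		return ErrNeedsReconciliation
	}
	j.entries[batchID] = StatusPending
	if err := broadcast(); err != nil {
		// The entry stays pending: the operation may or may not have
		// reached the network, so it must be reconciled manually.
		return fmt.Errorf("broadcast failed: %w", err)
	}
	j.entries[batchID] = StatusConfirmed
	return nil
}

func main() {
	j := NewJournal()
	// Simulate a crash that happens right after the broadcast.
	func() {
		defer func() { _ = recover() }()
		_ = j.Pay("cycle-1", func() error { panic("crash after broadcast") })
	}()
	// After a "restart", the journal refuses to resend the same batch.
	fmt.Println(j.Pay("cycle-1", func() error { return nil }))
}
```

The design choice here is to prefer a stalled payout over a duplicated one: a pending entry forces an operator to compare the batch against the chain before anything is sent again.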

Thanks for the highly detailed recounting!

We are introducing a kill switch in the upcoming TezPay minor release. This feature will be active by default but can be toggled off in the configuration. We believe this provides safety for less involved and less technical users while preserving the freedom of more experienced bakers.

With this switch, we can manually halt baker payments in emergencies, such as when payment discrepancies are expected after a protocol change or when there are issues with the upstream payment data provided by the API.
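
As a rough illustration of the behavior described above (active by default, with an opt-out in the configuration), a check of this kind could look like the sketch below. Everything in it is an assumption for illustration: `Config`, `DisableKillSwitch`, `HaltSignal`, and `CheckKillSwitch` are hypothetical names, and how the halt signal is published or fetched has not been specified.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical slice of payout configuration. The field name
// is illustrative and not taken from TezPay.
type Config struct {
	// DisableKillSwitch lets experienced bakers opt out. The zero value
	// (false) keeps the switch active, i.e. active by default.
	DisableKillSwitch bool
}

// HaltSignal is a hypothetical view of a remotely published halt notice.
type HaltSignal struct {
	Halted bool
	Reason string
}

// ErrPayoutsHalted is returned when the kill switch blocks a payout run.
var ErrPayoutsHalted = errors.New("payouts halted by kill switch")

// CheckKillSwitch returns an error when payouts must not proceed.
func CheckKillSwitch(cfg Config, sig HaltSignal) error {
	if cfg.DisableKillSwitch {
		return nil // the baker opted out; the signal is ignored
	}
	if sig.Halted {
		return fmt.Errorf("%w: %s", ErrPayoutsHalted, sig.Reason)
	}
	return nil
}

func main() {
	sig := HaltSignal{Halted: true, Reason: "protocol upgrade under review"}
	fmt.Println(CheckKillSwitch(Config{}, sig))                        // blocked by default
	fmt.Println(CheckKillSwitch(Config{DisableKillSwitch: true}, sig)) // opt-out proceeds
}
```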

More information will be posted soon.