Security Alert: recommendations for operators of public RPC nodes

NomadicLabs · January 26, 2024, 5:22pm

TL;DR

We have recently received a warning of a potential DoS vulnerability affecting operators of public Octez RPC nodes. Our initial investigation confirms that certain RPC endpoints can be abused to mount a Denial of Service (DoS) attack against public-facing nodes if precautions are not taken at the infrastructure level. Some of these endpoints are allowed by the default ACL config.

This vulnerability compromises neither the safety nor the liveness of any Tezos network, but could severely degrade end-user experience in the case of a large-scale attack against infrastructure providers – e.g. by temporarily preventing wallets (and hence users) from submitting transactions and smart contract calls to Tezos Layer 1 Mainnet.

In this document we provide urgent advice for Octez public RPC node operators and the Tezos community at large on how to disable serving the identified unsafe endpoints – and how to properly prepare otherwise.

We will provide a full post-mortem in due time.

Findings

The Octez suite assumes that some RPC endpoints are naturally inconvenient to expose publicly, either because of the size of the response or the resource footprint necessary to serve the request. This is why the Octez suite implements a user-configurable RPC access control list (ACL) which, by default, forbids queries to most RPC endpoints, with the exception of a small set of whitelisted endpoints – see this tech-doc’s entry for the full list.

However, the received warning and our subsequent findings reveal that some endpoints in the default whitelist increase the surface for potential DoS attacks, given the size of the replies or the resource consumption required to compute the answer, and should not have been allowed. We have also identified other endpoints which similarly need precautions in order to prevent plausible DoS attacks on public-facing infra.

Unsafe endpoints

The following endpoints return heavy payloads and are of little interest to end users:

GET /errors
GET /describe/**

The fact that they are allowed by default is a bug that will be addressed in future Octez releases.

Handle-with-care endpoints

Our internal assessment has also identified other RPC endpoints that, although exposed by the default ACL, are either computationally heavy to serve or return large replies. For example:

GET /chains/*/blocks/*/context/**
GET /protocols/*
GET /chains/main/mempool/pending_operations
GET /monitor/**
GET /chains/*/mempool/monitor_operations

Some sub-paths of /context/ endpoints are quite large. Indeed, querying all contracts in Mainnet context via the /chains/main/blocks/head/context/contracts endpoint can produce a response in the ~100MB ballpark. Make sure your infrastructure is prepared to serve such requests.

RPC endpoints that output full operations can also stream significant amounts of data. For example, querying all pending operations from a Mainnet node’s mempool via mempool/pending_operations can result in 2MB of JSON data. The same goes for querying operations in a block.

Serving these endpoints requires extra care: they should be served by production-ready infrastructure providing load balancing and standard DoS prevention mechanisms.

Potentially leaky endpoints

Some further endpoints in the default whitelist are not computationally heavy, but can leak information about your local infra which an attacker might use to plan further attacks.

They are included in the whitelist for monitoring purposes, but are not intended to be public.

Recommended fine-grained ACL configuration and general RPC security guidelines

To recap, we recommend the following to public RPC node operators, and to any operator of public-facing Octez infrastructure:

Audit and refine ACLs to align with your business model, infrastructure, and user needs – do your clients need to know about the node’s /network stats? Or to access any node-monitoring RPC endpoint at all?
Explicitly forbid remote queries to /errors and the complete /describe/** path. These are unsafe, and not needed by most users. They will be removed from the default whitelist in future Octez versions.
Tezos context endpoints can trigger large replies. Prepare your infrastructure accordingly: consider implementing load balancing and rate limits, and follow standard practices to prevent DoS attacks, for example by serving requests behind a web application firewall (WAF).
Consider refining and restricting usage of operation endpoints accordingly. For example, a wallet need not deal with consensus operations when querying a block or the mempool. Please consider filtering such endpoints following the capabilities described in Fine-grained queries to operation endpoints below.
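
The rate-limiting recommendation above can be illustrated with a token bucket. In production this logic normally lives in a reverse proxy, load balancer or WAF rather than in application code, so the following is only a minimal sketch of the idea. The names TokenBucket and request_cost, the rates and the per-endpoint costs are illustrative assumptions, not values recommended for Octez.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so the bucket can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Consume `cost` tokens and return True if the request may proceed."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def request_cost(path):
    """Hypothetical weighting: charge more for endpoints known to be heavy."""
    return 5.0 if "/context/" in path else 1.0
```

One bucket per client address, with `bucket.allow(request_cost(path))` checked before forwarding a request to the node, is one way to make heavy context queries count for more than cheap ones.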

As for bakers and non-public node operators, we advise you to ensure, and regularly verify, that the local RPC port of your Octez node is never open to the outside world!
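
One way to run this check regularly is a small TCP probe executed from a machine outside your network. The sketch below is an assumption-laden example: the function name is hypothetical, and port 8732 is only the commonly used RPC port, so substitute whatever address and port your node's RPC server actually listens on.

```python
import socket


def rpc_port_reachable(host, port=8732, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable or unknown hosts.
        return False


# Example (run from OUTSIDE your network, with your node's public address):
#   if rpc_port_reachable("your.node.public.address"):
#       print("WARNING: the RPC port is reachable from the outside world")
```

A successful connection only shows that the port is open; it does not tell you which endpoints the ACL would serve, so treat a True result as a reason to review both the firewall and the ACL.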

Example of a defensive ACL config for public nodes

We provide here a snippet with an example of a defensive ACL configuration removing the unsafe and leaky endpoints discussed earlier. It can be integrated into the Octez node’s configuration file – see this doc entry for further guidelines.

[ { "address": "127.0.0.1", "blacklist": [] },
  { "address": "any.public.address",
	"whitelist":
  	[ "GET/chains/*/blocks", "GET/chains/*/blocks/*",
    	"GET/chains/*/chain_id", "GET/chains/*/checkpoint",
    	"GET/chains/*/blocks/*/context/adaptive_issuance_launch_cycle",
    	"GET/chains/*/blocks/*/context/big_maps/*/*",
    	"GET/chains/*/blocks/*/context/cache/**",
    	"GET/chains/*/blocks/*/context/constants",
    	"GET/chains/*/blocks/*/context/contracts/**",
    	"GET/chains/*/blocks/*/context/delegates/**",
    	"GET/chains/*/blocks/*/context/denunciations",
    	"GET/chains/*/blocks/*/context/issuance",
    	"GET/chains/*/blocks/*/context/issuance/*",
    	"GET/chains/*/blocks/*/context/liquidity_baking/*",
    	"GET/chains/*/blocks/*/context/merkle_tree/**",
    	"GET/chains/*/blocks/*/context/merkle_tree_v2/**",
    	"GET/chains/*/blocks/*/context/nonces/*",
    	"GET/chains/*/blocks/*/context/sapling/**",
    	"GET/chains/*/blocks/*/context/seed_computation",
    	"GET/chains/*/blocks/*/context/selected_snapshot",
    	"GET/chains/*/blocks/*/context/total_frozen_stake",
    	"GET/chains/*/blocks/*/context/total_supply",
    	"GET/chains/*/blocks/*/hash", "GET/chains/*/blocks/*/header",
    	"GET/chains/*/blocks/*/header/**",
    	"GET/chains/*/blocks/*/helpers/current_level",
    	"GET/chains/*/blocks/*/live_blocks",
    	"GET/chains/*/blocks/*/metadata",
    	"GET/chains/*/blocks/*/metadata_hash",
    	"GET/chains/*/blocks/*/minimal_valid_time",
    	"GET/chains/*/blocks/*/operation_hashes",
    	"GET/chains/*/blocks/*/operation_hashes/**",
    	"GET/chains/*/blocks/*/operation_metadata_hashes",
    	"GET/chains/*/blocks/*/operations",
    	"GET/chains/*/blocks/*/operations/**",
    	"GET/chains/*/blocks/*/operations_metadata_hash",
    	"GET/chains/*/blocks/*/protocols",
    	"GET/chains/*/blocks/*/resulting_context_hash",
    	"GET/chains/*/blocks/*/votes/**", "GET/chains/*/invalid_blocks",
    	"GET/chains/*/invalid_blocks/*", "GET/chains/*/is_bootstrapped",
    	"GET/chains/*/levels/*", "GET/chains/*/mempool/filter",
    	"GET/chains/*/mempool/pending_operations", "GET/config/history_mode",
    	"GET/config/network/user_activated_protocol_overrides",
    	"GET/config/network/user_activated_upgrades",
    	"GET/config/network/dal", "GET/network/stat", "GET/network/version",
    	"GET/network/versions", "GET/protocols",
    	"GET/protocols/*/environment", "GET/version",
    	"POST/chains/*/blocks/*/context/contracts/*/big_map_get",
    	"POST/chains/*/blocks/*/context/seed", "POST/injection/operation" ] } ]

NB: Remember to replace any.public.address with the appropriate IP address or domain name.

This (and potentially further) refinement will be integrated into the default ACL whitelist of future Octez versions.
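
Before deploying an ACL it can help to check mechanically that it does not let through the requests you mean to forbid. The following is a rough sketch, not the node's own matcher: it assumes that `*` matches exactly one path segment and that a trailing `**` matches any suffix (including an empty one), and the function names are hypothetical. Verify the result against a real node before relying on it.

```python
def split_rule(rule):
    """Split 'GET/chains/*' or 'GET /chains/*' into ('GET', ['chains', '*'])."""
    method, _, path = rule.partition("/")
    return method.strip().upper(), [seg for seg in path.split("/") if seg]


def rule_matches(rule, method, path):
    """Return True if the ACL rule covers the given method and path."""
    rule_method, pattern = split_rule(rule)
    if rule_method and rule_method != method.upper():
        return False
    segments = [seg for seg in path.split("/") if seg]
    for i, pat in enumerate(pattern):
        if pat == "**":
            return True  # assumed: matches any suffix, even an empty one
        if i >= len(segments):
            return False
        if pat != "*" and pat != segments[i]:
            return False
    return len(segments) == len(pattern)


def entry_allows(entry, method, path):
    """Evaluate one ACL entry (a dict with a 'whitelist' or a 'blacklist' key)."""
    if "whitelist" in entry:
        return any(rule_matches(r, method, path) for r in entry["whitelist"])
    # A blacklist allows everything except what its rules match.
    return not any(rule_matches(r, method, path) for r in entry.get("blacklist", []))


# Illustrative entry only; load your real ACL from the node's configuration file.
public_entry = {"address": "any.public.address",
                "whitelist": ["GET/chains/*/blocks/*", "GET/version"]}
for probe in ["/errors", "/describe/chains", "/version"]:
    print(probe, entry_allows(public_entry, "GET", probe))
```

Treating `**` as matching an empty suffix makes the audit err on the side of reporting a request as allowed, which is the safer direction for this kind of check.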

Fine-grained queries to operation endpoints

To avoid generating unnecessarily large replies for operation RPC endpoints, we suggest only performing requests for operations relevant to the application’s use case. Depending on the specific endpoint, operations can often be filtered by:

Operation kinds aka validation passes: consensus, voting, anonymous, and manager operations.
Their validation status in the mempool: valid, outdated, refused, branch-refused and branch-delayed.

The following examples implement different available filters based on these concepts.

Filtering operations in block endpoints by validation pass

To query the operations in a given block one can use the endpoint:

chains/<chain>/blocks/<block>/operations/3

instead of chains/<chain>/blocks/<block>/operations to only output manager operations – transactions, contract calls, etc. By doing so, you could avoid streaming ~150KB of data in the result.
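
As a small illustration, a client could build these paths from the operation kind rather than hard-coding the index. This is only a sketch: the helper name is hypothetical, and the indices 0 to 2 are inferred from the order in which the validation passes are listed above (only manager = 3 is stated explicitly).

```python
# Validation pass indices, assuming the order: consensus, voting, anonymous, manager.
VALIDATION_PASSES = {"consensus": 0, "voting": 1, "anonymous": 2, "manager": 3}


def block_operations_path(chain="main", block="head", kind=None):
    """Return the RPC path for a block's operations, optionally for one validation pass."""
    path = f"/chains/{chain}/blocks/{block}/operations"
    if kind is not None:
        path += f"/{VALIDATION_PASSES[kind]}"
    return path


print(block_operations_path(kind="manager"))
```

A wallet that only displays transactions and contract calls would request the manager pass and never download the consensus operations of the block.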

Filtering pending operations in the mempool by validation pass

It is possible to filter the results from mempool/pending_operations using the optional validation_pass argument. For example, one can use:

mempool/pending_operations?validation_pass=3

to only query manager operations from the node’s mempool.

Filtering pending operations in the mempool by their validation status

Querying the full contents of the mempool/pending_operations endpoint returns not only valid operations but also invalid ones. For example, it will also include Refused operations, which cannot be included in blocks and are kept by the mempool to defend against DoS attacks at the P2P level.

In order to query only valid operations the <status>=false argument can be used to filter out uninteresting operations with other validation statuses. This is done with a query to:

mempool/pending_operations?outdated=false&refused=false&branch_refused=false&branch_delayed=false

In this way, the size of the data returned by the RPC call is reduced significantly, depending on the load of the queried mempool. For example, adding the filter above can reduce the size of the reply to /mempool/pending_operations from ~900KB to ~250KB.

We can go even further by combining this call with, e.g., the validation_pass=3 filter from the previous example. Again, depending on the live mempool load, this can reduce the size of the reply to the ~10KB ballpark.
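
The combined query can be assembled programmatically, which keeps the filter names in one place. A minimal sketch, using only the query arguments shown in this section; the helper name and its parameters are hypothetical:

```python
from urllib.parse import urlencode

# Validation statuses that are filtered out when only valid operations are wanted.
INVALID_STATUSES = ("outdated", "refused", "branch_refused", "branch_delayed")


def pending_operations_path(chain="main", validation_pass=None, only_valid=False):
    """Return the mempool RPC path with the requested filters applied."""
    params = []
    if only_valid:
        params += [(status, "false") for status in INVALID_STATUSES]
    if validation_pass is not None:
        params.append(("validation_pass", str(validation_pass)))
    path = f"/chains/{chain}/mempool/pending_operations"
    return f"{path}?{urlencode(params)}" if params else path


print(pending_operations_path(validation_pass=3, only_valid=True))
```

Calling it with no filters returns the unfiltered path, so the same helper can be used to compare reply sizes with and without the filters on your own node.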

Staying safe together

We reiterate that any attack exploiting these vulnerabilities would only concern public RPC nodes, as baker nodes and private nodes need not – and are not supposed to – expose their local RPC server’s ports to the outside world.

Therefore, neither the safety nor the liveness of Tezos Mainnet is at risk, only the availability of services that depend on these public nodes.

We look forward to sharing more details in due time. In the meantime, please heed the recommendations above and don’t hesitate to reach out for help implementing them – Stay safe!