Missed Blocks

What are Missed Blocks alerts?

Each Axelar validator is required to produce and sign blocks, but occasionally a validator fails to sign some of them. Under the Tendermint consensus rules used by Axelar, a validator must sign at least 50% of the blocks in every 35,000-block window to keep earning rewards. For every block a validator misses, it loses 0.01% of its rewards as a downtime penalty. If a validator fails to meet this liveness requirement over the 35,000-block window, it is “jailed” for two hours. While jailed, a validator stops accruing the corresponding rewards, but jailing is an essential mechanism because it prevents repeated slashing for the same issue.

A single missed block is not an unusual event. To inform users about potential issues without causing alert fatigue, we raise an alert only when more than x blocks have been missed within the last hour, where x is the alert threshold set by the user (default value: 5).
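To make the liveness rule concrete, here is a minimal sketch of the arithmetic. It uses only the 50% and 35,000-block figures quoted above. The function names are hypothetical and the constants are illustrative, not values read from the chain.

```python
import math

# Figures quoted in this article; they are not queried from the network.
WINDOW_BLOCKS = 35_000
MIN_SIGNED_FRACTION = 0.5


def min_signed_blocks(window=WINDOW_BLOCKS, fraction=MIN_SIGNED_FRACTION):
    """Smallest number of blocks a validator must sign in one window."""
    return math.ceil(window * fraction)


def meets_liveness(missed, window=WINDOW_BLOCKS, fraction=MIN_SIGNED_FRACTION):
    """True if a validator that missed `missed` blocks still signed enough of the window."""
    signed = window - missed
    return signed >= min_signed_blocks(window, fraction)


if __name__ == "__main__":
    print(min_signed_blocks())     # 17500
    print(meets_liveness(17_500))  # True: exactly 50% signed
    print(meets_liveness(17_501))  # False: fewer than 50% signed
```

With these figures, a validator can miss at most 17,500 blocks in a window before it drops below the requirement.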

How are missed blocks alerts triggered?

Our alerting system evaluates missed blocks alerts every 5 minutes. When it finds that a validator has missed more than x blocks within the last hour, it triggers an alert.
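The evaluation described above can be sketched as a sliding-window count. This is an illustration under stated assumptions, not the platform's actual implementation: the function names are hypothetical, missed blocks are represented by their timestamps, and a miss exactly one hour old is treated as outside the window.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(hours=1)  # "within the last hour"
DEFAULT_THRESHOLD = 5          # default value of x


def count_recent_misses(missed_at, now, lookback=LOOKBACK):
    """Count missed-block timestamps inside the lookback window ending at `now`."""
    cutoff = now - lookback
    return sum(1 for t in missed_at if cutoff < t <= now)


def should_fire(missed_at, now, threshold=DEFAULT_THRESHOLD):
    """True when strictly more than `threshold` blocks were missed in the window."""
    return count_recent_misses(missed_at, now) > threshold


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical misses, expressed as "minutes before now".
    misses = [now - timedelta(minutes=m) for m in (2, 10, 15, 30, 45, 59, 90)]
    print(count_recent_misses(misses, now))  # 6 (the miss 90 minutes ago is ignored)
    print(should_fire(misses, now))          # True, because 6 > 5
```

A scheduler would call `should_fire` once per 5-minute evaluation cycle. Note that the comparison is strict, so exactly x missed blocks does not trigger an alert.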

What notifications will I receive?

You will receive a single notification for each validator you are subscribed to when a block issue occurs. If the issue persists for two or more consecutive evaluation periods, you will not receive any additional notifications, but the alert’s status will remain “Firing” in the Metrika platform. Once the issue is resolved (your validator missed at most x blocks in the last hour), you will receive a single notification informing you of the resolution.

If you have enabled PagerDuty notifications, the severity level is set to “Warning” when a validator misses more than 5 blocks but fewer than 25, and to “Error” when it misses more than 25. Failing to sign more than 25 blocks could indicate that the validator is now missing more than 50% of the blocks.
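The notification behaviour above amounts to notifying only on state changes. The sketch below illustrates that idea together with the severity mapping. The function names are hypothetical, and because the text does not say how exactly 25 missed blocks is classified, treating it as “Warning” is an assumption.

```python
from typing import Optional

DEFAULT_THRESHOLD = 5  # default value of x
ERROR_ABOVE = 25       # more than this many missed blocks maps to "Error"


def severity(missed: int, threshold: int = DEFAULT_THRESHOLD) -> Optional[str]:
    """Severity for a firing alert, or None when the alert is not firing."""
    if missed <= threshold:
        return None
    if missed > ERROR_ABOVE:
        return "Error"
    return "Warning"  # assumption: exactly 25 missed blocks is still a warning


def notification(was_firing: bool, missed: int,
                 threshold: int = DEFAULT_THRESHOLD) -> Optional[str]:
    """Return 'alert' or 'resolved' on a state change, otherwise None."""
    is_firing = missed > threshold
    if is_firing and not was_firing:
        return "alert"
    if was_firing and not is_firing:
        return "resolved"
    return None  # no change: stay silent, status stays as it was


if __name__ == "__main__":
    firing = False
    # Hypothetical missed-block counts from six consecutive evaluations.
    for missed in (3, 7, 9, 30, 4, 2):
        print(missed, notification(firing, missed), severity(missed))
        firing = missed > DEFAULT_THRESHOLD
```

In this run only the second evaluation produces an alert and only the fifth produces a resolution; every other evaluation is silent.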

How do I resolve it?

This alert can have various root causes. The following checks on your validator’s health can help you identify the issue:

  • Check your node’s health via CLI

    • axelard status

    • If the node is down, try restarting it (assuming auto-restart is not enabled)

  • Check whether your validator node is in good standing. Possible problems include: not in the active set, too many missed blocks, jailed status, etc.

    • axelard health-check --tofnd-host {TOFND_HOST} --operator-addr {VALOPER_ADDR}

  • Search the logs for anomalies such as error messages or consensus failures

    • If you find issues, please report them on the Axelar Discord, in either the private team channel or the validator channel

  • Check system load, network throughput, CPU, memory and disk utilization, and see if you can identify any hardware bottlenecks

  • Restore the node from a snapshot. Do this only as a last resort, after consulting with the team
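Two of the checks above, searching logs and looking at system load and disk usage, can be partly automated. The following is a rough sketch using only the Python standard library. The function names, keyword list, and sample log lines are hypothetical; they do not reflect the actual axelard log format, so adapt them to what your node prints.

```python
import os
import shutil

# Keywords taken from the checklist above; extend them for your own setup.
ANOMALY_KEYWORDS = ("error", "consensus failure")


def find_anomalies(log_lines, keywords=ANOMALY_KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword, ignoring case."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def resource_snapshot(path="/"):
    """Report disk usage for `path` and the 1-minute load average per CPU."""
    usage = shutil.disk_usage(path)
    try:
        load_1m = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1m = None  # load average is not available on this platform
    return {
        "disk_used_fraction": usage.used / usage.total if usage.total else 0.0,
        "load_per_cpu_1m": None if load_1m is None else load_1m / (os.cpu_count() or 1),
    }


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = ["INFO committed block\n", "ERROR failed to sign block\n"]
    for number, text in find_anomalies(sample):
        print(f"line {number}: {text}")
    print(resource_snapshot())
```

A disk-used fraction close to 1.0, or a per-CPU load that stays above 1.0, points to a hardware bottleneck worth investigating before restarting or restoring the node.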
