Health Watchdog

Health Watchdog is an automated mechanism that detects bad health conditions on the server and blocks queries while those conditions persist.

A high concurrent load on the database can put the server into a bad health state. Health Watchdog is designed to mitigate the bad health state by doing the following:

  • Detecting the bad health state.

  • Blocking DDL/DML transactions so that they do not add to the bad health state.

  • Once the bad health state has been mitigated, allowing all blocked transactions to proceed.
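
The detect, block, and release cycle described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not the server's implementation: the class WatchdogSketch and its methods are hypothetical names, and it assumes that every submitted transaction is a DDL/DML transaction and that blocked transactions are released in arrival order.

```python
from collections import deque


class WatchdogSketch:
    """Toy model of the detect / block / release cycle (hypothetical, for illustration)."""

    def __init__(self):
        self.healthy = True
        self.blocked = deque()  # DDL/DML transactions held while health is bad
        self.executed = []      # transactions that were allowed to run

    def report_health(self, healthy):
        """Record the latest health check; on recovery, release held transactions."""
        self.healthy = healthy
        if healthy:
            # Assumption: blocked transactions proceed in the order they arrived.
            while self.blocked:
                self.executed.append(self.blocked.popleft())

    def submit(self, txn):
        """Run the transaction if the server is healthy, otherwise hold it."""
        if self.healthy:
            self.executed.append(txn)
            return "executed"
        self.blocked.append(txn)
        return "blocked"
```

In this model, a transaction submitted after report_health(False) is held, and it runs as soon as report_health(True) is called.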

Health Watchdog uses three health metrics to check the server status and trigger the mitigation:

  • Truncation Version Lag - tracks the catalog sync service and detects bad health conditions in the server when the current commit version is far ahead of the database truncation version.

  • GCLX Queue Bloat - tracks the GCLX queue size and stops GCLX requests when the server is flooded with them.

  • Mergeout Queue Bloat - tracks the TM queue size and stops DML transactions if the TM pool threads cannot keep up with the number of TM requests.
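
As a rough illustration of how three such metrics could be evaluated, the Python sketch below compares each one against a limit. The function name, the parameter names, and the three limit values are all hypothetical and chosen only for the example; the actual thresholds and the way the server computes them are not stated here.

```python
# Hypothetical limits, for illustration only; the real thresholds are not documented here.
MAX_VERSION_LAG = 1000  # allowed gap between commit version and truncation version
MAX_GCLX_QUEUE = 500    # allowed number of queued GCLX requests
MAX_TM_QUEUE = 200      # allowed number of queued TM (mergeout) requests


def unhealthy_metrics(commit_version, truncation_version,
                      gclx_queue_size, tm_queue_size):
    """Return the names of the metrics that exceed their assumed limits."""
    failing = []
    # Truncation Version Lag: commit version far ahead of truncation version.
    if commit_version - truncation_version > MAX_VERSION_LAG:
        failing.append("truncation_version_lag")
    # GCLX Queue Bloat: too many queued GCLX requests.
    if gclx_queue_size > MAX_GCLX_QUEUE:
        failing.append("gclx_queue_bloat")
    # Mergeout Queue Bloat: TM pool threads are not keeping up with TM requests.
    if tm_queue_size > MAX_TM_QUEUE:
        failing.append("mergeout_queue_bloat")
    return failing
```

An empty result means that none of the three checks is failing under these assumed limits; any returned name identifies the metric that would trigger mitigation in this sketch.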

You can check the status of the server using check_cluster_health and the health_watchdog_blocked_transactions system table.