Health Watchdog
A high concurrent load on the database can put the server into a bad health state. The Health Watchdog is designed to mitigate the bad health state by doing the following:
- Detecting the bad health state.
- Stopping transactions from adding to the bad state by blocking DDL/DML transactions.
- Allowing all blocked transactions to proceed once the bad health state has been mitigated.
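The detect, block, and release cycle described above can be sketched as a small toy model. This is only an illustration and not the server's implementation: the `WatchdogGate` class, its methods, and the transaction strings are hypothetical names chosen for the example.

```python
from collections import deque


class WatchdogGate:
    """Toy model of the detect / block / release cycle (hypothetical, not server code)."""

    def __init__(self):
        self.bad_health = False
        self.blocked = deque()   # transactions waiting for the state to recover
        self.executed = []       # transactions that were allowed to run

    def report_health(self, bad_health):
        """Record the latest health state and release waiters on recovery."""
        self.bad_health = bad_health
        if not bad_health:
            while self.blocked:
                self.executed.append(self.blocked.popleft())

    def submit(self, txn):
        """Run the transaction, or hold it while the health state is bad."""
        if self.bad_health:
            self.blocked.append(txn)
            return "blocked"
        self.executed.append(txn)
        return "executed"


gate = WatchdogGate()
gate.submit("INSERT 1")       # healthy: runs immediately
gate.report_health(True)      # step 1: bad health state detected
gate.submit("INSERT 2")       # step 2: DML is held back
gate.report_health(False)     # step 3: state mitigated, waiters proceed
print(gate.executed)
```

Blocked transactions are released in arrival order here; that ordering is an assumption of the sketch, not something this page specifies.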
Note
Only non-superuser commands can be stopped by the Health Watchdog check.
The metrics that the Health Watchdog uses to check the server status and enact the mitigation are:
- Truncation Version Lag: tracks the catalog sync service and detects bad health conditions in the server when the current commit version is far ahead of the database truncation version. The default is 500. It can be tuned using TruncationVersionLag.
- GCLX Queue Bloat: tracks the GCLX queue size and stops GCLX requests when the server is overwhelmed with them. The default is 100. It can be tuned using GCLXBlockParameter.
- Mergeout Queue Bloat: tracks the TM queue size and stops DML transactions if the TM pool threads cannot keep up with the number of TM requests. The default is 100. It can be tuned using MergeoutBlockParameter.
- Watchdog Timeout Interval: the amount of time a transaction is blocked before it times out. The default is 5 minutes. It can be tuned using WatchdogTimeoutInterval.
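As a rough illustration of how these thresholds might combine, the sketch below checks a sampled set of metrics against the documented defaults. The field names, the strict "greater than" comparison, and the helper functions are assumptions made for the example; only the default values (500, 100, 100, and 5 minutes) and the superuser exemption come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchdogThresholds:
    # Default values are from the list above; the field names are hypothetical.
    truncation_version_lag: int = 500
    gclx_queue_size: int = 100
    mergeout_queue_size: int = 100
    timeout_seconds: int = 5 * 60


def breached_metrics(sample, thresholds=WatchdogThresholds()):
    """Return the names of sampled metrics that exceed their threshold.

    Assumption: a metric is breached only when strictly greater than
    its threshold; the real comparison may differ.
    """
    limits = {
        "truncation_version_lag": thresholds.truncation_version_lag,
        "gclx_queue_size": thresholds.gclx_queue_size,
        "mergeout_queue_size": thresholds.mergeout_queue_size,
    }
    return [name for name, limit in limits.items() if sample.get(name, 0) > limit]


def should_block(is_superuser, sample, thresholds=WatchdogThresholds()):
    """Superuser commands are never blocked; others are blocked on any breach."""
    return (not is_superuser) and bool(breached_metrics(sample, thresholds))


def has_timed_out(blocked_seconds, thresholds=WatchdogThresholds()):
    """True once a blocked transaction has waited at least the timeout interval."""
    return blocked_seconds >= thresholds.timeout_seconds


sample = {"truncation_version_lag": 620, "gclx_queue_size": 40, "mergeout_queue_size": 100}
print(breached_metrics(sample))
print(should_block(False, sample), should_block(True, sample))
print(has_timed_out(299), has_timed_out(300))
```

Passing a different `WatchdogThresholds` instance mimics tuning a parameter, for example `WatchdogThresholds(truncation_version_lag=1000)` in place of the default of 500.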
You can check the status of the server using check_cluster_health and the health_watchdog_blocked_transactions system table.