Collecting database statistics

The Vertica cost-based query optimizer relies on data statistics to produce query plans.

The Vertica cost-based query optimizer relies on data statistics to produce query plans. If statistics are incomplete or out-of-date, the optimizer is liable to use a sub-optimal plan to execute a query.

When you query a table, the Vertica optimizer checks for statistics as follows:

  1. If the table is partitioned, the optimizer checks whether the partitions required by this query have recently been analyzed. If so, it retrieves those statistics and uses them to facilitate query planning.

  2. Otherwise, the optimizer uses table-level statistics, if available.

  3. If no valid partition- or table-level statistics are available, the optimizer assumes uniform distribution of data values and equal storage usage for all projections.

Statistics management functions

Vertica provides two functions that generate up-to-date statistics on table data: ANALYZE_STATISTICS and ANALYZE_STATISTICS_PARTITION collect table-level and partition-level statistics, respectively. After computing statistics, the functions store them in the database catalog.

Both functions perform the following operations:

  • Collect statistics using historical queries (at epoch latest) without any locks.

  • Perform fast data sampling, which expedites analysis of relatively small tables with a large number of columns.

  • Recognize deleted data instead of ignoring delete markers.

Vertica also provides several functions that help you management database statistics—for example, to export and import statistics, validate statistics, and drop statistics.

After you collect the desired statistics, you can run Workload Analyzer to retrieve hints about under-performing queries and their root causes, and obtain tuning recommendations.