Analyzing row counts

Vertica lets you obtain row counts for projections and for external tables, through ANALYZE_ROW_COUNT and ANALYZE_EXTERNAL_ROW_COUNT, respectively.

Vertica lets you obtain row counts for projections and for external tables, through ANALYZE_ROW_COUNT and ANALYZE_EXTERNAL_ROW_COUNT, respectively.

Projection row count

ANALYZE_ROW_COUNT is a lightweight operation that collects a minimal set of statistics and aggregate row counts for a projection, and saves it in the database catalog. In many cases, this data satisifes many optimizer requirements for producing optimal query plans. This operation is invoked on the following occasions:

  • At the time intervals specified by configuration parameter AnalyzeRowCountInterval—by default, once a day.

  • During loads. Vertica updates the catalog with the current aggregate row count data for a given table when the percentage of difference between the last-recorded aggregate projection row count and current row count exceeds the setting in configuration parameter ARCCommitPercentage.

  • On calls to meta-functions ANALYZE_STATISTICS and ANALYZE_STATISTICS_PARTITION.

You can explicitly invoke ANALYZE_ROW_COUNT through calls to DO_TM_TASK. For example:

=> SELECT DO_TM_TASK('analyze_row_count', 'store_orders_fact_b0');
                                              do_tm_task
------------------------------------------------------------------------------------------------------
 Task: row count analyze
(Table: public.store_orders_fact) (Projection: public.store_orders_fact_b0)

(1 row)

You can change the intervals when Vertica regularly collects row-level statistics by setting configuration parameter AnalyzeRowCountInterval. For example, you can change the collection interval to 1 hour (3600 seconds):

=> ALTER DATABASE DEFAULT SET AnalyzeRowCountInterval = 3600;
ALTER DATABASE

External table row count

ANALYZE_EXTERNAL_ROW_COUNT calculates the exact number of rows in an external table. The optimizer uses this count to optimize for queries that access external tables. This is especially useful when an external table participates in a join. This function enables the optimizer to identify the smaller table to use as the inner input to the join, and facilitate better query performance.

The following query calculates the exact number of rows in the external table loader_rejects:

=> SELECT ANALYZE_EXTERNAL_ROW_COUNT('loader_rejects');
 ANALYZE_EXTERNAL_ROW_COUNT
----------------------------
                 0