Getting data on table statistics
Vertica provides information about statistics for a given table and its columns and partitions in two ways:
-
The query optimizer notifies you about the availability of statistics to process a given query.
-
System table
PROJECTION_COLUMNS
shows what types of statistics are available for the table columns, and when they were last updated.
Query evaluation
During predicate selectivity estimation, the query optimizer can identify when histograms are not available or are out of date. If the value in the predicate is outside the histogram's maximum range, the statistics are stale. If no histograms are available, then no statistics are available to the plan.
When the optimizer detects stale or no statistics, such as when it encounters a column predicate for which it has no histogram, the optimizer performs the following actions:
-
Displays and logs a message that you should run
ANALYZE_STATISTICS
. -
Annotates
EXPLAIN
-generated query plans with a statistics entry. -
Ignores stale statistics when it generates a query plan. The optimizer uses other considerations to create a query plan, such as FK-PK constraints.
For example, the following query plan fragment shows no statistics (histograms unavailable):
| | +-- Outer -> STORAGE ACCESS for fact [Cost: 604, Rows: 10K (NO STATISTICS)]
The following query plan fragment shows that the predicate falls outside the histogram range:
| | +-- Outer -> STORAGE ACCESS for fact [Cost: 35, Rows: 1 (PREDICATE VALUE OUT-OF-RANGE)]
Statistics data in PROJECTION_COLUMNS
Two columns in system table
PROJECTION_COLUMNS
show the status of each table column's statistics, as follows:
-
STATISTICS_TYPE
returns the type of statistics that are available for this column, one of the following:NONE
,ROWCOUNT
, orFULL
. -
STATISTICS_UPDATED_TIMESTAMP
returns the last time statistics were collected for this column.
For example, the following sample schema defines a table named trades, which groups the highly-correlated columns bid
and ask
and stores the stock
column separately:
=> CREATE TABLE trades (stock CHAR(5), bid INT, ask INT);
=> CREATE PROJECTION trades_p (
stock ENCODING RLE, GROUPED(bid ENCODING DELTAVAL, ask))
AS (SELECT * FROM trades) ORDER BY stock, bid;
=> INSERT INTO trades VALUES('acme', 10, 20);
=> COMMIT;
Query the PROJECTION_COLUMNS
table for table trades
:
=> SELECT table_name AS table, projection_name AS projection, table_column_name AS column, statistics_type, statistics_updated_timestamp AS last_updated
FROM projection_columns WHERE table_name = 'trades';
table | projection | column | statistics_type | last_updated
--------+-------------+--------+-----------------+--------------
trades | trades_p_b0 | stock | NONE |
trades | trades_p_b0 | bid | NONE |
trades | trades_p_b0 | ask | NONE |
trades | trades_p_b1 | stock | NONE |
trades | trades_p_b1 | bid | NONE |
trades | trades_p_b1 | ask | NONE |
(6 rows)
The statistics_type
column returns NONE
for all columns in the trades
table, while statistics_updated_timestamp
is empty because statistics have not yet been collected on this table.
Now, run
ANALYZE_STATISTICS
on the stock
column:
=> SELECT ANALYZE_STATISTICS ('public.trades', 'stock');
ANALYZE_STATISTICS
--------------------
0
(1 row)
Now, when you query PROJECTION_COLUMNS
, it returns the following results:
=> SELECT table_name AS table, projection_name AS projection, table_column_name AS column, statistics_type, statistics_updated_timestamp AS last_updated
FROM projection_columns WHERE table_name = 'trades';
table | projection | column | statistics_type | last_updated
--------+-------------+--------+-----------------+-------------------------------
trades | trades_p_b0 | stock | FULL | 2019-04-03 12:00:12.231564-04
trades | trades_p_b0 | bid | ROWCOUNT | 2019-04-03 12:00:12.231564-04
trades | trades_p_b0 | ask | ROWCOUNT | 2019-04-03 12:00:12.231564-04
trades | trades_p_b1 | stock | FULL | 2019-04-03 12:00:12.231564-04
trades | trades_p_b1 | bid | ROWCOUNT | 2019-04-03 12:00:12.231564-04
trades | trades_p_b1 | ask | ROWCOUNT | 2019-04-03 12:00:12.231564-04
(6 rows)
This time, the query results contain several changes:
statistics_type |
|
statistics_updated_timestamp |
Set to the same timestamp for all columns, confirming that statistics (either full or row count) were updated on all. |