This section contains Vertica functions for collecting and managing table data statistics.
This is the multi-page printable view of this section. Click here to print.
Statistics management functions
- 1: ANALYZE_EXTERNAL_ROW_COUNT
- 2: ANALYZE_STATISTICS
- 3: ANALYZE_STATISTICS_PARTITION
- 4: DROP_EXTERNAL_ROW_COUNT
- 5: DROP_STATISTICS
- 6: DROP_STATISTICS_PARTITION
- 7: EXPORT_STATISTICS
- 8: EXPORT_STATISTICS_PARTITION
- 9: IMPORT_STATISTICS
- 10: VALIDATE_STATISTICS
1 - ANALYZE_EXTERNAL_ROW_COUNT
Calculates the exact number of rows in an external table. ANALYZE_EXTERNAL_ROW_COUNT
runs in the background.
Note
You cannot calculate row counts on external tables withDO_TM_TASK
.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
ANALYZE_EXTERNAL_ROW_COUNT ('[[[database.]schema.]table-name ]')
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table-name
- Specifies the name of the external table for which to calculate the exact row count. If you supply an empty string, Vertica calculate the exact number of rows for all external tables.
Privileges
Any INSERT/UPDATE/DELETE privilege on the external table
Examples
Calculate the exact row count for all external tables:
=> SELECT ANALYZE_EXTERNAL_ROW_COUNT('');
Calculate the exact row count for table loader_rejects
:
=> SELECT ANALYZE_EXTERNAL_ROW_COUNT('loader_rejects');
See also
2 - ANALYZE_STATISTICS
Collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. The function skips columns of complex data types. You can set the scope of the collection at several levels:
-
Database
-
Table
-
Table column
By default, Vertica analyzes multiple columns in a single-query execution plan, depending on resource limits. This multi-column analysis reduces plan execution latency and speeds up analysis of relatively small tables with many columns.
Vertica writes statistics to the database catalog. The query optimizer uses this collected data to create query plans. Without this data, the query optimizer assumes uniform distribution of data values and equal storage usage for all projections.
You can cancel statistics collection with CTRL+C or by calling INTERRUPT_STATEMENT.
ANALYZE_STATISTICS is an alias of the function ANALYZE_HISTOGRAM, which is no longer documented.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
ANALYZE_STATISTICS ('[[[database.]schema.]table]' [, 'column-list' [, percent ]] )
Returns
0—Success
If an error occurs, refer to
vertica.log
for details.
Arguments
[[
database
.]
schema
].
table
- Table on which to collect data. If set to an empty string, Vertica collects statistics for all database tables and their projections. The default schema is
public
. If you specify a database, it must be the current database. column-list
- Comma-delimited list of columns in
table
, typically predicate columns. Vertica narrows the scope of the data collection to the specified columns. Columns of complex types are not supported.If you alter a table to add a column and populate its contents with either default or other values, call ANALYZE_STATISTICS on this column to get the most current statistics.
percent
- Percentage of data to read from disk (not the amount to analyze), a float between 0 and 100. The default value is 10.
Analyzing a higher percentage takes proportionally longer to process, but produces a higher level of sampling accuracy.
Privileges
Non-superuser:
-
Schema: USAGE
-
Table: One of INSERT, DELETE, or UPDATE
Restrictions
-
Vertica supports ANALYZE_STATISTICS on local and global temporary tables. In both cases, you can obtain statistics only on tables that are created with the option ON COMMIT PRESERVE ROWS. Otherwise, Vertica deletes table content when committing the current transaction, so no table data is available for analysis.
-
Vertica collects no statistics from the following projections:
- Live aggregate and Top-K projections
- Projections that are defined to include a SQL function within an expression
-
Vertica collects no statistics on columns of ARRAY, SET, or ROW types.
Examples
See Collecting table statistics.
See also
ANALYZE_STATISTICS_PARTITION3 - ANALYZE_STATISTICS_PARTITION
Collects and aggregates data samples and storage information for a range of partitions in the specified table. Vertica writes the collected statistics to the database catalog.
You can cancel statistics collection with CTRL+C or meta-function INTERRUPT_STATEMENT.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
ANALYZE_STATISTICS_PARTITION ('[[database.]schema.]table', 'min-range-value','max-range-value' [, 'column-list' [, percent ]] )
Returns
0: Success
If an error occurs, refer to
vertica.log
for details.
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- Table on which to collect data.
min-range-value
max-range-value
- Minimum and maximum value of partition keys to analyze, where
min-range-value
must be ≤max-range-value
. To analyze one partition,min-range-value
andmax-range-value
must be equal. column-list
- Comma-delimited list of columns in
table
, typically a predicate column. Vertica narrows the scope of the data collection to the specified columns. percent
- Float value between 0 and 100 that specifies what percentage of data to read from disk (not the amount of data to analyze). If you omit this argument, Vertica sets the percentage to 10.
Analyzing more than 10 percent disk space takes proportionally longer to process, but produces a higher level of sampling accuracy.
Privileges
Non-superuser:
-
Schema: USAGE
-
Table: One of INSERT, DELETE, or UPDATE
Requirements and restrictions
The following requirements and restrictions apply to ANALYZE_STATISTICS_PARTITION:
-
The table must be partitioned and cannot contain unpartitioned data.
-
The table partition expression must specify a single column. The following expressions are supported:
-
Expressions that specify only the column—that is, partition on all column values. For example:
PARTITION BY ship_date GROUP BY CALENDAR_HIERARCHY_DAY(ship_date, 2, 2)
-
If the column is a DATE or TIMESTAMP/TIMESTAMPTZ, the partition expression can specify a supported date/time function that returns that column or any portion of it, such as month or year. For example, the following partition expression specifies to partition on the year portion of column
order_date
:PARTITION BY YEAR(order_date)
-
Expressions that perform addition or subtraction on the column. For example:
PARTITION BY YEAR(order_date) -1
-
-
The table partition expression cannot coerce the specified column to another data type.
-
Vertica collects no statistics from the following projections:
-
Live aggregate and Top-K projections
-
Projections that are defined to include an SQL function within an expression
-
Examples
4 - DROP_EXTERNAL_ROW_COUNT
Removes external table row count statistics compiled by
ANALYZE_EXTERNAL_ROW_COUNT
. DROP_EXTERNAL_ROW_COUNT
runs in the background.
Caution
Statistics can be time consuming to regenerate.This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
DROP_EXTERNAL_ROW_COUNT ('[[[database.]schema.]table-name ]');
Parameters
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table-name
- The external table for which to remove the exact row count. If you specify an empty string, Vertica drops the exact row count statistic for all external tables.
Privileges
-
INSERT/UPDATE/DELETE privilege on table
-
USAGE privilege on schema that contains the table
Examples
Drop row count statistics for external table loader_rejects
:
=> SELECT DROP_EXTERNAL_ROW_COUNT('loader_rejects');
See also
Collecting database statistics5 - DROP_STATISTICS
Removes statistical data on database projections previously generated by
ANALYZE_STATISTICS
. When you drop this data, the Vertica optimizer creates query plans using default statistics.
Caution
Regenerating statistics can incur significant overhead.This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
DROP_STATISTICS ('[[[database.]schema.]table]' [, 'category' [, '[column-list]'] )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- Table on which to drop statistics. If set to an empty string, Vertica drops statistics for all database tables and their projections.
category
- Category of statistics to drop, one of the following:
-
ALL
(default): Drop all statistics, including histograms and row counts. -
HISTOGRAMS
: Drop only histograms. Row count statistics remain.
-
column-list
- Comma-delimited list of columns in
table
, typically predicate columns. Vertica narrows the scope of dropped statistics to the specified columns. If you omit this parameter or supply an empty string, Vertica drops statistics on all columns.
Privileges
Non-superuser:
-
Schema: USAGE
-
Table: One of INSERT, DELETE, or UPDATE
Examples
Drop all base statistics for the table store.store_sales_fact
:
=> SELECT DROP_STATISTICS('store.store_sales_fact');
DROP_STATISTICS
-----------------
0
(1 row)
Drop statistics for all table projections:
=> SELECT DROP_STATISTICS ('');
DROP_STATISTICS
-----------------
0
(1 row)
See also
DROP_STATISTICS_PARTITION
6 - DROP_STATISTICS_PARTITION
Removes statistical data on database projections previously generated by
ANALYZE_STATISTICS_PARTITION
. When you drop this data, the Vertica optimizer creates query plans using table-level statistics, if available, or default statistics.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
DROP_STATISTICS_PARTITION ('[[database.]schema.]table', '[min-range-value]', '[max-range-value]' [, category [, '[column-list]'] )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- Table on which to drop statistics.
-
min-range-value max-range-value
- The minimum and maximum value of partition keys on which to drop statistics, where
min-range-value
must be ≤max-range-value
. If you supply empty strings for both parameters, Vertica drops all partition-level statistics for this table or the specified columns.Important
The range of keys to drop must be equal to, or a superset of, the full range of partitions previously analyzed byANALYZE_STATISTICS_PARTITION
. If the range omits any analyzed partition,DROP_STATISTICS_PARTITION
drops no statistics. category
- The category of statistics to drop, one of the following:
-
BASE
(default): Drop histograms and row counts (min/max column values, histogram). -
HISTOGRAMS
: Drop only histograms. Row count statistics remain. -
ALL
: Drop all statistics.
-
column-list
- A comma-delimited list of columns in
table
, typically predicate columns. Vertica narrows the scope of dropped statistics to the specified columns. If you omit this parameter or supply an empty string, Vertica drops statistics on all columns.
Privileges
Non-superuser:
-
Schema: USAGE
-
Table: One of INSERT, DELETE, or UPDATE
See also
DROP_STATISTICS
7 - EXPORT_STATISTICS
Generates statistics in XML format from data previously collected by ANALYZE_STATISTICS. Before you export statistics, collect the latest data by calling ANALYZE_STATISTICS.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
StableSyntax
EXPORT_STATISTICS ('[ filename ]' [,'table-spec' [,'column[,...]']])
Arguments
filename
- Specifies where to write the generated XML. If
filename
already exists, EXPORT_STATISTICS overwrites it. If you supply an empty string, EXPORT_STATISTICS writes the XML to standard output. table-spec
- Specifies the table on which to export projection statistics:
[[database.]schema.]table
The default schema is
public
. If you specify a database, it must be the current database.If
table-spec
is omitted or set to an empty string, Vertica exports all statistics for the database. column
- The name of a column in
table-spec
, typically a predicate column. You can specify multiple comma-delimited columns. Vertica narrows the scope of exported statistics to the specified columns.
Privileges
Superuser
Restrictions
EXPORT_STATISTICS does not export statistics for LONG data type columns.
Examples
The following statement exports statistics on the VMart example database to a file:
=> SELECT EXPORT_STATISTICS('/opt/vertica/examples/VMart_Schema/vmart_stats.xml');
EXPORT_STATISTICS
-----------------------------------
Statistics exported successfully
(1 row)
The next statement exports statistics on a single column (price) from a table named food:
=> SELECT EXPORT_STATISTICS('/opt/vertica/examples/VMart_Schema/price.xml', 'food.price');
EXPORT_STATISTICS
-----------------------------------
Statistics exported successfully
(1 row)
See also
8 - EXPORT_STATISTICS_PARTITION
Generates partition-level statistics in XML format from data previously collected by ANALYZE_STATISTICS_PARTITION.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
StableSyntax
EXPORT_STATISTICS_PARTITION ('[ filename ]', 'table-spec', 'min-range-value','max-range-value' [, 'column[,...]' )
Arguments
filename
- Specifies where to write the generated XML. If
filename
already exists, EXPORT_STATISTICS_PARTITION overwrites it. If you supply an empty string, the function writes to standard output. table-spec
- Specifies the table on which to export partition statistics:
[[database.]schema.]table
The default schema is
public
. If you specify a database, it must be the current database. min-range-value
,max-range-value
- The minimum and maximum value of partition keys on which to export statistics, where
min-range-value
must be ≤max-range-value
.Important
The range of keys to export must be equal to, or a superset of, the full range of partitions previously analyzed by ANALYZE_STATISTICS_PARTITION. If the range omits any analyzed partition, EXPORT_STATISTICS_PARTITION exports no statistics. column
- The name of a column in
table
, typically a predicate column. You can specify multiple comma-delimited columns. Vertica narrows the scope of exported statistics to the specified columns.
Privileges
Superuser
Restrictions
EXPORT_STATISTICS_PARTITION does not export statistics for LONG data type columns.
See also
EXPORT_STATISTICS9 - IMPORT_STATISTICS
Imports statistics from the XML file that was generated by
EXPORT_STATISTICS
. Imported statistics override existing statistics for the projections that are referenced in the XML file.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
StableSyntax
IMPORT_STATISTICS ( 'filename' )
Parameters
filename
- The path and name of an XML input file that was generated by
EXPORT_STATISTICS
.
Privileges
Superuser
Restrictions
-
IMPORT_STATISTICS
imports only valid statistics. If the source XML file has invalid statistics for a specific column, those statistics are not imported and Vertica throws a warning. If the statistics file has an invalid structure, the import operation fails. To check a statistics file for validity, runVALIDATE_STATISTICS
. -
IMPORT_STATISTICS
returns warnings for LONG data type columns, as the source XML file generated byEXPORT_STATISTICS
contains no statistics for columns of that type.
Examples
Import the statistics for the VMart database from an XML file previously created by EXPORT_STATISTICS
:
=> SELECT IMPORT_STATISTICS('/opt/vertica/examples/VMart_Schema/vmart_stats.xml');
IMPORT_STATISTICS
----------------------------------------------------------------------------
Importing statistics for projection date_dimension_super column date_key failure (stats did not contain row counts)
Importing statistics for projection date_dimension_super column date failure (stats did not contain row counts)
Importing statistics for projection date_dimension_super column full_date_description failure (stats did not contain row counts)
...
(1 row)
See also
10 - VALIDATE_STATISTICS
Validates statistics in the XML file generated by
EXPORT_STATISTICS
.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
StableSyntax
VALIDATE_STATISTICS ( 'XML-file' )
Parameters
XML-file
- the path and name of the XML file that contains the statistics to validate.
Privileges
Superuser
Reporting valid statistics
The following example shows the results when the statistics are valid:
=> SELECT EXPORT_STATISTICS('cust_dim_stats.xml','customer_dimension');
EXPORT_STATISTICS
-----------------------------------
Statistics exported successfully
(1 row)
=> SELECT VALIDATE_STATISTICS('cust_dim_stats.xml');
VALIDATE_STATISTICS
---------------------
(1 row)
Identifying invalid statistics
If VALIDATE_STATISTICS
is unable to read a document's XML, it throws this error:
=> SELECT VALIDATE_STATISTICS('/home/dbadmin/stats.xml');
VALIDATE_STATISTICS
----------------------------------------------------------------------------
Error validating statistics file: At line 1:1. Invalid document structure
(1 row)
If some table statistics are invalid, VALIDATE_STATISTICS
returns a report that identifies them. In the following example, the function reports that attributes distinct
, buckets
, rows
, count
, and distinctCount
cannot be negative numbers.
=> SELECT VALIDATE_STATISTICS('/stats.xml');
WARNING 0: Invalid value '-1' for attribute 'distinct' under column 'public.t.x'.
Please use a positive value.
WARNING 0: Invalid value '-1' for attribute 'buckets' under column 'public.t.x'.
Please use a positive value.
WARNING 0: Invalid value '-1' for attribute 'rows' under column 'public.t.x'.
Please use a positive value.
WARNING 0: Invalid value '-1' for attribute 'count' under bound '1', column 'public.t.x'.
Please use a positive value.
WARNING 0: Invalid value '-1' for attribute 'distinctCount' under bound '1', column 'public.t.x'.
Please use a positive value.
VALIDATE_STATISTICS
---------------------
(1 row)
In this case, run
ANALYZE_STATISTICS
on the table again to create valid statistics.