This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Statistics management functions

This section contains Vertica functions for collecting and managing table data statistics.

This section contains Vertica functions for collecting and managing table data statistics.

1 - ANALYZE_EXTERNAL_ROW_COUNT

Calculates the exact number of rows in an external table.

Calculates the exact number of rows in an external table. ANALYZE_EXTERNAL_ROW_COUNT runs in the background.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

ANALYZE_EXTERNAL_ROW_COUNT ('[[[database.]schema.]table-name ]')

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table-name
Specifies the name of the external table for which to calculate the exact row count. If you supply an empty string, Vertica calculate the exact number of rows for all external tables.

Privileges

Any INSERT/UPDATE/DELETE privilege on the external table

Examples

Calculate the exact row count for all external tables:

=> SELECT ANALYZE_EXTERNAL_ROW_COUNT('');

Calculate the exact row count for table loader_rejects:

=> SELECT ANALYZE_EXTERNAL_ROW_COUNT('loader_rejects');

See also

2 - ANALYZE_STATISTICS

Collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table.

Collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. The function skips columns of complex data types. You can set the scope of the collection at several levels:

  • Database

  • Table

  • Table column

By default, Vertica analyzes multiple columns in a single-query execution plan, depending on resource limits. This multi-column analysis reduces plan execution latency and speeds up analysis of relatively small tables with many columns.

Vertica writes statistics to the database catalog. The query optimizer uses this collected data to create query plans. Without this data, the query optimizer assumes uniform distribution of data values and equal storage usage for all projections.

You can cancel statistics collection with CTRL+C or by calling INTERRUPT_STATEMENT.

ANALYZE_STATISTICS is an alias of the function ANALYZE_HISTOGRAM, which is no longer documented.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

ANALYZE_STATISTICS ('[[[database.]schema.]table]' [, 'column-list' [, percent ]]  )

Returns

0—Success

If an error occurs, refer to vertica.log for details.

Arguments

[[database.]schema].table
Table on which to collect data. If set to an empty string, Vertica collects statistics for all database tables and their projections. The default schema is public. If you specify a database, it must be the current database.
column-list
Comma-delimited list of columns in table, typically predicate columns. Vertica narrows the scope of the data collection to the specified columns. Columns of complex types are not supported.

If you alter a table to add a column and populate its contents with either default or other values, call ANALYZE_STATISTICS on this column to get the most current statistics.

percent
Percentage of data to read from disk (not the amount to analyze), a float between 0 and 100. The default value is 10.

Analyzing a higher percentage takes proportionally longer to process, but produces a higher level of sampling accuracy.

Privileges

Non-superuser:

  • Schema: USAGE

  • Table: One of INSERT, DELETE, or UPDATE

Restrictions

  • Vertica supports ANALYZE_STATISTICS on local and global temporary tables. In both cases, you can obtain statistics only on tables that are created with the option ON COMMIT PRESERVE ROWS. Otherwise, Vertica deletes table content when committing the current transaction, so no table data is available for analysis.

  • Vertica collects no statistics from the following projections:

    • Live aggregate and Top-K projections
    • Projections that are defined to include a SQL function within an expression
  • Vertica collects no statistics on columns of ARRAY, SET, or ROW types.

Examples

See Collecting table statistics.

See also

ANALYZE_STATISTICS_PARTITION

3 - ANALYZE_STATISTICS_PARTITION

Collects and aggregates data samples and storage information for a range of partitions in the specified table.

Collects and aggregates data samples and storage information for a range of partitions in the specified table. Vertica writes the collected statistics to the database catalog.

You can cancel statistics collection with CTRL+C or meta-function INTERRUPT_STATEMENT.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

ANALYZE_STATISTICS_PARTITION ('[[database.]schema.]table', 'min-range-value','max-range-value' [, 'column-list' [, percent ]] )

Returns

0: Success

If an error occurs, refer to vertica.log for details.

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
Table on which to collect data.
min-range-value
max-range-value
Minimum and maximum value of partition keys to analyze, where min-range-value must be ≤ max-range-value. To analyze one partition, min-range-value and max-range-value must be equal.
column-list
Comma-delimited list of columns in table, typically a predicate column. Vertica narrows the scope of the data collection to the specified columns.
percent
Float value between 0 and 100 that specifies what percentage of data to read from disk (not the amount of data to analyze). If you omit this argument, Vertica sets the percentage to 10.

Analyzing more than 10 percent disk space takes proportionally longer to process, but produces a higher level of sampling accuracy.

Privileges

Non-superuser:

  • Schema: USAGE

  • Table: One of INSERT, DELETE, or UPDATE

Requirements and restrictions

The following requirements and restrictions apply to ANALYZE_STATISTICS_PARTITION:

  • The table must be partitioned and cannot contain unpartitioned data.

  • The table partition expression must specify a single column. The following expressions are supported:

    • Expressions that specify only the column—that is, partition on all column values. For example:

      PARTITION BY ship_date GROUP BY CALENDAR_HIERARCHY_DAY(ship_date, 2, 2)
      
    • If the column is a DATE or TIMESTAMP/TIMESTAMPTZ, the partition expression can specify a supported date/time function that returns that column or any portion of it, such as month or year. For example, the following partition expression specifies to partition on the year portion of column order_date:

      PARTITION BY YEAR(order_date)
      
    • Expressions that perform addition or subtraction on the column. For example:

      PARTITION BY YEAR(order_date) -1
      
  • The table partition expression cannot coerce the specified column to another data type.

  • Vertica collects no statistics from the following projections:

    • Live aggregate and Top-K projections

    • Projections that are defined to include an SQL function within an expression

Examples

See Collecting partition statistics.

4 - DROP_EXTERNAL_ROW_COUNT

Removes external table row count statistics compiled by ANALYZE_EXTERNAL_ROW_COUNT.

Removes external table row count statistics compiled by ANALYZE_EXTERNAL_ROW_COUNT. DROP_EXTERNAL_ROW_COUNT runs in the background.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

DROP_EXTERNAL_ROW_COUNT ('[[[database.]schema.]table-name ]');

Parameters

schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table-name
The external table for which to remove the exact row count. If you specify an empty string, Vertica drops the exact row count statistic for all external tables.

Privileges

  • INSERT/UPDATE/DELETE privilege on table

  • USAGE privilege on schema that contains the table

Examples

Drop row count statistics for external table loader_rejects:

=> SELECT DROP_EXTERNAL_ROW_COUNT('loader_rejects');

See also

Collecting database statistics

5 - DROP_STATISTICS

Removes statistical data on database projections previously generated by ANALYZE_STATISTICS.

Removes statistical data on database projections previously generated by ANALYZE_STATISTICS. When you drop this data, the Vertica optimizer creates query plans using default statistics.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

DROP_STATISTICS ('[[[database.]schema.]table]' [, 'category' [, '[column-list]'] )

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
Table on which to drop statistics. If set to an empty string, Vertica drops statistics for all database tables and their projections.
category
Category of statistics to drop, one of the following:
  • ALL (default): Drop all statistics, including histograms and row counts.

  • HISTOGRAMS: Drop only histograms. Row count statistics remain.

column-list
Comma-delimited list of columns in table, typically predicate columns. Vertica narrows the scope of dropped statistics to the specified columns. If you omit this parameter or supply an empty string, Vertica drops statistics on all columns.

Privileges

Non-superuser:

  • Schema: USAGE

  • Table: One of INSERT, DELETE, or UPDATE

Examples

Drop all base statistics for the table store.store_sales_fact:

=> SELECT DROP_STATISTICS('store.store_sales_fact');
 DROP_STATISTICS
-----------------
               0
(1 row)

Drop statistics for all table projections:

=> SELECT DROP_STATISTICS ('');
 DROP_STATISTICS
-----------------
               0
(1 row)

See also

DROP_STATISTICS_PARTITION

6 - DROP_STATISTICS_PARTITION

Removes statistical data on database projections previously generated by ANALYZE_STATISTICS_PARTITION.

Removes statistical data on database projections previously generated by ANALYZE_STATISTICS_PARTITION. When you drop this data, the Vertica optimizer creates query plans using table-level statistics, if available, or default statistics.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

DROP_STATISTICS_PARTITION ('[[database.]schema.]table', '[min-range-value]', '[max-range-value]' [, category [, '[column-list]'] )

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
Table on which to drop statistics.
min-range-value max-range-value
The minimum and maximum value of partition keys on which to drop statistics, where min-range-value must be ≤ max-range-value. If you supply empty strings for both parameters, Vertica drops all partition-level statistics for this table or the specified columns.
category
The category of statistics to drop, one of the following:
  • BASE (default): Drop histograms and row counts (min/max column values, histogram).

  • HISTOGRAMS: Drop only histograms. Row count statistics remain.

  • ALL: Drop all statistics.

column-list
A comma-delimited list of columns in table, typically predicate columns. Vertica narrows the scope of dropped statistics to the specified columns. If you omit this parameter or supply an empty string, Vertica drops statistics on all columns.

Privileges

Non-superuser:

  • Schema: USAGE

  • Table: One of INSERT, DELETE, or UPDATE

See also

DROP_STATISTICS

7 - EXPORT_STATISTICS

Generates statistics in XML format from data previously collected by ANALYZE_STATISTICS.

Generates statistics in XML format from data previously collected by ANALYZE_STATISTICS. Before you export statistics, collect the latest data by calling ANALYZE_STATISTICS.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Stable

Syntax

EXPORT_STATISTICS ('[ filename ]' [,'table-spec' [,'column[,...]']])

Arguments

filename
Specifies where to write the generated XML. If filename already exists, EXPORT_STATISTICS overwrites it. If you supply an empty string, EXPORT_STATISTICS writes the XML to standard output.
table-spec
Specifies the table on which to export projection statistics:
  
[[database.]schema.]table
  

The default schema is public. If you specify a database, it must be the current database.

If table-spec is omitted or set to an empty string, Vertica exports all statistics for the database.

column
The name of a column in table-spec, typically a predicate column. You can specify multiple comma-delimited columns. Vertica narrows the scope of exported statistics to the specified columns.

Privileges

Superuser

Restrictions

EXPORT_STATISTICS does not export statistics for LONG data type columns.

Examples

The following statement exports statistics on the VMart example database to a file:

=> SELECT EXPORT_STATISTICS('/opt/vertica/examples/VMart_Schema/vmart_stats.xml');
        EXPORT_STATISTICS
-----------------------------------
Statistics exported successfully
(1 row)

The next statement exports statistics on a single column (price) from a table named food:

=> SELECT EXPORT_STATISTICS('/opt/vertica/examples/VMart_Schema/price.xml', 'food.price');
        EXPORT_STATISTICS
-----------------------------------
Statistics exported successfully
(1 row)

See also

8 - EXPORT_STATISTICS_PARTITION

Generates partition-level statistics in XML format from data previously collected by ANALYZE_STATISTICS_PARTITION.

Generates partition-level statistics in XML format from data previously collected by ANALYZE_STATISTICS_PARTITION.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Stable

Syntax

EXPORT_STATISTICS_PARTITION ('[ filename ]', 'table-spec', 'min-range-value','max-range-value' [, 'column[,...]' )

Arguments

filename
Specifies where to write the generated XML. If filename already exists, EXPORT_STATISTICS_PARTITION overwrites it. If you supply an empty string, the function writes to standard output.
table-spec
Specifies the table on which to export partition statistics:
  
[[database.]schema.]table
  

The default schema is public. If you specify a database, it must be the current database.

min-range-value, max-range-value
The minimum and maximum value of partition keys on which to export statistics, where min-range-value must be ≤ max-range-value.
column
The name of a column in table, typically a predicate column. You can specify multiple comma-delimited columns. Vertica narrows the scope of exported statistics to the specified columns.

Privileges

Superuser

Restrictions

EXPORT_STATISTICS_PARTITION does not export statistics for LONG data type columns.

See also

EXPORT_STATISTICS

9 - IMPORT_STATISTICS

Imports statistics from the XML file that was generated by EXPORT_STATISTICS.

Imports statistics from the XML file that was generated by EXPORT_STATISTICS. Imported statistics override existing statistics for the projections that are referenced in the XML file.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Stable

Syntax

IMPORT_STATISTICS ( 'filename' )

Parameters

filename
The path and name of an XML input file that was generated by EXPORT_STATISTICS.

Privileges

Superuser

Restrictions

  • IMPORT_STATISTICS imports only valid statistics. If the source XML file has invalid statistics for a specific column, those statistics are not imported and Vertica throws a warning. If the statistics file has an invalid structure, the import operation fails. To check a statistics file for validity, run VALIDATE_STATISTICS.

  • IMPORT_STATISTICS returns warnings for LONG data type columns, as the source XML file generated by EXPORT_STATISTICS contains no statistics for columns of that type.

Examples

Import the statistics for the VMart database from an XML file previously created by EXPORT_STATISTICS:

=> SELECT IMPORT_STATISTICS('/opt/vertica/examples/VMart_Schema/vmart_stats.xml');
                     IMPORT_STATISTICS
----------------------------------------------------------------------------
Importing statistics for projection date_dimension_super column date_key failure (stats did not contain row counts)
Importing statistics for projection date_dimension_super column date failure (stats did not contain row counts)
Importing statistics for projection date_dimension_super column full_date_description failure (stats did not contain row counts)
...
(1 row)

See also

10 - VALIDATE_STATISTICS

Validates statistics in the XML file generated by EXPORT_STATISTICS.

Validates statistics in the XML file generated by EXPORT_STATISTICS.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Stable

Syntax

VALIDATE_STATISTICS ( 'XML-file' )

Parameters

XML-file
the path and name of the XML file that contains the statistics to validate.

Privileges

Superuser

Reporting valid statistics

The following example shows the results when the statistics are valid:

=> SELECT EXPORT_STATISTICS('cust_dim_stats.xml','customer_dimension');
    EXPORT_STATISTICS
-----------------------------------
 Statistics exported successfully
(1 row)

=> SELECT VALIDATE_STATISTICS('cust_dim_stats.xml');
 VALIDATE_STATISTICS
---------------------
(1 row)

Identifying invalid statistics

If VALIDATE_STATISTICS is unable to read a document's XML, it throws this error:

=> SELECT VALIDATE_STATISTICS('/home/dbadmin/stats.xml');
                       VALIDATE_STATISTICS
----------------------------------------------------------------------------
Error validating statistics file: At line 1:1. Invalid document structure
(1 row)

If some table statistics are invalid, VALIDATE_STATISTICS returns a report that identifies them. In the following example, the function reports that attributes distinct, buckets, rows, count, and distinctCount cannot be negative numbers.

=> SELECT VALIDATE_STATISTICS('/stats.xml');
WARNING 0:  Invalid value '-1' for attribute 'distinct' under column 'public.t.x'.
   Please use a positive value.
WARNING 0:  Invalid value '-1' for attribute 'buckets' under column 'public.t.x'.
   Please use a positive value.
WARNING 0:  Invalid value '-1' for attribute 'rows' under column 'public.t.x'.
   Please use a positive value.
WARNING 0:  Invalid value '-1' for attribute 'count' under bound '1', column 'public.t.x'.
   Please use a positive value.
WARNING 0:  Invalid value '-1' for attribute 'distinctCount' under bound '1', column 'public.t.x'.
   Please use a positive value.
 VALIDATE_STATISTICS
---------------------
 (1 row)

In this case, run ANALYZE_STATISTICS on the table again to create valid statistics.

See also