This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Projection functions

This section contains projection management functions specific to Vertica.

This section contains projection management functions specific to Vertica.

See also

1 - CLEAR_PROJECTION_REFRESHES

Clears information projection refresh history from system table PROJECTION_REFRESHES.

Clears projection refresh history from the PROJECTION_REFRESHES system table. PROJECTION_REFRESHES records information about successful and unsuccessful refresh operations.

CLEAR_PROJECTION_REFRESHES removes information only for refresh operations that are complete, as indicated by the IS_EXECUTING column in PROJECTION_REFRESHES.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

CLEAR_PROJECTION_REFRESHES()

Privileges

Superuser

Examples

=> SELECT CLEAR_PROJECTION_REFRESHES();
 CLEAR_PROJECTION_REFRESHES
----------------------------
 CLEAR
(1 row)

See also

2 - EVALUATE_DELETE_PERFORMANCE

Evaluates projections for potential DELETE and UPDATE performance issues.

Evaluates projections for potential DELETE and UPDATE performance issues. If Vertica finds any issues, it issues a warning message. When evaluating multiple projections, EVALUATE_DELETE_PERFORMANCE returns up to ten projections with issues, and the name of a table that lists all issues that it found.

For information on resolving delete and update performance issues, see Optimizing DELETE and UPDATE.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

EVALUATE_DELETE_PERFORMANCE ( ['[[database.]schema.]scope'] )

Parameters

`[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

scope
Specifies the projections to evaluate, one of the following:
  • [table.]projection
    Evaluate projection. For example:

    SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact.store_orders_fact_b1');
    
  • table
    Specifies to evaluate all projections of table. For example:

    SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact');
    

If you supply no arguments, EVALUATE_DELETE_PERFORMANCE evaluates all projections that you can access. Depending on the size of your database, this can incur considerable overhead.

Privileges

Non-superuser: SELECT privilege on the anchor table

Examples

EVALUATE_DELETE_PERFORMANCE evaluates all projections of table example for potential DELETE and UPDATE performance issues.

=> create table example (A int, B int,C int);
CREATE TABLE
=> create projection one_sort (A,B,C) as (select A,B,C from example) order by A;
CREATE PROJECTION
=> create projection two_sort (A,B,C) as (select A,B,C from example) order by A,B;
CREATE PROJECTION
=> select evaluate_delete_performance('example');
            evaluate_delete_performance
---------------------------------------------------
 No projection delete performance concerns found.
(1 row)

The previous example show that the two projections one_sort and two_sort have no inherent structural issues that might cause poor DELETE performance. However, the data contained within the projection can create potential delete issues if the sorted columns do not uniquely identify a row or small number of rows.

In the following example, Perl is used to populate the table with data using a nested series of loops:

  • The inner loop populates column C.

  • The middle loop populates column B.

  • The outer loop populates column A.

The result is column A contains only three distinct values (0, 1, and 2), while column B slowly varies between 20 and 0 and column C changes in each row:

=> \! perl -e 'for ($i=0; $i<3; $i++) { for ($j=0; $j<21; $j++) { for ($k=0; $k<19; $k++) { printf "%d,%d,%d\n", $i,$j,$k;}}}' | /opt/vertica/bin/vsql -c "copy example from stdin delimiter ',' direct;"
Password:
=> select * from example;
 A | B  | C
---+----+----
 0 | 20 | 18
 0 | 20 | 17
 0 | 20 | 16
 0 | 20 | 15
 0 | 20 | 14
 0 | 20 | 13
 0 | 20 | 12
 0 | 20 | 11
 0 | 20 | 10
 0 | 20 |  9
 0 | 20 |  8
 0 | 20 |  7
 0 | 20 |  6
 0 | 20 |  5
 0 | 20 |  4
 0 | 20 |  3
 0 | 20 |  2
 0 | 20 |  1
 0 | 20 |  0
 0 | 19 | 18
 ...
 2 |  1 |  0
 2 |  0 | 18
 2 |  0 | 17
 2 |  0 | 16
 2 |  0 | 15
 2 |  0 | 14
 2 |  0 | 13
 2 |  0 | 12
 2 |  0 | 11
 2 |  0 | 10
 2 |  0 |  9
 2 |  0 |  8
 2 |  0 |  7
 2 |  0 |  6
 2 |  0 |  5
 2 |  0 |  4
 2 |  0 |  3
 2 |  0 |  2
 2 |  0 |  1
 2 |  0 |  0
=> SELECT COUNT (*) FROM example;
 COUNT
-------
  1197
(1 row)
=> SELECT COUNT (DISTINCT A) FROM example;
 COUNT
-------
     3
(1 row)

EVALUATE_DELETE_PERFORMANCE is run against the projections again to determine whether the data within the projections causes any potential DELETE performance issues. Projection one_sort has potential delete issues as it only sorts on column A which has few distinct values. Each value in the sort column corresponds to many rows in the projection, which can adversely impact DELETE performance. In contrast, projection two_sort is sorted on columns A and B, where each combination of values in the two sort columns identifies just a few rows, so deletes can be performed faster:


=> select evaluate_delete_performance('example');
            evaluate_delete_performance
---------------------------------------------------
 The following projections exhibit delete performance concerns:
        "public"."one_sort_b1"
        "public"."one_sort_b0"
See v_catalog.projection_delete_concerns for more details.

=> \x
Expanded display is on.
dbadmin=> select * from projection_delete_concerns;
-[ RECORD 1 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id      | 45035996273878562
projection_schema  | public
projection_name    | one_sort_b1
creation_time      | 2019-06-17 13:59:03.777085-04
last_modified_time | 2019-06-17 14:00:27.702223-04
comment            | The squared number of rows matching each sort key is about 159201 on average.
-[ RECORD 2 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id      | 45035996273878548
projection_schema  | public
projection_name    | one_sort_b0
creation_time      | 2019-06-17 13:59:03.777279-04
last_modified_time | 2019-06-17 13:59:03.777279-04
comment            | The squared number of rows matching each sort key is about 159201 on average.

If you omit supplying an argument to EVALUATE_DELETE_PERFORMANCE, it evaluates all projections that you can access:

=> select evaluate_delete_performance();
                          evaluate_delete_performance
---------------------------------------------------------------------------
 The following projections exhibit delete performance concerns:
        "public"."one_sort_b0"
        "public"."one_sort_b1"
See v_catalog.projection_delete_concerns for more details.
(1 row)

3 - GET_PROJECTION_SORT_ORDER

Returns the order of columns in a projection's ORDER BY clause.

Returns the order of columns in a projection's ORDER BY clause.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_PROJECTION_SORT_ORDER( '[[database.]schema.]projection' );

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

projection
The target projection.

Privileges

Non-superuser: SELECT privilege on the anchor table

Examples

=> SELECT get_projection_sort_order ('store_orders_super');
                                 get_projection_sort_order
--------------------------------------------------------------------------------------------
 public.store_orders_super [Sort Cols: "order_no", "order_date", "shipper", "ship_date"]

(1 row)

4 - GET_PROJECTION_STATUS

Returns information relevant to the status of a :.

Returns information relevant to the status of a projection:

  • The current K-safety status of the database

  • The number of nodes in the database

  • Whether the projection is segmented

  • The number and names of buddy projections

  • Whether the projection is safe

  • Whether the projection is up to date

  • Whether statistics have been computed for the projection

Use GET_PROJECTION_STATUS to monitor the progress of a projection data refresh.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_PROJECTION_STATUS ( '[[database.]schema.]projection' );

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

projection
The projection for which to display status.

Examples

=> SELECT GET_PROJECTION_STATUS('public.customer_dimension_site01');
                                     GET_PROJECTION_STATUS
-----------------------------------------------------------------------------------------------
 Current system K is 1.
# of Nodes: 4.
public.customer_dimension_site01 [Segmented: No] [Seg Cols: ] [K: 3] [public.customer_dimension_site04, public.customer_dimension_site03,
public.customer_dimension_site02]
[Safe: Yes] [UptoDate: Yes][Stats: Yes]

5 - GET_PROJECTIONS

Returns contextual and projection information about projections of the specified anchor table.

Returns contextual and projection information about projections of the specified anchor table.

Contextual information
  • Database K-safety

  • Number of database nodes

  • Number of projections for this table

Projection data
For each projection, specifies:
  • All buddy projections

  • Whether it is segmented

  • Whether it is safe

  • Whether it is up-to-date.

You can also use GET_PROJECTIONS to monitor the progress of a projection data refresh.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_PROJECTIONS ( '[[database.]schema-name.]table' )

Parameters

`[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
Anchor table of the projections to list.

Privileges

None

Examples

The following example gets information about projections for VMart table store.store_dimension:

=> SELECT GET_PROJECTIONS('store.store_dimension');
-[ RECORD 1 ]---+
GET_PROJECTIONS | Current system K is 1.
# of Nodes: 3.
Table store.store_dimension has 2 projections.

Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
store.store_dimension_b1 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
store.store_dimension_b0 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]

6 - PURGE_PROJECTION

PURGE_PROJECTION can use significant disk space while purging the data.

Permanently removes deleted data from physical storage so disk space can be reused. You can purge historical data up to and including the Ancient History Mark epoch.

See PURGE for details about purge operations.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

PURGE_PROJECTION ( '[[database.]schema.]projection' )

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

projection
The projection to purge.

Privileges

  • Table owner

  • USAGE privilege on schema

Examples

The following example purges all historical data in projection tbl_p that precedes the Ancient History Mark epoch.

=> CREATE TABLE tbl (x int, y int);
CREATE TABLE
=> INSERT INTO tbl VALUES(1,2);
 OUTPUT
--------
      1
(1 row)

=> INSERT INTO tbl VALUES(3,4);
 OUTPUT
--------
      1
(1 row)

dbadmin=> COMMIT;
COMMIT
=> CREATE PROJECTION tbl_p AS SELECT x FROM tbl UNSEGMENTED ALL NODES;
WARNING 4468: Projection <public.tbl_p> is not available for query processing.
Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT START_REFRESH();
             START_REFRESH
----------------------------------------
 Starting refresh background process.
=> DELETE FROM tbl WHERE x=1;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT MAKE_AHM_NOW();
         MAKE_AHM_NOW
-------------------------------
 AHM set (New AHM Epoch: 9066)
(1 row)

=> SELECT PURGE_PROJECTION ('tbl_p');
 PURGE_PROJECTION
-------------------
 Projection purged
(1 row)

See also

7 - REFRESH

Synchronously refreshes one or more table projections in the foreground, and updates the PROJECTION_REFRESHES system table.

Synchronously refreshes one or more table projections in the foreground, and updates the PROJECTION_REFRESHES system table. If you run REFRESH with no arguments, it refreshes all projections that contain stale data.

To understand projection refreshing in detail, see Refreshing projections.

If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

REFRESH ( [ '[[database.]schema.]table[,...]' ] )

Parameters

[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

table
The anchor table of the projections to refresh. If you specify multiple tables, REFRESH attempts to refresh them in parallel. Such calls are part of the Database Designer deployment (and deployment script).

Returns

Column Returns
Projection Name The projection targeted for refresh.
Anchor Table The projection's associated anchor table.
Status

Projections' refresh status:

  • queued: Queued for refresh.

  • refreshing: Refresh is in process.

  • refreshed: Refresh successfully completed.

  • failed: Refresh did not successfully complete.

Refresh Method Method used to refresh the projection.
Error Count Number of times a refresh failed for the projection.
Duration (sec) How long (in seconds) the projection refresh ran.

Privileges

Refresh methods

Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.

Vertica can perform incremental refreshes when the following conditions are met:

  • The table being refreshed is partitioned.

  • The table does not contain any unpartitioned data.

  • The operation is a full projection refresh (not a partition range projection refresh).

In an incremental refresh, the refresh operation first loads data from the partition with the highest range of keys. After refreshing this partition, Vertica begins to refresh the partition with next highest partition range. This process continues until all projection partitions are refreshed. While the refresh operation is in progress, projection partitions that have completed the refresh process become available to process query requests.

The method used to refresh a given projection is recorded in the REFRESH_METHOD column of the PROJECTION_REFRESHES system table.

Examples

The following example refreshes the projections in two tables:

=> SELECT REFRESH('t1, t2');
                                             REFRESH
----------------------------------------------------------------------------------------
Refresh completed with the following outcomes:

Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------

"public"."t1_p": [t1] [refreshed] [scratch] [0] [0]"public"."t2_p": [t2] [refreshed] [scratch] [0] [0]

In the following example, only the projection on one table was refreshed:

=> SELECT REFRESH('allow, public.deny, t');
                                               REFRESH
----------------------------------------------------------------------------------------

Refresh completed with the following outcomes:

Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "allow"] [] [1] [0]
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "public.deny"] [] [1] [0]
"public"."t_p1": [t] [refreshed] [scratch] [0] [0]

See also

8 - REFRESH_COLUMNS

Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING.

Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING. All refresh operations associated with a call to REFRESH_COLUMNS belong to the same transaction. Thus, all tables and columns specified by REFRESH_COLUMNS must be refreshed; otherwise, the entire operation is rolled back.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

REFRESH_COLUMNS ( 'table-list', '[column-list]'
   [, '[refresh-mode]' [, min-partition-key, max-partition-key [, force-split] ]
)

Arguments

table-list
A comma-delimited list of the tables to refresh: [[database.]schema.]table[,...]

If you specify multiple tables, refresh-mode must be set to REBUILD.

column-list
A comma-delimited list of columns to refresh: [[[database.]schema.]table.]column[,...] or [[database.]schema.]table.*, where asterisk (*) means to refresh all SET USING/DEFAULT USING columns in the table. For example:
SELECT REFRESH_COLUMNS ('t1, t2', 't1.*, t2.b', 'REBUILD');

If column-list is set to an empty string (''), REFRESH_COLUMNS refreshes all SET USING/DEFAULT USING columns in the specified tables.

The following requirements apply:

  • All specified columns must have a SET USING or DEFAULT USING constraint.

  • If REFRESH_COLUMNS specifies multiple tables, all column names must be qualified by their table names. If the target tables span multiple schemas, all column names must be fully qualified by their schema and table names. For example:

    SELECT REFRESH_COLUMNS ('t1, t2', 't1.a, t2.b', 'REBUILD');
    

If you specify a database, it must be the current database.

refresh-mode
Specifies how to refresh SET USING columns:
  • UPDATE (default): Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.

  • REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.

If you specify multiple tables, you must explicitly specify REBUILD mode.

In both cases, REFRESH_COLUMNS returns an error if any SET USING column is defined as a primary or unique key in a table that enforces those constraints.

See REBUILD Mode Restrictions for limitations on using the REBUILD option.

min-partition-key, max-partition-key
Qualifies REBUILD mode, limiting the rebuild operation to one or more partitions. To specify a range of partitions, max-partition-key must be greater than min-partition-key. To update one partition, the two arguments must be equal.

The following requirements apply:

  • The function can specify only one table to refresh.

  • The table must be partitioned on the specified keys.

You can use these arguments to refresh columns with recently loaded data—that is, data in the latest partitions. Using this option regularly can significantly minimize the overhead otherwise incurred by rebuilding entire columns in a large table.

See Partition-based REBUILD below for details.

force-split
Boolean, whether to split ROS containers if the range of partition keys spans multiple containers or part of a single container:
  • true (default): Split ROS containers as needed.

  • false: Return with an error if ROS containers must be split to implement this operation.

Privileges

  • Schemas of queried and flattened tables: USAGE

  • Queried table: SELECT

  • Flattened table: SELECT, UPDATE

UPDATE versus REBUILD modes

In general, UPDATE mode is a better choice when changes to SET USING column data are confined to a relatively small number of rows. Use REBUILD mode when a significant amount of SET USING column data is stale and must be updated. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column—for example, to populate a SET USING column after adding it with ALTER TABLE...ADD COLUMN.

REBUILD mode restrictions

If you call REFRESH_COLUMNS on a SET USING column and specify the refresh mode as REBUILD, Vertica returns an error if the column is specified in any of the following:

Partition-based REBUILD operations

If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact is defined with SET USING column cust_name. This table is partitioned on column order_date, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY. Thus, you can call REFRESH_COLUMNS on specific time-delimited partitions of this table—in this case, on orders over the last two months:

=> SELECT REFRESH_COLUMNS ('public.orderFact',
                        'cust_name',
                        'REBUILD',
                        TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01',
                        TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1))));
      REFRESH_COLUMNS
---------------------------
 refresh_columns completed
(1 row)

Rewriting SET USING queries

When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.

Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.

You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.

Examples

See Flattened table example and DEFAULT versus SET USING.

9 - START_REFRESH

Refreshes projections in the current schema with the latest data of their respective.

Refreshes projections in the current schema with the latest data of their respective anchor tables. START_REFRESH runs asynchronously in the background, and updates the PROJECTION_REFRESHES system table. This function has no effect if a refresh is already running.

To refresh only projections of a specific table, use REFRESH. When you deploy a design through Database Designer, it automatically refreshes its projections.

If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

START_REFRESH()

Privileges

None

Requirements

All nodes must be up.

Refresh methods

Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.

Vertica can perform incremental refreshes when the following conditions are met:

  • The table being refreshed is partitioned.

  • The table does not contain any unpartitioned data.

  • The operation is a full projection refresh (not a partition range projection refresh).

In an incremental refresh, the refresh operation first loads data from the partition with the highest range of keys. After refreshing this partition, Vertica begins to refresh the partition with next highest partition range. This process continues until all projection partitions are refreshed. While the refresh operation is in progress, projection partitions that have completed the refresh process become available to process query requests.

The method used to refresh a given projection is recorded in the REFRESH_METHOD column of the PROJECTION_REFRESHES system table.

Examples

=> SELECT START_REFRESH();
             START_REFRESH
----------------------------------------
 Starting refresh background process.
(1 row)

See also