This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Projection functions

This section contains projection management functions specific to Vertica.

1: CLEAR_PROJECTION_REFRESHES
2: EVALUATE_DELETE_PERFORMANCE
3: GET_PROJECTION_SORT_ORDER
4: GET_PROJECTION_STATUS
5: GET_PROJECTIONS
6: PURGE_PROJECTION
7: REFRESH
8: REFRESH_COLUMNS
9: START_REFRESH

This section contains projection management functions specific to Vertica.

1 - CLEAR_PROJECTION_REFRESHES

Clears information projection refresh history from system table PROJECTION_REFRESHES.

Clears projection refresh history from the PROJECTION_REFRESHES system table. PROJECTION_REFRESHES records information about successful and unsuccessful refresh operations.

CLEAR_PROJECTION_REFRESHES removes information only for refresh operations that are complete, as indicated by the IS_EXECUTING column in PROJECTION_REFRESHES.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

CLEAR_PROJECTION_REFRESHES()

Privileges

Superuser

Examples

=> SELECT CLEAR_PROJECTION_REFRESHES();
 CLEAR_PROJECTION_REFRESHES
----------------------------
 CLEAR
(1 row)

2 - EVALUATE_DELETE_PERFORMANCE

Evaluates projections for potential DELETE and UPDATE performance issues.

Evaluates projections for potential DELETE and UPDATE performance issues. If Vertica finds any issues, it issues a warning message. When evaluating multiple projections, EVALUATE_DELETE_PERFORMANCE returns up to ten projections with issues, and the name of a table that lists all issues that it found.

Note

EVALUATE_DELETE_PERFORMANCE returns messages that specifically reference delete performance. Keep in mind, however, that delete and update operations benefit equally from the same optimizations.

For information on resolving delete and update performance issues, see Optimizing DELETE and UPDATE.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

EVALUATE_DELETE_PERFORMANCE ( ['[[database.]schema.]scope'] )

Parameters

`[database.]schema

Database and schema. The default schema is public. If you specify a database, it must be the current database.

scope

Specifies the projections to evaluate, one of the following:

[table.]projection
Evaluate projection. For example:

SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact.store_orders_fact_b1');

table
Specifies to evaluate all projections of table. For example:

SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact');

If you supply no arguments, EVALUATE_DELETE_PERFORMANCE evaluates all projections that you can access. Depending on the size of your database, this can incur considerable overhead.

Privileges

Non-superuser: SELECT privilege on the anchor table

Examples

EVALUATE_DELETE_PERFORMANCE evaluates all projections of table example for potential DELETE and UPDATE performance issues.

=> create table example (A int, B int,C int);
CREATE TABLE
=> create projection one_sort (A,B,C) as (select A,B,C from example) order by A;
CREATE PROJECTION
=> create projection two_sort (A,B,C) as (select A,B,C from example) order by A,B;
CREATE PROJECTION
=> select evaluate_delete_performance('example');
            evaluate_delete_performance
---------------------------------------------------
 No projection delete performance concerns found.
(1 row)

The previous example show that the two projections one_sort and two_sort have no inherent structural issues that might cause poor DELETE performance. However, the data contained within the projection can create potential delete issues if the sorted columns do not uniquely identify a row or small number of rows.

In the following example, Perl is used to populate the table with data using a nested series of loops:

The inner loop populates column C.
The middle loop populates column B.
The outer loop populates column A.

The result is column A contains only three distinct values (0, 1, and 2), while column B slowly varies between 20 and 0 and column C changes in each row:

=> \! perl -e 'for ($i=0; $i<3; $i++) { for ($j=0; $j<21; $j++) { for ($k=0; $k<19; $k++) { printf "%d,%d,%d\n", $i,$j,$k;}}}' | /opt/vertica/bin/vsql -c "copy example from stdin delimiter ',' direct;"
Password:
=> select * from example;
 A | B  | C
---+----+----
 0 | 20 | 18
 0 | 20 | 17
 0 | 20 | 16
 0 | 20 | 15
 0 | 20 | 14
 0 | 20 | 13
 0 | 20 | 12
 0 | 20 | 11
 0 | 20 | 10
 0 | 20 |  9
 0 | 20 |  8
 0 | 20 |  7
 0 | 20 |  6
 0 | 20 |  5
 0 | 20 |  4
 0 | 20 |  3
 0 | 20 |  2
 0 | 20 |  1
 0 | 20 |  0
 0 | 19 | 18
 ...
 2 |  1 |  0
 2 |  0 | 18
 2 |  0 | 17
 2 |  0 | 16
 2 |  0 | 15
 2 |  0 | 14
 2 |  0 | 13
 2 |  0 | 12
 2 |  0 | 11
 2 |  0 | 10
 2 |  0 |  9
 2 |  0 |  8
 2 |  0 |  7
 2 |  0 |  6
 2 |  0 |  5
 2 |  0 |  4
 2 |  0 |  3
 2 |  0 |  2
 2 |  0 |  1
 2 |  0 |  0
=> SELECT COUNT (*) FROM example;
 COUNT
-------
  1197
(1 row)
=> SELECT COUNT (DISTINCT A) FROM example;
 COUNT
-------
     3
(1 row)

EVALUATE_DELETE_PERFORMANCE is run against the projections again to determine whether the data within the projections causes any potential DELETE performance issues. Projection one_sort has potential delete issues as it only sorts on column A which has few distinct values. Each value in the sort column corresponds to many rows in the projection, which can adversely impact DELETE performance. In contrast, projection two_sort is sorted on columns A and B, where each combination of values in the two sort columns identifies just a few rows, so deletes can be performed faster:


=> select evaluate_delete_performance('example');
            evaluate_delete_performance
---------------------------------------------------
 The following projections exhibit delete performance concerns:
        "public"."one_sort_b1"
        "public"."one_sort_b0"
See v_catalog.projection_delete_concerns for more details.

=> \x
Expanded display is on.
dbadmin=> select * from projection_delete_concerns;
-[ RECORD 1 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id      | 45035996273878562
projection_schema  | public
projection_name    | one_sort_b1
creation_time      | 2019-06-17 13:59:03.777085-04
last_modified_time | 2019-06-17 14:00:27.702223-04
comment            | The squared number of rows matching each sort key is about 159201 on average.
-[ RECORD 2 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id      | 45035996273878548
projection_schema  | public
projection_name    | one_sort_b0
creation_time      | 2019-06-17 13:59:03.777279-04
last_modified_time | 2019-06-17 13:59:03.777279-04
comment            | The squared number of rows matching each sort key is about 159201 on average.

If you omit supplying an argument to EVALUATE_DELETE_PERFORMANCE, it evaluates all projections that you can access:

=> select evaluate_delete_performance();
                          evaluate_delete_performance
---------------------------------------------------------------------------
 The following projections exhibit delete performance concerns:
        "public"."one_sort_b0"
        "public"."one_sort_b1"
See v_catalog.projection_delete_concerns for more details.
(1 row)

3 - GET_PROJECTION_SORT_ORDER

Returns the order of columns in a projection's ORDER BY clause.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_PROJECTION_SORT_ORDER( '[[database.]schema.]projection' );

Parameters

[database.]schema: Database and schema. The default schema is public. If you specify a database, it must be the current database.
projection: The target projection.

Privileges

Non-superuser: SELECT privilege on the anchor table

Examples

=> SELECT get_projection_sort_order ('store_orders_super');
                                 get_projection_sort_order
--------------------------------------------------------------------------------------------
 public.store_orders_super [Sort Cols: "order_no", "order_date", "shipper", "ship_date"]

(1 row)

4 - GET_PROJECTION_STATUS

Returns information relevant to the status of a :.

Returns information relevant to the status of a projection:

The current K-safety status of the database
The number of nodes in the database
Whether the projection is segmented
The number and names of buddy projections
Whether the projection is safe
Whether the projection is up to date
Whether statistics have been computed for the projection

Use GET_PROJECTION_STATUS to monitor the progress of a projection data refresh.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_PROJECTION_STATUS ( '[[database.]schema.]projection' );

Parameters

[database.]schema: Database and schema. The default schema is public. If you specify a database, it must be the current database.
projection: The projection for which to display status.

Examples

=> SELECT GET_PROJECTION_STATUS('public.customer_dimension_site01');
                                     GET_PROJECTION_STATUS
-----------------------------------------------------------------------------------------------
 Current system K is 1.
# of Nodes: 4.
public.customer_dimension_site01 [Segmented: No] [Seg Cols: ] [K: 3] [public.customer_dimension_site04, public.customer_dimension_site03,
public.customer_dimension_site02]
[Safe: Yes] [UptoDate: Yes][Stats: Yes]

5 - GET_PROJECTIONS

Returns contextual and projection information about projections of the specified anchor table.

Contextual information

Database K-safety
Number of database nodes
Number of projections for this table

Projection data

For each projection, specifies:

All buddy projections
Whether it is segmented
Whether it is safe
Whether it is up-to-date.

You can also use GET_PROJECTIONS to monitor the progress of a projection data refresh.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

GET_PROJECTIONS ( '[[database.]schema-name.]table' )

Parameters

`[database.]schema: Database and schema. The default schema is public. If you specify a database, it must be the current database.
table: Anchor table of the projections to list.

Privileges

None

Examples

The following example gets information about projections for VMart table store.store_dimension:

=> SELECT GET_PROJECTIONS('store.store_dimension');
-[ RECORD 1 ]---+
GET_PROJECTIONS | Current system K is 1.
# of Nodes: 3.
Table store.store_dimension has 2 projections.

Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
store.store_dimension_b1 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
store.store_dimension_b0 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]

6 - PURGE_PROJECTION

PURGE_PROJECTION can use significant disk space while purging the data.

Permanently removes deleted data from physical storage so disk space can be reused. You can purge historical data up to and including the Ancient History Mark epoch.

Caution

PURGE_PROJECTION can use significant disk space while purging the data.

See PURGE for details about purge operations.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

PURGE_PROJECTION ( '[[database.]schema.]projection' )

Parameters

[database.]schema: Database and schema. The default schema is public. If you specify a database, it must be the current database.
projection: The projection to purge.

Privileges

Table owner
USAGE privilege on schema

Examples

The following example purges all historical data in projection tbl_p that precedes the Ancient History Mark epoch.

=> CREATE TABLE tbl (x int, y int);
CREATE TABLE
=> INSERT INTO tbl VALUES(1,2);
 OUTPUT
--------
      1
(1 row)

=> INSERT INTO tbl VALUES(3,4);
 OUTPUT
--------
      1
(1 row)

dbadmin=> COMMIT;
COMMIT
=> CREATE PROJECTION tbl_p AS SELECT x FROM tbl UNSEGMENTED ALL NODES;
WARNING 4468: Projection <public.tbl_p> is not available for query processing.
Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT START_REFRESH();
             START_REFRESH
----------------------------------------
 Starting refresh background process.
=> DELETE FROM tbl WHERE x=1;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT MAKE_AHM_NOW();
         MAKE_AHM_NOW
-------------------------------
 AHM set (New AHM Epoch: 9066)
(1 row)

=> SELECT PURGE_PROJECTION ('tbl_p');
 PURGE_PROJECTION
-------------------
 Projection purged
(1 row)

7 - REFRESH

Synchronously refreshes one or more table projections in the foreground, and updates the PROJECTION_REFRESHES system table.

Synchronously refreshes one or more table projections in the foreground, and updates the PROJECTION_REFRESHES system table. If you run REFRESH with no arguments, it refreshes all projections that contain stale data.

To understand projection refreshing in detail, see Refreshing projections.

If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

REFRESH ( [ '[[database.]schema.]table[,...]' ] )

Parameters

[database.]schema: Database and schema. The default schema is public. If you specify a database, it must be the current database.
table: The anchor table of the projections to refresh. If you specify multiple tables, REFRESH attempts to refresh them in parallel. Such calls are part of the Database Designer deployment (and deployment script).

Returns

Note

If REFRESH does not refresh any projections, it returns a header string with no results.

Column	Returns
`Projection Name`	The projection targeted for refresh.
`Anchor Table`	The projection's associated anchor table.
`Status`	Projections' refresh status: `queued`: Queued for refresh. `refreshing`: Refresh is in process. `refreshed`: Refresh successfully completed. `failed`: Refresh did not successfully complete.
`Refresh Method`	Method used to refresh the projection.
`Error Count`	Number of times a refresh failed for the projection.
`Duration (sec)`	How long (in seconds) the projection refresh ran.

Privileges

Superuser
Owner of the specified tables

Refresh methods

Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.

Vertica can perform incremental refreshes when the following conditions are met:

The table being refreshed is partitioned.
The table does not contain any unpartitioned data.
The operation is a full projection refresh (not a partition range projection refresh).

In an incremental refresh, the refresh operation first loads data from the partition with the highest range of keys. After refreshing this partition, Vertica begins to refresh the partition with next highest partition range. This process continues until all projection partitions are refreshed. While the refresh operation is in progress, projection partitions that have completed the refresh process become available to process query requests.

The method used to refresh a given projection is recorded in the REFRESH_METHOD column of the PROJECTION_REFRESHES system table.

Examples

The following example refreshes the projections in two tables:

=> SELECT REFRESH('t1, t2');
                                             REFRESH
----------------------------------------------------------------------------------------
Refresh completed with the following outcomes:

Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------

"public"."t1_p": [t1] [refreshed] [scratch] [0] [0]"public"."t2_p": [t2] [refreshed] [scratch] [0] [0]

In the following example, only the projection on one table was refreshed:

=> SELECT REFRESH('allow, public.deny, t');
                                               REFRESH
----------------------------------------------------------------------------------------

Refresh completed with the following outcomes:

Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "allow"] [] [1] [0]
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "public.deny"] [] [1] [0]
"public"."t_p1": [t] [refreshed] [scratch] [0] [0]

8 - REFRESH_COLUMNS

Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING.

Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING. All refresh operations associated with a call to REFRESH_COLUMNS belong to the same transaction. Thus, all tables and columns specified by REFRESH_COLUMNS must be refreshed; otherwise, the entire operation is rolled back.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

REFRESH_COLUMNS ( 'table-list', '[column-list]'
   [, '[refresh-mode]' [, min-partition-key, max-partition-key [, force-split] ]
)

Arguments

table-list

A comma-delimited list of the tables to refresh: [[database.]schema.]table[,...]

If you specify multiple tables, refresh-mode must be set to REBUILD.

column-list

A comma-delimited list of columns to refresh: [[[database.]schema.]table.]column[,...] or [[database.]schema.]table.*, where asterisk (*) means to refresh all SET USING/DEFAULT USING columns in the table. For example:

SELECT REFRESH_COLUMNS ('t1, t2', 't1.*, t2.b', 'REBUILD');

If column-list is set to an empty string (''), REFRESH_COLUMNS refreshes all SET USING/DEFAULT USING columns in the specified tables.

The following requirements apply:

All specified columns must have a SET USING or DEFAULT USING constraint.
If REFRESH_COLUMNS specifies multiple tables, all column names must be qualified by their table names. If the target tables span multiple schemas, all column names must be fully qualified by their schema and table names. For example:
```
SELECT REFRESH_COLUMNS ('t1, t2', 't1.a, t2.b', 'REBUILD');
```

If you specify a database, it must be the current database.

refresh-mode

Specifies how to refresh SET USING columns:

UPDATE (default): Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.
REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.

If you specify multiple tables, you must explicitly specify REBUILD mode.

In both cases, REFRESH_COLUMNS returns an error if any SET USING column is defined as a primary or unique key in a table that enforces those constraints.

See REBUILD Mode Restrictions for limitations on using the REBUILD option.

min-partition-key, max-partition-key

Qualifies REBUILD mode, limiting the rebuild operation to one or more partitions. To specify a range of partitions, max-partition-key must be greater than min-partition-key. To update one partition, the two arguments must be equal.

The following requirements apply:

The function can specify only one table to refresh.
The table must be partitioned on the specified keys.

You can use these arguments to refresh columns with recently loaded data—that is, data in the latest partitions. Using this option regularly can significantly minimize the overhead otherwise incurred by rebuilding entire columns in a large table.

See Partition-based REBUILD below for details.

force-split

Boolean, whether to split ROS containers if the range of partition keys spans multiple containers or part of a single container:

true (default): Split ROS containers as needed.
false: Return with an error if ROS containers must be split to implement this operation.

Privileges

Schemas of queried and flattened tables: USAGE
Queried table: SELECT
Flattened table: SELECT, UPDATE

UPDATE versus REBUILD modes

In general, UPDATE mode is a better choice when changes to SET USING column data are confined to a relatively small number of rows. Use REBUILD mode when a significant amount of SET USING column data is stale and must be updated. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column—for example, to populate a SET USING column after adding it with ALTER TABLE...ADD COLUMN.

REBUILD mode restrictions

If you call REFRESH_COLUMNS on a SET USING column and specify the refresh mode as REBUILD, Vertica returns an error if the column is specified in any of the following:

Table's partition key
Unsegmented projection
Projection with expressions, or any live aggregate projection that invokes a user-defined transform function (UDTF)
Sort order or segmentation of any projection
Any projection that omits an anchor table column that is referenced in the column's SET USING expression
GROUPED clause of any projection

Partition-based REBUILD operations

If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact is defined with SET USING column cust_name. This table is partitioned on column order_date, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY. Thus, you can call REFRESH_COLUMNS on specific time-delimited partitions of this table—in this case, on orders over the last two months:

=> SELECT REFRESH_COLUMNS ('public.orderFact',
                        'cust_name',
                        'REBUILD',
                        TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01',
                        TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1))));
      REFRESH_COLUMNS
---------------------------
 refresh_columns completed
(1 row)

Rewriting SET USING queries

When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.

Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.

You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.

Important

Enable this parameter only if the SET USING source data is in a table that is larger than the target table. If the source data is in a table smaller than the target table, then enabling RewriteQueryForLargeDim can adversely affect refresh performance.

Examples

See Flattened table example and DEFAULT versus SET USING.

9 - START_REFRESH

Refreshes projections in the current schema with the latest data of their respective.

Refreshes projections in the current schema with the latest data of their respective anchor tables. START_REFRESH runs asynchronously in the background, and updates the PROJECTION_REFRESHES system table. This function has no effect if a refresh is already running.

To refresh only projections of a specific table, use REFRESH. When you deploy a design through Database Designer, it automatically refreshes its projections.

If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

START_REFRESH()

Privileges

None

Requirements

All nodes must be up.

Refresh methods

Vertica can perform incremental refreshes when the following conditions are met:

The table being refreshed is partitioned.
The table does not contain any unpartitioned data.
The operation is a full projection refresh (not a partition range projection refresh).

The method used to refresh a given projection is recorded in the REFRESH_METHOD column of the PROJECTION_REFRESHES system table.

Examples

=> SELECT START_REFRESH();
             START_REFRESH
----------------------------------------
 Starting refresh background process.
(1 row)

Projection functions

See also

1 - CLEAR_PROJECTION_REFRESHES

Behavior type

Syntax

Privileges

Examples

See also

2 - EVALUATE_DELETE_PERFORMANCE

Note

Behavior type

Syntax

Parameters

Privileges

Examples

3 - GET_PROJECTION_SORT_ORDER

Behavior type

Syntax

Parameters

Privileges

Examples

4 - GET_PROJECTION_STATUS

Behavior type

Syntax

Parameters

Examples

5 - GET_PROJECTIONS

Behavior type

Syntax

Parameters

Privileges

Examples

6 - PURGE_PROJECTION

Caution

Behavior type

Syntax

Parameters

Privileges

Examples

See also

7 - REFRESH

Behavior type

Syntax

Parameters

Returns

Note

Privileges

Refresh methods

Examples

See also

8 - REFRESH_COLUMNS

Behavior type

Syntax

Arguments

Privileges

UPDATE versus REBUILD modes

REBUILD mode restrictions

Partition-based REBUILD operations

Rewriting SET USING queries

Important

Examples

9 - START_REFRESH

Behavior type

Syntax

Privileges

Requirements

Refresh methods

Examples

See also