This section contains projection management functions specific to Vertica.
This is the multi-page printable view of this section. Click here to print.
Projection functions
- 1: CLEAR_PROJECTION_REFRESHES
- 2: EVALUATE_DELETE_PERFORMANCE
- 3: GET_PROJECTION_SORT_ORDER
- 4: GET_PROJECTION_STATUS
- 5: GET_PROJECTIONS
- 6: PURGE_PROJECTION
- 7: REFRESH
- 8: REFRESH_COLUMNS
- 9: START_REFRESH
1 - CLEAR_PROJECTION_REFRESHES
Clears projection refresh history from the PROJECTION_REFRESHES system table. PROJECTION_REFRESHES records information about successful and unsuccessful refresh operations.
CLEAR_PROJECTION_REFRESHES removes information only for refresh operations that are complete, as indicated by the IS_EXECUTING column in PROJECTION_REFRESHES.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
CLEAR_PROJECTION_REFRESHES()
Privileges
Superuser
Examples
=> SELECT CLEAR_PROJECTION_REFRESHES();
CLEAR_PROJECTION_REFRESHES
----------------------------
CLEAR
(1 row)
See also
2 - EVALUATE_DELETE_PERFORMANCE
Evaluates projections for potential DELETE and UPDATE performance issues. If Vertica finds any issues, it issues a warning message. When evaluating multiple projections, EVALUATE_DELETE_PERFORMANCE returns up to ten projections with issues, and the name of a table that lists all issues that it found.
Note
EVALUATE_DELETE_PERFORMANCE returns messages that specifically reference delete performance. Keep in mind, however, that delete and update operations benefit equally from the same optimizations.For information on resolving delete and update performance issues, see Optimizing DELETE and UPDATE.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
EVALUATE_DELETE_PERFORMANCE ( ['[[database.]schema.]scope'] )
Parameters
-
`[database.]schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.scope
- Specifies the projections to evaluate, one of the following:
-
[
table
.]
projection
Evaluateprojection
. For example:SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact.store_orders_fact_b1');
-
table
Specifies to evaluate all projections oftable
. For example:SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact');
If you supply no arguments, EVALUATE_DELETE_PERFORMANCE evaluates all projections that you can access. Depending on the size of your database, this can incur considerable overhead.
-
Privileges
Non-superuser: SELECT privilege on the anchor table
Examples
EVALUATE_DELETE_PERFORMANCE evaluates all projections of table exampl
e for potential DELETE and UPDATE performance issues.
=> create table example (A int, B int,C int);
CREATE TABLE
=> create projection one_sort (A,B,C) as (select A,B,C from example) order by A;
CREATE PROJECTION
=> create projection two_sort (A,B,C) as (select A,B,C from example) order by A,B;
CREATE PROJECTION
=> select evaluate_delete_performance('example');
evaluate_delete_performance
---------------------------------------------------
No projection delete performance concerns found.
(1 row)
The previous example show that the two projections one_sort and two_sort have no inherent structural issues that might cause poor DELETE performance. However, the data contained within the projection can create potential delete issues if the sorted columns do not uniquely identify a row or small number of rows.
In the following example, Perl is used to populate the table with data using a nested series of loops:
-
The inner loop populates column
C.
-
The middle loop populates column
B
. -
The outer loop populates column
A
.
The result is column A
contains only three distinct values (0, 1, and 2), while column B
slowly varies between 20 and 0 and column C
changes in each row:
=> \! perl -e 'for ($i=0; $i<3; $i++) { for ($j=0; $j<21; $j++) { for ($k=0; $k<19; $k++) { printf "%d,%d,%d\n", $i,$j,$k;}}}' | /opt/vertica/bin/vsql -c "copy example from stdin delimiter ',' direct;"
Password:
=> select * from example;
A | B | C
---+----+----
0 | 20 | 18
0 | 20 | 17
0 | 20 | 16
0 | 20 | 15
0 | 20 | 14
0 | 20 | 13
0 | 20 | 12
0 | 20 | 11
0 | 20 | 10
0 | 20 | 9
0 | 20 | 8
0 | 20 | 7
0 | 20 | 6
0 | 20 | 5
0 | 20 | 4
0 | 20 | 3
0 | 20 | 2
0 | 20 | 1
0 | 20 | 0
0 | 19 | 18
...
2 | 1 | 0
2 | 0 | 18
2 | 0 | 17
2 | 0 | 16
2 | 0 | 15
2 | 0 | 14
2 | 0 | 13
2 | 0 | 12
2 | 0 | 11
2 | 0 | 10
2 | 0 | 9
2 | 0 | 8
2 | 0 | 7
2 | 0 | 6
2 | 0 | 5
2 | 0 | 4
2 | 0 | 3
2 | 0 | 2
2 | 0 | 1
2 | 0 | 0
=> SELECT COUNT (*) FROM example;
COUNT
-------
1197
(1 row)
=> SELECT COUNT (DISTINCT A) FROM example;
COUNT
-------
3
(1 row)
EVALUATE_DELETE_PERFORMANCE is run against the projections again to determine whether the data within the projections causes any potential DELETE performance issues. Projection one_sort
has potential delete issues as it only sorts on column A which has few distinct values. Each value in the sort column corresponds to many rows in the projection, which can adversely impact DELETE performance. In contrast, projection two_sort
is sorted on columns A
and B
, where each combination of values in the two sort columns identifies just a few rows, so deletes can be performed faster:
=> select evaluate_delete_performance('example');
evaluate_delete_performance
---------------------------------------------------
The following projections exhibit delete performance concerns:
"public"."one_sort_b1"
"public"."one_sort_b0"
See v_catalog.projection_delete_concerns for more details.
=> \x
Expanded display is on.
dbadmin=> select * from projection_delete_concerns;
-[ RECORD 1 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id | 45035996273878562
projection_schema | public
projection_name | one_sort_b1
creation_time | 2019-06-17 13:59:03.777085-04
last_modified_time | 2019-06-17 14:00:27.702223-04
comment | The squared number of rows matching each sort key is about 159201 on average.
-[ RECORD 2 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id | 45035996273878548
projection_schema | public
projection_name | one_sort_b0
creation_time | 2019-06-17 13:59:03.777279-04
last_modified_time | 2019-06-17 13:59:03.777279-04
comment | The squared number of rows matching each sort key is about 159201 on average.
If you omit supplying an argument to EVALUATE_DELETE_PERFORMANCE, it evaluates all projections that you can access:
=> select evaluate_delete_performance();
evaluate_delete_performance
---------------------------------------------------------------------------
The following projections exhibit delete performance concerns:
"public"."one_sort_b0"
"public"."one_sort_b1"
See v_catalog.projection_delete_concerns for more details.
(1 row)
3 - GET_PROJECTION_SORT_ORDER
Returns the order of columns in a projection's ORDER BY clause.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
GET_PROJECTION_SORT_ORDER( '[[database.]schema.]projection' );
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.projection
- The target projection.
Privileges
Non-superuser: SELECT privilege on the anchor table
Examples
=> SELECT get_projection_sort_order ('store_orders_super');
get_projection_sort_order
--------------------------------------------------------------------------------------------
public.store_orders_super [Sort Cols: "order_no", "order_date", "shipper", "ship_date"]
(1 row)
4 - GET_PROJECTION_STATUS
Returns information relevant to the status of a projection:
-
The current K-safety status of the database
-
The number of nodes in the database
-
Whether the projection is segmented
-
The number and names of buddy projections
-
Whether the projection is safe
-
Whether the projection is up to date
-
Whether statistics have been computed for the projection
Use
GET_PROJECTION_STATUS
to monitor the progress of a projection data refresh.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
GET_PROJECTION_STATUS ( '[[database.]schema.]projection' );
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.projection
- The projection for which to display status.
Examples
=> SELECT GET_PROJECTION_STATUS('public.customer_dimension_site01');
GET_PROJECTION_STATUS
-----------------------------------------------------------------------------------------------
Current system K is 1.
# of Nodes: 4.
public.customer_dimension_site01 [Segmented: No] [Seg Cols: ] [K: 3] [public.customer_dimension_site04, public.customer_dimension_site03,
public.customer_dimension_site02]
[Safe: Yes] [UptoDate: Yes][Stats: Yes]
5 - GET_PROJECTIONS
Returns contextual and projection information about projections of the specified anchor table.
- Contextual information
-
Database K-safety
-
Number of database nodes
-
Number of projections for this table
-
- Projection data
- For each projection, specifies:
-
All buddy projections
-
Whether it is segmented
-
Whether it is safe
-
Whether it is up-to-date.
-
You can also use GET_PROJECTIONS
to monitor the progress of a projection data refresh.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
GET_PROJECTIONS ( '[[database.]schema-name.]table' )
Parameters
-
`[database.]schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- Anchor table of the projections to list.
Privileges
None
Examples
The following example gets information about projections for VMart table store.store_dimension
:
=> SELECT GET_PROJECTIONS('store.store_dimension');
-[ RECORD 1 ]---+
GET_PROJECTIONS | Current system K is 1.
# of Nodes: 3.
Table store.store_dimension has 2 projections.
Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
store.store_dimension_b1 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
store.store_dimension_b0 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
6 - PURGE_PROJECTION
Permanently removes deleted data from physical storage so disk space can be reused. You can purge historical data up to and including the Ancient History Mark epoch.
Caution
PURGE_PROJECTION
can use significant disk space while purging the data.
See
PURGE
for details about purge operations.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
PURGE_PROJECTION ( '[[database.]schema.]projection' )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.projection
- The projection to purge.
Privileges
-
Table owner
-
USAGE privilege on schema
Examples
The following example purges all historical data in projection tbl_p
that precedes the Ancient History Mark epoch.
=> CREATE TABLE tbl (x int, y int);
CREATE TABLE
=> INSERT INTO tbl VALUES(1,2);
OUTPUT
--------
1
(1 row)
=> INSERT INTO tbl VALUES(3,4);
OUTPUT
--------
1
(1 row)
dbadmin=> COMMIT;
COMMIT
=> CREATE PROJECTION tbl_p AS SELECT x FROM tbl UNSEGMENTED ALL NODES;
WARNING 4468: Projection <public.tbl_p> is not available for query processing.
Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT START_REFRESH();
START_REFRESH
----------------------------------------
Starting refresh background process.
=> DELETE FROM tbl WHERE x=1;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT MAKE_AHM_NOW();
MAKE_AHM_NOW
-------------------------------
AHM set (New AHM Epoch: 9066)
(1 row)
=> SELECT PURGE_PROJECTION ('tbl_p');
PURGE_PROJECTION
-------------------
Projection purged
(1 row)
See also
7 - REFRESH
Synchronously refreshes one or more table projections in the foreground, and updates the PROJECTION_REFRESHES system table. If you run REFRESH with no arguments, it refreshes all projections that contain stale data.
To understand projection refreshing in detail, see Refreshing projections.
If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
REFRESH ( [ '[[database.]schema.]table[,...]' ] )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- The anchor table of the projections to refresh. If you specify multiple tables, REFRESH attempts to refresh them in parallel. Such calls are part of the Database Designer deployment (and deployment script).
Returns
Note
If REFRESH does not refresh any projections, it returns a header string with no results.Column | Returns |
---|---|
Projection Name |
The projection targeted for refresh. |
Anchor Table |
The projection's associated anchor table. |
Status |
Projections' refresh status:
|
Refresh Method |
Method used to refresh the projection. |
Error Count |
Number of times a refresh failed for the projection. |
Duration (sec) |
How long (in seconds) the projection refresh ran. |
Privileges
- Superuser
-
Owner of the specified tables
Refresh methods
Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.
Vertica can perform incremental refreshes when the following conditions are met:
-
The table being refreshed is partitioned.
-
The table does not contain any unpartitioned data.
-
The operation is a full projection refresh (not a partition range projection refresh).
In an incremental refresh, the refresh operation first loads data from the partition with the highest range of keys. After refreshing this partition, Vertica begins to refresh the partition with next highest partition range. This process continues until all projection partitions are refreshed. While the refresh operation is in progress, projection partitions that have completed the refresh process become available to process query requests.
The method used to refresh a given projection is recorded in the REFRESH_METHOD column of the PROJECTION_REFRESHES system table.
Examples
The following example refreshes the projections in two tables:
=> SELECT REFRESH('t1, t2');
REFRESH
----------------------------------------------------------------------------------------
Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"public"."t1_p": [t1] [refreshed] [scratch] [0] [0]"public"."t2_p": [t2] [refreshed] [scratch] [0] [0]
In the following example, only the projection on one table was refreshed:
=> SELECT REFRESH('allow, public.deny, t');
REFRESH
----------------------------------------------------------------------------------------
Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "allow"] [] [1] [0]
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "public.deny"] [] [1] [0]
"public"."t_p1": [t] [refreshed] [scratch] [0] [0]
See also
8 - REFRESH_COLUMNS
Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING. All refresh operations associated with a call to REFRESH_COLUMNS belong to the same transaction. Thus, all tables and columns specified by REFRESH_COLUMNS must be refreshed; otherwise, the entire operation is rolled back.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
REFRESH_COLUMNS ( 'table-list', '[column-list]'
[, '[refresh-mode]' [, min-partition-key, max-partition-key [, force-split] ]
)
Arguments
table-list
- A comma-delimited list of the tables to refresh:
[[
database
.]
schema.
]
table
[,...]
If you specify multiple tables,
refresh-mode
must be set to REBUILD. column-list
- A comma-delimited list of columns to refresh:
[[[
database
.]
schema.
]
table
.]
column
[,...]
or[[
database
.]
schema.
]
table
.
*
, where asterisk (*
) means to refresh all SET USING/DEFAULT USING columns in the table. For example:SELECT REFRESH_COLUMNS ('t1, t2', 't1.*, t2.b', 'REBUILD');
If
column-list
is set to an empty string (''
), REFRESH_COLUMNS refreshes all SET USING/DEFAULT USING columns in the specified tables.The following requirements apply:
-
All specified columns must have a SET USING or DEFAULT USING constraint.
-
If REFRESH_COLUMNS specifies multiple tables, all column names must be qualified by their table names. If the target tables span multiple schemas, all column names must be fully qualified by their schema and table names. For example:
SELECT REFRESH_COLUMNS ('t1, t2', 't1.a, t2.b', 'REBUILD');
If you specify a database, it must be the current database.
-
refresh-mode
- Specifies how to refresh SET USING columns:
-
UPDATE
(default): Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement. -
REBUILD
: Replaces all data in the specified columns. The rebuild operation is auto-committed.
If you specify multiple tables, you must explicitly specify REBUILD mode.
In both cases, REFRESH_COLUMNS returns an error if any SET USING column is defined as a primary or unique key in a table that enforces those constraints.
See REBUILD Mode Restrictions for limitations on using the REBUILD option.
-
min-partition-key
,max-partition-key
- Qualifies REBUILD mode, limiting the rebuild operation to one or more partitions. To specify a range of partitions,
max-partition-key
must be greater thanmin-partition-key
. To update one partition, the two arguments must be equal.The following requirements apply:
-
The function can specify only one table to refresh.
-
The table must be partitioned on the specified keys.
You can use these arguments to refresh columns with recently loaded data—that is, data in the latest partitions. Using this option regularly can significantly minimize the overhead otherwise incurred by rebuilding entire columns in a large table.
See Partition-based REBUILD below for details.
-
force-split
- Boolean, whether to split ROS containers if the range of partition keys spans multiple containers or part of a single container:
-
true
(default): Split ROS containers as needed. -
false
: Return with an error if ROS containers must be split to implement this operation.
-
Privileges
-
Schemas of queried and flattened tables: USAGE
-
Queried table: SELECT
-
Flattened table: SELECT, UPDATE
UPDATE versus REBUILD modes
In general, UPDATE mode is a better choice when changes to SET USING column data are confined to a relatively small number of rows. Use REBUILD mode when a significant amount of SET USING column data is stale and must be updated. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column—for example, to populate a SET USING column after adding it with ALTER TABLE...ADD COLUMN.
REBUILD mode restrictions
If you call REFRESH_COLUMNS on a SET USING column and specify the refresh mode as REBUILD, Vertica returns an error if the column is specified in any of the following:
-
Table's partition key
-
Unsegmented projection
-
Projection with expressions, or any live aggregate projection that invokes a user-defined transform function (UDTF)
-
Sort order or segmentation of any projection
-
Any projection that omits an anchor table column that is referenced in the column's SET USING expression
-
GROUPED clause of any projection
Partition-based REBUILD operations
If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact
is defined with SET USING column cust_name
. This table is partitioned on column order_date
, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY. Thus, you can call REFRESH_COLUMNS on specific time-delimited partitions of this table—in this case, on orders over the last two months:
=> SELECT REFRESH_COLUMNS ('public.orderFact',
'cust_name',
'REBUILD',
TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01',
TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1))));
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
Rewriting SET USING queries
When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.
Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.
You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.
Important
Enable this parameter only if the SET USING source data is in a table that is larger than the target table. If the source data is in a table smaller than the target table, then enabling RewriteQueryForLargeDim can adversely affect refresh performance.Examples
9 - START_REFRESH
Refreshes projections in the current schema with the latest data of their respective anchor tables. START_REFRESH runs asynchronously in the background, and updates the PROJECTION_REFRESHES system table. This function has no effect if a refresh is already running.
To refresh only projections of a specific table, use REFRESH. When you deploy a design through Database Designer, it automatically refreshes its projections.
If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
START_REFRESH()
Privileges
None
Requirements
All nodes must be up.
Refresh methods
Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.
Vertica can perform incremental refreshes when the following conditions are met:
-
The table being refreshed is partitioned.
-
The table does not contain any unpartitioned data.
-
The operation is a full projection refresh (not a partition range projection refresh).
In an incremental refresh, the refresh operation first loads data from the partition with the highest range of keys. After refreshing this partition, Vertica begins to refresh the partition with next highest partition range. This process continues until all projection partitions are refreshed. While the refresh operation is in progress, projection partitions that have completed the refresh process become available to process query requests.
The method used to refresh a given projection is recorded in the REFRESH_METHOD column of the PROJECTION_REFRESHES system table.
Examples
=> SELECT START_REFRESH();
START_REFRESH
----------------------------------------
Starting refresh background process.
(1 row)