This section contains projection management functions specific to Vertica.
This is the multi-page printable view of this section. Click here to print.
Projection management functions
- 1: CLEAR_PROJECTION_REFRESHES
- 2: EVALUATE_DELETE_PERFORMANCE
- 3: GET_PROJECTION_SORT_ORDER
- 4: GET_PROJECTION_STATUS
- 5: GET_PROJECTIONS
- 6: PURGE_PROJECTION
- 7: REFRESH
- 8: REFRESH_COLUMNS
- 9: START_REFRESH
1 - CLEAR_PROJECTION_REFRESHES
Clears information projection refresh history from system table PROJECTION_REFRESHES.
System table PROJECTION_REFRESHES records information about refresh operations, successful and unsuccessful. PROJECTION_REFRESHES retains projection refresh data until one of the following events occurs:
-
Another refresh operation starts on a given projection.
-
CLEAR_PROJECTION_REFRESHES is called and clears data on all projections.
-
The table's storage quota is exceeded.
CLEAR_PROJECTION_REFRESHES checks PROJECTION_REFRESHES Boolean column IS_EXECUTING to determine whether refresh operations are still running or are complete. The function only removes information for refresh operations that are complete.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
CLEAR_PROJECTION_REFRESHES()
Privileges
Superuser
Examples
=> SELECT CLEAR_PROJECTION_REFRESHES();
CLEAR_PROJECTION_REFRESHES
----------------------------
CLEAR
(1 row)
See also
2 - EVALUATE_DELETE_PERFORMANCE
Evaluates projections for potential DELETE and UPDATE performance issues. If Vertica finds any issues, it issues a warning message. When evaluating multiple projections, EVALUATE_DELETE_PERFORMANCE returns up to ten projections with issues, and the name of a table that lists all issues that it found.
Note
EVALUATE_DELETE_PERFORMANCE returns messages that specifically reference delete performance. Keep in mind, however, that delete and update operations benefit equally from the same optimizations.For information on resolving delete and update performance issues, see Optimizing DELETE and UPDATE.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
EVALUATE_DELETE_PERFORMANCE ( ['[[database.]schema.]scope'] )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.scope
- Specifies the projections to evaluate, one of the following:
-
[
table
.]
projection
Evaluateprojection
. For example:SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact.store_orders_fact_b1');
-
*
table*
Specifies to evaluate all projections oftable
. For example:SELECT EVALUATE_DELETE_PERFORMANCE('store.store_orders_fact');
If you supply no arguments, EVALUATE_DELETE_PERFORMANCE evaluates all projections that you can access. Depending on the size of your database, this can incur considerable overhead.
-
Privileges
Non-superuser: SELECT privilege on the anchor table
Examples
EVALUATE_DELETE_PERFORMANCE evaluates all projections of table exampl
e for potential DELETE and UPDATE performance issues.
=> create table example (A int, B int,C int);
CREATE TABLE
=> create projection one_sort (A,B,C) as (select A,B,C from example) order by A;
CREATE PROJECTION
=> create projection two_sort (A,B,C) as (select A,B,C from example) order by A,B;
CREATE PROJECTION
=> select evaluate_delete_performance('example');
evaluate_delete_performance
---------------------------------------------------
No projection delete performance concerns found.
(1 row)
The previous example show that the two projections one_sort and two_sort have no inherent structural issues that might cause poor DELETE performance. However, the data contained within the projection can create potential delete issues if the sorted columns do not uniquely identify a row or small number of rows.
In the following example, Perl is used to populate the table with data using a nested series of loops:
-
The inner loop populates column
C.
-
The middle loop populates column
B
. -
The outer loop populates column
A
.
The result is column A
contains only three distinct values (0, 1, and 2), while column B
slowly varies between 20 and 0 and column C
changes in each row:
=> \! perl -e 'for ($i=0; $i<3; $i++) { for ($j=0; $j<21; $j++) { for ($k=0; $k<19; $k++) { printf "%d,%d,%d\n", $i,$j,$k;}}}' | /opt/vertica/bin/vsql -c "copy example from stdin delimiter ',' direct;"
Password:
=> select * from example;
A | B | C
---+----+----
0 | 20 | 18
0 | 20 | 17
0 | 20 | 16
0 | 20 | 15
0 | 20 | 14
0 | 20 | 13
0 | 20 | 12
0 | 20 | 11
0 | 20 | 10
0 | 20 | 9
0 | 20 | 8
0 | 20 | 7
0 | 20 | 6
0 | 20 | 5
0 | 20 | 4
0 | 20 | 3
0 | 20 | 2
0 | 20 | 1
0 | 20 | 0
0 | 19 | 18
...
2 | 1 | 0
2 | 0 | 18
2 | 0 | 17
2 | 0 | 16
2 | 0 | 15
2 | 0 | 14
2 | 0 | 13
2 | 0 | 12
2 | 0 | 11
2 | 0 | 10
2 | 0 | 9
2 | 0 | 8
2 | 0 | 7
2 | 0 | 6
2 | 0 | 5
2 | 0 | 4
2 | 0 | 3
2 | 0 | 2
2 | 0 | 1
2 | 0 | 0
=> SELECT COUNT (*) FROM example;
COUNT
-------
1197
(1 row)
=> SELECT COUNT (DISTINCT A) FROM example;
COUNT
-------
3
(1 row)
EVALUATE_DELETE_PERFORMANCE is run against the projections again to determine whether the data within the projections causes any potential DELETE performance issues. Projection one_sort
has potential delete issues as it only sorts on column A which has few distinct values. Each value in the sort column corresponds to many rows in the projection, which can adversely impact DELETE performance. In contrast, projection two_sort
is sorted on columns A
and B
, where each combination of values in the two sort columns identifies just a few rows, so deletes can be performed faster:
=> select evaluate_delete_performance('example');
evaluate_delete_performance
---------------------------------------------------
The following projections exhibit delete performance concerns:
"public"."one_sort_b1"
"public"."one_sort_b0"
See v_catalog.projection_delete_concerns for more details.
=> \x
Expanded display is on.
dbadmin=> select * from projection_delete_concerns;
-[ RECORD 1 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id | 45035996273878562
projection_schema | public
projection_name | one_sort_b1
creation_time | 2019-06-17 13:59:03.777085-04
last_modified_time | 2019-06-17 14:00:27.702223-04
comment | The squared number of rows matching each sort key is about 159201 on average.
-[ RECORD 2 ]------+------------------------------------------------------------------------------------------------------------------------------------------------------------
projection_id | 45035996273878548
projection_schema | public
projection_name | one_sort_b0
creation_time | 2019-06-17 13:59:03.777279-04
last_modified_time | 2019-06-17 13:59:03.777279-04
comment | The squared number of rows matching each sort key is about 159201 on average.
If you omit supplying an argument to EVALUATE_DELETE_PERFORMANCE, it evaluates all projections that you can access:
=> select evaluate_delete_performance();
evaluate_delete_performance
---------------------------------------------------------------------------
The following projections exhibit delete performance concerns:
"public"."one_sort_b0"
"public"."one_sort_b1"
See v_catalog.projection_delete_concerns for more details.
(1 row)
3 - GET_PROJECTION_SORT_ORDER
Returns the order of columns in a projection's ORDER BY clause.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
GET_PROJECTION_SORT_ORDER( '[[database.]schema.]projection' );
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.projection
- The target projection.
Privileges
Non-superuser: SELECT privilege on the anchor table
Examples
=> SELECT get_projection_sort_order ('store_orders_super');
get_projection_sort_order
--------------------------------------------------------------------------------------------
public.store_orders_super [Sort Cols: "order_no", "order_date", "shipper", "ship_date"]
(1 row)
4 - GET_PROJECTION_STATUS
Returns information relevant to the status of a projection:
-
The current K-safety status of the database
-
The number of nodes in the database
-
Whether the projection is segmented
-
The number and names of buddy projections
-
Whether the projection is safe
-
Whether the projection is up to date
-
Whether statistics have been computed for the projection
Use
[GET_PROJECTION_STATUS](#)
to monitor the progress of a projection data refresh.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
GET_PROJECTION_STATUS ( '[[database.]schema.]projection' );
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.projection
- The projection for which to display status.
Examples
=> SELECT GET_PROJECTION_STATUS('public.customer_dimension_site01');
GET_PROJECTION_STATUS
-----------------------------------------------------------------------------------------------
Current system K is 1.
# of Nodes: 4.
public.customer_dimension_site01 [Segmented: No] [Seg Cols: ] [K: 3] [public.customer_dimension_site04, public.customer_dimension_site03,
public.customer_dimension_site02]
[Safe: Yes] [UptoDate: Yes][Stats: Yes]
5 - GET_PROJECTIONS
Returns contextual and projection information about projections of the specified anchor table.
- Contextual information
-
Database K-safety
-
Number of database nodes
-
Number of projections for this table
-
- Projection data
- For each projection, specifies:
-
All buddy projections
-
Whether it is segmented
-
Whether it is safe
-
Whether it is up-to-date.
-
You can also use GET_PROJECTIONS
to monitor the progress of a projection data refresh.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
GET_PROJECTIONS ( '[[database.]schema-name.]table' )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table
- Anchor table of the projections to list.
Privileges
None
Examples
The following example gets information about projections for VMart table store.store_dimension
:
=> SELECT GET_PROJECTIONS('store.store_dimension');
-[ RECORD 1 ]---+
GET_PROJECTIONS | Current system K is 1.
# of Nodes: 3.
Table store.store_dimension has 2 projections.
Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
store.store_dimension_b1 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
store.store_dimension_b0 [Segmented: Yes] [Seg Cols: "store.store_dimension.store_key"] [K: 1] [store.store_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
6 - PURGE_PROJECTION
Permanently removes deleted data from physical storage so disk space can be reused. You can purge historical data up to and including the Ancient History Mark epoch.
Caution
PURGE_PROJECTION
can use significant disk space while purging the data.
See
PURGE
for details about purge operations.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
PURGE_PROJECTION ( '[[database.]schema.]projection' )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.projection
- The projection to purge.
Privileges
-
Table owner
-
USAGE privilege on schema
Examples
The following example purges all historical data in projection tbl_p
that precedes the Ancient History Mark epoch.
=> CREATE TABLE tbl (x int, y int);
CREATE TABLE
=> INSERT INTO tbl VALUES(1,2);
OUTPUT
--------
1
(1 row)
=> INSERT INTO tbl VALUES(3,4);
OUTPUT
--------
1
(1 row)
dbadmin=> COMMIT;
COMMIT
=> CREATE PROJECTION tbl_p AS SELECT x FROM tbl UNSEGMENTED ALL NODES;
WARNING 4468: Projection <public.tbl_p> is not available for query processing.
Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT START_REFRESH();
START_REFRESH
----------------------------------------
Starting refresh background process.
=> DELETE FROM tbl WHERE x=1;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT MAKE_AHM_NOW();
MAKE_AHM_NOW
-------------------------------
AHM set (New AHM Epoch: 9066)
(1 row)
=> SELECT PURGE_PROJECTION ('tbl_p');
PURGE_PROJECTION
-------------------
Projection purged
(1 row)
See also
7 - REFRESH
Synchronously refreshes one or more table projections in the foreground, and updates system table
PROJECTION_REFRESHES
. If you run REFRESH
with no arguments, it refreshes all projections that contain stale data.
To understand projection refresh in detail, go to Refreshing projections.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
REFRESH ( [ '[[database.]schema.]table-name[,...]' ] )
Parameters
[
database
.]
schema
Database and schema. The default schema is
public
. If you specify a database, it must be the current database.table-name
- The anchor table of the projections to refresh. If you specify multiple tables,
REFRESH
attempts to refresh them in parallel. Such calls are part of the Database Designer deployment (and deployment script).
Returns
Note
IfREFRESH
does not refresh any projections, it returns a header string with no results.
Column... | Returns... |
---|---|
Projection Name |
The projection targeted for refresh. |
Anchor Table |
The projection's associated anchor table. |
Status |
Projections' refresh status:
|
Refresh Method |
Method used to refresh the projection. |
Error Count |
Number of times a refresh failed for the projection. |
Duration (sec) |
How long (in seconds) the projection refresh ran. |
Privileges
- Superuser
-
Owner of the specified tables
Refresh methods
Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.
To determine the method used to refresh a given projection, query REFRESH_METHOD
from system table
PROJECTION_REFRESHES
.
Examples
The following example refreshes the projections in tables t1
and t2
:
=> SELECT REFRESH('t1, t2');
REFRESH
----------------------------------------------------------------------------------------
Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"public"."t1_p": [t1] [refreshed] [scratch] [0] [0]"public"."t2_p": [t2] [refreshed] [scratch] [0] [0]
This next example shows that only the projection on table t
was refreshed:
=> SELECT REFRESH('allow, public.deny, t');
REFRESH
----------------------------------------------------------------------------------------
Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "allow"] [] [1] [0]
"n/a"."n/a": [n/a] [failed: insufficient permissions on table "public.deny"] [] [1] [0]
"public"."t_p1": [t] [refreshed] [scratch] [0] [0]
See also
8 - REFRESH_COLUMNS
Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING. All refresh operations associated with a call to REFRESH_COLUMNS belong to the same transaction. Thus, all tables and columns specified by REFRESH_COLUMNS must be refreshed; otherwise, the entire operation is rolled back.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
REFRESH_COLUMNS ( 'table-list', '[column-list]'
[, '[refresh-mode ]' [, min-partition-key, max-partition-key] ]
)
Parameters
table-list
- A comma-delimited list of the tables to refresh:
[[
database
.]
schema.
]
table
[,...]
Important
If you specify multiple tables, parameterrefresh-mode
must be set to REBUILD. column-list
- A comma-delimited list of columns to refresh, specified as follows:
-
[[[
database
.]
schema.
]
table
.]
column
[,...]
-
[[
database
.]
schema.
]
table
.
*
where asterisk (
*
) specifies to refresh all SET USING/DEFAULT USING columns intable
. For example:SELECT REFRESH_COLUMNS ('t1, t2', 't1.*, t2.b', 'REBUILD');
If
column-list
is set to an empty string (''
), REFRESH_COLUMNS refreshes all SET USING/DEFAULT USING columns in the specified tables.The following requirements apply:
-
All specified columns must have a SET USING or DEFAULT USING constraint.
-
If REFRESH_COLUMNS specifies multiple tables, all column names must be qualified by their table names. If the target tables span multiple schemas, all column names must be fully qualified by their schema and table names. For example:
SELECT REFRESH_COLUMNS ('t1, t2', 't1.a, t2.b', 'REBUILD');
If you specify a database, it must be the current database.
-
refresh-mode
- Specifies how to refresh SET USING columns:
-
UPDATE : Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.
-
REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.
If set to an empty string or omitted, REFRESH_COLUMNS executes in UPDATE mode. If you specify multiple tables, you must explicitly specify REBUILD mode.
In both cases, REFRESH_COLUMNS returns an error if any SET USING column is defined as a primary or unique key in a table that enforces those constraints.
See REBUILD Mode Restrictions for limitations on using the REBUILD option.
-
min-partition-key
max-partition-key
- Qualifies REBUILD mode, limiting the rebuild operation to one or more partitions. To specify a range of partitions,
max-partition-key
must be greater thanmin-partition-key
. To update one partition, the two arguments must be equal.The following requirements apply:
-
The function can specify only one table to refresh.
-
The table must be partitioned on the specified keys.
You can use these arguments to refresh columns with recently loaded data—that is, data in the latest partitions. Using this option regularly can significantly minimize the overhead otherwise incurred by rebuilding entire columns in a large table.
See Partition-based REBUILD below for details.
-
Privileges
-
Schemas of queried and flattened tables: USAGE
-
Queried table: SELECT
-
Flattened table: SELECT, UPDATE
UPDATE versus REBUILD modes
In general, UPDATE mode is a better choice when changes to SET USING column data are confined to a relatively small number of rows. Use REBUILD mode when a significant amount of SET USING column data is stale and must be updated. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column—for example, to populate a SET USING column after adding it with ALTER TABLE...ADD COLUMN.
REBUILD mode restrictions
If you call REFRESH_COLUMNS on a SET USING column and specify the refresh mode as REBUILD, Vertica returns an error if the column is specified in any of the following:
-
Table's partition key
-
Unsegmented projection
-
Projection with expressions, or any live aggregate projection that invokes a user-defined transform function (UDTF)
-
Sort order or segmentation of any projection
-
Any projection that omits an anchor table column that is referenced in the column's SET USING expression
-
GROUPED clause of any projection
Partition-based REBUILD operations
If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact
is defined with SET USING column cust_name
. This table is partitioned on column order_date
, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY. Thus, you can call REFRESH_COLUMNS on specific time-delimited partitions of this table—in this case, on orders over the last two months:
=> SELECT REFRESH_COLUMNS ('public.orderFact',
'cust_name',
'REBUILD',
TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01',
TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1))));
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
Rewriting SET USING queries
When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.
Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.
You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.
Important
Enable this parameter only if the SET USING source data is in a table that is larger than the target table. If the source data is in a table smaller than the target table, then enabling RewriteQueryForLargeDim can adversely affect refresh performance.Examples
9 - START_REFRESH
Refreshes projections in the current schema with the latest data of their respective anchor tables. START_REFRESH
runs asynchronously in the background, and updates system table
PROJECTION_REFRESHES
. This function has no effect if a refresh is already running.
To refresh only projections of a specific table, use
REFRESH
. When you deploy a design through Database Designer, it automatically refreshes its projections.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
START_REFRESH()
Privileges
None
Requirements
All nodes must be up.
Refresh methods
Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.
To determine the method used to refresh a given projection, query REFRESH_METHOD
from system table
PROJECTION_REFRESHES
.
Examples
=> SELECT START_REFRESH();
START_REFRESH
----------------------------------------
Starting refresh background process.
(1 row)