REFRESH_COLUMNS
Refreshes table columns that are defined with the constraint SET USING or DEFAULT USING. All refresh operations associated with a call to REFRESH_COLUMNS belong to the same transaction. Thus, all tables and columns specified by REFRESH_COLUMNS must be refreshed; otherwise, the entire operation is rolled back.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
REFRESH_COLUMNS ( 'table-list', '[column-list]'
[, '[refresh-mode]' [, min-partition-key, max-partition-key [, force-split] ]
)
Arguments
table-list
- A comma-delimited list of the tables to refresh:
[[
database
.]
schema.
]
table
[,...]
If you specify multiple tables,
refresh-mode
must be set to REBUILD. column-list
- A comma-delimited list of columns to refresh:
[[[
database
.]
schema.
]
table
.]
column
[,...]
or[[
database
.]
schema.
]
table
.
*
, where asterisk (*
) means to refresh all SET USING/DEFAULT USING columns in the table. For example:SELECT REFRESH_COLUMNS ('t1, t2', 't1.*, t2.b', 'REBUILD');
If
column-list
is set to an empty string (''
), REFRESH_COLUMNS refreshes all SET USING/DEFAULT USING columns in the specified tables.The following requirements apply:
-
All specified columns must have a SET USING or DEFAULT USING constraint.
-
If REFRESH_COLUMNS specifies multiple tables, all column names must be qualified by their table names. If the target tables span multiple schemas, all column names must be fully qualified by their schema and table names. For example:
SELECT REFRESH_COLUMNS ('t1, t2', 't1.a, t2.b', 'REBUILD');
If you specify a database, it must be the current database.
-
refresh-mode
- Specifies how to refresh SET USING columns:
-
UPDATE
(default): Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement. -
REBUILD
: Replaces all data in the specified columns. The rebuild operation is auto-committed.
If you specify multiple tables, you must explicitly specify REBUILD mode.
In both cases, REFRESH_COLUMNS returns an error if any SET USING column is defined as a primary or unique key in a table that enforces those constraints.
See REBUILD Mode Restrictions for limitations on using the REBUILD option.
-
min-partition-key
,max-partition-key
- Qualifies REBUILD mode, limiting the rebuild operation to one or more partitions. To specify a range of partitions,
max-partition-key
must be greater thanmin-partition-key
. To update one partition, the two arguments must be equal.The following requirements apply:
-
The function can specify only one table to refresh.
-
The table must be partitioned on the specified keys.
You can use these arguments to refresh columns with recently loaded data—that is, data in the latest partitions. Using this option regularly can significantly minimize the overhead otherwise incurred by rebuilding entire columns in a large table.
See Partition-based REBUILD below for details.
-
force-split
- Boolean, whether to split ROS containers if the range of partition keys spans multiple containers or part of a single container:
-
true
(default): Split ROS containers as needed. -
false
: Return with an error if ROS containers must be split to implement this operation.
-
Privileges
-
Schemas of queried and flattened tables: USAGE
-
Queried table: SELECT
-
Flattened table: SELECT, UPDATE
UPDATE versus REBUILD modes
In general, UPDATE mode is a better choice when changes to SET USING column data are confined to a relatively small number of rows. Use REBUILD mode when a significant amount of SET USING column data is stale and must be updated. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column—for example, to populate a SET USING column after adding it with ALTER TABLE...ADD COLUMN.
REBUILD mode restrictions
If you call REFRESH_COLUMNS on a SET USING column and specify the refresh mode as REBUILD, Vertica returns an error if the column is specified in any of the following:
-
Table's partition key
-
Unsegmented projection
-
Projection with expressions, or any live aggregate projection that invokes a user-defined transform function (UDTF)
-
Sort order or segmentation of any projection
-
Any projection that omits an anchor table column that is referenced in the column's SET USING expression
-
GROUPED clause of any projection
Partition-based REBUILD operations
If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode, by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For example, table public.orderFact
is defined with SET USING column cust_name
. This table is partitioned on column order_date
, where the partition clause invokes Vertica function CALENDAR_HIERARCHY_DAY. Thus, you can call REFRESH_COLUMNS on specific time-delimited partitions of this table—in this case, on orders over the last two months:
=> SELECT REFRESH_COLUMNS ('public.orderFact',
'cust_name',
'REBUILD',
TO_CHAR(ADD_MONTHS(current_date, -2),'YYYY-MM')||'-01',
TO_CHAR(LAST_DAY(ADD_MONTHS(current_date, -1))));
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
Rewriting SET USING queries
When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.
Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.
You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.