DEFAULT versus SET USING
Columns in a flattened table can query other tables with constraints DEFAULT and SET USING. In both cases, changes in the queried tables are not automatically propagated to the flattened table. The two constraints differ as follows:
DEFAULT columns
Vertica typically executes DEFAULT queries on new rows when they are added to the flattened table, through load operations such as INSERT and COPY, or when you create or alter a table with a new column that specifies a DEFAULT expression. In all cases, values in existing rows, including other columns with DEFAULT expressions, remain unchanged.
To refresh a column's default values, you must explicitly call UPDATE on that column as follows:
=> UPDATE table-name SET column-name=DEFAULT;
SET USING columns
Vertica executes SET USING queries only when you invoke the function REFRESH_COLUMNS. Load operations set SET USING columns in new rows to NULL. After the load, you must call REFRESH_COLUMNS to populate these columns from the queried tables. This can be useful in two ways: you can defer the overhead of updating the flattened table to any time that is convenient; and you can repeatedly query source tables for new data.
SET USING is especially useful for large flattened tables that reference data from multiple dimension tables. Often, only a small subset of SET USING columns are subject to change, and queries on the flattened table do not always require up-to-the-minute data. Given this scenario, you can refresh table content at regular intervals, or only during off-peak hours. One or both of these strategies can minimize overhead, and facilitate performance when querying large data sets.
You can control the scope of the refresh operation by calling REFRESH _COLUMNS in one of two modes:
-
UPDATE : Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.
-
REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.
If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For details, see Partition-based REBUILD Operations.
Examples
The following UPDATE statement updates the custDim
table:
=> UPDATE custDim SET name='Roz', gender='F' WHERE cid=2;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
Changes are not propagated to flattened table orderFact
, which includes SET USING and DEFAULT columns cust_name
and cust_gender
, respectively:
=> SELECT * FROM custDim ORDER BY cid;
cid | name | age | gender
-----+-------+-----+--------
1 | Alice | 25 | F
2 | Roz | 30 | F
3 | Eva | 32 | F
(3 rows)
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Boz | M | -50.00
500 | 2018-12-31 | 2 | Boz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
(5 rows)
The following INSERT statement invokes the cust_gender
column's DEFAULT query and sets that column to F
. The load operation does not invoke the cust_name
column's SET USING query, so cust_name
is set to null:
=> INSERT INTO orderFact(order_id, cid, amount) VALUES(500, 3, 750);
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Boz | M | -50.00
500 | 2018-12-31 | 2 | Boz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
500 | 2018-12-31 | 3 | | F | 750.00
(6 rows)
To update the values in cust_name
, invoke its SET USING query by calling REFRESH_COLUMNS. REFRESH_COLUMNS executes cust_name
's SET USING query: it queries the name
column in table custDim
and updates cust_name
with the following values:
-
Sets
cust_name
in the new row toEva
. -
Returns updated values for
cid=2
, and changesBoz
toRoz
.
=> SELECT REFRESH_COLUMNS ('orderFact','');
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Roz | M | -50.00
500 | 2018-12-31 | 2 | Roz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
500 | 2018-12-31 | 3 | Eva | F | 750.00
(6 rows)
REFRESH_COLUMNS only affects the values in column cust_name
. Values in column gender
are unchanged, so settings for rows where cid=2
(Roz
) remain set to M
. To repopulate orderFact.cust_gender
with default values from custDim.gender
, call UPDATE on orderFact
:
=> UPDATE orderFact SET cust_gender=DEFAULT WHERE cust_name='Roz';
OUTPUT
--------
2
(1 row)
=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Roz | F | -50.00
500 | 2018-12-31 | 2 | Roz | F | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
500 | 2018-12-31 | 3 | Eva | F | 750.00
(6 rows)