This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Flattened tables
Highly normalized database design often uses a star or snowflake schema model, comprising multiple large fact tables and many smaller dimension tables.
Highly normalized database design often uses a star or snowflake schema model, comprising multiple large fact tables and many smaller dimension tables. Queries typically involve joins between a large fact table and multiple dimension tables. Depending on the number of tables and quantity of data that are joined, these queries can incur significant overhead.
To avoid this problem, some users create wide tables that combine all fact and dimension table columns that their queries require. These tables can dramatically speed up query execution. However, maintaining redundant sets of normalized and denormalized data has its own administrative costs.
Denormalized, or flattened, tables, can minimize these problems. Flattened tables can include columns that get their values by querying other tables. Operations on the source tables and flattened table are decoupled; changes in one are not automatically propagated to the other. This minimizes the overhead that is otherwise typical of denormalized tables.
A flattened table defines derived columns with one or both of the following column constraint clauses:
-
DEFAULT
query-expression
sets the column value when the column is created with CREATE TABLE
or ALTER TABLE...ADD COLUMN
.
-
SET USING
query-expression
sets the column value when the function
REFRESH_COLUMNS
is invoked.
In both cases, query-expression
must return only one row and column value, or none. If the query returns no rows, the column value is set to NULL.
Like other tables defined in Vertica, you can add and remove DEFAULT and SET USING columns from a flattened table at any time. Vertica enforces dependencies between a flattened table and the tables that it queries. For details, see Modifying SET USING and DEFAULT columns.
1 - Flattened table example
In the following example, columns orderFact.cust_name and orderFact.cust_gender are defined as SET USING and DEFAULT columns, respectively.
In the following example, columns orderFact.cust_name
and orderFact.cust_gender
are defined as SET USING and DEFAULT columns, respectively. Both columns obtain their values by querying table custDim
:
=> CREATE TABLE public.custDim(
cid int PRIMARY KEY NOT NULL,
name varchar(20),
age int,
gender varchar(1)
);
=> CREATE TABLE public.orderFact(
order_id int PRIMARY KEY NOT NULL,
order_date timestamp DEFAULT CURRENT_TIMESTAMP NOT NULL,
cid int REFERENCES public.custDim(cid),
cust_name varchar(20) SET USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid)),
cust_gender varchar(1) DEFAULT (SELECT gender FROM public.custDim WHERE (custDim.cid = orderFact.cid)),
amount numeric(12,2)
)
PARTITION BY order_date::DATE GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 2, 2);
The following INSERT commands load data into both tables:
=> INSERT INTO custDim VALUES(1, 'Alice', 25, 'F');
=> INSERT INTO custDim VALUES(2, 'Boz', 30, 'M');
=> INSERT INTO custDim VALUES(3, 'Eva', 32, 'F');
=>
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(100, 1, 15);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(200, 1, 1000);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(300, 2, -50);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(400, 3, 100);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(500, 2, 200);
=> COMMIT;
When you query the tables, Vertica returns the following result sets:
=> SELECT * FROM custDim;
cid | name | age | gender
-----+-------+-----+--------
1 | Alice | 25 | F
2 | Boz | 30 | M
3 | Eva | 32 | F
(3 rows)
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | | F | 15.00
200 | 2018-12-31 | 1 | | F | 1000.00
300 | 2018-12-31 | 2 | | M | -50.00
500 | 2018-12-31 | 2 | | M | 200.00
400 | 2018-12-31 | 3 | | F | 100.00
(5 rows)
Vertica automatically populates the DEFAULT column orderFact.cust_gender
, but the SET USING column orderFact.cust_name
remains NULL. You can automatically populate this column by calling the function REFRESH_COLUMNS on flattened table orderFact. This function invokes the SET USING query for column orderFact.cust_name
and populates the column from the result set:
=> SELECT REFRESH_COLUMNS('orderFact', 'cust_name', 'REBUILD');
REFRESH_COLUMNS
-------------------------------
refresh_columns completed
(1 row)
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Boz | M | -50.00
500 | 2018-12-31 | 2 | Boz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
(5 rows)
2 - Creating flattened tables
A flattened table is typically a fact table where one or more columns query other tables for their values, through DEFAULT or SET USING constraints.
A flattened table is typically a fact table where one or more columns query other tables for their values, through DEFAULT or SET USING constraints. DEFAULT and SET USING constraints can be used for columns of all data types. Like other columns, you can set these constraints when you create the flattened table, or any time thereafter by modifying the table DDL:
In all cases, the expressions that you set for these constraints are stored in the system table COLUMNS, in columns COLUMN_DEFAULT and COLUMN_SET_USING.
Supported expressions
DEFAULT and SET USING generally support the same expressions. These include:
-
Queries
-
Other columns in the same table
-
Literals (constants)
-
All operators supported by Vertica
-
The following categories of functions:
For more information about DEFAULT and SET USING expressions, including restrictions, see Defining column values.
3 - Required privileges
The following operations on flattened table require privileges as shown:.
The following operations on flattened table require privileges as shown:
Operation |
Object |
Privileges |
Retrieve data from a flattened table. |
Schema |
USAGE |
Flattened table |
SELECT |
Add SET USING or DEFAULT columns to a table. |
Schemas (queried/flattened tables) |
USAGE |
Queried tables |
SELECT |
Target table |
CREATE |
INSERT data on a flattened table with SET USING and/or DEFAULT columns. |
Schemas (queried/flattened tables) |
USAGE |
Queried tables |
SELECT |
Flattened table |
INSERT |
Run REFRESH_COLUMNS on a flattened table. |
Schemas (queried/flattened tables) |
USAGE |
Queried tables |
SELECT |
Flattened table |
SELECT, UPDATE |
4 - DEFAULT versus SET USING
Columns in a flattened table can query other tables with constraints DEFAULT and SET USING.
Columns in a flattened table can query other tables with constraints DEFAULT and SET USING. In both cases, changes in the queried tables are not automatically propagated to the flattened table. The two constraints differ as follows:
DEFAULT columns
Vertica typically executes DEFAULT queries on new rows when they are added to the flattened table, through load operations such as INSERT and COPY, or when you create or alter a table with a new column that specifies a DEFAULT expression. In all cases, values in existing rows, including other columns with DEFAULT expressions, remain unchanged.
To refresh a column's default values, you must explicitly call UPDATE on that column as follows:
=> UPDATE table-name SET column-name=DEFAULT;
SET USING columns
Vertica executes SET USING queries only when you invoke the function REFRESH_COLUMNS. Load operations set SET USING columns in new rows to NULL. After the load, you must call REFRESH_COLUMNS to populate these columns from the queried tables. This can be useful in two ways: you can defer the overhead of updating the flattened table to any time that is convenient; and you can repeatedly query source tables for new data.
SET USING is especially useful for large flattened tables that reference data from multiple dimension tables. Often, only a small subset of SET USING columns are subject to change, and queries on the flattened table do not always require up-to-the-minute data. Given this scenario, you can refresh table content at regular intervals, or only during off-peak hours. One or both of these strategies can minimize overhead, and facilitate performance when querying large data sets.
You can control the scope of the refresh operation by calling REFRESH _COLUMNS in one of two modes:
-
UPDATE : Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.
-
REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.
If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For details, see Partition-based REBUILD Operations.
Examples
The following UPDATE statement updates the custDim
table:
=> UPDATE custDim SET name='Roz', gender='F' WHERE cid=2;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
Changes are not propagated to flattened table orderFact
, which includes SET USING and DEFAULT columns cust_name
and cust_gender
, respectively:
=> SELECT * FROM custDim ORDER BY cid;
cid | name | age | gender
-----+-------+-----+--------
1 | Alice | 25 | F
2 | Roz | 30 | F
3 | Eva | 32 | F
(3 rows)
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Boz | M | -50.00
500 | 2018-12-31 | 2 | Boz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
(5 rows)
The following INSERT statement invokes the cust_gender
column's DEFAULT query and sets that column to F
. The load operation does not invoke the cust_name
column's SET USING query, so cust_name
is set to null:
=> INSERT INTO orderFact(order_id, cid, amount) VALUES(500, 3, 750);
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Boz | M | -50.00
500 | 2018-12-31 | 2 | Boz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
500 | 2018-12-31 | 3 | | F | 750.00
(6 rows)
To update the values in cust_name
, invoke its SET USING query by calling REFRESH_COLUMNS. REFRESH_COLUMNS executes cust_name
's SET USING query: it queries the name
column in table custDim
and updates cust_name
with the following values:
=> SELECT REFRESH_COLUMNS ('orderFact','');
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Roz | M | -50.00
500 | 2018-12-31 | 2 | Roz | M | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
500 | 2018-12-31 | 3 | Eva | F | 750.00
(6 rows)
REFRESH_COLUMNS only affects the values in column cust_name
. Values in column gender
are unchanged, so settings for rows where cid=2
(Roz
) remain set to M
. To repopulate orderFact.cust_gender
with default values from custDim.gender
, call UPDATE on orderFact
:
=> UPDATE orderFact SET cust_gender=DEFAULT WHERE cust_name='Roz';
OUTPUT
--------
2
(1 row)
=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | Alice | F | 15.00
200 | 2018-12-31 | 1 | Alice | F | 1000.00
300 | 2018-12-31 | 2 | Roz | F | -50.00
500 | 2018-12-31 | 2 | Roz | F | 200.00
400 | 2018-12-31 | 3 | Eva | F | 100.00
500 | 2018-12-31 | 3 | Eva | F | 750.00
(6 rows)
5 - Modifying SET USING and DEFAULT columns
Examples use the custDim and orderFact tables described in Flattened Table Example.
Modifying a SET USING and DEFAULT expression
ALTER TABLE...ALTER COLUMN can set an existing column to SET USING or DEFAULT, or change the query expression of an existing SET USING or DEFAULT column. In both cases, existing values remain unchanged. Vertica refreshes column values in the following cases:
-
DEFAULT column: Refreshed only when you load new rows, or when you invoke UPDATE to set column values to DEFAULT
.
-
SET USING column: Refreshed only when you call REFRESH_COLUMNS on the table.
For example, you might set an entire column to NULL as follows:
=> ALTER TABLE orderFact ALTER COLUMN cust_name SET USING NULL;
ALTER TABLE
=> SELECT REFRESH_COLUMNS('orderFact', 'cust_name', 'REBUILD');
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
100 | 2018-12-31 | 1 | | F | 15.00
200 | 2018-12-31 | 1 | | F | 1000.00
300 | 2018-12-31 | 2 | | M | -50.00
500 | 2018-12-31 | 2 | | M | 200.00
400 | 2018-12-31 | 3 | | F | 100.00
(5 rows)
For details, see Defining column values
Removing SET USING and DEFAULT constraints
You remove a column's SET USING
or DEFAULT
constraint with ALTER TABLE...ALTER COLUMN, as follows:
ALTER TABLE table-name ALTER COLUMN column-name DROP { SET USING | DEFAULT };
Vertica removes the constraint from the specified column, but the column and its data are otherwise unaffected. For example:
=> ALTER TABLE orderFact ALTER COLUMN cust_name DROP SET USING;
ALTER TABLE
Dropping columns queried by SET USING or DEFAULT
Vertica enforces dependencies between a flattened table and the tables that it queries. Attempts to drop a queried column or its table return an error unless the drop operation also includes the CASCADE
option. Vertica implements CASCADE
by removing the SET USING or DEFAULT constraint from the flattened table. The table column and its data are otherwise unaffected.
For example, attempts to drop column name
in table custDim
returns a rollback error, as this column is referenced by SET USING column orderFact.cust_gender
:
=> ALTER TABLE custDim DROP COLUMN gender;
ROLLBACK 7301: Cannot drop column "gender" since it is referenced in the default expression of table "public.orderFact", column "cust_gender"
To drop this column, use the CASCADE
option:
=> ALTER TABLE custDim DROP COLUMN gender CASCADE;
ALTER TABLE
Vertica removes the DEFAULT constraint from orderFact.cust_gender
as part of the drop operation. However, cust_gender
retains the data that it previously queried from the dropped column custDim.gender
:
=> SELECT EXPORT_TABLES('','orderFact');
EXPORT_TABLES
------------------------------------------------------------------------------------------------------------
CREATE TABLE public.orderFact
(
order_id int NOT NULL,
order_date timestamp NOT NULL DEFAULT (now())::timestamptz(6),
cid int,
cust_name varchar(20),
cust_gender varchar(1) SET USING NULL,
amount numeric(12,2),
CONSTRAINT C_PRIMARY PRIMARY KEY (order_id) DISABLED
)
PARTITION BY ((orderFact.order_date)::date) GROUP BY (CASE WHEN ("datediff"('year', (orderFact.order_date)::date, ((now())::timestamptz(6))::date) >= 2) THEN (date_trunc('year', (orderFact.order_date)::date))::date WHEN ("datediff"('month', (orderFact.order_date)::date, ((now())::timestamptz(6))::date) >= 2) THEN (date_trunc('month', (orderFact.order_date)::date))::date ELSE (orderFact.order_date)::date END);
ALTER TABLE public.orderFact ADD CONSTRAINT C_FOREIGN FOREIGN KEY (cid) references public.custDim (cid);
(1 row)
=> SELECT * FROM orderFact;
order_id | order_date | cid | cust_name | cust_gender | amount
----------+----------------------------+-----+-----------+-------------+---------
400 | 2021-01-05 13:27:56.026115 | 3 | | F | 100.00
300 | 2021-01-05 13:27:56.026115 | 2 | | F | -50.00
200 | 2021-01-05 13:27:56.026115 | 1 | | F | 1000.00
500 | 2021-01-05 13:30:18.138386 | 3 | | F | 750.00
100 | 2021-01-05 13:27:56.026115 | 1 | | F | 15.00
500 | 2021-01-05 13:27:56.026115 | 2 | | F | 200.00
(6 rows)
6 - Rewriting SET USING queries
When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.
Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.
You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.
Important
Enable this parameter only if the SET USING source data is in a table that is larger than the target table. If the source data is in a table smaller than the target table, then enabling RewriteQueryForLargeDim can adversely affect refresh performance.
7 - Impact of SET USING columns on license limits
Vertica does not count the data in denormalized columns towards your raw data license limit.
Vertica does not count the data in denormalized columns towards your raw data license limit. SET USING columns obtain their data by querying columns in other tables. Only data from the source tables counts against your raw data license limit.
For a list of SET USING restrictions, see Defining column values.
You can remove a SET USING column so it counts toward your license limit with the following command:
=> ALTER TABLE table1 ALTER COLUMN column1 DROP SET USING;