This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Flattened tables

Highly normalized database design often uses a star or snowflake schema model, comprising multiple large fact tables and many smaller dimension tables.

Highly normalized database design often uses a star or snowflake schema model, comprising multiple large fact tables and many smaller dimension tables. Queries typically involve joins between a large fact table and multiple dimension tables. Depending on the number of tables and quantity of data that are joined, these queries can incur significant overhead.

To avoid this problem, some users create wide tables that combine all fact and dimension table columns that their queries require. These tables can dramatically speed up query execution. However, maintaining redundant sets of normalized and denormalized data has its own administrative costs.

Denormalized, or flattened, tables, can minimize these problems. Flattened tables can include columns that get their values by querying other tables. Operations on the source tables and flattened table are decoupled; changes in one are not automatically propagated to the other. This minimizes the overhead that is otherwise typical of denormalized tables.

A flattened table defines derived columns with one or both of the following column constraint clauses:

  • DEFAULT query-expression sets the column value when the column is created with CREATE TABLE or ALTER TABLE...ADD COLUMN.

  • SET USING query-expression sets the column value when the function REFRESH_COLUMNS is invoked.

In both cases, query-expression must return only one row and column value, or none. If the query returns no rows, the column value is set to NULL.

Like other tables defined in Vertica, you can add and remove DEFAULT and SET USING columns from a flattened table at any time. Vertica enforces dependencies between a flattened table and the tables that it queries. For details, see Modifying SET USING and DEFAULT columns.

1 - Flattened table example

In the following example, columns orderFact.cust_name and orderFact.cust_gender are defined as SET USING and DEFAULT columns, respectively.

In the following example, columns orderFact.cust_name and orderFact.cust_gender are defined as SET USING and DEFAULT columns, respectively. Both columns obtain their values by querying table custDim:

=> CREATE TABLE public.custDim(
       cid int PRIMARY KEY NOT NULL,
       name varchar(20),
       age int,
      gender varchar(1)
);

=> CREATE TABLE public.orderFact(
       order_id int PRIMARY KEY NOT NULL,
       order_date timestamp DEFAULT CURRENT_TIMESTAMP NOT NULL,
       cid int REFERENCES public.custDim(cid),
       cust_name varchar(20) SET USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid)),
       cust_gender varchar(1) DEFAULT (SELECT gender FROM public.custDim WHERE (custDim.cid = orderFact.cid)),
       amount numeric(12,2)
)
PARTITION BY order_date::DATE GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 2, 2);

The following INSERT commands load data into both tables:

=> INSERT INTO custDim VALUES(1, 'Alice', 25, 'F');
=> INSERT INTO custDim VALUES(2, 'Boz', 30, 'M');
=> INSERT INTO custDim VALUES(3, 'Eva', 32, 'F');
=>
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(100, 1, 15);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(200, 1, 1000);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(300, 2, -50);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(400, 3, 100);
=> INSERT INTO orderFact (order_id, cid, amount) VALUES(500, 2, 200);
=> COMMIT;

When you query the tables, Vertica returns the following result sets:

=> SELECT * FROM custDim;
 cid | name  | age | gender
-----+-------+-----+--------
   1 | Alice |  25 | F
   2 | Boz   |  30 | M
   3 | Eva   |  32 | F
(3 rows)

=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 |           | F           |   15.00
      200 | 2018-12-31 |   1 |           | F           | 1000.00
      300 | 2018-12-31 |   2 |           | M           |  -50.00
      500 | 2018-12-31 |   2 |           | M           |  200.00
      400 | 2018-12-31 |   3 |           | F           |  100.00
(5 rows)

Vertica automatically populates the DEFAULT column orderFact.cust_gender, but the SET USING column orderFact.cust_name remains NULL. You can automatically populate this column by calling the function REFRESH_COLUMNS on flattened table orderFact. This function invokes the SET USING query for column orderFact.cust_name and populates the column from the result set:

=> SELECT REFRESH_COLUMNS('orderFact', 'cust_name', 'REBUILD');

    REFRESH_COLUMNS
-------------------------------
refresh_columns completed
(1 row)

=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
(5 rows)

2 - Creating flattened tables

A flattened table is typically a fact table where one or more columns query other tables for their values, through DEFAULT or SET USING constraints.

A flattened table is typically a fact table where one or more columns query other tables for their values, through DEFAULT or SET USING constraints. DEFAULT and SET USING constraints can be used for columns of all data types. Like other columns, you can set these constraints when you create the flattened table, or any time thereafter by modifying the table DDL:

In all cases, the expressions that you set for these constraints are stored in the system table COLUMNS, in columns COLUMN_DEFAULT and COLUMN_SET_USING.

Supported expressions

DEFAULT and SET USING generally support the same expressions. These include:

For more information about DEFAULT and SET USING expressions, including restrictions, see Defining column values.

3 - Required privileges

The following operations on flattened table require privileges as shown:.

The following operations on flattened table require privileges as shown:

Operation Object Privileges
Retrieve data from a flattened table. Schema USAGE
Flattened table SELECT
Add SET USING or DEFAULT columns to a table. Schemas (queried/flattened tables) USAGE
Queried tables SELECT
Target table CREATE
INSERT data on a flattened table with SET USING and/or DEFAULT columns. Schemas (queried/flattened tables) USAGE
Queried tables SELECT
Flattened table INSERT
Run REFRESH_COLUMNS on a flattened table. Schemas (queried/flattened tables) USAGE
Queried tables SELECT
Flattened table SELECT, UPDATE

4 - DEFAULT versus SET USING

Columns in a flattened table can query other tables with constraints DEFAULT and SET USING.

Columns in a flattened table can query other tables with constraints DEFAULT and SET USING. In both cases, changes in the queried tables are not automatically propagated to the flattened table. The two constraints differ as follows:

DEFAULT columns

Vertica typically executes DEFAULT queries on new rows when they are added to the flattened table, through load operations such as INSERT and COPY, or when you create or alter a table with a new column that specifies a DEFAULT expression. In all cases, values in existing rows, including other columns with DEFAULT expressions, remain unchanged.

To refresh a column's default values, you must explicitly call UPDATE on that column as follows:

=> UPDATE table-name SET column-name=DEFAULT;

SET USING columns

Vertica executes SET USING queries only when you invoke the function REFRESH_COLUMNS. Load operations set SET USING columns in new rows to NULL. After the load, you must call REFRESH_COLUMNS to populate these columns from the queried tables. This can be useful in two ways: you can defer the overhead of updating the flattened table to any time that is convenient; and you can repeatedly query source tables for new data.

SET USING is especially useful for large flattened tables that reference data from multiple dimension tables. Often, only a small subset of SET USING columns are subject to change, and queries on the flattened table do not always require up-to-the-minute data. Given this scenario, you can refresh table content at regular intervals, or only during off-peak hours. One or both of these strategies can minimize overhead, and facilitate performance when querying large data sets.

You can control the scope of the refresh operation by calling REFRESH _COLUMNS in one of two modes:

  • UPDATE : Marks original rows as deleted and replaces them with new rows. In order to save these updates, you must issue a COMMIT statement.

  • REBUILD: Replaces all data in the specified columns. The rebuild operation is auto-committed.

If a flattened table is partitioned, you can reduce the overhead of calling REFRESH_COLUMNS in REBUILD mode by specifying one or more partition keys. Doing so limits the rebuild operation to the specified partitions. For details, see Partition-based REBUILD Operations.

Examples

The following UPDATE statement updates the custDim table:

=> UPDATE custDim SET name='Roz', gender='F' WHERE cid=2;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT

Changes are not propagated to flattened table orderFact, which includes SET USING and DEFAULT columns cust_name and cust_gender, respectively:


=> SELECT * FROM custDim ORDER BY cid;
 cid | name  | age | gender
-----+-------+-----+--------
   1 | Alice |  25 | F
   2 | Roz   |  30 | F
   3 | Eva   |  32 | F
(3 rows)

=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
(5 rows)

The following INSERT statement invokes the cust_gender column's DEFAULT query and sets that column to F. The load operation does not invoke the cust_name column's SET USING query, so cust_name is set to null:

=> INSERT INTO orderFact(order_id, cid, amount)  VALUES(500, 3, 750);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=>  SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Boz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Boz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 |           | F           |  750.00
(6 rows)

To update the values in cust_name, invoke its SET USING query by calling REFRESH_COLUMNS. REFRESH_COLUMNS executes cust_name's SET USING query: it queries the name column in table custDim and updates cust_name with the following values:

  • Sets cust_name in the new row to Eva.

  • Returns updated values for cid=2, and changes Boz to Roz.

=> SELECT REFRESH_COLUMNS ('orderFact','');
      REFRESH_COLUMNS
---------------------------
 refresh_columns completed
(1 row)

=> COMMIT;
COMMIT
=>  SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Roz       | M           |  -50.00
      500 | 2018-12-31 |   2 | Roz       | M           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 | Eva       | F           |  750.00
(6 rows)

REFRESH_COLUMNS only affects the values in column cust_name. Values in column gender are unchanged, so settings for rows where cid=2 (Roz) remain set to M. To repopulate orderFact.cust_gender with default values from custDim.gender, call UPDATE on orderFact:

=> UPDATE orderFact SET cust_gender=DEFAULT WHERE cust_name='Roz';
 OUTPUT
--------
      2
(1 row)

=> COMMIT;
COMMIT
=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
 order_id | order_date | cid | cust_name | cust_gender | amount
----------+------------+-----+-----------+-------------+---------
      100 | 2018-12-31 |   1 | Alice     | F           |   15.00
      200 | 2018-12-31 |   1 | Alice     | F           | 1000.00
      300 | 2018-12-31 |   2 | Roz       | F           |  -50.00
      500 | 2018-12-31 |   2 | Roz       | F           |  200.00
      400 | 2018-12-31 |   3 | Eva       | F           |  100.00
      500 | 2018-12-31 |   3 | Eva       | F           |  750.00
(6 rows)

5 - Modifying SET USING and DEFAULT columns

Examples use the custDim and orderFact tables described in Flattened Table Example.

Modifying a SET USING and DEFAULT expression

ALTER TABLE...ALTER COLUMN can set an existing column to SET USING or DEFAULT, or change the query expression of an existing SET USING or DEFAULT column. In both cases, existing values remain unchanged. Vertica refreshes column values in the following cases:

  • DEFAULT column: Refreshed only when you load new rows, or when you invoke UPDATE to set column values to DEFAULT.

  • SET USING column: Refreshed only when you call REFRESH_COLUMNS on the table.

For example, you might set an entire column to NULL as follows:

=> ALTER TABLE orderFact ALTER COLUMN cust_name SET USING NULL;
ALTER TABLE
=> SELECT REFRESH_COLUMNS('orderFact', 'cust_name', 'REBUILD');
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)

=> SELECT order_id, order_date::date, cid, cust_name, cust_gender, amount FROM orderFact ORDER BY cid;
  order_id | order_date | cid | cust_name | cust_gender | amount
 ----------+------------+-----+-----------+-------------+---------
    100 | 2018-12-31 |   1 |           | F           |   15.00
    200 | 2018-12-31 |   1 |           | F           | 1000.00
    300 | 2018-12-31 |   2 |           | M           |  -50.00
    500 | 2018-12-31 |   2 |           | M           |  200.00
    400 | 2018-12-31 |   3 |           | F           |  100.00
(5 rows)

For details, see Defining column values

Removing SET USING and DEFAULT constraints

You remove a column's SET USING or DEFAULT constraint with ALTER TABLE...ALTER COLUMN, as follows:

ALTER TABLE table-name ALTER COLUMN column-name DROP { SET USING | DEFAULT };

Vertica removes the constraint from the specified column, but the column and its data are otherwise unaffected. For example:

=> ALTER TABLE orderFact ALTER COLUMN cust_name DROP SET USING;
ALTER TABLE

Dropping columns queried by SET USING or DEFAULT

Vertica enforces dependencies between a flattened table and the tables that it queries. Attempts to drop a queried column or its table return an error unless the drop operation also includes the CASCADE option. Vertica implements CASCADE by removing the SET USING or DEFAULT constraint from the flattened table. The table column and its data are otherwise unaffected.

For example, attempts to drop column name in table custDim returns a rollback error, as this column is referenced by SET USING column orderFact.cust_gender:

=> ALTER TABLE custDim DROP COLUMN gender;
ROLLBACK 7301:  Cannot drop column "gender" since it is referenced in the default expression of table "public.orderFact", column "cust_gender"

To drop this column, use the CASCADE option:

=> ALTER TABLE custDim DROP COLUMN gender CASCADE;
ALTER TABLE

Vertica removes the DEFAULT constraint from orderFact.cust_gender as part of the drop operation. However, cust_gender retains the data that it previously queried from the dropped column custDim.gender:

=> SELECT EXPORT_TABLES('','orderFact');
                                                EXPORT_TABLES
------------------------------------------------------------------------------------------------------------
CREATE TABLE public.orderFact
(
    order_id int NOT NULL,
    order_date timestamp NOT NULL DEFAULT (now())::timestamptz(6),
    cid int,
    cust_name varchar(20),
    cust_gender varchar(1) SET USING NULL,
    amount numeric(12,2),
    CONSTRAINT C_PRIMARY PRIMARY KEY (order_id) DISABLED
)
PARTITION BY ((orderFact.order_date)::date) GROUP BY (CASE WHEN ("datediff"('year', (orderFact.order_date)::date, ((now())::timestamptz(6))::date) >= 2) THEN (date_trunc('year', (orderFact.order_date)::date))::date WHEN ("datediff"('month', (orderFact.order_date)::date, ((now())::timestamptz(6))::date) >= 2) THEN (date_trunc('month', (orderFact.order_date)::date))::date ELSE (orderFact.order_date)::date END);
ALTER TABLE public.orderFact ADD CONSTRAINT C_FOREIGN FOREIGN KEY (cid) references public.custDim (cid);

(1 row)

=> SELECT * FROM orderFact;
 order_id |         order_date         | cid | cust_name | cust_gender | amount
----------+----------------------------+-----+-----------+-------------+---------
      400 | 2021-01-05 13:27:56.026115 |   3 |           | F           |  100.00
      300 | 2021-01-05 13:27:56.026115 |   2 |           | F           |  -50.00
      200 | 2021-01-05 13:27:56.026115 |   1 |           | F           | 1000.00
      500 | 2021-01-05 13:30:18.138386 |   3 |           | F           |  750.00
      100 | 2021-01-05 13:27:56.026115 |   1 |           | F           |   15.00
      500 | 2021-01-05 13:27:56.026115 |   2 |           | F           |  200.00
(6 rows)

6 - Rewriting SET USING queries

When you call REFRESH_COLUMNS on a flattened table's SET USING (or DEFAULT USING) column, it executes the SET USING query by joining the target and source tables. By default, the source table is always the inner table of the join. In most cases, cardinality of the source table is less than the target table, so REFRESH_COLUMNS executes the join efficiently.

Occasionally—notably, when you call REFRESH_COLUMNS on a partitioned table—the source table can be larger than the target table. In this case, performance of the join operation can be suboptimal.

You can address this issue by enabling configuration parameter RewriteQueryForLargeDim. When enabled (1), Vertica rewrites the query, by reversing the inner and outer join between the target and source tables.

7 - Impact of SET USING columns on license limits

Vertica does not count the data in denormalized columns towards your raw data license limit.

Vertica does not count the data in denormalized columns towards your raw data license limit. SET USING columns obtain their data by querying columns in other tables. Only data from the source tables counts against your raw data license limit.

For a list of SET USING restrictions, see Defining column values.

You can remove a SET USING column so it counts toward your license limit with the following command:

=> ALTER TABLE table1 ALTER COLUMN column1 DROP SET USING;