Defining column values

You can define a column so Vertica automatically sets its value from an expression through one of the following clauses:.

You can define a column so Vertica automatically sets its value from an expression through one of the following clauses:

  • DEFAULT

  • SET USING

  • DEFAULT USING

DEFAULT

The DEFAULT option sets column values to a specified value. It has the following syntax:

DEFAULT default-expression

Default values are set when you:

  • Load new rows into a table, for example, with INSERT or COPY. Vertica populates DEFAULT columns in new rows with their default values. Values in existing rows, including columns with DEFAULT expressions, remain unchanged.

  • Execute UPDATE on a table and set the value of a DEFAULT column to DEFAULT:

    => UPDATE table-name SET column-name=DEFAULT;
    
  • Add a column with a DEFAULT expression to an existing table. Vertica populates the new column with its default values when it is added to the table.

Restrictions

DEFAULT expressions cannot specify volatile functions with ALTER TABLE...ADD COLUMN. To specify volatile functions, use CREATE TABLE or ALTER TABLE...ALTER COLUMN statements.

SET USING

The SET USING option sets the column value to an expression when the function REFRESH_COLUMNS is invoked on that column. This option has the following syntax:

SET USING using-expression

This approach is useful for large denormalized (flattened) tables, where multiple columns get their values by querying other tables.

Restrictions

SET USING has the following restrictions:

  • Volatile functions are not allowed.

  • The expression cannot specify a sequence.

  • Vertica limits the use of several meta-functions that copy table data: COPY_TABLE, COPY_PARTITIONS_TO_TABLE, MOVE_PARTITIONS_TO_TABLE, and SWAP_PARTITIONS_BETWEEN_TABLES:

    • If the source and target tables both have SET USING columns, the operation is permitted only if each source SET USING column has a corresponding target SET USING column.

    • If only the source table has SET USING columns, SWAP_PARTITIONS_BETWEEN_TABLES is disallowed.

    • If only the target table has SET USING columns, the operation is disallowed.

DEFAULT USING

The DEFAULT USING option sets DEFAULT and SET USING constraints on a column, equivalent to using DEFAULT and SET USING separately with the same expression on the same column. It has the following syntax:

DEFAULT USING expression

For example, the following column definitions are effectively identical:

=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
     DEFAULT USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));
=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
     DEFAULT (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid))
     SET USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));

DEFAULT USING supports the same expressions as SET USING and is subject to the same restrictions.

Supported expressions

DEFAULT and SET USING generally support the same expressions. These include:

Expression restrictions

The following restrictions apply to DEFAULT and SET USING expressions:

  • The return value data type must match or be cast to the column data type.

  • The expression must return a value that conforms to the column bounds. For example, a column that is defined as a VARCHAR(1) cannot be set to a default string of abc.

  • In a temporary table, DEFAULT and SET USING do not support subqueries. If you try to create a temporary table where DEFAULT or SET USING use subquery expressions, Vertica returns an error.

  • A column's SET USING expression cannot specify another column in the same table that also sets its value with SET USING. Similarly, a column's DEFAULT expression cannot specify another column in the same table that also sets its value with DEFAULT, or whose value is automatically set to a sequence. However, a column's SET USING expression can specify another column that sets its value with DEFAULT.

  • DEFAULT and SET USING expressions only support one SELECT statement; attempts to include multiple SELECT statements in the expression return an error. For example, given table t1:

    => SELECT * FROM t1;
     a |    b
    ---+---------
     1 | hello
     2 | world
    (2 rows)
    

    Attempting to create table t2 with the following DEFAULT expression returns with an error:

    => CREATE TABLE t2 (aa int, bb varchar(30) DEFAULT (SELECT 'I said ')||(SELECT b FROM t1 where t1.a = t2.aa));
    ERROR 9745:  Expressions with multiple SELECT statements cannot be used in 'set using' query definitions
    

Disambiguating predicate columns

If a SET USING or DEFAULT query expression joins two columns with the same name, the column names must include their table names. Otherwise, Vertica assumes that both columns reference the dimension table, and the predicate always evaluates to true.

For example, tables orderFact and custDim both include column cid. Flattened table orderFact defines column cust_name with a SET USING query expression. Because the query predicate references columns cid from both tables, the column names are fully qualified:

=> CREATE TABLE public.orderFact
(
    ...
    cid int REFERENCES public.custDim(cid),
    cust_name varchar(20) SET USING (
        SELECT name FROM public.custDim WHERE (custDIM.cid = orderFact.cid)),
    ...
)

Examples

Derive a column's default value from another column

  1. Create table t with two columns, date and state, and insert a row of data:

    => CREATE TABLE t (date DATE, state VARCHAR(2));
    CREATE TABLE
    => INSERT INTO t VALUES (CURRENT_DATE, 'MA');
     OUTPUT
    --------
          1
    (1 row)
    
    => COMMIT;
    COMMMIT
    SELECT * FROM t;
        date    | state
    ------------+-------
     2017-12-28 | MA
    (1 row)
    
  2. Use ALTER TABLE to add a third column that extracts the integer month value from column date:

    => ALTER TABLE t ADD COLUMN month INTEGER DEFAULT date_part('month', date);
    ALTER TABLE
    
  3. When you query table t, Vertica returns the number of the month in column date:

    => SELECT * FROM t;
        date    | state | month
    ------------+-------+-------
     2017-12-28 | MA    |    12
    (1 row)
    

Update default column values

  1. Update table t by subtracting 30 days from date:

    => UPDATE t SET date = date-30;
     OUTPUT
    --------
          1
    (1 row)
    
    => COMMIT;
    COMMIT
    => SELECT * FROM t;
        date    | state | month
    ------------+-------+-------
     2017-11-28 | MA    |    12
    (1 row)
    

    The value in month remains unchanged.

  2. Refresh the default value in month from column date:

    => UPDATE t SET month=DEFAULT;
     OUTPUT
    --------
          1
    (1 row)
    
    => COMMIT;
    COMMIT
    => SELECT * FROM t;
        date    | state | month
    ------------+-------+-------
     2017-11-28 | MA    |    11
    (1 row)
    

Derive a default column value from user-defined scalar function

This example shows a user-defined scalar function that adds two integer values. The function is called add2ints and takes two arguments.

  1. Develop and deploy the function, as described in Scalar functions (UDSFs).

  2. Create a sample table, t1, with two integer columns:

    => CREATE TABLE t1 ( x int, y int );
    CREATE TABLE
    
  3. Insert some values into t1:

    => insert into t1 values (1,2);
    OUTPUT
    --------
          1
    (1 row)
    => insert into t1 values (3,4);
     OUTPUT
    --------
          1
    (1 row)
    
  4. Use ALTER TABLE to add a column to t1, with the default column value derived from the UDSF add2ints:

    alter table t1 add column z int default add2ints(x,y);
    ALTER TABLE
    
  5. List the new column:

    select z from t1;
     z
    ----
      3
      7
    (2 rows)
    

Table with a SET USING column that queries another table for its values

  1. Define tables t1 and t2. Column t2.b is defined to get its data from column t1.b, through the query in its SET USING clause:

    => CREATE TABLE t1 (a INT PRIMARY KEY ENABLED, b INT);
    CREATE TABLE
    
    => CREATE TABLE t2 (a INT, alpha VARCHAR(10),
          b INT SET USING (SELECT t1.b FROM t1 WHERE t1.a=t2.a))
          ORDER BY a SEGMENTED BY HASH(a) ALL NODES;
    CREATE TABLE
    
  2. Populate the tables with data:

    => INSERT INTO t1 VALUES(1,11),(2,22),(3,33),(4,44);
    => INSERT INTO t2 VALUES (1,'aa'),(2,'bb');
    => COMMIT;
    COMMIT
    
  3. View the data in table t2: Column in SET USING column b is empty, pending invocation of Vertica function REFRESH_COLUMNS:

    => SELECT * FROM t2;
     a | alpha | b
    ---+-------+---
     1 | aa    |
     2 | bb    |
    (2 rows)
    
  4. Refresh the column data in table t2 by calling function REFRESH_COLUMNS:

    => SELECT REFRESH_COLUMNS ('t2','b', 'REBUILD');
          REFRESH_COLUMNS
    ---------------------------
     refresh_columns completed
    (1 row)
    

    In this example, REFRESH_COLUMNS is called with the optional argument REBUILD. This argument specifies to replace all data in SET USING column b. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column. For details, see REFRESH_COLUMNS.

  5. View data in refreshed column b, whose data is obtained from table t1 as specified in the column's SET USING query:

    => SELECT * FROM t2 ORDER BY a;
      a | alpha | b
    ---+-------+----
     1 | aa    | 11
     2 | bb    | 22
    (2 rows)
    

Expressions with correlated subqueries

DEFAULT and SET USING expressions support subqueries that can obtain values from other tables, and use those with values in the current table to compute column values. The following example adds a column gmt_delivery_time to fact table customer_orders. The column specifies a DEFAULT expression to set values in the new column as follows:

  1. Calls meta-function NEW_TIME, which performs the following tasks:

    • Uses customer keys in customer_orders to query the customers dimension table for customer time zones.

    • Uses the queried time zone data to convert local delivery times to GMT.

  2. Populates the gmt_delivery_time column with the converted values.

=> CREATE TABLE public.customers(
    customer_key int,
    customer_name varchar(64),
    customer_address varchar(64),
    customer_tz varchar(5),
    ...);

=> CREATE TABLE public.customer_orders(
    customer_key int,
    order_number int,
    product_key int,
    product_version int,
    quantity_ordered int,
    store_key int,
    date_ordered date,
    date_shipped date,
    expected_delivery_date date,
    local_delivery_time timestamptz,
    ...);

=> ALTER TABLE customer_orders ADD COLUMN gmt_delivery_time timestamp
   DEFAULT NEW_TIME(customer_orders.local_delivery_time,
      (SELECT c.customer_tz FROM customers c WHERE (c.customer_key = customer_orders.customer_key)),
      'GMT');