This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Working with native tables

You can create two types of native tables in Vertica (ROS format), columnar and flexible.

You can create two types of native tables in Vertica (ROS format), columnar and flexible. You can create both types as persistent or temporary. You can also create views that query a specific set of table columns.

The tables described in this section store their data in and are managed by the Vertica database. Vertica also supports external tables, which are defined in the database and store their data externally. For more information about external tables, see Working with external data.

1 - Creating tables

CREATE TABLE creates a table in the Vertica.

CREATE TABLE creates a table in the Vertica logical schema. For example:

=> CREATE TABLE orders(
    orderkey    INT,
    custkey     INT,
    prodkey     ARRAY[VARCHAR(10)],
    orderprices ARRAY[DECIMAL(12,2)],
    orderdate   DATE
);

See CREATE TABLE for important restrictions on defining columns.

Table data storage

Unlike traditional databases that store data in tables, Vertica physically stores table data in projections, which are collections of table columns. Projections store data in a format that optimizes query execution. Similar to materialized views, they store result sets on disk rather than compute them each time they are used in a query.

In order to query or perform any operation on a Vertica table, the table must have one or more projections associated with it. For more information, see Projections.

Deriving a table definition from the data

You can use the INFER_TABLE_DDL function to inspect Parquet, ORC, or Avro data and produce a starting point for a table definition. This function returns a CREATE TABLE statement, which might require further editing. For columns where the function could not infer the data type, the function labels the type as unknown and emits a warning. For VARCHAR and VARBINARY columns, you might need to adjust the length. Always review the statement the function returns, but especially for tables with many columns, using this function can save time and effort:

=> SELECT INFER_TABLE_DDL('/data/people/*.parquet'
        USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311:  This generated statement contains one or more varchar/varbinary columns which default to length 80
                    INFER_TABLE_DDL
-------------------------------------------------------------------------
 create table "employees"(
  "employeeID" int,
  "personal" Row(
    "name" varchar,
    "address" Row(
      "street" varchar,
      "city" varchar,
      "zipcode" int
    ),
    "taxID" int
  ),
  "department" varchar
 );
(1 row)

For Parquet files, you can use the GET_METADATA function to inspect a file and report metadata including information about columns.

See also

2 - Creating temporary tables

CREATE TEMPORARY TABLE creates a table whose data persists only during the current session.

CREATE TEMPORARY TABLE creates a table whose data persists only during the current session. Temporary table data is never visible to other sessions.

By default, all temporary table data is transaction-scoped—that is, the data is discarded when a COMMIT statement ends the current transaction. If CREATE TEMPORARY TABLE includes the parameter ON COMMIT PRESERVE ROWS, table data is retained until the current session ends.

Temporary tables can be used to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated—for example, the tool first gets a result set, then queries the result set, and so on.

When you create a temporary table, Vertica automatically generates a default projection for it. For more information, see Auto-projections.

Global versus local tables

CREATE TEMPORARY TABLE can create tables at two scopes, global and local, through the keywords GLOBAL and LOCAL, respectively:

Global temporary tables Vertica creates global temporary tables in the public schema. Definitions of these tables are visible to all sessions, and persist across sessions until they are explicitly dropped. Multiple users can access the table concurrently. Table data is session-scoped, so it is visible only to the session user, and is discarded when the session ends.
Local temporary tables Vertica creates local temporary tables in the V_TEMP_SCHEMA namespace and inserts them transparently into the user's search path. These tables are visible only to the session where they are created. When the session ends, Vertica automatically drops the table and its data.

Data retention

You can specify whether temporary table data is transaction- or session-scoped:

  • [ON COMMIT DELETE ROWS](#on) (default): Vertica automatically removes all table data when each transaction ends.

  • [ON COMMIT PRESERVE ROWS](#on2): Vertica preserves table data across transactions in the current session. Vertica automatically truncates the table when the session ends.

ON COMMIT DELETE ROWS
By default, Vertica removes all data from a temporary table, whether global or local, when the current transaction ends.

For example:

=> CREATE TEMPORARY TABLE tempDelete (a int, b int);
CREATE TABLE
=> INSERT INTO tempDelete VALUES(1,2);
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM tempDelete;
 a | b
---+---
 1 | 2
(1 row)

=> COMMIT;
COMMIT

=> SELECT * FROM tempDelete;
 a | b
---+---
(0 rows)

If desired, you can use DELETE within the same transaction multiple times, in order to refresh table data repeatedly.

ON COMMIT PRESERVE ROWS
You can specify that a temporary table retain data across transactions in the current session, by defining the table with the keywords ON COMMIT PRESERVE ROWS. Vertica automatically removes all data from the table only when the current session ends.

For example:

=> CREATE TEMPORARY TABLE tempPreserve (a int, b int) ON COMMIT PRESERVE ROWS;
CREATE TABLE
=> INSERT INTO tempPreserve VALUES (1,2);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM tempPreserve;
 a | b
---+---
 1 | 2
(1 row)

=> INSERT INTO tempPreserve VALUES (3,4);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM tempPreserve;
 a | b
---+---
 1 | 2
 3 | 4
(2 rows)

Eon restrictions

The following Eon Mode restrictions apply to temporary tables:

  • K-safety of temporary tables is always set to 0, regardless of system K-safety. If a CREATE TEMPORARY TABLE statement sets k-num greater than 0, Vertica returns an warning.
  • If subscriptions to the current session change, temporary tables in that session becomes inaccessible. Causes for session subscription changes include:

    • A node left the list of participating nodes.

    • A new node appeared in the list of participating nodes.

    • An active node changed for one or more shards.

    • A mergeout operation in the same session that is triggered by a user explicitly invoking DO_TM_TASK('mergeout'), or changing a column data type with ALTER TABLE...ALTER COLUMN.

3 - Creating a table from other tables

You can create a table from other tables in two ways:.

You can create a table from other tables in two ways:

3.1 - Replicating a table

You can create a table from an existing one using CREATE TABLE with the LIKE clause:.

You can create a table from an existing one using CREATE TABLE with the LIKE clause:

CREATE TABLE [schema.]table-name LIKE [schema.]existing-table
   [ {INCLUDING | EXCLUDING} PROJECTIONS ]
   [ {INCLUDE | EXCLUDE} [SCHEMA] PRIVILEGES ]

Creating a table with LIKE replicates the source table definition and any storage policy associated with it. It does not copy table data or expressions on columns.

Copying constraints

CREATE TABLE...LIKE copies all table constraints, with the following exceptions:

  • Foreign key constraints.

  • Any column that obtains its values from a sequence, including IDENTITY and AUTO_INCREMENT columns. Vertica copies the column values into the new table, but removes the original constraint. For example, the following table definition sets an IDENTITY constraint on column ID:

    CREATE TABLE public.Premium_Customer
    (
        ID IDENTITY ,
        lname varchar(25),
        fname varchar(25),
        store_membership_card int
    );
    

    The following CREATE TABLE...LIKE statement replicates this table as All_Customers. Vertica removes the IDENTITY constraint from All_Customers.ID, changing it to an integer column with a NOT NULL constraint:

    => CREATE TABLE All_Customers like Premium_Customer;
    CREATE TABLE
    => select export_tables('','All_Customers');
                       export_tables
    ---------------------------------------------------
    CREATE TABLE public.All_Customers
    (
        ID int NOT NULL,
        lname varchar(25),
        fname varchar(25),
        store_membership_card int
    );
    
    (1 row)
    

Including projections

You can qualify the LIKE clause with INCLUDING PROJECTIONS or EXCLUDING PROJECTIONS, which specify whether to copy projections from the source table:

  • EXCLUDING PROJECTIONS (default): Do not copy projections from the source table.

  • INCLUDING PROJECTIONS: Copy current projections from the source table. Vertica names the new projections according to Vertica naming conventions, to avoid name conflicts with existing objects.

Including schema privileges

You can specify default inheritance of schema privileges for the new table:

  • EXCLUDE [SCHEMA] PRIVILEGES (default) disables inheritance of privileges from the schema

  • INCLUDE [SCHEMA] PRIVILEGES grants the table the same privileges granted to its schema

For more information see Setting privilege inheritance on tables and views.

Restrictions

The following restrictions apply to the source table:

  • It cannot have out-of-date projections.

  • It cannot be a temporary table.

Example

  1. Create the table states:

    
    => CREATE TABLE states (
         state char(2) NOT NULL, bird varchar(20), tree varchar (20), tax float, stateDate char (20))
         PARTITION BY state;
    
  2. Populate the table with data:

    INSERT INTO states VALUES ('MA', 'chickadee', 'american_elm', 5.675, '07-04-1620');
    INSERT INTO states VALUES ('VT', 'Hermit_Thrasher', 'Sugar_Maple', 6.0, '07-04-1610');
    INSERT INTO states VALUES ('NH', 'Purple_Finch', 'White_Birch', 0, '07-04-1615');
    INSERT INTO states VALUES ('ME', 'Black_Cap_Chickadee', 'Pine_Tree', 5, '07-04-1615');
    INSERT INTO states VALUES ('CT', 'American_Robin', 'White_Oak', 6.35, '07-04-1618');
    INSERT INTO states VALUES ('RI', 'Rhode_Island_Red', 'Red_Maple', 5, '07-04-1619');
    
  3. View the table contents:

    => SELECT * FROM states;
    
    
     state |        bird         |     tree     |  tax  |      stateDate
    -------+---------------------+--------------+-------+----------------------
     VT    | Hermit_Thrasher     | Sugar_Maple  |     6 | 07-04-1610
     CT    | American_Robin      | White_Oak    |  6.35 | 07-04-1618
     RI    | Rhode_Island_Red    | Red_Maple    |     5 | 07-04-1619
     MA    | chickadee           | american_elm | 5.675 | 07-04-1620
     NH    | Purple_Finch        | White_Birch  |     0 | 07-04-1615
     ME    | Black_Cap_Chickadee | Pine_Tree    |     5 | 07-04-1615
    (6 rows
    
  4. Create a sample projection and refresh:

    => CREATE PROJECTION states_p AS SELECT state FROM states;
    
    => SELECT START_REFRESH();
    
  5. Create a table like the states table and include its projections:

    => CREATE TABLE newstates LIKE states INCLUDING PROJECTIONS;
    
  6. View projections for the two tables. Vertica has copied projections from states to newstates:

    => \dj
                                                          List of projections
                Schema             |                   Name                    |  Owner  |       Node       | Comment
    -------------------------------+-------------------------------------------+---------+------------------+---------
     public                        | newstates_b0                              | dbadmin |                  |
     public                        | newstates_b1                              | dbadmin |                  |
     public                        | newstates_p_b0                            | dbadmin |                  |
     public                        | newstates_p_b1                            | dbadmin |                  |
     public                        | states_b0                                 | dbadmin |                  |
     public                        | states_b1                                 | dbadmin |                  |
     public                        | states_p_b0                               | dbadmin |                  |
     public                        | states_p_b1                               | dbadmin |                  |
    
  7. View the table newstates, which shows columns copied from states:

    
    => SELECT * FROM newstates;
    
    
     state | bird | tree | tax | stateDate
    -------+------+------+-----+-----------
    (0 rows)
    

When you use the CREATE TABLE...LIKE statement, storage policy objects associated with the table are also copied. Data added to the new table use the same labeled storage location as the source table, unless you change the storage policy. For more information, see Working With Storage Locations.

See also

3.2 - Creating a table from a query

CREATE TABLE can specify an AS clause to create a table from a query, as follows:.

CREATE TABLE can specify an AS clause to create a table from a query, as follows:

CREATE [TEMPORARY] TABLE [schema.]table-name
    [ ( column-name-list ) ]
    [ {INCLUDE | EXCLUDE} [SCHEMA] PRIVILEGES ]
AS  [  /*+ LABEL */ ] [ AT epoch ] query [ ENCODED BY column-ref-list ]

Vertica creates a table from the query results and loads the result set into it. For example:

=> CREATE TABLE cust_basic_profile AS SELECT
     customer_key, customer_gender, customer_age, marital_status, annual_income, occupation
     FROM customer_dimension WHERE customer_age>18 AND customer_gender !='';
CREATE TABLE
=> SELECT customer_age, annual_income, occupation FROM cust_basic_profile
     WHERE customer_age > 23 ORDER BY customer_age;
 customer_age | annual_income |     occupation
--------------+---------------+--------------------
           24 |        469210 | Hairdresser
           24 |        140833 | Butler
           24 |        558867 | Lumberjack
           24 |        529117 | Mechanic
           24 |        322062 | Acrobat
           24 |        213734 | Writer
           ...

AS clause options

You can qualify an AS clause with one or both of the following options:

  • LABEL hint that identifies a statement for profiling and debugging

  • AT epoch clause to specify that the query return historical data

Labeling the AS clause

You can embed a [LABEL](/en/sql-reference/language-elements/hints/label/) hint in an AS clause in two places:

  • Immediately after the keyword AS:

    CREATE TABLE myTable AS /*+LABEL myLabel*/...
    
  • In the SELECT statement:

    CREATE TABLE myTable AS SELECT /*+LABEL myLabel*/
    

If the AS clause contains a LABEL hint in both places, the first label has precedence.

Loading historical data

You can qualify a CREATE TABLE AS query with an AT epoch clause, to specify that the query return historical data, where epoch is one of the following:

  • EPOCH LATEST: Return data up to but not including the current epoch. The result set includes data from the latest committed DML transaction.

  • EPOCH integer: Return data up to and including the integer-specified epoch.

  • TIME 'timestamp': Return data from the timestamp-specified epoch.

See Epochs for additional information about how Vertica uses epochs.

For details, see Historical queries.

Zero-width column handling

If the query returns a column with zero width, Vertica automatically converts it to a VARCHAR(80) column. For example:

=> CREATE TABLE example AS SELECT '' AS X;
CREATE TABLE
=> SELECT EXPORT_TABLES ('', 'example');
                       EXPORT_TABLES
----------------------------------------------------------
CREATE TEMPORARY TABLE public.example
(
    X varchar(80)
);

Requirements and restrictions

  • If you create a temporary table from a query, you must specify ON COMMIT PRESERVE ROWS in order to load the result set into the table. Otherwise, Vertica creates an empty table.

  • If the query output has expressions other than simple columns, such as constants or functions, you must specify an alias for that expression, or list all columns in the column name list.

  • You cannot use CREATE TABLE AS SELECT with a SELECT that returns values of complex types. You can, however, use CREATE TABLE LIKE.

See also

4 - Managing table columns

After you define a table, you can use ALTER TABLE to modify existing table columns.

After you define a table, you can use ALTER TABLE to modify existing table columns. You can perform the following operations on a column:

4.1 - Renaming columns

You rename a column with ALTER TABLE as follows:.

You rename a column with ALTER TABLE as follows:

ALTER TABLE [schema.]table-name  RENAME [ COLUMN ] column-name TO new-column-name

The following example renames a column in the Retail.Product_Dimension table from Product_description to Item_description:

=> ALTER TABLE Retail.Product_Dimension
    RENAME COLUMN Product_description TO Item_description;

If you rename a column that is referenced by a view, the column does not appear in the result set of the view even if the view uses the wild card (*) to represent all columns in the table. Recreate the view to incorporate the column's new name.

4.2 - Changing column data type

In general, you can change a column's data type with ALTER TABLE if doing so does not require storage reorganization.

In general, you can change a column's data type with ALTER TABLE if doing so does not require storage reorganization. After you modify a column's data type, data that you load conforms to the new definition.

The sections that follow describe requirements and restrictions associated with changing a column's data type.

Supported data type conversions

Vertica supports conversion for the following data types:

Data Types Supported Conversions
Binary Expansion and contraction.
Character All conversions between CHAR, VARCHAR, and LONG VARCHAR.
Exact numeric

All conversions between the following numeric data types: integer data types—INTEGER, INT, BIGINT, TINYINT, INT8, SMALLINT—and NUMERIC values of scale <=18 and precision 0.

You cannot modify the scale of NUMERIC data types; however, you can change precision in the ranges (0-18), (19-37), and so on.

Collection

The following conversions are supported:

  • Collection of one element type to collection of another element type, if the source element type can be coerced to the target element type.
  • Between arrays and sets.
  • Collection type to the same type (array to array or set to set), to change bounds or binary size.

For details, see Changing Collection Columns.

Unsupported data type conversions

Vertica does not allow data type conversion on types that require storage reorganization:

  • Boolean

  • DATE/TIME

  • Approximate numeric type

  • BINARY to VARBINARY and vice versa

You also cannot change a column's data type if the column is one of the following:

  • Primary key

  • Foreign key

  • Included in the SEGMENTED BY clause of any projection for that table.

  • Complex type column. One exception applies: in external tables, you can change a primitive column type to a complex type.

You can work around some of these restrictions. For details, see Working with column data conversions.

4.2.1 - Changing column width

You can expand columns within the same class of data type.

You can expand columns within the same class of data type. Doing so is useful for storing larger items in a column. Vertica validates the data before it performs the conversion.

In general, you can also reduce column widths within the data type class. This is useful to reclaim storage if the original declaration was longer than you need, particularly with strings. You can reduce column width only if the following conditions are true:

  • Existing column data is no greater than the new width.

  • All nodes in the database cluster are up.

Otherwise, Vertica returns an error and the conversion fails. For example, if you try to convert a column from varchar(25) to varchar(10)Vertica allows the conversion as long as all column data is no more than 10 characters.

In the following example, columns y and z are initially defined as VARCHAR data types, and loaded with values 12345 and 654321, respectively. The attempt to reduce column z's width to 5 fails because it contains six-character data. The attempt to reduce column y's width to 5 succeeds because its content conforms with the new width:

=> CREATE TABLE t (x int, y VARCHAR, z VARCHAR);
CREATE TABLE
=> CREATE PROJECTION t_p1 AS SELECT * FROM t SEGMENTED BY hash(x) ALL NODES;
CREATE PROJECTION
=> INSERT INTO t values(1,'12345','654321');
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM t;
 x |   y   |   z
---+-------+--------
 1 | 12345 | 654321
(1 row)

=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ROLLBACK 2378:  Cannot convert column "z" to type "char(5)"
HINT:  Verify that the data in the column conforms to the new type
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE char(5);
ALTER TABLE

Changing collection columns

If a column is a collection data type, you can use ALTER TABLE to change either its bounds or its maximum binary size. These properties are set at table creation time and can then be altered.

You can make a collection bounded, setting its maximum number of elements, as in the following example.

=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int,10];
ALTER TABLE

=> \d test.t1
                                     List of Fields by Tables
 Schema | Table | Column |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
  test  |  t1   | arr    | array[int8, 10] |   80 |         | f        | f           |
(1 row)

Alternatively, you can set the binary size for the entire collection instead of setting bounds. Binary size is set either explicitly or from the DefaultArrayBinarySize configuration parameter. The following example creates an array column from the default, changes the default, and then uses ALTER TABLE to change it to the new default.

=> SELECT get_config_parameter('DefaultArrayBinarySize');
 get_config_parameter
----------------------
 100
(1 row)

=> CREATE TABLE test.t1 (arr array[int]);
CREATE TABLE

=> \d test.t1
                                     List of Fields by Tables
 Schema | Table | Column |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
  test  |  t1   | arr    | array[int8](96) |   96 |         | f        | f           |
(1 row)

=> ALTER DATABASE DEFAULT SET DefaultArrayBinarySize=200;
ALTER DATABASE

=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int];
ALTER TABLE

=> \d test.t1
                                     List of Fields by Tables
 Schema | Table | Column |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
  test  |  t1   | arr    | array[int8](200)|  200 |         | f        | f           |
(1 row)

Alternatively, you can set the binary size explicitly instead of using the default value.

=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int](300);

Purging historical data

You cannot reduce a column's width if Vertica retains any historical data that exceeds the new width. To reduce the column width, first remove that data from the table:

  1. Advance the AHM to an epoch more recent than the historical data that needs to be removed from the table.

  2. Purge the table of all historical data that precedes the AHM with the function PURGE_TABLE.

For example, given the previous example, you can update the data in column t.z as follows:

=> UPDATE t SET z = '54321';
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM t;
 x |   y   |   z
---+-------+-------
 1 | 12345 | 54321
(1 row)

Although no data in column z now exceeds 5 characters, Vertica retains the history of its earlier data, so attempts to reduce the column width to 5 return an error:

=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ROLLBACK 2378:  Cannot convert column "z" to type "char(5)"
HINT:  Verify that the data in the column conforms to the new type

You can reduce the column width by purging the table's historical data as follows:

=> SELECT MAKE_AHM_NOW();
         MAKE_AHM_NOW
-------------------------------
 AHM set (New AHM Epoch: 6350)
(1 row)

=> SELECT PURGE_TABLE('t');
                                                     PURGE_TABLE
----------------------------------------------------------------------------------------------------------------------
 Task: purge operation
(Table: public.t) (Projection: public.t_p1_b0)
(Table: public.t) (Projection: public.t_p1_b1)

(1 row)

=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ALTER TABLE

4.2.2 - Working with column data conversions

Vertica conforms to the SQL standard by disallowing certain data conversions for table columns.

Vertica conforms to the SQL standard by disallowing certain data conversions for table columns. However, you sometimes need to work around this restriction when you convert data from a non-SQL database. The following examples describe one such workaround, using the following table:

=> CREATE TABLE sales(id INT, price VARCHAR) UNSEGMENTED ALL NODES;
CREATE TABLE
=> INSERT INTO sales VALUES (1, '$50.00');
 OUTPUT
--------
      1
(1 row)

=> INSERT INTO sales VALUES (2, '$100.00');
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM SALES;
 id |  price
----+---------
  1 | $50.00
  2 | $100.00
(2 rows)

To convert the price column's existing data type from VARCHAR to NUMERIC, complete these steps:

  1. Add a new column for temporary use. Assign the column a NUMERIC data type, and derive its default value from the existing price column.

  2. Drop the original price column.

  3. Rename the new column to the original column.

Add a new column for temporary use

  1. Add a column temp_price to table sales. You can use the new column temporarily, setting its data type to what you want (NUMERIC), and deriving its default value from the price column. Cast the default value for the new column to a NUMERIC data type and query the table:

    => ALTER TABLE sales ADD COLUMN temp_price NUMERIC(10,2) DEFAULT
    SUBSTR(sales.price, 2)::NUMERIC;
    ALTER TABLE
    
    => SELECT * FROM SALES;
     id |  price  | temp_price
    ----+---------+------------
      1 | $50.00  |      50.00
      2 | $100.00 |     100.00
    (2 rows)
    
  2. Use ALTER TABLE to drop the default expression from the new column temp_price. Vertica retains the values stored in this column:

    => ALTER TABLE sales ALTER COLUMN temp_price DROP DEFAULT;
    ALTER TABLE
    

Drop the original price column

Drop the extraneous price column. Before doing so, you must first advance the AHM to purge historical data that would otherwise prevent the drop operation:

  1. Advance the AHM:

    => SELECT MAKE_AHM_NOW();
             MAKE_AHM_NOW
    -------------------------------
     AHM set (New AHM Epoch: 6354)
    (1 row)
    
  2. Drop the original price column:

    => ALTER TABLE sales DROP COLUMN price CASCADE;
    ALTER COLUMN
    

Rename the new column to the original column

You can now rename the temp_price column to price:

  1. Use ALTER TABLE to rename the column:

    => ALTER TABLE sales RENAME COLUMN temp_price to price;
    
  2. Query the sales table again:

    => SELECT * FROM sales;
     id | price
    ----+--------
      1 |  50.00
      2 | 100.00
    (2 rows)
    

4.3 - Defining column values

You can define a column so Vertica automatically sets its value from an expression through one of the following clauses:.

You can define a column so Vertica automatically sets its value from an expression through one of the following clauses:

DEFAULT default-expression

Sets column values to default-expression in the following cases:

  • Load new rows into a table, for example, with INSERT and COPY. Vertica populates DEFAULT columns in new rows with their default values. Values in existing rows, including columns with DEFAULT expressions, remain unchanged.

  • Execute UPDATE on a table and set the value of a DEFAULT column to DEFAULT:

    => UPDATE table-name SET column-name=DEFAULT;

  • Add a column with a DEFAULT expression to an existing table. Vertica populates the new column with its default values when it is added to the table.

SET USING using-expression Sets the column value to using-expression when the function REFRESH_COLUMNS is invoked on that column. This approach is useful for large denormalized (flattened) tables, where multiple columns get their values by querying other tables.
DEFAULT USING expression

Sets DEFAULT and SET USING constraints on a column, equivalent to setting separate DEFAULT and SET USING constraints on the same column, where each constraint specifies the same expression. For example, the following column definitions are effectively identical:

=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
     DEFAULT USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));
=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
     DEFAULT (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid))
     SET USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));

DEFAULT USING columns support the same expressions as SET USING columns, and are subject to the same restrictions.

Supported expressions

DEFAULT and SET USING generally support the same expressions. These include:

Expression restrictions

The following restrictions apply to DEFAULT and SET USING expressions:

  • The return value data type must match or be cast to the column data type.

  • The expression must return a value that conforms to the column bounds. For example, a column that is defined as a VARCHAR(1) cannot be set to a default string of abc.

  • In a temporary table, DEFAULT and SET USING do not support subqueries. If you try to create a temporary table where DEFAULT or SET USING use subquery expressions, Vertica returns an error.

  • A column's SET USING expression cannot specify another column in the same table that also sets its value with SET USING. Similarly, a column's DEFAULT expression cannot specify another column in the same table that also sets its value with DEFAULT, or whose value is automatically set to a sequence. However, a column's SET USING expression can specify another column that sets its value with DEFAULT.

  • DEFAULT and SET USING expressions only support one SELECT statement; attempts to include multiple SELECT statements in the expression return an error. For example, given table t1:

    => SELECT * FROM t1;
     a |    b
    ---+---------
     1 | hello
     2 | world
    (2 rows)
    

    Attempting to create table t2 with the following DEFAULT expression returns with an error:

    => CREATE TABLE t2 (aa int, bb varchar(30) DEFAULT (SELECT 'I said ')||(SELECT b FROM t1 where t1.a = t2.aa));
    ERROR 9745:  Expressions with multiple SELECT statements cannot be used in 'set using' query definitions
    

DEFAULT restrictions

DEFAULT expressions cannot specify volatile functions with ALTER TABLE...ADD COLUMN. To specify volatile functions, use CREATE TABLE or ALTER TABLE...ALTER COLUMN statements.

SET USING restrictions

The following restrictions apply to SET USING expressions:

Disambiguating predicate columns

If a SET USING or DEFAULT query expression joins two columns with the same name, the column names must include their table names. Otherwise, Vertica assumes that both columns reference the dimension table, and the predicate always evaluates to true.

For example, tables orderFact and custDim both include column cid. Flattened table orderFact defines column cust_name with a SET USING query expression. Because the query predicate references columns cid from both tables, the column names are fully qualified:

=> CREATE TABLE public.orderFact
(
...
cid int REFERENCES public.custDim(cid),
cust_name varchar(20) SET USING (
    SELECT name FROM public.custDim WHERE (custDIM.cid = orderFact.cid)),
...
)

Examples

Derive a column's default value from another column

  1. Create table t with two columns, date and state, and insert a row of data:
=> CREATE TABLE t (date DATE, state VARCHAR(2));
CREATE TABLE
=> INSERT INTO t VALUES (CURRENT_DATE, 'MA');
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMMIT
SELECT * FROM t;
    date    | state
------------+-------
 2017-12-28 | MA
(1 row)
  1. Use ALTER TABLE to add a third column that extracts the integer month value from column date:
=> ALTER TABLE t ADD COLUMN month INTEGER DEFAULT date_part('month', date);
ALTER TABLE
  1. When you query table t, Vertica returns the number of the month in column date:
=> SELECT * FROM t;
    date    | state | month
------------+-------+-------
 2017-12-28 | MA    |    12
(1 row)

Update default column values

  1. Update table t by subtracting 30 days from date:
=> UPDATE t SET date = date-30;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM t;
    date    | state | month
------------+-------+-------
 2017-11-28 | MA    |    12
(1 row)

The value in month remains unchanged.

  1. Refresh the default value in month from column date:
=> UPDATE t SET month=DEFAULT;
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM t;
    date    | state | month
------------+-------+-------
 2017-11-28 | MA    |    11
(1 row)

Derive a default column value from user-defined scalar function

This example shows a user-defined scalar function that adds two integer values. The function is called add2ints and takes two arguments.

  1. Develop and deploy the function, as described in Scalar functions (UDSFs).

  2. Create a sample table, t1, with two integer columns:

=> CREATE TABLE t1 ( x int, y int );
CREATE TABLE
  1. Insert some values into t1:
=> insert into t1 values (1,2);
OUTPUT
--------
      1
(1 row)
=> insert into t1 values (3,4);
 OUTPUT
--------
      1
(1 row)
  1. Use ALTER TABLE to add a column to t1, with the default column value derived from the UDSF add2ints:
alter table t1 add column z int default add2ints(x,y);
ALTER TABLE
  1. List the new column:
select z from t1;
 z
----
  3
  7
(2 rows)

Table with a SET USING column that queries another table for its values

  1. Define tables t1 and t2. Column t2.b is defined to get its data from column t1.b, through the query in its SET USING clause:
=> CREATE TABLE t1 (a INT PRIMARY KEY ENABLED, b INT);
CREATE TABLE

=> CREATE TABLE t2 (a INT, alpha VARCHAR(10),
      b INT SET USING (SELECT t1.b FROM t1 WHERE t1.a=t2.a))
      ORDER BY a SEGMENTED BY HASH(a) ALL NODES;
CREATE TABLE
  1. Populate the tables with data:
=> INSERT INTO t1 VALUES(1,11);
=> INSERT INTO t1 VALUES(2,22);
=> INSERT INTO t1 VALUES(3,33);
=> INSERT INTO t1 VALUES(4,44);
=> INSERT INTO t2 VALUES(1,'aa');
=> INSERT INTO t2 VALUES(2,'bb');
=> COMMIT;
COMMIT
  1. View the data in table t2: Column in SET USING column b is empty, pending invocation of Vertica function REFRESH_COLUMNS:
=> SELECT * FROM t2;
 a | alpha | b
---+-------+---
 1 | aa    |
 2 | bb    |
(2 rows)
  1. Refresh the column data in table t2 by calling function REFRESH_COLUMNS:
=> SELECT REFRESH_COLUMNS ('t2','b', 'REBUILD');
      REFRESH_COLUMNS
---------------------------
 refresh_columns completed
(1 row)

In this example, REFRESH_COLUMNS is called with the optional argument REBUILD. This argument specifies to replace all data in SET USING column b. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column. For details, see REFRESH_COLUMNS.

  1. View data in refreshed column b, whose data is obtained from table t1 as specified in the column's SET USING query:
=> SELECT * FROM t2 ORDER BY a;
  a | alpha | b
---+-------+----
 1 | aa    | 11
 2 | bb    | 22
(2 rows)

Expressions with correlated subqueries

DEFAULT and SET USING expressions support subqueries that can obtain values from other tables, and use those with values in the current table to compute column values. The following example adds a column gmt_delivery_time to fact table customer_orders. The column specifies a DEFAULT expression to set values in the new column as follows:

  1. Calls meta-function NEW_TIME, which performs the following tasks:
  • Uses customer keys in customer_orders to query the customers dimension table for customer time zones.

  • Uses the queried time zone data to convert local delivery times to GMT.

  1. Populates the gmt_delivery_time column with the converted values.
=> CREATE TABLE public.customers(
customer_key int,
customer_name varchar(64),
customer_address varchar(64),
customer_tz varchar(5),
...);

=> CREATE TABLE public.customer_orders(
customer_key int,
order_number int,
product_key int,
product_version int,
quantity_ordered int,
store_key int,
date_ordered date,
date_shipped date,
expected_delivery_date date,
local_delivery_time timestamptz,
...);

=> ALTER TABLE customer_orders ADD COLUMN gmt_delivery_time timestamp
   DEFAULT NEW_TIME(customer_orders.local_delivery_time,
                (SELECT c.customer_tz FROM customers c WHERE (c.customer_key = customer_orders.customer_key)),
                'GMT');

5 - Altering table definitions

You can modify a table's definition with ALTER TABLE , in response to evolving database schema requirements.

You can modify a table's definition with ALTER TABLE , in response to evolving database schema requirements. Changing a table definition is often more efficient than staging data in a temporary table, consuming fewer resources and less storage.

See also

5.1 - Adding table columns

You add a column to a persistent table with ALTER TABLE...ADD COLUMN:.

You add a column to a persistent table with ALTER TABLE...ADD COLUMN:

ALTER TABLE
...
ADD COLUMN [IF NOT EXISTS] column datatype
  [column-constraint]
  [ENCODING encoding-type]
  [PROJECTIONS (projections-list) | ALL PROJECTIONS ]

Table locking

When you use ADD COLUMN to alter a table, Vertica takes an O lock on the table until the operation completes. The lock prevents DELETE, UPDATE, INSERT, and COPY statements from accessing the table. The lock also blocks SELECT statements issued at SERIALIZABLE isolation level, until the operation completes.

Adding a column to a table does not affect K-safety of the physical schema design.

You can add columns when nodes are down.

Adding new columns to projections

When you add a column to a table, Vertica automatically adds the column to superprojections of that table. The ADD...COLUMN clause can also specify to add the column to one or more non-superprojections, with one of these options:

  • PROJECTIONS (projections-list): Adds the new column to one or more projections of this table, specified as a comma-delimted list of projection base names. Vertica adds the column to all buddies of each projection. The projection list cannot include projections with pre-aggregated data such as live aggregate projections; otherwise, Vertica rolls back the ALTER TABLE statement.

  • ALL PROJECTIONS adds the column to all projections of this table, excluding projections with pre-aggregated data.

For example, the store_orders table has two projections—superprojection store_orders_super, and user-created projection store_orders_p. The following ALTER TABLE...ADD COLUMN statement adds column expected_ship_date to the store_orders table. Because the statement omits the PROJECTIONS option, Vertica adds the column only to the table's superprojection:

=> ALTER TABLE public.store_orders ADD COLUMN expected_ship_date date;
ALTER TABLE
=> SELECT projection_column_name, projection_name FROM projection_columns WHERE table_name ILIKE 'store_orders'
     ORDER BY projection_name , projection_column_name;
 projection_column_name |  projection_name
------------------------+--------------------
 order_date             | store_orders_p_b0
 order_no               | store_orders_p_b0
 ship_date              | store_orders_p_b0
 order_date             | store_orders_p_b1
 order_no               | store_orders_p_b1
 ship_date              | store_orders_p_b1
 expected_ship_date     | store_orders_super
 order_date             | store_orders_super
 order_no               | store_orders_super
 ship_date              | store_orders_super
 shipper                | store_orders_super
(11 rows)

The following ALTER TABLE...ADD COLUMN statement includes the PROJECTIONS option. This specifies to include projection store_orders_p in the add operation. Vertica adds the new column to this projection and the table's superprojection:

=> ALTER TABLE public.store_orders ADD COLUMN delivery_date date PROJECTIONS (store_orders_p);
=> SELECT projection_column_name, projection_name FROM projection_columns WHERE table_name ILIKE 'store_orders'
     ORDER BY projection_name, projection_column_name;
 projection_column_name |  projection_name
------------------------+--------------------
 delivery_date          | store_orders_p_b0
 order_date             | store_orders_p_b0
 order_no               | store_orders_p_b0
 ship_date              | store_orders_p_b0
 delivery_date          | store_orders_p_b1
 order_date             | store_orders_p_b1
 order_no               | store_orders_p_b1
 ship_date              | store_orders_p_b1
 delivery_date          | store_orders_super
 expected_ship_date     | store_orders_super
 order_date             | store_orders_super
 order_no               | store_orders_super
 ship_date              | store_orders_super
 shipper                | store_orders_super
(14 rows)

Updating associated table views

Adding new columns to a table that has an associated view does not update the view's result set, even if the view uses a wildcard (*) to represent all table columns. To incorporate new columns, you must recreate the view.

5.2 - Dropping table columns

ALTER TABLE...DROP COLUMN drops the specified table column and the ROS containers that correspond to the dropped column:.

[ALTER TABLE...DROP COLUMN](/en/sql-reference/statements/alter-statements/alter-table/) drops the specified table column and the ROS containers that correspond to the dropped column:

ALTER TABLE [schema.]table DROP [ COLUMN ] [IF EXISTS] column [CASCADE | RESTRICT]

After the drop operation completes, data backed up from the current epoch onward recovers without the column. Data recovered from a backup that precedes the current epoch re-add the table column. Because drop operations physically purge object storage and catalog definitions (table history) from the table, AT EPOCH (historical) queries return nothing for the dropped column.

The altered table retains its object ID.

Restrictions

  • You cannot drop or alter a primary key column or a column that participates in the table partitioning clause.

  • You cannot drop the first column of any projection sort order, or columns that participate in a projection segmentation expression.

  • In Enterprise Mode, all nodes must be up. This restriction does not apply to Eon mode.

  • You cannot drop a column associated with an access policy. Attempts to do so produce the following error:
    ERROR 6482: Failed to parse Access Policies for table "t1"

Using CASCADE to force a drop

If the table column to drop has dependencies, you must qualify the DROP COLUMN clause with the CASCADE option. For example, the target column might be specified in a projection sort order. In this and other cases, DROP COLUMN...CASCADE handles the dependency by reorganizing catalog definitions or dropping a projection. In all cases, CASCADE performs the minimal reorganization required to drop the column.

Use CASCADE to drop a column with the following dependencies:

Dropped column dependency CASCADE behavior
Any constraint Vertica drops the column when a FOREIGN KEY constraint depends on a UNIQUE or PRIMARY KEY constraint on the referenced columns.
Specified in projection sort order Vertica truncates projection sort order up to and including the projection that is dropped without impact on physical storage for other columns and then drops the specified column. For example if a projection's columns are in sort order (a,b,c), dropping column b causes the projection's sort order to be just (a), omitting column (c).
Specified in a projection segmentation expression The column to drop is integral to the projection definition. If possible, Vertica drops the projection as long as doing so does not compromise K-safety; otherwise, the transaction rolls back.
Referenced as default value of another column See Dropping a Column Referenced as Default, below.

Dropping a column referenced as default

You might want to drop a table column that is referenced by another column as its default value. For example, the following table is defined with two columns, a and b:, where b gets its default value from column a:

=> CREATE TABLE x (a int) UNSEGMENTED ALL NODES;
CREATE TABLE
=> ALTER TABLE x ADD COLUMN b int DEFAULT a;
ALTER TABLE

In this case, dropping column a requires the following procedure:

  1. Remove the default dependency through ALTER COLUMN..DROP DEFAULT:

    => ALTER TABLE x ALTER COLUMN b DROP DEFAULT;
    
  2. Create a replacement superprojection for the target table if one or both of the following conditions is true:

    • The target column is the table's first sort order column. If the table has no explicit sort order, the default table sort order specifies the first table column as the first sort order column. In this case, the new superprojection must specify a sort order that excludes the target column.

    • If the table is segmented, the target column is specified in the segmentation expression. In this case, the new superprojection must specify a segmentation expression that excludes the target column.

    Given the previous example, table x has a default sort order of (a,b). Because column a is the table's first sort order column, you must create a replacement superprojection that is sorted on column b:

    => CREATE PROJECTION x_p1 as select * FROM x ORDER BY b UNSEGMENTED ALL NODES;
    
  3. Run START_REFRESH:

    
    => SELECT START_REFRESH();
                  START_REFRESH
    ----------------------------------------
     Starting refresh background process.
    
    (1 row)
    
  4. Run MAKE_AHM_NOW:

    => SELECT MAKE_AHM_NOW();
             MAKE_AHM_NOW
    -------------------------------
     AHM set (New AHM Epoch: 1231)
    (1 row)
    
  5. Drop the column:

    => ALTER TABLE x DROP COLUMN a CASCADE;
    

Vertica implements the CASCADE directive as follows:

  • Drops the original superprojection for table x (x_super).

  • Updates the replacement superprojection x_p1 by dropping column a.

Examples

The following series of commands successfully drops a BYTEA data type column:

=> CREATE TABLE t (x BYTEA(65000), y BYTEA, z BYTEA(1));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624:  Column "y" does not exist
=> ALTER TABLE t DROP COLUMN x RESTRICT;
ALTER TABLE
=> SELECT x FROM t;
ERROR 2624:  Column "x" does not exist
=> SELECT * FROM t;
 z
---
(0 rows)
=> DROP TABLE t CASCADE;
DROP TABLE

The following series of commands tries to drop a FLOAT(8) column and fails because there are not enough projections to maintain K-safety.

=> CREATE TABLE t (x FLOAT(8),y FLOAT(08));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y RESTRICT;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624:  Column "y" does not exist
=> ALTER TABLE t DROP x CASCADE;
ROLLBACK 2409:  Cannot drop any more columns in t
=> DROP TABLE t CASCADE;

5.3 - Altering constraint enforcement

ALTER TABLE...ALTER CONSTRAINT can enable or disable enforcement of primary key, unique, and check constraints.

ALTER TABLE...ALTER CONSTRAINT can enable or disable enforcement of primary key, unique, and check constraints. You must qualify this clause with the keyword ENABLED or DISABLED:

  • ENABLED enforces the specified constraint.

  • DISABLED disables enforcement of the specified constraint.

For example:

ALTER TABLE public.new_sales ALTER CONSTRAINT C_PRIMARY ENABLED;

For details, see Constraint enforcement.

5.4 - Renaming tables

ALTER TABLE...RENAME TO renames one or more tables.

ALTER TABLE...RENAME TO renames one or more tables. Renamed tables retain their original OIDs.

You rename multiple tables by supplying two comma-delimited lists. Vertica maps the names according to their order in the two lists. Only the first list can qualify table names with a schema. For example:

=> ALTER TABLE S1.T1, S1.T2 RENAME TO U1, U2;

The RENAME TO parameter is applied atomically: all tables are renamed, or none of them. For example, if the number of tables to rename does not match the number of new names, none of the tables is renamed.

Using rename to swap tables within a schema

You can use ALTER TABLE...RENAME TO to swap tables within the same schema, without actually moving data. You cannot swap tables across schemas.

The following example swaps the data in tables T1 and T2 through intermediary table temp:

  1. t1 to temp

  2. t2 to t1

  3. temp to t2

=> DROP TABLE IF EXISTS temp, t1, t2;
DROP TABLE
=> CREATE TABLE t1 (original_name varchar(24));
CREATE TABLE
=> CREATE TABLE t2 (original_name varchar(24));
CREATE TABLE
=> INSERT INTO t1 VALUES ('original name t1');
 OUTPUT
--------
      1
(1 row)

=> INSERT INTO t2 VALUES ('original name t2');
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> ALTER TABLE t1, t2, temp RENAME TO temp, t1, t2;
ALTER TABLE
=> SELECT * FROM t1, t2;
  original_name   |  original_name
------------------+------------------
 original name t2 | original name t1
(1 row)

5.5 - Moving tables to another schema

ALTER TABLE...SET SCHEMA moves a table from one schema to another.

ALTER TABLE...SET SCHEMA moves a table from one schema to another. Vertica automatically moves all projections that are anchored to the source table to the destination schema. It also moves all IDENTITY and AUTO_INCREMENT columns to the destination schema.

Moving a table across schemas requires that you have USAGE privileges on the current schema and CREATE privileges on destination schema. You can move only one table between schemas at a time. You cannot move temporary tables across schemas.

Name conflicts

If a table of the same name or any of the projections that you want to move already exist in the new schema, the statement rolls back and does not move either the table or any projections. To work around name conflicts:

  1. Rename any conflicting table or projections that you want to move.

  2. Run ALTER TABLE...SET SCHEMA again.

Example

The following example moves table T1 from schema S1 to schema S2. All projections that are anchored on table T1 automatically move to schema S2:

=> ALTER TABLE S1.T1 SET SCHEMA S2;

5.6 - Changing table ownership

As a superuser or table owner, you can reassign table ownership with ALTER TABLE...OWNER TO, as follows:.

As a superuser or table owner, you can reassign table ownership with ALTER TABLE...OWNER TO, as follows:

ALTER TABLE [schema.]table-name OWNER TO owner-name

Changing table ownership is useful when moving a table from one schema to another. Ownership reassignment is also useful when a table owner leaves the company or changes job responsibilities. Because you can change the table owner, the tables won't have to be completely rewritten, you can avoid loss in productivity.

Changing table ownership automatically causes the following changes:

  • Grants on the table that were made by the original owner are dropped and all existing privileges on the table are revoked from the previous owner. Changes in table ownership has no effect on schema privileges.

  • Ownership of dependent IDENTITY/AUTO-INCREMENT sequences are transferred with the table. However, ownership does not change for named sequences created with CREATE SEQUENCE. To transfer ownership of these sequences, use ALTER SEQUENCE.

  • New table ownership is propagated to its projections.

Example

In this example, user Bob connects to the database, looks up the tables, and transfers ownership of table t33 from himself to user Alice.

=> \c - Bob
You are now connected as user "Bob".
=> \d
 Schema |  Name  | Kind  |  Owner  | Comment
--------+--------+-------+---------+---------
 public | applog | table | dbadmin |
 public | t33    | table | Bob     |
(2 rows)
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE

When Bob looks up database tables again, he no longer sees table t33:

=> \d                List of tables
               List of tables
 Schema |  Name  | Kind  |  Owner  | Comment
--------+--------+-------+---------+---------
 public | applog | table | dbadmin |
(1 row)

When user Alice connects to the database and looks up tables, she sees she is the owner of table t33.

=> \c - Alice
You are now connected as user "Alice".
=> \d
             List of tables
 Schema | Name | Kind  | Owner | Comment
--------+------+-------+-------+---------
 public | t33  | table | Alice |
(2 rows)

Alice or a superuser can transfer table ownership back to Bob. In the following case a superuser performs the transfer.

=> \c - dbadmin
You are now connected as user "dbadmin".
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> \d
                List of tables
 Schema |   Name   | Kind  |  Owner  | Comment
--------+----------+-------+---------+---------
 public | applog   | table | dbadmin |
 public | comments | table | dbadmin |
 public | t33      | table | Bob     |
 s1     | t1       | table | User1   |
(4 rows)

You can also query system table V_CATALOG.TABLES to view table and owner information. Note that a change in ownership does not change the table ID.

In the below series of commands, the superuser changes table ownership back to Alice and queries the TABLES system table.


=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273724576 | Alice
(5 rows)

Now the superuser changes table ownership back to Bob and queries the TABLES table again. Nothing changes but the owner_name row, from Alice to Bob.

=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273714428 | Bob
(5 rows)

6 - Sequences

Sequences let you set the default values of columns to sequential integer values.

Sequences let you set the default values of columns to sequential integer values. Sequences guarantee uniqueness, and avoid constraint enforcement problems and overhead. Sequences are especially useful for primary key columns.

While sequence object values are guaranteed to be unique, they are not guaranteed to be contiguous, so you might interpret the returned values as missing. For example, two nodes can increment a sequence at different rates. The node with a heavier processing load increments the sequence, but the values are not contiguous with those being incremented on a node with less processing.

Vertica supports the following sequence types:

  • Named sequences are database objects that generates unique numbers in sequential ascending or descending order. Named sequences are defined independently through CREATE SEQUENCE statements, and are managed independently of the tables that reference them. A table can set the default values of one or more columns to named sequences.

  • AUTO_INCREMENT/IDENTITY column sequences: Column constraints AUTO_INCREMENT and IDENTITY are synonyms that specify to increment or decrement a column's value as new rows are added. This sequence type is table-dependent and does not persist independently. A table can contain only one AUTO_INCREMENT or IDENTITY column.

6.1 - Sequence types compared

The following table lists the differences between the two sequence types:.

The following table lists the differences between the two sequence types:

Supported Behavior Named Sequence AUTO_INCREMENT/IDENTITY
Default cache value 250K
Set initial cache
Define start value
Specify increment unit
Exists as an independent object
Exists only as part of table
Create as column constraint
Requires name
Use in expressions
Unique across tables
Change parameters
Move to different schema
Set to increment or decrement
Grant privileges to object
Specify minimum value
Specify maximum value

6.2 - Named sequences

Named sequences are sequences that are defined by CREATE SEQUENCE.

Named sequences are sequences that are defined by CREATE SEQUENCE. While you can set the value of a table column to a named sequence, a named sequence, unlike AUTO_INCREMENT and IDENTITY sequences, exists independently of the table.

Named sequences are used most often when an application requires a unique identifier in a table or an expression. After a named sequence returns a value, it never returns the same value again in the same session.

6.2.1 - Creating and using named sequences

You create a named sequence with CREATE SEQUENCE.

You create a named sequence with CREATE SEQUENCE. The statement requires only a sequence name; all other parameters are optional. To create a sequence, a user must have CREATE privileges on a schema that contains the sequence.

The following example creates an ascending named sequence, my_seq, starting at the value 100:

=> CREATE SEQUENCE my_seq START 100;
CREATE SEQUENCE

Incrementing and decrementing a sequence

When you create a named sequence object, you can also specify its increment or decrement value by setting its INCREMENT parameter. If you omit this parameter, as in the previous example, the default is set to 1.

You increment or decrement a sequence by calling the function NEXTVAL on it—either directly on the sequence itself, or indirectly by adding new rows to a table that references the sequence. When called for the first time on a new sequence, NEXTVAL initializes the sequence to its start value. Vertica also creates a cache for the sequence. Subsequent NEXTVAL calls on the sequence increment its value.

The following call to NEXTVAL initializes the new my_seq sequence to 100:

=> SELECT NEXTVAL('my_seq');
 nextval
---------
     100
(1 row)

Getting a sequence's current value

You can obtain the current value of a sequence by calling CURRVAL on it. For example:

=> SELECT CURRVAL('my_seq');
 CURRVAL
---------
     100
(1 row)

Referencing sequences in tables

A table can set the default values of any column to a named sequence. The table creator must have the following privileges: SELECT on the sequence, and USAGE on its schema.

In the following example, column id gets its default values from named sequence my_seq:

=> CREATE TABLE customer(id INTEGER DEFAULT my_seq.NEXTVAL,
  lname VARCHAR(25),
  fname VARCHAR(25),
  membership_card INTEGER
);

For each row that you insert into table customer, the sequence invokes the NEXTVAL function to set the value of the id column. For example:

=> INSERT INTO customer VALUES (default, 'Carr', 'Mary', 87432);
=> INSERT INTO customer VALUES (default, 'Diem', 'Nga', 87433);
=> COMMIT;

For each row, the insert operation invokes NEXTVAL on the sequence my_seq, which increments the sequence to 101 and 102, and sets the id column to those values:

=> SELECT * FROM customer;
 id  | lname | fname | membership_card
-----+-------+-------+-----------------
 101 | Carr  | Mary  |           87432
 102 | Diem  | Nga   |           87433
(1 row)

6.2.2 - Distributing named sequences

When you create a named sequence, its CACHE parameter determines the number of sequence values each node maintains during a session.

When you create a named sequence, its CACHE parameter determines the number of sequence values each node maintains during a session. The default cache value is 250K, so each node reserves 250,000 values per session for each sequence. The default cache size provides an efficient means for large insert or copy operations.

If sequence caching is set to a low number, nodes are liable to request a new set of cache values more frequently. While it supplies a new cache, Vertica must lock the catalog. Until Vertica releases the lock, other database activities such as table inserts are blocked, which can adversely affect overall performance.

When a new session starts, node caches are initially empty. By default, the initiator node requests and reserves cache for all nodes in a cluster. You can change this default so each node requests its own cache, by setting configuration parameter ClusterSequenceCacheMode to 0.

For information on how Vertica requests and distributes cache among all nodes in a cluster, refer to Sequence caching.

Effects of distributed sessions

Vertica distributes a session across all nodes. The first time a cluster node calls the function NEXTVAL on a sequence to increment (or decrement) its value, the node requests its own cache of sequence values. The node then maintains that cache for the current session. As other nodes call NEXTVAL, they too create and maintain their own cache of sequence values.

During a session, nodes call NEXTVAL independently and at different frequencies. Each node uses its own cache to populate the sequence. All sequence values are guaranteed to be unique, but can be out of order with a NEXTVAL statement executed on another node. As a result, sequence values are often non-contiguous.

In all cases, increments a sequence only once per row. Thus, if the same sequence is referenced by multiple columns, NEXTVAL sets all columns in that row to the same value. This applies to rows of joined tables.

Calculating named sequences

Vertica calculates the current value of a sequence as follows:

  • At the end of every statement, the state of all sequences used in the session is returned to the initiator node.

  • The initiator node calculates the maximum CURRVAL of each sequence across all states on all nodes.

  • This maximum value is used as CURRVAL in subsequent statements until another NEXTVAL is invoked.

Losing sequence values

Sequence values in cache can be lost in the following situations:

  • If a statement fails after NEXTVAL is called (thereby consuming a sequence value from the cache), the value is lost.

  • If a disconnect occurs (for example, dropped session), any remaining values in cache that have not been returned through NEXTVAL are lost.

  • When the initiator node distributes a new block of cache to each node where one or more nodes has not used up its current cache allotment. For information on this scenario, refer to Sequence caching.

You can recover lost sequence values by using ALTER SEQUENCE...RESTART, which resets the sequence to the specified value in the next session.

6.2.3 - Altering sequences

ALTER SEQUENCE can change a named sequence in two ways:.

ALTER SEQUENCE can change a named sequence in two ways:

  • Reset parameters that control sequence behavior—for example, its start value, or range of minimum and maximum values. These changes take effect only when you start a new database session.

  • Reset sequence name, schema, or ownership. These changes take effect immediately.

Changing sequence behavior

ALTER SEQUENCE can change one or more sequence attributes through the following parameters:

These parameters... Control...
INCREMENT How much to increment or decrement the sequence on each call to NEXTVAL.
MINVALUE/MAXVALUE The range of valid integers.
RESTART The sequence value on its next call to NEXTVAL.
CACHE/NO CACHE How many sequence numbers are pre-allocated and stored in memory for faster access.
CYCLE/NO CYCLE Whether the sequence wraps when its minimum or maximum values are reached.

These changes take effect only when you start a new database session. For example, if you create a named sequence my_sequence that starts at 10 and increments by 1 (the default), each sequence call to NEXTVAL increments its value by 1:

=> CREATE SEQUENCE my_sequence START 10;
=> SELECT NEXTVAL('my_sequence');
 nextval
---------
      10
(1 row)
=> SELECT NEXTVAL('my_sequence');
 nextval
---------
      11
(1 row)

The following ALTER SEQUENCE statement specifies to restart the sequence at 50:

=>ALTER SEQUENCE my_sequence RESTART WITH 50;

However, this change has no effect in the current session. The next call to NEXTVAL increments the sequence to 12:

=> SELECT NEXTVAL('my_sequence');
 NEXTVAL
---------
      12
(1 row)

The sequence restarts at 50 only after you start a new database session:

=> \q
$ vsql
Welcome to vsql, the Vertica Analytic Database interactive terminal.

=> SELECT NEXTVAL('my_sequence');
 NEXTVAL
---------
      50
(1 row)

Changing sequence name, schema, and ownership

You can use ALTER SEQUENCE to make the following changes to a named sequence:

  • Rename it.

  • Move it to another schema.

  • Reassign ownership.

Each of these changes requires separate ALTER SEQUENCE statements. These changes take effect immediately.

For example, the following statement renames a sequence from my_seq to serial:

=> ALTER SEQUENCE s1.my_seq RENAME TO s1.serial;

This statement moves sequence s1.serial to schema s2:

=> ALTER SEQUENCE s1.my_seq SET SCHEMA TO s2;

The following statement reassigns ownership of s2.serial to another user:

=> ALTER SEQUENCE s2.serial OWNER TO bertie;

6.2.4 - Dropping sequences

Use DROP SEQUENCE to remove a named sequence.

Use DROP SEQUENCE to remove a named sequence. For example:

=> DROP SEQUENCE my_sequence;

You cannot drop a sequence if one of the following conditions is true:

  • Other objects depend on the sequence. DROP SEQUENCE does not support cascade operations.

  • A column's DEFAULT expression references the sequence. Before dropping the sequence, you must remove all column references to it.

6.3 - AUTO_INCREMENT and IDENTITY sequences

Column constraints AUTO_INCREMENT and IDENTITY are synonyms that associate a column with a sequence.

Column constraints AUTO_INCREMENT and IDENTITY are synonyms that associate a column with a sequence. This sequence automatically increments the column value as new rows are added.

You define an AUTO_INCREMENT/IDENTITY column in a table as follows:

CREATE TABLE table-name...
  (column-name {AUTO_INCREMENT | IDENTITY} [(args)], ...)

where args is 1 to 3 optional arguments that let you control sequence behavior (see Arguments below).

AUTO_INCREMENT/IDENTITY sequences are owned by the table in which they are defined, and do not exist outside that table. Unlike named sequences, you cannot manage an AUTO_INCREMENT/IDENTITY sequence with ALTER SEQUENCE. For example, you cannot change the schema of an AUTO_INCREMENT/IDENTITY sequence independently of its table. If you move the table to another schema, the sequence automatically moves with it.

You can obtain the last value generated for an AUTO_INCREMENT/IDENTITY sequence by calling Vertica meta-function LAST_INSERT_ID.

Arguments

AUTO_INCREMENT/``IDENTITY constraints can take between 0 and three arguments. These arguments let you specify the column's start value, how much it increments or decrements, and how many unique numbers each node caches per session.

You specify these arguments as follows:

# arguments Description
None

The following default settings apply:

  • The starting value is 1.

  • Values increment by at least 1.

  • Each node caches 250,000 unique numbers per session for this sequence.

1

Specifies how many unique numbers each node can cache per session, as follows:

  • >1 specifies how many unique numbers each node caches per session.

  • 0 or 1 specifies to disable caching.

IDENTITY(cache)

Default: 250,000

2 or 3

Set as follows:

{AUTO_INCREMENT | IDENTITY} (start, increment[, cache)] )

  • start: The first value that is set for this column.

    Default: 1

  • increment: A positive or negative integer that specifies the minimum amount to increment or decrement the column value from its value in the previous row.

    Default: 1

  • cache: One of the following:

    • >1 specifies how many unique numbers each node can cache per session for this sequence.

    • 0 or 1 specifies to disable caching.

    Default: 250,000

    For details, see Sequence caching.

Restrictions

The following restrictions apply to AUTO_INCREMENT/IDENTITY columns:

  • A table can contain only one AUTO_INCREMENT/IDENTITY column.

  • AUTO_INCREMENT/``IDENTITY values are never rolled back, even if a transaction that tries to insert a value into a table is not committed.

  • You cannot change the value of an AUTO_INCREMENT/IDENTITY column.

Examples

The following example shows how to use the IDENTITY column-constraint to create a table with an ID column. The ID column has an initial value of 1. It is incremented by 1 every time a row is inserted.

  1. Create table Premium_Customer:

    => CREATE TABLE Premium_Customer(
         ID IDENTITY(1,1),
         lname VARCHAR(25),
         fname VARCHAR(25),
         store_membership_card INTEGER
    );
    => INSERT INTO Premium_Customer (lname, fname, store_membership_card )
         VALUES ('Gupta', 'Saleem', 475987);
    

    The IDENTITY column has a seed of 1, which specifies the value for the first row loaded into the table, and an increment of 1, which specifies the value that is added to the IDENTITY value of the previous row.

  2. Confirm the row you added and see the ID value:

    => SELECT * FROM Premium_Customer;
     ID | lname | fname  | store_membership_card
    ----+-------+--------+-----------------------
      1 | Gupta | Saleem |                475987
    (1 row)
    
  3. Add another row:

    => INSERT INTO Premium_Customer (lname, fname, store_membership_card)
       VALUES ('Lee', 'Chen', 598742);
    
  4. Call the Vertica function LAST_INSERT_ID. The function returns value 2 because you previously inserted a new customer (Chen Lee), and this value is incremented each time a row is inserted:

    => SELECT LAST_INSERT_ID();
     last_insert_id
    ----------------
                   2
    (1 row)
    
  5. View all the ID values in the Premium_Customer table:

    => SELECT * FROM Premium_Customer;
     ID | lname | fname  | store_membership_card
    ----+-------+--------+-----------------------
      1 | Gupta | Saleem |                475987
      2 | Lee   | Chen   |                598742
    (2 rows)
    

The next three examples illustrate the three valid ways to use IDENTITY arguments.These examples are valid for the AUTO_INCREMENT argument also.

The first example uses a cache of 100, and the defaults for start value (1) and increment value (1):

=> CREATE TABLE t1(x IDENTITY(100), y INT);

The next example specifies the start and increment values as 1, and defaults to a cache value of 250,000:

=> CREATE TABLE t2(y IDENTITY(1,1), x INT);

The third example specifies start and increment values of 1, and a cache value of 100:

=> CREATE TABLE t3(z IDENTITY(1,1,100), zx INT);

6.4 - Sequence caching

Caching is similar for all sequence types: named sequences, identity sequences, and auto-increment sequences.

Caching is similar for all sequence types: named sequences, identity sequences, and auto-increment sequences. To allocate cache among the nodes in a cluster for a given sequences, Vertica uses the following process.

  1. By default, when a session begins, the cluster initiator node requests cache for itself and other nodes in the cluster.

  2. The initiator node distributes cache to other nodes when it distributes the execution plan.

  3. Because the initiator node requests caching for all nodes, only the initiator locks the global catalog for the cache request.

This approach is optimal for handling large INSERT-SELECT and COPY operations. The following figure shows how the initiator request and distributes cache for a named sequence in a three-node cluster, where caching for that sequence is set to 250 K:

Nodes run out of cache at different times. While executing the same query, nodes individually request additional cache as needed.

For new queries in the same session, the initiator might have an empty cache if it used all of its cache to execute the previous query execution. In this case, the initiator requests cache for all nodes.

You can change how nodes obtain sequence caches by setting the configuration parameter ClusterSequenceCacheMode to 0 (disabled). When this parameter is set to 0, all nodes in the cluster request their own cache and catalog lock. However, for initial large INSERT-SELECT and COPY operations, when the cache is empty for all nodes, each node requests cache at the same time. These multiple requests result in multiple simultaneous locks on the global catalog, which can adversely affect performance. For this reason, ClusterSequenceCacheMode should remain set to its default value of 1 (enabled).

The following example compares how different settings of ClusterSequenceCacheMode affect how Vertica manages sequence caching. The example assumes a three-node cluster, 250 K caches for each node (the default), and sequence ID values that increment by 1.

Workflow step ClusterSequenceCacheMode = 1 ClusterSequenceCacheMode = 0
1

Cache is empty for all nodes.

Initiator node requests 250 K cache for each node.

Cache is empty for all nodes.

Each node, including initiator, requests its own 250 K cache.

2

Blocks of cache are distributed to each node as follows:

  • Node 1: 0–250 K

  • Node 2: 250 K + 1 to 500 K

  • Node 3: 500 K + 1 to 750 K

Each node begins to use its cache as it processes sequence updates.

3

Initiator node and node 3 run out of cache.

Node 2 only uses 250 K +1 to 400 K, 100 K of cache remains from 400 K +1 to 500 K.

4

Executing same statement:

  • As each node uses up its cache, it requests a new cache allocation.

  • If node 2 never uses its cache, the 100-K unused cache becomes a gap in sequence IDs.

Executing a new statement in same session, if initiator node cache is empty:

  • It requests and distributes new cache blocks for all nodes.

  • Nodes receive a new cache before the old cache is used, creating a gap in ID sequencing.

Executing same or new statement:

  • As each node uses up its cache, it requests a new cache allocation.

  • If node 2 never uses its cache, the 100 K unused cache becomes a gap in sequence IDs.

7 - Merging table data

MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set.

MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set. The join can match a source row with only one target row; otherwise, Vertica returns an error.

MERGE has the following syntax:

MERGE INTO target-table USING source-dataset ON  join-condition
matching-clause[ matching-clause ]

Merge operations have at least three components:

7.1 - Basic MERGE example

In this example, a merge operation involves two tables:.

In this example, a merge operation involves two tables:

  • visits_daily logs daily restaurant traffic, and is updated with each customer visit. Data in this table is refreshed every 24 hours.

  • visits_history stores the history of customer visits to various restaurants, accumulated over an indefinite time span.

Each night, you merge the daily visit count from visits_daily into visits_history. The merge operation modifies the target table in two ways:

  • Updates existing customer data.

  • Inserts new rows of data for first-time customers.

One MERGE statement executes both operations as a single (upsert) transaction.

Source and target tables

The source and target tables visits_daily and visits_history are defined as follows:

CREATE TABLE public.visits_daily
(
    customer_id int,
    location_name varchar(20),
    visit_time time(0) DEFAULT (now())::timetz(6)
);

CREATE TABLE public.visits_history
(
    customer_id int,
    location_name varchar(20),
    visit_count int
);

Table visits_history contains rows of three customers who between them visited two restaurants, Etoile and LaRosa:

=> SELECT * FROM visits_history ORDER BY customer_id, location_name;
 customer_id | location_name | visit_count
-------------+---------------+-------------
        1001 | Etoile        |           2
        1002 | La Rosa       |           4
        1004 | Etoile        |           1
(3 rows)

By close of business, table visits_daily contains three rows of restaurant visits:

=> SELECT * FROM visits_daily ORDER BY customer_id, location_name;
 customer_id | location_name | visit_time
-------------+---------------+------------
        1001 | Etoile        | 18:19:29
        1003 | Lux Cafe      | 08:07:00
        1004 | La Rosa       | 11:49:20
(3 rows)

Table data merge

The following MERGE statement merges visits_daily data into visits_history:

  • For matching customers, MERGE updates the occurrence count.

  • For non-matching customers, MERGE inserts new rows.

=> MERGE INTO visits_history h USING visits_daily d
    ON (h.customer_id=d.customer_id AND h.location_name=d.location_name)
    WHEN MATCHED THEN UPDATE SET visit_count = h.visit_count  + 1
    WHEN NOT MATCHED THEN INSERT (customer_id, location_name, visit_count)
    VALUES (d.customer_id, d.location_name, 1);
 OUTPUT
--------
      3
(1 row)

MERGE returns the number of rows updated and inserted. In this case, the returned value specifies three updates and inserts:

  • Customer 1001's third visit to Etoile

  • New customer 1003's first visit to new restaurant Lux Cafe

  • Customer 1004's first visit to La Rosa

If you now query table visits_history, the result set shows the merged (updated and inserted) data. Updated and new rows are highlighted:

7.2 - MERGE source options

A MERGE operation joins the target table to one of the following data sources:.

A MERGE operation joins the target table to one of the following data sources:

  • Another table

  • View

  • Subquery result set

Merging from table and view data

You merge data from one table into another as follows:

MERGE INTO target-table USING { source-table | source-view } join-condition
   matching-clause[ matching-clause ]

If you specify a view, Vertica expands the view name to the query that it encapsulates, and uses the result set as the merge source data.

For example, the VMart table public.product_dimension contains current and discontinued products. You can move all discontinued products into a separate table public.product_dimension_discontinued, as follows:

=> CREATE TABLE public.product_dimension_discontinued (
     product_key int,
     product_version int,
     sku_number char(32),
     category_description char(32),
     product_description varchar(128));

=> MERGE INTO product_dimension_discontinued tgt
     USING product_dimension src ON tgt.product_key = src.product_key
                                AND tgt.product_version = src.product_version
     WHEN NOT MATCHED AND src.discontinued_flag='1' THEN INSERT VALUES
       (src.product_key,
        src.product_version,
        src.sku_number,
        src.category_description,
        src.product_description);
 OUTPUT
--------
   1186
(1 row)

Source table product_dimension uses two columns, product_key and product_version, to identify unique products. The MERGE statement joins the source and target tables on these columns in order to return single instances of non-matching rows. The WHEN NOT MATCHED clause includes a filter (src.discontinued_flag='1'), which reduces the result set to include only discontinued products. The remaining rows are inserted into target table product_dimension_discontinued.

Merging from a subquery result set

You can merge into a table the result set that is returned by a subquery, as follows:

MERGE INTO target-table USING (subquery) sq-alias join-condition
   matching-clause[ matching-clause ]

For example, the VMart table public.product_dimension is defined as follows (DDL truncated):

CREATE TABLE public.product_dimension
(
    product_key int NOT NULL,
    product_version int NOT NULL,
    product_description varchar(128),
    sku_number char(32),
    ...
)
ALTER TABLE public.product_dimension
    ADD CONSTRAINT C_PRIMARY PRIMARY KEY (product_key, product_version) DISABLED;

Columns product_key and product_version comprise the table's primary key. You can modify this table so it contains a single column that concatenates the values of these two columns. This column can be used to uniquely identify each product, while also maintaining the original values from product_key and product_version.

You populate the new column with a MERGE statement that queries the other two columns:

=> ALTER TABLE public.product_dimension ADD COLUMN product_ID numeric(8,2);
ALTER TABLE

=> MERGE INTO product_dimension tgt
     USING (SELECT (product_key||'.0'||product_version)::numeric(8,2) AS pid, sku_number
     FROM product_dimension) src
     ON tgt.product_key||'.0'||product_version::numeric=src.pid
     WHEN MATCHED THEN UPDATE SET product_ID = src.pid;
 OUTPUT
--------
  60000
(1 row)

The following query verifies that the new column values correspond to the values in product_key and product_version:

=> SELECT product_ID, product_key, product_version, product_description
   FROM product_dimension
   WHERE category_description = 'Medical'
     AND product_description ILIKE '%diabetes%'
     AND discontinued_flag = 1 ORDER BY product_ID;
 product_ID | product_key | product_version |           product_description
------------+-------------+-----------------+-----------------------------------------
    5836.02 |        5836 |               2 | Brand #17487 diabetes blood testing kit
   14320.02 |       14320 |               2 | Brand #43046 diabetes blood testing kit
   18881.01 |       18881 |               1 | Brand #56743 diabetes blood testing kit
(3 rows)

7.3 - MERGE matching clauses

MERGE supports one instance of the following matching clauses:.

MERGE supports one instance of the following matching clauses:

  • [WHEN MATCHED THEN UPDATE SET](#WHEN_MATCHED)
  • [WHEN NOT MATCHED THEN INSERT](#WHEN_NOT_MATCHED)

Each matching clause can specify an additional filter, as described in Update and insert filters.

WHEN MATCHED THEN UPDATE SET

Updates all target table rows that are joined to the source table, typically with data from the source table:

WHEN MATCHED [ AND update-filter ] THEN UPDATE
   SET { target-column = expression }[,...]

Vertica can execute the join only on unique values in the source table's join column. If the source table's join column contains more than one matching value, the MERGE statement returns with a run-time error.

WHEN NOT MATCHED THEN INSERT

WHEN NOT MATCHED THEN INSERT inserts into the target table a new row for each source table row that is excluded from the join:

WHEN NOT MATCHED [ AND insert-filter ] THEN INSERT
   [ ( column-list ) ] VALUES ( values-list )

column-list is a comma-delimited list of one or more target columns in the target table, listed in any order. MERGE maps column-list columns to values-list values in the same order, and each column-value pair must be compatible. If you omit column-list, Vertica maps values-list values to columns according to column order in the table definition.

For example, given the following source and target table definitions:

CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (x int, y int, z int);

The following WHEN NOT MATCHED clause implicitly sets the values of the target table columns a, b, and c in the newly inserted rows:

MERGE INTO t1 USING t2 ON t1.a=t2.x
   WHEN NOT MATCHED THEN INSERT VALUES (t2.x, t2.y, t2.z);

In contrast, the following WHEN NOT MATCHED clause excludes columns t1.b and t2.y from the merge operation. The WHEN NOT MATCHED clause explicitly pairs two sets of columns from the target and source tables: t1.a to t2.x, and t1.c to t2.z. Vertica sets excluded column t1.b. to null:

MERGE INTO t1 USING t2 ON t1.a=t2.x
   WHEN NOT MATCHED THEN INSERT (a, c) VALUES (t2.x, t2.z);

7.4 - Update and insert filters

Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:.

Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:

WHEN MATCHED AND update-filter THEN UPDATE ...
WHEN NOT MATCHED AND insert-filter THEN INSERT ...

Vertica also supports Oracle syntax for specifying update and insert filters:

WHEN MATCHED THEN UPDATE SET column-updates WHERE update-filter
WHEN NOT MATCHED THEN INSERT column-values WHERE insert-filter

Each filter can specify multiple conditions. Vertica handles the filters as follows:

  • An update filter is applied to the set of matching rows in the target table that are returned by the MERGE join. For each row where the update filter evaluates to true, Vertica updates the specified columns.

  • An insert filter is applied to the set of source table rows that are excluded from the MERGE join. For each row where the insert filter evaluates to true, Vertica adds a new row to the target table with the specified values.

For example, given the following data in tables t11 and t22:


=> SELECT * from t11 ORDER BY pk;
 pk | col1 | col2 | SKIP_ME_FLAG
----+------+------+--------------
  1 |    2 |    3 | t
  2 |    3 |    4 | t
  3 |    4 |    5 | f
  4 |      |    6 | f
  5 |    6 |    7 | t
  6 |      |    8 | f
  7 |    8 |      | t
(7 rows)

=> SELECT * FROM t22 ORDER BY pk;
 pk | col1 | col2
----+------+------
  1 |    2 |    4
  2 |    4 |    8
  3 |    6 |
  4 |    8 |   16
(4 rows)

You can merge data from table t11 into table t22 with the following MERGE statement, which includes update and insert filters:

=> MERGE INTO t22 USING t11 ON ( t11.pk=t22.pk )
   WHEN MATCHED
       AND t11.SKIP_ME_FLAG=FALSE AND (
         COALESCE (t22.col1<>t11.col1, (t22.col1 is null)<>(t11.col1 is null))
       )
   THEN UPDATE SET col1=t11.col1, col2=t11.col2
   WHEN NOT MATCHED
      AND t11.SKIP_ME_FLAG=FALSE
   THEN INSERT (pk, col1, col2) VALUES (t11.pk, t11.col1, t11.col2);
 OUTPUT
--------
      3
(1 row)

=> SELECT * FROM t22 ORDER BY pk;
 pk | col1 | col2
----+------+------
  1 |    2 |    4
  2 |    4 |    8
  3 |    4 |    5
  4 |      |    6
  6 |      |    8
(5 rows)

Vertica uses the update and insert filters as follows:

  • Evaluates all matching rows against the update filter conditions. Vertica updates each row where the following two conditions both evaluate to true:

    • Source column t11.SKIP_ME_FLAG is set to false.

    • The COALESCE function evaluates to true.

  • Evaluates all non-matching rows in the source table against the insert filter. For each row where column t11.SKIP_ME_FLAG is set to false, Vertica inserts a new row in the target table.

7.5 - MERGE optimization

You can improve MERGE performance in several ways:.

You can improve MERGE performance in several ways:

Projections for MERGE operations

The Vertica query optimizer automatically chooses the best projections to implement a merge operation. A good projection design strategy provides projections that help the query optimizer avoid extra sort and data transfer operations, and facilitate MERGE performance.

For example, the following MERGE statement fragment joins source and target tables tgt and src, respectively, on columns tgt.a and src.b:

=> MERGE INTO tgt USING src ON tgt.a = src.b ...

Vertica can use a local merge join if projections for tables tgt and src use one of the following projection designs, where inputs are presorted by projection ORDER BY clauses:

  • Replicated projections are sorted on:

    • Column a for table tgt

    • Column b for table src

  • Segmented projections are identically segmented on:

    • Column a for table tgt

    • Column b for table src

    • Corresponding segmented columns

Optimizing MERGE query plans

Vertica prepares an optimized query plan if the following conditions are all true:

  • The MERGE statement contains both matching clauses WHEN MATCHED THEN UPDATE SET and WHEN NOT MATCHED THEN INSERT. If the MERGE statement contains only one matching clause, it uses a non-optimized query plan.

  • The MERGE statement excludes update and insert filters.

  • The target table join column has a unique or primary key constraint. This requirement does not apply to the source table join column.

  • Both matching clauses specify all columns in the target table.

  • Both matching clauses specify identical source values.

For details on evaluating an EXPLAIN-generated query plan, see MERGE path.

The examples that follow use a simple schema to illustrate some of the conditions under which Vertica prepares or does not prepare an optimized query plan for MERGE:

CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a;
CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a;
INSERT INTO target VALUES(1,2,3);
INSERT INTO target VALUES(2,4,7);
INSERT INTO source VALUES(3,4,5);
INSERT INTO source VALUES(4,6,9);
COMMIT;

Optimized MERGE statement

Vertica can prepare an optimized query plan for the following MERGE statement because:

  • The target table's join column t.a has a primary key constraint.

  • All columns in the target table (a,b,c) are included in the UPDATE and INSERT clauses.

  • The UPDATE and INSERT clauses specify identical source values: s.a, s.b, and s.c.

MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);

 OUTPUT
--------
2
(1 row)

The output value of 2 indicates success and denotes the number of rows updated/inserted from the source into the target.

Non-optimized MERGE statement

In the next example, the MERGE statement runs without optimization because the source values in the UPDATE/INSERT clauses are not identical. Specifically, the UPDATE clause includes constants for columns s.a and s.c and the INSERT clause does not:


MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);

To make the previous MERGE statement eligible for optimization, rewrite the statement so that the source values in the UPDATE and INSERT clauses are identical:


MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c -1
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a + 1, s.b, s.c - 1);

7.6 - MERGE restrictions

The following restrictions apply to updating and inserting table data with MERGE.

The following restrictions apply to updating and inserting table data with MERGE.

Constraint enforcement

If primary key, unique key, or check constraints are enabled for automatic enforcement in the target table, Vertica enforces those constraints when you load new data. If a violation occurs, Vertica rolls back the operation and returns an error.

Columns prohibited from merge

The following columns cannot be specified in a merge operation; attempts to do so return with an error:

8 - Removing table data

Vertica provides several ways to remove data from a table:.

Vertica provides several ways to remove data from a table:

Delete operation Description
Drop a table Permanently remove a table and its definition, optionally remove associated views and projections.
Delete table rows Mark rows with delete vectors and store them so data can be rolled back to a previous epoch. The data must be purged to reclaim disk space.
Truncate table data Remove all storage and history associated with a table. The table structure is preserved for future use.
Purge data Permanently remove historical data from physical storage and free disk space for reuse.
Drop partitions Remove one more partitions from a table. Each partition contains a related subset of data in the table. Dropping partitioned data is efficient, and provides query performance benefits.

8.1 - Data removal operations compared

/need to include purge operations? or is that folded into DELETE operations?/.

The following table summarizes differences between various data removal operations.

Operations and options Performance Auto commits Saves history
DELETE FROM *table* Normal No Yes
DELETE FROM *temp-table* High No No

DELETE FROM table

where-clause

Normal No Yes

DELETE FROM temp-table

where-clause

Normal No Yes
DELETE FROM temp-table where-clause 
ON COMMIT PRESERVE ROWS
Normal No Yes
DELETE FROM temp-table where-clause 
ON COMMIT DELETE ROWS
High Yes No
DROP table High Yes No
TRUNCATE table High Yes No
TRUNCATE temp-table High Yes No
SELECT DROP_PARTITIONS (...) High Yes No

Choosing the best operation

The following table can help you decide which operation is best for removing table data:

If you want to... Use...
Delete both table data and definitions and start from scratch. DROP TABLE
Quickly drop data while preserving table definitions, and reload data. TRUNCATE TABLE
Regularly perform bulk delete operations on logical sets of data. DROP_PARTITIONS
Occasionally perform small deletes with the option to roll back or review history. DELETE

8.2 - Optimizing DELETE and UPDATE

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries.

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries. A DELETE and UPDATE operation must update all projections, so the operation can only be as fast as the slowest projection.

To improve the performance of DELETE and UPDATE queries, consider the following issues:

  • Query performance after large DELETE operations: Vertica's implementation of DELETE differs from traditional databases: it does not delete data from disk storage; rather, it marks rows as deleted so they are available for historical queries. Deletion of 10% or more of the total rows in a table can adversely affect queries on that table. In that case, consider purging those rows to improve performance.
  • Recovery performance: Recovery is the action required for a cluster to restore K-safety after a crash. Large numbers of deleted records can degrade the performance of a recovery. To improve recovery performance, purge the deleted rows.
  • Concurrency: DELETE and UPDATE take exclusive locks on the table. Only one DELETE or UPDATE transaction on a table can be in progress at a time and only when no load operations are in progress. Delete and update operations on different tables can run concurrently.

Projection column requirements for optimized delete

A projection is optimized for delete and update operations if it contains all columns required by the query predicate. In general, DML operations are significantly faster when performed on optimized projections than on non-optimized projections.

For example, consider the following table and projections:

=> CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
=> CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
=> CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;

In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because column a is available:

=> DELETE from t WHERE a = 1;

In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because the b column is not available in p2:

=> DELETE from t WHERE b = 1;

Optimized DELETE in subqueries

To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE statement's WHERE clause must be in the projection definition.

For example, the following simple schema has two tables and three projections:

=> CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
=> CREATE TABLE tb2 (g INT, h INT, i INT, j INT);

The first projection references all columns in tb1 and sorts on column a:

=> CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;

The buddy projection references and sorts on column a in tb1:

=> CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;

This projection references all columns in tb2 and sorts on column i:

=> CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;

Consider the following DML statement, which references tb1.a in its WHERE clause. Since both projections on tb1 contain column a, both are eligible for the optimized DELETE:

=> DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);

Restrictions

Optimized DELETE operations are not supported under the following conditions:

  • With replicated projections if subqueries reference the target table. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
    
  • With subqueries that do not return multiple rows. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
    

Projection sort order for optimizing DELETE

Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the sort order of all projections for large DELETE and UPDATE operations.

For example, suppose most of the DELETE queries you perform on a projection look like the following:

=> DELETE from t where time_key < '1-1-2007'

To optimize the delete operations, make time_key appear in the ORDER BY clause of all projections. This schema design results in better performance of the delete operation.

In addition, add sort columns to the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing sort order: best practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function.

8.3 - Purging deleted data

In Vertica, delete operations do not remove rows from physical storage.

In Vertica, delete operations do not remove rows from physical storage. DELETE marks rows as deleted, as does UPDATE, which combines delete and insert operations. In both cases, Vertica retains discarded rows as historical data, which remains accessible to historical queries until it is purged.

The cost of retaining historical data is twofold:

  • Disk space is allocated to deleted rows and delete markers.

  • Typical (non-historical) queries must read and skip over deleted data, which can impact performance.

A purge operation permanently removes historical data from physical storage and frees disk space for reuse. Only historical data that precedes the Ancient History Mark (AHM) is eligible to be purged.

You can purge data in two ways:

In both cases, Vertica purges all historical data up to and including the AHM epoch and resets the AHM. See Epochs for additional information about how Vertica uses epochs.

8.3.1 - Setting a purge policy

The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged.

The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs mergeout operations.

Vertica provides two methods for determining when deleted data is eligible to be purged:

  • Specifying the time for which delete data is saved

  • Specifying the number of epochs that are saved

Specifying the time for which delete data is saved

Specifying the time for which delete data is saved is the preferred method for determining which deleted data can be purged. By default, Vertica saves historical data only when nodes are down.

To change the specified time for saving deleted data, use the HistoryRetentionTime configuration parameter:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = {seconds | -1};

In the above syntax:

  • seconds is the amount of time (in seconds) for which to save deleted data.

  • -1 indicates that you do not want to use the HistoryRetentionTime configuration parameter to determine which deleted data is eligible to be purged. Use this setting if you prefer to use the other method (HistoryRetentionEpochs) for determining which deleted data can be purged.

The following example sets the history epoch retention level to 240 seconds:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = 240;

Specifying the number of epochs that are saved

Unless you have a reason to limit the number of epochs, Vertica recommends that you specify the time over which delete data is saved.

To specify the number of historical epoch to save through the HistoryRetentionEpochs configuration parameter:

  1. Turn off the HistoryRetentionTime configuration parameter:

    => ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
    
  2. Set the history epoch retention level through the HistoryRetentionEpochs configuration parameter:

    => ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = {num_epochs | -1};
    
    • num_epochs is the number of historical epochs to save.

    • -1 indicates that you do not want to use the HistoryRetentionEpochs configuration parameter to trim historical epochs from the epoch map. By default, HistoryRetentionEpochs is set to -1.

The following example sets the number of historical epochs to save to 40:

=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = 40;

Modifications are immediately implemented across all nodes within the database cluster. You do not need to restart the database.

See Epoch management parameters for additional details. See Epochs for information about how Vertica uses epochs.

Disabling purge

If you want to preserve all historical data, set the value of both historical epoch retention parameters to -1, as follows:

=> ALTER DABABASE mydb SET HistoryRetentionTime = -1;
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = -1;

8.3.2 - Manually purging data

You manually purge deleted data as follows:.

You manually purge deleted data as follows:

  1. Set the cut-off date for purging deleted data. First, call one of the following functions to verify the current ancient history mark (AHM):

    • GET_AHM_TIME returns a TIMESTAMP value of the AHM.

    • GET_AHM_EPOCH returns the number of the epoch in which the AHM is located.

  2. Set the AHM to the desired cut-off date with one of the following functions:

    If you call SET_AHM_TIME, keep in mind that the timestamp you specify is mapped to an epoch, which by default has a three-minute granularity. Thus, if you specify an AHM time of 2008-01-01 00:00:00.00, Vertica might purge data from the first three minutes of 2008, or retain data from last three minutes of 2007.

  3. Purge deleted data from the desired projections with one of the following functions:

    The tuple mover performs a mergeout operation to purge the data. Vertica periodically invokes the tuple mover to perform mergeout operations, as configured by tuple mover parameters. You can manually invoke the tuple mover by calling the function DO_TM_TASK.

See Epochs for additional information about how Vertica uses epochs.

8.4 - Truncating tables

TRUNCATE TABLE removes all storage associated with the target table and its projections.

TRUNCATE TABLE removes all storage associated with the target table and its projections. Vertica preserves the table and the projection definitions. If the truncated table has out-of-date projections, those projections are cleared and marked up-to-date when TRUNCATE TABLE returns.

TRUNCATE TABLE commits the entire transaction after statement execution, even if truncating the table fails. You cannot roll back a TRUNCATE TABLE statement.

Use TRUNCATE TABLE for testing purposes. You can use it to remove all data from a table and load it with fresh data, without recreating the table and its projections.

Table locking

TRUNCATE TABLE takes an O (owner) lock on the table until the truncation process completes. The savepoint is then released.

If the operation cannot obtain an O lock on the target table, Vertica tries to close any internal Tuple Mover sessions that are running on that table. If successful, the operation can proceed. Explicit Tuple Mover operations that are running in user sessions do not close. If an explicit Tuple Mover operation is running on the table, the operation proceeds only when the operation is complete.

Restrictions

You cannot truncate an external table.

Examples

=> INSERT INTO sample_table (a) VALUES (3);
=> SELECT * FROM sample_table;
a
---
3
(1 row)
=> TRUNCATE TABLE sample_table;
TRUNCATE TABLE
=> SELECT * FROM sample_table;
a
---
(0 rows)

9 - Rebuilding tables

You can reclaim disk space on a large scale by rebuilding tables, as follows:.

You can reclaim disk space on a large scale by rebuilding tables, as follows:

  1. Create a table with the same (or similar) definition as the table to rebuild.

  2. Create projections for the new table.

  3. Copy data from the target table into the new one with INSERT...SELECT.

  4. Drop the old table and its projections.

  5. Rename the new table with ALTER TABLE...RENAME, using the name of the old table.

Projection considerations

  • You must have enough disk space to contain the old and new projections at the same time. If necessary, you can drop some of the old projections before loading the new table. You must, however, retain at least one superprojection of the old table (or two buddy superprojections to maintain K-safety) until the new table is loaded. See Prepare disk storage locations for disk space requirements.

  • You can specify different names for the new projections or use ALTER TABLE...RENAME to change the names of the old projections.

  • The relationship between tables and projections does not depend on object names. Instead, it depends on object identifiers that are not affected by rename operations. Thus, if you rename a table, its projections continue to work normally.

10 - Dropping tables

DROP TABLE drops a table from the database catalog.

DROP TABLE drops a table from the database catalog. If any projections are associated with the table, DROP TABLE returns an error message unless it also includes the CASCADE option. One exception applies: the table only has an auto-generated superprojection (auto-projection) associated with it.

Using CASCADE

In the following example, DROP TABLE tries to remove a table that has several projections associated with it. Because it omits the CASCADE option, Vertica returns an error:

=> DROP TABLE d1;
NOTICE: Constraint - depends on Table d1
NOTICE: Projection d1p1 depends on Table d1
NOTICE: Projection d1p2 depends on Table d1
NOTICE: Projection d1p3 depends on Table d1
NOTICE: Projection f1d1p1 depends on Table d1
NOTICE: Projection f1d1p2 depends on Table d1
NOTICE: Projection f1d1p3 depends on Table d1
ERROR: DROP failed due to dependencies: Cannot drop Table d1 because other objects depend on it
HINT: Use DROP ... CASCADE to drop the dependent objects too.
=> DROP TABLE d1 CASCADE;
DROP TABLE
=> CREATE TABLE mytable (a INT, b VARCHAR(256));
CREATE TABLE
=> DROP TABLE IF EXISTS mytable;
DROP TABLE
=> DROP TABLE IF EXISTS mytable; -- Doesn't exist
NOTICE:  Nothing was dropped
DROP TABLE

The next attempt includes the CASCADE option and succeeds:

=> DROP TABLE d1 CASCADE;
DROP TABLE
=> CREATE TABLE mytable (a INT, b VARCHAR(256));
CREATE TABLE
=> DROP TABLE IF EXISTS mytable;
DROP TABLE
=> DROP TABLE IF EXISTS mytable; -- Doesn't exist
NOTICE:  Nothing was dropped
DROP TABLE

Using IF EXISTS

In the following example, DROP TABLE includes the option IF EXISTS. This option specifies not to report an error if one or more of the tables to drop does not exist. This clause is useful in SQL scripts—for example, to ensure that a table is dropped before you try to recreate it:

=> DROP TABLE IF EXISTS mytable;
DROP TABLE
=> DROP TABLE IF EXISTS mytable; -- Table doesn't exist
NOTICE:  Nothing was dropped
DROP TABLE

Dropping and restoring view tables

Views that reference a table that is dropped and then replaced by another table with the same name continue to function and use the contents of the new table. The new table must have the same column definitions.