This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Merging table data

MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set.

1: Basic MERGE example
2: MERGE source options
3: MERGE matching clauses
4: Update and insert filters
5: MERGE optimization
6: MERGE restrictions

MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set. The join can match a source row with only one target row; otherwise, Vertica returns an error.

MERGE has the following syntax:

MERGE INTO target-table USING source-dataset ON  join-condition
matching-clause[ matching-clause ]

Merge operations have at least three components:

The target table on which to perform update and insert operations. MERGE takes an X (exclusive) lock on the target table until the merge operation is complete.
Join to another data set, one of the following: a table, view, or subquery result set.
One or both matching clauses: WHEN MATCHED THEN UPDATE SET and WHEN NOT MATCHED THEN INSERT.

1 - Basic MERGE example

In this example, a merge operation involves two tables:.

In this example, a merge operation involves two tables:

visits_daily logs daily restaurant traffic, and is updated with each customer visit. Data in this table is refreshed every 24 hours.
visits_history stores the history of customer visits to various restaurants, accumulated over an indefinite time span.

Each night, you merge the daily visit count from visits_daily into visits_history. The merge operation modifies the target table in two ways:

Updates existing customer data.
Inserts new rows of data for first-time customers.

One MERGE statement executes both operations as a single (upsert) transaction.

Source and target tables

The source and target tables visits_daily and visits_history are defined as follows:

CREATE TABLE public.visits_daily
(
    customer_id int,
    location_name varchar(20),
    visit_time time(0) DEFAULT (now())::timetz(6)
);

CREATE TABLE public.visits_history
(
    customer_id int,
    location_name varchar(20),
    visit_count int
);

Table visits_history contains rows of three customers who between them visited two restaurants, Etoile and LaRosa:

=> SELECT * FROM visits_history ORDER BY customer_id, location_name;
 customer_id | location_name | visit_count
-------------+---------------+-------------
        1001 | Etoile        |           2
        1002 | La Rosa       |           4
        1004 | Etoile        |           1
(3 rows)

By close of business, table visits_daily contains three rows of restaurant visits:

=> SELECT * FROM visits_daily ORDER BY customer_id, location_name;
 customer_id | location_name | visit_time
-------------+---------------+------------
        1001 | Etoile        | 18:19:29
        1003 | Lux Cafe      | 08:07:00
        1004 | La Rosa       | 11:49:20
(3 rows)

Table data merge

The following MERGE statement merges visits_daily data into visits_history:

For matching customers, MERGE updates the occurrence count.
For non-matching customers, MERGE inserts new rows.

=> MERGE INTO visits_history h USING visits_daily d
    ON (h.customer_id=d.customer_id AND h.location_name=d.location_name)
    WHEN MATCHED THEN UPDATE SET visit_count = h.visit_count  + 1
    WHEN NOT MATCHED THEN INSERT (customer_id, location_name, visit_count)
    VALUES (d.customer_id, d.location_name, 1);
 OUTPUT
--------
      3
(1 row)

MERGE returns the number of rows updated and inserted. In this case, the returned value specifies three updates and inserts:

Customer 1001's third visit to Etoile
New customer 1003's first visit to new restaurant Lux Cafe
Customer 1004's first visit to La Rosa

If you now query table visits_history, the result set shows the merged (updated and inserted) data. Updated and new rows are highlighted:

2 - MERGE source options

A MERGE operation joins the target table to one of the following data sources:.

A MERGE operation joins the target table to one of the following data sources:

Another table
View
Subquery result set

Merging from table and view data

You merge data from one table into another as follows:

MERGE INTO target-table USING { source-table | source-view } join-condition
   matching-clause[ matching-clause ]

If you specify a view, Vertica expands the view name to the query that it encapsulates, and uses the result set as the merge source data.

For example, the VMart table public.product_dimension contains current and discontinued products. You can move all discontinued products into a separate table public.product_dimension_discontinued, as follows:

=> CREATE TABLE public.product_dimension_discontinued (
     product_key int,
     product_version int,
     sku_number char(32),
     category_description char(32),
     product_description varchar(128));

=> MERGE INTO product_dimension_discontinued tgt
     USING product_dimension src ON tgt.product_key = src.product_key
                                AND tgt.product_version = src.product_version
     WHEN NOT MATCHED AND src.discontinued_flag='1' THEN INSERT VALUES
       (src.product_key,
        src.product_version,
        src.sku_number,
        src.category_description,
        src.product_description);
 OUTPUT
--------
   1186
(1 row)

Source table product_dimension uses two columns, product_key and product_version, to identify unique products. The MERGE statement joins the source and target tables on these columns in order to return single instances of non-matching rows. The WHEN NOT MATCHED clause includes a filter (src.discontinued_flag='1'), which reduces the result set to include only discontinued products. The remaining rows are inserted into target table product_dimension_discontinued.

Merging from a subquery result set

You can merge into a table the result set that is returned by a subquery, as follows:

MERGE INTO target-table USING (subquery) sq-alias join-condition
   matching-clause[ matching-clause ]

For example, the VMart table public.product_dimension is defined as follows (DDL truncated):

CREATE TABLE public.product_dimension
(
    product_key int NOT NULL,
    product_version int NOT NULL,
    product_description varchar(128),
    sku_number char(32),
    ...
)
ALTER TABLE public.product_dimension
    ADD CONSTRAINT C_PRIMARY PRIMARY KEY (product_key, product_version) DISABLED;

Columns product_key and product_version comprise the table's primary key. You can modify this table so it contains a single column that concatenates the values of these two columns. This column can be used to uniquely identify each product, while also maintaining the original values from product_key and product_version.

You populate the new column with a MERGE statement that queries the other two columns:

=> ALTER TABLE public.product_dimension ADD COLUMN product_ID numeric(8,2);
ALTER TABLE

=> MERGE INTO product_dimension tgt
     USING (SELECT (product_key||'.0'||product_version)::numeric(8,2) AS pid, sku_number
     FROM product_dimension) src
     ON tgt.product_key||'.0'||product_version::numeric=src.pid
     WHEN MATCHED THEN UPDATE SET product_ID = src.pid;
 OUTPUT
--------
  60000
(1 row)

The following query verifies that the new column values correspond to the values in product_key and product_version:

=> SELECT product_ID, product_key, product_version, product_description
   FROM product_dimension
   WHERE category_description = 'Medical'
     AND product_description ILIKE '%diabetes%'
     AND discontinued_flag = 1 ORDER BY product_ID;
 product_ID | product_key | product_version |           product_description
------------+-------------+-----------------+-----------------------------------------
    5836.02 |        5836 |               2 | Brand #17487 diabetes blood testing kit
   14320.02 |       14320 |               2 | Brand #43046 diabetes blood testing kit
   18881.01 |       18881 |               1 | Brand #56743 diabetes blood testing kit
(3 rows)

3 - MERGE matching clauses

MERGE supports one instance of the following matching clauses:.

MERGE supports one instance of the following matching clauses:

[WHEN MATCHED THEN UPDATE SET](#WHEN_MATCHED)
[WHEN NOT MATCHED THEN INSERT](#WHEN_NOT_MATCHED)

Each matching clause can specify an additional filter, as described in Update and insert filters.

`WHEN MATCHED THEN UPDATE SET`

Updates all target table rows that are joined to the source table, typically with data from the source table:

WHEN MATCHED [ AND update-filter ] THEN UPDATE
   SET { target-column = expression }[,...]

Vertica can execute the join only on unique values in the source table's join column. If the source table's join column contains more than one matching value, the MERGE statement returns with a run-time error.

`WHEN NOT MATCHED THEN INSERT`

WHEN NOT MATCHED THEN INSERT inserts into the target table a new row for each source table row that is excluded from the join:

WHEN NOT MATCHED [ AND insert-filter ] THEN INSERT
   [ ( column-list ) ] VALUES ( values-list )

column-list is a comma-delimited list of one or more target columns in the target table, listed in any order. MERGE maps column-list columns to values-list values in the same order, and each column-value pair must be compatible. If you omit column-list, Vertica maps values-list values to columns according to column order in the table definition.

For example, given the following source and target table definitions:

CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (x int, y int, z int);

The following WHEN NOT MATCHED clause implicitly sets the values of the target table columns a, b, and c in the newly inserted rows:

MERGE INTO t1 USING t2 ON t1.a=t2.x
   WHEN NOT MATCHED THEN INSERT VALUES (t2.x, t2.y, t2.z);

In contrast, the following WHEN NOT MATCHED clause excludes columns t1.b and t2.y from the merge operation. The WHEN NOT MATCHED clause explicitly pairs two sets of columns from the target and source tables: t1.a to t2.x, and t1.c to t2.z. Vertica sets excluded column t1.b. to null:

MERGE INTO t1 USING t2 ON t1.a=t2.x
   WHEN NOT MATCHED THEN INSERT (a, c) VALUES (t2.x, t2.z);

4 - Update and insert filters

Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:.

Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:

WHEN MATCHED AND update-filter THEN UPDATE ...
WHEN NOT MATCHED AND insert-filter THEN INSERT ...

Vertica also supports Oracle syntax for specifying update and insert filters:

WHEN MATCHED THEN UPDATE SET column-updates WHERE update-filter
WHEN NOT MATCHED THEN INSERT column-values WHERE insert-filter

Each filter can specify multiple conditions. Vertica handles the filters as follows:

An update filter is applied to the set of matching rows in the target table that are returned by the MERGE join. For each row where the update filter evaluates to true, Vertica updates the specified columns.
An insert filter is applied to the set of source table rows that are excluded from the MERGE join. For each row where the insert filter evaluates to true, Vertica adds a new row to the target table with the specified values.

For example, given the following data in tables t11 and t22:


=> SELECT * from t11 ORDER BY pk;
 pk | col1 | col2 | SKIP_ME_FLAG
----+------+------+--------------
  1 |    2 |    3 | t
  2 |    3 |    4 | t
  3 |    4 |    5 | f
  4 |      |    6 | f
  5 |    6 |    7 | t
  6 |      |    8 | f
  7 |    8 |      | t
(7 rows)

=> SELECT * FROM t22 ORDER BY pk;
 pk | col1 | col2
----+------+------
  1 |    2 |    4
  2 |    4 |    8
  3 |    6 |
  4 |    8 |   16
(4 rows)

You can merge data from table t11 into table t22 with the following MERGE statement, which includes update and insert filters:

=> MERGE INTO t22 USING t11 ON ( t11.pk=t22.pk )
   WHEN MATCHED
       AND t11.SKIP_ME_FLAG=FALSE AND (
         COALESCE (t22.col1<>t11.col1, (t22.col1 is null)<>(t11.col1 is null))
       )
   THEN UPDATE SET col1=t11.col1, col2=t11.col2
   WHEN NOT MATCHED
      AND t11.SKIP_ME_FLAG=FALSE
   THEN INSERT (pk, col1, col2) VALUES (t11.pk, t11.col1, t11.col2);
 OUTPUT
--------
      3
(1 row)

=> SELECT * FROM t22 ORDER BY pk;
 pk | col1 | col2
----+------+------
  1 |    2 |    4
  2 |    4 |    8
  3 |    4 |    5
  4 |      |    6
  6 |      |    8
(5 rows)

Vertica uses the update and insert filters as follows:

Evaluates all matching rows against the update filter conditions. Vertica updates each row where the following two conditions both evaluate to true:
- Source column t11.SKIP_ME_FLAG is set to false.
- The COALESCE function evaluates to true.
Evaluates all non-matching rows in the source table against the insert filter. For each row where column t11.SKIP_ME_FLAG is set to false, Vertica inserts a new row in the target table.

5 - MERGE optimization

You can improve MERGE performance in several ways:.

You can improve MERGE performance in several ways:

Design projections for optimal MERGE performance.
Facilitate creation of optimized query plans.
Use source tables that are smaller than target tables.

Projections for MERGE operations

The Vertica query optimizer automatically chooses the best projections to implement a merge operation. A good projection design strategy provides projections that help the query optimizer avoid extra sort and data transfer operations, and facilitate MERGE performance.

Tip

You can rely on Database Designer to generate projections that address merge requirements. You can then customize these projections as needed.

For example, the following MERGE statement fragment joins source and target tables tgt and src, respectively, on columns tgt.a and src.b:

=> MERGE INTO tgt USING src ON tgt.a = src.b ...

Vertica can use a local merge join if projections for tables tgt and src use one of the following projection designs, where inputs are presorted by projection ORDER BY clauses:

Replicated projections are sorted on:
- Column a for table tgt
- Column b for table src
Segmented projections are identically segmented on:
- Column a for table tgt
- Column b for table src
- Corresponding segmented columns

Optimizing MERGE query plans

Vertica prepares an optimized query plan if the following conditions are all true:

The MERGE statement contains both matching clauses WHEN MATCHED THEN UPDATE SET and WHEN NOT MATCHED THEN INSERT. If the MERGE statement contains only one matching clause, it uses a non-optimized query plan.
The MERGE statement excludes update and insert filters.
The target table join column has a unique or primary key constraint. This requirement does not apply to the source table join column.
Both matching clauses specify all columns in the target table.
Both matching clauses specify identical source values.

For details on evaluating an EXPLAIN-generated query plan, see MERGE path.

The examples that follow use a simple schema to illustrate some of the conditions under which Vertica prepares or does not prepare an optimized query plan for MERGE:

CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a;
CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a;
INSERT INTO target VALUES(1,2,3);
INSERT INTO target VALUES(2,4,7);
INSERT INTO source VALUES(3,4,5);
INSERT INTO source VALUES(4,6,9);
COMMIT;

Optimized MERGE statement

Vertica can prepare an optimized query plan for the following MERGE statement because:

The target table's join column t.a has a primary key constraint.
All columns in the target table (a,b,c) are included in the UPDATE and INSERT clauses.
The UPDATE and INSERT clauses specify identical source values: s.a, s.b, and s.c.

MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);

 OUTPUT
--------
2
(1 row)

The output value of 2 indicates success and denotes the number of rows updated/inserted from the source into the target.

Non-optimized MERGE statement

In the next example, the MERGE statement runs without optimization because the source values in the UPDATE/INSERT clauses are not identical. Specifically, the UPDATE clause includes constants for columns s.a and s.c and the INSERT clause does not:


MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);

To make the previous MERGE statement eligible for optimization, rewrite the statement so that the source values in the UPDATE and INSERT clauses are identical:


MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c -1
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a + 1, s.b, s.c - 1);

6 - MERGE restrictions

The following restrictions apply to updating and inserting table data with MERGE.

The following restrictions apply to updating and inserting table data with MERGE.

Constraint enforcement

If primary key, unique key, or check constraints are enabled for automatic enforcement in the target table, Vertica enforces those constraints when you load new data. If a violation occurs, Vertica rolls back the operation and returns an error.

Caution

If you run MERGE multiple times using the same target and source table, each iteration is liable to introduce duplicate values into the target columns and return with an error.

Columns prohibited from merge

The following columns cannot be specified in a merge operation; attempts to do so return with an error:

Identity/auto-increment columns, or columns whose default value is set to a named sequence.
Vmap columns such as __raw__ in flex tables.
Columns of complex types (ARRAY, SET, ROW).