This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Merging table data
MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set.
MERGE
statements can perform update and insert operations on a target table based on the results of a join with a source data set. The join can match a source row with only one target row; otherwise, Vertica returns an error.
MERGE
has the following syntax:
MERGE INTO target-table USING source-dataset ON join-condition
matching-clause[ matching-clause ]
Merge operations have at least three components:
1 - Basic MERGE example
In this example, a merge operation involves two tables:.
In this example, a merge operation involves two tables:
-
visits_daily
logs daily restaurant traffic, and is updated with each customer visit. Data in this table is refreshed every 24 hours.
-
visits_history
stores the history of customer visits to various restaurants, accumulated over an indefinite time span.
Each night, you merge the daily visit count from visits_daily
into visits_history
. The merge operation modifies the target table in two ways:
One MERGE
statement executes both operations as a single (upsert) transaction.
Source and target tables
The source and target tables visits_daily
and visits_history
are defined as follows:
CREATE TABLE public.visits_daily
(
customer_id int,
location_name varchar(20),
visit_time time(0) DEFAULT (now())::timetz(6)
);
CREATE TABLE public.visits_history
(
customer_id int,
location_name varchar(20),
visit_count int
);
Table visits_history
contains rows of three customers who between them visited two restaurants, Etoile and LaRosa:
=> SELECT * FROM visits_history ORDER BY customer_id, location_name;
customer_id | location_name | visit_count
-------------+---------------+-------------
1001 | Etoile | 2
1002 | La Rosa | 4
1004 | Etoile | 1
(3 rows)
By close of business, table visits_daily
contains three rows of restaurant visits:
=> SELECT * FROM visits_daily ORDER BY customer_id, location_name;
customer_id | location_name | visit_time
-------------+---------------+------------
1001 | Etoile | 18:19:29
1003 | Lux Cafe | 08:07:00
1004 | La Rosa | 11:49:20
(3 rows)
Table data merge
The following MERGE
statement merges visits_daily
data into visits_history
:
-
For matching customers, MERGE
updates the occurrence count.
-
For non-matching customers, MERGE
inserts new rows.
=> MERGE INTO visits_history h USING visits_daily d
ON (h.customer_id=d.customer_id AND h.location_name=d.location_name)
WHEN MATCHED THEN UPDATE SET visit_count = h.visit_count + 1
WHEN NOT MATCHED THEN INSERT (customer_id, location_name, visit_count)
VALUES (d.customer_id, d.location_name, 1);
OUTPUT
--------
3
(1 row)
MERGE
returns the number of rows updated and inserted. In this case, the returned value specifies three updates and inserts:
-
Customer 1001
's third visit to Etoile
-
New customer 1003
's first visit to new restaurant Lux Cafe
-
Customer 1004
's first visit to La Rosa
If you now query table visits_history
, the result set shows the merged (updated and inserted) data. Updated and new rows are highlighted:
2 - MERGE source options
A MERGE operation joins the target table to one of the following data sources:.
A MERGE
operation joins the target table to one of the following data sources:
-
Another table
-
View
-
Subquery result set
Merging from table and view data
You merge data from one table into another as follows:
MERGE INTO target-table USING { source-table | source-view } join-condition
matching-clause[ matching-clause ]
If you specify a view, Vertica expands the view name to the query that it encapsulates, and uses the result set as the merge source data.
For example, the VMart table public.product_dimension
contains current and discontinued products. You can move all discontinued products into a separate table public.product_dimension_discontinued
, as follows:
=> CREATE TABLE public.product_dimension_discontinued (
product_key int,
product_version int,
sku_number char(32),
category_description char(32),
product_description varchar(128));
=> MERGE INTO product_dimension_discontinued tgt
USING product_dimension src ON tgt.product_key = src.product_key
AND tgt.product_version = src.product_version
WHEN NOT MATCHED AND src.discontinued_flag='1' THEN INSERT VALUES
(src.product_key,
src.product_version,
src.sku_number,
src.category_description,
src.product_description);
OUTPUT
--------
1186
(1 row)
Source table product_dimension
uses two columns, product_key
and product_version
, to identify unique products. The MERGE
statement joins the source and target tables on these columns in order to return single instances of non-matching rows. The WHEN NOT MATCHED
clause includes a filter (src.discontinued_flag='1'
), which reduces the result set to include only discontinued products. The remaining rows are inserted into target table product_dimension_discontinued
.
Merging from a subquery result set
You can merge into a table the result set that is returned by a subquery, as follows:
MERGE INTO target-table USING (subquery) sq-alias join-condition
matching-clause[ matching-clause ]
For example, the VMart table public.product_dimension
is defined as follows (DDL truncated):
CREATE TABLE public.product_dimension
(
product_key int NOT NULL,
product_version int NOT NULL,
product_description varchar(128),
sku_number char(32),
...
)
ALTER TABLE public.product_dimension
ADD CONSTRAINT C_PRIMARY PRIMARY KEY (product_key, product_version) DISABLED;
Columns product_key
and product_version
comprise the table's primary key. You can modify this table so it contains a single column that concatenates the values of these two columns. This column can be used to uniquely identify each product, while also maintaining the original values from product_key
and product_version
.
You populate the new column with a MERGE
statement that queries the other two columns:
=> ALTER TABLE public.product_dimension ADD COLUMN product_ID numeric(8,2);
ALTER TABLE
=> MERGE INTO product_dimension tgt
USING (SELECT (product_key||'.0'||product_version)::numeric(8,2) AS pid, sku_number
FROM product_dimension) src
ON tgt.product_key||'.0'||product_version::numeric=src.pid
WHEN MATCHED THEN UPDATE SET product_ID = src.pid;
OUTPUT
--------
60000
(1 row)
The following query verifies that the new column values correspond to the values in product_key
and product_version
:
=> SELECT product_ID, product_key, product_version, product_description
FROM product_dimension
WHERE category_description = 'Medical'
AND product_description ILIKE '%diabetes%'
AND discontinued_flag = 1 ORDER BY product_ID;
product_ID | product_key | product_version | product_description
------------+-------------+-----------------+-----------------------------------------
5836.02 | 5836 | 2 | Brand #17487 diabetes blood testing kit
14320.02 | 14320 | 2 | Brand #43046 diabetes blood testing kit
18881.01 | 18881 | 1 | Brand #56743 diabetes blood testing kit
(3 rows)
3 - MERGE matching clauses
MERGE supports one instance of the following matching clauses:.
MERGE
supports one instance of the following matching clauses:
Each matching clause can specify an additional filter, as described in Update and insert filters.
WHEN MATCHED THEN UPDATE SET
Updates all target table rows that are joined to the source table, typically with data from the source table:
WHEN MATCHED [ AND update-filter ] THEN UPDATE
SET { target-column = expression }[,...]
Vertica can execute the join only on unique values in the source table's join column. If the source table's join column contains more than one matching value, the MERGE
statement returns with a run-time error.
WHEN NOT MATCHED THEN INSERT
WHEN NOT MATCHED THEN INSERT
inserts into the target table a new row for each source table row that is excluded from the join:
WHEN NOT MATCHED [ AND insert-filter ] THEN INSERT
[ ( column-list ) ] VALUES ( values-list )
column-list
is a comma-delimited list of one or more target columns in the target table, listed in any order. MERGE
maps column-list
columns to values-list
values in the same order, and each column-value pair must be compatible. If you omit column-list
, Vertica maps values-list
values to columns according to column order in the table definition.
For example, given the following source and target table definitions:
CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (x int, y int, z int);
The following WHEN NOT MATCHED
clause implicitly sets the values of the target table columns a
, b
, and c
in the newly inserted rows:
MERGE INTO t1 USING t2 ON t1.a=t2.x
WHEN NOT MATCHED THEN INSERT VALUES (t2.x, t2.y, t2.z);
In contrast, the following WHEN NOT MATCHED
clause excludes columns t1.b
and t2.y
from the merge operation. The WHEN NOT MATCHED
clause explicitly pairs two sets of columns from the target and source tables: t1.a
to t2.x
, and t1.c
to t2.z
. Vertica sets excluded column t1.b
. to null:
MERGE INTO t1 USING t2 ON t1.a=t2.x
WHEN NOT MATCHED THEN INSERT (a, c) VALUES (t2.x, t2.z);
4 - Update and insert filters
Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:.
Each WHEN MATCHED
and WHEN NOT MATCHED
clause in a MERGE
statement can optionally specify an update filter and insert filter, respectively:
WHEN MATCHED AND update-filter THEN UPDATE ...
WHEN NOT MATCHED AND insert-filter THEN INSERT ...
Vertica also supports Oracle syntax for specifying update and insert filters:
WHEN MATCHED THEN UPDATE SET column-updates WHERE update-filter
WHEN NOT MATCHED THEN INSERT column-values WHERE insert-filter
Each filter can specify multiple conditions. Vertica handles the filters as follows:
-
An update filter is applied to the set of matching rows in the target table that are returned by the MERGE
join. For each row where the update filter evaluates to true, Vertica updates the specified columns.
-
An insert filter is applied to the set of source table rows that are excluded from the MERGE
join. For each row where the insert filter evaluates to true, Vertica adds a new row to the target table with the specified values.
For example, given the following data in tables t11
and t22
:
=> SELECT * from t11 ORDER BY pk;
pk | col1 | col2 | SKIP_ME_FLAG
----+------+------+--------------
1 | 2 | 3 | t
2 | 3 | 4 | t
3 | 4 | 5 | f
4 | | 6 | f
5 | 6 | 7 | t
6 | | 8 | f
7 | 8 | | t
(7 rows)
=> SELECT * FROM t22 ORDER BY pk;
pk | col1 | col2
----+------+------
1 | 2 | 4
2 | 4 | 8
3 | 6 |
4 | 8 | 16
(4 rows)
You can merge data from table t11
into table t22
with the following MERGE
statement, which includes update and insert filters:
=> MERGE INTO t22 USING t11 ON ( t11.pk=t22.pk )
WHEN MATCHED
AND t11.SKIP_ME_FLAG=FALSE AND (
COALESCE (t22.col1<>t11.col1, (t22.col1 is null)<>(t11.col1 is null))
)
THEN UPDATE SET col1=t11.col1, col2=t11.col2
WHEN NOT MATCHED
AND t11.SKIP_ME_FLAG=FALSE
THEN INSERT (pk, col1, col2) VALUES (t11.pk, t11.col1, t11.col2);
OUTPUT
--------
3
(1 row)
=> SELECT * FROM t22 ORDER BY pk;
pk | col1 | col2
----+------+------
1 | 2 | 4
2 | 4 | 8
3 | 4 | 5
4 | | 6
6 | | 8
(5 rows)
Vertica uses the update and insert filters as follows:
-
Evaluates all matching rows against the update filter conditions. Vertica updates each row where the following two conditions both evaluate to true:
-
Evaluates all non-matching rows in the source table against the insert filter. For each row where column t11.SKIP_ME_FLAG
is set to false, Vertica inserts a new row in the target table.
5 - MERGE optimization
You can improve MERGE performance in several ways:.
You can improve MERGE
performance in several ways:
Projections for MERGE operations
The Vertica query optimizer automatically chooses the best projections to implement a merge operation. A good projection design strategy provides projections that help the query optimizer avoid extra sort and data transfer operations, and facilitate MERGE
performance.
For example, the following MERGE
statement fragment joins source and target tables tgt
and src
, respectively, on columns tgt.a
and src.b
:
=> MERGE INTO tgt USING src ON tgt.a = src.b ...
Vertica can use a local merge join if projections for tables tgt
and src
use one of the following projection designs, where inputs are presorted by projection ORDER BY
clauses:
Optimizing MERGE query plans
Vertica prepares an optimized query plan if the following conditions are all true:
-
The MERGE
statement contains both matching clauses
WHEN MATCHED THEN UPDATE SET
and
WHEN NOT MATCHED THEN INSERT
. If the MERGE
statement contains only one matching clause, it uses a non-optimized query plan.
-
The MERGE
statement excludes update and insert filters.
-
The target table join column has a unique or primary key constraint. This requirement does not apply to the source table join column.
-
Both matching clauses specify all columns in the target table.
-
Both matching clauses specify identical source values.
For details on evaluating an
EXPLAIN
-generated query plan, see MERGE path.
The examples that follow use a simple schema to illustrate some of the conditions under which Vertica prepares or does not prepare an optimized query plan for MERGE
:
CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a;
CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a;
INSERT INTO target VALUES(1,2,3);
INSERT INTO target VALUES(2,4,7);
INSERT INTO source VALUES(3,4,5);
INSERT INTO source VALUES(4,6,9);
COMMIT;
Optimized MERGE statement
Vertica can prepare an optimized query plan for the following MERGE
statement because:
-
The target table's join column t.a
has a primary key constraint.
-
All columns in the target table (a,b,c)
are included in the UPDATE
and INSERT
clauses.
-
The UPDATE
and INSERT
clauses specify identical source values: s.a
, s.b
, and s.c
.
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);
OUTPUT
--------
2
(1 row)
The output value of 2 indicates success and denotes the number of rows updated/inserted from the source into the target.
Non-optimized MERGE statement
In the next example, the MERGE
statement runs without optimization because the source values in the UPDATE/INSERT
clauses are not identical. Specifically, the UPDATE
clause includes constants for columns s.a
and s.c
and the INSERT
clause does not:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);
To make the previous MERGE
statement eligible for optimization, rewrite the statement so that the source values in the UPDATE
and INSERT
clauses are identical:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c -1
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a + 1, s.b, s.c - 1);
6 - MERGE restrictions
The following restrictions apply to updating and inserting table data with MERGE.
The following restrictions apply to updating and inserting table data with
MERGE
.
Constraint enforcement
If primary key, unique key, or check constraints are enabled for automatic enforcement in the target table, Vertica enforces those constraints when you load new data. If a violation occurs, Vertica rolls back the operation and returns an error.
Caution
If you run MERGE multiple times using the same target and source table, each iteration is liable to introduce duplicate values into the target columns and return with an error.
Columns prohibited from merge
The following columns cannot be specified in a merge operation; attempts to do so return with an error:
-
IDENTITY columns, or columns whose default value is set to a named sequence.
-
Vmap columns such as __raw__
in flex tables.
-
Columns of complex types ARRAY, SET, or ROW.