Identical segmentation
To improve query performance when you join multiple tables, create projections that are identically segmented on the join keys. Identically-segmented projections allow the joins to occur locally on each node, thereby helping to reduce data movement across the network during query processing.
To determine if projections are identically-segmented on the query join keys, create a query plan with EXPLAIN. If the query plan contains RESEGMENT or BROADCAST, the projections are not identically segmented.
The database optimizer chooses a projection to supply rows for each table in a query. If the projections to be joined are segmented, the optimizer evaluates their segmentation against the query join expressions. It thereby determines whether the rows are placed on each node so it can join them without fetching data from another node.
Join conditions for identically segmented projections
A projection p is segmented on join columns if all column references in p’s segmentation expression are a subset of the columns in the join expression.
The following conditions must be true for two segmented projections p1 of table t1 and p2 of table t2 to participate in a join of t1 to t2:
-
The join condition must have the following form:
t1.j1 = t2.j1 AND t1.j2 = t2.j2 AND ... t1.jN = t2.jN -
The join columns must share the same base data type. For example:
-
If
t1.j1is an INTEGER,t2.j1can be an INTEGER but it cannot be a FLOAT. -
If
t1.j1is a CHAR(10),t2.j1can be any CHAR or VARCHAR (for example, CHAR(10), VARCHAR(10), VARCHAR(20)), butt2.j1cannot be an INTEGER.
-
-
If
p1is segmented by an expression on columns {t1.s1, t1.s2, ... t1.sN}, each segmentation columnt1.sXmust be in the join column set {t1.jX}. -
If
p2is segmented by an expression on columns {t2.s1, t2.s2, ... t2.sN}, each segmentation columnt2.sXmust be in the join column set {t2.jX}. -
The segmentation expressions of
p1andp2must be structurally equivalent. For example:-
If
p1isSEGMENTED BY hash(t1.x)andp2isSEGMENTED BY hash(t2.x),p1andp2are identically segmented. -
If
p1isSEGMENTED BY hash(t1.x)andp2isSEGMENTED BY hash(t2.x + 1),p1andp2are not identically segmented.
-
-
p1andp2must have the same segment count. -
The assignment of segments to nodes must match. For example, if
p1andp2use anOFFSETclause, their offsets must match. -
If the database finds projections for
t1andt2that are not identically segmented, the data is redistributed across the network during query run time, as necessary.Tip
If you create custom designs, try to use segmented projections for joins whenever possible.
Examples
The following statements create two tables and specify to create identical segments:
=> CREATE TABLE t1 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id, x1) ALL NODES;
=> CREATE TABLE t2 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id, x1) ALL NODES;
Given this design, the join conditions in the following queries can leverage identical segmentation:
=> SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;
=> SELECT * FROM t1 JOIN t2 ON t1.id = t2.id AND t1.x1 = t2.x1;
Conversely, the join conditions in the following queries require resegmentation:
=> SELECT * FROM t1 JOIN t2 ON t1.x1 = t2.x1;
=> SELECT * FROM t1 JOIN t2 ON t1.id = t2.x1;