<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – JOIN queries</title>
    <link>/en/data-analysis/query-optimization/join-queries/</link>
    <description>Recent content in JOIN queries on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/data-analysis/query-optimization/join-queries/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Data-Analysis: Hash joins versus merge joins</title>
      <link>/en/data-analysis/query-optimization/join-queries/hash-joins-versus-merge-joins/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/join-queries/hash-joins-versus-merge-joins/</guid>
      <description>
        
        
        &lt;p&gt;The database optimizer implements a join with one of the following algorithms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Merge join&lt;/strong&gt; is used when projections of the joined tables are sorted on the join columns. Merge joins are faster and use less memory than hash joins.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hash join&lt;/strong&gt; is used when projections of the joined tables are not already sorted on the join columns. In this case, the database builds an in-memory hash table on the inner table&#39;s join column. It then scans the outer table for rows that match entries in the hash table and joins the matching rows from the two tables. A hash join is inexpensive if the entire hash table fits in memory; its cost rises significantly if the hash table must be written to disk.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The optimizer automatically chooses the most appropriate algorithm to execute a query, given the projections that are available.&lt;/p&gt;
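&lt;p&gt;The following Python sketch models the two algorithms on in-memory lists of &lt;code&gt;(key, value)&lt;/code&gt; rows. It is a conceptual illustration under simplifying assumptions, not the database implementation: the function names &lt;code&gt;hash_join&lt;/code&gt; and &lt;code&gt;merge_join&lt;/code&gt; are hypothetical, and the sketch ignores spilling to disk, segmentation, and parallel execution. The hash join builds a dictionary on the inner input and probes it with each outer row; the merge join advances through two inputs that must already be sorted on the join key.&lt;/p&gt;

```python
from collections import defaultdict


def hash_join(outer, inner):
    """Equi-join (key, value) rows by hashing the inner input."""
    # Build phase: an in-memory hash table keyed on the inner join column.
    table = defaultdict(list)
    for key, value in inner:
        table[key].append(value)
    # Probe phase: scan the outer input and look up matches for each row.
    result = []
    for key, outer_value in outer:
        for inner_value in table.get(key, []):
            result.append((key, outer_value, inner_value))
    return result


def merge_join(outer, inner):
    """Equi-join (key, value) rows; both inputs must be sorted on key."""
    result = []
    i = j = 0
    while i != len(outer) and j != len(inner):
        okey, ikey = outer[i][0], inner[j][0]
        if okey > ikey:
            j += 1  # inner key is smaller: advance the inner input
        elif ikey > okey:
            i += 1  # outer key is smaller: advance the outer input
        else:
            # Find the run of inner rows that share this key ...
            end = j
            while end != len(inner) and inner[end][0] == okey:
                end += 1
            # ... and pair it with every outer row that has the same key.
            while i != len(outer) and outer[i][0] == okey:
                for _, inner_value in inner[j:end]:
                    result.append((okey, outer[i][1], inner_value))
                i += 1
            j = end
    return result


outer = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
inner = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(outer, inner))
# [(2, 'b', 'x'), (2, 'b', 'y'), (2, 'c', 'x'), (2, 'c', 'y'), (4, 'd', 'w')]
print(hash_join(outer, inner) == merge_join(outer, inner))  # True
```

&lt;p&gt;Both functions return the same rows here, but only &lt;code&gt;merge_join&lt;/code&gt; depends on sorted input. In exchange, it never builds a hash table, which mirrors why sorted projections let the optimizer choose the cheaper merge join.&lt;/p&gt;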
&lt;h2 id=&#34;facilitating-merge-joins&#34;&gt;Facilitating merge joins&lt;/h2&gt;
&lt;p&gt;To facilitate a merge join, create projections for the joined tables that are sorted on the join predicate columns. The join predicate columns should be the first columns in the &lt;code&gt;ORDER BY&lt;/code&gt; clause.&lt;/p&gt;
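&lt;p&gt;The sort-order rule above can be stated as a simple check. In this Python sketch, the helper &lt;code&gt;sorted_for_merge_join&lt;/code&gt; is hypothetical and looks at one projection only; the optimizer also considers the other input and the estimated cost before it chooses a merge join.&lt;/p&gt;

```python
def sorted_for_merge_join(order_by, join_columns):
    """Return True if the join columns lead the projection ORDER BY list.

    order_by: the projection ORDER BY columns, in sort order.
    join_columns: the columns named in the join predicate.
    """
    # The join columns must occupy the first positions of the sort order;
    # columns that follow them do not matter for this check.
    leading = order_by[: len(join_columns)]
    return set(leading) == set(join_columns)


# Sort orders like first_p1 (data_first) and first_p2 (id), joined on id.
print(sorted_for_merge_join(["data_first"], ["id"]))  # False
print(sorted_for_merge_join(["id"], ["id"]))          # True
```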
&lt;p&gt;For example, tables &lt;code&gt;first&lt;/code&gt; and &lt;code&gt;second&lt;/code&gt; are defined as follows, with projections &lt;code&gt;first_p1&lt;/code&gt; and &lt;code&gt;second_p1&lt;/code&gt;, respectively. The projections are sorted on &lt;code&gt;data_first&lt;/code&gt; and &lt;code&gt;data_second&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CREATE TABLE first ( id INT, data_first INT );
CREATE PROJECTION first_p1 AS SELECT * FROM first ORDER BY data_first;

CREATE TABLE second ( id INT, data_second INT );
CREATE PROJECTION second_p1 AS SELECT * FROM second ORDER BY data_second;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When you join these tables on unsorted columns &lt;code&gt;first.id&lt;/code&gt; and &lt;code&gt;second.id&lt;/code&gt;, the database uses the hash join algorithm:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;EXPLAIN SELECT first.data_first, second.data_second FROM first JOIN second ON first.id = second.id;

 Access Path:
 +-&lt;span class=&#34;code-input&#34;&gt;JOIN HASH&lt;/span&gt; [Cost: 752, Rows: 300K] (PATH ID: 1) Inner (BROADCAST)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To let the optimizer execute this query with the merge join algorithm, create projections &lt;code&gt;first_p2&lt;/code&gt; and &lt;code&gt;second_p2&lt;/code&gt;, which are sorted on join columns &lt;code&gt;first_p2.id&lt;/code&gt; and &lt;code&gt;second_p2.id&lt;/code&gt;, respectively:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CREATE PROJECTION first_p2 AS SELECT id, data_first FROM first ORDER BY id SEGMENTED BY hash(id, data_first) ALL NODES;
CREATE PROJECTION second_p2 AS SELECT id, data_second FROM second ORDER BY id SEGMENTED BY hash(id, data_second) ALL NODES;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the query joins significant amounts of data, the optimizer now uses the merge join algorithm:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;EXPLAIN SELECT first.data_first, second.data_second FROM first JOIN second ON first.id = second.id;

 Access Path:
 +-&lt;span class=&#34;code-input&#34;&gt;JOIN MERGEJOIN(inputs presorted)&lt;/span&gt; [Cost: 731, Rows: 300K] (PATH ID: 1) Inner (BROADCAST)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can also facilitate a merge join by using subqueries to pre-sort the join predicate columns. For example:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT first.id, first.data_first, second.data_second FROM
  &lt;span class=&#34;code-input&#34;&gt;(SELECT * FROM first ORDER BY id)&lt;/span&gt; first JOIN &lt;span class=&#34;code-input&#34;&gt;(SELECT * FROM second ORDER BY id)&lt;/span&gt; second ON first.id = second.id;
&lt;/code&gt;&lt;/pre&gt;
      </description>
    </item>
    
    <item>
      <title>Data-Analysis: Identical segmentation</title>
      <link>/en/data-analysis/query-optimization/join-queries/identical-segmentation/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/join-queries/identical-segmentation/</guid>
      <description>
        
        
&lt;p&gt;To improve query performance when you join multiple tables, create projections that are identically segmented on the join keys. Identically segmented projections let each node perform its part of the join locally, which reduces data movement across the network during query processing.&lt;/p&gt;
&lt;p&gt;To determine whether projections are identically segmented on the query join keys, create a query plan with &lt;code&gt;EXPLAIN&lt;/code&gt;. If the query plan contains &lt;code&gt;RESEGMENT&lt;/code&gt; or &lt;code&gt;BROADCAST&lt;/code&gt;, the projections are not identically segmented.&lt;/p&gt;
&lt;p&gt;The database optimizer chooses a projection to supply rows for each table in a query. If the projections to be joined are segmented, the optimizer evaluates their segmentation against the query join expressions to determine whether matching rows already reside on the same node. If they do, each node can join its rows without fetching data from other nodes.&lt;/p&gt;
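&lt;p&gt;The following Python sketch illustrates why identical segmentation keeps joins local. It uses a toy sum-modulo function in place of the database hash function (an assumption made only so the results are easy to verify by hand), and &lt;code&gt;toy_node&lt;/code&gt; and &lt;code&gt;join_is_local&lt;/code&gt; are hypothetical names. When both tables are hashed on the join column alone, rows with equal join keys land on the same node; hashing on a column outside the join can separate them.&lt;/p&gt;

```python
def toy_node(values, node_count):
    """Toy stand-in for SEGMENTED BY hash(...) ALL NODES, integers only.

    The database uses its own hash function. This sum-modulo version keeps
    only the property that matters here: equal inputs map to the same node.
    """
    return sum(values) % node_count


def join_is_local(t1_rows, t2_rows, t1_seg, t2_seg, join_col, node_count=3):
    """True if every pair of rows with t1.join_col = t2.join_col is on one node."""
    for r1 in t1_rows:
        n1 = toy_node([r1[c] for c in t1_seg], node_count)
        for r2 in t2_rows:
            if r1[join_col] == r2[join_col]:
                n2 = toy_node([r2[c] for c in t2_seg], node_count)
                if n1 != n2:
                    return False  # this pair would need data movement
    return True


t1 = [{"id": 1, "x1": 1}, {"id": 2, "x1": 5}]
t2 = [{"id": 1, "x1": 2}, {"id": 2, "x1": 7}]
print(join_is_local(t1, t2, ["id"], ["id"], "id"))              # True
print(join_is_local(t1, t2, ["id", "x1"], ["id", "x1"], "id"))  # False
```

&lt;p&gt;In the second call, both tables are segmented on &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;x1&lt;/code&gt; but joined only on &lt;code&gt;id&lt;/code&gt;, so the two rows with &lt;code&gt;id = 1&lt;/code&gt; hash to different nodes.&lt;/p&gt;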
&lt;h2 id=&#34;join-conditions-for-identically-segmented-projections&#34;&gt;Join conditions for identically segmented projections&lt;/h2&gt;
&lt;p&gt;A projection &lt;code&gt;p&lt;/code&gt; is segmented on join columns if every column referenced in &lt;code&gt;p&lt;/code&gt;’s segmentation expression is also a column in the join expression.&lt;/p&gt;
&lt;p&gt;The following conditions must be true for two segmented projections &lt;code&gt;p1&lt;/code&gt; of table &lt;code&gt;t1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; of table &lt;code&gt;t2&lt;/code&gt; to participate in a join of &lt;code&gt;t1&lt;/code&gt; to &lt;code&gt;t2&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The join condition must have the following form:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;t1.j1 = t2.j1 AND t1.j2 = t2.j2 AND ... AND t1.jN = t2.jN
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The join columns must share the same base data type. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If &lt;code&gt;t1.j1&lt;/code&gt; is an INTEGER, &lt;code&gt;t2.j1&lt;/code&gt; can be an INTEGER but it cannot be a FLOAT.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If &lt;code&gt;t1.j1&lt;/code&gt; is a CHAR(10), &lt;code&gt;t2.j1&lt;/code&gt; can be any CHAR or VARCHAR (for example, CHAR(10), VARCHAR(10), VARCHAR(20)), but &lt;code&gt;t2.j1&lt;/code&gt; cannot be an INTEGER.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If &lt;code&gt;p1&lt;/code&gt; is segmented by an expression on columns {&lt;code&gt;t1.s1, t1.s2, ... t1.sN&lt;/code&gt;}, each segmentation column &lt;code&gt;t1.sX&lt;/code&gt; must be in the join column set {&lt;code&gt;t1.jX&lt;/code&gt;}.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If &lt;code&gt;p2&lt;/code&gt; is segmented by an expression on columns {&lt;code&gt;t2.s1, t2.s2, ... t2.sN&lt;/code&gt;}, each segmentation column &lt;code&gt;t2.sX&lt;/code&gt; must be in the join column set {&lt;code&gt;t2.jX&lt;/code&gt;}.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The segmentation expressions of &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; must be structurally equivalent. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If &lt;code&gt;p1&lt;/code&gt; is &lt;code&gt;SEGMENTED BY hash(t1.x)&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; is &lt;code&gt;SEGMENTED BY hash(t2.x)&lt;/code&gt;, &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; are identically segmented.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If &lt;code&gt;p1&lt;/code&gt; is &lt;code&gt;SEGMENTED BY hash(t1.x)&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; is &lt;code&gt;SEGMENTED BY hash(t2.x + 1)&lt;/code&gt;, &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; are not identically segmented.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; must have the same segment count.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The assignment of segments to nodes must match. For example, if &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; use an &lt;code&gt;OFFSET&lt;/code&gt; clause, their offsets must match.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the database finds projections for &lt;code&gt;t1&lt;/code&gt; and &lt;code&gt;t2&lt;/code&gt; that are not identically segmented, the data is redistributed across the network during query run time, as necessary.&lt;/p&gt;

&lt;div class=&#34;alert admonition tip&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Tip&lt;/h4&gt;

If you create custom designs, try to use segmented projections for joins whenever possible.

&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
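&lt;p&gt;The conditions above can be collected into a single check. This Python sketch encodes them for a simplified model in which a segmentation expression is a template string plus the columns substituted into it. The &lt;code&gt;Segmentation&lt;/code&gt; class and &lt;code&gt;identically_segmented&lt;/code&gt; function are hypothetical, and the sketch omits the data type rule.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    """Hypothetical model of a SEGMENTED BY clause.

    expr_template: the expression with columns replaced by placeholders,
        for example "hash({0})" or "hash({0} + 1)".
    columns: the columns substituted into the placeholders, in order.
    """
    expr_template: str
    columns: tuple
    segment_count: int
    offset: int = 0


def identically_segmented(p1, p2, join_pairs):
    """Simplified check of the conditions for identically segmented projections.

    join_pairs: (t1_column, t2_column) pairs from t1.jX = t2.jX predicates.
    """
    t1_to_t2 = dict(join_pairs)
    # Every segmentation column of p1 and p2 must be in its join column set.
    if not set(p1.columns).issubset(t1_to_t2.keys()):
        return False
    if not set(p2.columns).issubset(t1_to_t2.values()):
        return False
    # Structural equivalence: same expression shape, and each placeholder
    # holds columns that the join predicate equates.
    if p1.expr_template != p2.expr_template:
        return False
    if tuple(t1_to_t2[c] for c in p1.columns) != p2.columns:
        return False
    # Same segment count and same segment-to-node assignment (offset).
    return p1.segment_count == p2.segment_count and p1.offset == p2.offset


seg = Segmentation("hash({0}, {1})", ("id", "x1"), segment_count=3)
print(identically_segmented(seg, seg, [("id", "id"), ("x1", "x1")]))  # True
print(identically_segmented(seg, seg, [("id", "id")]))                # False
```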
&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;
&lt;p&gt;The following statements create two tables with identical segmentation clauses:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE TABLE t1 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id, x1) ALL NODES;
=&amp;gt; CREATE TABLE t2 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id, x1) ALL NODES;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Given this design, the join condition in the following query can leverage identical segmentation, because it includes both segmentation columns:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT * FROM t1 JOIN t2 ON t1.id = t2.id AND t1.x1 = t2.x1;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Conversely, the join conditions in the following queries require resegmentation, because each one leaves at least one segmentation column out of the join column set:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;
=&amp;gt; SELECT * FROM t1 JOIN t2 ON t1.x1 = t2.x1;
=&amp;gt; SELECT * FROM t1 JOIN t2 ON t1.id = t2.x1;
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;see-also&#34;&gt;See also&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/admin/partitioning-tables/partitioning-and-segmentation/#&#34;&gt;Partitioning and segmentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;../../../../en/sql-reference/statements/create-statements/create-projection/#&#34;&gt;CREATE PROJECTION&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Data-Analysis: Joining variable length string data</title>
      <link>/en/data-analysis/query-optimization/join-queries/joining-variable-length-string-data/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/data-analysis/query-optimization/join-queries/joining-variable-length-string-data/</guid>
      <description>
        
        
        &lt;p&gt;When you join tables on VARCHAR columns, the database calculates how much storage space it requires to buffer join column data. It does so by formatting the column data in one of two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Uses the join column metadata to size column data to a fixed length and buffer accordingly. For example, given a column that is defined as &lt;code&gt;VARCHAR(1000)&lt;/code&gt;, the database always buffers 1000 bytes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Uses the actual length of join column data, so buffer size varies with each value. For example, given a join on strings Xi, John, and Amrita, the database buffers only as much storage as each value requires: in this case, 2, 4, and 6 bytes, respectively.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second approach can improve join query performance. It can also reduce memory consumption, which helps prevent join spills and reduces how often the database borrows memory from the resource manager. These benefits are especially pronounced when the defined size of a join column significantly exceeds the average length of its data.&lt;/p&gt;
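&lt;p&gt;A rough Python comparison of the two approaches, using the strings from the example above, shows how large the difference can be. The functions are hypothetical; they assume UTF-8 encoding, treat the declared VARCHAR length as a byte count, and ignore any per-value overhead the database may add.&lt;/p&gt;

```python
def fixed_buffer_bytes(declared_length, values):
    """Fixed format: each value is buffered at the column declared length."""
    return declared_length * len(values)


def variable_buffer_bytes(values):
    """Variable format: each value is buffered at its actual UTF-8 length."""
    return sum(len(v.encode("utf-8")) for v in values)


names = ["Xi", "John", "Amrita"]
print(fixed_buffer_bytes(1000, names))  # 3000 for a VARCHAR(1000) column
print(variable_buffer_bytes(names))     # 12 (2 + 4 + 6)
```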
&lt;h2 id=&#34;setting-and-verifying-variable-length-formatting&#34;&gt;Setting and verifying variable length formatting&lt;/h2&gt;
&lt;p&gt;You can control how the database formats join column data at the database or session level through configuration parameter &lt;a href=&#34;../../../../en/sql-reference/config-parameters/general-parameters/&#34;&gt;JoinDefaultTupleFormat&lt;/a&gt;, or for individual queries through the &lt;a href=&#34;../../../../en/sql-reference/language-elements/hints/jfmt/#&#34;&gt;JFMT&lt;/a&gt; hint. OpenText™ Analytics Database supports variable length formatting for all joins except &lt;a href=&#34;../../../../en/data-analysis/query-optimization/join-queries/hash-joins-versus-merge-joins/&#34;&gt;merge&lt;/a&gt; and &lt;a href=&#34;../../../../en/data-analysis/queries/joins/event-series-joins/&#34;&gt;event series&lt;/a&gt; joins.&lt;/p&gt;
&lt;p&gt;Use &lt;a href=&#34;../../../../en/admin/managing-queries/query-plans/viewing-query-plans/verbose-query-plans/&#34;&gt;EXPLAIN VERBOSE&lt;/a&gt; to verify whether a given query uses variable length formatting by checking the plan for these flags:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;JF_EE_VARIABLE_FORMAT&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JF_EE_FIXED_FORMAT&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
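&lt;p&gt;If you check many plans, you might scan saved &lt;code&gt;EXPLAIN VERBOSE&lt;/code&gt; output for these flags with a short script. This Python sketch only does substring matching on plan text that you have already captured; it does not connect to the database or parse the plan structure, and &lt;code&gt;join_tuple_format&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def join_tuple_format(plan_text):
    """Report which join tuple format flags appear in captured plan text.

    Returns "variable", "fixed", "both", or "none".
    """
    has_variable = "JF_EE_VARIABLE_FORMAT" in plan_text
    has_fixed = "JF_EE_FIXED_FORMAT" in plan_text
    if has_variable and has_fixed:
        return "both"
    if has_variable:
        return "variable"
    if has_fixed:
        return "fixed"
    return "none"
```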

      </description>
    </item>
    
  </channel>
</rss>
