This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Inner joins

An inner join combines records from two tables based on a join predicate and requires that each record in the first table has a matching record in the second table.

An inner join combines records from two tables based on a join predicate and requires that each record in the first table has a matching record in the second table. Thus, inner joins return only records from both joined tables that satisfy the join condition. Records that contain no matches are excluded from the result set.

Inner joins take the following form:

SELECT column-list FROM left-join-table
  [INNER] JOIN right-join-table ON join-predicate

If you omit the INNER keyword, Vertica assumes an inner join. Inner joins are commutative and associative. You can specify tables in any order without changing the results.

Example

The following example specifies an inner join between tables store.store_dimension and public.employee_dimension whose records have matching values in columns store_region and employee_region, respectively:

=> SELECT s.store_region, SUM(e.vacation_days) TotalVacationDays
   FROM public.employee_dimension e
   JOIN store.store_dimension s ON s.store_region=e.employee_region
   GROUP BY s.store_region ORDER BY TotalVacationDays;

This join can also be expressed as follows:

=> SELECT s.store_region, SUM(e.vacation_days) TotalVacationDays
    FROM public.employee_dimension e, store.store_dimension s
    WHERE s.store_region=e.employee_region
    GROUP BY s.store_region ORDER BY TotalVacationDays;

Both queries return the same result set:

 store_region | TotalVacationDays
--------------+-------------------
 NorthWest    |             23280
 SouthWest    |            367250
 MidWest      |            925938
 South        |           1280468
 East         |           1952854
 West         |           2849976
(6 rows)

If the join's inner table store.store_dimension has any rows with store_region values that do not match employee_region values in table public.employee_dimension, those rows are excluded from the result set. To include that row, you can specify an outer join.

1 - Equi-joins and non equi-joins

Vertica supports any arbitrary join expression with both matching and non-matching column values.

Vertica supports any arbitrary join expression with both matching and non-matching column values. For example:

SELECT * FROM fact JOIN dim ON fact.x = dim.x;
SELECT * FROM fact JOIN dim ON fact.x > dim.y;
SELECT * FROM fact JOIN dim ON fact.x <= dim.y;
SELECT * FROM fact JOIN dim ON fact.x <> dim.y;
SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;

Equi-joins are based on equality (matching column values). This equality is indicated with an equal sign (=), which functions as the comparison operator in the ON clause using SQL-92 syntax or the WHERE clause using older join syntax.

The first example below uses SQL-92 syntax and the ON clause to join the online sales table with the call center table using the call center key; the query then returns the sale date key that equals the value 156:

=> SELECT sale_date_key, cc_open_date FROM online_sales.online_sales_fact
   INNER JOIN online_sales.call_center_dimension
   ON (online_sales.online_sales_fact.call_center_key =
    online_sales.call_center_dimension.call_center_key
   AND sale_date_key = 156);
 sale_date_key | cc_open_date
---------------+--------------
           156 | 2005-08-12
(1 row)

The second example uses older join syntax and the WHERE clause to join the same tables to get the same results:

=> SELECT sale_date_key, cc_open_date
    FROM online_sales.online_sales_fact, online_sales.call_center_dimension
   WHERE online_sales.online_sales_fact.call_center_key =
      online_sales.call_center_dimension.call_center_key
   AND sale_date_key = 156;
 sale_date_key | cc_open_date
---------------+--------------
           156 | 2005-08-12
(1 row)

Vertica also permits tables with compound (multiple-column) primary and foreign keys. For example, to create a pair of tables with multi-column keys:

=> CREATE TABLE dimension(pk1 INTEGER NOT NULL, pk2 INTEGER NOT NULL);=> ALTER TABLE dimension ADD PRIMARY KEY (pk1, pk2);
=> CREATE TABLE fact (fk1 INTEGER NOT NULL, fk2 INTEGER NOT NULL);
=> ALTER TABLE fact ADD FOREIGN KEY (fk1, fk2) REFERENCES dimension (pk1, pk2);

To join tables using compound keys, you must connect two join predicates with a Boolean AND operator. For example:

=> SELECT * FROM fact f JOIN dimension d ON f.fk1 = d.pk1 AND f.fk2 = d.pk2;

You can write queries with expressions that contain the <=> operator for NULL=NULL joins.

=> SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;

The <=> operator performs an equality comparison like the = operator, but it returns true, instead of NULL, if both operands are NULL, and false, instead of NULL, if one operand is NULL.

=> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | f
(1 row)

Compare the <=> operator to the = operator:

=> SELECT 1 = 1, NULL = NULL, 1 = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        |          |
(1 row)

When composing joins, it helps to know in advance which columns contain null values. An employee's hire date, for example, would not be a good choice because it is unlikely hire date would be omitted. An hourly rate column, however, might work if some employees are paid hourly and some are salaried. If you are unsure about the value of columns in a given table and want to check, type the command:

=> SELECT COUNT(*) FROM tablename WHERE columnname IS NULL;

2 - Natural joins

A natural join is just a join with an implicit join predicate.

A natural join is just a join with an implicit join predicate. Natural joins can be inner, left outer, right outer, or full outer joins and take the following form:

SELECT column-list FROM left-join-table
NATURAL [ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER ] JOIN right-join-table

Natural joins are, by default, natural inner joins; however, there can also be natural left/right/full outer joins. The primary difference between an inner and natural join is that inner joins have an explicit join condition, whereas the natural join’s conditions are formed by matching all pairs of columns in the tables that have the same name and compatible data types, making natural joins equi-joins because join condition are equal between common columns. (If the data types are incompatible, Vertica returns an error.)

The following query is a simple natural join between tables T1 and T2 when the T2 column val is greater than 5:

=> SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.val > 5;

The store_sales_fact table and the product_dimension table have two columns that share the same name and data type: product_key and product_version. The following example creates a natural join between those two tables at their shared columns:

=> SELECT product_description, sales_quantity FROM store.store_sales_fact
   NATURAL JOIN public.product_dimension;

The following three queries return the same result expressed as a basic query, an inner join, and a natural join. at the table expressions are equivalent only if the common attribute in the store_sales_fact table and the store_dimension table is store_key. If both tables have a column named store_key, then the natural join would also have a store_sales_fact.store_key = store_dimension.store_key join condition. Since the results are the same in all three instances, they are shown in the first (basic) query only:

=> SELECT store_name FROM store.store_sales_fact, store.store_dimension
   WHERE store.store_sales_fact.store_key = store.store_dimension.store_key
   AND store.store_dimension.store_state = 'MA' ORDER BY store_name;
 store_name
------------
 Store11
 Store128
 Store178
 Store66
 Store8
 Store90
(6 rows)

The query written as an inner join:

=> SELECT store_name FROM store.store_sales_fact
   INNER JOIN store.store_dimension
   ON (store.store_sales_fact.store_key = store.store_dimension.store_key)
   WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;

In the case of the natural join, the join predicate appears implicitly by comparing all of the columns in both tables that are joined by the same column name. The result set contains only one column representing the pair of equally-named columns.

=> SELECT store_name FROM store.store_sales_fact
   NATURAL JOIN store.store_dimension
   WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;

3 - Cross joins

Cross joins are the simplest joins to write, but they are not usually the fastest to run because they consist of all possible combinations of two tables’ records.

Cross joins are the simplest joins to write, but they are not usually the fastest to run because they consist of all possible combinations of two tables’ records. Cross joins contain no join condition and return what is known as a Cartesian product, where the number of rows in the result set is equal to the number of rows in the first table multiplied by the number of rows in the second table.

The following query returns all possible combinations from the promotion table and the store sales table:

=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;

Because this example returns over 600 million records, many cross join results can be extremely large and difficult to manage. Cross joins can be useful, however, such as when you want to return a single-row result set.

Implicit versus explicit joins

Vertica recommends that you do not write implicit cross joins (comma-separated tables in the FROM clause). These queries can imply accidental omission of a join predicate.

The following query implicitly cross joins tables promotion_dimension and store.store_sales_fact:

=> SELECT * FROM promotion_dimension, store.store_sales_fact;

It is better practice to express this cross join explicitly, as follows:

=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;

Examples

The following example creates two small tables and their superprojections and then runs a cross join on the tables:

=> CREATE TABLE employee(employee_id INT, employee_fname VARCHAR(50));
=> CREATE TABLE department(dept_id INT, dept_name VARCHAR(50));
=> INSERT INTO employee VALUES (1, 'Andrew');
=> INSERT INTO employee VALUES (2, 'Priya');
=> INSERT INTO employee VALUES (3, 'Michelle');
=> INSERT INTO department VALUES (1, 'Engineering');
=> INSERT INTO department VALUES (2, 'QA');
=> SELECT * FROM employee CROSS JOIN department;

In the result set, the cross join retrieves records from the first table and then creates a new row for every row in the 2nd table. It then does the same for the next record in the first table, and so on.

 employee_id | employee_name | dept_id | dept_name
 -------------+---------------+---------+-----------
           1 | Andrew        |       1 |  Engineering
           2 | Priya         |       1 |  Engineering
           3 | Michelle      |       1 |  Engineering
           1 | Andrew        |       2 |  QA
           2 | Priya         |       2 |  QA
           3 | Michelle      |       2 |  QA
(6 rows)