This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Inner joins
An inner join combines records from two tables based on a join predicate and requires that each record in the first table has a matching record in the second table.
An inner join combines records from two tables based on a join predicate and requires that each record in the first table has a matching record in the second table. Thus, inner joins return only records from both joined tables that satisfy the join condition. Records that contain no matches are excluded from the result set.
Inner joins take the following form:
SELECT column-list FROM left-join-table
[INNER] JOIN right-join-table ON join-predicate
If you omit the INNER
keyword, Vertica assumes an inner join. Inner joins are commutative and associative. You can specify tables in any order without changing the results.
Example
The following example specifies an inner join between tables store.store_dimension
and public.employee_dimension
whose records have matching values in columns store_region
and employee_region
, respectively:
=> SELECT s.store_region, SUM(e.vacation_days) TotalVacationDays
FROM public.employee_dimension e
JOIN store.store_dimension s ON s.store_region=e.employee_region
GROUP BY s.store_region ORDER BY TotalVacationDays;
This join can also be expressed as follows:
=> SELECT s.store_region, SUM(e.vacation_days) TotalVacationDays
FROM public.employee_dimension e, store.store_dimension s
WHERE s.store_region=e.employee_region
GROUP BY s.store_region ORDER BY TotalVacationDays;
Both queries return the same result set:
store_region | TotalVacationDays
--------------+-------------------
NorthWest | 23280
SouthWest | 367250
MidWest | 925938
South | 1280468
East | 1952854
West | 2849976
(6 rows)
If the join's inner table store.store_dimension
has any rows with store_region
values that do not match employee_region
values in table public.employee_dimension
, those rows are excluded from the result set. To include that row, you can specify an outer join.
1 - Equi-joins and non equi-joins
Vertica supports any arbitrary join expression with both matching and non-matching column values.
Vertica supports any arbitrary join expression with both matching and non-matching column values. For example:
SELECT * FROM fact JOIN dim ON fact.x = dim.x;
SELECT * FROM fact JOIN dim ON fact.x > dim.y;
SELECT * FROM fact JOIN dim ON fact.x <= dim.y;
SELECT * FROM fact JOIN dim ON fact.x <> dim.y;
SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;
Note
Operators =
and <=>
generally run the fastest.
Equi-joins are based on equality (matching column values). This equality is indicated with an equal sign (=
), which functions as the comparison operator in the ON
clause using SQL-92 syntax or the WHERE
clause using older join syntax.
The first example below uses SQL-92 syntax and the ON
clause to join the online sales table with the call center table using the call center key; the query then returns the sale date key that equals the value 156:
=> SELECT sale_date_key, cc_open_date FROM online_sales.online_sales_fact
INNER JOIN online_sales.call_center_dimension
ON (online_sales.online_sales_fact.call_center_key =
online_sales.call_center_dimension.call_center_key
AND sale_date_key = 156);
sale_date_key | cc_open_date
---------------+--------------
156 | 2005-08-12
(1 row)
The second example uses older join syntax and the WHERE
clause to join the same tables to get the same results:
=> SELECT sale_date_key, cc_open_date
FROM online_sales.online_sales_fact, online_sales.call_center_dimension
WHERE online_sales.online_sales_fact.call_center_key =
online_sales.call_center_dimension.call_center_key
AND sale_date_key = 156;
sale_date_key | cc_open_date
---------------+--------------
156 | 2005-08-12
(1 row)
Vertica also permits tables with compound (multiple-column) primary and foreign keys. For example, to create a pair of tables with multi-column keys:
=> CREATE TABLE dimension(pk1 INTEGER NOT NULL, pk2 INTEGER NOT NULL);=> ALTER TABLE dimension ADD PRIMARY KEY (pk1, pk2);
=> CREATE TABLE fact (fk1 INTEGER NOT NULL, fk2 INTEGER NOT NULL);
=> ALTER TABLE fact ADD FOREIGN KEY (fk1, fk2) REFERENCES dimension (pk1, pk2);
To join tables using compound keys, you must connect two join predicates with a Boolean AND
operator. For example:
=> SELECT * FROM fact f JOIN dimension d ON f.fk1 = d.pk1 AND f.fk2 = d.pk2;
You can write queries with expressions that contain the <=>
operator for NULL=NULL
joins.
=> SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;
The <=>
operator performs an equality comparison like the =
operator, but it returns true, instead of NULL
, if both operands are NULL
, and false, instead of NULL
, if one operand is NULL
.
=> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
?column? | ?column? | ?column?
----------+----------+----------
t | t | f
(1 row)
Compare the <=> operator to the = operator:
=> SELECT 1 = 1, NULL = NULL, 1 = NULL;
?column? | ?column? | ?column?
----------+----------+----------
t | |
(1 row)
Note
Writing NULL=NULL
joins on primary key/foreign key combinations is not an optimal choice because PK/FK columns are usually defined as NOT NULL
.
When composing joins, it helps to know in advance which columns contain null values. An employee's hire date, for example, would not be a good choice because it is unlikely hire date would be omitted. An hourly rate column, however, might work if some employees are paid hourly and some are salaried. If you are unsure about the value of columns in a given table and want to check, type the command:
=> SELECT COUNT(*) FROM tablename WHERE columnname IS NULL;
2 - Natural joins
A natural join is just a join with an implicit join predicate.
A natural join is just a join with an implicit join predicate. Natural joins can be inner, left outer, right outer, or full outer joins and take the following form:
SELECT column-list FROM left-join-table
NATURAL [ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER ] JOIN right-join-table
Natural joins are, by default, natural inner joins; however, there can also be natural left/right/full outer joins. The primary difference between an inner and natural join is that inner joins have an explicit join condition, whereas the natural join’s conditions are formed by matching all pairs of columns in the tables that have the same name and compatible data types, making natural joins equi-joins because join condition are equal between common columns. (If the data types are incompatible, Vertica returns an error.)
Note
The
Data type coercion chart lists the data types that can be cast to other data types. If one data type can be cast to the other, those two data types are compatible.
The following query is a simple natural join between tables T1 and T2 when the T2 column val
is greater than 5:
=> SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.val > 5;
The store_sales_fact
table and the product_dimension
table have two columns that share the same name and data type: product_key
and product_version
. The following example creates a natural join between those two tables at their shared columns:
=> SELECT product_description, sales_quantity FROM store.store_sales_fact
NATURAL JOIN public.product_dimension;
The following three queries return the same result expressed as a basic query, an inner join, and a natural join. at the table expressions are equivalent only if the common attribute in the store_sales_fact
table and the store_dimension
table is store_key
. If both tables have a column named store_key
, then the natural join would also have a store_sales_fact.store_key = store_dimension.store_key
join condition. Since the results are the same in all three instances, they are shown in the first (basic) query only:
=> SELECT store_name FROM store.store_sales_fact, store.store_dimension
WHERE store.store_sales_fact.store_key = store.store_dimension.store_key
AND store.store_dimension.store_state = 'MA' ORDER BY store_name;
store_name
------------
Store11
Store128
Store178
Store66
Store8
Store90
(6 rows)
The query written as an inner join:
=> SELECT store_name FROM store.store_sales_fact
INNER JOIN store.store_dimension
ON (store.store_sales_fact.store_key = store.store_dimension.store_key)
WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;
In the case of the natural join, the join predicate appears implicitly by comparing all of the columns in both tables that are joined by the same column name. The result set contains only one column representing the pair of equally-named columns.
=> SELECT store_name FROM store.store_sales_fact
NATURAL JOIN store.store_dimension
WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;
3 - Cross joins
Cross joins are the simplest joins to write, but they are not usually the fastest to run because they consist of all possible combinations of two tables’ records.
Cross joins are the simplest joins to write, but they are not usually the fastest to run because they consist of all possible combinations of two tables’ records. Cross joins contain no join condition and return what is known as a Cartesian product, where the number of rows in the result set is equal to the number of rows in the first table multiplied by the number of rows in the second table.
The following query returns all possible combinations from the promotion table and the store sales table:
=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;
Because this example returns over 600 million records, many cross join results can be extremely large and difficult to manage. Cross joins can be useful, however, such as when you want to return a single-row result set.
Tip
Filter out unwanted records in a cross with WHERE
clause join predicates:
=> SELECT * FROM promotion_dimension p CROSS JOIN store.store_sales_fact f
WHERE p.promotion_key LIKE f.promotion_key;
Implicit versus explicit joins
Vertica recommends that you do not write implicit cross joins (comma-separated tables in the FROM
clause). These queries can imply accidental omission of a join predicate.
The following query implicitly cross joins tables promotion_dimension
and
store.store_sales_fact
:
=> SELECT * FROM promotion_dimension, store.store_sales_fact;
It is better practice to express this cross join explicitly, as follows:
=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;
Examples
The following example creates two small tables and their superprojections and then runs a cross join on the tables:
=> CREATE TABLE employee(employee_id INT, employee_fname VARCHAR(50));
=> CREATE TABLE department(dept_id INT, dept_name VARCHAR(50));
=> INSERT INTO employee VALUES (1, 'Andrew');
=> INSERT INTO employee VALUES (2, 'Priya');
=> INSERT INTO employee VALUES (3, 'Michelle');
=> INSERT INTO department VALUES (1, 'Engineering');
=> INSERT INTO department VALUES (2, 'QA');
=> SELECT * FROM employee CROSS JOIN department;
In the result set, the cross join retrieves records from the first table and then creates a new row for every row in the 2nd table. It then does the same for the next record in the first table, and so on.
employee_id | employee_name | dept_id | dept_name
-------------+---------------+---------+-----------
1 | Andrew | 1 | Engineering
2 | Priya | 1 | Engineering
3 | Michelle | 1 | Engineering
1 | Andrew | 2 | QA
2 | Priya | 2 | QA
3 | Michelle | 2 | QA
(6 rows)