Subquery examples

This topic illustrates some of the subqueries you can write.

This topic illustrates some of the subqueries you can write. The examples use the VMart example database.

Single-row subqueries

Single-row subqueries are used with single-row comparison operators (=, >=, <=, <>, and <=>) and return exactly one row.

For example, the following query retrieves the name and hire date of the oldest employee in the Vmart database:

=> SELECT employee_key, employee_first_name, employee_last_name, hire_date
   FROM employee_dimension
   WHERE hire_date = (SELECT MIN(hire_date) FROM employee_dimension);
 employee_key | employee_first_name | employee_last_name | hire_date
--------------+---------------------+--------------------+------------
         2292 | Mary                | Bauer              | 1956-01-11
(1 row)

Multiple-row subqueries

Multiple-row subqueries return multiple records.

For example, the following IN clause subquery returns the names of the employees making the highest salary in each of the six regions:

=> SELECT employee_first_name, employee_last_name, annual_salary, employee_region
    FROM employee_dimension WHERE annual_salary IN
     (SELECT MAX(annual_salary) FROM employee_dimension GROUP BY employee_region)
   ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary |  employee_region
---------------------+--------------------+---------------+-------------------
 Alexandra           | Sanchez            |        992363 | West
 Mark                | Vogel              |        983634 | South
 Tiffany             | Vu                 |        977716 | SouthWest
 Barbara             | Lewis              |        957949 | MidWest
 Sally               | Gauthier           |        927335 | East
 Wendy               | Nielson            |        777037 | NorthWest
(6 rows)

Multicolumn subqueries

Multicolumn subqueries return one or more columns. Sometimes a subquery's result set is evaluated in the containing query in column-to-column and row-to-row comparisons.

You can substitute some multicolumn subqueries with a join, with the reverse being true as well. For example, the following two queries ask for the sales transactions of all products sold online to customers located in Massachusetts and return the same result set. The only difference is the first query is written as a join and the second is written as a subquery.

Join query: Subquery:
=> SELECT *
   FROM online_sales.online_sales_fact
   INNER JOIN public.customer_dimension
   USING (customer_key)
   WHERE customer_state = 'MA';
=> SELECT *
   FROM online_sales.online_sales_fact
   WHERE customer_key IN
     (SELECT customer_key
      FROM public.customer_dimension
      WHERE customer_state = 'MA');

The following query returns all employees in each region whose salary is above the average:

=> SELECT e.employee_first_name, e.employee_last_name, e.annual_salary,
      e.employee_region, s.average
   FROM employee_dimension e,
     (SELECT employee_region, AVG(annual_salary) AS average
      FROM employee_dimension GROUP BY employee_region) AS s
   WHERE  e.employee_region = s.employee_region AND e.annual_salary > s.average
   ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary | employee_region |     average
---------------------+--------------------+---------------+-----------------+------------------
 Doug                | Overstreet         |        995533 | East            |  61192.786013986
 Matt                | Gauthier           |        988807 | South           | 57337.8638902996
 Lauren              | Nguyen             |        968625 | West            | 56848.4274914089
 Jack                | Campbell           |        963914 | West            | 56848.4274914089
 William             | Martin             |        943477 | NorthWest       | 58928.2276119403
 Luigi               | Campbell           |        939255 | MidWest         | 59614.9170454545
 Sarah               | Brown              |        901619 | South           | 57337.8638902996
 Craig               | Goldberg           |        895836 | East            |  61192.786013986
 Sam                 | Vu                 |        889841 | MidWest         | 59614.9170454545
 Luigi               | Sanchez            |        885078 | MidWest         | 59614.9170454545
 Michael             | Weaver             |        882685 | South           | 57337.8638902996
 Doug                | Pavlov             |        881443 | SouthWest       | 57187.2510548523
 Ruth                | McNulty            |        874897 | East            |  61192.786013986
 Luigi               | Dobisz             |        868213 | West            | 56848.4274914089
 Laura               | Lang               |        865829 | East            |  61192.786013986
 ...

You can also use the EXCEPT, INTERSECT, and UNION [ALL] keywords in FROM, WHERE, and HAVING clauses.

The following subquery returns information about all Connecticut-based customers who bought items through either stores or online sales channel and whose purchases amounted to more than 500 dollars:

=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
   WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
      WHERE sales_dollar_amount > 500
      UNION ALL
      SELECT customer_key FROM online_sales.online_sales_fact
      WHERE sales_dollar_amount > 500)
   AND customer_state = 'CT';
 customer_key |  customer_name
--------------+------------------
          200 | Carla Y. Kramer
          733 | Mary Z. Vogel
          931 | Lauren X. Roy
         1533 | James C. Vu
         2948 | Infocare
         4909 | Matt Z. Winkler
         5311 | John Z. Goldberg
         5520 | Laura M. Martin
         5623 | Daniel R. Kramer
         6759 | Daniel Q. Nguyen
 ...

HAVING clause subqueries

A HAVING clause is used in conjunction with the GROUP BY clause to filter the select-list records that a GROUP BY returns. HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>, <=, >= and take the following form:

SELECT <column, ...>
FROM <table>
GROUP BY <expression>
HAVING <expression>
  (SELECT <column, ...>
   FROM <table>
   HAVING <expression>);

For example, the following statement uses the VMart database and returns the number of customers who purchased lowfat products. Note that the GROUP BY clause is required because the query uses an aggregate (COUNT).

=> SELECT s.product_key, COUNT(s.customer_key) FROM store.store_sales_fact s
   GROUP BY s.product_key HAVING s.product_key IN
     (SELECT product_key FROM product_dimension WHERE diet_type = 'Low Fat');

The subquery first returns the product keys for all low-fat products, and the outer query then counts the total number of customers who purchased those products.

 product_key | count
 -------------+-------
          15 |     2
          41 |     1
          66 |     1
         106 |     1
         118 |     1
         169 |     1
         181 |     2
         184 |     2
         186 |     2
         211 |     1
         229 |     1
         267 |     1
         289 |     1
         334 |     2
         336 |     1
(15 rows)