SQL queries

All DML (Data Manipulation Language) statements can contain queries.

All DML (Data Manipulation Language) statements can contain queries. This section introduces some of the query types in Vertica, with additional details in later sections.

Simple queries

Simple queries contain a query against one table. Minimal effort is required to process the following query, which looks for product keys and SKU numbers in the product table:

=> SELECT product_key, sku_number FROM public.product_dimension;
product_key  | sku_number
-------------+-----------
43           | SKU-#129
87           | SKU-#250
42           | SKU-#125
49           | SKU-#154
37           | SKU-#107
36           | SKU-#106
86           | SKU-#248
41           | SKU-#121
88           | SKU-#257
40           | SKU-#120
(10 rows)

Tables can contain arrays. You can select the entire array column, an index into it, or the results of a function applied to the array. For more information, see Arrays and sets (collections).

Joins

Joins use a relational operator that combines information from two or more tables. The query's ON clause specifies how tables are combined, such as by matching foreign keys to primary keys. In the following example, the query requests the names of stores with transactions greater than 70 by joining the store key ID from the store schema's sales fact and sales tables:

=> SELECT store_name, COUNT(*) FROM store.store_sales_fact
   JOIN store.store_dimension ON store.store_sales_fact.store_key = store.store_dimension.store_key
   GROUP BY store_name HAVING COUNT(*) > 70 ORDER BY store_name;

 store_name | count
------------+-------
 Store49    |    72
 Store83    |    78
(2 rows)

For more detailed information, see Joins. See also the Multicolumn subqueries section in Subquery examples.

Cross joins

Also known as the Cartesian product, a cross join is the result of joining every record in one table with every record in another table. A cross join occurs when there is no join key between tables to restrict records. The following query, for example, returns all instances of vendor and store names in the vendor and store tables:

=> SELECT vendor_name, store_name FROM public.vendor_dimension
    CROSS JOIN store.store_dimension;
vendor_name         | store_name
--------------------+------------
Deal Warehouse      | Store41
Deal Warehouse      | Store12
Deal Warehouse      | Store46
Deal Warehouse      | Store50
Deal Warehouse      | Store15
Deal Warehouse      | Store48
Deal Warehouse      | Store39
Sundry Wholesale    | Store41
Sundry Wholesale    | Store12
Sundry Wholesale    | Store46
Sundry Wholesale    | Store50
Sundry Wholesale    | Store15
Sundry Wholesale    | Store48
Sundry Wholesale    | Store39
Market Discounters  | Store41
Market Discounters  | Store12
Market Discounters  | Store46
Market Discounters  | Store50
Market Discounters  | Store15
Market Discounters  | Store48
Market Discounters  | Store39
Market Suppliers    | Store41
Market Suppliers    | Store12
Market Suppliers    | Store46
Market Suppliers    | Store50
Market Suppliers    | Store15
Market Suppliers    | Store48
Market Suppliers    | Store39
...                 | ...
(4000 rows)

This example's output is truncated because this particular cross join returned several thousand rows. See also Cross joins.

Subqueries

A subquery is a query nested within another query. In the following example, we want a list of all products containing the highest fat content. The inner query (subquery) returns the product containing the highest fat content among all food products to the outer query block (containing query). The outer query then uses that information to return the names of the products containing the highest fat content.

=> SELECT product_description, fat_content FROM public.product_dimension
   WHERE fat_content IN
     (SELECT MAX(fat_content) FROM public.product_dimension
      WHERE category_description = 'Food' AND department_description = 'Bakery')
   LIMIT 10;
         product_description         | fat_content
-------------------------------------+-------------
 Brand #59110 hotdog buns            |          90
 Brand #58107 english muffins        |          90
 Brand #57135 english muffins        |          90
 Brand #54870 cinnamon buns          |          90
 Brand #53690 english muffins        |          90
 Brand #53096 bagels                 |          90
 Brand #50678 chocolate chip cookies |          90
 Brand #49269 wheat bread            |          90
 Brand #47156 coffee cake            |          90
 Brand #43844 corn muffins           |          90
(10 rows)

For more information, see Subqueries.

Sorting queries

Use the ORDER BY clause to order the rows that a query returns.

Special note about query results

You could get different results running certain queries on one machine or another for the following reasons:

  • Partitioning on a FLOAT type could return nondeterministic results because of the precision, especially when the numbers are close to one another, such as results from the RADIANS() function, which has a very small range of output.

    To get deterministic results, use NUMERIC if you must partition by data that is not an INTEGER type.

  • Most analytics (with analytic aggregations, such as MIN()/MAX()/SUM()/COUNT()/AVG() as exceptions) rely on a unique order of input data to get deterministic result. If the analytic window-order clause cannot resolve ties in the data, results could be different each time you run the query.

    For example, in the following query, the analytic ORDER BY does not include the first column in the query, promotion_key. So for a tie of AVG(RADIANS(cost_dollar_amount)), product_version, the same promotion_key could have different positions within the analytic partition, resulting in a different NTILE() number. Thus, DISTINCT could also have a different result:

    => SELECT COUNT(*) FROM
          (SELECT DISTINCT SIN(FLOOR(MAX(store.store_sales_fact.promotion_key))),
       NTILE(79) OVER(PARTITION BY AVG (RADIANS
          (store.store_sales_fact.cost_dollar_amount ))
       ORDER BY store.store_sales_fact.product_version)
       FROM store.store_sales_fact
       GROUP BY store.store_sales_fact.product_version,
             store.store_sales_fact.sales_dollar_amount ) AS store;
     count
    -------
      1425
    (1 row)
    

    If you add MAX(promotion_key) to analytic ORDER BY, the results are the same on any machine:

    => SELECT COUNT(*) FROM (SELECT DISTINCT MAX(store.store_sales_fact.promotion_key),
        NTILE(79) OVER(PARTITION BY MAX(store.store_sales_fact.cost_dollar_amount)
       ORDER BY store.store_sales_fact.product_version,
       MAX(store.store_sales_fact.promotion_key))
       FROM store.store_sales_fact
       GROUP BY store.store_sales_fact.product_version,
         store.store_sales_fact.sales_dollar_amount) AS store;