以下查询将返回存在于 Company_A 内但不存在于 Company_B 内的员工的 ID 和姓氏:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B;
id | emp_lname
------+-----------
1234 | Stephen
5678 | Alice
(2 rows)
以下查询将按照员工的姓氏对上一查询的结果进行排序:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
ORDER BY emp_lname ASC;
id | emp_lname
------+-----------
5678 | Alice
1234 | Stephen
(2 rows)
如果按列位置排序,查询将返回相同结果:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
ORDER BY 2 ASC;
id | emp_lname
------+-----------
5678 | Alice
1234 | Stephen
(2 rows)
以下查询将返回存在于 Company_A 内但不存在于 Company_B 或 Company_C 内的员工的 ID 和姓氏:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT id, emp_lname FROM Company_B
EXCEPT
SELECT id, emp_lname FROM Company_C;
id | emp_lname
------+-----------
5678 | Alice
(1 row)
以下查询将显示不匹配数据类型的结果:
=> SELECT id, emp_lname FROM Company_A
EXCEPT
SELECT emp_lname, id FROM Company_B;
ERROR 3429: For 'EXCEPT', types int and varchar are inconsistent
DETAIL: Columns: id and emp_lname
=> SELECT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
WHERE sales_dollar_amount > 500
EXCEPT
SELECT customer_key FROM store.store_sales_fact
WHERE tender_type = 'Cash')
AND customer_state = 'CT';
customer_key | customer_name
--------------+----------------------
15084 | Doug V. Lampert
21730 | Juanita F. Peterson
24412 | Mary U. Garnett
25840 | Ben Z. Taylor
29940 | Brian B. Dobisz
32225 | Ruth T. McNulty
33127 | Darlene Y. Rodriguez
40000 | Steve L. Lewis
44383 | Amy G. Jones
46495 | Kevin H. Taylor
(10 rows)
=> SELECT COUNT(*) FROM customer_dimension;
COUNT
-------
50000
(1 row)
返回表 customer_dimension 中一小部分行的采样:
=> SELECT customer_name, customer_state FROM customer_dimension TABLESAMPLE(0.5) WHERE customer_state='IL';
customer_name | customer_state
---------------------+----------------
Amy Y. McNulty | IL
Daniel C. Nguyen | IL
Midori O. Greenwood | IL
Meghan U. Lampert | IL
Tiffany Y. Lang | IL
Laura S. King | IL
Steve T. Nguyen | IL
Craig S. Webber | IL
Luigi A. Lewis | IL
Mark W. Williams | IL
(10 rows)
GROUP BY a,b,c,d, ROLLUP(a,b)
GROUP BY a,b,c,d, CUBE((a,b),c,d)
GROUP BY a,b,c,d, CUBE(a,b), ROLLUP (c,d)
GROUP BY ROLLUP(a), CUBE(b), GROUPING SETS(c)
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),CUBE(a,b))
GROUP BY a,b,c,d, GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())
用法注意事项
表达式不能包括聚合函数。但是,可以将 GROUP BY 子句与 CUBE、GROUPING SETS 和 ROLLUP 一起使用,以返回每个组的汇总值。
创建 GROUP BY 子句时,必须包括 SELECT 列表中显示的所有非聚合列。
如果 GROUP BY 子句包括 WHERE 子句,Vertica 会忽略所有不满足 WHERE 子句的行。
示例
此示例显示了如何将 WHERE 子句与 GROUP BY 一起使用。在这种情况下,示例检索姓氏以 S 开头的所有员工,忽略所有不符合此条件的行。GROUP BY 子句使用 ILIKE 函数仅检索以 S 开头的姓氏。聚合函数 SUM 计算每个组的总休假天数。
=> SELECT employee_last_name, SUM(vacation_days)
FROM employee_dimension
WHERE employee_last_name ILIKE 'S%'
GROUP BY employee_last_name;
employee_last_name | SUM
--------------------+------
Sanchez | 2892
Smith | 2672
Stein | 2660
(3 rows)
以下示例中的 GROUP BY 子句按供应商地区和供应商地区的最大交易对结果分组:
=> SELECT vendor_region, MAX(deal_size) AS "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
MidWest | 699163
NorthWest | 76101
South | 854136
SouthWest | 609807
West | 964005
(6 rows)
以下查询使用 HAVING 子句修改上一个查询,该子句指定仅返回最大交易规模超过 $900,000 的组:
=> SELECT vendor_region, MAX(deal_size) as "Biggest Deal"
FROM vendor_dimension
GROUP BY vendor_region
HAVING MAX(deal_size) > 900000;
vendor_region | Biggest Deal
---------------+--------------
East | 990889
West | 964005
(2 rows)
可以将 GROUP BY 子句与标量类型的一维数组一起使用。在以下示例中,grants 为 ARRAY[VARCHAR],grant_values 为 ARRAY[INT]。
=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);
=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.
=> SELECT department, grants, SUM(apply_sum(grant_values)) FROM employees GROUP BY grants, department;
department | grants | SUM
------------+--------------------------+--------
Physics | ["US-7376","DARPA-1567"] | 235000
Astronomy | ["US-7376","DARPA-1567"] | 9000
Physics | ["US-7376"] | 30000
(3 rows)
不包含聚合的 GROUP BY 子句的使用方式类似于 SELECT DISTINCT。例如,以下两个查询返回相同的结果:
=> SELECT DISTINCT household_id FROM customer_dimension;
=> SELECT household_id FROM customer_dimension GROUP BY household_id;
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category, Year) ORDER BY 1,2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electricity | 2005 | 109.99
Electricity | 2006 | 109.99
Electricity | 2007 | 229.98
Electricity | | 449.96
| | 549.92
示例:将 ROLLUP 与 HAVING 子句一起使用
此示例说明如何将 ROLLUP 与 HAVING 子句一起使用,以限制 GROUP BY 结果。以下查询只生成 ROLLUP 类别,其中 year 根据 GROUPING 函数中的表达式得出小计:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY ROLLUP(Category,Year) HAVING GROUPING(Year)=1
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
此示例显示了如何限制 GROUP BY 结果以及如何将 CUBE 聚合与 HAVING 子句结合使用。此查询仅返回类别合计和总计:
=> SELECT Category, Year, SUM(Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING(Year)=1;
Category | Year | SUM
-------------+------+--------
Books | | 99.96
Electricity | | 449.96
| | 549.92
下一个查询仅返回每个年度两种类别的聚合。GROUPING ID 函数指定忽略总计 ($549.92):
=> SELECT Category, Year, SUM (Amount) FROM expenses
GROUP BY CUBE(Category,Year) HAVING GROUPING_ID(Category,Year)<2
ORDER BY 1, 2, GROUPING_ID();
Category | Year | SUM
------------+------+--------
Books | 2005 | 39.98
Books | 2007 | 29.99
Books | 2008 | 29.99
Books | | 99.96
Electrical | 2005 | 109.99
Electrical | 2006 | 109.99
Electrical | 2007 | 229.98
Electrical | | 449.96
=> SELECT id, emp_lname, sales FROM Company_A
INTERSECT
(SELECT id, emp_lname, sales FROM company_B WHERE sales > 1000)
ORDER BY sales DESC;
id | emp_lname | sales
------+-----------+-------
3214 | Smithson | 1500
(1 row)
接下来的查询返回同时为三个公司工作的员工 ID 号以及员工姓氏。
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT id, emp_lname FROM Company_B
INTERSECT
SELECT id, emp_lname FROM Company_C;
id | emp_lname
------+-----------
3214 | Smithson
(1 row)
接下来的查询返回的是不匹配的数据类型结果;这两个查询不是联集兼容的:
=> SELECT id, emp_lname FROM Company_A
INTERSECT
SELECT emp_lname, id FROM Company_B;
ERROR 3429: For 'INTERSECT', types int and varchar are inconsistent
DETAIL: Columns: id and emp_lname
=> SELECT customer_key, customer_name from public.customer_dimension
WHERE customer_key IN (SELECT customer_key
FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 400
INTERSECT
SELECT customer_key FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 500)
AND customer_state = 'CT' ORDER BY customer_key;
customer_key | customer_name
--------------+------------------------
39 | Sarah S. Winkler
44 | Meghan H. Overstreet
70 | Jack X. Cleveland
103 | Alexandra I. Vu
110 | Matt . Farmer
173 | Mary R. Reyes
188 | Steve G. Williams
233 | Theodore V. McNulty
250 | Marcus E. Williams
294 | Samantha V. Young
313 | Meghan P. Pavlov
375 | Sally N. Vu
384 | Emily R. Smith
387 | Emily L. Garcia
...
上一个查询和下一个查询是等效的,并且返回相同的结果:
=> SELECT customer_key,customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key
FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 400
AND sales_dollar_amount < 500)
AND customer_state = 'CT' ORDER BY customer_key;
=> SELECT * INTO TEMP TABLE newTempTable FROM customer_dimension;
WARNING 4102: No rows are inserted into table "public"."newTempTable" because
ON COMMIT DELETE ROWS is the default for create temporary table
HINT: Use "ON COMMIT PRESERVE ROWS" to preserve the data in temporary table
CREATE TABLE
=> SELECT store_region, store_city||', '||store_state location, store_name, number_of_employees FROM store.store_dimension
LIMIT 2 OVER (PARTITION BY store_region ORDER BY number_of_employees ASC);
store_region | location | store_name | number_of_employees
--------------+---------------------+------------+---------------------
West | Norwalk, CA | Store43 | 10
West | Lancaster, CA | Store95 | 11
East | Stamford, CT | Store219 | 12
East | New York, NY | Store122 | 12
SouthWest | North Las Vegas, NV | Store170 | 10
SouthWest | Phoenix, AZ | Store228 | 11
NorthWest | Bellevue, WA | Store200 | 19
NorthWest | Portland, OR | Store39 | 22
MidWest | South Bend, IN | Store134 | 10
MidWest | Evansville, IN | Store30 | 11
South | Mesquite, TX | Store124 | 10
South | Beaumont, TX | Store226 | 11
(12 rows)
8 - MATCH 子句
允许您以搜索事件模式筛选大量历史数据的 SQL 扩展, MATCH 子句为解析分区、排序提供子类,并基于您定义的模式匹配结果表的行。
您指定一个由 DEFINE 子句中定义的事件类型组成的模式作为正则表达式,其中每个事件都与输入表中的一行对应。接下来,您可以在输入事件序列内搜索模式。模式匹配返回符合 PATTERN 子句的连续行序列。例如,模式 P (A B* C) 由三个事件类型组成:A、B 和 C,当 Vertica 在输入表中找到匹配时,相关模式实例必须是一个 A 类型事件,且后接零个或多个 B 类型事件和一个 C 类型事件。
如果您想在点击流分析中根据用户的 Web 浏览行为(页面点击)确定用户的操作,那么模式匹配就尤其有用。有关详细信息,请参阅事件系列模式匹配。
语法
MATCH ( [ PARTITION BY table‑column ] ORDER BY table‑column
DEFINE event‑name AS boolean‑expr [,...]
PATTERN pattern‑name AS ( regexp )
[ rows‑match‑clause ] )
参数
PARTITION BY
定义与 PATTERN 子句中定义的模式相匹配的窗口数据范围。分区子句通过 PATTERN 子句中定义的匹配模式将数据分区。使用 OEDER BY 子句对每个分区中的数据进行排序。如果忽略分区子句,则将整个数据集视为单个分区。
DEFINE
Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE
'%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action='P'
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN pattern‑name
在 PATTERN 子句中定义的模式的名称,例如,P 是如下文定义的模式名称:
PATTERN P AS (...)
PATTERN 是由名称和正则表达式组成的搜索模式。
注意
Vertica 支持每次查询一个模式。
regexp
一个正则表达式,由 DEFINE 子句中定义的事件类型和下面的一个或多个限定符组成。当 Vertica 评估 MATCH 子句时,正则表达式会识别满足表达式条件的行。
rows‑match‑clause
指定如何解决单一行中评估为真的事件多于一个的问题,为以下几项之一:
ROWS MATCH ALL EVENTS:如果单一行中的多个事件评估为真,Vertica 将返回以下错误:
ERROR: pattern events must be mutually exclusive
HINT: try using ROWS MATCH FIRST EVENT
ROWS MATCH FIRST EVENT:如果给定行中的多个事件评估为真,Vertica 将使用该行的 SQL 语句中的第一个事件。
模式语意评估
SQL 子句的语意评估顺序是:FROM -> WHERE -> PATTERN MATCH -> SELECT。
数据根据 PARTITION BY 子句的规定分区。如果忽略分区子句,则将整个数据集视为单个分区。
对于每个分区,顺序子句指定如何针对模式匹配对输入数据排序。
对每一行的事件进行评估。一行可以具有 0、1 或 N 个事件评估为真。如果同一行中评估为真的事件超过一个,除非您指定 ROWS MATCH FIRST EVENT,否则 Vertica 将返回一个运行时错误。如果指定了 ROWS MATCH FIRST EVENT,且单一行中评估为真的事件多于一个,Vertica 将选择 SQL 语句中第一个定义的事件用于行。
搜索事件模式时,您可以使用 SQL MATCH 子句 语法来筛选大量历史数据。您可以将模式指定为正则表达式,然后可以在输入事件序列内搜索该模式。MATCH 提供了分析数据分区和排序的子句,以及对连续行集执行的模式匹配。
如果您想在点击流分析中根据用户的 Web 浏览行为(页面点击)确定用户的操作,那么模式匹配就尤其有用。典型的在线点击流漏斗是:
公司主页 -> 产品主页 -> 搜索 -> 结果 -> 在线购买
您可以使用此单击流漏斗,在用户的 Web 点击序列中搜索匹配并标识该用户:
登录公司主页
导航至产品页面
运行查询
单击搜索结果中的链接
购买
单击流漏斗架构
此主题中的示例使用了此点击流漏斗以及以下 clickstream_log 表架构:
=> CREATE TABLE clickstream_log (
uid INT, --user ID
sid INT, --browsing session ID, produced by previous sessionization computation
ts TIME, --timestamp that occurred during the user's page visit
refURL VARCHAR(20), --URL of the page referencing PageURL
pageURL VARCHAR(20), --URL of the page being visited
action CHAR(1) --action the user took after visiting the page ('P' = Purchase, 'V' = View)
);
INSERT INTO clickstream_log VALUES (1,100,'12:00','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:01','website2.com/home','website2.com/floby', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:02','website2.com/floby','website2.com/shamwow', 'V');
INSERT INTO clickstream_log values (1,100,'12:03','website2.com/shamwow','website2.com/buy', 'P');
INSERT INTO clickstream_log values (2,100,'12:10','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log values (2,100,'12:11','website2.com/home','website2.com/forks', 'V');
INSERT INTO clickstream_log values (2,100,'12:13','website2.com/forks','website2.com/buy', 'P');
COMMIT;
此示例包括 Vertica MATCH 子句函数,以便分析用户在 website2.com 上的浏览历史记录。它会确定用户执行以下任务的模式:
从其他网站登录 website2.com(进入)
浏览任意数量的其他页面(站内)
做出购买(购买)
在以下语句中,模式 P (Entry Onsite* Purchase) 包括三种事件类型:进入、站内和购买。当 Vertica 在输入表中找到匹配时,相关模式实例必须是一个进入事件类型,且后跟零个或多个站内事件类型以及一个购买事件类型
=> SELECT uid,
sid,
ts,
refurl,
pageurl,
action,
event_name(),
pattern_id(),
match_id()
FROM clickstream_log
MATCH
(PARTITION BY uid, sid ORDER BY ts
DEFINE
Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
Onsite AS PageURL ILIKE '%website2.com%' AND Action='V',
Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
PATTERN
P AS (Entry Onsite* Purchase)
ROWS MATCH FIRST EVENT);
=> SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name;
customer_name | customer_gender
----------------------+-----------------
Amy X. Lang | Female
Anna H. Li | Female
Brian O. Weaver | Male
Craig O. Pavlov | Male
Doug Z. Goldberg | Male
Harold S. Jones | Male
Jack E. Perkins | Male
Joseph W. Overstreet | Male
Kevin . Campbell | Male
Raja Y. Wilson | Male
Samantha O. Brown | Female
Steve H. Gauthier | Male
William . Nielson | Male
William Z. Roy | Male
(14 rows)
=> SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name OFFSET 8;
customer_name | customer_gender
-------------------+-----------------
Kevin . Campbell | Male
Raja Y. Wilson | Male
Samantha O. Brown | Female
Steve H. Gauthier | Male
William . Nielson | Male
William Z. Roy | Male
(6 rows)
=> CREATE TABLE t(geom geometry(200), geog geography(200));
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
AS SEL_0 FROM t ORDER BY geog;
ERROR 2521: Cannot specify anything other than user defined transforms and partitioning expressions in the ORDER BY list
以下示例使用同一个表更正了此错误。
=> SELECT PolygonPoint(geom) OVER(PARTITION BY geom)
AS SEL_0 FROM t ORDER BY geom;
以下示例使用 ORDER BY 子句中的一个数组。
=> CREATE TABLE employees (id INT, department VARCHAR(50), grants ARRAY[VARCHAR], grant_values ARRAY[INT]);
=> COPY employees FROM STDIN;
42|Physics|[US-7376,DARPA-1567]|[65000,135000]
36|Physics|[US-7376,DARPA-1567]|[10000,25000]
33|Physics|[US-7376]|[30000]
36|Astronomy|[US-7376,DARPA-1567]|[5000,4000]
\.
=> SELECT * FROM employees ORDER BY grant_values;
id | department | grants | grant_values
----+------------+--------------------------+----------------
36 | Astronomy | ["US-7376","DARPA-1567"] | [5000,4000]
36 | Physics | ["US-7376","DARPA-1567"] | [10000,25000]
33 | Physics | ["US-7376"] | [30000]
42 | Physics | ["US-7376","DARPA-1567"] | [65000,135000]
(4 rows)
不允许使用“GROUP BY”和“HAVING”子句。如果在间隙填充和插值 (GFI) 之前或之后需要执行“GROUP BY”操作,请使用子查询,并且将“GROUP BY”放入外部查询。例如: => SELECT symbol, AVG(first_bid) as avg_bid FROM ( SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid FROM Tickstore WHERE symbol IN ('MSFT', 'IBM') TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts) ) AS resultOfGFI GROUP BY symbol;
例如,以下两个查询返回语法错误,因为 bid1 不是 PARTITION BY 或 GROUP BY 列:
=> SELECT bid, symbol, TS_FIRST_VALUE(bid) FROM Tickstore
TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts);
ERROR: column "Tickstore.bid" must appear in the PARTITION BY list of Timeseries clause or be used in a Timeseries Output function
=> SELECT bid, symbol, AVG(bid) FROM Tickstore
GROUP BY symbol;
ERROR: column "Tickstore.bid" must appear in the GROUP BY clause or be used in an aggregate function
UNION 语句可以指定自己的 ORDER BY、LIMIT 和 OFFSET 子句。例如,对于以下示例中描述的表,以下查询将按 emp_name 对 UNION 结果集进行排序,并将输出限制为前两行:
=> SELECT id, emp_name FROM company_a UNION ALL SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2;
id | emp_name
------+----------
5678 | Alice
8765 | Bob
(2 rows)
UNION 子句中的每个 SELECT 语句都可以指定自己的 ORDER BY、LIMIT 和 OFFSET 子句。在这种情况下,SELECT 语句必须用括号括起来。Vertica 首先处理 SELECT 语句的 ORDER BY、LIMIT 和 OFFSET 子句,然后再处理 UNION 子句。
例如,以下 UNION 中的每个 SELECT 语句都指定了自己的 ORDER BY 和 LIMIT 子句。Vertica 处理单个查询,然后连接两个结果集:
=> (SELECT id, emp_name FROM company_a ORDER BY emp_name LIMIT 2)
UNION ALL
(SELECT id, emp_name FROM company_b ORDER BY emp_name LIMIT 2);
id | emp_name
------+-----------
5678 | Alice
9012 | Katherine
8765 | Bob
9012 | Katherine
(4 rows)
=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension WHERE customer_key IN
(SELECT customer_key FROM store.store_sales_fact WHERE sales_dollar_amount > 500
UNION ALL
SELECT customer_key FROM online_sales.online_sales_fact WHERE sales_dollar_amount > 500)
AND customer_state = 'CT';
customer_key | customer_name
--------------+------------------------
7021 | Luigi T. Dobisz
1971 | Betty V. Dobisz
46284 | Ben C. Gauthier
33885 | Tanya Y. Taylor
5449 | Sarah O. Robinson
29059 | Sally Z. Fortin
11200 | Foodhope
15582 | John J. McNulty
24638 | Alexandra F. Jones
...
示例
使用这两个表的示例:
company_a
ID emp_name dept sales
------+------------+-------------+-------
1234 | Stephen | auto parts | 1000
5678 | Alice | auto parts | 2500
9012 | Katherine | floral | 500
company_b
ID emp_name dept sales
------+------------+-------------+-------
4321 | Marvin | home goods | 250
9012 | Katherine | home goods | 500
8765 | Bob | electronics | 20000
从 company_a 和 company_b 中查找所有员工 ID 和姓名
UNION 语句指定选项 DISTINCT,将合并员工的唯一 ID 和姓氏;Katherine 同时为两个公司工作,所以她在结果集中只出现一次。DISTINCT 为默认值,可以忽略:
=> SELECT id, emp_name FROM company_a UNION DISTINCT SELECT id, emp_name FROM company_b ORDER BY id;
id | emp_name
------+-----------
1234 | Stephen
4321 | Marvin
5678 | Alice
8765 | Bob
9012 | Katherine
(5 rows)
下一个 UNION 语句指定选项 ALL。Katherine 同时为两个公司工作,因此查询返回两条与她相关的记录:
=> SELECT id, emp_name FROM company_a UNION ALL SELECT id, emp_name FROM company_b ORDER BY id;
id | emp_name
------+-----------
1234 | Stephen
5678 | Alice
9012 | Katherine
4321 | Marvin
9012 | Katherine
8765 | Bob
(6 rows)
查找每个公司里成绩最好的两名销售人员
每个 SELECT 语句都指定了自己的 ORDER BY 和 LIMIT 子句,因此 UNION 语句将连接每个查询返回的结果集:
=> (SELECT id, emp_name, sales FROM company_a ORDER BY sales DESC LIMIT 2)
UNION ALL
(SELECT id, emp_name, sales FROM company_b ORDER BY sales DESC LIMIT 2);
id | emp_name | sales
------+-----------+-------
8765 | Bob | 20000
5678 | Alice | 2500
1234 | Stephen | 1000
9012 | Katherine | 500
(4 rows)
查找所有员工的销售排序
UNION 语句指定其自己的 ORDER BY 子句,Vertica 将其应用于整个结果:
=> SELECT id, emp_name, sales FROM company_a
UNION
SELECT id, emp_name, sales FROM company_b
ORDER BY sales;
id | emp_name | sales
------+-----------+-------
4321 | Marvin | 250
9012 | Katherine | 500
1234 | Stephen | 1000
5678 | Alice | 2500
8765 | Bob | 20000
(5 rows)
计算各公司各部门的销售总量
每个 SELECT 语句都有自己的 GROUP BY 子句。UNION 将合并每个查询的聚合结果:
=> (SELECT 'Company A' as company, dept, SUM(sales) FROM company_a
GROUP BY dept)
UNION
(SELECT 'Company B' as company, dept, SUM(sales) FROM company_b
GROUP BY dept)
ORDER BY 1;
company | dept | sum
-----------+-------------+-------
Company A | auto parts | 3500
Company A | floral | 500
Company B | electronics | 20000
Company B | home goods | 750
(4 rows)
启用当前 WITH 子句中所有查询的实体化。否则,实体化由配置参数 WithClauseMaterialization 设置,默认设置为 0(禁用)。如果禁用 WithClauseMaterialization,则 WITH 子句的主查询返回时会自动清除实体化。有关详细信息,请参阅WITH 子句的实体化。
RECURSIVE
指定通过重复执行嵌入的 UNION 或 UNION ALL 语句来迭代 WITH 子句的结果集。请参阅下面的递归查询。
cte‑identifier
标识 WITH 子句中的公用表表达式 (CTE)。此标识符可用于同一 WITH 子句中的 CTE,也可用于父 WITH 子句和子 WITH 子句(如果有)中的 CTE。最外层(主)WITH 子句的 CTE 标识符也可用于主要查询。
同一 WITH 子句的所有 CTE 标识符必须是唯一的。例如,以下 WITH 子句定义了两个 CTE,因此它们需要唯一的标识符: regional_sales 和 top_regions:
WITH
-- query sale amounts for each region
regional_sales AS (SELECT ... ),
top_regions AS ( SELECT ... )
)
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY vendor_key)
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
subordinate‑WITH‑clause
嵌套在当前 WITH 子句中的 WITH 子句。此 WITH 子句中的 CTE 只能引用同一子句中的 CTE,或者父 WITH 子句和子 WITH 子句中的 CTE。
重要
主要查询只能引用最外层 WITH 子句中的 CTE。它不能引用任何嵌套的 WITH 子句中的 CTE。
query-expression
给定 CTE 的查询。
限制
WITH 子句仅支持 SELECT 和 INSERT 语句。它们不支持 UPDATE 或 DELETE 语句。
递归查询
包含 RECURSIVE 选项的 WITH 子句可以重复执行 UNION 或 UNION ALL 查询,从而迭代其自身的输出。递归查询在处理分层结构(例如,经理下属关系)或树状结构数据(如分类法)等自引用数据时十分有用。
配置参数 WithClauseRecursionLimit(默认设置为 8)将设置递归的最大深度。您可以分别使用 ALTER DATABASE 和 ALTER SESSION 在数据库和会话范围内设置此参数。递归将会继续,直到达到配置的最大深度为止,或者直到最后一次迭代返回没有数据为止。
可以按如下方式指定递归 WITH 子句:
WITH [ /*+ENABLE_WITH_CLAUSE_MATERIALIZATION*/ ] RECURSIVE
cte‑identifier [ ( column-aliases ) ] AS (
non-recursive-term
UNION [ ALL ]
recursive-term
)
下面的 SQL 定义了一个包含单个 CTE 的 WITH 子句 revenue,该子句将聚合表 store.store_orders_fact 中的数据。主要查询将引用两次 WITH 子句结果集:在其 JOIN 子句和谓词中:
-- define WITH clause
WITH revenue ( vkey, total_revenue ) AS (
SELECT vendor_key, SUM(total_order_cost)
FROM store.store_orders_fact
GROUP BY vendor_key ORDER BY 1)
-- End WITH clause
-- primary query
SELECT v.vendor_name, v.vendor_address, v.vendor_city, r.total_revenue
FROM vendor_dimension v JOIN revenue r ON v.vendor_key = r.vkey
WHERE r.total_revenue = (SELECT MAX(total_revenue) FROM revenue )
ORDER BY vendor_name;
vendor_name | vendor_address | vendor_city | total_revenue
------------------+----------------+-------------+---------------
Frozen Suppliers | 471 Mission St | Peoria | 49877044
(1 row)
包含多个 CTE 的单个 WITH 子句
在以下示例中,WITH 子句包含两个 CTE:
regional_sales 每个地区的销售总量
top_regions 使用 regional_sales 的结果集确定销售总量最高的三个地区:
主要查询在 top_regions 结果集中按地区和部门聚合销量:
WITH
-- query sale amounts for each region
regional_sales (region, total_sales) AS (
SELECT sd.store_region, SUM(of.total_order_cost) AS total_sales
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY store_region ),
-- query previous result set
top_regions AS (
SELECT region, total_sales
FROM regional_sales ORDER BY total_sales DESC LIMIT 3
)
-- primary query
-- aggregate sales in top_regions result set
SELECT sd.store_region AS region, pd.department_description AS department, SUM(of.total_order_cost) AS product_sales
FROM store.store_orders_fact of
JOIN store.store_dimension sd ON sd.store_key = of.store_key
JOIN public.product_dimension pd ON of.product_key = pd.product_key
WHERE sd.store_region IN (SELECT region FROM top_regions)
GROUP BY ROLLUP (region, department) ORDER BY region, product_sales DESC, GROUPING_ID();
region | department | product_sales
---------+----------------------------------+---------------
East | | 1716917786
East | Meat | 189837962
East | Produce | 170607880
East | Photography | 162271618
East | Frozen Goods | 141077867
East | Gifts | 137604397
East | Bakery | 136497842
East | Liquor | 130410463
East | Canned Goods | 128683257
East | Cleaning supplies | 118996326
East | Dairy | 118866901
East | Seafood | 109986665
East | Medical | 100404891
East | Pharmacy | 71671717
MidWest | | 1287550770
MidWest | Meat | 141446607
MidWest | Produce | 125156100
MidWest | Photography | 122666753
MidWest | Frozen Goods | 105893534
MidWest | Gifts | 103088595
MidWest | Bakery | 102844467
MidWest | Canned Goods | 97647270
MidWest | Liquor | 97306898
MidWest | Cleaning supplies | 90775242
MidWest | Dairy | 89065443
MidWest | Seafood | 82541528
MidWest | Medical | 76674814
MidWest | Pharmacy | 52443519
West | | 2159765937
West | Meat | 235841506
West | Produce | 215277204
West | Photography | 205949467
West | Frozen Goods | 178311593
West | Bakery | 172824555
West | Gifts | 172134780
West | Liquor | 164798022
West | Canned Goods | 163330813
West | Cleaning supplies | 148776443
West | Dairy | 145244575
West | Seafood | 139464407
West | Medical | 126184049
West | Pharmacy | 91628523
| | 5164234493
(43 rows)
包含 WITH 子句的 INSERT 语句
以下 SQL 使用 WITH 子句将 JOIN 查询中的数据插入到表 total_store_sales 中:
CREATE TABLE total_store_sales (store_key int, region VARCHAR(20), store_sales numeric (12,2));
INSERT INTO total_store_sales
WITH store_sales AS (
SELECT sd.store_key, sd.store_region::VARCHAR(20), SUM (of.total_order_cost)
FROM store.store_dimension sd JOIN store.store_orders_fact of ON sd.store_key = of.store_key
GROUP BY sd.store_region, sd.store_key ORDER BY sd.store_region, sd.store_key)
SELECT * FROM store_sales;
=> SELECT * FROM total_store_sales ORDER BY region, store_key;
store_key | region | store_sales
-----------+-----------+-------------
2 | East | 47668303.00
6 | East | 48136354.00
12 | East | 46673113.00
22 | East | 48711211.00
24 | East | 48603836.00
31 | East | 46836469.00
36 | East | 48461449.00
37 | East | 48018279.00
41 | East | 48713084.00
44 | East | 47808362.00
49 | East | 46990023.00
50 | East | 47643329.00
9 | MidWest | 46851087.00
15 | MidWest | 48787354.00
27 | MidWest | 48497620.00
29 | MidWest | 47639234.00
30 | MidWest | 49013483.00
38 | MidWest | 48856012.00
42 | MidWest | 47297912.00
45 | MidWest | 48544521.00
46 | MidWest | 48887255.00
4 | NorthWest | 47580215.00
39 | NorthWest | 47136892.00
47 | NorthWest | 48477574.00
8 | South | 48131455.00
13 | South | 47605422.00
17 | South | 46054367.00
...
(50 rows)