A live aggregate projection contains columns with values that are aggregated from columns in its anchor table. When you load data into the table, Vertica aggregates the data before loading it into the live aggregate projection. On subsequent loads—for example, through INSERTor COPY—Vertica recalculates aggregations with the new data and updates the projection.
This is the multi-page printable view of this section. Click here to print.
Live aggregate projections
- 1: Functions supported for live aggregate projections
- 2: Creating live aggregate projections
- 3: Live aggregate projection example
1 - Functions supported for live aggregate projections
Vertica can aggregate results in live aggregate projections from the following aggregate functions:
Aggregate functions with DISTINCT
Live aggregate projections can support queries that include aggregate functions qualified with the keyword DISTINCT
. The following requirements apply:
-
The aggregated expression must evaluate to a non-constant.
-
The projection's
GROUP BY
clause must specify the aggregated expression.
For example, the following query uses SUM(DISTINCT)
to calculate the total of all unique salaries in a given region:
SELECT customer_region, SUM(DISTINCT annual_income)::INT
FROM customer_dimension GROUP BY customer_region;
This query can use the following live aggregate projection, which specifies the aggregated column (annual_income
) in its GROUP BY
clause:
CREATE PROJECTION public.TotalRegionalIncome
(
customer_region,
annual_income,
Count
)
AS
SELECT customer_dimension.customer_region,
customer_dimension.annual_income,
count(*) AS Count
FROM public.customer_dimension
GROUP BY customer_dimension.customer_region,
customer_dimension.annual_income
;
Note
This projection includes the aggregate functionCOUNT
, which here serves no logical objective; it is included only because live aggregate projections require at least one aggregate function.
2 - Creating live aggregate projections
You define a live aggregate projection with the following syntax:
=> CREATE PROJECTION proj-name AS
SELECT select-expression FROM table
GROUP BY group-expression;
For full syntax options, see CREATE PROJECTION.
For example:
=> CREATE PROJECTION clicks_agg AS
SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
GROUP BY page_id, click_time::DATE KSAFE 1;
For an extended discussion, see Live aggregate projection example.
Requirements
The following requirements apply to live aggregate projections:
-
The projection cannot be unsegmented. Unless the CREATE PROJECTION statement or its anchor table DDL specifies otherwise, all projections are segmented by default.
-
SELECT and GROUP BY columns must be in the same order. GROUP BY expressions must be at the beginning of the SELECT list.
Restrictions
The following restrictions apply to live aggregate projections:
-
MERGE operations must be optimized if they are performed on target tables that have live aggregate projections.
-
Live aggregate projections can reference only one table.
-
Live aggregate projections cannot use row access policies.
-
Vertica does not regard live aggregate projections as superprojections, even those that include all table columns.
-
You cannot modify the anchor table metadata of columns that are included in live aggregate projections—for example, a column's data type or default value. You also cannot drop these columns. To make these changes, first drop all live aggregate and Top-K projections that are associated with the table.
Note
One exception applies: You can set and drop NOT NULL on columns that are included in a live aggregate projection.
3 - Live aggregate projection example
This example shows how you can track user clicks on a given web page using the following clicks
table:
=> CREATE TABLE clicks(
user_id INTEGER,
page_id INTEGER,
click_time TIMESTAMP NOT NULL);
You can aggregate user-specific activity with the following query:
=> SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
WHERE click_time::DATE = '2015-04-30'
GROUP BY page_id, click_time::DATE ORDER BY num_clicks DESC;
To facilitate performance of this query, create a live aggregate projection that counts the number of clicks per user:
=> CREATE PROJECTION clicks_agg AS
SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
GROUP BY page_id, click_time::DATE KSAFE 1;
When you query the clicks
table on user clicks, Vertica typically directs the query to the live aggregate projection clicks_agg
. As additional data is loaded into clicks
, Vertica pre-aggregates the new data and updates clicks_agg
, so queries always return with the latest data.
For example:
=> SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
WHERE click_time::DATE = '2015-04-30' GROUP BY page_id, click_time::DATE
ORDER BY num_clicks DESC;
page_id | click_date | num_clicks
---------+------------+------------
2002 | 2015-04-30 | 10
3003 | 2015-04-30 | 3
2003 | 2015-04-30 | 1
2035 | 2015-04-30 | 1
12034 | 2015-04-30 | 1
(5 rows)