This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Live aggregate projections

A live aggregate projection contains columns with values that are aggregated from columns in its anchor table. When you load data into the table, Vertica aggregates the data before loading it into the live aggregate projection. On subsequent loads—for example, through INSERTor COPY—Vertica recalculates aggregations with the new data and updates the projection.

1 - Functions supported for live aggregate projections

Vertica can aggregate results in live aggregate projections from the following aggregate functions:.

Vertica can aggregate results in live aggregate projections from the following aggregate functions:

Aggregate functions with DISTINCT

Live aggregate projections can support queries that include aggregate functions qualified with the keyword DISTINCT. The following requirements apply:

  • The aggregated expression must evaluate to a non-constant.

  • The projection's GROUP BY clause must specify the aggregated expression.

For example, the following query uses SUM(DISTINCT) to calculate the total of all unique salaries in a given region:

SELECT customer_region, SUM(DISTINCT annual_income)::INT
   FROM customer_dimension GROUP BY customer_region;

This query can use the following live aggregate projection, which specifies the aggregated column (annual_income) in its GROUP BY clause:

CREATE PROJECTION public.TotalRegionalIncome
(
 customer_region,
 annual_income,
 Count
)
AS
 SELECT customer_dimension.customer_region,
        customer_dimension.annual_income,
        count(*) AS Count
 FROM public.customer_dimension
 GROUP BY customer_dimension.customer_region,
          customer_dimension.annual_income
;

2 - Creating live aggregate projections

You define a live aggregate projection with the following syntax:.

You define a live aggregate projection with the following syntax:


=> CREATE PROJECTION proj-name AS
      SELECT select-expression FROM table
      GROUP BY group-expression;

For full syntax options, see CREATE PROJECTION.

For example:

=> CREATE PROJECTION clicks_agg AS
   SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
   GROUP BY page_id, click_time::DATE KSAFE 1;

For an extended discussion, see Live aggregate projection example.

Requirements

The following requirements apply to live aggregate projections:

  • The projection cannot be unsegmented. Unless the CREATE PROJECTION statement or its anchor table DDL specifies otherwise, all projections are segmented by default.

  • SELECT and GROUP BY columns must be in the same order. GROUP BY expressions must be at the beginning of the SELECT list.

Restrictions

The following restrictions apply to live aggregate projections:

  • MERGE operations must be optimized if they are performed on target tables that have live aggregate projections.

  • Live aggregate projections can reference only one table.

  • Live aggregate projections cannot use row access policies.

  • Vertica does not regard live aggregate projections as superprojections, even those that include all table columns.

  • You cannot modify the anchor table metadata of columns that are included in live aggregate projections—for example, a column's data type or default value. You also cannot drop these columns. To make these changes, first drop all live aggregate and Top-K projections that are associated with the table.

3 - Live aggregate projection example

This example shows how you can track user clicks on a given web page using the following clicks table:.

This example shows how you can track user clicks on a given web page using the following clicks table:


=> CREATE TABLE clicks(
   user_id INTEGER,
   page_id INTEGER,
   click_time TIMESTAMP NOT NULL);

You can aggregate user-specific activity with the following query:


=> SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
   WHERE click_time::DATE = '2015-04-30'
   GROUP BY page_id, click_time::DATE ORDER BY num_clicks DESC;

To facilitate performance of this query, create a live aggregate projection that counts the number of clicks per user:

=> CREATE PROJECTION clicks_agg AS
   SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
   GROUP BY page_id, click_time::DATE KSAFE 1;

When you query the clicks table on user clicks, Vertica typically directs the query to the live aggregate projection clicks_agg. As additional data is loaded into clicks, Vertica pre-aggregates the new data and updates clicks_agg, so queries always return with the latest data.

For example:

=> SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
    WHERE click_time::DATE = '2015-04-30' GROUP BY page_id, click_time::DATE
    ORDER BY num_clicks DESC;
 page_id | click_date | num_clicks
---------+------------+------------
    2002 | 2015-04-30 |         10
    3003 | 2015-04-30 |          3
    2003 | 2015-04-30 |          1
    2035 | 2015-04-30 |          1
   12034 | 2015-04-30 |          1
(5 rows)