1 - Single-level aggregation

The simplest GROUP BY queries aggregate data at a single level. For example, a table might contain the following information about family expenses:

  • Category

  • Amount spent on that category during the year

  • Year

Table data might look like this:

=> SELECT * FROM expenses ORDER BY Category;
 Year |  Category  | Amount
------+------------+--------
2005  | Books      |  39.98
2007  | Books      |  29.99
2008  | Books      |  29.99
2006  | Electrical | 109.99
2005  | Electrical | 109.99
2007  | Electrical | 229.98

You can use aggregate functions to get the total expenses per category or per year:

=> SELECT SUM(Amount), Category FROM expenses GROUP BY Category;
 SUM     | Category
---------+------------
  99.96  | Books
 449.96  | Electrical
=> SELECT SUM(Amount), Year FROM expenses GROUP BY Year;
 SUM    | Year
--------+------
 149.97 | 2005
 109.99 | 2006
  29.99 | 2008
 259.97 | 2007
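To make the semantics concrete outside the database, here is a minimal sketch in plain Python (not Vertica) that reproduces the two GROUP BY queries above; the sample rows match the expenses table:

```python
# Plain-Python sketch of single-level aggregation: SUM(Amount) GROUP BY one column.
from collections import defaultdict

expenses = [  # (year, category, amount), matching the sample table above
    (2005, "Books", 39.98), (2007, "Books", 29.99), (2008, "Books", 29.99),
    (2006, "Electrical", 109.99), (2005, "Electrical", 109.99),
    (2007, "Electrical", 229.98),
]

def group_sum(rows, key):
    """SUM(Amount) ... GROUP BY key, where key picks the grouping column."""
    totals = defaultdict(float)
    for year, category, amount in rows:
        totals[key(year, category)] += amount
    return dict(totals)

by_category = group_sum(expenses, lambda y, c: c)  # GROUP BY Category
by_year = group_sum(expenses, lambda y, c: y)      # GROUP BY Year
print({k: round(v, 2) for k, v in by_category.items()})
print({k: round(v, 2) for k, v in by_year.items()})
```

Each distinct grouping value becomes one output row, exactly as in the query results above.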

2 - Multi-level aggregation

Over time, tables that are updated frequently can contain large amounts of data. Using the simple table shown earlier, suppose you want a multilevel query, such as the total expenses per category per year.

The following query uses the ROLLUP aggregation with the SUM function to calculate the total expenses by category and the overall expenses total. The NULL fields indicate subtotal values in the aggregation.

  • When only the Year column is NULL, the subtotal is for all the Category values.

  • When both the Year and Category columns are NULL, the subtotal is for all Amount values for both columns.

The ORDER BY clause sorts the results by expense category, the year the expenses took place, and the GROUP BY level that the GROUPING_ID function creates:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Category, Year) ORDER BY Category, Year, GROUPING_ID();
 Category   | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Books      |      |  99.96
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
 Electrical |      | 449.96
            |      | 549.92

Similarly, the following query calculates the total expenses by year and the overall expenses total and then uses the ORDER BY clause to sort the results:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY ROLLUP(Year, Category) ORDER BY 2, 1, GROUPING_ID();
  Category  | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Electrical | 2005 | 109.99
            | 2005 | 149.97
 Electrical | 2006 | 109.99
            | 2006 | 109.99
 Books      | 2007 |  29.99
 Electrical | 2007 | 229.98
            | 2007 | 259.97
 Books      | 2008 |  29.99
            | 2008 |  29.99
            |      | 549.92
(11 rows)

You can use the CUBE aggregate to perform all possible groupings of the category and year expenses. The following query returns all possible groupings, ordered by grouping:

=> SELECT Category, Year, SUM(Amount) FROM expenses
   GROUP BY CUBE(Category, Year) ORDER BY 1, 2, GROUPING_ID();
 Category   | Year |  SUM
------------+------+--------
 Books      | 2005 |  39.98
 Books      | 2007 |  29.99
 Books      | 2008 |  29.99
 Books      |      |  99.96
 Electrical | 2005 | 109.99
 Electrical | 2006 | 109.99
 Electrical | 2007 | 229.98
 Electrical |      | 449.96
            | 2005 | 149.97
            | 2006 | 109.99
            | 2007 | 259.97
            | 2008 |  29.99
            |      | 549.92

The results include subtotals for each category and each year and a total ($549.92) for all transactions, regardless of year or category.
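The difference from ROLLUP is that CUBE aggregates over every subset of the listed columns, not just the prefixes. A plain-Python sketch (not Vertica) of that expansion:

```python
# Plain-Python sketch of CUBE(Category, Year): aggregate once per subset of
# the listed columns -- 2^2 = 4 grouping sets here, versus ROLLUP's 3.
from collections import defaultdict
from itertools import combinations

expenses = [  # (category, year, amount), matching the sample table
    ("Books", 2005, 39.98), ("Books", 2007, 29.99), ("Books", 2008, 29.99),
    ("Electrical", 2005, 109.99), ("Electrical", 2006, 109.99),
    ("Electrical", 2007, 229.98),
]

def cube_sum(rows, columns=(0, 1)):
    grouping_sets = [subset for n in range(len(columns), -1, -1)
                     for subset in combinations(columns, n)]
    results = {}
    for subset in grouping_sets:
        totals = defaultdict(float)
        for row in rows:
            key = tuple(row[i] for i in subset)  # omitted columns show as NULL
            totals[key] += row[2]
        results[subset] = dict(totals)
    return results

c = cube_sum(expenses)
print(c[(1,)])  # per-year subtotals, which ROLLUP(Category, Year) lacks
print(c[()])    # grand total
```

The per-year subtotals (grouping on Year alone) are exactly the extra rows that CUBE adds over ROLLUP in the output above.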

ROLLUP, CUBE, and GROUPING SETS generate NULL values in grouping columns to identify subtotals. If table data includes NULL values, differentiating these from NULL values in subtotals can sometimes be challenging.

In the preceding output, the NULL values in the Year column indicate that the row was grouped on the Category column, rather than on both columns. In this case, ROLLUP added the NULL value to indicate the subtotal row.

3 - Aggregates and functions for multilevel grouping

Vertica provides several aggregates and functions that group the results of a GROUP BY query at multiple levels.

Aggregates for multilevel grouping

Use the following aggregates for multilevel grouping:

  • ROLLUP automatically performs subtotal aggregations. ROLLUP performs one or more aggregations across multiple dimensions, at different levels.

  • CUBE performs the aggregation for all permutations of the CUBE expression that you specify.

  • GROUPING SETS let you specify which groupings of aggregations you need.

You can use CUBE or ROLLUP expressions inside GROUPING SETS expressions. Otherwise, you cannot nest multilevel aggregate expressions.

Grouping functions

You use one of the following three grouping functions with ROLLUP, CUBE, and GROUPING SETS:

  • GROUP_ID returns one or more numbers, starting with zero (0), to uniquely identify duplicate sets.

  • GROUPING_ID produces a unique ID for each grouping combination.

  • GROUPING identifies for each grouping combination whether a column is a part of this grouping. This function also differentiates NULL values in the data from NULL grouping subtotals.

These functions are typically used with multilevel aggregates.
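The GROUPING and GROUPING_ID values can be sketched in plain Python (not Vertica): for each queried column, GROUPING is 0 when the column is part of the grouping set and 1 when it was rolled up, and GROUPING_ID packs those bits into an integer with the leftmost column as the most significant bit:

```python
# Plain-Python sketch of GROUPING and GROUPING_ID for a grouping set.

def grouping(column, grouping_set):
    """0 if the column is grouped on, 1 if it was rolled up (NULL subtotal)."""
    return 0 if column in grouping_set else 1

def grouping_id(columns, grouping_set):
    """Bitmask of GROUPING values; leftmost column is the most significant bit."""
    gid = 0
    for col in columns:
        gid = (gid << 1) | grouping(col, grouping_set)
    return gid

cols = ("Category", "Year")
# The three grouping sets produced by ROLLUP(Category, Year):
print(grouping_id(cols, {"Category", "Year"}))  # 0 -- detail rows
print(grouping_id(cols, {"Category"}))          # 1 -- Year rolled up
print(grouping_id(cols, set()))                 # 3 -- grand total
```

These are the level numbers that ORDER BY GROUPING_ID() sorts on in the earlier ROLLUP examples.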

4 - Aggregate expressions for GROUP BY

You can include CUBE and ROLLUP aggregates within a GROUPING SETS aggregate. Be aware that the CUBE and ROLLUP aggregates can result in a large amount of output. However, you can avoid large outputs by using GROUPING SETS to return only specified results.

...GROUP BY a,b,c,d,ROLLUP(a,b)...
...GROUP BY a,b,c,d,CUBE((a,b),c,d)...

You cannot include any aggregates in a CUBE or ROLLUP aggregate expression.

You can append multiple GROUPING SETS, CUBE, or ROLLUP aggregates in the same query.

...GROUP BY a,b,c,d,CUBE(a,b),ROLLUP (c,d)...
...GROUP BY a,b,c,d,GROUPING SETS ((a,d),(b,c),CUBE(a,b));...
...GROUP BY a,b,c,d,GROUPING SETS ((a,d),(b,c),(a,b),(a),(b),())...

5 - Pre-aggregating data in projections

Queries that use aggregate functions such as SUM and COUNT can perform more efficiently when they use projections that already contain the aggregated data. This improved efficiency is especially true for queries on large quantities of data.

For example, a power grid company reads 30 million smart meters that provide data at five-minute intervals. The company records each reading in a database table. Over a given year, three trillion records are added to this table.

The power grid company can analyze these records with queries that include aggregate functions to perform the following tasks:

  • Establish usage patterns.

  • Detect fraud.

  • Measure correlation to external events such as weather patterns or pricing changes.

To optimize query response time, you can create an aggregate projection, which stores data after it is aggregated.

Aggregate projections

Vertica provides several types of projections for storing data that is returned from aggregate functions or expressions:

  • Live aggregate projection: Projection that contains columns with values that are aggregated from columns in its anchor table. You can also define live aggregate projections that include user-defined transform functions.

  • Top-K projection: Type of live aggregate projection that returns the top k rows from a partition of selected rows. Create a Top-K projection that satisfies the criteria for a Top-K query.

  • Projection that pre-aggregates UDTF results: Live aggregate projection that invokes user-defined transform functions (UDTFs). To minimize overhead when you query projections of this type, Vertica processes the UDTF functions in the background and stores their results on disk.

  • Projection that contains expressions: Projection with columns whose values are calculated from anchor table columns.

  • Aggregate projections are most useful for queries against large sets of data.

  • For optimal query performance, the size of a live aggregate projection should be a small subset of the anchor table: ideally between 1 and 10 percent of its size, or smaller if possible.

Restrictions

  • MERGE operations must be optimized if they are performed on target tables that have live aggregate projections.

  • You cannot update or delete data in temporary tables with live aggregate projections.

Requirements

In the event of manual recovery from an unclean database shutdown, live aggregate projections might require some time to refresh.

5.1 - Live aggregate projections

A live aggregate projection contains columns with values that are aggregated from columns in its anchor table. When you load data into the table, Vertica aggregates the data before loading it into the live aggregate projection. On subsequent loads—for example, through INSERT or COPY—Vertica recalculates aggregations with the new data and updates the projection.
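The core idea can be sketched in plain Python (a hypothetical class, not Vertica's implementation): aggregates are maintained incrementally as new rows arrive, so queries read precomputed totals instead of rescanning the table:

```python
# Plain-Python sketch of a live-aggregate-style structure: SUM(amount)
# GROUP BY key, kept up to date on every load.
from collections import defaultdict

class LiveSumProjection:
    def __init__(self):
        self.totals = defaultdict(float)

    def load(self, rows):               # analogous to INSERT or COPY
        for key, amount in rows:
            self.totals[key] += amount  # re-aggregate only the new data

    def query(self, key):               # reads stored aggregates, no rescan
        return self.totals[key]

proj = LiveSumProjection()
proj.load([("Books", 39.98), ("Electrical", 109.99)])
proj.load([("Books", 29.99)])           # a later load updates the projection
print(round(proj.query("Books"), 2))    # 69.97
```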

5.1.1 - Functions supported for live aggregate projections

Vertica can aggregate results in live aggregate projections from the following aggregate functions:

Aggregate functions with DISTINCT

Live aggregate projections can support queries that include aggregate functions qualified with the keyword DISTINCT. The following requirements apply:

  • The aggregated expression must evaluate to a non-constant.

  • The projection's GROUP BY clause must specify the aggregated expression.

For example, the following query uses SUM(DISTINCT) to calculate the total of all unique salaries in a given region:

SELECT customer_region, SUM(DISTINCT annual_income)::INT
   FROM customer_dimension GROUP BY customer_region;

This query can use the following live aggregate projection, which specifies the aggregated column (annual_income) in its GROUP BY clause:

CREATE PROJECTION public.TotalRegionalIncome
(
 customer_region,
 annual_income,
 Count
)
AS
 SELECT customer_dimension.customer_region,
        customer_dimension.annual_income,
        count(*) AS Count
 FROM public.customer_dimension
 GROUP BY customer_dimension.customer_region,
          customer_dimension.annual_income
;
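A plain-Python sketch (not Vertica; the rows are made-up sample data) of why the projection's GROUP BY must include the aggregated column: grouping on (region, income) stores each distinct income exactly once per region, so SUM(DISTINCT income) reduces to a plain sum over the projection's rows, ignoring the stored counts:

```python
# Plain-Python sketch: deriving SUM(DISTINCT annual_income) per region from
# rows pre-grouped on (customer_region, annual_income).
from collections import defaultdict

# Rows as the projection stores them: (customer_region, annual_income, count)
projection_rows = [
    ("East", 50000, 3),   # three customers share this income -- stored once
    ("East", 75000, 1),
    ("West", 50000, 2),
]

def sum_distinct_income(rows):
    totals = defaultdict(int)
    for region, income, _count in rows:
        totals[region] += income   # each distinct income appears exactly once
    return dict(totals)

print(sum_distinct_income(projection_rows))  # {'East': 125000, 'West': 50000}
```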

5.1.2 - Creating live aggregate projections

You define a live aggregate projection with the following syntax:


=> CREATE PROJECTION proj-name AS
      SELECT select-expression FROM table
      GROUP BY group-expression;

For full syntax options, see CREATE PROJECTION.

For example:

=> CREATE PROJECTION clicks_agg AS
   SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
   GROUP BY page_id, click_time::DATE KSAFE 1;

For an extended discussion, see Live aggregate projection example.

Requirements

The following requirements apply to live aggregate projections:

  • The projection cannot be unsegmented. Unless the CREATE PROJECTION statement or its anchor table DDL specifies otherwise, all projections are segmented by default.

  • SELECT and GROUP BY columns must be in the same order. GROUP BY expressions must be at the beginning of the SELECT list.

Restrictions

The following restrictions apply to live aggregate projections:

  • MERGE operations must be optimized if they are performed on target tables that have live aggregate projections.

  • Live aggregate projections can reference only one table.

  • Live aggregate projections cannot use row access policies.

  • Vertica does not regard live aggregate projections as superprojections, even those that include all table columns.

  • You cannot modify the anchor table metadata of columns that are included in live aggregate projections—for example, a column's data type or default value. You also cannot drop these columns. To make these changes, first drop all live aggregate and Top-K projections that are associated with the table.

5.1.3 - Live aggregate projection example

This example shows how you can track user clicks on a given web page using the following clicks table:


=> CREATE TABLE clicks(
   user_id INTEGER,
   page_id INTEGER,
   click_time TIMESTAMP NOT NULL);

You can aggregate user-specific activity with the following query:


=> SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
   WHERE click_time::DATE = '2015-04-30'
   GROUP BY page_id, click_time::DATE ORDER BY num_clicks DESC;

To facilitate performance of this query, create a live aggregate projection that counts the number of clicks per user:

=> CREATE PROJECTION clicks_agg AS
   SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
   GROUP BY page_id, click_time::DATE KSAFE 1;

When you query the clicks table on user clicks, Vertica typically directs the query to the live aggregate projection clicks_agg. As additional data is loaded into clicks, Vertica pre-aggregates the new data and updates clicks_agg, so queries always return with the latest data.

For example:

=> SELECT page_id, click_time::DATE click_date, COUNT(*) num_clicks FROM clicks
    WHERE click_time::DATE = '2015-04-30' GROUP BY page_id, click_time::DATE
    ORDER BY num_clicks DESC;
 page_id | click_date | num_clicks
---------+------------+------------
    2002 | 2015-04-30 |         10
    3003 | 2015-04-30 |          3
    2003 | 2015-04-30 |          1
    2035 | 2015-04-30 |          1
   12034 | 2015-04-30 |          1
(5 rows)

5.2 - Top-k projections

A Top-K query returns the top k rows from partitions of selected rows. Top-K projections can significantly improve performance of Top-K queries. For example, you can define a table that stores gas meter readings with three columns: gas meter ID, time of meter reading, and the read value:


=> CREATE TABLE readings (
    meter_id INT,
    reading_date TIMESTAMP,
    reading_value FLOAT);

Given this table, the following Top-K query returns the five most recent meter readings for a given meter:


SELECT meter_id, reading_date, reading_value FROM readings
    LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC);

To improve the performance of this query, you can create a Top-K projection, which is a special type of live aggregate projection:


=> CREATE PROJECTION readings_topk (meter_id, recent_date, recent_value)
    AS SELECT meter_id, reading_date, reading_value FROM readings
    LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC);

After you create this Top-K projection and load its data (through START_REFRESH or REFRESH), Vertica typically redirects the query to the projection and returns with the pre-aggregated data.
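What the Top-K projection precomputes can be sketched in plain Python (not Vertica; the readings are made-up sample data): the k most recent rows per partition, i.e. LIMIT k OVER (PARTITION BY meter_id ORDER BY reading_date DESC):

```python
# Plain-Python sketch of top-k-per-partition: keep the k newest readings
# for each meter.
from collections import defaultdict

readings = [  # (meter_id, reading_date, reading_value) -- sample data
    (1, "2024-01-01 00:00", 10.0), (1, "2024-01-01 00:05", 10.2),
    (1, "2024-01-01 00:10", 10.5), (2, "2024-01-01 00:00", 7.1),
]

def top_k_per_partition(rows, k):
    parts = defaultdict(list)
    for meter_id, date, value in rows:       # PARTITION BY meter_id
        parts[meter_id].append((date, value))
    return {m: sorted(vals, reverse=True)[:k]  # ORDER BY date DESC, LIMIT k
            for m, vals in parts.items()}

print(top_k_per_partition(readings, 2)[1])
# [('2024-01-01 00:10', 10.5), ('2024-01-01 00:05', 10.2)]
```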

5.2.1 - Creating top-k projections

You define a Top-K projection with the following syntax:


CREATE PROJECTION proj-name [(proj-column-spec)]
    AS SELECT select-expression FROM table
    LIMIT num-rows OVER (PARTITION BY expression ORDER BY column-expr);

For full syntax options, see CREATE PROJECTION.

For example:


=> CREATE PROJECTION readings_topk (meter_id, recent_date, recent_value)
    AS SELECT meter_id, reading_date, reading_value FROM readings
    LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC);

For an extended discussion, see Top-k projection examples.

Requirements

The following requirements apply to Top-K projections:

  • The projection cannot be unsegmented.

  • The window partition clause must use PARTITION BY.

  • Columns in PARTITION BY and ORDER BY clauses must be the first columns specified in the SELECT list.

  • You must create a Top-K projection with the LIMIT option rather than with a subquery. For example, the following SELECT statements are equivalent:

    
    => SELECT symbol, trade_time last_trade, price last_price FROM (
        SELECT symbol, trade_time, price, ROW_NUMBER()
        OVER(PARTITION BY symbol ORDER BY trade_time DESC) rn FROM trades) trds WHERE rn <=1;
    
    => SELECT symbol, trade_time last_trade, price last_price FROM trades
        LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    

    Both return the same results:

    
          symbol      |      last_trade       | last_price
    ------------------+-----------------------+------------
     AAPL             | 2011-11-10 10:10:20.5 |   108.4000
     HPQ              | 2012-10-10 10:10:10.4 |    42.0500
    (2 rows)
    

    A Top-K projection that pre-aggregates data for use by both queries must include the LIMIT option:

    
    => CREATE PROJECTION trades_topk AS
         SELECT symbol, trade_time last_trade, price last_price FROM trades
         LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    

Restrictions

The following restrictions apply to Top-K projections:

  • Top-K projections can reference only one table.

  • Vertica does not regard Top-K projections as superprojections, even those that include all table columns.

  • You cannot modify the anchor table metadata of columns that are included in Top-K projections—for example, a column's data type or default value. You also cannot drop these columns. To make these changes, first drop all live aggregate and Top-K projections that are associated with the table.

5.2.2 - Top-k projection examples

The following examples show how to query a table with two Top-K projections for the most-recent trade and last trade of the day for each stock symbol.

  1. Create a table that contains information about individual stock trades:

    • Stock symbol

    • Timestamp

    • Price per share

    • Number of shares

    => CREATE TABLE trades(
        symbol CHAR(16) NOT NULL,
        trade_time TIMESTAMP NOT NULL,
        price NUMERIC(12,4),
        volume INT )
        PARTITION BY (EXTRACT(year from trade_time) * 100 +
        EXTRACT(month from trade_time));
    
  2. Load data into the table:

    
    INSERT INTO trades VALUES('AAPL','2010-10-10 10:10:10'::TIMESTAMP,100.00,100);
    INSERT INTO trades VALUES('AAPL','2010-10-10 10:10:10.3'::TIMESTAMP,101.00,100);
    INSERT INTO trades VALUES ('AAPL','2011-10-10 10:10:10.5'::TIMESTAMP,106.1,1000);
    INSERT INTO trades VALUES ('AAPL','2011-10-10 10:10:10.2'::TIMESTAMP,105.2,500);
    INSERT INTO trades VALUES ('HPQ','2012-10-10 10:10:10.2'::TIMESTAMP,42.01,400);
    INSERT INTO trades VALUES ('HPQ','2012-10-10 10:10:10.3'::TIMESTAMP,42.02,1000);
    INSERT INTO trades VALUES ('HPQ','2012-10-10 10:10:10.4'::TIMESTAMP,42.05,100);
    COMMIT;
    
  3. Create two Top-K projections that obtain the following information from the trades table:

    For each stock symbol, return the most recent trade.

    
    => CREATE PROJECTION trades_topk_a AS SELECT symbol, trade_time last_trade, price last_price
           FROM trades LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    
    => SELECT symbol, trade_time last_trade, price last_price FROM trades
       LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    
          symbol      |      last_trade       | last_price
    ------------------+-----------------------+------------
     HPQ              | 2012-10-10 10:10:10.4 |    42.0500
     AAPL             | 2011-10-10 10:10:10.5 |   106.1000
    (2 rows)
    


    For each stock symbol, return the last trade on each trading day.

    
    => CREATE PROJECTION trades_topk_b
        AS SELECT symbol, trade_time::DATE trade_date, trade_time, price close_price, volume
        FROM trades LIMIT 1 OVER(PARTITION BY symbol, trade_time::DATE ORDER BY trade_time DESC);
    
    => SELECT symbol, trade_time::DATE trade_date, trade_time, price close_price, volume
        FROM trades LIMIT 1 OVER(PARTITION BY symbol, trade_time::DATE ORDER BY trade_time DESC);
    
          symbol      | trade_date |      trade_time       | close_price | volume
    ------------------+------------+-----------------------+-------------+--------
     HPQ              | 2012-10-10 | 2012-10-10 10:10:10.4 |     42.0500 |    100
     AAPL             | 2011-10-10 | 2011-10-10 10:10:10.5 |    106.1000 |   1000
     AAPL             | 2010-10-10 | 2010-10-10 10:10:10.3 |    101.0000 |    100
    (3 rows)
    

In each scenario, Vertica redirects queries on the trades table to the appropriate Top-K projection and returns the aggregated data from them. As additional data is loaded into this table, Vertica pre-aggregates the new data and updates the Top-K projections, so queries always return with the latest data.

5.3 - Pre-aggregating UDTF results

CREATE PROJECTION can define live aggregate projections that invoke user-defined transform functions (UDTFs). To minimize overhead when you query those projections, Vertica processes these functions in the background and stores their results on disk.

Defining projections with UDTFs

The projection definition characterizes UDTFs in one of two ways:

  • Identifies the UDTF as a pre-pass UDTF, which transforms newly loaded data before it is stored in the projection ROS containers.

  • Identifies the UDTF as a batch UDTF, which aggregates and stores projection data.

The projection definition identifies a UDTF as a pre-pass UDTF or batch UDTF in its window partition clause, through the keywords PREPASS or BATCH. A projection can specify one pre-pass or batch UDTF or include both (see UDTF Specification Options).

In all cases, the projection is implicitly segmented and ordered on the PARTITION BY columns.

UDTF specification options

Projections can invoke batch and pre-pass UDTFs singly or in combination.

Single pre-pass UDTF

Vertica invokes the pre-pass UDTF when you load data into the projection's anchor table—for example through COPY or INSERT statements. A pre-pass UDTF transforms the new data and then stores the transformed data in the projection's ROS containers.

Use the following syntax:


CREATE PROJECTION [ IF NOT EXISTS ] [[database.]schema.]projection
[ (
   { projection-column | grouped-clause
   [ ENCODING encoding-type ]
   [ ACCESSRANK integer ] }[,...]
) ]

AS SELECT { table-column | expr-with-table-columns }[,...], prepass-udtf(prepass-args)
    OVER (PARTITION PREPASS BY partition-column-expr[,...])
    [ AS (prepass-output-columns) ] FROM table [[AS] alias]

Single batch UDTF

When invoked singly, a batch UDTF transforms and aggregates projection data on mergeout, data load, and query operations. The UDTF stores aggregated results in the projection's ROS containers. Aggregation is cumulative across mergeout and load operations, and is completed (if necessary) on query execution.
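The cumulative behavior can be sketched in plain Python (not Vertica's implementation): partial aggregates produced per load or mergeout can be merged pairwise, and a final merge completes the aggregation at query time:

```python
# Plain-Python sketch of mergeable batch aggregation: partial states are
# produced per load, combined on mergeout, and finalized at query time.
from collections import Counter

def partial_aggregate(rows):
    """Aggregate one batch of (key, value) rows into a partial state."""
    state = Counter()
    for key, value in rows:
        state[key] += value
    return state

def merge(state_a, state_b):
    """Combine two partial states -- analogous to a mergeout operation."""
    return state_a + state_b

load1 = partial_aggregate([("a", 1), ("b", 2)])
load2 = partial_aggregate([("a", 5)])
merged = merge(load1, load2)          # cumulative across loads
print(dict(merged))                   # {'a': 6, 'b': 2}
```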

Use the following syntax:


CREATE PROJECTION [ IF NOT EXISTS ] [[database.]schema.]projection
[ (
   { projection-column | grouped-clause
   [ ENCODING encoding-type ]
   [ ACCESSRANK integer ]  }[,...]
) ]
AS SELECT { table-column | expr-with-table-columns }[,...], batch-udtf(batch-args)
   OVER (PARTITION BATCH BY partition-column-expr[,...])
   [ AS (batch-output-columns) ] FROM table [ [AS] alias ]

Combined pre-pass and batch UDTFs

You can define a projection with a subquery that invokes a pre-pass UDTF. The pre-pass UDTF returns transformed data to the outer batch query. The batch UDTF then iteratively aggregates results across mergeout operations. It completes aggregation (if necessary) on query execution.

Use the following syntax:


CREATE PROJECTION [ IF NOT EXISTS ] [[database.]schema.]projection
[ (
   { projection-column | grouped-clause
   [ ENCODING encoding-type ]
   [ ACCESSRANK integer ] }[,...]
) ]
AS SELECT { table-column | expr-with-table-columns }[,...], batch-udtf(batch-args)
   OVER (PARTITION BATCH BY partition-column-expr[,...]) [ AS (batch-output-columns) ] FROM (
      SELECT { table-column | expr-with-table-columns }[,...], prepass-udtf (prepass-args)
      OVER (PARTITION PREPASS BY partition-column-expr[,...]) [ AS (prepass-output-columns) ] FROM table ) sq-ref

Examples

Single pre-pass UDTF
The following example shows how to use the UDTF text_index, which extracts from a text document strings that occur more than once.

The following projection specifies to invoke text_index as a pre-pass UDTF:


=> CREATE TABLE documents ( doc_id INT PRIMARY KEY, text VARCHAR(140));

=> CREATE PROJECTION index_proj (doc_id, text)
     AS SELECT doc_id, text_index(doc_id, text)
     OVER (PARTITION PREPASS BY doc_id) FROM documents;

The UDTF is invoked whenever data is loaded into the anchor table documents. text_index transforms the newly loaded data, and Vertica stores the transformed data in the live aggregate projection ROS containers.

So, if you load the following data into documents:

=> INSERT INTO documents VALUES
(100, 'A SQL Query walks into a bar. In one corner of the bar are two tables.
 The Query walks up to the tables and asks - Mind if I join you?');
 OUTPUT
--------
      1
(1 row)

text_index transforms the newly loaded data and stores it in the projection ROS containers. When you query the projection, it returns with the following results:


doc_id | frequency |     term
-------+-----------+--------------
100    | 2         | bar
100    | 2         | Query
100    | 2         | tables
100    | 2         | the
100    | 2         | walks

Combined Pre-Pass and Batch UDTFs
The following projection specifies pre-pass and batch UDTFs stv_intersect and aggregate_classified_points, respectively:


CREATE TABLE points( point_id INTEGER, point_type VARCHAR(10), coordinates GEOMETRY(100));

CREATE PROJECTION aggregated_proj
   AS SELECT point_type, aggregate_classified_points( sq.point_id, sq.polygon_id)
   OVER (PARTITION BATCH BY point_type) AS some_alias
   FROM
      (SELECT point_type, stv_intersect(
         point_id, coordinates USING PARAMETERS index='polygons' )
       OVER (PARTITION PREPASS BY point_type) AS (point_id, polygon_id) FROM points) sq;

The pre-pass query UDTF stv_intersect returns its results (a set of point and matching polygon IDs) to the outer batch query. The outer batch query then invokes the UDTF aggregate_classified_points. Vertica aggregates the result set that is returned by aggregate_classified_points whenever a mergeout operation consolidates projection data. Final aggregation (if necessary) occurs when the projection is queried.

The batch UDTF arguments must exactly match the output columns returned by the pre-pass UDTF stv_intersect, in name and order. In this example, the pre-pass subquery explicitly names the pre-pass UDTF output columns point_id and polygon_id. Accordingly, the batch UDTF arguments match them in name and order: sq.point_id and sq.polygon_id.

5.4 - Aggregating data through expressions

You can create projections where one or more columns are defined by expressions. An expression can reference one or more anchor table columns. For example, the following table contains two integer columns, a and b:

=> CREATE TABLE values (a INT, b INT);

You can create a projection with an expression that calculates the value of column c as the product of a and b:

=> CREATE PROJECTION values_product (a, b, c)
   AS SELECT a, b, a*b FROM values SEGMENTED BY HASH(a) ALL NODES KSAFE;

When you load data into this projection, Vertica resolves the expression a*b in column c. You can then query the projection instead of the anchor table. Vertica returns the pre-calculated data and avoids the overhead otherwise incurred by resource-intensive computations.

Using expressions in projections also lets you sort or segment data on the calculated results of an expression instead of sorting on single column values.
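A plain-Python sketch (not Vertica) of the trade-off: the expression is evaluated once at load time and the result is stored, so query time reads the stored column instead of recomputing per row:

```python
# Plain-Python sketch: the projection stores a*b computed once at "load" time.
rows = [(3, 11), (8, 9)]
projection = [(a, b, a * b) for (a, b) in rows]   # expression resolved on load

# Query time: just read the stored column c, no multiplication needed.
print([c for (_a, _b, c) in projection])  # [33, 72]
```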

Support for user-defined scalar functions

Vertica treats user-defined scalar functions (UDSFs) like other expressions. On each load operation, the UDSF is invoked and returns its results. Vertica stores these results on disk, and returns them when you query the projection directly.

In the following example, the projection points_p1 specifies the UDSF zorder, which is invoked whenever data is loaded in the anchor table points. When data is loaded into the projection, Vertica invokes this function and stores its results for fast access by future queries.

=> CREATE TABLE points(point_id INTEGER, lat NUMERIC(12,9), long NUMERIC(12,9));

=> CREATE PROJECTION points_p1
     AS SELECT point_id, lat, long, zorder(lat, long) zorder FROM points
     ORDER BY zorder(lat, long) SEGMENTED BY hash(point_id) ALL NODES;

Requirements

  • Any ORDER BY expression must be in the SELECT list.

  • All projection columns must be named.

Restrictions

  • MERGE operations must be optimized if they are performed on target tables that have live aggregate projections.

  • Unlike with live aggregate projections, Vertica does not redirect queries that contain expressions to an equivalent existing projection.

  • Projection expressions must be immutable—that is, they must always return the same result. For example, a projection cannot include expressions that use TO_CHAR (which depends on locale) or RANDOM (which returns a different value at each invocation).

  • Projection expressions cannot include Vertica meta-functions such as ADVANCE_EPOCH, ANALYZE_STATISTICS, EXPORT_TABLES, or START_REFRESH.

5.4.1 - Querying data through expressions example

The following example uses a table that contains two integer columns, a and b:

=> CREATE TABLE values (a INT, b INT);

You can create a projection with an expression that calculates the value of column c as the product of a and b:

=> CREATE PROJECTION values_product (a, b, c)
   AS SELECT a, b, a*b FROM values SEGMENTED BY HASH(a) ALL NODES KSAFE;
=> COPY values FROM STDIN DELIMITER ',' DIRECT;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 3,11
>> 3,55
>> 8,9
>> 8,23
>> 16,41
>> 22,111
>> \.
=>

To query this projection, use the name that Vertica assigns to it or to its buddy projections. For example, the following queries target different instances of the projection defined earlier, and return the same results:

=> SELECT * FROM values_product_b0;
=> SELECT * FROM values_product_b1;

The following example queries the anchor table:

=> SELECT * FROM values;
 a  |  b
----+-----
  3 |  11
  3 |  55
  8 |   9
  8 |  23
 16 |  41
 22 | 111

Given the projection created earlier, querying that projection returns the following values:

VMart=> SELECT * FROM values_product_b0;
 a  |  b  | product
----+-----+---------
  3 |  11 |      33
  3 |  55 |     165
  8 |   9 |      72
  8 |  23 |     184
 16 |  41 |     656
 22 | 111 |    2442

5.5 - Aggregation information in system tables

You can query the following system table fields for information about live aggregate projections, Top-K projections, and projections with expressions:

  • TABLES: TABLE_HAS_AGGREGATE_PROJECTION

  • PROJECTIONS: AGGREGATE_TYPE, HAS_EXPRESSIONS

  • PROJECTION_COLUMNS: COLUMN_EXPRESSION, IS_AGGREGATE, IS_EXPRESSION, ORDER_BY_POSITION, ORDER_BY_TYPE, PARTITION_BY_POSITION