This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Top-k projections

A Top-K query returns the top k rows from partitions of selected rows.

A Top-K query returns the top k rows from partitions of selected rows. Top-K projections can significantly improve performance of Top-K queries. For example, you can define a table that stores gas meter readings with three columns: gas meter ID, time of meter reading, and the read value:


=> CREATE TABLE readings (
    meter_id INT,
    reading_date TIMESTAMP,
    reading_value FLOAT);

Given this table, the following Top-K query returns the five most recent meter readings for a given meter:


SELECT meter_id, reading_date, reading_value FROM readings
    LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC);

To improve the performance of this query, you can create a Top-K projection, which is a special type of live aggregate projection:


=> CREATE PROJECTION readings_topk (meter_id, recent_date, recent_value)
    AS SELECT meter_id, reading_date, reading_value FROM readings
    LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC);

After you create this Top-K projection and load its data (through START_REFRESH or REFRESH), Vertica typically redirects the query to the projection and returns with the pre-aggregated data.

1 - Creating top-k projections

You define a Top-K projection with the following syntax:.

You define a Top-K projection with the following syntax:


CREATE PROJECTION proj-name [(proj-column-spec)]
    AS SELECT select-expression FROM table
    LIMIT num-rows OVER (PARTITION BY expression ORDER BY column-expr);

For full syntax options, see CREATE PROJECTION.

For example:


=> CREATE PROJECTION readings_topk (meter_id, recent_date, recent_value)
    AS SELECT meter_id, reading_date, reading_value FROM readings
    LIMIT 5 OVER (PARTITION BY meter_id ORDER BY reading_date DESC);

For an extended discussion, see Top-k projection examples.

Requirements

The following requirements apply to Top-K projections:

  • The projection cannot be unsegmented.

  • The window partition clause must use PARTITION BY.

  • Columns in PARTITION BY and ORDER BY clauses must be the first columns specified in the SELECT list.

  • You must use the LIMIT option to create a Top-K projection, instead of subqueries. For example, the following SELECT statements are equivalent:

    
    => SELECT symbol, trade_time last_trade, price last_price FROM (
        SELECT symbol, trade_time, price, ROW_NUMBER()
        OVER(PARTITION BY symbol ORDER BY trade_time DESC) rn FROM trades) trds WHERE rn <=1;
    
    => SELECT symbol, trade_time last_trade, price last_price FROM trades
        LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    

    Both return the same results:

    
          symbol      |      last_trade       | last_price
    ------------------+-----------------------+------------
     AAPL             | 2011-11-10 10:10:20.5 |   108.4000
     HPQ              | 2012-10-10 10:10:10.4 |    42.0500
    (2 rows)
    

    A Top-K projection that pre-aggregates data for use by both queries must include the LIMIT option:

    
    => CREATE PROJECTION trades_topk AS
         SELECT symbol, trade_time last_trade, price last_price FROM trades
         LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    

Restrictions

The following restrictions apply to Top-K projections:

  • Top-K projections can reference only one table.

  • Vertica does not regard Top-K projections as superprojections, even those that include all table columns.

  • You cannot modify the anchor table metadata of columns that are included in Top-K projections—for example, a column's data type or default value. You also cannot drop these columns. To make these changes, first drop all live aggregate and Top-K projections that are associated with the table.

2 - Top-k projection examples

The following examples show how to query a table with two Top-K projections for the most-recent trade and last trade of the day for each stock symbol.

The following examples show how to query a table with two Top-K projections for the most-recent trade and last trade of the day for each stock symbol.

  1. Create a table that contains information about individual stock trades:

    • Stock symbol

    • Timestamp

    • Price per share

    • Number of shares

    => CREATE TABLE trades(
        symbol CHAR(16) NOT NULL,
        trade_time TIMESTAMP NOT NULL,
        price NUMERIC(12,4),
        volume INT )
        PARTITION BY (EXTRACT(year from trade_time) * 100 +
        EXTRACT(month from trade_time));
    
  2. Load data into the table:

    
    INSERT INTO trades VALUES('AAPL','2010-10-10 10:10:10'::TIMESTAMP,100.00,100);
    INSERT INTO trades VALUES('AAPL','2010-10-10 10:10:10.3'::TIMESTAMP,101.00,100);
    INSERT INTO trades VALUES ('AAPL','2011-10-10 10:10:10.5'::TIMESTAMP,106.1,1000);
    INSERT INTO trades VALUES ('AAPL','2011-10-10 10:10:10.2'::TIMESTAMP,105.2,500);
    INSERT INTO trades VALUES ('HPQ','2012-10-10 10:10:10.2'::TIMESTAMP,42.01,400);
    INSERT INTO trades VALUES ('HPQ','2012-10-10 10:10:10.3'::TIMESTAMP,42.02,1000);
    INSERT INTO trades VALUES ('HPQ','2012-10-10 10:10:10.4'::TIMESTAMP,42.05,100);
    COMMIT;
    
  3. Create two Top-K projections that obtain the following information from the trades table:

    For each stock symbol, return the most recent trade.

    
    => CREATE PROJECTION trades_topk_a AS SELECT symbol, trade_time last_trade, price last_price
           FROM trades LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    
    => SELECT symbol, trade_time last_trade, price last_price FROM trades
       LIMIT 1 OVER(PARTITION BY symbol ORDER BY trade_time DESC);
    
          symbol      |      last_trade       | last_price
    ------------------+-----------------------+------------
     HPQ              | 2012-10-10 10:10:10.4 |    42.0500
     AAPL             | 2011-10-10 10:10:10.5 |   106.1000
    (2 rows)
    


    For each stock symbol, return the last trade on each trading day.

    
    => CREATE PROJECTION trades_topk_b
        AS SELECT symbol, trade_time::DATE trade_date, trade_time, price close_price, volume
        FROM trades LIMIT 1 OVER(PARTITION BY symbol, trade_time::DATE ORDER BY trade_time DESC);
    
    => SELECT symbol, trade_time::DATE trade_date, trade_time, price close_price, volume
        FROM trades LIMIT 1 OVER(PARTITION BY symbol, trade_time::DATE ORDER BY trade_time DESC);
    
          symbol      | trade_date |      trade_time       | close_price | volume
    ------------------+------------+-----------------------+-------------+--------
     HPQ              | 2012-10-10 | 2012-10-10 10:10:10.4 |     42.0500 |    100
     AAPL             | 2011-10-10 | 2011-10-10 10:10:10.5 |    106.1000 |   1000
     AAPL             | 2010-10-10 | 2010-10-10 10:10:10.3 |    101.0000 |    100
    (3 rows)
    

In each scenario, Vertica redirects queries on the trades table to the appropriate Top-K projection and returns the aggregated data from them. As additional data is loaded into this table, Vertica pre-aggregates the new data and updates the Top-K projections, so queries always return with the latest data.