This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Analytic query examples

The topics in this section show how to use analytic queries for calculations.

The topics in this section show how to use analytic queries for calculations.

1 - Calculating a median value

A median is a numerical value that separates the higher half of a sample from the lower half.

A median is a numerical value that separates the higher half of a sample from the lower half. For example, you can retrieve the median of a finite list of numbers by arranging all observations from lowest value to highest value and then picking the middle one.

If the number of observations is even, then there is no single middle value; the median is the mean (average) of the two middle values.

The following example uses this table:

CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT);
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;

You can use the analytic function MEDIAN to calculate the median of all sales in this table. In the following query, the function's OVER clause is empty, so the query returns the same aggregated value for each row of the result set:

=> SELECT name, sales, MEDIAN(sales) OVER() AS median FROM allsales;
 name | sales | median
------+-------+--------
 G    |    10 |     20
 C    |    15 |     20
 D    |    20 |     20
 B    |    20 |     20
 F    |    40 |     20
 E    |    50 |     20
 A    |    60 |     20
(7 rows)

You can modify this query to group sales by state and obtain the median for each one. To do so, include a window partition clause in the OVER clause:

=> SELECT state, name, sales, MEDIAN(sales) OVER(partition by state) AS median FROM allsales;
 state | name | sales | median
-------+------+-------+--------
 MA    | G    |    10 |     35
 MA    | D    |    20 |     35
 MA    | E    |    50 |     35
 MA    | A    |    60 |     35
 NY    | C    |    15 |     20
 NY    | B    |    20 |     20
 NY    | F    |    40 |     20
(7 rows)

2 - Getting price differential for two stocks

The following subquery selects out two stocks of interest.

The following subquery selects out two stocks of interest. The outer query uses the LAST_VALUE() and OVER() components of analytics, with IGNORE NULLS.

Schema

DROP TABLE Ticks CASCADE;


CREATE TABLE Ticks (ts TIMESTAMP, Stock varchar(10), Bid float);
INSERT INTO Ticks VALUES('2011-07-12 10:23:54', 'abc', 10.12);
INSERT INTO Ticks VALUES('2011-07-12 10:23:58', 'abc', 10.34);
INSERT INTO Ticks VALUES('2011-07-12 10:23:59', 'abc', 10.75);
INSERT INTO Ticks VALUES('2011-07-12 10:25:15', 'abc', 11.98);
INSERT INTO Ticks VALUES('2011-07-12 10:25:16', 'abc');
INSERT INTO Ticks VALUES('2011-07-12 10:25:22', 'xyz', 45.16);
INSERT INTO Ticks VALUES('2011-07-12 10:25:27', 'xyz', 49.33);
INSERT INTO Ticks VALUES('2011-07-12 10:31:12', 'xyz', 65.25);
INSERT INTO Ticks VALUES('2011-07-12 10:31:15', 'xyz');

COMMIT;

Ticks table

=> SELECT * FROM ticks;
         ts          | stock |  bid
---------------------+-------+-------
 2011-07-12 10:23:59 | abc   | 10.75
 2011-07-12 10:25:22 | xyz   | 45.16
 2011-07-12 10:23:58 | abc   | 10.34
 2011-07-12 10:25:27 | xyz   | 49.33
 2011-07-12 10:23:54 | abc   | 10.12
 2011-07-12 10:31:15 | xyz   |
 2011-07-12 10:25:15 | abc   | 11.98
 2011-07-12 10:25:16 | abc   |
 2011-07-12 10:31:12 | xyz   | 65.25
(9 rows)

Query

 => SELECT ts, stock, bid, last_value(price1 IGNORE NULLS)
   OVER(ORDER BY ts) - last_value(price2 IGNORE NULLS)
   OVER(ORDER BY ts)   as price_diff
 FROM
  (SELECT ts, stock, bid,
    CASE WHEN stock = 'abc' THEN bid ELSE NULL END AS price1,
    CASE WHEN stock = 'xyz' then bid ELSE NULL END AS price2
    FROM ticks
    WHERE stock IN ('abc','xyz')
  ) v1
 ORDER BY ts;
         ts          | stock |  bid  | price_diff
---------------------+-------+-------+------------
 2011-07-12 10:23:54 | abc   | 10.12 |
 2011-07-12 10:23:58 | abc   | 10.34 |
 2011-07-12 10:23:59 | abc   | 10.75 |
 2011-07-12 10:25:15 | abc   | 11.98 |
 2011-07-12 10:25:16 | abc   |       |
 2011-07-12 10:25:22 | xyz   | 45.16 |     -33.18
 2011-07-12 10:25:27 | xyz   | 49.33 |     -37.35
 2011-07-12 10:31:12 | xyz   | 65.25 |     -53.27
 2011-07-12 10:31:15 | xyz   |       |     -53.27
(9 rows)

3 - Calculating the moving average

Calculating the moving average is useful to get an estimate about the trends in a data set.

Calculating the moving average is useful to get an estimate about the trends in a data set. The moving average is the average of any subset of numbers over a period of time. For example, if you have retail data that spans over ten years, you could calculate a three year moving average, a four year moving average, and so on. This example calculates a 40-second moving average of bids for one stock. This examples uses the ticks table schema.

Query

=> SELECT ts, bid, AVG(bid)
   OVER(ORDER BY ts
       RANGE BETWEEN INTERVAL '40 seconds'
       PRECEDING AND CURRENT ROW)
FROM ticks
WHERE stock = 'abc'
GROUP BY bid, ts
ORDER BY ts;
         ts          |  bid  |     ?column?
---------------------+-------+------------------
 2011-07-12 10:23:54 | 10.12 |            10.12
 2011-07-12 10:23:58 | 10.34 |            10.23
 2011-07-12 10:23:59 | 10.75 | 10.4033333333333
 2011-07-12 10:25:15 | 11.98 |            11.98
 2011-07-12 10:25:16 |       |            11.98
(5 rows)
DROP TABLE Ticks CASCADE;


CREATE TABLE Ticks (ts TIMESTAMP, Stock varchar(10), Bid float);
INSERT INTO Ticks VALUES('2011-07-12 10:23:54', 'abc', 10.12);
INSERT INTO Ticks VALUES('2011-07-12 10:23:58', 'abc', 10.34);
INSERT INTO Ticks VALUES('2011-07-12 10:23:59', 'abc', 10.75);
INSERT INTO Ticks VALUES('2011-07-12 10:25:15', 'abc', 11.98);
INSERT INTO Ticks VALUES('2011-07-12 10:25:16', 'abc');
INSERT INTO Ticks VALUES('2011-07-12 10:25:22', 'xyz', 45.16);
INSERT INTO Ticks VALUES('2011-07-12 10:25:27', 'xyz', 49.33);
INSERT INTO Ticks VALUES('2011-07-12 10:31:12', 'xyz', 65.25);
INSERT INTO Ticks VALUES('2011-07-12 10:31:15', 'xyz');

COMMIT;

4 - Getting latest bid and ask results

The following query fills in missing NULL values to create a full book order showing latest bid and ask price and size, by vendor id.

The following query fills in missing NULL values to create a full book order showing latest bid and ask price and size, by vendor id. Original rows have values for (typically) one price and one size, so use last_value with "ignore nulls" to find the most recent non-null value for the other pair each time there is an entry for the ID. Sequenceno provides a unique total ordering.

Schema:

=> CREATE TABLE bookorders(
    vendorid VARCHAR(100),
    date TIMESTAMP,
    sequenceno INT,
    askprice FLOAT,
    asksize INT,
    bidprice FLOAT,
    bidsize INT);
=> INSERT INTO bookorders VALUES('3325XPK','2011-07-12 10:23:54', 1, 10.12, 55, 10.23, 59);
=> INSERT INTO bookorders VALUES('3345XPZ','2011-07-12 10:23:55', 2, 10.55, 58, 10.75, 57);
=> INSERT INTO bookorders VALUES('445XPKF','2011-07-12 10:23:56', 3, 10.22, 43, 54);
=> INSERT INTO bookorders VALUES('445XPKF','2011-07-12 10:23:57', 3, 10.22, 59, 10.25, 61);
=> INSERT INTO bookorders VALUES('3425XPY','2011-07-12 10:23:58', 4, 11.87, 66, 11.90, 66);
=> INSERT INTO bookorders VALUES('3727XVK','2011-07-12 10:23:59', 5, 11.66, 51, 11.67, 62);
=> INSERT INTO bookorders VALUES('5325XYZ','2011-07-12 10:24:01', 6, 15.05, 44, 15.10, 59);
=> INSERT INTO bookorders VALUES('3675XVS','2011-07-12 10:24:05', 7, 15.43, 47, 58);
=> INSERT INTO bookorders VALUES('8972VUG','2011-07-12 10:25:15', 8, 14.95, 52, 15.11, 57);
COMMIT;

Query:

=> SELECT
    sequenceno Seq,
    date "Time",
    vendorid ID,
    LAST_VALUE (bidprice IGNORE NULLS)
     OVER (PARTITION BY vendorid ORDER BY sequenceno
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Bid Price",
    LAST_VALUE (bidsize IGNORE NULLS)
     OVER (PARTITION BY vendorid ORDER BY sequenceno
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Bid Size",
    LAST_VALUE (askprice IGNORE NULLS)
     OVER (PARTITION BY vendorid ORDER BY sequenceno
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Ask Price",
    LAST_VALUE (asksize IGNORE NULLS)
     OVER (PARTITION BY vendorid order by sequenceno
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
    AS  "Ask Size"
 FROM bookorders
ORDER BY sequenceno;
 Seq |        Time         |   ID    | Bid Price | Bid Size | Ask Price | Ask Size
-----+---------------------+---------+-----------+----------+-----------+----------
   1 | 2011-07-12 10:23:54 | 3325XPK |     10.23 |       59 |     10.12 |       55
   2 | 2011-07-12 10:23:55 | 3345XPZ |     10.75 |       57 |     10.55 |       58
   3 | 2011-07-12 10:23:57 | 445XPKF |     10.25 |       61 |     10.22 |       59
   3 | 2011-07-12 10:23:56 | 445XPKF |        54 |          |     10.22 |       43
   4 | 2011-07-12 10:23:58 | 3425XPY |      11.9 |       66 |     11.87 |       66
   5 | 2011-07-12 10:23:59 | 3727XVK |     11.67 |       62 |     11.66 |       51
   6 | 2011-07-12 10:24:01 | 5325XYZ |      15.1 |       59 |     15.05 |       44
   7 | 2011-07-12 10:24:05 | 3675XVS |        58 |          |     15.43 |       47
   8 | 2011-07-12 10:25:15 | 8972VUG |     15.11 |       57 |     14.95 |       52
(9 rows)