This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Window framing
The window frame of an analytic function comprises a set of rows relative to the row that is currently being evaluated by the function.
The window frame of an analytic function comprises a set of rows relative to the row that is currently being evaluated by the function. After the analytic function processes that row and its window frame, Vertica advances the current row and adjusts the frame boundaries accordingly. If the OVER clause also specifies a partition, Vertica also checks that frame boundaries do not cross partition boundaries. This process repeats until the function evaluates the last row of the last partition.
Specifying a window frame
You specify a window frame in the analytic function's OVER clause, as follows:
{ ROWS | RANGE } { BETWEEN start-point AND end-point } | start-point
start-point
/ end-point
:
{ UNBOUNDED {PRECEDING | FOLLOWING}
| CURRENT ROW
| constant-value {PRECEDING | FOLLOWING}}
start-point
and end-point
specify the window frame's offset from the current row. Keywords ROWS and RANGE specify whether the offset is physical or logical. If you specify only a start point, Vertica creates a window from that point to the current row.
For syntax details, see Window frame clause.
Requirements
In order to specify a window frame, the OVER must also specify a window order (ORDER BY) clause. If the OVER clause includes a window order clause but omits specifying a window frame, the function creates a default frame that extends from the first row in the current partition to the current row. This is equivalent to the following clause:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Window aggregate functions
Analytic functions that support window frames are called window aggregates. They return information such as moving averages and cumulative results. To use the following functions as window (analytic) aggregates, instead of basic aggregates, the OVER clause must specify a window order clause and, optionally, a window frame clause. If the OVER clause omits specifying a window frame, the function creates a default window frame as described earlier.
The following analytic functions support window frames:
A window aggregate with an empty OVER clause creates no window frame. The function is used as a reporting function, where all input is treated as a single partition.
1 - Windows with a physical offset (ROWS)
The keyword ROWS in a window frame clause specifies window dimensions as the number of rows relative to the current row.
The keyword ROWS
in a window frame clause specifies window dimensions as the number of rows relative to the current row. The value can be INTEGER
data type only.
Note
The value returned by an analytic function with a physical offset is liable to produce nondeterministic results unless the ordering expression results in a unique ordering. To achieve unique ordering, the
window order clause might need to specify multiple columns.
Examples
The examples on this page use the emp
table schema:
CREATE TABLE emp(deptno INT, sal INT, empno INT);
INSERT INTO emp VALUES(10,101,1);
INSERT INTO emp VALUES(10,104,4);
INSERT INTO emp VALUES(20,100,11);
INSERT INTO emp VALUES(20,109,7);
INSERT INTO emp VALUES(20,109,6);
INSERT INTO emp VALUES(20,109,8);
INSERT INTO emp VALUES(20,110,10);
INSERT INTO emp VALUES(20,110,9);
INSERT INTO emp VALUES(30,102,2);
INSERT INTO emp VALUES(30,103,3);
INSERT INTO emp VALUES(30,105,5);
COMMIT;
The following query invokes COUNT
to count the current row and the rows preceding it, up to two rows:
SELECT deptno, sal, empno, COUNT(*) OVER
(PARTITION BY deptno ORDER BY sal ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
AS count FROM emp;
The OVER
clause contains three components:
-
Window partition clause PARTITION BY deptno
-
Order by clause ORDER BY sal
-
Window frame clause ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
. This clause defines window dimensions as extending from the current row through the two rows that precede it.
The query returns results that are divided into three partitions, indicated below as red lines. Within the second partition (deptno=20
), COUNT
processes the window frame clause as follows:
-
Creates the first window (green box). This window comprises a single row, as the current row (blue box) is also the the partition's first row. Thus, the value in the count
column shows the number of rows in the current window, which is 1:
-
After COUNT
processes the partition's first row, it resets the current row to the partition's second row. The window now spans the current row and the row above it, so COUNT
returns a value of 2:
-
After COUNT
processes the partition's second row, it resets the current row to the partition's third row. The window now spans the current row and the two rows above it, so COUNT
returns a value of 3:
-
Thereafter, COUNT
continues to process the remaining partition rows and moves the window accordingly, but the window dimensions (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) remain unchanged as three rows. Accordingly, the value in the count
column also remains unchanged (3):
2 - Windows with a logical offset (RANGE)
The RANGE keyword defines an analytic window frame as a logical offset from the current row.
The RANGE
keyword defines an analytic window frame as a logical offset from the current row.
Note
The value returned by an analytic function with a logical offset is always deterministic.
For each row, an analytic function uses the window order clause (ORDER_BY
) column or expression to calculate window frame dimensions as follows:
-
Within the current partition, evaluates the ORDER_BY
value of the current row against the ORDER_BY
values of contiguous rows.
-
Determines which of these rows satisfy the specified range requirements relative to the current row.
-
Creates a window frame that includes only those rows.
-
Executes on the current window.
Example
This example uses the table property_sales
, which contains data about neighborhood home sales:
=> SELECT property_key, neighborhood, sell_price FROM property_sales ORDER BY neighborhood, sell_price;
property_key | neighborhood | sell_price
--------------+---------------+------------
10918 | Jamaica Plain | 353000
10921 | Jamaica Plain | 450000
10927 | Jamaica Plain | 450000
10922 | Jamaica Plain | 474000
10919 | Jamaica Plain | 515000
10917 | Jamaica Plain | 675000
10924 | Jamaica Plain | 675000
10920 | Jamaica Plain | 705000
10923 | Jamaica Plain | 710000
10926 | Jamaica Plain | 875000
10925 | Jamaica Plain | 900000
10930 | Roslindale | 300000
10928 | Roslindale | 422000
10932 | Roslindale | 450000
10929 | Roslindale | 485000
10931 | Roslindale | 519000
10938 | West Roxbury | 479000
10933 | West Roxbury | 550000
10937 | West Roxbury | 550000
10934 | West Roxbury | 574000
10935 | West Roxbury | 598000
10936 | West Roxbury | 615000
10939 | West Roxbury | 720000
(23 rows)
The analytic function AVG can obtain the average of proximate selling prices within each neighborhood. The following query calculates for each home the average sale for all other neighborhood homes whose selling price was $50k higher or lower:
=> SELECT property_key, neighborhood, sell_price, AVG(sell_price) OVER(
PARTITION BY neighborhood ORDER BY sell_price
RANGE BETWEEN 50000 PRECEDING and 50000 FOLLOWING)::int AS comp_sales
FROM property_sales ORDER BY neighborhood;
property_key | neighborhood | sell_price | comp_sales
--------------+---------------+------------+------------
10918 | Jamaica Plain | 353000 | 353000
10927 | Jamaica Plain | 450000 | 458000
10921 | Jamaica Plain | 450000 | 458000
10922 | Jamaica Plain | 474000 | 472250
10919 | Jamaica Plain | 515000 | 494500
10917 | Jamaica Plain | 675000 | 691250
10924 | Jamaica Plain | 675000 | 691250
10920 | Jamaica Plain | 705000 | 691250
10923 | Jamaica Plain | 710000 | 691250
10926 | Jamaica Plain | 875000 | 887500
10925 | Jamaica Plain | 900000 | 887500
10930 | Roslindale | 300000 | 300000
10928 | Roslindale | 422000 | 436000
10932 | Roslindale | 450000 | 452333
10929 | Roslindale | 485000 | 484667
10931 | Roslindale | 519000 | 502000
10938 | West Roxbury | 479000 | 479000
10933 | West Roxbury | 550000 | 568000
10937 | West Roxbury | 550000 | 568000
10934 | West Roxbury | 574000 | 577400
10935 | West Roxbury | 598000 | 577400
10936 | West Roxbury | 615000 | 595667
10939 | West Roxbury | 720000 | 720000
(23 rows)
AVG processes this query as follows:
-
AVG
evaluates row 1 of the first partition (Jamaica Plain), but finds no sales within $50k of this row's sell_price
, ($353k). AVG
creates a window that includes this row only, and returns an average of 353k for row 1:
-
AVG
evaluates row 2 and finds three sell_price
values within $50k of the current row. AVG
creates a window that includes these three rows, and returns an average of 458k for row 2:
-
AVG
evaluates row 3 and finds the same three sell_price
values within $50k of the current row. AVG
creates a window identical to the one before, and returns the same average of 458k for row 3:
-
AVG
evaluates row 4 and finds four sell_price
values within $50k of the current row. AVG
expands its window to include rows 2 through 5, and returns an average of $472.25k for row 4:
-
In similar fashion, AVG
evaluates the remaining rows in this partition. When the function evaluates the first row of the second partition (Roslindale), it resets the window as follows:
Restrictions
If RANGE
specifies a constant value, that value's data type and the window's ORDER BY
data type must be the same. The following exceptions apply:
-
RANGE
can specify INTERVAL Year to Month
if the window order clause data type is one of following: TIMESTAMP
, TIMESTAMP WITH TIMEZONE
, or DATE
. TIME
and TIME WITH TIMEZONE
are not supported.
-
RANGE
can specify INTERVAL Day to Second
if the window order clause data is one of following: TIMESTAMP
, TIMESTAMP WITH TIMEZONE
, DATE
, TIME
, or TIME WITH TIMEZONE
.
The window order clause must specify one of the following data types: NUMERIC
, DATE/TIME
, FLOAT
or INTEGER
. This requirement is ignored if the window specifies one of following frames:
-
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
-
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
3 - Reporting aggregates
Some of the analytic functions that take the window-frame-clause are the reporting aggregates.
Some of the analytic functions that take the window-frame-clause are the reporting aggregates. These functions let you compare a partition's aggregate values with detail rows, taking the place of correlated subqueries or joins.
If you use a window aggregate with an empty OVER() clause, the analytic function is used as a reporting function, where the entire input is treated as a single partition.
About standard deviation and variance functions
With standard deviation functions, a low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data points are spread out over a large range of values.
Standard deviation is often graphed and a distributed standard deviation creates the classic bell curve.
Variance functions measure how far a set of numbers is spread out.
Examples
Think of the window for reporting aggregates as a window defined as UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING. The omission of a window-order-clause makes all rows in the partition also the window (reporting aggregates).
=> SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno) AS COUNT FROM emp;
deptno | sal | empno | count
--------+-----+-------+-------
10 | 101 | 1 | 2
10 | 104 | 4 | 2
------------------------------
20 | 110 | 10 | 6
20 | 110 | 9 | 6
20 | 109 | 7 | 6
20 | 109 | 6 | 6
20 | 109 | 8 | 6
20 | 100 | 11 | 6
------------------------------
30 | 105 | 5 | 3
30 | 103 | 3 | 3
30 | 102 | 2 | 3
(11 rows)
If the OVER() clause in the above query contained a window-order-clause
(for example, ORDER BY sal), it would become a moving window (window aggregate) query with a default window of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
=> SELECT deptno, sal, empno, COUNT(sal) OVER (PARTITION BY deptno ORDER BY sal) AS COUNT FROM emp;
deptno | sal | empno | count
--------+-----+-------+-------
10 | 101 | 1 | 1
10 | 104 | 4 | 2
------------------------------
20 | 100 | 11 | 1
20 | 109 | 7 | 4
20 | 109 | 6 | 4
20 | 109 | 8 | 4
20 | 110 | 10 | 6
20 | 110 | 9 | 6
------------------------------
30 | 102 | 2 | 1
30 | 103 | 3 | 2
30 | 105 | 5 | 3
(11 rows)
What about LAST_VALUE?
You might wonder why you couldn't just use the LAST_VALUE() analytic function.
For example, for each employee, get the highest salary in the department:
=> SELECT deptno, sal, empno,LAST_VALUE(empno) OVER (PARTITION BY deptno ORDER BY sal) AS lv FROM emp;
![](/images/reporting-aggregates2.png)
Due to default window semantics, LAST_VALUE does not always return the last value of a partition. If you omit the window-frame-clause from the analytic clause, LAST_VALUE operates on this default window. Results, therefore, can seem non-intuitive because the function does not return the bottom of the current partition. It returns the bottom of the window, which continues to change along with the current input row being processed.
Remember the default window:
OVER (PARTITION BY deptno ORDER BY sal)
is the same as:
OVER(PARTITION BY deptno ORDER BY salROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
If you want to return the last value of a partition, use UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
=> SELECT deptno, sal, empno, LAST_VALUE(empno)
OVER (PARTITION BY deptno ORDER BY sal
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM emp;
![](/images/reporting-aggregates9.png)
Vertica recommends that you use LAST_VALUE with the window-order-clause to produce deterministic results.
In the following example, empno 6, 7, and 8 have the same salary, so they are in adjacent rows. empno 8 appears first in this case but the order is not guaranteed.
Notice in the output above, the last value is 7, which is the last row from the partition deptno = 20. If the rows have a different order, then the function returns a different value:
Now the last value is 6, which is the last row from the partition deptno = 20. The solution is to add a unique key to the sort order. Even if the order of the query changes, the result will always be the same, and so deterministic.
=> SELECT deptno, sal, empno, LAST_VALUE(empno)
OVER (PARTITION BY deptno ORDER BY sal, empno
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lv
FROM emp;
Notice how the rows are now ordered by empno
, the last value stays at 8, and it does not matter the order of the query.