Gap filling and interpolation (GFI)

The examples and graphics that explain the concepts in this topic use the following simple schema:.

The examples and graphics that explain the concepts in this topic use the following simple schema:

CREATE TABLE TickStore (ts TIMESTAMP, symbol VARCHAR(8), bid FLOAT);
INSERT INTO TickStore VALUES ('2009-01-01 03:00:00', 'XYZ', 10.0);
INSERT INTO TickStore VALUES ('2009-01-01 03:00:05', 'XYZ', 10.5);

In Vertica, time series data is represented by a sequence of rows that conforms to a particular table schema, where one of the columns stores the time information.

Both time and the state of data within a time series are continuous. Thus, evaluating SQL queries over time can be challenging because input records usually occur at non-uniform intervals and can contain gaps.

For example, the following table contains two input rows five seconds apart: 3:00:00 and 3:00:05.

=> SELECT * FROM TickStore;
         ts          | symbol | bid
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:05 | XYZ    | 10.5
(2 rows)

Given those two inputs, how can you determine a bid price that falls between the two points, such as at 3:00:03 PM? The TIME_SLICE function normalizes timestamps into corresponding time slices; however, TIME_SLICE does not solve the problem of missing inputs (time slices) in the data. Instead, Vertica provides gap-filling and interpolation (GFI) functionality, which fills in missing data points and adds new (missing) data points within a range of known data points to the output. It accomplishes these tasks with the time series aggregate functions (TS_FIRST_VALUE and TS_LAST_VALUE) and the SQL TIMESERIES clause. But first, we'll illustrate the components that make up gap filling and interpolation in Vertica, starting with Constant interpolation. The images in the following topics use the following legend:

  • The x-axis represents the timestamp (ts) column

  • The y-axis represents the bid column.

  • The vertical blue lines delimit the time slices.

  • The red dots represent the input records in the table, $10.0 and $10.5.

  • The blue stars represent the output values, including interpolated values.