PREDICT_ARIMA

Applies an autoregressive integrated moving average (ARIMA) model to an input relation or makes predictions using the in-sample data.

Applies an autoregressive integrated moving average (ARIMA) model to an input relation or makes predictions using the in-sample data. ARIMA models make predictions based on preceding time series values and errors of previous predictions. The function, by default, returns the predicted values plus the mean of the model.

Behavior type

Immutable

Syntax

Apply to an input relation:

PREDICT_ARIMA ( timeseries-column
        USING PARAMETERS param=value[,...] )
        OVER (ORDER BY timestamp-column)
        FROM input-relation

Make predictions using the in-sample data:

PREDICT_ARIMA ( USING PARAMETERS model_name = 'ARIMA-model'
        [, start = prediction-start ]
        [, npredictions = num-predictions ]
        [, output_standard_errors = boolean ] )
        OVER ()

Arguments

timeseries-column
Name of a NUMERIC column in input-relation used to make predictions.
timestamp-column
Name of an INTEGER, FLOAT, or TIMESTAMP column in input-relation that represents the timestamp variable. The timestep between consecutive entries should be consistent throughout the timestamp-column.
input-relation
Input relation containing timeseries-column and timestamp-column.

Parameters

model_name
Name of a trained ARIMA model.
start
The behavior of the start parameter and its range of accepted values depends on whether you provide a timeseries-column:
  • No provided timeseries-column: start must be an integer ≥0, where zero indicates to start prediction at the end of the in-sample data. If start is a positive value, the function predicts the values between the end of the in-sample data and the start index, and then uses the predicted values as time series inputs for the subsequent npredictions.
  • timeseries-column provided: start must be an integer ≥1 and identifies the index (row) of the timeseries-column at which to begin prediction. If the start index is greater than the number of rows, N, in the input data, the function predicts the values between N and start and uses the predicted values as time series inputs for the subsequent npredictions.

Default:

  • No provided timeseries-column: prediction begins from the end of the in-sample data.

  • timeseries-column provided: prediction begins from the end of the provided input data.

npredictions
Integer ≥1, the number of predicted timesteps.

Default: 10

missing
Methods for handling missing values, one of the following strings:
  • 'drop': Missing values are ignored.

  • 'error': Missing values raise an error.

  • 'zero': Missing values are replaced with 0.

  • 'linear_interpolation': Missing values are replaced by linearly-interpolated values based on the nearest valid entries before and after the missing value. If all values before or after a missing value in the prediction range are missing or invalid, interpolation is impossible and the function errors.

Default: Method used when training the model

add_mean
Boolean, whether to add the model mean to the predicted value.

Default: True

output_standard_errors
Boolean, whether to return estimates of the standard error of each prediction.

Default: False

Examples

The following example makes predictions using the in-sample data that the arima_temp model was trained on:

=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp', npredictions=10) OVER();
 index |    prediction
-------+------------------
     1 | 12.9794640462952
     2 | 13.3759980774506
     3 | 13.4596213753292
     4 | 13.4670492239575
     5 | 13.4559956810351
     6 | 13.4405315951159
     7 |  13.424086943584
     8 | 13.4074973032696
     9 | 13.3909657020137
    10 |  13.374540947803
(10 rows)

You can also apply the model to an input relation:

=> SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp', start=100, npredictions=10) OVER(ORDER BY time) FROM temp_data;
 index |    prediction
-------+------------------
     1 | 15.0373821404594
     2 | 13.4707358943239
     3 | 10.5714574755414
     4 | 13.1957213344543
     5 | 13.5606204019976
     6 | 13.1604413418938
     7 | 13.3998222399722
     8 | 12.6110939669533
     9 | 12.9015211253485
    10 | 13.2382768006631
(10 rows)

For an in-depth example that trains and makes predictions with an ARIMA model, see ARIMA model example.

See also