PREDICT_ARIMA
Applies an autoregressive integrated moving average (ARIMA) model to an input relation or makes predictions using the in-sample data. ARIMA models make predictions based on preceding time series values and errors of previous predictions. The function, by default, returns the predicted values plus the mean of the model.
Behavior type
Immutable
Syntax
Apply to an input relation:
PREDICT_ARIMA ( timeseries-column
USING PARAMETERS param=value[,...] )
OVER (ORDER BY timestamp-column)
FROM input-relation
Make predictions using the in-sample data:
PREDICT_ARIMA ( USING PARAMETERS model_name = 'ARIMA-model'
[, start = prediction-start ]
[, npredictions = num-predictions ]
[, output_standard_errors = boolean ] )
OVER ()
Arguments
timeseries-column
- Name of a NUMERIC column in input-relation used to make predictions.
timestamp-column
- Name of an INTEGER, FLOAT, or TIMESTAMP column in input-relation that represents the timestamp variable. The timestep between consecutive entries should be consistent throughout the timestamp-column.
input-relation
- Input relation containing timeseries-column and timestamp-column.
Parameters
model_name
- Name of a trained ARIMA model.
start
- The behavior of the start parameter and its range of accepted values depend on whether you provide a timeseries-column:
- No provided timeseries-column: start must be an integer ≥0, where 0 starts prediction at the end of the in-sample data. If start is a positive value, the function predicts the values between the end of the in-sample data and the start index, and then uses the predicted values as time series inputs for the subsequent npredictions.
- timeseries-column provided: start must be an integer ≥1 and identifies the index (row) of the timeseries-column at which to begin prediction. If the start index is greater than the number of rows, N, in the input data, the function predicts the values between N and start and uses the predicted values as time series inputs for the subsequent npredictions.
Default:
- No provided timeseries-column: prediction begins from the end of the in-sample data.
- timeseries-column provided: prediction begins from the end of the provided input data.
npredictions
- Integer ≥1, the number of predicted timesteps.
Default: 10
missing
- Method for handling missing values, one of the following strings:
- 'drop': Missing values are ignored.
- 'error': Missing values raise an error.
- 'zero': Missing values are replaced with 0.
- 'linear_interpolation': Missing values are replaced by linearly interpolated values based on the nearest valid entries before and after the missing value. If all values before or after a missing value in the prediction range are missing or invalid, interpolation is impossible and the function returns an error.
Default: Method used when training the model
add_mean
- Boolean, whether to add the model mean to the predicted value.
Default: True
output_standard_errors
- Boolean, whether to return estimates of the standard error of each prediction.
Default: False
Examples
The following example makes predictions using the in-sample data that the arima_temp model was trained on:
=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp', npredictions=10) OVER();
index | prediction
-------+------------------
1 | 12.9794640462952
2 | 13.3759980774506
3 | 13.4596213753292
4 | 13.4670492239575
5 | 13.4559956810351
6 | 13.4405315951159
7 | 13.424086943584
8 | 13.4074973032696
9 | 13.3909657020137
10 | 13.374540947803
(10 rows)
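As a variation on the query above, the following sketch combines the start and output_standard_errors parameters described under Parameters. It assumes the same arima_temp model; the value start=5 is chosen only for illustration, and the output is omitted because it depends on the trained model:

```sql
-- Sketch only: assumes the arima_temp model from the example above.
-- start=5 asks the function to first predict the 5 timesteps that follow
-- the in-sample data, then use those values as inputs for the next
-- 10 predictions. output_standard_errors=true also requests an estimate
-- of the standard error of each prediction.
SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp',
                     start=5,
                     npredictions=10,
                     output_standard_errors=true) OVER();
```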
You can also apply the model to an input relation:
=> SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp', start=100, npredictions=10) OVER(ORDER BY time) FROM temp_data;
index | prediction
-------+------------------
1 | 15.0373821404594
2 | 13.4707358943239
3 | 10.5714574755414
4 | 13.1957213344543
5 | 13.5606204019976
6 | 13.1604413418938
7 | 13.3998222399722
8 | 12.6110939669533
9 | 12.9015211253485
10 | 13.2382768006631
(10 rows)
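The missing and add_mean parameters can be set in the same way when applying the model to an input relation. The following is a minimal sketch that reuses the temp_data table and arima_temp model from the query above. The parameter values are illustrative, and the output is omitted because it depends on the data:

```sql
-- Sketch only: assumes the temp_data table and arima_temp model used above.
-- With no start value, prediction begins from the end of the provided
-- input data. missing='linear_interpolation' fills gaps in temperature
-- from the nearest valid neighbors, and add_mean=false returns the
-- predicted values without adding the model mean.
SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp',
                     npredictions=5,
                     missing='linear_interpolation',
                     add_mean=false)
       OVER(ORDER BY time) FROM temp_data;
```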
For an in-depth example that trains and makes predictions with an ARIMA model, see ARIMA model example.