ARIMA model example

Autoregressive integrated moving average (ARIMA) models combine the abilities of AUTOREGRESSOR and MOVING_AVERAGE models by making future predictions based on both preceding time series values and errors of previous predictions.

Autoregressive integrated moving average (ARIMA) models combine the abilities of AUTOREGRESSOR and MOVING_AVERAGE models by making future predictions based on both preceding time series values and errors of previous predictions. At model training time, you specify the number of preceding values and previous prediction errors that the model will use to calculate predictions.

You can use the following functions to train and make predictions with ARIMA models:

  • ARIMA: Creates and trains an ARIMA model

  • PREDICT_ARIMA: Applies a trained ARIMA model to an input relation or makes predictions using the in-sample data

These functions require time series data with consistent timesteps. To normalize a time series with inconsistent timesteps, see Gap filling and interpolation (GFI).

The following example trains an ARIMA model on a daily temperature dataset and then makes temperature predictions with both possible prediction methods—using the in-sample data and applying the model to an input relation.

Load the training data

Before you begin the example, load the Machine Learning sample data.

This examples uses the daily-min-temperatures dataset, which contains data on the daily minimum temperature in Melbourne, Australia from 1981 through 1990. The data is available in the temp_data table:

=> SELECT * FROM temp_data;
        time         | Temperature
---------------------+-------------
 1981-01-01 00:00:00 |        20.7
 1981-01-02 00:00:00 |        17.9
 1981-01-03 00:00:00 |        18.8
 1981-01-04 00:00:00 |        14.6
 1981-01-05 00:00:00 |        15.8
 ...
 1990-12-27 00:00:00 |          14
 1990-12-28 00:00:00 |        13.6
 1990-12-29 00:00:00 |        13.5
 1990-12-30 00:00:00 |        15.7
 1990-12-31 00:00:00 |          13
(3650 rows)

=> SELECT COUNT(*) FROM temp_data;
 COUNT
-------
3650
(1 row)

Train the ARIMA model

After you load the data, you can use the ARIMA function to create and train an ARIMA model. For this example, the model is trained with lags of p=3 and q=3, taking the value and prediction error of three previous time steps into account for each prediction:

=> SELECT ARIMA('arima_temp', 'temp_data', 'temperature', 'time' USING PARAMETERS p=3, q=3);
                             ARIMA
--------------------------------------------------------------
Finished in 20 iterations.
3650 elements accepted, 0 elements rejected.

(1 row)

You can view a summary of the model with the GET_MODEL_SUMMARY function:

=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='arima_temp');
               GET_MODEL_SUMMARY
-----------------------------------------------

============
coefficients
============
parameter| value
---------+--------
  phi_1  | 0.64189
  phi_2  | 0.46667
  phi_3  |-0.11777
 theta_1 |-0.05109
 theta_2 |-0.58699
 theta_3 |-0.15882


==============
regularization
==============
none

===============
timeseries_name
===============
temperature

==============
timestamp_name
==============
time

==============
missing_method
==============
linear_interpolation

===========
call_string
===========
ARIMA('public.arima_temp', 'temp_data', 'temperature', 'time' USING PARAMETERS p=3, d=0, q=3, missing='linear_interpolation', init_method='Zero', epsilon=1e-06, max_iterations=100);

===============
Additional Info
===============
       Name       | Value
------------------+--------
        p         |   3
        q         |   3
        d         |   0
       mean       |11.17775
      lambda      | 1.00000
mean_squared_error| 5.80490
rejected_row_count|   0
accepted_row_count|  3650

(1 row)

Make predictions

After you train the ARIMA model, you can call the PREDICT_ARIMA function to predict future temperatures. This function supports making predictions using the in-sample data that the model was trained on or applying the model to an input relation.

Using in-sample data

The following example makes predictions using the in-sample data that the model was trained on. The model begins prediction at the end of the temp_data table and returns predicted values for ten timesteps:

=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp', start=0, npredictions=10) OVER();
   prediction
------------------
12.9745063293842
13.4389080858551
13.3955791360528
13.3551146487462
13.3149336514747
13.2750516811057
13.2354710353376
13.1961939790513
13.1572226788109
13.1185592045127
(10 rows)

For both prediction methods, if you want the function to return the standard error of each prediction, you can set output_standard_errors to true:

=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp', start=0, npredictions=10, output_standard_errors=true) OVER();
    prediction    |     std_err
------------------+------------------
 12.9745063293842 | 1.00621890780865
 13.4389080858551 | 1.45340836833232
 13.3955791360528 | 1.61041524562932
 13.3551146487462 | 1.76368421116143
 13.3149336514747 | 1.91223938476627
 13.2750516811057 | 2.05618464609977
 13.2354710353376 | 2.19561771498385
 13.1961939790513 | 2.33063553781651
 13.1572226788109 | 2.46133422924445
 13.1185592045127 | 2.58780904243988
(10 rows)

Applying to an input relation

You can also apply the model to an input relation, in this case the temp_data training set:

=> SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp', start=3651, npredictions=10, output_standard_errors=true) OVER(ORDER BY time) FROM temp_data;
    prediction    |     std_err
------------------+------------------
 12.9745063293842 | 1.00621890780865
 13.4389080858551 | 1.45340836833232
 13.3955791360528 | 1.61041524562932
 13.3551146487462 | 1.76368421116143
 13.3149336514747 | 1.91223938476627
 13.2750516811057 | 2.05618464609977
 13.2354710353376 | 2.19561771498385
 13.1961939790513 | 2.33063553781651
 13.1572226788109 | 2.46133422924445
 13.1185592045127 | 2.58780904243988
(10 rows)

Because the same data and relative start index were provided to both prediction methods, the model predictions for each method are identical.

When applying a model to an input relation, you can set add_mean to false so that the function returns the predicted difference from the mean instead of the sum of the model mean and the predicted difference:

=> SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp', start=3680, npredictions=10, add_mean=false) OVER(ORDER BY time) FROM temp_data;
  prediction
------------------
1.2026877112171
1.17114068517961
1.13992534953432
1.10904183333367
1.0784901998692
1.04827044781798
1.01838251238116
0.98882626641461
0.959601521551628
0.93070802931751
(10 rows)

See also