This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Time series forecasting
Time series models are trained on stationary time series (that is, time series where the mean doesn't change over time) of stochastic processes with consistent time steps.
Time series models are trained on stationary time series (that is, time series where the mean doesn't change over time) of stochastic processes with consistent time steps. These algorithms forecast future values by taking into account the influence of values at some number of preceding timesteps (lags).
Examples of applicable datasets include those for temperature, stock prices, earthquakes, product sales, etc.
To normalize datasets with inconsistent timesteps, see Gap filling and interpolation (GFI).
1 - ARIMA model example
Autoregressive integrated moving average (ARIMA) models combine the abilities of AUTOREGRESSOR and MOVING_AVERAGE models by making future predictions based on both preceding time series values and errors of previous predictions.
Autoregressive integrated moving average (ARIMA) models combine the abilities of AUTOREGRESSOR and MOVING_AVERAGE models by making predictions based on both preceding time series values and errors of previous predictions. ARIMA models also provide the option to apply a differencing operation to the input data, which can turn a non-stationary time series into a stationary time series. At model training time, you specify the differencing order and the number of preceding values and previous prediction errors that the model uses to calculate predictions.
You can use the following functions to train and make predictions with ARIMA models:
These functions require time series data with consistent timesteps. To normalize a time series with inconsistent timesteps, see Gap filling and interpolation (GFI).
The following example trains three ARIMA models, two that use differencing and one that does not, and then makes predictions using the models.
Load the training data
Before you begin the example,
load the Machine Learning sample data.
This example uses the following data:
daily-min-temperatures
: provided in the machine learning sample data, this dataset contains data on the daily minimum temperature in Melbourne, Australia from 1981 through 1990. After you load the sample datasets, this data is available in the temp_data
table.
db_size
: a table that tracks the size of a database over consecutive months.
=> SELECT * FROM temp_data;
time | Temperature
---------------------+-------------
1981-01-01 00:00:00 | 20.7
1981-01-02 00:00:00 | 17.9
1981-01-03 00:00:00 | 18.8
1981-01-04 00:00:00 | 14.6
1981-01-05 00:00:00 | 15.8
...
1990-12-27 00:00:00 | 14
1990-12-28 00:00:00 | 13.6
1990-12-29 00:00:00 | 13.5
1990-12-30 00:00:00 | 15.7
1990-12-31 00:00:00 | 13
(3650 rows)
=> SELECT COUNT(*) FROM temp_data;
COUNT
-------
3650
(1 row)
=> => SELECT * FROM db_size;
month | GB
-------+-----
1 | 5
2 | 10
3 | 20
4 | 35
5 | 55
6 | 80
7 | 110
8 | 145
9 | 185
10 | 230
(10 rows)
Train the ARIMA models
After you load the daily-min-temperatures
data, you can use the ARIMA function to create and train an ARIMA model. For this example, the model is trained with lags of p
=3 and q
=3, taking the value and prediction error of three previous time steps into account for each prediction. Because the input time series is stationary, you don't need to apply differencing to the data:
=> SELECT ARIMA('arima_temp', 'temp_data', 'temperature', 'time' USING PARAMETERS p=3, d=0, q=3);
ARIMA
--------------------------------------------------------------
Finished in 20 iterations.
3650 elements accepted, 0 elements rejected.
(1 row)
You can view a summary of the model with the GET_MODEL_SUMMARY function:
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='arima_temp');
GET_MODEL_SUMMARY
-----------------------------------------------
============
coefficients
============
parameter| value
---------+--------
phi_1 | 0.64189
phi_2 | 0.46667
phi_3 |-0.11777
theta_1 |-0.05109
theta_2 |-0.58699
theta_3 |-0.15882
==============
regularization
==============
none
===============
timeseries_name
===============
temperature
==============
timestamp_name
==============
time
==============
missing_method
==============
linear_interpolation
===========
call_string
===========
ARIMA('public.arima_temp', 'temp_data', 'temperature', 'time' USING PARAMETERS p=3, d=0, q=3, missing='linear_interpolation', init_method='Zero', epsilon=1e-06, max_iterations=100);
===============
Additional Info
===============
Name | Value
------------------+--------
p | 3
q | 3
d | 0
mean |11.17775
lambda | 1.00000
mean_squared_error| 5.80490
rejected_row_count| 0
accepted_row_count| 3650
(1 row)
Examining the db_size
table, it is clear that there is an upward trend to the database size over time. Each month the database size increases five more gigabytes than the increase in the previous month. This trend indicates the time series is non-stationary.
To account for this in the ARIMA model, you must difference the data by setting a non-zero d
parameter value. For comparison, two ARIMA models are trained on this data, the first with a d
value of one and the second with a d
value of two:
=> SELECT ARIMA('arima_d1', 'db_size', 'GB', 'month' USING PARAMETERS p=2, d=1, q=2);
ARIMA
------------------------------------------------------------------------
Finished in 9 iterations.
10 elements accepted, 0 elements rejected.
(1 row)
=> SELECT ARIMA('arima_d2', 'db_size', 'GB', 'month' USING PARAMETERS p=2, d=2, q=2);
ARIMA
------------------------------------------------------------------------
Finished in 0 iterations.
10 elements accepted, 0 elements rejected.
(1 row)
Make predictions
After you train the ARIMA models, you can call the PREDICT_ARIMA function to predict future time series values. This function supports making predictions using the in-sample data that the models were trained on or applying the model to an input relation.
Using in-sample data
The following PREIDCT_ARIMA call makes temperature predictions using the in-sample data that the arima_temp
model was trained on. The model begins prediction at the end of the temp_data
table and returns predicted values for ten timesteps:
=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp', start=0, npredictions=10) OVER();
prediction
------------------
12.9745063293842
13.4389080858551
13.3955791360528
13.3551146487462
13.3149336514747
13.2750516811057
13.2354710353376
13.1961939790513
13.1572226788109
13.1185592045127
(10 rows)
For both prediction methods, if you want the function to return the standard error of each prediction, you can set output_standard_errors
to true:
=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_temp', start=0, npredictions=10, output_standard_errors=true) OVER();
prediction | std_err
------------------+------------------
12.9745063293842 | 1.00621890780865
13.4389080858551 | 1.45340836833232
13.3955791360528 | 1.61041524562932
13.3551146487462 | 1.76368421116143
13.3149336514747 | 1.91223938476627
13.2750516811057 | 2.05618464609977
13.2354710353376 | 2.19561771498385
13.1961939790513 | 2.33063553781651
13.1572226788109 | 2.46133422924445
13.1185592045127 | 2.58780904243988
(10 rows)
To make predictions with the two models trained on the db_size
table, you only need to change the specified model_name
in the above calls:
=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_d1', start=0, npredictions=10) OVER();
prediction
------------------
279.882778508943
334.398317856829
393.204492820962
455.909453114272
522.076165355683
591.227478668175
662.851655189833
736.408301395412
811.334631481162
887.051990217688
(10 rows)
=> SELECT PREDICT_ARIMA(USING PARAMETERS model_name='arima_d2', start=0, npredictions=10) OVER();
prediction
------------
280
335
395
460
530
605
685
770
860
955
(10 rows)
Comparing the outputs from the two models, you can see that the model trained with a d
value of two correctly captures the trend in the data. Each month the rate of database growth increases by five gigabytes.
You can also apply the model to an input relation. The following example makes predictions by applying the arima_temp
model to the temp_data
training set:
=> SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp', start=3651, npredictions=10, output_standard_errors=true) OVER(ORDER BY time) FROM temp_data;
prediction | std_err
------------------+------------------
12.9745063293842 | 1.00621890780865
13.4389080858551 | 1.45340836833232
13.3955791360528 | 1.61041524562932
13.3551146487462 | 1.76368421116143
13.3149336514747 | 1.91223938476627
13.2750516811057 | 2.05618464609977
13.2354710353376 | 2.19561771498385
13.1961939790513 | 2.33063553781651
13.1572226788109 | 2.46133422924445
13.1185592045127 | 2.58780904243988
(10 rows)
Because the same data and relative start index were provided to both prediction methods, the arima_temp
model predictions for each method are identical.
When applying a model to an input relation, you can set add_mean
to false so that the function returns the predicted difference from the mean instead of the sum of the model mean and the predicted difference:
=> SELECT PREDICT_ARIMA(temperature USING PARAMETERS model_name='arima_temp', start=3680, npredictions=10, add_mean=false) OVER(ORDER BY time) FROM temp_data;
prediction
------------------
1.2026877112171
1.17114068517961
1.13992534953432
1.10904183333367
1.0784901998692
1.04827044781798
1.01838251238116
0.98882626641461
0.959601521551628
0.93070802931751
(10 rows)
See also
2 - Autoregressive model example
Autoregressive models predict future values of a time series based on the preceding values. More specifically, the user-specified lag determines how many previous timesteps it takes into account during computation, and predicted values are linear combinations of the values at each lag.
Use the following functions when training and predicting with autoregressive models. Note that these functions require datasets with consistent timesteps.
To normalize datasets with inconsistent timesteps, see Gap filling and interpolation (GFI).
Example
-
Load the datasets from the Machine-Learning-Examples repository.
This example uses the daily-min-temperatures dataset, which contains data on the daily minimum temperature in Melbourne, Australia from 1981 through 1990:
=> SELECT * FROM temp_data;
time | Temperature
---------------------+-------------
1981-01-01 00:00:00 | 20.7
1981-01-02 00:00:00 | 17.9
1981-01-03 00:00:00 | 18.8
1981-01-04 00:00:00 | 14.6
1981-01-05 00:00:00 | 15.8
...
1990-12-27 00:00:00 | 14
1990-12-28 00:00:00 | 13.6
1990-12-29 00:00:00 | 13.5
1990-12-30 00:00:00 | 15.7
1990-12-31 00:00:00 | 13
(3650 rows)
-
Use
AUTOREGRESSOR
to create the autoregressive model AR_temperature
from the temp_data
dataset. In this case, the model is trained with a lag of p
=3, taking the previous 3 entries into account for each estimation:
=> SELECT AUTOREGRESSOR('AR_temperature', 'temp_data', 'Temperature', 'time' USING PARAMETERS p=3);
AUTOREGRESSOR
---------------------------------------------------------
Finished. 3650 elements accepted, 0 elements rejected.
(1 row)
You can view a summary of the model with
GET_MODEL_SUMMARY
:
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='AR_temperature');
GET_MODEL_SUMMARY
-------------------
============
coefficients
============
parameter| value
---------+--------
alpha | 1.88817
phi_(t-1)| 0.70004
phi_(t-2)|-0.05940
phi_(t-3)| 0.19018
==================
mean_squared_error
==================
not evaluated
===========
call_string
===========
autoregressor('public.AR_temperature', 'temp_data', 'temperature', 'time'
USING PARAMETERS p=3, missing=linear_interpolation, regularization='none', lambda=1, compute_mse=false);
===============
Additional Info
===============
Name |Value
------------------+-----
lag_order | 3
rejected_row_count| 0
accepted_row_count|3650
(1 row)
-
Use
PREDICT_AUTOREGRESSOR
to predict future temperatures. The following query start
s the prediction at the end of the dataset and returns 10 predictions.
=> SELECT PREDICT_AUTOREGRESSOR(Temperature USING PARAMETERS model_name='AR_temperature', npredictions=10) OVER(ORDER BY time) FROM temp_data;
prediction
------------------
12.6235419917807
12.9387860506032
12.6683380680058
12.3886937385419
12.2689506237424
12.1503023330142
12.0211734746741
11.9150531529328
11.825870404008
11.7451846722395
(10 rows)
3 - Moving-average model example
Moving average models use the errors of previous predictions to make future predictions. More specifically, the user-specified lag determines how many previous predictions and errors it takes into account during computation.
Use the following functions when training and predicting with moving-average models. Note that these functions require datasets with consistent timesteps.
To normalize datasets with inconsistent timesteps, see Gap filling and interpolation (GFI).
Example
-
Load the datasets from the Machine-Learning-Examples repository.
This example uses the daily-min-temperatures dataset, which contains data on the daily minimum temperature in Melbourne, Australia from 1981 through 1990:
=> SELECT * FROM temp_data;
time | Temperature
---------------------+-------------
1981-01-01 00:00:00 | 20.7
1981-01-02 00:00:00 | 17.9
1981-01-03 00:00:00 | 18.8
1981-01-04 00:00:00 | 14.6
1981-01-05 00:00:00 | 15.8
...
1990-12-27 00:00:00 | 14
1990-12-28 00:00:00 | 13.6
1990-12-29 00:00:00 | 13.5
1990-12-30 00:00:00 | 15.7
1990-12-31 00:00:00 | 13
(3650 rows)
-
Use
MOVING_AVERAGE
to create the moving-average model MA_temperature
from the temp_data
dataset. In this case, the model is trained with a lag of p
=3, taking the error of 3 previous predictions into account for each estimation:
=> SELECT MOVING_AVERAGE('MA_temperature', 'temp_data', 'temperature', 'time' USING PARAMETERS q=3, missing='linear_interpolation', regularization='none', lambda=1);
MOVING_AVERAGE
---------------------------------------------------------
Finished. 3650 elements accepted, 0 elements rejected.
(1 row)
You can view a summary of the model with
GET_MODEL_SUMMARY
:
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='MA_temperature');
GET_MODEL_SUMMARY
-------------------
============
coefficients
============
parameter| value
---------+--------
phi_(t-0)|-0.90051
phi_(t-1)|-0.10621
phi_(t-2)| 0.07173
===============
timeseries_name
===============
temperature
==============
timestamp_name
==============
time
===========
call_string
===========
moving_average('public.MA_temperature', 'temp_data', 'temperature', 'time'
USING PARAMETERS q=3, missing=linear_interpolation, regularization='none', lambda=1);
===============
Additional Info
===============
Name | Value
------------------+--------
mean |11.17780
lag_order | 3
lambda | 1.00000
rejected_row_count| 0
accepted_row_count| 3650
(1 row)
-
Use
PREDICT_MOVING_AVERAGE
to predict future temperatures. The following query start
s the prediction at the end of the dataset and returns 10 predictions.
=> SELECT PREDICT_MOVING_AVERAGE(Temperature USING PARAMETERS model_name='MA_temperature', npredictions=10) OVER(ORDER BY time) FROM temp_data;
prediction
------------------
13.1324365636272
12.8071086272833
12.7218966671721
12.6011086656032
12.506624729879
12.4148247026733
12.3307873804812
12.2521385975133
12.1789741993396
12.1107640076638
(10 rows)