ARIMA
Creates and trains an autoregressive integrated moving average (ARIMA) model from a time series with consistent timesteps. ARIMA models combine the abilities of AUTOREGRESSOR and MOVING_AVERAGE models by making future predictions based on both preceding time series values and errors of previous predictions. After the model is trained, you can make predictions with the PREDICT_ARIMA function.
In Vertica, ARIMA is implemented using a Kalman Filter state-space approach, similar to Gardner, G., et al. This approach updates the state-space model with each element in the training data in order to calculate a loss score over the training data. A BFGS optimizer is then used to adjust the coefficients, and the state-space estimation is rerun until convergence. Because of this repeated estimation process, ARIMA consumes large amounts of memory when called with high values of p
and q
.
Given that the input data must be sorted by timestamp, this algorithm is single-threaded.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
ImmutableSyntax
ARIMA( 'model-name', 'input-relation', 'timeseries-column', 'timestamp-column'
USING PARAMETERS param=value[,...] )
Arguments
model-name
- Model to create, where
model-name
conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema. input-relation
- Name of the table or view containing
timeseries-column
andtimestamp-column
. timeseries-column
- Name of a NUMERIC column in
input-relation
that contains the dependent variable or outcome. timestamp-column
- Name of an INTEGER, FLOAT, or TIMESTAMP column in
input-relation
that represents the timestamp variable. The timestep between consecutive entries should be consistent throughout thetimestamp-column
.Tip
If yourtimestamp-column
has varying timesteps, consider standardizing the step size with the TIME_SLICE function.
Parameters
p
- Integer in the range [0, 1000], the number of lags to include in the autoregressive component of the computation. If
q
is unspecified or set to zero,p
must be set to a nonzero value. In some cases, using a largep
value can result in a memory overload error.Default: 0
d
- Difference order of the model.
Caution
Currently, this value must be zero. If you attempt to train an ARIMA model with a nonzerod
value, the function errors.Default: 0
q
- Integer in the range [0, 1000], the number of lags to include in the moving average component of the computation. If
p
is unspecified or set to zero,q
must be set to a nonzero value. In some cases, using a largeq
value can result in a memory overload error.Default: 0
missing
- Method for handling missing values, one of the following strings:
-
'drop': Missing values are ignored.
-
'raise': Missing values raise an error.
-
'zero': Missing values are set to zero.
-
'linear_interpolation': Missing values are replaced by a linearly interpolated value based on the nearest valid entries before and after the missing value. In cases where the first or last values in a dataset are missing, the function errors.
Default: 'linear_interpolation'
-
init_method
- Initialization method, one of the following strings:
-
'Zero': Coefficients are initialized to zero.
-
'Hannan-Rissanen' or 'HR': Coefficients are initialized using the Hannan-Rissanen algorithm.
Default: 'Zero'
-
epsilon
- Float in the range (0.0, 1.0), controls the convergence criteria of the optimization algorithm.
Default: 1e-6
max_iterations
- Integer in the range [1, 1000000), the maximum number of training iterations. If you set this value too low, the algorithm might not converge.
Default: 100
Model attributes
coefficients
- Coefficients of the model:
-
phi
: parameters for the autoregressive component of the computation. The number of returnedphi
values is equal to the value ofp
. -
theta
: parameters for the moving average component of the computation. The number of returnedtheta
values is equal to the value ofq
.
-
p, q, d
- ARIMA component values:
-
p
: number of lags included in the autoregressive component of the computation -
d
: difference order of the model -
q
: number of lags included in the moving average component of the computation
-
mean
- The model mean, average of the accepted sample values from
timeseries-column
regularization
- Type of regularization used when training the model
lambda
- Regularization parameter. Higher values indicates stronger regularization.
mean_squared_error
- Mean squared error of the model on the training set
rejected_row_count
- Number of samples rejected during training
accepted_row_count
- Number of samples accepted for training from the data set
timeseries_name
- Name of the
timeseries-column
used to train the model timestamp_name
- Name of the
timestamp-column
used to train the model missing_method
- Method used for handling missing values
call_string
- SQL statement used to train the model
Examples
The function requires that at least one of the p
and q
parameters be a positive, nonzero integer. The following example trains a model where both of these parameters are set to two:
=> SELECT ARIMA('arima_temp', 'temp_data', 'temperature', 'time' USING PARAMETERS p=2, q=2);
ARIMA
-------------------------------------
Finished in 24 iterations.
3650 elements accepted, 0 elements rejected.
(1 row)
To see a summary of the model, including all model coefficients and parameter values, call GET_MODEL_SUMMARY:
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='arima_temp');
GET_MODEL_SUMMARY
------------------------------------------------------------
============
coefficients
============
parameter| value
---------+--------
phi_1 | 1.23639
phi_2 |-0.24201
theta_1 |-0.64535
theta_2 |-0.23046
==============
regularization
==============
none
===============
timeseries_name
===============
temperature
==============
timestamp_name
==============
time
==============
missing_method
==============
linear_interpolation
===========
call_string
===========
ARIMA('public.arima_temp', 'temp_data', 'temperature', 'time' USING PARAMETERS p=2, d=0, q=2, missing='linear_interpolation', init_method='Zero', epsilon=1e-06, max_iterations=100);
===============
Additional Info
===============
Name | Value
------------------+--------
p | 2
q | 2
d | 0
mean |11.17775
lambda | 1.00000
mean_squared_error| 5.80628
rejected_row_count| 0
accepted_row_count| 3650
(1 row)
For an in-depth example that trains and makes predictions with an ARIMA model, see ARIMA model example.