VAR model example
The following example trains a vector autoregression (VAR) model on an electricity usage dataset, and then makes predictions with that model.
Download and load the data
You can download the data from Mendeley Data. The dataset consists of metrics related to daily electricity load forecasting in Panama, including columns such as the following:
datetime
: hourly datetime index corresponding to Panama time-zone (UTC-05:00)nat_demand
: national electricity load in Panama, measured in megawatt hoursT2M_toc
: temperature at 2 meters in Tocumen, Panama CityQV2M_toc
: relative humidity at 2 meters in Tocumen, Panama CityTQL_toc
: liquid precipitation in Tocumen, Panama CityW2M_toc
: wind speed at 2 meters in Tocumen, Panama City
The dataset also includes temperature, humidity, precipitation, and wind speed columns for two other cities in Panama: Santiago City and David City.
Unlike the AR algorithm, which can only model a single variable over time, the VAR algorithm captures the relationship between time series variables across lagged time steps. For example, when trained on the above dataset, the VAR algorithm could model whether a rise in temperature tends to lead to an increase in electricity demand hours later.
After you have downloaded the data locally, you can load the data into Vertica with the following statements:
=> CREATE TABLE electric_data(ts timestamp, nat_demand float, T2M_toc float, QV2M_toc float,
TQL_toc float, W2M_toc float, T2M_san float, QV2M_san float, TQL_san float, W2M_san float, T2M_dav
float, QV2M_dav float, TQL_dav float, W2M_dav float, Holiday_ID int, holiday bool, school bool);
=> COPY electric_data (ts,nat_demand,T2M_toc,QV2M_toc,TQL_toc,W2M_toc,T2M_san,QV2M_san,TQL_san,W2M_san,
T2M_dav,QV2M_dav,TQL_dav,W2M_dav,Holiday_ID,holiday,school) FROM LOCAL 'path-to-data'
DELIMITER ',';
The dataset includes some categorical columns, which need to be dropped before training a VAR model:
=> ALTER TABLE electric_data DROP column Holiday_ID;
=> ALTER TABLE electric_data DROP column holiday;
=> ALTER TABLE electric_data DROP column school;
Query the electric_data
table to view a sample of the data that will be used to train the model:
=> SELECT * FROM electric_data LIMIT 3;
ts | nat_demand | T2M_toc | QV2M_toc | TQL_toc | W2M_toc | T2M_san | QV2M_san | TQL_san | W2M_san | T2M_dav | QV2M_dav | TQL_dav | W2M_dav
---------------------+------------+------------------+-------------+--------------+------------------+------------------+-------------+--------------+------------------+------------------+-------------+-------------+------------------
2015-01-03 01:00:00 | 970.345 | 25.8652587890625 | 0.018576382 | 0.016174316 | 21.8505458178752 | 23.4824462890625 | 0.017271755 | 0.0018553734 | 10.3289487293842 | 22.6621337890625 | 0.016562222 | 0.09609985 | 5.36414795209389
2015-01-03 02:00:00 | 912.1755 | 25.8992553710938 | 0.018653292 | 0.016418457 | 22.1669442769882 | 23.3992553710938 | 0.017264742 | 0.0013270378 | 10.6815171370742 | 22.5789428710938 | 0.016509432 | 0.087646484 | 5.57247134626166
2015-01-03 03:00:00 | 900.2688 | 25.9372802734375 | 0.01876786 | 0.0154800415 | 22.4549108762635 | 23.3435302734375 | 0.017211463 | 0.0014281273 | 10.8749237899741 | 22.5310302734375 | 0.016479041 | 0.07873535 | 5.87118374509993
(3 rows)
Train the model
To train a VAR model, use the AUTOREGRESSOR function. In this example, the model is trained with a lag of 3, meaning 3 previous timesteps are taken into account during computation:
=> SELECT AUTOREGRESSOR ( 'var_electric', 'electric_data', 'nat_demand, T2M_toc, QV2M_toc, TQL_toc, W2M_toc, T2M_san, QV2M_san,
TQL_san, W2M_san, T2M_dav, QV2M_dav, TQL_dav, W2M_dav', 'ts' USING PARAMETERS P=3, compute_mse=True );
WARNING 0: Only the Yule Walker method is currently supported for Vector Autoregression, setting method to Yule Walker
AUTOREGRESSOR
---------------------------------------------------------
Finished. 48048 elements accepted, 0 elements rejected.
(1 row)
Use the GET_MODEL_SUMMARY function to view a summary of the model, including its coefficient and parameter values:
=> SELECT GET_MODEL_SUMMARY( USING PARAMETERS model_name='var_electric');
GET_MODEL_SUMMARY
-------------------------------------------------------------------------------------------------------
=========
phi_(t-1)
=========
predictor |nat_demand|t2m_toc | qv2m_toc | tql_toc |w2m_toc |t2m_san | qv2m_san | tql_san |w2m_san | t2m_dav | qv2m_dav | tql_dav |w2m_dav
----------+----------+--------+------------+---------+--------+--------+------------+----------+--------+----------+------------+---------+--------
nat_demand| 1.14622 |-1.66306|-26024.21189|853.81013| 2.88135|59.48676|-28244.33013|-194.95364| 1.26052|-135.79998|-66425.52272|229.38130| 0.58089
t2m_toc | 0.00048 | 0.53187| -112.46610 | 1.14647 |-0.06549| 0.13092| -200.78913 | -1.58964 | 0.21530| -0.46069 | -494.82893 |-1.22152 | 0.13995
qv2m_toc | 0.00000 | 0.00057| 0.89702 |-0.00218 |-0.00001|-0.00017| 0.06365 | 0.00151 |-0.00003| -0.00012 | -0.06966 |-0.00027 |-0.00002
tql_toc | -0.00002 | 0.01902| 9.77736 | 1.39416 |-0.00235|-0.01847| 0.84052 | -0.27010 | 0.00264| 0.00547 | 15.98251 |-0.19235 |-0.00214
w2m_toc | -0.00392 | 2.33136| 99.79514 |14.08998 | 1.44729|-1.80924|-1756.46929 | 1.11483 | 0.12860| 0.18757 | 386.03986 |-3.28553 | 1.08902
t2m_san | 0.00008 |-0.28536| 474.30004 | 0.02968 |-0.05527| 1.29437| -475.49532 | -6.68332 | 0.24776| -1.09044 | -871.84132 | 0.10465 | 0.40562
qv2m_san | -0.00000 | 0.00021| -0.39224 | 0.00080 | 0.00002|-0.00010| 1.13421 | 0.00252 |-0.00006| 0.00001 | -0.30554 |-0.00206 | 0.00004
tql_san | -0.00003 |-0.00361| 17.26050 | 0.13776 |-0.00267| 0.01761| -9.63725 | 1.03554 |-0.00053| -0.01907 | -8.70276 |-0.56572 | 0.00523
w2m_san | -0.00007 |-0.92628| 9.34644 |22.01868 | 0.15592|-0.38963| 219.53687 | 6.32666 | 0.98779| 0.50404 | 607.06291 |-2.93982 | 1.01091
t2m_dav | -0.00009 |-0.19734| 447.10894 |-2.09032 | 0.00302| 0.27105| -266.05516 | -3.03434 | 0.28049| -0.03718 | -750.51074 |-3.00557 | 0.01414
qv2m_dav | -0.00000 | 0.00003| -0.25311 | 0.00255 |-0.00002| 0.00012| 0.36524 | -0.00043 | 0.00001| -0.00001 | 0.55553 |-0.00040 | 0.00010
tql_dav | -0.00000 | 0.00638| 36.02787 | 0.40214 |-0.00116| 0.00352| 2.09579 | 0.14142 | 0.00192| -0.01039 | -35.63238 | 0.91257 |-0.00834
w2m_dav | 0.00316 |-0.48625| -250.62285 | 6.92672 | 0.13897|-0.30942| 21.40057 | -2.77030 |-0.05098| 0.49791 | 86.43985 |-5.61450 | 1.36653
=========
phi_(t-2)
=========
predictor |nat_demand| t2m_toc | qv2m_toc | tql_toc |w2m_toc | t2m_san | qv2m_san | tql_san |w2m_san |t2m_dav | qv2m_dav | tql_dav | w2m_dav
----------+----------+---------+-----------+-----------+--------+---------+-----------+---------+--------+--------+-----------+----------+---------
nat_demand| -0.50332 |-46.08404|18727.70487|-1054.48563|-1.64150|-36.33188|-4175.69233|498.68770|24.05641|97.39713|15062.02349|-418.70514|-27.63335
t2m_toc | -0.00240 |-0.01253 | 304.73283 | -5.13242 |-0.03848|-0.08204 | -27.48349 | 1.86556 | 0.06103| 0.31194| 46.52296 | 0.25737 |-0.30230
qv2m_toc | 0.00000 |-0.00023 | -0.16013 | 0.00443 | 0.00003| 0.00003 | -0.02433 |-0.00226 | 0.00001| 0.00006| 0.10786 | 0.00055 | 0.00008
tql_toc | -0.00003 |-0.01472 | 3.90260 | -0.60709 | 0.00318| 0.01456 | -15.78706 | 0.23611 |-0.00201|-0.00475| -11.18542 | 0.09355 | 0.00227
w2m_toc | 0.00026 |-1.93177 | 864.02774 | -33.46154 |-0.64149| 1.99313 | 924.12187 |-2.23520 |-0.12906|-0.46720|-763.47613 | 8.44744 |-1.42164
t2m_san | -0.00449 |-0.12273 | 317.82860 | -9.44207 |-0.11911|-0.32597 | 11.71343 | 6.41420 | 0.17817| 0.78725| 147.48679 | -1.93663 |-0.73511
qv2m_san | 0.00000 | 0.00001 | 0.29127 | 0.00430 | 0.00001| 0.00018 | -0.25311 |-0.00256 | 0.00000|-0.00030| 0.16254 | 0.00213 | 0.00007
tql_san | -0.00001 | 0.01255 | 8.58348 | 0.17678 | 0.00236|-0.01895 | 2.89453 |-0.13436 | 0.00492| 0.01080| -5.78848 | 0.39776 |-0.00792
w2m_san | -0.00402 |-0.23875 | 152.07427 | -34.35207 |-0.14634| 0.54952 |-458.82955 |-2.37583 |-0.04254| 0.02246|-523.69779 | 7.92699 |-1.13203
t2m_dav | -0.00340 |-0.17349 | 218.48562 | -3.75842 |-0.14256|-0.16157 | 104.28631 | 2.42379 | 0.09304| 0.59601| 47.14495 | 0.00436 |-0.36515
qv2m_dav | 0.00000 | 0.00011 | 0.26173 | 0.00065 | 0.00004| 0.00006 | -0.36303 | 0.00036 |-0.00003|-0.00030| 0.00561 | 0.00151 |-0.00008
tql_dav | 0.00002 |-0.01052 | -24.63826 | -0.08533 | 0.00290| 0.00260 | 0.91509 |-0.14088 |-0.00035| 0.00148| 11.70604 | -0.09407 | 0.01194
w2m_dav | -0.00222 | 0.06791 | 74.24493 | -7.00070 |-0.16760| 0.41837 | -47.66437 | 2.82942 | 0.03768|-0.35919| -78.61307 | 3.12482 |-0.60352
=========
phi_(t-3)
=========
predictor |nat_demand|t2m_toc | qv2m_toc | tql_toc |w2m_toc | t2m_san | qv2m_san | tql_san | w2m_san |t2m_dav | qv2m_dav | tql_dav |w2m_dav
----------+----------+--------+-----------+---------+--------+---------+-----------+----------+---------+--------+-----------+---------+--------
nat_demand| 0.28126 |53.13756|3338.49498 |-26.56706| 1.60491|-57.36669|-8104.16577|-172.24549|-38.55564|74.30481|96465.63629|245.06242|44.47450
t2m_toc | 0.00221 | 0.46994|-348.87103 | 2.62141 | 0.16466|-0.25496 |-137.21750 | 2.19341 |-0.40865 | 0.30965|1079.93357 | 0.14082 | 0.25677
qv2m_toc | -0.00000 |-0.00039| 0.32237 |-0.00208 |-0.00004| 0.00015 | -0.01736 | 0.00052 | 0.00004 | 0.00014| -0.12526 |-0.00012 |-0.00006
tql_toc | 0.00006 |-0.00387| -17.12826 | 0.11237 |-0.00034| 0.00313 | 13.96030 | 0.08266 |-0.00155 |-0.00080| -0.47584 | 0.14030 | 0.00030
w2m_toc | 0.00498 |-0.70555|-955.01026 |17.98079 | 0.18909|-0.64634 | 496.07411 | 2.36480 |-0.07010 | 1.03103| 779.48099 |-5.90297 | 0.44579
t2m_san | 0.00510 | 0.47889|-1166.20907| 7.15240 | 0.27574|-0.48793 |-215.11669 | 5.31181 |-0.66397 | 0.72268|1871.66380 | 0.51522 | 0.50609
qv2m_san | -0.00000 |-0.00029| 0.28296 |-0.00588 |-0.00005|-0.00014 | -0.04735 | -0.00013 | 0.00010 | 0.00040| 0.17917 |-0.00013 |-0.00015
tql_san | 0.00004 |-0.00467| -27.72270 |-0.40887 | 0.00016| 0.00234 | 4.26912 | -0.00376 |-0.00318 | 0.00015| 22.54523 | 0.24970 | 0.00081
w2m_san | 0.00493 | 1.14186|-373.37429 |13.42031 | 0.02690|-0.47403 | 49.48392 | -3.10954 |-0.11381 |-0.05802| 153.91529 |-4.97422 | 0.21238
t2m_dav | 0.00365 | 0.55030|-1007.89174| 3.91560 | 0.22359|-0.44299 |-400.17878 | 4.14859 |-0.58507 | 0.58492|1659.35503 | 2.60303 | 0.54396
qv2m_dav | -0.00000 |-0.00014| 0.19838 |-0.00385 |-0.00004|-0.00020 | -0.08521 | -0.00031 | 0.00008 | 0.00030| 0.37780 |-0.00163 |-0.00007
tql_dav | -0.00002 | 0.00898| -10.89014 |-0.36915 |-0.00241|-0.00164 | -8.04515 | -0.03469 | 0.00140 |-0.00446| 34.64281 | 0.11446 |-0.00757
w2m_dav | -0.00120 | 0.51977| 149.50193 |-0.12824 | 0.02324| 0.07751 | 109.64435 | -0.52736 | 0.08222 |-0.46739| -20.74587 | 2.62600 | 0.09365
====
mean
====
predictor | value
----------+--------
nat_demand| 0.00000
t2m_toc | 0.00000
qv2m_toc | 0.00000
tql_toc | 0.00000
w2m_toc | 0.00000
t2m_san | 0.00000
qv2m_san | 0.00000
tql_san | 0.00000
w2m_san | 0.00000
t2m_dav | 0.00000
qv2m_dav | 0.00000
tql_dav | 0.00000
w2m_dav | 0.00000
==================
mean_squared_error
==================
predictor | value
----------+-----------
nat_demand|17962.23433
t2m_toc | 1.69488
qv2m_toc | 0.00000
tql_toc | 0.00055
w2m_toc | 2.46161
t2m_san | 5.50800
qv2m_san | 0.00000
tql_san | 0.00129
w2m_san | 2.09820
t2m_dav | 4.17091
qv2m_dav | 0.00000
tql_dav | 0.00131
w2m_dav | 0.37255
=================
predictor_columns
=================
"nat_demand", "t2m_toc", "qv2m_toc", "tql_toc", "w2m_toc", "t2m_san", "qv2m_san", "tql_san", "w2m_san", "t2m_dav", "qv2m_dav", "tql_dav", "w2m_dav"
================
timestamp_column
================
ts
==============
missing_method
==============
error
===========
call_string
===========
autoregressor('public.var_electric', 'electric_data', '"nat_demand", "t2m_toc", "qv2m_toc", "tql_toc", "w2m_toc", "t2m_san", "qv2m_san", "tql_san", "w2m_san", "t2m_dav", "qv2m_dav", "tql_dav", "w2m_dav"', 'ts'
USING PARAMETERS p=3, method=yule-walker, missing=error, regularization='none', lambda=1, compute_mse=true, subtract_mean=false);
===============
Additional Info
===============
Name | Value
------------------+--------
lag_order | 3
num_predictors | 13
lambda | 1.00000
rejected_row_count| 0
accepted_row_count| 48048
(1 row)
Make predictions
After you have trained the model, use PREDICT_AUTOREGRESSOR to make predictions. The following query makes predictions for 10 timesteps after the end of the training data:
=> SELECT PREDICT_AUTOREGRESSOR (nat_demand, T2M_toc, QV2M_toc, TQL_toc, W2M_toc, T2M_san, QV2M_san, TQL_san, W2M_san, T2M_dav,
QV2M_dav, TQL_dav, W2M_dav USING PARAMETERS model_name='var_electric', npredictions=10) OVER (ORDER BY ts) FROM electric_data;
index | nat_demand | T2M_toc | QV2M_toc | TQL_toc | W2M_toc | T2M_san | QV2M_san | TQL_san | W2M_san | T2M_dav | QV2M_dav | TQL_dav | W2M_dav
-------+------------------+------------------+--------------------+--------------------+------------------+------------------+--------------------+-------------------+------------------+------------------+--------------------+-------------------+------------------
1 | 1078.0855626373 | 27.5432174013135 | 0.0202896580655671 | 0.0735420728737344 | 10.8175311126823 | 26.430434929925 | 0.0192831578421391 | 0.107198653008438 | 3.05244789585641 | 24.5096742655262 | 0.0184037299403769 | 0.16295453121027 | 3.08228477169708
2 | 1123.01343816948 | 27.5799917618547 | 0.0204207744201445 | 0.0720447905881737 | 10.8724214941076 | 26.3610153442989 | 0.0194263633273137 | 0.108930877007977 | 2.72589694499722 | 24.4670623561271 | 0.0186472805344351 | 0.165398914107496 | 2.87751855475047
3 | 1131.90496161147 | 27.3065074421367 | 0.0206625082516192 | 0.0697170726826932 | 10.5264893921207 | 25.81201637743 | 0.019608966941237 | 0.10637712791638 | 2.17340369566314 | 24.1521703335357 | 0.0188528868910987 | 0.167392378142989 | 2.80663029425841
4 | 1138.96441161386 | 27.4576230482214 | 0.0207001599239755 | 0.0777394805028406 | 10.3601575817394 | 26.1392475032107 | 0.0195632331195498 | 0.104149788020336 | 2.46022124286432 | 24.4888899706856 | 0.0187304304955302 | 0.164373252722466 | 2.78678931032488
5 | 1171.39047791301 | 28.0057288751278 | 0.0205956267885475 | 0.0848090062223719 | 10.6253279384262 | 27.0670669914329 | 0.0195635438719142 | 0.114456870352482 | 2.76078540220627 | 25.2647929547485 | 0.0187651697256172 | 0.17343826852935 | 2.69927291097792
6 | 1207.73967000806 | 28.3228814221316 | 0.0206018765585195 | 0.0822472149970854 | 10.9208806093031 | 27.4723192020112 | 0.0197095736612743 | 0.120234389446089 | 2.66435208358109 | 25.5897884504046 | 0.0190138227508656 | 0.181730688934231 | 2.48952353476086
7 | 1201.36034218262 | 28.0729800850783 | 0.0207697611016373 | 0.0771476148672282 | 10.8746443523915 | 26.9706455927136 | 0.0198268925550646 | 0.108597236269397 | 2.4209888271894 | 25.1762432351852 | 0.0191249010707135 | 0.172747303182877 | 2.23183544428884
8 | 1224.23208010817 | 28.3089311565846 | 0.0208114201116026 | 0.0850389146168386 | 11.0236068974249 | 27.4555219109112 | 0.0198191134457088 | 0.106317040996309 | 2.53443574238199 | 25.6496524683801 | 0.0189950796868336 | 0.164772798858117 | 2.03398232662887
9 | 1276.63054426938 | 28.9264909381223 | 0.0207288153284714 | 0.0949502386997371 | 11.3825529048681 | 28.5602280897709 | 0.0198383353555985 | 0.119437241768372 | 2.81035565170478 | 26.6092121876548 | 0.0189965295736137 | 0.173848915902565 | 1.97943631786186
10 | 1279.80379750225 | 28.9655412855392 | 0.0207630823553549 | 0.0899990557577538 | 11.4992364863754 | 28.5096132340911 | 0.0199982128259766 | 0.122916656987609 | 2.8045000981305 | 26.4950144592136 | 0.0192642881348006 | 0.183789154418929 | 1.97414217031828
(10 rows)