Building a linear regression model
This linear regression example uses a small data set named faithful. The data set gives the waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park. The duration of each eruption can last anywhere between 1.5 and 5 minutes. The time between eruptions and the length of each eruption varies. However, you can estimate the time of the next eruption based on the duration of the previous eruption. The example shows how you can build a model to predict the value of eruptions
, given the value of the waiting
feature.
-
Create the linear regression model, named
linear_reg_faithful
, using thefaithful_training
training data:=> SELECT LINEAR_REG('linear_reg_faithful', 'faithful_training', 'eruptions', 'waiting' USING PARAMETERS optimizer='BFGS'); LINEAR_REG --------------------------- Finished in 6 iterations (1 row)
-
View the summary output of
linear_reg_faithful
:=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='linear_reg_faithful'); -------------------------------------------------------------------------------- ======= details ======= predictor|coefficient|std_err |t_value |p_value ---------+-----------+--------+--------+-------- Intercept| -2.06795 | 0.21063|-9.81782| 0.00000 waiting | 0.07876 | 0.00292|26.96925| 0.00000 ============== regularization ============== type| lambda ----+-------- none| 1.00000 =========== call_string =========== linear_reg('public.linear_reg_faithful', 'faithful_training', '"eruptions"', 'waiting' USING PARAMETERS optimizer='bfgs', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1) =============== Additional Info =============== Name |Value ------------------+----- iteration_count | 3 rejected_row_count| 0 accepted_row_count| 162 (1 row)
-
Create a new table that contains the response values from running the
PREDICT_LINEAR_REG
function on your test data. Name this tablepred_faithful_results
. View the results in thepred_faithful_results
table:=> CREATE TABLE pred_faithful_results AS (SELECT id, eruptions, PREDICT_LINEAR_REG(waiting USING PARAMETERS model_name='linear_reg_faithful') AS pred FROM faithful_testing); CREATE TABLE => SELECT * FROM pred_faithful_results ORDER BY id; id | eruptions | pred -----+-----------+------------------ 4 | 2.283 | 2.8151271587036 5 | 4.533 | 4.62659045686076 8 | 3.6 | 4.62659045686076 9 | 1.95 | 1.94877514654148 11 | 1.833 | 2.18505296804024 12 | 3.917 | 4.54783118302784 14 | 1.75 | 1.6337380512098 20 | 4.25 | 4.15403481386324 22 | 1.75 | 1.6337380512098 . . . (110 rows)
Calculating the mean squared error (MSE)
You can calculate how well your model fits the data is by using the MSE function. MSE returns the average of the squared differences between actual value and predicted values.
=> SELECT MSE (eruptions::float, pred::float) OVER() FROM
(SELECT eruptions, pred FROM pred_faithful_results) AS prediction_output;
mse | Comments
-------------------+-----------------------------------------------
0.252925741352641 | Of 110 rows, 110 were used and 0 were ignored
(1 row)