Building a logistic regression model
This logistic regression example uses a small data set named mtcars. The example shows how to build a model that predicts the value of am
, which indicates whether the car has an automatic or a manual transmission. It uses the given values of all the other features in the data set.
In this example, roughly 60% of the data is used as training data to create a model. The remaining 40% is used as testing data against which you can test your logistic regression model.
Before you begin the example, load the Machine Learning sample data.-
Create the logistic regression model, named
logistic_reg_mtcars
, using themtcars_train
training data.=> SELECT LOGISTIC_REG('logistic_reg_mtcars', 'mtcars_train', 'am', 'cyl, wt' USING PARAMETERS exclude_columns='hp'); LOGISTIC_REG ---------------------------- Finished in 15 iterations (1 row)
-
View the summary output of
logistic_reg_mtcars
.=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='logistic_reg_mtcars'); -------------------------------------------------------------------------------- ======= details ======= predictor|coefficient| std_err |z_value |p_value ---------+-----------+-----------+--------+-------- Intercept| 262.39898 |44745.77338| 0.00586| 0.99532 cyl | 16.75892 |5987.23236 | 0.00280| 0.99777 wt |-119.92116 |17237.03154|-0.00696| 0.99445 ============== regularization ============== type| lambda ----+-------- none| 1.00000 =========== call_string =========== logistic_reg('public.logistic_reg_mtcars', 'mtcars_train', '"am"', 'cyl, wt' USING PARAMETERS exclude_columns='hp', optimizer='newton', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1) =============== Additional Info =============== Name |Value ------------------+----- iteration_count | 20 rejected_row_count| 0 accepted_row_count| 20 (1 row)
-
Create a table named
mtcars_predict_results
. Populate this table with the prediction outputs you obtain from running thePREDICT_LOGISTIC_REG
function on your test data. View the results in themtcars_predict_results
table.=> CREATE TABLE mtcars_predict_results AS (SELECT car_model, am, PREDICT_LOGISTIC_REG(cyl, wt USING PARAMETERS model_name='logistic_reg_mtcars') AS Prediction FROM mtcars_test); CREATE TABLE => SELECT * FROM mtcars_predict_results; car_model | am | Prediction ----------------+----+------------ AMC Javelin | 0 | 0 Hornet 4 Drive | 0 | 0 Maserati Bora | 1 | 0 Merc 280 | 0 | 0 Merc 450SL | 0 | 0 Toyota Corona | 0 | 1 Volvo 142E | 1 | 1 Camaro Z28 | 0 | 0 Datsun 710 | 1 | 1 Honda Civic | 1 | 1 Porsche 914-2 | 1 | 1 Valiant | 0 | 0 (12 rows)
-
Evaluate the accuracy of the
PREDICT_LOGISTIC_REG
function, using the CONFUSION_MATRIX evaluation function.=> SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER() FROM (SELECT am AS obs, Prediction AS pred FROM mtcars_predict_results) AS prediction_output; class | 0 | 1 | comment -------+---+---+--------------------------------------------- 0 | 6 | 1 | 1 | 1 | 4 | Of 12 rows, 12 were used and 0 were ignored (2 rows)
In this case,
PREDICT_LOGISTIC_REG
correctly predicted that four out of five cars with a value of1
in theam
column have a value of1
. Out of the seven cars which had a value of0
in theam
column, six were correctly predicted to have the value0
. One car was incorrectly classified as having the value1
.