Building a logistic regression model
This logistic regression example uses a small data set named mtcars. The example shows how to build a model that predicts the value of am
, which indicates whether the car has an automatic or a manual transmission. It uses the given values of all the other features in the data set.
In this example, roughly 60% of the data is used as training data to create a model. The remaining 40% is used as testing data against which you can test your logistic regression model.
Before you begin the example, load the Machine Learning sample data.
Create the logistic regression model, named
logistic_reg_mtcars
, using themtcars_train
training data.=> SELECT LOGISTIC_REG('logistic_reg_mtcars', 'mtcars_train', 'am', 'cyl, wt' USING PARAMETERS exclude_columns='hp'); LOGISTIC_REG  Finished in 15 iterations (1 row)

View the summary output of
logistic_reg_mtcars
.=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='logistic_reg_mtcars');  ======= details ======= predictorcoefficient std_err z_value p_value ++++ Intercept 262.39898 44745.77338 0.00586 0.99532 cyl  16.75892 5987.23236  0.00280 0.99777 wt 119.92116 17237.031540.00696 0.99445 ============== regularization ============== type lambda + none 1.00000 =========== call_string =========== logistic_reg('public.logistic_reg_mtcars', 'mtcars_train', '"am"', 'cyl, wt' USING PARAMETERS exclude_columns='hp', optimizer='newton', epsilon=1e06, max_iterations=100, regularization='none', lambda=1) =============== Additional Info =============== Name Value + iteration_count  20 rejected_row_count 0 accepted_row_count 20 (1 row)

Create a table named
mtcars_predict_results
. Populate this table with the prediction outputs you obtain from running thePREDICT_LOGISTIC_REG
function on your test data. View the results in themtcars_predict_results
table.=> CREATE TABLE mtcars_predict_results AS (SELECT car_model, am, PREDICT_LOGISTIC_REG(cyl, wt USING PARAMETERS model_name='logistic_reg_mtcars') AS Prediction FROM mtcars_test); CREATE TABLE => SELECT * FROM mtcars_predict_results; car_model  am  Prediction ++ AMC Javelin  0  0 Hornet 4 Drive  0  0 Maserati Bora  1  0 Merc 280  0  0 Merc 450SL  0  0 Toyota Corona  0  1 Volvo 142E  1  1 Camaro Z28  0  0 Datsun 710  1  1 Honda Civic  1  1 Porsche 9142  1  1 Valiant  0  0 (12 rows)

Evaluate the accuracy of the
PREDICT_LOGISTIC_REG
function, using the CONFUSION_MATRIX evaluation function.=> SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER() FROM (SELECT am AS obs, Prediction AS pred FROM mtcars_predict_results) AS prediction_output; class  0  1  comment +++ 0  6  1  1  1  4  Of 12 rows, 12 were used and 0 were ignored (2 rows)
In this case,
PREDICT_LOGISTIC_REG
correctly predicted that four out of five cars with a value of1
in theam
column have a value of1
. Out of the seven cars which had a value of0
in theam
column, six were correctly predicted to have the value0
. One car was incorrectly classified as having the value1
.