Building a logistic regression model
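This logistic regression example uses a small data set named mtcars. The example shows how to build a model that predicts the value of am, which indicates whether a car has an automatic or a manual transmission. The prediction is based on the values of other features in the data set; the model built here uses cyl and wt as predictors.

In this example, roughly 60% of the data is used as sample data to create the model. The remaining 40% is used as test data to evaluate the logistic regression model.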

Before you begin the example, load the machine learning sample data.

- Create a logistic regression model named logistic_reg_mtcars using the mtcars_train training data.

    => SELECT LOGISTIC_REG('logistic_reg_mtcars', 'mtcars_train', 'am', 'cyl, wt'
       USING PARAMETERS exclude_columns='hp');
            LOGISTIC_REG
    ----------------------------
     Finished in 15 iterations

    (1 row)

- View the summary output of logistic_reg_mtcars.

    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='logistic_reg_mtcars');
    --------------------------------------------------------------------------------
    =======
    details
    =======
    predictor|coefficient|  std_err  |z_value |p_value
    ---------+-----------+-----------+--------+--------
    Intercept| 262.39898 |44745.77338| 0.00586| 0.99532
    cyl      | 16.75892  |5987.23236 | 0.00280| 0.99777
    wt       |-119.92116 |17237.03154|-0.00696| 0.99445

    ==============
    regularization
    ==============
    type| lambda
    ----+--------
    none| 1.00000

    ===========
    call_string
    ===========
    logistic_reg('public.logistic_reg_mtcars', 'mtcars_train', '"am"', 'cyl, wt'
    USING PARAMETERS exclude_columns='hp', optimizer='newton', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1)

    ===============
    Additional Info
    ===============
    Name              |Value
    ------------------+-----
    iteration_count   | 20
    rejected_row_count| 0
    accepted_row_count| 20
    (1 row)

- Create a table named mtcars_predict_results. Populate it with the prediction outputs obtained by running the PREDICT_LOGISTIC_REG function on the test data. Then view the results in the mtcars_predict_results table.

    => CREATE TABLE mtcars_predict_results AS
       (SELECT car_model, am, PREDICT_LOGISTIC_REG(cyl, wt
           USING PARAMETERS model_name='logistic_reg_mtcars') AS Prediction
        FROM mtcars_test);
    CREATE TABLE

    => SELECT * FROM mtcars_predict_results;
       car_model    | am | Prediction
    ----------------+----+------------
     AMC Javelin    |  0 |          0
     Hornet 4 Drive |  0 |          0
     Maserati Bora  |  1 |          0
     Merc 280       |  0 |          0
     Merc 450SL     |  0 |          0
     Toyota Corona  |  0 |          1
     Volvo 142E     |  1 |          1
     Camaro Z28     |  0 |          0
     Datsun 710     |  1 |          1
     Honda Civic    |  1 |          1
     Porsche 914-2  |  1 |          1
     Valiant        |  0 |          0
    (12 rows)

- Evaluate the accuracy of the PREDICT_LOGISTIC_REG function using the CONFUSION_MATRIX evaluation function.

    => SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER()
       FROM (SELECT am AS obs, Prediction AS pred FROM mtcars_predict_results) AS prediction_output;
     class | 0 | 1 |                   comment
    -------+---+---+---------------------------------------------
         0 | 6 | 1 |
         1 | 1 | 4 | Of 12 rows, 12 were used and 0 were ignored
    (2 rows)
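
To connect the coefficients in the model summary to the values in the Prediction column, the sketch below applies the logistic function to the intercept, cyl, and wt coefficients reported by GET_MODEL_SUMMARY. It is a minimal Python illustration, not a description of how Vertica implements PREDICT_LOGISTIC_REG. The 0.5 cutoff is an assumption, as are the cyl and wt inputs, which are taken from the standard R mtcars dataset on the assumption that the sample data matches it. The function names are hypothetical.

```python
import math

# Coefficients copied from the GET_MODEL_SUMMARY output for logistic_reg_mtcars.
INTERCEPT = 262.39898
COEF_CYL = 16.75892
COEF_WT = -119.92116


def predict_probability(cyl: float, wt: float) -> float:
    """Return P(am = 1) for one car by applying the logistic function."""
    z = INTERCEPT + COEF_CYL * cyl + COEF_WT * wt
    # Branch on the sign of z so that exp() never receives a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_class(cyl: float, wt: float, cutoff: float = 0.5) -> int:
    """Return 1 when the predicted probability reaches the cutoff, else 0.

    The 0.5 cutoff is an assumption made for this sketch.
    """
    return 1 if predict_probability(cyl, wt) >= cutoff else 0


# Inputs assumed from the standard R mtcars dataset; they are not shown
# in the query output of this example.
print(predict_class(cyl=8, wt=3.570))  # Maserati Bora
print(predict_class(cyl=4, wt=2.465))  # Toyota Corona
```

Under these assumptions the sketch yields 0 for the Maserati Bora and 1 for the Toyota Corona, which agrees with the Prediction column in mtcars_predict_results. Because the coefficients are so large in magnitude, the computed probabilities sit very close to 0 or 1.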
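
In this example, the confusion matrix shows that PREDICT_LOGISTIC_REG correctly predicted a value of 1 for four of the five cars that have a value of 1 in the am column. Of the seven cars that have a value of 0 in the am column, six were correctly predicted to have the value 0. One car was incorrectly classified as having the value 1.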
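
The counts in the confusion matrix can be condensed into a few summary metrics. The following Python sketch is one way to do that by hand; it copies the four counts from the CONFUSION_MATRIX output and treats class 1 as the positive class, which is an assumption made for illustration. The summarize function and the nested-dictionary layout are hypothetical and are not part of Vertica.

```python
# Counts copied from the CONFUSION_MATRIX output.
# Outer key: observed class (am). Inner key: predicted class.
matrix = {
    0: {0: 6, 1: 1},
    1: {0: 1, 1: 4},
}


def summarize(m):
    """Compute accuracy, precision, and recall, treating class 1 as positive."""
    tn = m[0][0]  # observed 0, predicted 0
    fp = m[0][1]  # observed 0, predicted 1
    fn = m[1][0]  # observed 1, predicted 0
    tp = m[1][1]  # observed 1, predicted 1
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


metrics = summarize(matrix)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the sketch reports an accuracy of 0.833 (10 of 12 cars classified correctly), with precision and recall both 0.800 for class 1.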