Building a logistic regression model
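This logistic regression example uses a small data set named mtcars. The example shows how to build a model that predicts the value of am (whether a car has an automatic or manual transmission) from the values of other features in the data set. The model built in this example uses cyl and wt as predictors.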
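In this example, roughly 60% of the data is used as the training sample to create the model. The remaining 40% is used as test data to evaluate the logistic regression model.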
Before you begin the example, load the Machine Learning sample data.
1. Create a logistic regression model named logistic_reg_mtcars using the mtcars_train training data.

    => SELECT LOGISTIC_REG('logistic_reg_mtcars', 'mtcars_train', 'am', 'cyl, wt'
       USING PARAMETERS exclude_columns='hp');
            LOGISTIC_REG
    ----------------------------
     Finished in 15 iterations

    (1 row)
2. View the summary output of logistic_reg_mtcars.

    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='logistic_reg_mtcars');
    --------------------------------------------------------------------------------
    =======
    details
    =======
    predictor|coefficient|  std_err  |z_value |p_value
    ---------+-----------+-----------+--------+--------
    Intercept| 262.39898 |44745.77338| 0.00586| 0.99532
    cyl      | 16.75892  |5987.23236 | 0.00280| 0.99777
    wt       |-119.92116 |17237.03154|-0.00696| 0.99445

    ==============
    regularization
    ==============
    type| lambda
    ----+--------
    none| 1.00000

    ===========
    call_string
    ===========
    logistic_reg('public.logistic_reg_mtcars', 'mtcars_train', '"am"', 'cyl, wt'
    USING PARAMETERS exclude_columns='hp', optimizer='newton', epsilon=1e-06, max_iterations=100, regularization='none', lambda=1)

    ===============
    Additional Info
    ===============
           Name       |Value
    ------------------+-----
     iteration_count  | 20
    rejected_row_count| 0
    accepted_row_count| 20
    (1 row)
3. Create a table named mtcars_predict_results. Populate this table with the prediction outputs you obtain from running the PREDICT_LOGISTIC_REG function on your test data. View the results in the mtcars_predict_results table.

    => CREATE TABLE mtcars_predict_results AS
       (SELECT car_model, am, PREDICT_LOGISTIC_REG(cyl, wt
        USING PARAMETERS model_name='logistic_reg_mtcars') AS Prediction
        FROM mtcars_test);
    CREATE TABLE

    => SELECT * FROM mtcars_predict_results;
       car_model    | am | Prediction
    ----------------+----+------------
     AMC Javelin    |  0 |          0
     Hornet 4 Drive |  0 |          0
     Maserati Bora  |  1 |          0
     Merc 280       |  0 |          0
     Merc 450SL     |  0 |          0
     Toyota Corona  |  0 |          1
     Volvo 142E     |  1 |          1
     Camaro Z28     |  0 |          0
     Datsun 710     |  1 |          1
     Honda Civic    |  1 |          1
     Porsche 914-2  |  1 |          1
     Valiant        |  0 |          0
    (12 rows)
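4. Evaluate the accuracy of the PREDICT_LOGISTIC_REG function, using the CONFUSION_MATRIX evaluation function.

    => SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER()
       FROM (SELECT am AS obs, Prediction AS pred FROM mtcars_predict_results) AS prediction_output;
     class | 0 | 1 |                   comment
    -------+---+---+---------------------------------------------
         0 | 6 | 1 |
         1 | 1 | 4 | Of 12 rows, 12 were used and 0 were ignored
    (2 rows)

In this case, PREDICT_LOGISTIC_REG correctly predicted that four of the five cars with a value of 1 in the am column have a value of 1. Of the seven cars with a value of 0 in the am column, six were correctly predicted to have the value 0. The remaining two cars were misclassified: Toyota Corona (am = 0) was predicted as 1, and Maserati Bora (am = 1) was predicted as 0.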
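As a cross-check on the confusion matrix, the counts can be recomputed by hand from the twelve rows of mtcars_predict_results. The following is a minimal Python sketch, not part of the database workflow. The row values are copied from the query output in this example, while the names rows, confusion_counts, matrix, and accuracy are illustrative choices made here.

```python
from collections import Counter

# (car_model, am, prediction) rows copied from the mtcars_predict_results
# output shown in this example.
rows = [
    ("AMC Javelin", 0, 0),
    ("Hornet 4 Drive", 0, 0),
    ("Maserati Bora", 1, 0),
    ("Merc 280", 0, 0),
    ("Merc 450SL", 0, 0),
    ("Toyota Corona", 0, 1),
    ("Volvo 142E", 1, 1),
    ("Camaro Z28", 0, 0),
    ("Datsun 710", 1, 1),
    ("Honda Civic", 1, 1),
    ("Porsche 914-2", 1, 1),
    ("Valiant", 0, 0),
]


def confusion_counts(rows):
    """Count how often each (observed, predicted) pair occurs."""
    return Counter((obs, pred) for _, obs, pred in rows)


matrix = dict(confusion_counts(rows))

# Correct predictions are the pairs where observed and predicted agree.
correct = sum(n for (obs, pred), n in matrix.items() if obs == pred)
accuracy = correct / len(rows)

# Print one line per observed class: counts for predicted 0, then predicted 1.
for obs in (0, 1):
    print(obs, [matrix.get((obs, pred), 0) for pred in (0, 1)])
print(f"accuracy = {correct}/{len(rows)} = {accuracy:.3f}")
```

The tallies agree with the CONFUSION_MATRIX output (6 and 1 for class 0, 1 and 4 for class 1), which corresponds to 10 correct predictions out of 12 test rows.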