Building an SVM Model for Regression

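This SVM regression example uses a small data set named faithful, based on the Old Faithful geyser in Yellowstone National Park. The data set contains the waiting time between eruptions and the duration of each eruption. The example shows how to build a model that predicts the value of eruptions given the value of the waiting feature.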

  1. Create the SVM model, named svm_faithful, using the faithful_training training data.

    => SELECT SVM_REGRESSOR('svm_faithful', 'faithful_training', 'eruptions', 'waiting'
                          USING PARAMETERS error_tolerance=0.1, max_iterations=100);
     Finished in 5 iterations
    Accepted Rows: 162   Rejected Rows: 0
    (1 row)

  2. View the summary output of svm_faithful:

    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='svm_faithful');
    Predictors and Coefficients
    Intercept|  -1.59007
    waiting  |   0.07217
    Call string:
    SELECT svm_regressor('public.svm_faithful', 'faithful_training', '"eruptions"',
    'waiting'USING PARAMETERS error_tolerance = 0.1, C=1, max_iterations=100,
    Additional Info
    Name              |Value
    accepted_row_count| 162
    rejected_row_count|  0
    iteration_count  |  5
    (1 row)

  3. Run the PREDICT_SVM_REGRESSOR function on your test data to create a new table that contains the response values. Name this table pred_faithful. View the results in the pred_faithful table:

    => CREATE TABLE pred_faithful AS
           (SELECT id, eruptions, PREDICT_SVM_REGRESSOR(waiting USING PARAMETERS model_name='svm_faithful')
            AS pred FROM faithful_testing);
    => SELECT * FROM pred_faithful ORDER BY id;
     id  | eruptions |       pred
       4 |     2.283 | 2.88444568755189
       5 |     4.533 | 4.54434581879796
       8 |       3.6 | 4.54434581879796
       9 |      1.95 | 2.09058040739072
      11 |     1.833 | 2.30708912016195
      12 |     3.917 | 4.47217624787422
      14 |      1.75 | 1.80190212369576
      20 |      4.25 | 4.11132839325551
      22 |      1.75 | 1.80190212369576
    (110 rows)
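
The predictions in step 3 are consistent with a linear model built from the intercept and the waiting coefficient shown in the step 2 summary. The following Python snippet is a minimal sketch of that calculation outside the database, assuming the prediction is simply intercept + coefficient * waiting. The function name predict_eruptions and the waiting value of 62 are hypothetical, chosen for illustration only. Because the summary prints the coefficients rounded to five decimals, results may differ slightly from the values PREDICT_SVM_REGRESSOR returns.

```python
# Coefficients as printed by GET_MODEL_SUMMARY (rounded to five decimals).
INTERCEPT = -1.59007
WAITING_COEF = 0.07217


def predict_eruptions(waiting):
    """Return the linear prediction: intercept + coefficient * waiting.

    Assumption: the model is a plain linear function of the single
    predictor, which is what the summary output suggests.
    """
    return INTERCEPT + WAITING_COEF * waiting


# 62 is a hypothetical waiting time, not a value taken from the output above.
example_prediction = predict_eruptions(62)
print(round(example_prediction, 3))
```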

Calculating the Mean Squared Error (MSE)

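You can calculate how well your model fits the data by using the MSE function. MSE returns the average of the squared differences between the actual values and the predicted values.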

=> SELECT MSE(obs::float, prediction::float) OVER()
   FROM (SELECT eruptions AS obs, pred AS prediction
         FROM pred_faithful) AS prediction_output;
        mse        |                   Comments
 0.254499811834235 | Of 110 rows, 110 were used and 0 were ignored
(1 row)
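
To make the definition concrete, the following Python sketch computes the same quantity by hand: the mean of the squared differences between observed and predicted values. It is an illustration of the formula, not the implementation of the MSE function. The helper name mean_squared_error is hypothetical, and the sample data are only the nine rows displayed in the pred_faithful output, with predictions rounded to six decimals. Because the full table has 110 rows, this sample is not expected to reproduce the 0.254499811834235 reported above.

```python
def mean_squared_error(observed, predicted):
    """Return the mean of the squared differences between two sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


# The nine (eruptions, pred) pairs shown in the query output,
# with pred rounded to six decimals. This is a sample, not the full table.
observed = [2.283, 4.533, 3.6, 1.95, 1.833, 3.917, 1.75, 4.25, 1.75]
predicted = [2.884446, 4.544346, 4.544346, 2.090581, 2.307089,
             4.472176, 1.801902, 4.111328, 1.801902]

sample_mse = mean_squared_error(observed, predicted)
print(sample_mse)
```

Squaring the differences means that large misses, such as the row with eruptions = 3.6, contribute far more to the result than small ones.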
