This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

SVM (support vector machine) for regression

Support Vector Machine (SVM) for regression predicts continuous ordered variables based on the training data.

Support Vector Machine (SVM) for regression predicts continuous ordered variables based on the training data.

Unlike Logistic regression, which you use to determine a binary classification outcome, SVM for regression is primarily used to predict continuous numerical outcomes.

You can use the following functions to build an SVM for regression model, view the model, and use the model to make predictions on a set of test data:

For a complete example of how to use the SVM algorithm in Vertica, see Building an SVM for regression model.

1 - Building an SVM for regression model

This SVM for regression example uses a small data set named faithful, based on the Old Faithful geyser in Yellowstone National Park.

This SVM for regression example uses a small data set named faithful, based on the Old Faithful geyser in Yellowstone National Park. The data set contains values about the waiting time between eruptions and the duration of eruptions of the geyser. The example shows how you can build a model to predict the value of eruptions, given the value of the waiting feature.

Before you begin the example, load the Machine Learning sample data.
  1. Create the SVM model, named svm_faithful, using the faithful_training training data:

    => SELECT SVM_REGRESSOR('svm_faithful', 'faithful_training', 'eruptions', 'waiting'
                          USING PARAMETERS error_tolerance=0.1, max_iterations=100);
            SVM_REGRESSOR
    ---------------------------
     Finished in 5 iterations
    Accepted Rows: 162   Rejected Rows: 0
    (1 row)
    
  2. View the summary output of svm_faithful:

    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='svm_faithful');
    
    ------------------------------------------------------------------
    =======
    details
    =======
    
    ===========================
    Predictors and Coefficients
    ===========================
             |Coefficients
    ---------+------------
    Intercept|  -1.59007
    waiting  |   0.07217
    ===========
    call_string
    ===========
    Call string:
    SELECT svm_regressor('public.svm_faithful', 'faithful_training', '"eruptions"',
    'waiting'USING PARAMETERS error_tolerance = 0.1, C=1, max_iterations=100,
    epsilon=0.001);
    
    ===============
    Additional Info
    ===============
    Name              |Value
    ------------------+-----
    accepted_row_count| 162
    rejected_row_count|  0
    iteration_count  |  5
    (1 row)
    
  3. Create a new table that contains the response values from running the PREDICT_SVM_REGRESSOR function on your test data. Name this table pred_faithful_results. View the results in the pred_faithful_results table:

    => CREATE TABLE pred_faithful AS
           (SELECT id, eruptions, PREDICT_SVM_REGRESSOR(waiting USING PARAMETERS model_name='svm_faithful')
            AS pred FROM faithful_testing);
    CREATE TABLE
    => SELECT * FROM pred_faithful ORDER BY id;
     id  | eruptions |       pred
    -----+-----------+------------------
       4 |     2.283 | 2.88444568755189
       5 |     4.533 | 4.54434581879796
       8 |       3.6 | 4.54434581879796
       9 |      1.95 | 2.09058040739072
      11 |     1.833 | 2.30708912016195
      12 |     3.917 | 4.47217624787422
      14 |      1.75 | 1.80190212369576
      20 |      4.25 | 4.11132839325551
      22 |      1.75 | 1.80190212369576
    .
    .
    .
    (110 rows)
    

Calculating the mean squared error (MSE)

You can calculate how well your model fits the data is by using the MSE function. MSE returns the average of the squared differences between actual value and predicted values.

=> SELECT MSE(obs::float, prediction::float) OVER()
   FROM (SELECT eruptions AS obs, pred AS prediction
         FROM pred_faithful) AS prediction_output;
        mse        |                   Comments
-------------------+-----------------------------------------------
 0.254499811834235 | Of 110 rows, 110 were used and 0 were ignored
(1 row)

See also