This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

SVM (support vector machine) for classification

Support Vector Machine (SVM) is a classification algorithm that assigns data to one category or the other based on the training data.

Support Vector Machine (SVM) is a classification algorithm that assigns data to one category or the other based on the training data. This algorithm implements linear SVM, which is highly scalable.

You can use the following functions to train the SVM model, and use the model to make predictions on a set of test data:

You can also use the following evaluation functions to gain further insights:

For a complete example of how to use the SVM algorithm in Vertica, see Classifying data using SVM (support vector machine).

The implementation of the SVM algorithm in Vertica is based on the paper Distributed Newton Methods for Regularized Logistic Regression.

1 - Classifying data using SVM (support vector machine)

This SVM example uses a small data set named mtcars.

This SVM example uses a small data set named mtcars. The example shows how you can use the SVM_CLASSIFIER function to train the model to predict the value of am (the transmission type, where 0 = automatic and 1 = manual) using the PREDICT_SVM_CLASSIFIER function.

Before you begin the example, load the Machine Learning sample data.
  1. Create the SVM model, named svm_class, using the mtcars_train training data.

    
    => SELECT SVM_CLASSIFIER('svm_class', 'mtcars_train', 'am', 'cyl, mpg, wt, hp, gear'
                              USING PARAMETERS exclude_columns='gear');
    SVM_CLASSIFIER
    ----------------------------------------------------------------
    Finished in 12 iterations.
    Accepted Rows: 20  Rejected Rows: 0
    (1 row)
    
  2. View the summary output of `svm_class`.

    
    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='svm_class');
    ------------------------------------------------------------------------
    
    =======
    details
    =======
    predictor|coefficient
    ---------+-----------
    Intercept| -0.02006
    cyl      |  0.15367
    mpg      |  0.15698
    wt       | -1.78157
    hp       |  0.00957
    
    ===========
    call_string
    ===========
    SELECT svm_classifier('public.svm_class', 'mtcars_train', '"am"', 'cyl, mpg, wt, hp, gear'
    USING PARAMETERS exclude_columns='gear', C=1, max_iterations=100, epsilon=0.001);
    
    ===============
    Additional Info
    ===============
    Name              |Value
    ------------------+-----
    accepted_row_count| 20
    rejected_row_count|  0
    iteration_count   | 12
    (1 row)
    
  3. Create a new table, named svm_mtcars_predict. Populate this table with the prediction outputs you obtain from running the PREDICT_SVM_CLASSIFIER function on your test data.

    => CREATE TABLE svm_mtcars_predict AS
       (SELECT car_model, am, PREDICT_SVM_CLASSIFIER(cyl, mpg, wt, hp
                                                USING PARAMETERS model_name='svm_class')
                                                AS Prediction FROM mtcars_test);
    CREATE TABLE
    
  4. View the results in the svm_mtcars_predict table.

    => SELECT * FROM svm_mtcars_predict;
    car_model     | am | Prediction
    ------------- +----+------------
    Toyota Corona |  0 |          1
    Camaro Z28    |  0 |          0
    Datsun 710    |  1 |          1
    Valiant       |  0 |          0
    Volvo 142E    |  1 |          1
    AMC Javelin   |  0 |          0
    Honda Civic   |  1 |          1
    Hornet 4 Drive|  0 |          0
    Maserati Bora |  1 |          1
    Merc 280      |  0 |          0
    Merc 450SL    |  0 |          0
    Porsche 914-2 |  1 |          1
    (12 rows)
    
  5. Evaluate the accuracy of the PREDICT_SVM_CLASSIFIER function, using the CONFUSION_MATRIX evaluation function.

    => SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER()
            FROM (SELECT am AS obs, Prediction AS pred FROM svm_mtcars_predict) AS prediction_output;
     class | 0 | 1 |                   comment
    -------+---+---+---------------------------------------------
         0 | 6 | 1 |
         1 | 0 | 5 | Of 12 rows, 12 were used and 0 were ignored
    (2 rows)
    

    In this case, PREDICT_SVM_CLASSIFIER correctly predicted that the cars with a value of 1 in the am column have a value of 1. No cars were incorrectly classified. Out of the seven cars which had a value of 0 in the am column, six were correctly predicted to have the value 0. One car was incorrectly classified as having the value 1.

See also