This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
SVM (support vector machine) for regression
Support Vector Machine (SVM) for regression predicts continuous ordered variables based on the training data.
Support Vector Machine (SVM) for regression predicts continuous ordered variables based on the training data.
Unlike Logistic regression, which you use to determine a binary classification outcome, SVM for regression is primarily used to predict continuous numerical outcomes.
You can use the following functions to build an SVM for regression model, view the model, and use the model to make predictions on a set of test data:
For a complete example of how to use the SVM algorithm in Vertica, see Building an SVM for regression model.
1 - Building an SVM for regression model
This SVM for regression example uses a small data set named faithful, based on the Old Faithful geyser in Yellowstone National Park.
This SVM for regression example uses a small data set named faithful, based on the Old Faithful geyser in Yellowstone National Park. The data set contains values about the waiting time between eruptions and the duration of eruptions of the geyser. The example shows how you can build a model to predict the value of eruptions
, given the value of the waiting
feature.
Before you begin the example,
load the Machine Learning sample data.
-
Create the SVM model, named svm_faithful
, using the faithful_training
training data:
=> SELECT SVM_REGRESSOR('svm_faithful', 'faithful_training', 'eruptions', 'waiting'
USING PARAMETERS error_tolerance=0.1, max_iterations=100);
SVM_REGRESSOR
---------------------------
Finished in 5 iterations
Accepted Rows: 162 Rejected Rows: 0
(1 row)
-
View the summary output of svm_faithful
:
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='svm_faithful');
------------------------------------------------------------------
=======
details
=======
===========================
Predictors and Coefficients
===========================
|Coefficients
---------+------------
Intercept| -1.59007
waiting | 0.07217
===========
call_string
===========
Call string:
SELECT svm_regressor('public.svm_faithful', 'faithful_training', '"eruptions"',
'waiting'USING PARAMETERS error_tolerance = 0.1, C=1, max_iterations=100,
epsilon=0.001);
===============
Additional Info
===============
Name |Value
------------------+-----
accepted_row_count| 162
rejected_row_count| 0
iteration_count | 5
(1 row)
-
Create a new table that contains the response values from running the PREDICT_SVM_REGRESSOR
function on your test data. Name this table pred_faithful_results.
View the results in the pred_faithful_results
table:
=> CREATE TABLE pred_faithful AS
(SELECT id, eruptions, PREDICT_SVM_REGRESSOR(waiting USING PARAMETERS model_name='svm_faithful')
AS pred FROM faithful_testing);
CREATE TABLE
=> SELECT * FROM pred_faithful ORDER BY id;
id | eruptions | pred
-----+-----------+------------------
4 | 2.283 | 2.88444568755189
5 | 4.533 | 4.54434581879796
8 | 3.6 | 4.54434581879796
9 | 1.95 | 2.09058040739072
11 | 1.833 | 2.30708912016195
12 | 3.917 | 4.47217624787422
14 | 1.75 | 1.80190212369576
20 | 4.25 | 4.11132839325551
22 | 1.75 | 1.80190212369576
.
.
.
(110 rows)
Calculating the mean squared error (MSE)
You can calculate how well your model fits the data is by using the MSE function. MSE returns the average of the squared differences between actual value and predicted values.
=> SELECT MSE(obs::float, prediction::float) OVER()
FROM (SELECT eruptions AS obs, pred AS prediction
FROM pred_faithful) AS prediction_output;
mse | Comments
-------------------+-----------------------------------------------
0.254499811834235 | Of 110 rows, 110 were used and 0 were ignored
(1 row)
See also