SVM_CLASSIFIER
Trains the SVM model on an input relation.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SVM_CLASSIFIER ( 'model-name', input-relation, 'response-column', 'predictor-columns'
[ USING PARAMETERS
[exclude_columns = 'excluded-columns']
[, C = 'cost']
[, epsilon = 'epsilon-value']
[, max_iterations = 'max-iterations']
[, class_weights = 'weight']
[, intercept_mode = 'intercept-mode']
[, intercept_scaling = 'scale'] ] )
Arguments
model-name
- Identifies the model to create, where model-name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation
- The table or view that contains the training data. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
response-column
- The input column that represents the dependent variable or outcome. The column values must be 0 or 1, and the column must be of type numeric or BOOLEAN; otherwise, the function returns an error.
predictor-columns
- Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include response-column and any columns that are invalid as predictor columns.
All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
Note
All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can use a BOOLEAN predictor column in training, and during prediction provide a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
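The asterisk form of predictor-columns can be sketched as follows. This is an illustration only: the model name mySvmStarModel is hypothetical, and the column car_model is an assumed non-numeric column that is not shown elsewhere on this page. Adjust exclude_columns to match the columns in your table.

```sql
-- Select all columns as predictors, then exclude the response column (am)
-- and any column that is not numeric or BOOLEAN (car_model is assumed here).
SELECT SVM_CLASSIFIER(
    'mySvmStarModel', 'mtcars', 'am', '*'
    USING PARAMETERS exclude_columns = 'am,car_model');
```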
Parameters
exclude_columns
- Comma-separated list of columns from predictor-columns to exclude from processing.
C
- Weight for misclassification cost. The algorithm minimizes the regularization cost and the misclassification cost.
Default: 1.0
epsilon
- Used to control accuracy.
Default: 1e-3
max_iterations
- Maximum number of iterations that the algorithm performs.
Default: 100
class_weights
- Specifies how to determine weights of the two classes, one of the following:
  - None (default): No weights are used.
  - value0,value1: Two comma-delimited strings that specify two positive FLOAT values, where value0 assigns a weight to class 0, and value1 assigns a weight to class 1.
  - auto: Weights each class according to the number of samples.
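As a minimal sketch of the class_weights options, the following call asks the function to weight the classes automatically. The model name mySvmWeightedModel is hypothetical, and the explicit weights in the comment are arbitrary illustrative values, not recommendations.

```sql
-- Weight each class according to the number of samples.
-- To set weights explicitly instead, pass two positive values,
-- for example class_weights = '1.0,2.0' (class 0 first, then class 1).
SELECT SVM_CLASSIFIER(
    'mySvmWeightedModel', 'mtcars', 'am', 'mpg,cyl,disp,wt,qsec,vs,gear,carb'
    USING PARAMETERS class_weights = 'auto');
```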
intercept_mode
- Specifies how to treat the intercept, one of the following:
  - regularized (default): Fits the intercept and applies a regularization on it.
  - unregularized: Fits the intercept but does not include it in regularization.
intercept_scaling
- Float value that serves as the value of a dummy feature whose coefficient Vertica uses to calculate the model intercept. Because the dummy feature is not in the training data, its values are set to a constant, by default 1.
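Several of the optional parameters can be combined in one call. The sketch below is illustrative only: the model name mySvmTunedModel is hypothetical, and the parameter values are arbitrary choices meant to show the syntax, not tuned settings.

```sql
-- Override the defaults for the cost weight, the iteration limit,
-- and the treatment of the intercept.
SELECT SVM_CLASSIFIER(
    'mySvmTunedModel', 'mtcars', 'am', 'mpg,cyl,disp,wt,qsec,vs,gear,carb'
    USING PARAMETERS C = 0.5,
                     max_iterations = 200,
                     intercept_mode = 'unregularized');
```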
Model attributes
coeff
- Coefficients in the model:
  - colNames: Intercept, or predictor column name
  - coefficients: Coefficient value
nAccepted
- Number of samples accepted for training from the data set
nRejected
- Number of samples rejected when training
nIteration
- Number of iterations used in training
callStr
- SQL statement used to replicate the training
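One way to inspect these attributes after training is the GET_MODEL_ATTRIBUTE function. The sketch below assumes that a model named mySvmClassModel already exists in the current schema.

```sql
-- Return the coeff attribute: one row per intercept or predictor column.
SELECT GET_MODEL_ATTRIBUTE(
    USING PARAMETERS model_name = 'mySvmClassModel', attr_name = 'coeff');
```

Omitting attr_name should list the attributes that are available for the model.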
Examples
The following example uses SVM_CLASSIFIER on the mtcars table:
=> SELECT SVM_CLASSIFIER(
'mySvmClassModel', 'mtcars', 'am', 'mpg,cyl,disp,hp,drat,wt,qsec,vs,gear,carb'
USING PARAMETERS exclude_columns = 'hp,drat');
SVM_CLASSIFIER
----------------------------------------------------------------
Finished in 15 iterations.
Accepted Rows: 32 Rejected Rows: 0
(1 row)
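Once the model is trained, it can be applied with PREDICT_SVM_CLASSIFIER. The following is a minimal sketch that scores the training table itself. It assumes that the predictor columns are passed in the same order used in training, with the excluded columns hp and drat left out.

```sql
-- Compare the stored response (am) with the class predicted by the model.
SELECT am,
       PREDICT_SVM_CLASSIFIER(mpg, cyl, disp, wt, qsec, vs, gear, carb
           USING PARAMETERS model_name = 'mySvmClassModel') AS predicted_am
FROM mtcars;
```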