SVM_REGRESSOR
Trains the SVM model on an input relation.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SVM_REGRESSOR ( 'model-name', input-relation, 'response-column', 'predictor-columns'
[ USING PARAMETERS
[exclude_columns = 'excluded-columns']
[, error_tolerance = error-tolerance]
[, C = cost]
[, epsilon = epsilon-value]
[, max_iterations = max-iterations]
[, intercept_mode = 'mode']
[, intercept_scaling = 'scale'] ] )
Arguments
model-name
- Identifies the model to create, where model-name conforms to the conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation
- The table or view that contains the training data. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
response-column
- An input column that represents the dependent variable or outcome. The column must be a numeric data type.
predictor-columns
- Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include response-column and any columns that are invalid as predictor columns.
All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
Note
All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can use a BOOLEAN predictor column in training, and during prediction provide a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
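As an illustration of selecting all columns with the asterisk, the following sketch trains on a hypothetical table. The table house_sales, its columns (id, price, and numeric or BOOLEAN predictors), and the model name housePriceSvm are assumptions made for this example and do not come from the Vertica sample data.

```sql
-- Hypothetical table: house_sales(id, price, sqft, bedrooms, has_garage).
-- '*' selects every column in the relation, so the response column (price)
-- and the row identifier (id), which is not a meaningful predictor,
-- are both listed in exclude_columns.
SELECT SVM_REGRESSOR('housePriceSvm', 'house_sales', 'price', '*'
                     USING PARAMETERS exclude_columns='price, id');
```

Listing the response column in exclude_columns keeps it from being used as a predictor of itself.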
Parameters
exclude_columns
- Comma-separated list of columns from predictor-columns to exclude from processing.
error_tolerance
- Defines the acceptable error margin. Any data points outside this region add a penalty to the cost function.
Default: 0.1
C
- The weight for misclassification cost. The algorithm minimizes the regularization cost and the misclassification cost.
Default: 1.0
epsilon
- Used to control accuracy.
Default: 1e-3
max_iterations
- The maximum number of iterations that the algorithm performs.
Default: 100
intercept_mode
- A string that specifies how to treat the intercept, one of the following:
  - regularized (default): Fits the intercept and applies a regularization on it.
  - unregularized: Fits the intercept but does not include it in regularization.
intercept_scaling
- A FLOAT value that serves as the value of a dummy feature whose coefficient Vertica uses to calculate the model intercept. Because the dummy feature is not in the training data, its value is set to a constant, 1 by default.
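The following sketch shows how several of these parameters might be combined in one call. The parameter values and the model name mySvmRegModelTuned are illustrative assumptions, not recommended settings; the faithful table and its columns are taken from the example in this topic.

```sql
-- Illustrative values only: a wider error margin, a lower cost weight,
-- more iterations than the default, and an intercept that is fitted
-- but left out of regularization.
SELECT SVM_REGRESSOR('mySvmRegModelTuned', 'faithful', 'eruptions', 'waiting'
                     USING PARAMETERS error_tolerance=0.2,
                                      C=0.5,
                                      max_iterations=200,
                                      intercept_mode='unregularized');
```

The model name differs from the one used elsewhere in this topic because model names must be unique within a schema.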
Model attributes
coeff
- Coefficients in the model:
  - colNames: Intercept, or predictor column name
  - coefficients: Coefficient value
nAccepted
- Number of samples accepted for training from the data set
nRejected
- Number of samples rejected when training
nIteration
- Number of iterations used in training
callStr
- SQL statement used to replicate the training
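One way to inspect these attributes after training is with the GET_MODEL_SUMMARY and GET_MODEL_ATTRIBUTE functions. This is a minimal sketch that assumes the model mySvmRegModel from the example in this topic already exists. Attribute names can differ between Vertica versions, so the first GET_MODEL_ATTRIBUTE call omits attr_name to list what is available.

```sql
-- Print a text summary of the trained model.
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='mySvmRegModel');

-- List the attributes that this model exposes.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='mySvmRegModel');

-- Return a single attribute, here the coefficient table
-- (assumes the attribute is named coeff, as listed above).
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='mySvmRegModel',
                           attr_name='coeff');
```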
Examples
=> SELECT SVM_REGRESSOR('mySvmRegModel', 'faithful', 'eruptions', 'waiting'
USING PARAMETERS error_tolerance=0.1, max_iterations=100);
SVM_REGRESSOR
----------------------------------------------------------------
Finished in 5 iterations.
Accepted Rows: 272 Rejected Rows: 0
(1 row)
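After training, the model can be applied with PREDICT_SVM_REGRESSOR. The sketch below scores the training table itself for simplicity; in practice you would typically predict on held-out data. The alias predicted_eruptions is an assumption for this example, and no output is shown because the values depend on the trained coefficients.

```sql
-- Apply the trained model to the predictor column and compare
-- the prediction with the observed response.
SELECT eruptions,
       waiting,
       PREDICT_SVM_REGRESSOR(waiting
                             USING PARAMETERS model_name='mySvmRegModel')
           AS predicted_eruptions
FROM faithful
LIMIT 5;
```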