SVM_REGRESSOR
Trains the SVM model on an input relation.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SVM_REGRESSOR ( 'model‑name', 'input‑relation', 'response‑column', 'predictor‑columns'
[ USING PARAMETERS
[exclude_columns = 'excluded‑columns']
[, error_tolerance = error-tolerance]
[, C = cost]
[, epsilon = epsilon‑value]
[, max_iterations = max‑iterations]
[, intercept_mode = 'mode']
[, intercept_scaling = 'scale'] ] )
Arguments
model‑name- Identifies the model to create, where model‑name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input‑relation- The table or view that contains the training data. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
response‑column- An input column that represents the dependent variable or outcome. The column must be a numeric data type.
predictor‑columns- Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include response‑column, and any columns that are invalid as predictor columns.
All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
Note
All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can use a BOOLEAN predictor column in training, and during prediction provide a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
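The asterisk form of predictor‑columns described above can be sketched as follows. This is an illustration only: it assumes that the faithful table used in the Examples section also contains a row-identifier column named id that should not be used as a predictor, and the model name mySvmRegModelAll is hypothetical.

```sql
-- Train on every column of faithful except those listed in exclude_columns.
-- Assumption: faithful has a row-identifier column named id.
-- The response column (eruptions) must be excluded explicitly when '*' is used.
SELECT SVM_REGRESSOR('mySvmRegModelAll', 'faithful', 'eruptions', '*'
                     USING PARAMETERS exclude_columns='id, eruptions');
```

Any other column that is neither numeric nor BOOLEAN would also need to be added to exclude_columns.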
Parameters
exclude_columns- Comma-separated list of columns from predictor‑columns to exclude from processing.
error_tolerance- Defines the acceptable error margin. Any data points outside this region add a penalty to the cost function.
Default: 0.1
C- The weight for misclassification cost. The algorithm minimizes the regularization cost and the misclassification cost.
Default: 1.0
epsilon- Used to control accuracy.
Default: 1e-3
max_iterations- The maximum number of iterations that the algorithm performs.
Default: 100
intercept_mode- A string that specifies how to treat the intercept, one of the following:
- regularized (default): Fits the intercept and applies a regularization on it.
- unregularized: Fits the intercept but does not include it in regularization.
intercept_scaling- A FLOAT value that serves as the value of a dummy feature whose coefficient Vertica uses to calculate the model intercept. Because the dummy feature is not in the training data, its values are set to a constant, 1 by default.
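As a sketch of how the parameters above combine in a single call, the following statement overrides several defaults at once. The specific values are illustrative assumptions rather than recommended settings, and the model name mySvmRegModelTuned is hypothetical.

```sql
-- Override several defaults; all values here are illustrative only.
SELECT SVM_REGRESSOR('mySvmRegModelTuned', 'faithful', 'eruptions', 'waiting'
                     USING PARAMETERS error_tolerance=0.2,              -- wider acceptable error margin (default 0.1)
                                      C=2.0,                            -- heavier penalty weight (default 1.0)
                                      epsilon=1e-4,                     -- default is 1e-3
                                      max_iterations=200,               -- default is 100
                                      intercept_mode='unregularized');  -- fit the intercept without regularizing it
```

Parameters that are omitted, such as intercept_scaling here, keep their default values.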
Model attributes
coeff- Coefficients in the model:
- colNames: Intercept, or predictor column name
- coefficients: Coefficient value
nAccepted- Number of samples accepted for training from the data set
nRejected- Number of samples rejected when training
nIteration- Number of iterations used in training
callStr- SQL statement used to replicate the training
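One way to inspect these attributes after training is with the GET_MODEL_SUMMARY and GET_MODEL_ATTRIBUTE functions, sketched below for the model created in the Examples section. Attribute names can differ between Vertica versions, so the attr_name value in the last statement is an assumption taken from the list above; calling GET_MODEL_ATTRIBUTE without attr_name first shows which names are actually available.

```sql
-- Print a readable summary of the trained model.
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='mySvmRegModel');

-- List the attributes that this model exposes.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='mySvmRegModel');

-- Retrieve one attribute by name.
-- Assumption: the attribute is named coeff, as in the list above.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='mySvmRegModel', attr_name='coeff');
```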
Examples
=> SELECT SVM_REGRESSOR('mySvmRegModel', 'faithful', 'eruptions', 'waiting'
USING PARAMETERS error_tolerance=0.1, max_iterations=100);
SVM_REGRESSOR
----------------------------------------------------------------
Finished in 5 iterations.
Accepted Rows: 272 Rejected Rows: 0
(1 row)
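Once the model is trained, it can be applied with PREDICT_SVM_REGRESSOR. The following is a minimal sketch that scores the same faithful table used for training, purely for illustration; in practice you would normally predict on separate data. The column alias predicted_eruptions is hypothetical.

```sql
-- Apply the trained model to the predictor column and compare with the observed response.
SELECT waiting,
       eruptions,
       PREDICT_SVM_REGRESSOR(waiting USING PARAMETERS model_name='mySvmRegModel') AS predicted_eruptions
FROM faithful
LIMIT 5;
```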