PREDICT_XGB_CLASSIFIER_CLASSES
Applies an XGBoost classifier model on an input relation and returns the probabilities of classes:
-
VARCHAR
predicted
column contains the class label with the highest probability. -
Multiple FLOAT columns, where the first
probability
column contains the probability for the class reported in the predicted column. Other columns contain the probability of each class specified in theclasses
parameter. -
Key columns with the same value and data type as matching input columns specified in parameter
key_columns
.
All trees contribute to a predicted probability for each response class, and the highest probability class is chosen.
Syntax
PREDICT_XGB_CLASSIFIER_CLASSES ( predictor-columns)
USING PARAMETERS model_name = 'model-name'
[, key_columns = 'key-columns']
[, exclude_columns = 'excluded-columns']
[, classes = 'classes']
[, match_by_pos = match-by-position]
[, probability_normalization = 'prob-normalization' ] )
OVER( [window-partition-clause] )
Arguments
input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns.
Parameters
model_name
Name of the model (case-insensitive).
key_columns
Comma-separated list of predictor column names that identify the output rows. To exclude these and other predictor columns from being used for prediction, include them in the argument list for parameter
exclude_columns
.exclude_columns
- Comma-separated list of columns from
predictor-columns
to exclude from processing. classes
- Comma-separated list of class labels in the model. The probability of belonging to each given class is predicted by the classifier. Values are case sensitive.
match_by_pos
- Boolean value that specifies how predictor columns are matched to model features:
-
false
(default): Match by name. -
true
: Match by the position of columns in the predictor columns list.
-
probability_normalization
The classifier's normalization method, either
softmax
(multi-class classifier) orlogit
(binary classifier). If unspecified, the defaultlogit
function is used for normalization.
Examples
After creating an XGBoost classifier model with
XGB_CLASSIFIER
, you can use PREDICT_XGB_CLASSIFIER_CLASSES
to view the probability of each classification. In this example, the XGBoost classifier model "xgb_iris" is used to predict the probability that a given flower belongs to a species of iris:
=> SELECT PREDICT_XGB_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
USING PARAMETERS model_name='xgb_iris') OVER (PARTITION BEST) FROM iris1;
predicted | probability
------------+-------------------
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.999911552783011
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
setosa | 0.9999650465368
versicolor | 0.99991871763563
.
.
.
(90 rows)
You can also specify additional classes. In this example, PREDICT_XGB_CLASSIFIER_CLASSES
makes the same prediction as the previous example, but also returns the probability that a flower belongs to the specified classes
"virginica" and "versicolor":
=> SELECT PREDICT_XGB_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
USING PARAMETERS model_name='xgb_iris', classes='virginica,versicolor', probability_normalization='logit') OVER (PARTITION BEST) FROM iris1;
predicted | probability | virginica | versicolor
------------+-------------------+----------------------+----------------------
setosa | 0.9999650465368 | 1.16160301545536e-05 | 2.33374330460065e-05
setosa | 0.9999650465368 | 1.16160301545536e-05 | 2.33374330460065e-05
setosa | 0.9999650465368 | 1.16160301545536e-05 | 2.33374330460065e-05
.
.
.
versicolor | 0.99991871763563 | 6.45697562080953e-05 | 0.99991871763563
versicolor | 0.999967282051702 | 1.60052775404199e-05 | 0.999967282051702
versicolor | 0.999648819964864 | 0.00028366342010669 | 0.999648819964864
.
.
.
virginica | 0.999977039257386 | 0.999977039257386 | 1.13305901169304e-05
virginica | 0.999977085131063 | 0.999977085131063 | 1.12847163501674e-05
virginica | 0.999977039257386 | 0.999977039257386 | 1.13305901169304e-05
(90 rows)