XGBoost for classification

XGBoost (eXtreme Gradient Boosting) is a popular supervised-learning algorithm used for regression and classification on large datasets. It builds shallow decision trees sequentially, with each new tree correcting the errors of the trees before it, which produces accurate results. Its training method is highly scalable and includes regularization that helps prevent overfitting.
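To make "sequentially built shallow decision trees" concrete, here is a minimal Python sketch of plain gradient boosting for a binary label on a single numeric feature. It is an illustration only, not Vertica's implementation. It uses depth-1 trees (stumps) fitted to the log-loss residuals and omits XGBoost's second-order updates, regularization, and multiclass handling. The names `fit_stump` and `boost` are hypothetical.

```python
import math


def sigmoid(z):
    """Map a raw additive score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold split, each leaf predicts its mean residual."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave rows on both sides
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=10, learning_rate=0.5):
    """Build stumps one after another; each fits the residuals y - p of the ensemble so far."""
    trees = []

    def raw_score(x):
        # Sum of every tree's output, each shrunk by the learning rate.
        return sum(learning_rate * tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - sigmoid(raw_score(x)) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))

    return lambda x: sigmoid(raw_score(x))


# Toy data: label 1 when the feature is large.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
print(round(model(1), 3), round(model(6), 3))
```

Each new stump is fitted to whatever the current ensemble still gets wrong. `learning_rate` scales each stump's contribution, which is analogous to the `learning_rate` parameter of XGB_CLASSIFIER.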

The following XGBoost functions create and perform predictions with a classification model:

XGB_CLASSIFIER
PREDICT_XGB_CLASSIFIER
PREDICT_XGB_CLASSIFIER_CLASSES

Example

This example uses the "iris" dataset, which contains measurements of the sepals and petals of iris flowers. It creates an XGBoost classifier model that uses these measurements to predict the species of each flower.

Before you begin the example, load the Machine Learning sample data.

Use XGB_CLASSIFIER to create the XGBoost classifier model xgb_iris using the iris dataset:

=> SELECT XGB_CLASSIFIER ('xgb_iris', 'iris', 'Species', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
    USING PARAMETERS max_ntree=10, max_depth=5, weight_reg=0.1, learning_rate=1);
 XGB_CLASSIFIER
----------------
 Finished
(1 row)
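The `learning_rate` and `weight_reg` values passed above both limit how much each tree contributes. As a hedged illustration, the sketch below computes a single leaf's output with the second-order formula from the XGBoost paper, w = -learning_rate * G / (H + lambda). Here G and H are the sums of the log-loss gradients (p - y) and Hessians p(1 - p) of the rows in the leaf. It assumes that `weight_reg` plays the role of the L2 penalty lambda, and for simplicity it uses the binary log-loss. It is not taken from Vertica's source, and `leaf_weight` is a hypothetical name.

```python
def leaf_weight(labels, probs, weight_reg, learning_rate):
    """Regularized output of one leaf for binary log-loss (XGBoost-style formula).

    labels: 0/1 targets of the rows that fall into the leaf
    probs: the current model's predicted probabilities for those rows
    weight_reg: assumed to act as the L2 penalty lambda on leaf weights
    learning_rate: shrinkage applied to the leaf's output
    """
    grad_sum = sum(p - y for y, p in zip(labels, probs))  # G: sum of first-order gradients
    hess_sum = sum(p * (1 - p) for p in probs)            # H: sum of second-order gradients
    return -learning_rate * grad_sum / (hess_sum + weight_reg)


labels = [1, 1, 1, 1]
probs = [0.5, 0.5, 0.5, 0.5]
for reg in (0.0, 0.1, 1.0):
    print(reg, round(leaf_weight(labels, probs, reg, learning_rate=1.0), 4))
```

Take four positive rows that are all predicted at 0.5. Then G = -2 and H = 1, so the leaf weight is 2.0 with no regularization, about 1.82 with weight_reg=0.1, and 1.0 with weight_reg=1. A larger penalty pulls leaf outputs toward zero.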

You can then view a summary of the model with GET_MODEL_SUMMARY:


=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='xgb_iris');
 GET_MODEL_SUMMARY
------------------------------------------------------
===========
call_string
===========
xgb_classifier('public.xgb_iris', 'iris', '"species"', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
USING PARAMETERS exclude_columns='', max_ntree=10, max_depth=5, nbins=32, objective=crossentropy,
split_proposal_method=global, epsilon=0.001, learning_rate=1, min_split_loss=0, weight_reg=0.1, sampling_size=1)

=======
details
=======
 predictor  |      type
------------+----------------
sepal_length|float or numeric
sepal_width |float or numeric
petal_length|float or numeric
petal_width |float or numeric


===============
Additional Info
===============
       Name       |Value
------------------+-----
    tree_count    |  10
rejected_row_count|  0
accepted_row_count| 150

(1 row)
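The call_string shows `objective=crossentropy`, which is the loss that training minimizes. To illustrate what that loss measures, the sketch computes the average multiclass cross-entropy of a set of predicted class probabilities. This is the mean of -log(probability assigned to the true class). The function name, the dictionary input format, and the probability values are made up for this example and are not part of Vertica.

```python
import math


def cross_entropy(true_classes, predicted_probs):
    """Average of -log(p_true) over rows; each row's probabilities are a dict keyed by class."""
    total = 0.0
    for true_class, probs in zip(true_classes, predicted_probs):
        total -= math.log(probs[true_class])
    return total / len(true_classes)


# Made-up probabilities for one flower whose true species is setosa.
confident = [{"setosa": 0.98, "versicolor": 0.01, "virginica": 0.01}]
uncertain = [{"setosa": 0.4, "versicolor": 0.3, "virginica": 0.3}]
print(cross_entropy(["setosa"], confident))
print(cross_entropy(["setosa"], uncertain))
```

Lower is better. A model that puts probability close to 1 on the correct species scores close to 0.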

Use PREDICT_XGB_CLASSIFIER to apply the classifier to the test data in the iris1 table:

=> SELECT PREDICT_XGB_CLASSIFIER (Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
    USING PARAMETERS model_name='xgb_iris') FROM iris1;
 PREDICT_XGB_CLASSIFIER
------------------------
 setosa
 setosa
 setosa
 .
 .
 .
 versicolor
 versicolor
 versicolor
 .
 .
 .
 virginica
 virginica
 virginica
 .
 .
 .

(90 rows)
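PREDICT_XGB_CLASSIFIER returns one class label per row. Conceptually, a multiclass gradient-boosted model keeps one additive score per class, converts the scores to probabilities, and reports the most probable class. The sketch below shows that last step using a softmax. It assumes softmax normalization, as in the standard XGBoost multiclass formulation, rather than describing Vertica's internals. The raw scores are made up.

```python
import math


def softmax(scores):
    """Turn per-class raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


def predict_class(scores):
    """Return the class with the highest probability."""
    probs = softmax(scores)
    return max(probs, key=probs.get)


# Hypothetical summed tree outputs for one flower.
raw_scores = {"setosa": 3.2, "versicolor": 0.4, "virginica": -1.1}
print(predict_class(raw_scores))  # setosa
```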

Use PREDICT_XGB_CLASSIFIER_CLASSES to view the predicted class for each row together with its probability:

=> SELECT PREDICT_XGB_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
    USING PARAMETERS model_name='xgb_iris') OVER (PARTITION BEST) FROM iris1;
  predicted  |    probability
------------+-------------------
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     | 0.999911552783011
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 setosa     |   0.9999650465368
 versicolor |  0.99991871763563
 .
 .
 .
(90 rows)
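With `OVER (PARTITION BEST)`, each output row pairs the predicted class with the probability the model assigns to it. This is why every setosa row above shows a probability near 1. The hypothetical helper below mimics that reduction in Python for rows whose per-class probabilities are already known. The probability values are made up for illustration.

```python
def best_class(class_probs):
    """Return (predicted class, its probability), like one row of PARTITION BEST output."""
    predicted = max(class_probs, key=class_probs.get)
    return predicted, class_probs[predicted]


rows = [
    {"setosa": 0.97, "versicolor": 0.02, "virginica": 0.01},
    {"setosa": 0.01, "versicolor": 0.90, "virginica": 0.09},
]
for probs in rows:
    predicted, probability = best_class(probs)
    print(f"{predicted:<10} | {probability}")
```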