XGBoost for classification

The following XGBoost functions create and perform predictions with a classification model:.

XGBoost (eXtreme Gradient Boosting) is a popular supervised-learning algorithm used for regression and classification on large datasets. It uses sequentially-built shallow decision trees to provide accurate results and a highly-scalable training method that avoids overfitting.

The following XGBoost functions create and perform predictions with a classification model:

Example

This example uses the "iris" dataset, which contains measurements for various parts of a flower, and can be used to predict its species and creates an XGBoost classifier model to classify the species of each flower.

Before you begin the example, load the Machine Learning sample data.
  1. Use XGB_CLASSIFIER to create the XGBoost classifier model xgb_iris using the iris dataset:

    => SELECT XGB_CLASSIFIER ('xgb_iris', 'iris', 'Species', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
        USING PARAMETERS max_ntree=10, max_depth=5, weight_reg=0.1, learning_rate=1);
     XGB_CLASSIFIER
    ----------------
     Finished
    (1 row)
    

    You can then view a summary of the model with GET_MODEL_SUMMARY:

    
    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='xgb_iris');
                                                                                                                                                                           GET_MODEL_SUMMARY
    ------------------------------------------------------
    ===========
    call_string
    ===========
    xgb_classifier('public.xgb_iris', 'iris', '"species"', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
    USING PARAMETERS exclude_columns='', max_ntree=10, max_depth=5, nbins=32, objective=crossentropy,
    split_proposal_method=global, epsilon=0.001, learning_rate=1, min_split_loss=0, weight_reg=0.1, sampling_size=1)
    
    =======
    details
    =======
     predictor  |      type
    ------------+----------------
    sepal_length|float or numeric
    sepal_width |float or numeric
    petal_length|float or numeric
    petal_width |float or numeric
    
    
    ===============
    Additional Info
    ===============
           Name       |Value
    ------------------+-----
        tree_count    |  10
    rejected_row_count|  0
    accepted_row_count| 150
    
    (1 row)
    
  2. Use PREDICT_XGB_CLASSIFIER to apply the classifier to the test data:

    => SELECT PREDICT_XGB_CLASSIFIER (Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
        USING PARAMETERS model_name='xgb_iris') FROM iris1;
     PREDICT_XGB_CLASSIFIER
    ------------------------
     setosa
     setosa
     setosa
     .
     .
     .
     versicolor
     versicolor
     versicolor
     .
     .
     .
     virginica
     virginica
     virginica
     .
     .
     .
    
    (90 rows)
    
  3. Use PREDICT_XGB_CLASSIFIER_CLASSES to view the probability of each class:

    => SELECT PREDICT_XGB_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
        USING PARAMETERS model_name='xgb_iris') OVER (PARTITION BEST) FROM iris1;
      predicted  |    probability
    ------------+-------------------
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     | 0.999911552783011
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     versicolor |  0.99991871763563
     .
     .
     .
    (90 rows)