XGBoost for classification
XGBoost (eXtreme Gradient Boosting) is a popular supervised-learning algorithm used for regression and classification on large datasets. It builds shallow decision trees sequentially, each one correcting the errors of the trees before it, and uses a highly scalable training method designed to limit overfitting.
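The idea of sequentially built shallow trees can be sketched in plain Python. This is a minimal illustration for a binary label and a single feature, not the database's implementation: the function names and toy data are hypothetical, and the parameter names max_ntree, learning_rate, and weight_reg are borrowed from the example on this page on the assumption that weight_reg acts as an L2 penalty on leaf weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, grads, hess, weight_reg):
    """Pick the one-split tree (stump) with the best second-order gain.

    Returns (threshold, left_weight, right_weight)."""
    g_total, h_total = sum(grads), sum(hess)
    best = None
    # Skip the largest value so the right-hand leaf is never empty.
    for t in sorted(set(xs))[:-1]:
        gl = sum(g for x, g in zip(xs, grads) if x <= t)
        hl = sum(h for x, h in zip(xs, hess) if x <= t)
        gr, hr = g_total - gl, h_total - hl
        gain = gl * gl / (hl + weight_reg) + gr * gr / (hr + weight_reg)
        if best is None or gain > best[0]:
            # Regularized leaf weight: -G / (H + weight_reg)
            best = (gain, t, -gl / (hl + weight_reg), -gr / (hr + weight_reg))
    return best[1:]

def boost(xs, ys, max_ntree=10, learning_rate=1.0, weight_reg=0.1):
    """Fit stumps one after another; each fits the errors left by the previous ones."""
    scores = [0.0] * len(xs)
    trees = []
    for _ in range(max_ntree):
        probs = [sigmoid(s) for s in scores]
        grads = [p - y for p, y in zip(probs, ys)]   # gradient of the log loss
        hess = [p * (1.0 - p) for p in probs]        # second derivative of the log loss
        t, wl, wr = fit_stump(xs, grads, hess, weight_reg)
        wl, wr = learning_rate * wl, learning_rate * wr  # shrink each tree's contribution
        trees.append((t, wl, wr))
        scores = [s + (wl if x <= t else wr) for s, x in zip(scores, xs)]
    return trees

def predict_proba(trees, x):
    """Sum the leaf weights of all trees, then map the score to a probability."""
    return sigmoid(sum(wl if x <= t else wr for t, wl, wr in trees))

# Hypothetical toy data: small values belong to class 0, large values to class 1.
xs = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 1, 1]
trees = boost(xs, ys)
print(predict_proba(trees, 1.2), predict_proba(trees, 4.8))
```

A larger weight_reg or a smaller learning_rate makes each tree's contribution smaller, which is one way the method trades training fit for robustness.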
The following XGBoost functions create and perform predictions with a classification model:
- XGB_CLASSIFIER
- PREDICT_XGB_CLASSIFIER
- PREDICT_XGB_CLASSIFIER_CLASSES
Example
This example uses the "iris" dataset, which contains measurements of the parts of iris flowers, to create an XGBoost classifier model that predicts the species of each flower.
Before you begin the example, load the Machine Learning sample data.
- Use XGB_CLASSIFIER to create the XGBoost classifier model xgb_iris using the iris dataset:
    => SELECT XGB_CLASSIFIER ('xgb_iris', 'iris', 'Species', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
        USING PARAMETERS max_ntree=10, max_depth=5, weight_reg=0.1, learning_rate=1);
     XGB_CLASSIFIER
    ----------------
     Finished
    (1 row)
  You can then view a summary of the model with GET_MODEL_SUMMARY:
    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='xgb_iris');
                      GET_MODEL_SUMMARY
    ------------------------------------------------------
    ===========
    call_string
    ===========
    xgb_classifier('public.xgb_iris', 'iris', '"species"', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
    USING PARAMETERS exclude_columns='', max_ntree=10, max_depth=5, nbins=32, objective=crossentropy,
    split_proposal_method=global, epsilon=0.001, learning_rate=1, min_split_loss=0, weight_reg=0.1, sampling_size=1)

    =======
    details
    =======
     predictor  |      type
    ------------+----------------
    sepal_length|float or numeric
    sepal_width |float or numeric
    petal_length|float or numeric
    petal_width |float or numeric

    ===============
    Additional Info
    ===============
           Name       |Value
    ------------------+-----
        tree_count    |  10
    rejected_row_count|   0
    accepted_row_count| 150

    (1 row)
- Use PREDICT_XGB_CLASSIFIER to apply the classifier to the test data:
    => SELECT PREDICT_XGB_CLASSIFIER (Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
        USING PARAMETERS model_name='xgb_iris') FROM iris1;
     PREDICT_XGB_CLASSIFIER
    ------------------------
     setosa
     setosa
     setosa
     .
     .
     .
     versicolor
     versicolor
     versicolor
     .
     .
     .
     virginica
     virginica
     virginica
     .
     .
     .
    (90 rows)
- Use PREDICT_XGB_CLASSIFIER_CLASSES to view the probability of each class:
    => SELECT PREDICT_XGB_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
        USING PARAMETERS model_name='xgb_iris') OVER (PARTITION BEST) FROM iris1;
     predicted  |    probability
    ------------+-------------------
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     | 0.999911552783011
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     setosa     |   0.9999650465368
     versicolor |  0.99991871763563
     .
     .
     .
    (90 rows)
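The link between a predicted label and its reported probability can be illustrated with a short Python sketch. It assumes that a cross-entropy objective turns per-class scores into probabilities with a softmax, which is the usual approach but is not confirmed here as the database's internal method. The function names and the raw scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert per-class raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def predict_with_probability(scores):
    """Return the most likely class and its probability."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best, probs[best]

# Hypothetical summed tree outputs for one flower, one score per class.
raw_scores = {"setosa": 6.0, "versicolor": -2.0, "virginica": -3.0}
label, probability = predict_with_probability(raw_scores)
print(label, probability)
```

Under this assumption, the label returned by a plain prediction is the class with the highest probability, and a probability close to 1 indicates that the class scores are far apart.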