XGB_PREDICTOR_IMPORTANCE

Measures the importance of the predictors in an XGBoost model.

Measures the importance of the predictors in an XGBoost model. The function outputs three measures of importance for each predictor:

  • frequency: relative number of times the model uses a predictor to split the data.

  • total_gain: relative contribution of a predictor to the model based on the total information gain across a predictor's splits. A higher value means more predictive importance.

  • avg_gain: relative contribution of a predictor to the model based on the average information gain across a predictor's splits.

The sum of each importance measure is normalized to one across all predictors.

Syntax

XGB_PREDICTOR_IMPORTANCE ( USING PARAMETERS param=value[,...] )

Parameters

model_name
Name of the model, which must be of type xgb_classifier or xgb_regressor.
tree_id
Integer in the range [0, n-1], where n is the number of trees in model_name, that specifies the tree to process. If you omit this parameter, the function uses all trees in the model to measure predictor importance values.

Privileges

Non-superusers: USAGE privileges on the model

Examples

The following example measures the importance of the predictors in the model 'xgb_iris', an XGBoost classifier model, across all trees:

=> SELECT XGB_PREDICTOR_IMPORTANCE( USING PARAMETERS model_name = 'xgb_iris' );
 predictor_index | predictor_name |     frequency     |     total_gain     |      avg_gain
-----------------+----------------+-------------------+--------------------+--------------------
               0 | sepal_length   |  0.15384615957737 |    0.0183021749937 | 0.0370849960701401
               1 | sepal_width    | 0.215384617447853 | 0.0154729501420881 | 0.0223944615251752
               2 | petal_length   | 0.369230777025223 |  0.607349886817728 |  0.512770753876444
               3 | petal_width    | 0.261538475751877 |  0.358874988046484 |  0.427749788528241
(4 rows)

To sort the predictors by importance values, you can use a nested query with an ORDER BY clause. The following sorts the model predictors by descending avg_gain:

=> SELECT * FROM (SELECT XGB_PREDICTOR_IMPORTANCE( USING PARAMETERS model_name = 'xgb_iris' )) AS importances ORDER BY avg_gain DESC;
 predictor_index | predictor_name |     frequency     |     total_gain     |      avg_gain
-----------------+----------------+-------------------+--------------------+--------------------
               2 | petal_length   | 0.369230777025223 |  0.607349886817728 |  0.512770753876444
               3 | petal_width    | 0.261538475751877 |  0.358874988046484 |  0.427749788528241
               0 | sepal_length   |  0.15384615957737 |    0.0183021749937 | 0.0370849960701401
               1 | sepal_width    | 0.215384617447853 | 0.0154729501420881 | 0.0223944615251752
(4 rows)

See also