LIFT_TABLE

Returns a table that compares the predictive quality of a machine learning model.

Returns a table that compares the predictive quality of a machine learning model. This function is also known as a lift chart.

Syntax

LIFT_TABLE ( targets, probabilities
        [ USING PARAMETERS [num_bins = num-bins] [, main_class = class-name ] ] )
OVER()

Arguments

targets

An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:

INTEGER: Uses the input column as containing the true value of the response variable.
BOOLEAN: Resolves Yes to 1, 0 to No.
CHAR/VARCHAR: Resolves the value specified by parameter main_class to 1, all other values to 0.

Note

If the input column is of data type INTEGER or BOOLEAN, the function ignores parameter main_class.

probabilities

A FLOAT input column that contains the predicted probability of response being the main class, set to 1 if targets is of type INTEGER.

Parameters

num_bins

An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each num-bin + 1 point.

Default: 100

main_class

Used only if targets is of type CHAR/VARCHAR, specifies the class to associate with the probabilities argument.

Examples

Execute LIFT_TABLE on an input table mtcars.

=> SELECT LIFT_TABLE(obs::int, prob::float USING PARAMETERS num_bins=2) OVER()
    FROM (SELECT am AS obs, PREDICT_LOGISTIC_REG(mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
                                                    USING PARAMETERS model_name='myLogisticRegModel',
                                                    type='probability') AS prob
             FROM mtcars) AS prediction_output;
 decision_boundary | positive_prediction_ratio |       lift       |                   comment
-------------------+---------------------------+------------------+---------------------------------------------
                 1 |                         0 |              NaN |
               0.5 |                   0.40625 | 2.46153846153846 |
                 0 |                         1 |                1 | Of 32 rows, 32 were used and 0 were ignored
(3 rows)

The first column, decision_boundary, indicates the cut-off point for whether to classify a response as 0 or 1. For instance, for each row, if prob is greater than or equal to decision_boundary, the response is classified as 1. If prob is less than decision_boundary, the response is classified as 0.

The second column, positive_prediction_ratio, shows the percentage of samples in class 1 that the function classified correctly using the corresponding decision_boundary value.

For the third column, lift, the function divides the positive_prediction_ratio by the percentage of rows correctly or incorrectly classified as class 1.