LIFT_TABLE
Returns a table that compares the predictive quality of a machine learning model. This function is also known as a lift chart
.
Syntax
LIFT_TABLE ( targets, probabilities
[ USING PARAMETERS [num_bins = num-bins] [, main_class = class-name ] ] )
OVER()
Arguments
targets
- An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:
-
INTEGER: Uses the input column as containing the true value of the response variable.
-
BOOLEAN: Resolves Yes to 1, 0 to No.
-
CHAR/VARCHAR: Resolves the value specified by parameter
main_class
to 1, all other values to 0.
Note
If the input column is of data type INTEGER or BOOLEAN, the function ignores parametermain_class
. -
probabilities
- A FLOAT input column that contains the predicted probability of response being the main class, set to 1 if
targets
is of type INTEGER.
Parameters
num_bins
An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each
num-bin
+ 1 point.Default: 100
main_class
Used only if
targets
is of type CHAR/VARCHAR, specifies the class to associate with theprobabilities
argument.
Examples
Execute LIFT_TABLE
on an input table mtcars
.
=> SELECT LIFT_TABLE(obs::int, prob::float USING PARAMETERS num_bins=2) OVER()
FROM (SELECT am AS obs, PREDICT_LOGISTIC_REG(mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
USING PARAMETERS model_name='myLogisticRegModel',
type='probability') AS prob
FROM mtcars) AS prediction_output;
decision_boundary | positive_prediction_ratio | lift | comment
-------------------+---------------------------+------------------+---------------------------------------------
1 | 0 | NaN |
0.5 | 0.40625 | 2.46153846153846 |
0 | 1 | 1 | Of 32 rows, 32 were used and 0 were ignored
(3 rows)
The first column, decision_boundary
, indicates the cut-off point for whether to classify a response as 0 or 1. For instance, for each row, if prob
is greater than or equal to decision_boundary
, the response is classified as 1. If prob
is less than decision_boundary
, the response is classified as 0.
The second column, positive_prediction_ratio
, shows the percentage of samples in class 1 that the function classified correctly using the corresponding decision_boundary
value.
For the third column, lift
, the function divides the positive_prediction_ratio
by the percentage of rows correctly or incorrectly classified as class 1.