ROC
Returns a table that displays the points on a receiver operating characteristic curve. The ROC
function tells you the accuracy of a classification model as you raise the discrimination threshold for the model.
Syntax
ROC ( targets, probabilities
[ USING PARAMETERS
[num_bins = num-bins]
[, AUC = output]
[, main_class = class-name ] ) ] )
OVER()
Arguments
targets
- An input column that contains the true values of the response variable, one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:
-
INTEGER: Uses the input column as containing the true value of the response variable.
-
BOOLEAN: Resolves Yes to 1, 0 to No.
-
CHAR/VARCHAR: Resolves the value specified by parameter
main_class
to 1, all other values to 0.
Note
If the input column is of data type INTEGER or BOOLEAN, the function ignores parametermain_class
. -
probabilities
- A FLOAT input column that contains the predicted probability of response being the main class, set to 1 if
targets
is of type INTEGER.
Parameters
num_bins
An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each
num-bin
+ 1 point.Default: 100
Greater values result in more precise approximations of the AUC.
AUC
- A Boolean value that specifies whether to output the area under the curve (AUC) value.
Default: True
main_class
Used only if
targets
is of type CHAR/VARCHAR, specifies the class to associate with theprobabilities
argument.
Examples
Execute ROC
on input table mtcars
. Observed class labels are in column obs
, predicted class labels are in column prob
:
=> SELECT ROC(obs::int, prob::float USING PARAMETERS num_bins=5, AUC = True) OVER()
FROM (SELECT am AS obs,
PREDICT_LOGISTIC_REG (mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
USING PARAMETERS
model_name='myLogisticRegModel', type='probability') AS prob
FROM mtcars) AS prediction_output;
decision_boundary | false_positive_rate | true_positive_rate | AUC |comment
-------------------+---------------------+--------------------+-----+-----------------------------------
0 | 1 | 1 | |
0.5 | 0 | 1 | |
1 | 0 | 0 | 1 | Of 32 rows,32 were used and 0 were ignoreded
(3 rows)
The function returns a table with the following results:
-
decision_boundary
indicates the cut-off point for whether to classify a response as 0 or 1. In each row, ifprob
is equal to or greater thandecision_boundary
, the response is classified as 1. Ifprob
is less thandecision_boundary
, the response is classified as 0. -
false_positive_rate
shows the percentage of false positives (when 0 is classified as 1) in the correspondingdecision_boundary
. -
true_positive_rate
shows the percentage of rows that were classified as 1 and also belong to class 1.