PRC
Returns a table that displays the points on a precision-recall (PR) curve.
Syntax
PRC ( targets, probabilities
    [ USING PARAMETERS
        [ num_bins = num-bins ]
        [, f1_score = return-score ]
        [, main_class = class-name ] ] )
OVER()
Arguments
targets
An input column that contains the true values of the response variable, of one of the following data types: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on the column data type, the function processes column data as follows:
- INTEGER: Uses the input column values as the true values of the response variable.
- BOOLEAN: Resolves Yes to 1, No to 0.
- CHAR/VARCHAR: Resolves the value specified by parameter main_class to 1, all other values to 0.
Note
If the input column is of data type INTEGER or BOOLEAN, the function ignores parameter main_class.
probabilities
A FLOAT input column that contains the predicted probability of the response being the main class. The main class is 1 if targets is of type INTEGER.
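The resolution rules for targets can be sketched in Python. This only illustrates the rules listed above and is not how the function is implemented; the helper name resolve_target and the use of Python values to stand in for column values are assumptions made for the example.

```python
def resolve_target(value, main_class=None):
    """Map a raw target value to 0 or 1, following the rules listed above.

    Hypothetical helper for illustration only; not part of the database.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        # BOOLEAN: Yes (True) resolves to 1, No (False) to 0.
        return 1 if value else 0
    if isinstance(value, int):
        # INTEGER: the value is used as the true response as-is.
        return value
    if isinstance(value, str):
        # CHAR/VARCHAR: main_class resolves to 1, every other value to 0.
        return 1 if value == main_class else 0
    raise TypeError(f"unsupported target type: {type(value).__name__}")


print(resolve_target(True))                       # 1
print(resolve_target("spam", main_class="spam"))  # 1
print(resolve_target("ham", main_class="spam"))   # 0
```

Note that main_class has no effect in the first two branches, which mirrors the note above.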
Parameters
num_bins
An integer value that determines the number of decision boundaries. Decision boundaries are set at equally spaced intervals between 0 and 1, inclusive. The function computes the table at each of the num-bins + 1 points.
Default: 100
f1_score
A Boolean that specifies whether to return a column that contains the F1 score, which is the harmonic mean of the precision and recall measures. An F1 score reaches its best value at 1 (perfect precision and recall) and its worst at 0.
Default: false
main_class
Used only if targets is of type CHAR/VARCHAR. Specifies the class to associate with the probabilities argument.
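As a rough illustration of num_bins and f1_score, the following Python sketch generates equally spaced decision boundaries and computes a harmonic mean. The function names are hypothetical, and returning 0 when precision and recall are both 0 is an assumption of this sketch, not documented behavior.

```python
def decision_boundaries(num_bins=100):
    """Return num_bins + 1 equally spaced boundaries from 0 to 1, inclusive."""
    return [i / num_bins for i in range(num_bins + 1)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumption: treat the undefined 0/0 case as 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(decision_boundaries(4))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(f1_score(1.0, 1.0))      # 1.0
print(f1_score(0.0, 0.0))      # 0.0
```

With precision 0.40625 and recall 1, the same formula gives approximately 0.5778, which agrees with the f1_score value shown in the first row of the example output in the Examples section.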
Examples
Execute the PRC function on an input table named mtcars. The true response values appear in column obs, while the predicted probabilities appear in column prob.
=> SELECT PRC(obs::int, prob::float USING PARAMETERS num_bins=2, f1_score=true) OVER()
FROM (SELECT am AS obs,
PREDICT_LOGISTIC_REG (mpg, cyl, disp, drat, wt, qsec, vs, gear, carb
USING PARAMETERS model_name='myLogisticRegModel',
type='probability') AS prob
FROM mtcars) AS prediction_output;
 decision_boundary | recall | precision |     f1_score      |                  comment
-------------------+--------+-----------+-------------------+---------------------------------------------
                 0 |      1 |   0.40625 | 0.577777777777778 |
               0.5 |      1 |         1 |                 1 | Of 32 rows, 32 were used and 0 were ignored
(2 rows)
The first column, decision_boundary, indicates the cut-off point for whether to classify a response as 0 or 1. In each row, if the probability is equal to or greater than decision_boundary, the response is classified as 1. If the probability is less than decision_boundary, the response is classified as 0.
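The classification rule above, and the precision and recall that follow from it, can be sketched in Python. This is a minimal illustration using the standard definitions of precision and recall, not the function's actual implementation. The function name and the sample data are made up, and reporting 0 when a denominator is zero is an assumption.

```python
def precision_recall(targets, probabilities, decision_boundary):
    """Compute (precision, recall) at one decision boundary.

    A row is classified as 1 when its probability is greater than or
    equal to decision_boundary, and as 0 otherwise.
    """
    predicted = [1 if p >= decision_boundary else 0 for p in probabilities]
    tp = sum(1 for t, y in zip(targets, predicted) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(targets, predicted) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(targets, predicted) if t == 1 and y == 0)
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data for illustration; not taken from mtcars.
targets = [1, 0, 1, 0]
probabilities = [0.9, 0.6, 0.4, 0.1]

print(precision_recall(targets, probabilities, 0.0))  # (0.5, 1.0)
print(precision_recall(targets, probabilities, 0.5))  # (0.5, 0.5)
```

At a decision boundary of 0, every row is classified as 1, so recall is 1 and precision equals the share of rows whose true value is 1. This is consistent with the first row of the example output above.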