CONFUSION_MATRIX
Computes the confusion matrix of a table with observed and predicted values of a response variable. CONFUSION_MATRIX
produces a table with the following dimensions:
-
Rows: Number of classes
-
Columns: Number of classes + 2
Syntax
CONFUSION_MATRIX ( targets, predictions [ USING PARAMETERS num_classes = num-classes ] OVER()
Arguments
targets
- An input column that contains the true values of the response variable.
predictions
- An input column that contains the predicted class labels.
Arguments targets
and predictions
must be set to input columns of the same data type, one of the following: INTEGER, BOOLEAN, or CHAR/VARCHAR. Depending on their data type, these columns identify classes as follows:
-
INTEGER: Zero-based consecutive integers between 0 and (
num-classes
-1) inclusive, wherenum-classes
is the number of classes. For example, given the following input column values—{0, 1, 2, 3, 4
}—Vertica assumes five classes.Note
If input column values are not consecutive, Vertica interpolates the missing values. Thus, given the following input values—{0, 1, 3, 5, 6,}
— Vertica assumes seven classes. -
BOOLEAN: Yes or No
-
CHAR/VARCHAR: Class names. If the input columns are of type CHAR/VARCHAR columns, you must also set parameter
num_classes
to the number of classes.Note
Vertica computes the number of classes as the union of values in both input columns. For example, given the following sets of values in the
targets
andpredictions
input columns, Vertica counts four classes:{'milk', 'soy milk', 'cream'} {'soy milk', 'almond milk'}
Parameters
num_classes
An integer > 1, specifies the number of classes to pass to the function.
You must set this parameter if the specified input columns are of type CHAR/VARCHAR. Otherwise, the function processes this parameter according to the column data types:
-
INTEGER: By default set to 2, you must set this parameter correctly if the number of classes is any other value.
-
BOOLEAN: By default set to 2, cannot be set to any other value.
-
Examples
This example computes the confusion matrix for a logistic regression model that classifies cars in the mtcars
data set as automatic or manual transmission. Observed values are in input column obs
, while predicted values are in input column pred
. Because this is a binary classification problem, all values are either 0 or 1.
In the table returned, all 19 cars with a value of 0 in column am
are correctly predicted by PREDICT_LOGISTIC_REGRESSION
as having a value of 0. Of the 13 cars with a value of 1 in column am
, 12 are correctly predicted to have a value of 1, while 1 car is incorrectly classified as having a value of 0:
=> SELECT CONFUSION_MATRIX(obs::int, pred::int USING PARAMETERS num_classes=2) OVER()
FROM (SELECT am AS obs, PREDICT_LOGISTIC_REG(mpg, cyl, disp,drat, wt, qsec, vs, gear, carb
USING PARAMETERS model_name='myLogisticRegModel')AS PRED
FROM mtcars) AS prediction_output;
actual_class | predicted_0 | predicted_1 | comment
-------------+-------------+-------------+------------------------------------------
0 | 19 | 0 |
1 | 0 | 13 | Of 32 rows, 32 were used and 0 were ignored
(2 rows)