APPLY_IFOREST
Applies an isolation forest (iForest) model to an input relation. For each input row, the function returns an output row with two fields:
anomaly_score
: A float value that represents the average path length across all trees in the model normalized by the training sample size.is_anomaly
: A Boolean value that indicates whether the input row is an anomaly. This value is true whenanomaly_score
is equal to or larger than a given threshold; otherwise, it's false.
Syntax
APPLY_IFOREST( input-columns USING PARAMETERS param=value[,...] )
Arguments
input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Column types must match the types of the predictors in
model_name
.
Parameters
model_name
Name of the model (case-insensitive).
threshold
- Optional. Float in the range (0.0, 1.0), specifies the threshold that determines if a data point is an anomaly. If the
anomaly_score
for a data point is equal to or larger than the value ofthreshold
, the data point is marked as an outlier.Alternatively, you can specify a
contamination
value that sets a threshold where the percentage of training data points labeled as outliers is approximately equal to the value ofcontamination
. You cannot set bothcontamination
andthreshold
in the same function call.Default: 0.7
match_by_pos
- Optional. Boolean value that specifies how input columns are matched to model columns:
-
false
: Match by name. -
true
: Match by the position of columns in the input columns list.
Default: false
-
contamination
- Optional. Float in the range (0.0, 1.0), the approximate ratio of data points in the training data that are labeled as outliers. The function calculates a threshold based on this
contamination
value. If you do not set this parameter, the function marks outliers using the specified or defaultthreshold
value.You cannot set both
contamination
andthreshold
in the same function call.
Privileges
Non-superusers:
-
USAGE privileges on the model
-
SELECT privileges on the input relation
Examples
The following example demonstrates how different threshold
values can affect outlier detection on an input relation:
=> SELECT * FROM (SELECT first_name, last_name, APPLY_IFOREST(team, hr, hits, avg, salary USING PARAMETERS model_name='baseball_anomalies',
threshold=0.75) AS predictions FROM baseball) AS outliers WHERE predictions.is_anomaly IS true;
first_name | last_name | predictions
------------+-----------+-------------------------------------------------------
Jacqueline | Richards | {"anomaly_score":0.777757463074347,"is_anomaly":true}
(1 row)
=> SELECT * FROM (SELECT first_name, last_name, APPLY_IFOREST(team, hr, hits, avg, salary USING PARAMETERS model_name='baseball_anomalies',
threshold=0.55) AS predictions FROM baseball) AS outliers WHERE predictions.is_anomaly IS true;
first_name | last_name | predictions
------------+-----------+--------------------------------------------------------
Jacqueline | Richards | {"anomaly_score":0.777757463074347,"is_anomaly":true}
Debra | Hall | {"anomaly_score":0.5714649698133808,"is_anomaly":true}
Gerald | Fuller | {"anomaly_score":0.5980549926114661,"is_anomaly":true}
(3 rows)
You can also use different contamination
values to alter the outlier threshold:
=> SELECT * FROM (SELECT first_name, last_name, APPLY_IFOREST(team, hr, hits, avg, salary USING PARAMETERS model_name='baseball_anomalies',
contamination = 0.1) AS predictions FROM baseball) AS outliers WHERE predictions.is_anomaly IS true;
first_name | last_name | predictions
------------+-----------+--------------------------------------------------------
Marie | Fields | {"anomaly_score":0.5307715717521868,"is_anomaly":true}
Jacqueline | Richards | {"anomaly_score":0.777757463074347,"is_anomaly":true}
Debra | Hall | {"anomaly_score":0.5714649698133808,"is_anomaly":true}
Gerald | Fuller | {"anomaly_score":0.5980549926114661,"is_anomaly":true}
(4 rows)
=> SELECT * FROM (SELECT first_name, last_name, APPLY_IFOREST(team, hr, hits, avg, salary USING PARAMETERS model_name='baseball_anomalies',
contamination = 0.01) AS predictions FROM baseball) AS outliers WHERE predictions.is_anomaly IS true;
first_name | last_name | predictions
------------+-----------+--------------------------------------------------------
Jacqueline | Richards | {"anomaly_score":0.777757463074347,"is_anomaly":true}
Debra | Hall | {"anomaly_score":0.5714649698133808,"is_anomaly":true}
Gerald | Fuller | {"anomaly_score":0.5980549926114661,"is_anomaly":true}
(3 rows)