IFOREST
Trains and returns an isolation forest (iForest) model. After you train the model, you can use the APPLY_IFOREST function to predict outliers in an input relation.
For more information about how the iForest algorithm works, see Isolation Forest.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
IFOREST( 'model-name', 'input-relation', 'input-columns' [ USING PARAMETERS param=value[,...] ] )
Arguments
- model-name
- Identifies the model to create, where model-nameconforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
- input-relation
- The table or view that contains the input data for IFOREST.
- input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Columns must be of types CHAR, VARCHAR, BOOL, INT, or FLOAT.
Columns of types CHAR, VARCHAR, and BOOL are treated as categorical features; all others are treated as numeric features. 
Parameters
- exclude_columns
- Comma-separated list of column names from input-columnsto exclude from processing.Default: Empty string ('') 
- ntree
- Integer in the range [1, 1000], specifies the number of trees in the forest.
Default: 100 
- sampling_size
- Float in the range (0.0, 1.0], specifies the portion of the input data set that is randomly picked, without replacement, for training each tree.
Default: 0.632 
- col_sample_by_tree
- Float in the range (0.0, 1.0], specifies the fraction of columns that are randomly picked for training each tree.
Default: 1.0 
- max_depth
- Integer in the range [1, 100], specifies the maximum depth for growing each tree.
Default: 10 
- nbins
- Integer in the range [2, 1000], specifies the number of bins used to discretize continuous features.
Default: 32 
Model Attributes
- details
- Details about the function's predictor columns, including:
- 
predictor: Names of the predictors in the same order specified when training the model.
- 
type: Types of the predictors in the same order as their names inpredictor.
 
- 
- tree_count
- Number of trees in the model.
- rejected_row_count
- Number of rows in input-relationthat were skipped because they contained an invalid value.
- accepted_row_count
- Total number of rows in input-relationminusrejected_row_count.
- call_string
- Value of all input arguments that were specified at the time the function was called.
Privileges
Non-superusers:
- 
CREATE privileges on the schema where the model is created 
- 
SELECT privileges on the input relation 
Examples
In the following example, the input data to the function contains columns of type INT, VARCHAR, and FLOAT:
=> SELECT IFOREST('baseball_anomalies','baseball','team, hr, hits, avg, salary' USING PARAMETERS ntree=75, sampling_size=0.7,
max_depth=15);
IFOREST
----------
Finished
(1 row)
You can verify that all the input columns were read in correctly by calling GET_MODEL_SUMMARY and checking the details section:
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='baseball_anomalies');
GET_MODEL_SUMMARY
-------------------------------------------------------------------------------------------------------------------------------------
===========
call_string
===========
SELECT iforest('public.baseball_anomalies', 'baseball', 'team, hr, hits, avg, salary' USING PARAMETERS exclude_columns='', ntree=75,
sampling_size=0.7, col_sample_by_tree=1, max_depth=15, nbins=32);
=======
details
=======
predictor|      type
---------+----------------
  team   |char or varchar
   hr    |      int
  hits   |      int
   avg   |float or numeric
 salary  |float or numeric
===============
Additional Info
===============
       Name       |Value
------------------+-----
    tree_count    | 75
rejected_row_count|  0
accepted_row_count|1000
(1 row)