DETECT_OUTLIERS
Returns the outliers in a data set based on the outlier threshold.
	Returns the outliers in a data set based on the outlier threshold. The output is a table that contains the outliers. DETECT_OUTLIERS uses the detection method robust_szcore to normalize each input column. The function then identifies as outliers all rows that contain a normalized value greater than the default or specified threshold.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
DETECT_OUTLIERS ( 'output-table', 'input-relation','input-columns', 'detection-method'
        [ USING PARAMETERS
              [outlier_threshold = threshold]
              [, exclude_columns = 'excluded-columns']
              [, partition_columns = 'partition-columns'] ] )
Arguments
- output-table
- The name of the table where Vertica saves rows that are outliers along the chosen input_columns. All columns are present in this table.
- input-relation
- The table or view that contains outlier data. If the input relation is defined in Hive, use 
SYNC_WITH_HCATALOG_SCHEMAto sync thehcatalogschema, and then run the machine learning function.
- input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of type numeric.
- detection-method
- The outlier detection method to use, set to robust_zscore.
Parameters
- outlier_threshold
- The minimum normalized value in a row that is used to identify that row as an outlier.
Default: 3.0 
- exclude_columns
- Comma-separated list of column names from - input-columnsto exclude from processing.
- partition_columns
- Comma-separated list of column names from the input table or view that defines the partitions. DETECT_OUTLIERSdetects outliers among each partition separately.Default: empty list 
Privileges
Non-superusers:
- 
SELECT privileges on the input relation 
- 
CREATE privileges on the output table 
Examples
The following example shows how to use DETECT_OUTLIERS:
=> CREATE TABLE baseball_roster (id identity, last_name varchar(30), hr int, avg float);
CREATE TABLE
=> COPY baseball_roster FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Polo|7|.233
>> Gloss|45|.170
>> Gus|12|.345
>> Gee|1|.125
>> Laus|3|.095
>> Hilltop|16|.222
>> Wicker|78|.333
>> Scooter|0|.121
>> Hank|999999|.8888
>> Popup|35|.378
>> \.
=> SELECT * FROM baseball_roster;
 id | last_name |   hr   |  avg
----+-----------+--------+--------
  3 | Gus       |     12 |  0.345
  4 | Gee       |      1 |  0.125
  6 | Hilltop   |     16 |  0.222
 10 | Popup     |     35 |  0.378
  1 | Polo      |      7 |  0.233
  7 | Wicker    |     78 |  0.333
  9 | Hank      | 999999 | 0.8888
  2 | Gloss     |     45 |   0.17
  5 | Laus      |      3 |  0.095
  8 | Scooter   |      0 |  0.121
(10 rows)
=> SELECT DETECT_OUTLIERS('baseball_outliers', 'baseball_roster', 'id, hr, avg', 'robust_zscore' USING PARAMETERS
outlier_threshold=3.0);
     DETECT_OUTLIERS
--------------------------
 Detected 2 outliers
(1 row)
=> SELECT * FROM baseball_outliers;
 id | last_name | hr         | avg
----+-----------+------------+-------------
  7 | Wicker    |         78 |       0.333
  9 | Hank      |     999999 |      0.8888
(2 rows)