IMPUTE
Imputes missing values in a data set with either the mean or the mode, based on observed values for a variable in each column. This function supports numeric and categorical data types.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
IMPUTE( 'output-view', 'input-relation', 'input-columns', 'method'
[ USING PARAMETERS [exclude_columns = 'excluded-columns'] [, partition_columns = 'partition-columns'] ] )
Arguments
output-view
- Name of the view that shows the input table with imputed values in place of missing values. In this view, rows without missing values are kept intact while the rows with missing values are modified according to the specified method.
input-relation
- The table or view that contains the data for missing-value imputation. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
input-columns
- Comma-separated list of input columns where missing values will be replaced, or asterisk (*) to specify all columns. All columns must be of type numeric or BOOLEAN.
method
- The method to compute the missing value replacements, one of the following:
  - mean: The missing values in each column will be replaced by the mean of that column. This method can be used for numeric data only.
  - mode: The missing values in each column will be replaced by the most frequent value in that column. This method can be used for categorical data only.
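The two methods can be approximated in plain SQL to make the replacement rule concrete. The following is only a sketch of the semantics, not how IMPUTE is implemented. It reuses the table and column names from the examples on this page (small_input_impute, pid, x1, x5) and assumes that x1 is numeric, that x5 is categorical, and that each column has at least one non-null value. The tie-breaking rule for the mode is also an assumption.

```sql
-- Mean method, sketched: replace NULLs in x1 with the column average.
-- AVG ignores NULLs, so only observed values contribute to the mean.
SELECT pid,
       COALESCE(x1, AVG(x1) OVER ()) AS x1
FROM small_input_impute;

-- Mode method, sketched: replace NULLs in x5 with its most frequent value.
-- Ties are broken by the smaller value here (assumption, not documented).
WITH x5_mode AS (
    SELECT x5 AS mode_value
    FROM small_input_impute
    WHERE x5 IS NOT NULL
    GROUP BY x5
    ORDER BY COUNT(*) DESC, x5
    LIMIT 1
)
SELECT t.pid,
       COALESCE(t.x5, m.mode_value) AS x5
FROM small_input_impute t
CROSS JOIN x5_mode m;
```

In both queries, rows that already have a value pass through unchanged, which matches the description of the output view.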
Parameters
exclude_columns
- Comma-separated list of column names from input-columns to exclude from processing.
partition_columns
- Comma-separated list of column names from the input relation that defines the partitions.
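As an illustration of how the two parameters combine, the call below sketches a partitioned imputation. The column group_col and the view name output_view_by_group are hypothetical and do not come from this page. The sketch also assumes that the replacement value is computed separately within each partition rather than over the whole column.

```sql
-- group_col is a hypothetical partitioning column in small_input_impute.
SELECT IMPUTE('output_view_by_group', 'small_input_impute', 'pid, x1,x2,x3,x4', 'mean'
              USING PARAMETERS exclude_columns='pid', partition_columns='group_col');

-- Under the per-partition assumption, the result for x1 would be
-- roughly equivalent to this plain SQL:
SELECT pid,
       group_col,
       COALESCE(x1, AVG(x1) OVER (PARTITION BY group_col)) AS x1
FROM small_input_impute;
```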
Privileges
Non-superusers:
- SELECT privileges on the input relation
- CREATE privileges on the output view schema
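A superuser might grant these privileges along the following lines. The user name analyst_user and the schema name public are hypothetical placeholders. Other privileges, such as USAGE on the schemas involved, may also be needed depending on the database configuration.

```sql
-- analyst_user and public are placeholder names (assumptions).
GRANT SELECT ON small_input_impute TO analyst_user;  -- read the input relation
GRANT CREATE ON SCHEMA public TO analyst_user;       -- create the output view
```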
Examples
Execute IMPUTE on the small_input_impute table, specifying the mean method:
=> SELECT impute('output_view','small_input_impute', 'pid, x1,x2,x3,x4','mean'
USING PARAMETERS exclude_columns='pid');
impute
--------------------------
Finished in 1 iteration
(1 row)
Execute IMPUTE, specifying the mode method:
=> SELECT impute('output_view3','small_input_impute', 'pid, x5,x6','mode' USING PARAMETERS exclude_columns='pid');
impute
--------------------------
Finished in 1 iteration
(1 row)
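After either call, the output view can be queried like any other view to check the result. The queries below are a sketch. They assume the views created above expose the same column names as the input table, including pid. No expected output is shown, because it depends on the contents of small_input_impute.

```sql
-- Inspect the imputed rows from the mean example.
SELECT * FROM output_view ORDER BY pid;

-- Count rows that still contain a NULL in any imputed column.
SELECT COUNT(*) AS remaining_nulls
FROM output_view
WHERE x1 IS NULL OR x2 IS NULL OR x3 IS NULL OR x4 IS NULL;

-- Same check for the mode example.
SELECT COUNT(*) AS remaining_nulls
FROM output_view3
WHERE x5 IS NULL OR x6 IS NULL;
```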