APPLY_ONE_HOT_ENCODER
A user-defined transform function (UDTF) that loads the one hot encoder model and writes out a table that contains the encoded columns.
Syntax
APPLY_ONE_HOT_ENCODER( input-columns
USING PARAMETERS model_name = 'model-name'
[, drop_first = 'is-first']
[, ignore_null = 'ignore']
[, separator = 'separator-character']
[, column_naming = 'name-output']
[, null_column_name = 'null-column-name'] )
Arguments
input-columns- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns.
Parameters
model_nameName of the model (case-insensitive).
, stores the categories and their corresponding levels.drop_first- Boolean value, one of the following:
-
true(default): Treat the first level of the categorical variable as the reference level. -
false: Every level of the categorical variable has a corresponding column in the output view
-
ignore_null- Boolean value, one of the following:
-
true(default): Null values set all corresponding one-hot binary columns to null. -
false: Null values ininput-columnsare treated as a categorical level
-
separator- The character that separates the input variable name and the indicator variable level in the output table.To avoid using any separator, set this parameter to null value.
Default: Underscore (
_) column_naming- Appends categorical levels to column names according to the specified method:
-
indices(default): Uses integer indices to represent categorical levels. -
values/values_relaxed: Both methods use categorical level names. If duplicate column names occur, the function attempts to disambiguate them by appending_n, wherenis a zero-based integer index (_0,_1,...).If the function cannot produce unique column names , it handles this according to the chosen method:
-
valuesreturns an error. -
values_relaxedreverts to using indices.
-
Important
The following column naming rules apply if
column_namingis set tovaluesorvalues_relaxed:-
Input column names with more than 128 characters are truncated.
-
Column names can contain special characters.
-
If parameter
ignore_nullis set to true,APPLY_ONE_HOT_ENCODERconstructs the column name from the value set in parameternull_column_name. If this parameter is omitted, the stringnullis used.
-
null_column_name- The string used in naming the indicator column for null values, used only if
ignore_nullis set tofalseandcolumn_namingis set tovaluesorvalues_relaxed.Default:
null
Note
Note: If an input row contains a level not stored in the model, the output row columns corresponding to that categorical level are returned as null values.Examples
=> SELECT APPLY_ONE_HOT_ENCODER(cyl USING PARAMETERS model_name='one_hot_encoder_model',
drop_first='true', ignore_null='false') FROM mtcars;
cyl | cyl_1 | cyl_2
----+-------+-------
8 | 0 | 1
4 | 0 | 0
4 | 0 | 0
8 | 0 | 1
8 | 0 | 1
8 | 0 | 1
4 | 0 | 0
8 | 0 | 1
8 | 0 | 1
4 | 0 | 0
8 | 0 | 1
6 | 1 | 0
4 | 0 | 0
4 | 0 | 0
6 | 1 | 0
6 | 1 | 0
8 | 0 | 1
8 | 0 | 1
4 | 0 | 0
4 | 0 | 0
6 | 1 | 0
8 | 0 | 1
8 | 0 | 1
6 | 1 | 0
4 | 0 | 0
8 | 0 | 1
8 | 0 | 1
8 | 0 | 1
6 | 1 | 0
6 | 1 | 0
4 | 0 | 0
4 | 0 | 0
(32 rows)