APPLY_ONE_HOT_ENCODER
A user-defined transform function (UDTF) that loads the one hot encoder model and writes out a table that contains the encoded columns.
Syntax
APPLY_ONE_HOT_ENCODER( input-columns
USING PARAMETERS model_name = 'model-name'
[, drop_first = 'is-first']
[, ignore_null = 'ignore']
[, separator = 'separator-character']
[, column_naming = 'name-output']
[, null_column_name = 'null-column-name'] )
Arguments
input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns.
Parameters
model_name
Name of the model (case-insensitive).
, stores the categories and their corresponding levels.drop_first
- Boolean value, one of the following:
-
true
(default): Treat the first level of the categorical variable as the reference level. -
false
: Every level of the categorical variable has a corresponding column in the output view
-
ignore_null
- Boolean value, one of the following:
-
true
(default): Null values set all corresponding one-hot binary columns to null. -
false
: Null values ininput-columns
are treated as a categorical level
-
separator
- The character that separates the input variable name and the indicator variable level in the output table.To avoid using any separator, set this parameter to null value.
Default: Underscore (
_
) column_naming
- Appends categorical levels to column names according to the specified method:
-
indices
(default): Uses integer indices to represent categorical levels. -
values
/values_relaxed
: Both methods use categorical level names. If duplicate column names occur, the function attempts to disambiguate them by appending_
n
, wheren
is a zero-based integer index (_0
,_1
,...).If the function cannot produce unique column names , it handles this according to the chosen method:
-
values
returns an error. -
values_relaxed
reverts to using indices.
-
Important
The following column naming rules apply if
column_naming
is set tovalues
orvalues_relaxed
:-
Input column names with more than 128 characters are truncated.
-
Column names can contain special characters.
-
If parameter
ignore_null
is set to true,APPLY_ONE_HOT_ENCODER
constructs the column name from the value set in parameternull_column_name
. If this parameter is omitted, the stringnull
is used.
-
null_column_name
- The string used in naming the indicator column for null values, used only if
ignore_null
is set tofalse
andcolumn_naming
is set tovalues
orvalues_relaxed
.Default:
null
Note
Note: If an input row contains a level not stored in the model, the output row columns corresponding to that categorical level are returned as null values.Examples
=> SELECT APPLY_ONE_HOT_ENCODER(cyl USING PARAMETERS model_name='one_hot_encoder_model',
drop_first='true', ignore_null='false') FROM mtcars;
cyl | cyl_1 | cyl_2
----+-------+-------
8 | 0 | 1
4 | 0 | 0
4 | 0 | 0
8 | 0 | 1
8 | 0 | 1
8 | 0 | 1
4 | 0 | 0
8 | 0 | 1
8 | 0 | 1
4 | 0 | 0
8 | 0 | 1
6 | 1 | 0
4 | 0 | 0
4 | 0 | 0
6 | 1 | 0
6 | 1 | 0
8 | 0 | 1
8 | 0 | 1
4 | 0 | 0
4 | 0 | 0
6 | 1 | 0
8 | 0 | 1
8 | 0 | 1
6 | 1 | 0
4 | 0 | 0
8 | 0 | 1
8 | 0 | 1
8 | 0 | 1
6 | 1 | 0
6 | 1 | 0
4 | 0 | 0
4 | 0 | 0
(32 rows)