APPLY_ONE_HOT_ENCODER

A user-defined transform function (UDTF) that loads the one hot encoder model and writes out a table that contains the encoded columns.

A user-defined transform function (UDTF) that loads the one hot encoder model and writes out a table that contains the encoded columns.

Syntax

APPLY_ONE_HOT_ENCODER( input-columns
        USING PARAMETERS model_name = 'model-name'
            [, drop_first = 'is-first']
            [, ignore_null = 'ignore']
            [, separator = 'separator-character']
            [, column_naming = 'name-output']
            [, null_column_name = 'null-column-name'] )

Arguments

input-columns
Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns.

Parameters

model_name

Name of the model (case-insensitive).

, stores the categories and their corresponding levels.
drop_first
Boolean value, one of the following:
  • true (default): Treat the first level of the categorical variable as the reference level.

  • false: Every level of the categorical variable has a corresponding column in the output view

ignore_null
Boolean value, one of the following:
  • true (default): Null values set all corresponding one-hot binary columns to null.

  • false: Null values in input-columns are treated as a categorical level

separator
The character that separates the input variable name and the indicator variable level in the output table.To avoid using any separator, set this parameter to null value.

Default: Underscore (_)

column_naming
Appends categorical levels to column names according to the specified method:
  • indices (default): Uses integer indices to represent categorical levels.

  • values/values_relaxed: Both methods use categorical level names. If duplicate column names occur, the function attempts to disambiguate them by appending _n, where n is a zero-based integer index (_0, _1,...).

    If the function cannot produce unique column names , it handles this according to the chosen method:

    • values returns an error.

    • values_relaxed reverts to using indices.

null_column_name
The string used in naming the indicator column for null values, used only if ignore_null is set to false and column_naming is set to values or values_relaxed.

Default: null

Examples

=> SELECT APPLY_ONE_HOT_ENCODER(cyl USING PARAMETERS model_name='one_hot_encoder_model',
drop_first='true', ignore_null='false') FROM mtcars;
cyl | cyl_1 | cyl_2
----+-------+-------
8   |     0 |     1
4   |     0 |     0
4   |     0 |     0
8   |     0 |     1
8   |     0 |     1
8   |     0 |     1
4   |     0 |     0
8   |     0 |     1
8   |     0 |     1
4   |     0 |     0
8   |     0 |     1
6   |     1 |     0
4   |     0 |     0
4   |     0 |     0
6   |     1 |     0
6   |     1 |     0
8   |     0 |     1
8   |     0 |     1
4   |     0 |     0
4   |     0 |     0
6   |     1 |     0
8   |     0 |     1
8   |     0 |     1
6   |     1 |     0
4   |     0 |     0
8   |     0 |     1
8   |     0 |     1
8   |     0 |     1
6   |     1 |     0
6   |     1 |     0
4   |     0 |     0
4   |     0 |     0
(32 rows)

See also