ONE_HOT_ENCODER_FIT

Generates a sorted list of each of the category levels for each feature to be encoded, and stores the model.

Generates a sorted list of each of the category levels for each feature to be encoded, and stores the model.

This is a meta-function. You must call meta-functions in a top-level SELECT statement.

Behavior type

Volatile

Syntax

ONE_HOT_ENCODER_FIT ( 'model-name', 'input-relation','input-columns'
        [ USING PARAMETERS
              [exclude_columns = 'excluded-columns']
              [, output_view = 'output-view']
              [, extra_levels = 'category-levels'] ] )

Arguments

model-name
Identifies the model to create, where model-name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation
The table or view that contains the data for one hot encoding. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
input-columns
Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be INTEGER, BOOLEAN, VARCHAR, or dates.

Parameters

exclude_columns

Comma-separated list of column names from input-columns to exclude from processing.

output_view
The name of the view that stores the input relation and the one hot encodings. Columns are returned in the order they appear in the input relation, with the one-hot encoded columns appended after the original columns.
extra_levels
Additional levels in each category that are not in the input relation. This parameter should be passed as a string that conforms with the JSON standard, with category names as keys, and lists of extra levels in each category as values.

Model attributes

call_string
The value of all input arguments that were specified at the time the function was called.
varchar_categories integer_categories boolean_categories date_categories
Settings for all:
  • category_name: Column name

  • category_level: Levels of the category, sorted for each category

  • category_level_index: Index of this categorical level in the sorted list of levels for the category.

Privileges

Non-superusers:

  • CREATE privileges on the schema where the model is created

  • SELECT privileges on the input relation

  • CREATE privileges on the output view schema

Examples

=> SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_model','mtcars','*'
USING PARAMETERS exclude_columns='mpg,disp,drat,wt,qsec,vs,am');
ONE_HOT_ENCODER_FIT
--------------------
Success
(1 row)

See also