ONE_HOT_ENCODER_FIT
Generates a sorted list of each of the category levels for each feature to be encoded, and stores the model.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
ONE_HOT_ENCODER_FIT ( 'model-name', 'input-relation', 'input-columns'
[ USING PARAMETERS
[exclude_columns = 'excluded-columns']
[, output_view = 'output-view']
[, extra_levels = 'category-levels'] ] )
Arguments
model-name
- Identifies the model to create, where model-name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation
- The table or view that contains the data for one hot encoding. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be INTEGER, BOOLEAN, VARCHAR, or dates.
Parameters
exclude_columns
- Comma-separated list of column names from input-columns to exclude from processing.
output_view
- The name of the view that stores the input relation and the one hot encodings. Columns are returned in the order they appear in the input relation, with the one-hot encoded columns appended after the original columns.
extra_levels
- Additional levels in each category that are not in the input relation. Pass this parameter as a string that conforms to the JSON standard, with category names as keys and lists of extra levels in each category as values.
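As an illustration of how these parameters combine, the following sketch fits an encoder, writes the encoded data to a view, and declares one extra level for a category. The model name, the view name mtcars_encoded, and the extra level are hypothetical. The sketch also assumes that mtcars has INTEGER columns cyl and gear, as in the standard dataset. Whether levels of an INTEGER column should be written as JSON strings or as JSON numbers is not confirmed here, so treat the value format as an assumption.

```sql
-- Sketch only: model name, view name, and the extra level are hypothetical.
-- Assumes mtcars has INTEGER columns cyl and gear.
=> SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_extra', 'mtcars', 'cyl, gear'
       USING PARAMETERS output_view = 'mtcars_encoded',
                        extra_levels = '{"cyl": ["3"]}');
```

Declaring a level this way reserves a place for it in the sorted level list even though no row of the input relation contains it.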
Model attributes
call_string
- The value of all input arguments that were specified at the time the function was called.
varchar_categories, integer_categories, boolean_categories, date_categories
- Settings for all:
  - category_name: Column name
  - category_level: Levels of the category, sorted for each category
  - category_level_index: Index of this categorical level in the sorted list of levels for the category.
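One way to inspect these attributes after fitting is the GET_MODEL_ATTRIBUTE function. The sketch below assumes that the model one_hot_encoder_model from the example on this page exists and that it encoded at least one INTEGER column. Output is omitted because it depends on the data.

```sql
-- Sketch only: lists category_name, category_level, and category_level_index
-- for the INTEGER columns encoded by the model.
=> SELECT GET_MODEL_ATTRIBUTE (USING PARAMETERS model_name = 'one_hot_encoder_model',
                                                attr_name = 'integer_categories');
```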
Privileges
Non-superusers:
- CREATE privileges on the schema where the model is created
- SELECT privileges on the input relation
- CREATE privileges on the output view schema
Examples
=> SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_model','mtcars','*'
USING PARAMETERS exclude_columns='mpg,disp,drat,wt,qsec,vs,am');
ONE_HOT_ENCODER_FIT
--------------------
Success
(1 row)