ONE_HOT_ENCODER_FIT
Generates a sorted list of each of the category levels for each feature to be encoded, and stores the model.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
ONE_HOT_ENCODER_FIT ( 'model‑name', 'input‑relation','input‑columns'
[ USING PARAMETERS
[exclude_columns = 'excluded‑columns']
[, output_view = 'output‑view']
[, extra_levels = 'category‑levels'] ] )
Arguments
model‑name- Identifies the model to create, where model‑name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input‑relation- The table or view that contains the data for one hot encoding. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
input‑columns- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be INTEGER, BOOLEAN, VARCHAR, or dates.
Parameters
exclude_columns- Comma-separated list of column names from input‑columns to exclude from processing.
output_view- The name of the view that stores the input relation and the one hot encodings. Columns are returned in the order they appear in the input relation, with the one-hot encoded columns appended after the original columns.
extra_levels- Additional levels in each category that are not in the input relation. Pass this parameter as a string that conforms to the JSON standard, with category names as keys and lists of extra levels in each category as values.
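As a minimal sketch of the extra_levels format, the call below assumes that the mtcars table has integer columns cyl and gear and that cyl never takes the values 10 or 12 in the data. The model name one_hot_encoder_extra is hypothetical, and writing integer levels as JSON numbers is an assumption rather than something this page confirms.

```sql
-- Hypothetical model name; assumes mtcars has integer columns cyl and gear.
-- The JSON key is the category (column) name; the list holds levels that
-- do not occur in the input relation but should still be part of the model.
SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_extra', 'mtcars', 'cyl, gear'
    USING PARAMETERS extra_levels='{"cyl": [10, 12]}');
```

The JSON string sits inside a single-quoted SQL literal, so the keys use double quotes without any escaping.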
Model attributes
call_string- The value of all input arguments that were specified at the time the function was called.
varchar_categories, integer_categories, boolean_categories, date_categories- Settings for all:
- category_name: Column name
- category_level: Levels of the category, sorted for each category
- category_level_index: Index of this categorical level in the sorted list of levels for the category
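One way to inspect these attributes is the GET_MODEL_ATTRIBUTE function. The sketch below assumes that a model named one_hot_encoder_model has already been created and that at least one of its encoded columns is an INTEGER.

```sql
-- Returns category_name, category_level, and category_level_index
-- for the INTEGER columns that the model encodes.
SELECT GET_MODEL_ATTRIBUTE (USING PARAMETERS
    model_name='one_hot_encoder_model',
    attr_name='integer_categories');
```

Substituting varchar_categories, boolean_categories, or date_categories for attr_name should return the same three fields for columns of those types.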
Privileges
Non-superusers:
- CREATE privileges on the schema where the model is created
- SELECT privileges on the input relation
- CREATE privileges on the output view schema
Examples
=> SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_model','mtcars','*'
USING PARAMETERS exclude_columns='mpg,disp,drat,wt,qsec,vs,am');
ONE_HOT_ENCODER_FIT
--------------------
Success
(1 row)
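A variation on the same call, sketched under the assumption that the mtcars table is available, adds the output_view parameter. The model name one_hot_encoder_view_model and the view name mtcars_encoded are hypothetical; a new model name is used because model names must be unique within the schema.

```sql
-- Fit the encoder and also create a view that holds the input relation
-- together with its one hot encodings (hypothetical model and view names).
SELECT ONE_HOT_ENCODER_FIT ('one_hot_encoder_view_model', 'mtcars', '*'
    USING PARAMETERS exclude_columns='mpg,disp,drat,wt,qsec,vs,am',
                     output_view='mtcars_encoded');

-- Original columns come first, followed by the one-hot encoded columns.
SELECT * FROM mtcars_encoded LIMIT 5;
```

Creating the view requires CREATE privileges on the output view schema, as listed under Privileges.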