NORMALIZE_FIT
Note
This function differs from NORMALIZE, which directly outputs a view with normalized results, rather than storing normalization parameters into a model for later operation.NORMALIZE_FIT
computes normalization parameters for each of the specified columns in an input relation. The resulting model stores the normalization parameters. For example, for MinMax
normalization, the minimum and maximum value of each column are stored in the model. The generated model serves as input to functions APPLY_NORMALIZE and REVERSE_NORMALIZE.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
VolatileSyntax
NORMALIZE_FIT ( 'model-name', 'input-relation', 'input-columns', 'normalization-method'
[ USING PARAMETERS [exclude_columns = 'excluded-columns'] [, output_view = 'output-view'] ] )
Arguments
model-name
- Identifies the model to create, where
model-name
conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema. input-relation
- The table or view that contains the data to normalize. If the input relation is defined in Hive, use
SYNC_WITH_HCATALOG_SCHEMA
to sync thehcatalog
schema, and then run the machine learning function. input-columns
- Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of data type numeric.
normalization-method
- The normalization method to use, one of the following:
-
minmax
-
zscore
-
robust_zscore
If you specify
robust_zscore
,NORMALIZE_FIT
uses the function APPROXIMATE_MEDIAN [aggregate].All normalization methods ignore infinity, negative infinity, or NULL values in the input relation.
-
Parameters
exclude_columns
Comma-separated list of column names from
input-columns
to exclude from processing.output_view
- Name of the view that contains all columns from the input relation, with the specified input columns normalized.
Model attributes
data
- Normalization method set to
minmax
:-
colNames
: Model column names -
mins
: Minimum value of each column -
maxes
: Maximum value of each column
-
Privileges
Non-superusers:
-
CREATE privileges on the schema where the model is created
-
SELECT privileges on the input relation
-
CREATE privileges on the output view schema
Examples
The following example creates a model with NORMALIZE_FIT
using the wt
and hp
columns in table mtcars
, and then uses this model in successive calls to APPLY_NORMALIZE and REVERSE_NORMALIZE.
=> SELECT NORMALIZE_FIT('mtcars_normfit', 'mtcars', 'wt,hp', 'minmax');
NORMALIZE_FIT
---------------
Success
(1 row)
The following call to APPLY_NORMALIZE
specifies the hp
and cyl
columns in table mtcars
, where hp
is in the normalization model and cyl
is not in the normalization model:
=> CREATE TABLE mtcars_normalized AS SELECT APPLY_NORMALIZE (hp, cyl USING PARAMETERS model_name = 'mtcars_normfit') FROM mtcars;
CREATE TABLE
=> SELECT * FROM mtcars_normalized;
hp | cyl
--------------------+-----
0.434628975265018 | 8
0.681978798586572 | 8
0.434628975265018 | 6
1 | 8
0.540636042402827 | 8
0 | 4
0.681978798586572 | 8
0.0459363957597173 | 4
0.434628975265018 | 8
0.204946996466431 | 6
0.250883392226148 | 6
0.049469964664311 | 4
0.204946996466431 | 6
0.201413427561837 | 4
0.204946996466431 | 6
0.250883392226148 | 6
0.049469964664311 | 4
0.215547703180212 | 4
0.0353356890459364 | 4
0.187279151943463 | 6
0.452296819787986 | 8
0.628975265017668 | 8
0.346289752650177 | 8
0.137809187279152 | 4
0.749116607773852 | 8
0.144876325088339 | 4
0.151943462897526 | 4
0.452296819787986 | 8
0.452296819787986 | 8
0.575971731448763 | 8
0.159010600706714 | 4
0.346289752650177 | 8
(32 rows)
=> SELECT REVERSE_NORMALIZE (hp, cyl USING PARAMETERS model_name='mtcars_normfit') FROM mtcars_normalized;
hp | cyl
-----+-----
175 | 8
245 | 8
175 | 6
335 | 8
205 | 8
52 | 4
245 | 8
65 | 4
175 | 8
110 | 6
123 | 6
66 | 4
110 | 6
109 | 4
110 | 6
123 | 6
66 | 4
113 | 4
62 | 4
105 | 6
180 | 8
230 | 8
150 | 8
91 | 4
264 | 8
93 | 4
95 | 4
180 | 8
180 | 8
215 | 8
97 | 4
150 | 8
(32 rows)
The following call to REVERSE_NORMALIZE
also specifies the hp
and cyl
columns in table mtcars
, where hp
is in normalization model mtcars_normfit
, and cyl
is not in the normalization model.
=> SELECT REVERSE_NORMALIZE (hp, cyl USING PARAMETERS model_name='mtcars_normfit') FROM mtcars_normalized;
hp | cyl
-----------------+-----
205.000005722046 | 8
150.000000357628 | 8
150.000000357628 | 8
93.0000016987324 | 4
174.99999666214 | 8
94.9999992102385 | 4
214.999997496605 | 8
97.0000009387732 | 4
245.000006556511 | 8
174.99999666214 | 6
335 | 8
245.000006556511 | 8
62.0000002086163 | 4
174.99999666214 | 8
230.000002026558 | 8
52 | 4
263.999997675419 | 8
109.999999523163 | 6
123.000002324581 | 6
64.9999996386468 | 4
66.0000005029142 | 4
112.999997898936 | 4
109.999999523163 | 6
180.000000983477 | 8
180.000000983477 | 8
108.999998658895 | 4
109.999999523163 | 6
104.999999418855 | 6
123.000002324581 | 6
180.000000983477 | 8
66.0000005029142 | 4
90.9999999701977 | 4
(32 rows)