APPLY_BISECTING_KMEANS

Applies a trained bisecting k-means model to an input relation, and assigns each new data point to the closest matching cluster in the trained model.

Note

If the input relation is defined in Hive, first use SYNC_WITH_HCATALOG_SCHEMA to sync the HCatalog schema, and then run the machine learning function.
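
As a rough sketch of that order of operations, the statements below sync first and then apply the model. The schema names hcat_local and hcat, the table new_points, the columns x and y, and the model my_bkmeans are all hypothetical, and the two-argument form of SYNC_WITH_HCATALOG_SCHEMA (local schema, then HCatalog schema) is assumed here rather than taken from this page.

```sql
-- Hypothetical names throughout: hcat_local, hcat, new_points, my_bkmeans.
-- Step 1: copy the Hive table definitions into a local schema.
SELECT SYNC_WITH_HCATALOG_SCHEMA('hcat_local', 'hcat');

-- Step 2: apply the trained model to the synced relation.
SELECT APPLY_BISECTING_KMEANS(x, y
           USING PARAMETERS model_name = 'my_bkmeans') AS cluster_id
FROM hcat_local.new_points;
```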

Syntax

SELECT APPLY_BISECTING_KMEANS( 'input-columns'
        USING PARAMETERS model_name = 'model-name'
            [, num_clusters = 'num-clusters']
            [, match_by_pos = match-by-position] )

Arguments

input-columns: Comma-separated list of columns to use from the input relation, or asterisk (*) to select all columns. Input columns must be of data type numeric.
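
A minimal call might look like the following sketch. The table new_points, its numeric columns x, y, and z, and the model my_bkmeans are hypothetical names. The columns are written as plain column references, as is usual for in-database prediction functions; treat that as an assumption to check against your database version.

```sql
-- Hypothetical relation and model names; all three columns are numeric.
SELECT x, y, z,
       APPLY_BISECTING_KMEANS(x, y, z
           USING PARAMETERS model_name = 'my_bkmeans') AS cluster_id
FROM new_points;
```

Each output row carries the input values alongside the cluster assigned to that point.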

Parameters

model_name

Name of the model (case-insensitive).

num_clusters

Integer between 1 and k inclusive that specifies the number of clusters to use for prediction, where k is the number of centers in the model.

Default: Value that the model specifies for k
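
For illustration, the sketch below assumes a hypothetical model my_bkmeans trained with k = 5 and asks for assignments using only 3 clusters. The table and column names are likewise made up, and the value is passed as an unquoted integer, which is an assumption about the accepted literal form.

```sql
-- Hypothetical model trained with k = 5; predict with 3 clusters instead.
SELECT x, y, z,
       APPLY_BISECTING_KMEANS(x, y, z
           USING PARAMETERS model_name = 'my_bkmeans',
                            num_clusters = 3) AS cluster_id
FROM new_points;
```

Omitting num_clusters would fall back to the model's own k, per the default above.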

match_by_pos

Boolean value that specifies how input columns are matched to model features:

true: Match by the position of columns in the input columns list.
false (default): Match by name.
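
Matching by position is useful when the input columns are named differently from the features the model was trained on. In this sketch, the model my_bkmeans is assumed to have been trained on features x, y, and z, while the hypothetical table renamed_points stores the same measurements as a, b, and c.

```sql
-- Hypothetical names. a, b, c are matched to the model's first, second,
-- and third features by their order in the list, not by name.
SELECT a, b, c,
       APPLY_BISECTING_KMEANS(a, b, c
           USING PARAMETERS model_name = 'my_bkmeans',
                            match_by_pos = true) AS cluster_id
FROM renamed_points;
```

With the default (false), the same query would be expected to fail or mismatch, because no input column is named x, y, or z.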

Privileges

Non-superusers: model owner, or USAGE privileges on the model
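
A model owner could grant the required privilege along these lines. The model name my_bkmeans and the user analyst are hypothetical, and the GRANT ... ON MODEL form is assumed to be available in your database version.

```sql
-- Hypothetical model and user names.
-- Lets the non-superuser analyst apply a model that they do not own.
GRANT USAGE ON MODEL my_bkmeans TO analyst;
```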