KPROTOTYPES
Executes the k-prototypes algorithm on an input relation. The result is a model with a list of cluster centers.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Syntax
SELECT KPROTOTYPES ('model-name', 'input-relation', 'input-columns', num-clusters
[USING PARAMETERS [exclude_columns = 'exclude-columns']
[, max_iterations = 'max-iterations']
[, epsilon = epsilon]
[, { init_method = 'init-method' } | { initial_centers_table = 'init-table' }]
[, gamma = 'gamma']
[, output_view = 'output-view']
[, key_columns = 'key-columns']]);
Behavior type
Volatile
Arguments
model-name
- Name of the model resulting from the training.
input-relation
- Name of the table or view containing the training samples.
input-columns
- String containing a comma-separated list of columns to use from the input-relation, or asterisk (*) to select all columns.
num-clusters
- Integer ≤ 10,000 representing the number of clusters to create. This argument represents the k in k-prototypes.
Parameters
exclude_columns
- String containing a comma-separated list of column names from input-columns to exclude from processing.
Default: (empty)
max_iterations
- Integer ≤ 1M representing the maximum number of iterations the algorithm performs.
Default: Integer ≤ 1M
epsilon
- Floating-point value that determines whether the algorithm has converged.
Default: 1e-4
init_method
- String specifying the method used to find the initial k-prototypes cluster centers.
Default: "random"
initial_centers_table
- The table with the initial cluster centers to use.
gamma
- Float between 0 and 10000 specifying the weighting factor for categorical columns. It determines the relative importance of numerical and categorical attributes.
Default: Inferred from data.
output_view
- Name of the view in which to save the assignment of each point to its cluster.
key_columns
- Comma-separated list of column names that identify the output rows. Columns must be in the input-columns argument list.
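How num-clusters, gamma, max_iterations, epsilon, and initial centers interact can be illustrated outside the database. The following Python sketch is not the function's internal implementation. It assumes the textbook k-prototypes formulation: squared Euclidean distance on numeric columns plus gamma times the number of mismatched categorical columns. It also assumes that convergence means no center moved by more than epsilon. All function names and sample rows are hypothetical.

```python
from collections import Counter

def dissimilarity(point, center, num_idx, cat_idx, gamma):
    """Squared Euclidean distance on numeric columns plus
    gamma times the number of mismatched categorical columns."""
    numeric = sum((point[i] - center[i]) ** 2 for i in num_idx)
    categorical = sum(point[i] != center[i] for i in cat_idx)
    return numeric + gamma * categorical

def assign(points, centers, num_idx, cat_idx, gamma):
    """Return the index of the nearest center for every point."""
    return [
        min(range(len(centers)),
            key=lambda c: dissimilarity(p, centers[c], num_idx, cat_idx, gamma))
        for p in points
    ]

def update(points, labels, centers, num_idx, cat_idx):
    """New prototype per cluster: mean of numeric columns, mode of categorical ones."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:  # an empty cluster keeps its previous center
            new_centers.append(tuple(old))
            continue
        center = list(old)
        for i in num_idx:
            center[i] = sum(m[i] for m in members) / len(members)
        for i in cat_idx:
            center[i] = Counter(m[i] for m in members).most_common(1)[0][0]
        new_centers.append(tuple(center))
    return new_centers

def kprototypes(points, initial_centers, num_idx, cat_idx, gamma,
                max_iterations=10, epsilon=1e-4):
    """Alternate assignment and update steps until the centers stop moving
    (assumed criterion: largest center shift <= epsilon) or the iteration
    limit is reached. Expects max_iterations >= 1."""
    centers = [tuple(c) for c in initial_centers]
    for iteration in range(1, max_iterations + 1):
        labels = assign(points, centers, num_idx, cat_idx, gamma)
        new_centers = update(points, labels, centers, num_idx, cat_idx)
        shift = max(dissimilarity(n, o, num_idx, cat_idx, gamma)
                    for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift <= epsilon:
            break
    return centers, labels, iteration

# Hypothetical rows of (x0, country); column 0 is numeric, column 1 categorical.
points = [(20, 'China'), (23, 'US'), (24, 'Canada'), (18, 'Canada'),
          (85, 'US'), (90, 'Russia'), (98, 'Russia'), (89, 'Brazil')]
initial_centers = [(20.0, 'Canada'), (90.0, 'Russia')]  # k = 2

centers, labels, iterations = kprototypes(
    points, initial_centers, num_idx=[0], cat_idx=[1], gamma=10)
print(centers)     # [(21.25, 'Canada'), (90.5, 'Russia')]
print(labels)      # [0, 0, 0, 0, 1, 1, 1, 1]
print(iterations)  # 2

# A larger gamma lets the categorical match outweigh the numeric gap.
print(assign([(55, 'Russia')], centers, [0], [1], gamma=0))    # [0]
print(assign([(55, 'Russia')], centers, [0], [1], gamma=200))  # [1]
```

In this sketch, gamma = 0 ignores categorical columns entirely, while a large gamma makes a categorical mismatch cost more than a sizable numeric difference. That is the trade-off the gamma parameter controls.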
Examples
The following example creates the k-prototypes model small_model and applies it to the input table small_test_mixed:
=> SELECT KPROTOTYPES('small_model', 'small_test_mixed', 'x0, country', 3 USING PARAMETERS initial_centers_table='small_test_mixed_centers', key_columns='pid');
KPROTOTYPES
---------------------------
Finished in 2 iterations
(1 row)
=> SELECT country, x0, APPLY_KPROTOTYPES(country, x0
USING PARAMETERS model_name='small_model')
FROM small_test_mixed;
  country   | x0  | apply_kprototypes
------------+-----+-------------------
 'China'    |  20 |                 0
 'US'       |  85 |                 2
 'Russia'   |  80 |                 1
 'Brazil'   |  78 |                 1
 'US'       |  23 |                 0
 'US'       |  50 |                 0
 'Canada'   |  24 |                 0
 'Canada'   |  18 |                 0
 'Russia'   |  90 |                 2
 'Russia'   |  98 |                 2
 'Brazil'   |  89 |                 2
...
(45 rows)