R example: kmeans
The KMeans_User scalar function reads any number of columns from a table, the observations.
The KMeans_User
scalar function reads any number of columns from a table, the observations. It then uses the observations and the two parameters when applying the kmeans clustering algorithm to the data, returning an integer value associated with the cluster of the row.
You can find more UDx examples in the Vertica Github repository, https://github.com/vertica/UDx-Examples.
Load the function and library
Create the library and the function:
=> CREATE OR REPLACE LIBRARY rLib AS 'kmeans.R' LANGUAGE 'R';
CREATE LIBRARY
=> CREATE OR REPLACE FUNCTION KMeans_User AS LANGUAGE 'R' NAME 'KMeans_UserFactory' LIBRARY rLib FENCED;
CREATE FUNCTION
Querying data with the function
The following query shows how you can run a query with the UDSF:
=> SELECT spec,
KMeans_User(sl, sw, pl, pw USING PARAMETERS clusters = 3, nstart = 20)
FROM iris;
spec | KMeans_User
-----------------+-------------
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
Iris-setosa | 2
.
.
.
(150 rows)