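R example: kmeans
The KMeans_User scalar function reads any number of columns from a table, the observations. It then applies the kmeans clustering algorithm to the data, using the observations and two parameters, and returns an integer value that identifies the cluster for each row.
You can find more UDx examples in the Vertica GitHub repository: https://github.com/vertica/UDx-Examples.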
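As a rough illustration of what the function computes, the following sketch runs R's built-in kmeans() on four numeric columns in a plain R session, outside Vertica. Using R's bundled iris dataset and a fixed seed are assumptions made for illustration; the example itself reads the sl, sw, pl, and pw columns from a Vertica table named iris.

```r
# Minimal sketch, run in a plain R session (no Vertica involved).
# Assumption: R's bundled iris dataset stands in for the Vertica iris table.
observations <- iris[, 1:4]   # four numeric measurement columns

# kmeans() picks random starting centers; fix the seed for repeatability.
set.seed(1)
fit <- kmeans(observations, centers = 3, nstart = 20)

# fit$cluster holds one integer cluster label (1 to 3) per input row,
# which corresponds to the value the scalar function returns for each row.
cluster.ids <- fit$cluster

# Cross-tabulate the labels against the known species.
table(iris$Species, cluster.ids)
```

The cluster numbers are arbitrary labels, so the same group of rows can receive a different number from one run to the next.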
Load the function and library
Create the library and the function:
=> CREATE OR REPLACE LIBRARY rLib AS 'kmeans.R' LANGUAGE 'R';
CREATE LIBRARY
=> CREATE OR REPLACE FUNCTION KMeans_User AS LANGUAGE 'R' NAME 'KMeans_UserFactory' LIBRARY rLib FENCED;
CREATE FUNCTION
Query data with the function
The following query shows how to run a query with the UDSF.
=> SELECT spec,
          KMeans_User(sl, sw, pl, pw USING PARAMETERS clusters = 3, nstart = 20)
   FROM iris;
      spec       | KMeans_User
-----------------+-------------
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 Iris-setosa     |           2
 .
 .
 .
(150 rows)
UDSF R code
KMeans_User <- function(input.data.frame, parameters.data.frame) {
  # Take the clusters and nstart parameters passed by the user and assign them
  # to variables in the function.
  if ( is.null(parameters.data.frame[['clusters']]) ) {
    stop("NULL value for clusters! clusters cannot be NULL.")
  } else {
    clusters.value <- parameters.data.frame[['clusters']]
  }
  if ( is.null(parameters.data.frame[['nstart']]) ) {
    stop("NULL value for nstart! nstart cannot be NULL.")
  } else {
    nstart.value <- parameters.data.frame[['nstart']]
  }
  # Apply the algorithm to the data.
  kmeans.clusters <- kmeans(input.data.frame[, 1:length(input.data.frame)],
                            clusters.value, nstart = nstart.value)
  final.output <- data.frame(kmeans.clusters$cluster)
  return(final.output)
}

KMeans_UserFactory <- function() {
  list(name = KMeans_User,
       udxtype = c("scalar"),
       # Since this is a polymorphic function the intype must be any
       intype = c("any"),
       outtype = c("int"),
       parametertypecallback = KMeansParameters)
}

KMeansParameters <- function() {
  parameters <- list(datatype = c("int", "int"),
                     length = c("NA", "NA"),
                     scale = c("NA", "NA"),
                     name = c("clusters", "nstart"))
  return(parameters)
}
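Before loading the library, it can help to exercise a UDSF main function in a plain R session. The sketch below defines a hypothetical helper, check_scalar_udsf, that calls any function with the same two-argument signature as KMeans_User (an input data frame and a parameters data frame) and checks that it returns one value per input row. The helper name, the stand-in function row_sum_udsf, and the local data frames that imitate what Vertica passes are all assumptions made for illustration; none of them is part of the Vertica R SDK.

```r
# Hypothetical local check, not part of the Vertica R SDK.
check_scalar_udsf <- function(udsf, input, params) {
  out <- udsf(input, params)
  stopifnot(is.data.frame(out))        # this example returns a data frame
  stopifnot(nrow(out) == nrow(input))  # one output row per input row
  stopifnot(ncol(out) == 1)            # a single return column
  invisible(out)
}

# Stand-in UDSF with the same signature, so that this sketch runs on its own.
row_sum_udsf <- function(input.data.frame, parameters.data.frame) {
  data.frame(total = rowSums(input.data.frame))
}

# Local data frames imitating the column and parameter inputs.
input  <- data.frame(sl = c(5.1, 4.9), sw = c(3.5, 3.0))
params <- data.frame(clusters = 3L, nstart = 20L)
result <- check_scalar_udsf(row_sum_udsf, input, params)

# With the UDSF code saved as kmeans.R, the same check could be applied to it:
#   source("kmeans.R")
#   check_scalar_udsf(KMeans_User, iris[, 1:4], params)
```

A check like this only covers the shape of the return value. It does not reproduce how Vertica batches rows or converts data types when it calls the function.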