R example: kmeansPoly

The following example shows an implementation of a transform function (UDTF) that performs kmeans clustering on one or more input columns.

kmeansPoly <- function(v.data.frame,v.param.list) {
  # Computes clusters using the kmeans algorithm.
  #
  # Input: A dataframe and a list of parameters.
  # Output: A dataframe with one column that tells the cluster to which each data
  #         point belongs.
  # Args:
  #  v.data.frame: The data from Vertica cast as an R data frame.
  #  v.param.list: List of function parameters.
  #
  # Returns:
  #  The cluster associated with each data point.

  # Ensure k is not null.
  if(!is.null(v.param.list[['k']])) {
    number_of_clusters <- as.numeric(v.param.list[['k']])
  } else {
    stop("k cannot be NULL! Please use a valid value.")
  }
  # Run the kmeans algorithm.
  kmeans_clusters <- kmeans(v.data.frame, number_of_clusters)
  final.output <- data.frame(kmeans_clusters$cluster)
  return(final.output)
}

kmeansFactoryPoly <- function() {
  # This function tells Vertica the name of the R function,
  # and the polymorphic parameters.
  list(name=kmeansPoly, udxtype=c("transform"), intype=c("any"),
       outtype=c("int"), parametertypecallback=kmeansParameters)
}

kmeansParameters <- function() {
  # Callback function for the parameter types.
  function.parameters <- data.frame(datatype=rep(NA, 1), length=rep(NA,1),
                                    scale=rep(NA,1), name=rep(NA,1))
  function.parameters[1,1] = "int"
  function.parameters[1,4] = "k"
  return(function.parameters)
}

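A polymorphic R function declares in its factory function that it accepts any number of arguments by specifying "any" as the value of the intype parameter and, optionally, of the outtype parameter. If you specify "any" for intype or outtype, it must be the only type the function declares for that parameter. You cannot define required arguments and then use "any" to declare the rest of the signature as optional. If your function has requirements for the arguments it accepts, your processing function must enforce them.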
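The enforcement described above could be sketched as a small validation helper that the processing function calls before running kmeans. This is a minimal sketch, not part of the original example: the helper name check_kmeans_input and the max.columns limit are hypothetical, and the only assumption about Vertica is that the input arrives as an ordinary R data frame, as in kmeansPoly.

```r
# Hypothetical helper (not part of the Vertica SDK): validates the
# columns handed to a polymorphic processing function.
check_kmeans_input <- function(v.data.frame, max.columns = 10) {
  # A polymorphic function can be called with any number of columns,
  # so the count has to be checked by hand.
  if (ncol(v.data.frame) < 1) {
    stop("At least one input column is required.")
  }
  if (ncol(v.data.frame) > max.columns) {
    stop("Too many input columns: ", ncol(v.data.frame),
         " (maximum is ", max.columns, ").")
  }
  # kmeans needs numeric data, so reject any other column type.
  non.numeric <- !vapply(v.data.frame, is.numeric, logical(1))
  if (any(non.numeric)) {
    stop("Non-numeric input columns: ",
         paste(names(v.data.frame)[non.numeric], collapse = ", "))
  }
  invisible(TRUE)
}
```

Calling check_kmeans_input(v.data.frame) as the first statement of the processing function would stop the query with a readable message instead of letting kmeans fail on unexpected input.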

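The outtypecallback method receives the types and number of arguments the function was called with, and it must indicate the types and number of values the function returns. The outtypecallback method can also be used to check for unsupported argument types or an unsupported number of arguments. For example, a function might accept only integers, and at most 10 of them.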
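The integer-only, at-most-10 check mentioned above might look like the following. This is a sketch under stated assumptions rather than the SDK's documented callback: it assumes the callback receives a data frame with one row per input argument and a datatype column, that integer arguments are reported as "int", and that the return value uses the same four-column layout (datatype, length, scale, name) as the parameter callback in the example. The names atMostTenIntsReturnType and result are hypothetical.

```r
# Hypothetical outtypecallback: the argument layout is an assumption.
atMostTenIntsReturnType <- function(arg.types, param.types = NULL) {
  # Reject calls with no arguments or with more than 10 arguments.
  if (nrow(arg.types) < 1 || nrow(arg.types) > 10) {
    stop("Expected between 1 and 10 input columns, got ", nrow(arg.types), ".")
  }
  # Reject any argument that is not reported as an integer.
  if (!all(arg.types$datatype == "int")) {
    stop("Only integer input columns are supported.")
  }
  # Describe the single integer column that the function returns.
  data.frame(datatype = "int", length = NA, scale = NA, name = "result",
             stringsAsFactors = FALSE)
}
```

A factory function would refer to such a callback through its outtypecallback entry, in the same way the example refers to kmeansParameters through parametertypecallback.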

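You assign a SQL name to your polymorphic UDx with the same statement that you use to assign one to a non-polymorphic UDx. The following statements show how to load and call the polymorphic function from the example.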

=> CREATE LIBRARY rlib2 AS '/home/dbadmin/R_UDx/poly_kmeans.R' LANGUAGE 'R';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION kmeansPoly AS LANGUAGE 'R' name 'kmeansFactoryPoly' LIBRARY rlib2;
CREATE FUNCTION
=> SELECT spec, kmeansPoly(sl,sw,pl,pw USING PARAMETERS k = 3)
    OVER(PARTITION BY spec) AS Clusters
      FROM iris;
      spec       | Clusters
-----------------+----------
 Iris-setosa     |        1
 Iris-setosa     |        1
 Iris-setosa     |        1
 Iris-setosa     |        1
.
.
.
(150 rows)
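To see roughly what the OVER(PARTITION BY spec) call computes, the query can be emulated in plain R outside the database. This is only an approximation under stated assumptions: it uses R's built-in iris data set and assumes that the sl, sw, pl, and pw columns correspond to its four measurement columns. Because kmeans starts from random centers, the cluster labels can differ between runs and from the output shown above.

```r
# Local emulation of the query: one kmeans run per partition (species).
set.seed(42)  # kmeans picks random starting centers

# Assumed mapping of sl, sw, pl, pw to the built-in iris columns.
feature.columns <- c("Sepal.Length", "Sepal.Width",
                     "Petal.Length", "Petal.Width")

# PARTITION BY spec: split the rows by species.
partitions <- split(iris, iris$Species)

# Run kmeans with k = 3 inside each partition, as the UDTF would.
clusters.by.partition <- lapply(partitions, function(partition) {
  fit <- kmeans(partition[, feature.columns], centers = 3)
  data.frame(spec = partition$Species, Clusters = fit$cluster)
})

# Stack the per-partition results into one 150-row data frame.
result <- do.call(rbind, clusters.by.partition)
head(result)
```

Each species is clustered independently, so cluster 1 in one partition is unrelated to cluster 1 in another.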