R example: kmeansPoly

The following example shows an implementation of a polymorphic transform function (UDTF) that performs k-means clustering on one or more input columns.

kmeansPoly <- function(v.data.frame,v.param.list) {
  # Computes clusters using the kmeans algorithm.
  #
  # Input: A dataframe and a list of parameters.
  # Output: A dataframe with one column that tells the cluster to which each data
  #         point belongs.
  # Args:
  #  v.data.frame: The data from Vertica cast as an R data frame.
  #  v.param.list: List of function parameters.
  #
  # Returns:
  #  The cluster associated with each data point.

  # Ensure k is not null.
  if (!is.null(v.param.list[['k']])) {
    number_of_clusters <- as.numeric(v.param.list[['k']])
  } else {
    stop("k cannot be NULL! Please use a valid value.")
  }
  # Run the kmeans algorithm.
  kmeans_clusters <- kmeans(v.data.frame, number_of_clusters)
  final.output <- data.frame(kmeans_clusters$cluster)
  return(final.output)
}

kmeansFactoryPoly <- function() {
  # This function tells Vertica the name of the R function,
  # and the polymorphic parameters.
  list(name=kmeansPoly, udxtype=c("transform"), intype=c("any"),
       outtype=c("int"), parametertypecallback=kmeansParameters)
}

kmeansParameters <- function() {
  # Callback function for the parameter types.
  function.parameters <- data.frame(datatype=rep(NA, 1), length=rep(NA,1),
                                    scale=rep(NA,1), name=rep(NA,1))
  function.parameters[1,1] <- "int"
  function.parameters[1,4] <- "k"
  return(function.parameters)
}
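
Before loading the library into Vertica, it can help to exercise the clustering function in an ordinary R session. The following is a minimal sketch under stated assumptions: it uses R's built-in iris data frame as a stand-in for the table that Vertica would pass in, and it repeats a condensed copy of kmeansPoly so that the block runs on its own. It also calls the function once on the whole data set, whereas a partitioned query would call it once per partition.

```r
# Condensed copy of kmeansPoly from the example, repeated so this block is self-contained.
kmeansPoly <- function(v.data.frame, v.param.list) {
  if (is.null(v.param.list[["k"]])) {
    stop("k cannot be NULL! Please use a valid value.")
  }
  number_of_clusters <- as.numeric(v.param.list[["k"]])
  kmeans_clusters <- kmeans(v.data.frame, number_of_clusters)
  data.frame(kmeans_clusters$cluster)
}

# kmeans chooses random starting centers, so fix the seed for repeatable runs.
set.seed(42)

# Stand-in input: the four numeric columns of R's built-in iris data frame.
measurements <- iris[, 1:4]

# Parameters arrive as a named list, as in the USING PARAMETERS clause.
clusters <- kmeansPoly(measurements, list(k = 3))

print(nrow(clusters))        # one output row per input row
print(table(clusters[[1]]))  # number of rows assigned to each cluster
```

Omitting k from the parameter list, for example kmeansPoly(measurements, list()), stops with the error message defined in the function.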

A polymorphic R function declares in its factory function that it accepts any number of arguments by specifying "any" as the value of the intype parameter and, optionally, the outtype parameter. If you specify "any" for intype or outtype, it is the only type that your function can declare for that parameter. You cannot define required arguments and then use "any" to declare the rest of the signature as optional. If your function has requirements for the arguments it accepts, your process function must enforce them.
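
Because "any" disables signature checking, argument checks of this kind have to live in the R code itself. The following is a minimal sketch of such a check. The helper name check_kmeans_input and the specific rules (at least one column, numeric columns only, k a positive whole number no larger than the row count) are assumptions chosen for illustration; they are not part of the Vertica R SDK.

```r
# Hypothetical helper, not part of the Vertica R SDK: validates the inputs
# that a process function such as kmeansPoly receives.
check_kmeans_input <- function(v.data.frame, k) {
  # A polymorphic function can be called with any number of columns,
  # so confirm that at least one was supplied.
  if (ncol(v.data.frame) < 1) {
    stop("At least one input column is required.")
  }

  # Collect the names of any columns that are not numeric.
  is_numeric_column <- vapply(v.data.frame, is.numeric, logical(1))
  non_numeric <- names(v.data.frame)[!is_numeric_column]
  if (length(non_numeric) > 0) {
    stop("Non-numeric input columns: ", paste(non_numeric, collapse = ", "))
  }

  # k must be a single positive whole number.
  if (length(k) != 1 || is.na(k) || k < 1 || k != floor(k)) {
    stop("k must be a single positive whole number.")
  }

  # Asking for more clusters than there are rows can never be satisfied.
  if (k > nrow(v.data.frame)) {
    stop("k cannot exceed the number of input rows.")
  }

  invisible(TRUE)
}

# Usage: the numeric iris columns pass the checks.
check_kmeans_input(iris[, 1:4], 3)
```

A process function could call a helper like this as its first statement, so that an invalid call fails with a descriptive message before any clustering work starts.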

The outtypecallback method receives the types and sizes of the arguments that the function was called with, and is expected to indicate the types and sizes that the function returns. It can also be used to reject unsupported argument types or an unsupported number of arguments. For example, the function may require only integers, with no more than 10 of them.
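
The integers-only rule in that example might be written roughly as follows. This is a sketch under stated assumptions, not the SDK's definitive interface: it assumes the callback is passed a data frame that describes the arguments with the same columns used by the parameter callback above (datatype, length, scale, name), plus a second argument that describes the parameters, and that it returns a data frame of the same shape for the output columns. The function name intOnlyReturnType and the output column name "cluster" are hypothetical.

```r
# Hypothetical outtypecallback. The two-argument signature and the layout of
# arg.types (columns datatype, length, scale, name) are assumptions.
intOnlyReturnType <- function(arg.types, param.types) {
  # Reject calls that pass more than 10 arguments.
  if (nrow(arg.types) > 10) {
    stop("This function accepts at most 10 arguments.")
  }

  # Reject calls in which any argument is not an integer.
  if (!all(arg.types$datatype == "int")) {
    stop("This function accepts only integer arguments.")
  }

  # Describe the single integer column that the function returns.
  data.frame(datatype = "int", length = NA, scale = NA, name = "cluster",
             stringsAsFactors = FALSE)
}

# Usage with a hand-built description of two integer arguments.
two_ints <- data.frame(datatype = c("int", "int"), length = NA, scale = NA,
                       name = c("a", "b"), stringsAsFactors = FALSE)
return_type <- intOnlyReturnType(two_ints, NULL)
print(return_type$datatype)
```

If the factory list follows the same pattern as parametertypecallback, the callback would be registered with an entry such as outtypecallback=intOnlyReturnType. Treat that entry and the callback's signature as assumptions to verify against the SDK documentation.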

You assign a SQL name to your polymorphic UDx using the same statement you use to assign one to a non-polymorphic UDx. The following statements show how you load and call the polymorphic function from the example.

=> CREATE LIBRARY rlib2 AS '/home/dbadmin/R_UDx/poly_kmeans.R' LANGUAGE 'R';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION kmeansPoly AS LANGUAGE 'R' name 'kmeansFactoryPoly' LIBRARY rlib2;
CREATE FUNCTION
=> SELECT spec, kmeansPoly(sl,sw,pl,pw USING PARAMETERS k = 3)
    OVER(PARTITION BY spec) AS Clusters
      FROM iris;
      spec       | Clusters
-----------------+----------
 Iris-setosa     |        1
 Iris-setosa     |        1
 Iris-setosa     |        1
 Iris-setosa     |        1
.
.
.
(150 rows)