TransformFunction class

The TransformFunction class is where you perform the data-processing, transforming input rows into output rows.

The TransformFunction class is where you perform the data-processing, transforming input rows into output rows. Your subclass must define the processPartition() method. It may define methods to set up and tear down the function.

Performing the transformation

The processPartition() method carries out all of the processing that you want your UDTF to perform. When a user calls your function in a SQL statement, Vertica bundles together the data from the function parameters and passes it to processPartition().

The input and output of the processPartition() method are supplied by objects of the PartitionReader and PartitionWriter classes. They define methods that you use to read the input data and write the output data for your UDTF.

A UDTF does not necessarily operate on a single row the way a UDSF does. A UDTF can read any number of rows and write output at any time.

Consider the following guidelines when implementing processPartition():

  • Extract the input parameters by calling data-type-specific functions in the PartitionReader object to extract each input parameter. Each of these functions takes a single parameter: the column number in the input row that you want to read. Your function might need to handle NULL values.

  • When writing output, your UDTF must supply values for all of the output columns you defined in your factory. Similarly to reading input columns, the PartitionWriter object has functions for writing each type of data to the output row.

  • Use PartitionReader.next() to determine if there is more input to process, and exit when the input is exhausted.

  • In some cases, you might want to determine the number and types of parameters using PartitionReader's getNumCols() and getTypeMetaData() functions, instead of just hard-coding the data types of the columns in the input row. This is useful if you want your TransformFunction to be able to process input tables with different schemas. You can then use different TransformFunctionFactory classes to define multiple function signatures that call the same TransformFunction class. See Overloading your UDx for more information.

Setting up and tearing down

The TransformFunction class defines two additional methods that you can optionally implement to allocate and free resources: setup() and destroy(). You should use these methods to allocate and deallocate resources that you do not allocate through the UDx API (see Allocating resources for UDxs for details).

API

The TransformFunction API provides the following methods for extension by subclasses:

virtual void setup(ServerInterface &srvInterface,
        const SizedColumnTypes &argTypes);

virtual void processPartition(ServerInterface &srvInterface,
        PartitionReader &input_reader, PartitionWriter &output_writer)=0;

virtual void cancel(ServerInterface &srvInterface);

virtual void destroy(ServerInterface &srvInterface,
        const SizedColumnTypes &argTypes);

The PartitionReader and PartitionWriter classes provide getters and setters for column values, along with next() to iterate through partitions. See the API reference documentation for details.

The TransformFunction API provides the following methods for extension by subclasses:

public void setup(ServerInterface srvInterface, SizedColumnTypes argTypes);

public abstract void processPartition(ServerInterface srvInterface,
        PartitionReader input_reader, PartitionWriter input_writer)
    throws UdfException, DestroyInvocation;

protected void cancel(ServerInterface srvInterface);

public void destroy(ServerInterface srvInterface, SizedColumnTypes argTypes);

The PartitionReader and PartitionWriter classes provide getters and setters for column values, along with next() to iterate through partitions. See the API reference documentation for details.

The TransformFunction API provides the following methods for extension by subclasses:


def setup(self, server_interface, col_types)

def processPartition(self, server_interface, partition_reader, partition_writer)

def destroy(self, server_interface, col_types)

The PartitionReader and PartitionWriter classes provide getters and setters for column values, along with next() to iterate through partitions. See the API reference documentation for details.

Implement the Main function API to define a transform function:

FunctionName <- function(input.data.frame, parameters.data.frame) {
  # Computations

  # The function must return a data frame.
  return(output.data.frame)
}