This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Arguments and return values

For all UDx types except load (UDL), the factory class declares the arguments and return type of the associated function.

For all UDx types except load (UDL), the factory class declares the arguments and return type of the associated function. Factories have two methods for this purpose:

  • getPrototype() (required): declares input and output types

  • getReturnType() (sometimes required): declares the return types, including length and precision, when applicable

The getPrototype() method receives two ColumnTypes parameters, one for input and one for output. The factory in C++ example: string tokenizer takes a single input string and returns a string:

virtual void getPrototype(ServerInterface &srvInterface,
                          ColumnTypes &argTypes, ColumnTypes &returnType)
{
  argTypes.addVarchar();
  returnType.addVarchar();
}

The ColumnTypes class provides "add" methods for each supported type, like addVarchar(). This class supports complex types with the addArrayType() and addRowType() methods; see Complex Types as Arguments. If your function is polymorphic, you can instead call addAny(). You are then responsible for validating your inputs and outputs. For more information about implementing polymorphic UDxs, see Creating a polymorphic UDx.

The getReturnType() method computes a maximum length for the returned value. If your UDx returns a sized column (a return data type whose length can vary, such as a VARCHAR), a value that requires precision, or more than one value, implement this factory method. (Some UDx types require you to implement it.)

The input is a SizedColumnTypes containing the input argument types along with their lengths. Depending on the input types, add one of the following to the output types:

  • CHAR, (LONG) VARCHAR, BINARY, and (LONG) VARBINARY: return the maximum length.

  • NUMERIC types: specify the precision and scale.

  • TIME and TIMESTAMP values (with or without timezone): specify precision.

  • INTERVAL YEAR TO MONTH: specify range.

  • INTERVAL DAY TO SECOND: specify precision and range.

  • ARRAY: specify the maximum number of array elements.

In the case of the string tokenizer, the output is a VARCHAR and the function determines its maximum length:

// Tell Vertica what our return string length will be, given the input
// string length
virtual void getReturnType(ServerInterface &srvInterface,
                           const SizedColumnTypes &inputTypes,
                           SizedColumnTypes &outputTypes)
{
  // Error out if we're called with anything but 1 argument
  if (inputTypes.getColumnCount() != 1)
    vt_report_error(0, "Function only accepts 1 argument, but %zu provided", inputTypes.getColumnCount());

  int input_len = inputTypes.getColumnType(0).getStringLength();

  // Our output size will never be more than the input size
  outputTypes.addVarchar(input_len, "words");
}

Complex types as arguments and return values

The ColumnTypes class supports ARRAY and ROW types. Arrays have elements and rows have fields, both of which have types that you need to describe. To work with complex types, you build ColumnTypes objects for the array or row and then add them to the ColumnTypes objects representing the function inputs and outputs.

In the following example, the input to a transform function is an array of orders, which are rows, and the output is the individual rows with their positions in the array. An order consists of a shipping address (VARCHAR) and an array of product IDs (INT).

The factory's getPrototype() method first creates ColumnTypes for the array and row elements and then calls addArrayType() and addRowType() using them:


void getPrototype(ServerInterface &srv,
            ColumnTypes &argTypes,
            ColumnTypes &retTypes)
    {
        // item ID (int), to be used in an array
        ColumnTypes itemIdProto;
        itemIdProto.addInt();

        // row: order = address (varchar) + array of previously-created item IDs
        ColumnTypes orderProto;
        orderProto.addVarchar();                  /* address */
        orderProto.addArrayType(itemIdProto);     /* array of item ID */

        /* argument (input) is array of orders */
        argTypes.addArrayType(orderProto);

        /* return values: index in the array, order */
        retTypes.addInt();                        /* index of element */
        retTypes.addRowType(orderProto);          /* element return type */
    }

The arguments include a sized type (the VARCHAR). The getReturnType() method uses a similar approach, using the Fields class to build the two fields in the order.


void getReturnType(ServerInterface &srv,
            const SizedColumnTypes &argTypes,
            SizedColumnTypes &retTypes)
    {
        Fields itemIdElementFields;
        itemIdElementFields.addInt("item_id");

        Fields orderFields;
        orderFields.addVarchar(32, "address");
        orderFields.addArrayType(itemIdElementFields[0], "item_id");
        // optional third arg: max length, default unbounded

        /* declare return type */
        retTypes.addInt("index");
        static_cast<Fields &>(retTypes).addRowType(orderFields, "element");

        /* NOTE: presumably we have verified that the arguments match the prototype, so really we could just do this: */
        retTypes.addInt("index");
        retTypes.addArg(argTypes.getColumnType(0).getElementType(), "element");
    }

To access complex types in the UDx processing method, use the ArrayReader, ArrayWriter, StructReader, and StructWriter classes.

See C++ example: using complex types for a polymorphic function that uses arrays.

The factory's getPrototype() method first uses makeType() and addType() methods to create and construct ColumnTypes for the row and its elements. The method then calls addType() methods to add these constructed ColumnTypes to the arg_types and return_type objects:


def getPrototype(self, srv_interface, arg_types, return_type):
    # item ID (int), to be used in an array
    itemIdProto = vertica_sdk.ColumnTypes.makeInt()

    # row (order): address (varchar) + array of previously-created item IDs
    orderProtoFields = vertica_sdk.ColumnTypes.makeEmpty()
    orderProtoFields.addVarchar()  # address
    orderProtoFields.addArrayType(itemIdProto) # array of item ID
    orderProto = vertica_sdk.ColumnTypes.makeRowType(orderProtoFields)

    # argument (input): array of orders
    arg_types.addArrayType(orderProto)

    # return values: index in the array, order
    return_type.addInt();                        # index of element
    return_type.addRowType(orderProto);          # element return type

The factory's getReturnType() method creates SizedColumnTypes with the makeInt() and makeEmpty() methods and then builds two row fields with the addVarchar() and addArrayType() methods. Note that the addArrayType() method specifies the maximum number of array elements as 1024. getReturnType() then adds these constructed SizedColumnTypes to the object representing the return type.


def getReturnType(self, srv_interface, arg_types, return_type):
    itemsIdElementField = vertica_sdk.SizedColumnTypes.makeInt("item_id")

    orderFields = vertica_sdk.SizedColumnTypes.makeEmpty()
    orderFields.addVarchar(32, "address")
    orderFields.addArrayType(itemIdElementField, 1024, "item_ids")

    # declare return type
    return_type.addInt("index")
    return_type.addRowType(orderFields, "element")

    '''
    NOTE: presumably we have verified that the arguments match the prototype, so really we could just do this:
    return_type.addInt("index")
    return_type.addArrayType(argTypes.getColumnType(0).getElementType(), "element")
    '''

To access complex types in the UDx processing method, use the ArrayReader, ArrayWriter, RowReader, and RowWriter classes. For details, see Python SDK.

See Python example: matrix multiplication for a scalar function that uses complex types.

Handling different numbers and types of arguments

You can create UDxs that handle multiple signatures, or even accept all arguments supplied to them by the user, using either overloading or polymorphism.

You can overload your UDx by assigning the same SQL function name to multiple factory classes, each of which defines a unique function signature. When a user uses the function name in a query, Vertica tries to match the signature of the function call to the signatures declared by the factory's getPrototype() method. This is the best technique to use if your UDx needs to accept a few different signatures (for example, accepting two required and one optional argument).

Alternatively, you can write a polymorphic function, writing one factory method instead of several and declaring that it accepts any number and type of arguments. When a user uses the function name in a query, Vertica calls your function regardless of the signature. In exchange for this flexibility, your UDx's main "process" method has to determine whether it can accept the arguments and emit errors if not.

All UDx types can use polymorphic inputs. Transform functions and analytic functions can also use polymorphic outputs. This means that getPrototype() can declare a return type of "any" and set the actual return type at runtime. For example, a function that returns the largest value in an input would return the same type as the input type.

1 - Overloading your UDx

You may want your UDx to accept several different signatures (sets of arguments).

You may want your UDx to accept several different signatures (sets of arguments). For example, you might want your UDx to accept:

  • One or more optional arguments.

  • One or more arguments that can be one of several data types.

  • Completely distinct signatures (either all INTEGER or all VARCHAR, for example).

You can create a function with this behavior by creating several factory classes, each of which accepts a different signature (the number and data types of arguments). You can then associate a single SQL function name with all of them. You can use the same SQL function name to refer to multiple factory classes as long as the signature defined by each factory is unique. When a user calls your UDx, Vertica matches the number and types of arguments supplied by the user to the arguments accepted by each of your function's factory classes. If one matches, Vertica uses it to instantiate a function class to process the data.

Multiple factory classes can instantiate the same function class, so you can re-use one function class that is able to process multiple sets of arguments and then create factory classes for each of the function signatures. You can also create multiple function classes if you want.

See the C++ example: overloading your UDx and Java example: overloading your UDx examples.

1.1 - C++ example: overloading your UDx

The following example code demonstrates creating a user-defined scalar function (UDSF) that adds two or three integers together.

The following example code demonstrates creating a user-defined scalar function (UDSF) that adds two or three integers together. The Add2or3ints class is prepared to handle two or three arguments. The processBlock() function checks the number of arguments that have been passed to it, and adds all two or three of them together. It also exits with an error message if it has been called with less than 2 or more than 3 arguments. In theory, this should never happen, since Vertica only calls the UDSF if the user's function call matches a signature on one of the factory classes you create for your function. In practice, it is a good idea to perform this sanity checking, in case your (or someone else's) factory class inaccurately reports a set of arguments your function class cannot handle.

#include "Vertica.h"
using namespace Vertica;
using namespace std;
// a ScalarFunction that accepts two or three
// integers and adds them together.
class Add2or3ints : public Vertica::ScalarFunction
{
public:
    virtual void processBlock(Vertica::ServerInterface &srvInterface,
                              Vertica::BlockReader &arg_reader,
                              Vertica::BlockWriter &res_writer)
    {
        const size_t numCols = arg_reader.getNumCols();

        // Ensure that only two or three parameters are passed in
        if ( numCols < 2 || numCols > 3)
            vt_report_error(0, "Function only accept 2 or 3 arguments, "
                                "but %zu provided", arg_reader.getNumCols());
      // Add two integers together
        do {
            const vint a = arg_reader.getIntRef(0);
            const vint b = arg_reader.getIntRef(1);
            vint c = 0;
        // Check for third argument, add it in if it exists.
            if (numCols == 3)
                c = arg_reader.getIntRef(2);
            res_writer.setInt(a+b+c);
            res_writer.next();
        } while (arg_reader.next());
    }
};
// This factory accepts function calls with two integer arguments.
class Add2intsFactory : public Vertica::ScalarFunctionFactory
{
    virtual Vertica::ScalarFunction *createScalarFunction(Vertica::ServerInterface
                &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, Add2or3ints); }
    virtual void getPrototype(Vertica::ServerInterface &srvInterface,
                              Vertica::ColumnTypes &argTypes,
                              Vertica::ColumnTypes &returnType)
    {   // Accept 2 integer values
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }
};
RegisterFactory(Add2intsFactory);
// This factory defines a function that accepts 3 ints.
class Add3intsFactory : public Vertica::ScalarFunctionFactory
{
    virtual Vertica::ScalarFunction *createScalarFunction(Vertica::ServerInterface
                &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, Add2or3ints); }
    virtual void getPrototype(Vertica::ServerInterface &srvInterface,
                              Vertica::ColumnTypes &argTypes,
                              Vertica::ColumnTypes &returnType)
    {   // accept 3 integer values
        argTypes.addInt();
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }
};
RegisterFactory(Add3intsFactory);

The example has two ScalarFunctionFactory classes, one for each signature that the function accepts (two integers and three integers). There is nothing unusual about these factory classes, except that their implementation of ScalarFunctionFactory::createScalarFunction() both create Add2or3ints objects.

The final step is to bind the same SQL function name to both factory classes. You can assign multiple factories to the same SQL function, as long as the signatures defined by each factory's getPrototype() implementation are different.

=> CREATE LIBRARY add2or3IntsLib AS '/home/dbadmin/Add2or3Ints.so';
CREATE LIBRARY
=> CREATE FUNCTION add2or3Ints as NAME 'Add2intsFactory' LIBRARY add2or3IntsLib FENCED;
CREATE FUNCTION
=> CREATE FUNCTION add2or3Ints as NAME 'Add3intsFactory' LIBRARY add2or3IntsLib FENCED;
CREATE FUNCTION
=> SELECT add2or3Ints(1,2);
 add2or3Ints
-------------
           3
(1 row)
=> SELECT add2or3Ints(1,2,4);
 add2or3Ints
-------------
           7
(1 row)
=> SELECT add2or3Ints(1,2,3,4); -- Will generate an error
ERROR 3467:  Function add2or3Ints(int, int, int, int) does not exist, or
permission is denied for add2or3Ints(int, int, int, int)
HINT:  No function matches the given name and argument types. You may
need to add explicit type casts

The error message in response to the final call to the add2or3Ints function was generated by Vertica, since it could not find a factory class associated with add2or3Ints that accepted four integer arguments. To expand add2or3Ints further, you could create another factory class that accepted this signature, and either change the Add2or3ints ScalarFunction class or create a totally different class to handle adding more integers together. However, adding more classes to accept each variation in the arguments quickly becomes overwhelming. In that case, you should consider creating a polymorphic UDx.

1.2 - Java example: overloading your UDx

The following example code demonstrates creating a user-defined scalar function (UDSF) that adds two or three integers together.

The following example code demonstrates creating a user-defined scalar function (UDSF) that adds two or three integers together. The Add2or3ints class is prepared to handle two or three arguments. It checks the number of arguments that have been passed to it, and adds all two or three of them together. The processBlock() method checks whether it has been called with less than 2 or more than 3 arguments. In theory, this should never happen, since Vertica only calls the UDSF if the user's function call matches a signature on one of the factory classes you create for your function. In practice, it is a good idea to perform this sanity checking, in case your (or someone else's) factory class reports that your function class accepts a set of arguments that it actually does not.

// You need to specify the full package when creating functions based on
// the classes in your library.
package com.mycompany.multiparamexample;
// Import the entire Vertica SDK
import com.vertica.sdk.*;
// This ScalarFunction accepts two or three integer arguments. It tests
// the number of input columns to determine whether to read two or three
// arguments as input.
public class Add2or3ints extends ScalarFunction
{
    @Override
    public void processBlock(ServerInterface srvInterface,
                             BlockReader argReader,
                             BlockWriter resWriter)
                throws UdfException, DestroyInvocation
    {
        // See how many arguments were passed in
        int numCols = argReader.getNumCols();

        // Return an error if less than two or more than 3 aerguments
        // were given. This error only occurs if a Factory class that
        // accepts the wrong number of arguments instantiates this
        // class.
        if (numCols < 2 || numCols > 3) {
            throw new UdfException(0,
                "Must supply 2 or 3 integer arguments");
        }

        // Process all of the rows of input.
        do {
            // Get the first two integer arguments from the BlockReader
            long a = argReader.getLong(0);
            long b = argReader.getLong(1);

            // Assume no third argument.
            long c = 0;

            // Get third argument value if it exists
            if (numCols == 3) {
                c = argReader.getLong(2);
            }

            // Process the arguments and come up with a result. For this
            // example, just add the three arguments together.
            long result = a+b+c;

            // Write the integer output value.
            resWriter.setLong(result);

            // Advance the output BlocKWriter to the next row.
            resWriter.next();

            // Continue processing input rows until there are no more.
        } while (argReader.next());
    }
}

The main difference between the Add2ints class and the Add2or3ints class is the inclusion of a section that gets the number of arguments by calling BlockReader.getNumCols(). This class also tests the number of columns it received from Vertica to ensure it is in the range it is prepared to handle. This test will only fail if you create a ScalarFunctionFactory whose getPrototype() method defines a signature that accepts less than two or more than three arguments. This is not really necessary in this simple example, but for a more complicated class it is a good idea to test the number of columns and data types that Vertica passed your function class.

Within the do loop, Add2or3ints uses a default value of zero if Vertica sent it two input columns. Otherwise, it retrieves the third value and adds that to the other two. Your own class needs to use default values for missing input columns or alter its processing in some other way to handle the variable columns.

You must define your function class in its own source file, rather than as an inner class of one of your factory classes since Java does not allow the instantiation of an inner class from outside its containing class. You factory class has to be available for instantiation by multiple factory classes.

Once you have created a function class or classes, you create a factory class for each signature you want your function class to handle. These factory classes can call individual function classes, or they can all call the same class that is prepared to accept multiple sets of arguments.

The following example's createScalarFunction() method instantiates a member of the Add2or3ints class.

// You will need to specify the full package when creating functions based on
// the classes in your library.
package com.mycompany.multiparamexample;
// Import the entire Vertica SDK
import com.vertica.sdk.*;
public class Add2intsFactory extends ScalarFunctionFactory
{
    @Override
    public void getPrototype(ServerInterface srvInterface,
                             ColumnTypes argTypes,
                             ColumnTypes returnType)
    {
        // Accept two integers as input
        argTypes.addInt();
        argTypes.addInt();
        // writes one integer as output
        returnType.addInt();
    }
    @Override
    public ScalarFunction createScalarFunction(ServerInterface srvInterface)
    {
        // Instantiate the class that can handle either 2 or 3 integers.
        return new Add2or3ints();
    }
}

The following ScalarFunctionFactory subclass accepts three integers as input. It, too, instantiates a member of the Add2or3ints class to process the function call:

// You will need to specify the full package when creating functions based on
// the classes in your library.
package com.mycompany.multiparamexample;
// Import the entire Vertica SDK
import com.vertica.sdk.*;
public class Add3intsFactory extends ScalarFunctionFactory
{
    @Override
    public void getPrototype(ServerInterface srvInterface,
                             ColumnTypes argTypes,
                             ColumnTypes returnType)
    {
        // Accepts three integers as input
        argTypes.addInt();
        argTypes.addInt();
        argTypes.addInt();
        // Returns a single integer
        returnType.addInt();
    }
    @Override
    public ScalarFunction createScalarFunction(ServerInterface srvInterface)
    {
        // Instantiates the Add2or3ints ScalarFunction class, which is able to
        // handle eitehr 2 or 3 integers as arguments.
        return new Add2or3ints();
    }
}

The factory classes and the function class or classes they call must be packaged into the same JAR file (see Compiling and packaging a Java library for details). If a host in the database cluster has the JDK installed on it, you could use the following commands to compile and package the example:

$ cd pathToJavaProject$ javac -classpath /opt/vertica/bin/VerticaSDK.jar \
> com/mycompany/multiparamexample/*.java
$ jar -cvf Add2or3intslib.jar com/vertica/sdk/BuildInfo.class \
> com/mycompany/multiparamexample/*.class
added manifest
adding: com/vertica/sdk/BuildInfo.class(in = 1202) (out= 689)(deflated 42%)
adding: com/mycompany/multiparamexample/Add2intsFactory.class(in = 677) (out= 366)(deflated 45%)
adding: com/mycompany/multiparamexample/Add2or3ints.class(in = 919) (out= 601)(deflated 34%)
adding: com/mycompany/multiparamexample/Add3intsFactory.class(in = 685) (out= 369)(deflated 46%)

Once you have packaged your overloaded UDx, you deploy it the same way as you do a regular UDx, except you use multiple CREATE FUNCTION statements to define the function, once for each factory class.

=> CREATE LIBRARY add2or3intslib as '/home/dbadmin/Add2or3intslib.jar'
-> language 'Java';
CREATE LIBRARY
=> CREATE FUNCTION add2or3ints as LANGUAGE 'Java' NAME 'com.mycompany.multiparamexample.Add2intsFactory' LIBRARY add2or3intslib;
CREATE FUNCTION
=> CREATE FUNCTION add2or3ints as LANGUAGE 'Java' NAME 'com.mycompany.multiparamexample.Add3intsFactory' LIBRARY add2or3intslib;
CREATE FUNCTION

You call the overloaded function the same way you call any other function.

=> SELECT add2or3ints(2,3);
 add2or3ints
-------------
           5
(1 row)
=> SELECT add2or3ints(2,3,4);
 add2or3ints
-------------
           9
(1 row)
=> SELECT add2or3ints(2,3,4,5);
ERROR 3457:  Function add2or3ints(int, int, int, int) does not exist, or permission is denied for add2or3ints(int, int, int, int)
HINT:  No function matches the given name and argument types. You may need to add explicit type casts

The last error was generated by Vertica, not the UDx code. It returns an error if it cannot find a factory class whose signature matches the function call's signature.

Creating an overloaded UDx is useful if you want your function to accept a limited set of potential arguments. If you want to create a more flexible function, you can create a polymorphic function.

2 - Creating a polymorphic UDx

Polymorphic UDxs accept any number and type of argument that the user supplies.

Polymorphic UDxs accept any number and type of argument that the user supplies. Transform functions (UDTFs), analytic functions (UDAnFs), and aggregate functions (UDAFs) can define their output return types at runtime, usually based on the input arguments. For example, a UDTF that adds two numbers could return an integer or a float, depending on the input types.

Vertica does not check the number or types of argument that the user passes to the UDx—it just passes the UDx all of the arguments supplied by the user. It is up to your polymorphic UDx's main processing function (for example, processBlock() in user-defined scalar functions) to examine the number and types of arguments it received and determine if it can handle them. UDxs support up to 9800 arguments.

Polymorphic UDxs are more flexible than using multiple factory classes for your function (see Overloading your UDx). They also allow you to write more concise code, instead of writing versions for each data type. The tradeoff is that your polymorphic function needs to perform more work to determine whether it can process its arguments.

Your polymorphic UDx declares that it accepts any number of arguments in its factory's getPrototype() function by calling the addAny() function on the ColumnTypes object that defines its arguments, as follows:

    // C++ example
    void getPrototype(ServerInterface &srvInterface,
                      ColumnTypes &argTypes,
                      ColumnTypes &returnType)
    {
        argTypes.addAny(); // Must be only argument type.
        returnType.addInt(); // or whatever the function returns
    }

This "any parameter" argument type is the only one that your function can declare. You cannot define required arguments and then call addAny() to declare the rest of the signature as optional. If your function has requirements for the arguments it accepts, your process() function must enforce them.

The getPrototype() example shown previously accepts any type and declares that it returns an integer. The following example shows a version of the method that defers resolving the return type until runtime. You can only use the "any" return type for transform and analytic functions.

    void getPrototype(ServerInterface &srvInterface,
                      ColumnTypes &argTypes,
                      ColumnTypes &returnType)
    {
        argTypes.addAny();
        returnType.addAny(); // type determined at runtime
    }

If you use polymorphic return types, you must also define getReturnType() in your factory. This function is called at runtime to determine the actual return type. See C++ example: PolyNthValue for an example.

Polymorphic UDxs and schema search paths

If a user does not supply a schema name as part of a UDx call, Vertica searches each schema in the schema search path for a function whose name and signature match the function call. See Setting search paths for more information about schema search paths.

Because polymorphic UDxs do not have specific signatures associated with them, Vertica initially skips them when searching for a function to handle the function call. If none of the schemas in the search path contain a UDx whose name and signature match the function call, Vertica searches the schema search path again for a polymorphic UDx whose name matches the function name in the function call.

This behavior gives precedence to a UDx whose signature exactly matches the function call. It allows you to create a "catch-all" polymorphic UDx that Vertica calls only when none of the non-polymorphic UDxs with the same name have matching signatures.

This behavior may cause confusion if your users expect the first polymorphic function in the schema search path to handle a function call. To avoid confusion, you should:

  • Avoid using the same name for different UDxs. You should always uniquely name UDxs unless you intend to create an overloaded UDx with multiple signatures.

  • When you cannot avoid having UDxs with the same name in different schemas, always supply the schema name as part of the function call. Using the schema name prevents ambiguity and ensures that Vertica uses the correct UDx to process your function calls.

2.1 - C++ example: PolyNthValue

The PolyNthValue example is an analytic function that returns the value in the Nth row in each partition in its input.

The PolyNthValue example is an analytic function that returns the value in the Nth row in each partition in its input. This function is a generalization of FIRST_VALUE [analytic] and LAST_VALUE [analytic].

The values can be of any primitive data type.

For the complete source code, see PolymorphicNthValue.cpp in the examples (in /opt/vertica/sdk/examples/AnalyticFunctions/).

Loading and using the example

Load the library and create the function as follows:

=> CREATE LIBRARY AnalyticFunctions AS '/home/dbadmin/AnalyticFns.so';
CREATE LIBRARY

=> CREATE ANALYTIC FUNCTION poly_nth_value AS LANGUAGE 'C++'
   NAME 'PolyNthValueFactory' LIBRARY AnalyticFunctions;
CREATE ANALYTIC FUNCTION

Consider a table of scores for different test groups:

=> SELECT cohort, score FROM trials;
 cohort | score
--------+-------
   1    | 9
   1    | 8
   1    | 7
   3    | 3
   3    | 2
   3    | 1
   2    | 4
   2    | 5
   2    | 6
(9 rows)

Call the function in a query that uses an OVER clause to partition the data. This example returns the second-highest score in each cohort:

=> SELECT cohort, score, poly_nth_value(score USING PARAMETERS n=2) OVER (PARTITION BY cohort) AS nth_value
FROM trials;
 cohort | score | nth_value
--------+-------+-----------
   1    | 9     |         8
   1    | 8     |         8
   1    | 7     |         8
   3    | 3     |         2
   3    | 2     |         2
   3    | 1     |         2
   2    | 4     |         5
   2    | 5     |         5
   2    | 6     |         5
(9 rows)

Factory implementation

The factory declares that the class is polymorphic, and then sets the return type based on the input type. Two factory methods specify the argument and return types.

Use the getPrototype() method to declare that the analytic function takes and returns any type:

    void getPrototype(ServerInterface &srvInterface, ColumnTypes &argTypes, ColumnTypes &returnType)
    {
        // This function supports any argument data type
        argTypes.addAny();

        // Output data type will be the same as the argument data type
        // We will specify that in getReturnType()
        returnType.addAny();
    }

The getReturnType() method is called at runtime. This is where you set the return type based on the input type:

    void getReturnType(ServerInterface &srvInterface, const SizedColumnTypes &inputTypes,
                       SizedColumnTypes &outputTypes)
    {
        // This function accepts only one argument
        // Complain if we find a different number
        std::vector<size_t> argCols;
        inputTypes.getArgumentColumns(argCols); // get argument column indices

        if (argCols.size() != 1)
        {
            vt_report_error(0, "Only one argument is expected but %s provided",
                            argCols.size()? std::to_string(argCols.size()).c_str() : "none");
        }

        // Define output type the same as argument type
        outputTypes.addArg(inputTypes.getColumnType(argCols[0]), inputTypes.getColumnName(argCols[0]));
    }

Function implementation

The analytic function itself is type-agnostic:


    void processPartition(ServerInterface &srvInterface, AnalyticPartitionReader &inputReader,
                          AnalyticPartitionWriter &outputWriter)
    {
        try {
            const SizedColumnTypes &inTypes = inputReader.getTypeMetaData();
            std::vector<size_t> argCols; // Argument column indexes.
            inTypes.getArgumentColumns(argCols);

            vint currentRow = 1;
            bool nthRowExists = false;

            // Find the value of the n-th row
            do {
                if (currentRow == this->n) {
                    nthRowExists = true;
                    break;
                } else {
                    currentRow++;
                }
            } while (inputReader.next());

            if (nthRowExists) {
                do {
                    // Return n-th value
                    outputWriter.copyFromInput(0 /*dest column*/, inputReader,
                                               argCols[0] /*source column*/);
                } while (outputWriter.next());
            } else {
                // The partition has less than n rows
                // Return NULL value
                do {
                    outputWriter.setNull(0);
                } while (outputWriter.next());
            }
        } catch(std::exception& e) {
            // Standard exception. Quit.
            vt_report_error(0, "Exception while processing partition: [%s]", e.what());
        }
    }
};

2.2 - Java example: AddAnyInts

The following example shows an implementation of a Java ScalarFunction that adds together two or more integers.

The following example shows an implementation of a Java ScalarFunction that adds together two or more integers.

For the complete source code, see AddAnyIntsInfo.java in the examples (in /opt/vertica/sdk/examples/JavaUDx/ScalarFunctions).

Loading and using the example

Load the library and create the function as follows:

=> CREATE LIBRARY JavaScalarFunctions AS '/home/dbadmin/JavaScalarLib.jar' LANGUAGE 'JAVA';
CREATE LIBRARY

=> CREATE FUNCTION addAnyInts AS LANGUAGE 'Java' NAME 'com.vertica.JavaLibs.AddAnyIntsInfo'
   LIBRARY JavaScalarFunctions;
CREATE FUNCTION

Call the function with two or more integer arguments:

=> SELECT addAnyInts(1,2);
 addAnyInts
------------
          3
(1 row)

=> SELECT addAnyInts(1,2,3,40,50,60,70,80,900);
 addAnyInts
------------
       1206
(1 row)

Calling the function with too few arguments, or with non-integer arguments, produces errors that are generated from the processBlock() method. It is up to your UDx to ensure that the user supplies the correct number and types of arguments to your function and exit with an error if it cannot process them.

Function implementation

Most of the work in the example is done by the processBlock() method. It performs two checks on the arguments that have been passed in through the BlockReader object:

  • There are at least two arguments.

  • The data types of all arguments are integers.

It is up to your polymorphic UDx to determine that all of the input passed to it is valid.

Once the processBlock() method validates its arguments, it loops over them, adding them together.

        @Override
        public void processBlock(ServerInterface srvInterface,
                                 BlockReader arg_reader,
                                 BlockWriter res_writer)
                    throws UdfException, DestroyInvocation
        {
        SizedColumnTypes inTypes = arg_reader.getTypeMetaData();
        ArrayList<Integer> argCols = new ArrayList<Integer>(); // Argument column indexes.
        inTypes.getArgumentColumns(argCols);
        // While we have inputs to process
            do {
        long sum = 0;
        for (int i = 0; i < argCols.size(); ++i){
            long a = arg_reader.getLong(i);
            sum += a;
        }
                res_writer.setLong(sum);
                res_writer.next();
            } while (arg_reader.next());
        }
    }

Factory implementation

The factory declares the number and type of arguments in the getPrototype() function.

    @Override
    public void getPrototype(ServerInterface srvInterface,
                             ColumnTypes argTypes,
                             ColumnTypes returnType)
    {
    argTypes.addAny();
        returnType.addInt();
    }

2.3 - R example: kmeansPoly

The following example shows an implementation of a Transform Function (UDTF) that performs kmeans clustering on one or more input columns.

The following example shows an implementation of a Transform Function (UDTF) that performs kmeans clustering on one or more input columns.

kmeansPoly <- function(v.data.frame,v.param.list) {
  # Computes clusters using the kmeans algorithm.
  #
  # Input: A dataframe and a list of parameters.
  # Output: A dataframe with one column that tells the cluster to which each data
  #         point belongs.
  # Args:
  #  v.data.frame: The data from Vertica cast as an R data frame.
  #  v.param.list: List of function parameters.
  #
  # Returns:
  #  The cluster associated with each data point.
  # Ensure k is not null.
  if(!is.null(v.param.list[['k']])) {
     number_of_clusters <- as.numeric(v.param.list[['k']])
  } else {
    stop("k cannot be NULL! Please use a valid value.")
  }
  # Run the kmeans algorithm.
  kmeans_clusters <- kmeans(v.data.frame, number_of_clusters)
  final.output <- data.frame(kmeans_clusters$cluster)
  return(final.output)
}

kmeansFactoryPoly <- function() {
  # This function tells Vertica the name of the R function,
  # and the polymorphic parameters.
  list(name=kmeansPoly, udxtype=c("transform"), intype=c("any"),
       outtype=c("int"), parametertypecallback=kmeansParameters)
}

kmeansParameters <- function() {
  # Callback function for the parameter types.
  function.parameters <- data.frame(datatype=rep(NA, 1), length=rep(NA,1),
                                    scale=rep(NA,1), name=rep(NA,1))
  function.parameters[1,1] = "int"
  function.parameters[1,4] = "k"
  return(function.parameters)
}

The polymorphic R function declares it accepts any number of arguments in its factory function by specifying "any" as the argument to the intype parameter and optionally the outtype parameter. If you define "any" argument for intype or outtype, then it is the only type that your function can declare for the respective parameter. You cannot define required arguments and then call "any" to declare the rest of the signature as optional. If your function has requirements for the arguments it accepts, your process function must enforce them.

The outtypecallback method is used to indicate the argument types and sizes it has been called with, and is expected to indicate the types and sizes that the function returns. The outtypecallback method can also be used to check for unsupported types and/or number of arguments. For example, the function may require only integers, with no more than 10 of them.

You assign a SQL name to your polymorphic UDx using the same statement you use to assign one to a non-polymorphic UDx. The following statements show how you load and call the polymorphic function from the example.

=> CREATE LIBRARY rlib2 AS '/home/dbadmin/R_UDx/poly_kmeans.R' LANGUAGE 'R';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION kmeansPoly AS LANGUAGE 'R' name 'kmeansFactoryPoly' LIBRARY rlib2;
CREATE FUNCTION
=> SELECT spec, kmeansPoly(sl,sw,pl,pw USING PARAMETERS k = 3)
    OVER(PARTITION BY spec) AS Clusters
      FROM iris;
      spec       | Clusters
-----------------+----------
 Iris-setosa     |        1
 Iris-setosa     |        1
 Iris-setosa     |        1
 Iris-setosa     |        1
.
.
.
(150 rows)