Arguments and return values
For all UDx types except load (UDL), the factory class declares the arguments and return type of the associated function. Factories have two methods for this purpose:
- getPrototype() (required): declares input and output types
- getReturnType() (sometimes required): declares the return types, including length and precision, when applicable
The getPrototype() method receives two ColumnTypes parameters, one for input and one for output. The factory in C++ example: string tokenizer takes a single input string and returns a string:
virtual void getPrototype(ServerInterface &srvInterface,
                          ColumnTypes &argTypes, ColumnTypes &returnType)
{
    argTypes.addVarchar();
    returnType.addVarchar();
}
The ColumnTypes class provides "add" methods for each supported type, like addVarchar(). This class supports complex types with the addArrayType() and addRowType() methods; see Complex types as arguments and return values. If your function is polymorphic, you can instead call addAny(). You are then responsible for validating your inputs and outputs. For more information about implementing polymorphic UDxs, see Creating a polymorphic UDx.
The getReturnType() method computes a maximum length for the returned value. If your UDx returns a sized column (a return data type whose length can vary, such as a VARCHAR), a value that requires precision, or more than one value, implement this factory method. (Some UDx types require you to implement it.)
The input is a SizedColumnTypes object containing the input argument types along with their lengths. Depending on the input types, add one of the following to the output types:
- CHAR, (LONG) VARCHAR, BINARY, and (LONG) VARBINARY: return the maximum length.
- NUMERIC types: specify the precision and scale.
- TIME and TIMESTAMP values (with or without timezone): specify precision.
- INTERVAL YEAR TO MONTH: specify range.
- INTERVAL DAY TO SECOND: specify precision and range.
- ARRAY: specify the maximum number of array elements.
In the case of the string tokenizer, the output is a VARCHAR and the function determines its maximum length:
// Tell Vertica what our return string length will be, given the input
// string length
virtual void getReturnType(ServerInterface &srvInterface,
                           const SizedColumnTypes &inputTypes,
                           SizedColumnTypes &outputTypes)
{
    // Error out if we're called with anything but 1 argument
    if (inputTypes.getColumnCount() != 1)
        vt_report_error(0, "Function only accepts 1 argument, but %zu provided", inputTypes.getColumnCount());

    int input_len = inputTypes.getColumnType(0).getStringLength();

    // Our output size will never be more than the input size
    outputTypes.addVarchar(input_len, "words");
}
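The sizing rule in this method can be reasoned about without the SDK. The following Python sketch is a stand-in model, not Vertica code: SizedType and tokenizer_return_type are hypothetical names introduced here only to show the argument-count check and the input-length bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizedType:
    """Hypothetical stand-in for one sized column: type name, maximum length, column name."""
    type_name: str
    length: int
    column_name: str = ""


def tokenizer_return_type(input_types):
    """Mirror the tokenizer's rule: exactly one input, and the output
    VARCHAR is never longer than that input."""
    if len(input_types) != 1:
        raise ValueError(
            f"Function only accepts 1 argument, but {len(input_types)} provided"
        )
    input_len = input_types[0].length
    # A token is a substring of the input, so the input length is a safe bound.
    return [SizedType("varchar", input_len, "words")]


print(tokenizer_return_type([SizedType("varchar", 80)]))
```

Bounding the output by the input length is conservative: it may reserve more space than any single token needs, but it can never be too small.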
Complex types as arguments and return values
The ColumnTypes class supports ARRAY and ROW types. Arrays have elements and rows have fields, both of which have types that you need to describe. To work with complex types, you build ColumnTypes objects for the array or row and then add them to the ColumnTypes objects representing the function inputs and outputs.
In the following example, the input to a transform function is an array of orders, which are rows, and the output is the individual rows with their positions in the array. An order consists of a shipping address (VARCHAR) and an array of product IDs (INT).
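To make the data shapes concrete, here is a plain-Python sketch of what such a transform computes. It does not use the Vertica SDK; the explode_orders name and the dict-based representation of an order are assumptions made for illustration. The text does not say whether positions start at 0 or 1; this sketch uses 0.

```python
def explode_orders(orders):
    """Yield (index, order) pairs, one output row per array element."""
    for index, order in enumerate(orders):
        yield index, order


# One input value: an array of orders. Each order is a row with a
# shipping address and an array of product IDs.
orders = [
    {"address": "12 Elm St", "item_ids": [101, 205]},
    {"address": "9 Oak Ave", "item_ids": [330]},
]

for index, order in explode_orders(orders):
    print(index, order["address"], order["item_ids"])
```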
The factory's getPrototype() method first creates ColumnTypes for the array and row elements and then calls addArrayType() and addRowType() using them:
void getPrototype(ServerInterface &srv,
                  ColumnTypes &argTypes,
                  ColumnTypes &retTypes)
{
    // item ID (int), to be used in an array
    ColumnTypes itemIdProto;
    itemIdProto.addInt();

    // row: order = address (varchar) + array of previously-created item IDs
    ColumnTypes orderProto;
    orderProto.addVarchar();                  /* address */
    orderProto.addArrayType(itemIdProto);     /* array of item ID */

    /* argument (input) is array of orders */
    argTypes.addArrayType(orderProto);

    /* return values: index in the array, order */
    retTypes.addInt();                        /* index of element */
    retTypes.addRowType(orderProto);          /* element return type */
}
The arguments include a sized type (the VARCHAR). The getReturnType() method uses a similar approach, using the Fields class to build the two fields in the order.
void getReturnType(ServerInterface &srv,
                   const SizedColumnTypes &argTypes,
                   SizedColumnTypes &retTypes)
{
    Fields itemIdElementFields;
    itemIdElementFields.addInt("item_id");

    Fields orderFields;
    orderFields.addVarchar(32, "address");
    orderFields.addArrayType(itemIdElementFields[0], "item_id");
    // optional third arg: max length, default unbounded

    /* declare return type */
    retTypes.addInt("index");
    static_cast<Fields &>(retTypes).addRowType(orderFields, "element");

    /* NOTE: presumably we have verified that the arguments match the
     * prototype, so we could instead replace the two declarations above with:
     *
     *     retTypes.addInt("index");
     *     retTypes.addArg(argTypes.getColumnType(0).getElementType(), "element");
     */
}
To access complex types in the UDx processing method, use the ArrayReader, ArrayWriter, StructReader, and StructWriter classes.
See C++ example: using complex types for a polymorphic function that uses arrays.
In Python, the factory's getPrototype() method first uses makeType() and addType() methods, where Type stands for a type name such as Int or RowType, to create and construct ColumnTypes for the row and its elements. The method then calls addType() methods to add these constructed ColumnTypes to the arg_types and return_type objects:
def getPrototype(self, srv_interface, arg_types, return_type):
    # item ID (int), to be used in an array
    itemIdProto = vertica_sdk.ColumnTypes.makeInt()

    # row (order): address (varchar) + array of previously-created item IDs
    orderProtoFields = vertica_sdk.ColumnTypes.makeEmpty()
    orderProtoFields.addVarchar()               # address
    orderProtoFields.addArrayType(itemIdProto)  # array of item ID
    orderProto = vertica_sdk.ColumnTypes.makeRowType(orderProtoFields)

    # argument (input): array of orders
    arg_types.addArrayType(orderProto)

    # return values: index in the array, order
    return_type.addInt()                # index of element
    return_type.addRowType(orderProto)  # element return type
The factory's getReturnType() method creates SizedColumnTypes with the makeInt() and makeEmpty() methods and then builds two row fields with the addVarchar() and addArrayType() methods. Note that the addArrayType() method specifies the maximum number of array elements as 1024. getReturnType() then adds these constructed SizedColumnTypes to the object representing the return type.
def getReturnType(self, srv_interface, arg_types, return_type):
    itemIdElementField = vertica_sdk.SizedColumnTypes.makeInt("item_id")

    orderFields = vertica_sdk.SizedColumnTypes.makeEmpty()
    orderFields.addVarchar(32, "address")
    orderFields.addArrayType(itemIdElementField, 1024, "item_ids")

    # declare return type
    return_type.addInt("index")
    return_type.addRowType(orderFields, "element")

    # NOTE: presumably we have verified that the arguments match the
    # prototype, so we could instead replace the two declarations above with:
    #
    #     return_type.addInt("index")
    #     return_type.addArrayType(arg_types.getColumnType(0).getElementType(), "element")
To access complex types in the UDx processing method, use the ArrayReader, ArrayWriter, RowReader, and RowWriter classes. For details, see Python SDK.
See Python example: matrix multiplication for a scalar function that uses complex types.
Handling different numbers and types of arguments
You can create UDxs that handle multiple signatures, or even accept all arguments supplied to them by the user, using either overloading or polymorphism.
You can overload your UDx by assigning the same SQL function name to multiple factory classes, each of which defines a unique function signature. When a user uses the function name in a query, Vertica tries to match the signature of the function call to the signatures declared by the factory's getPrototype() method. This is the best technique to use if your UDx needs to accept a few different signatures (for example, accepting two required and one optional argument).
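The matching step can be pictured with a small Python model. This is only an illustration of the idea, not a description of how Vertica resolves functions internally; the factory class names, the signature tuples, the REGISTRY dictionary, and resolve are all hypothetical.

```python
class AddTwoInts:
    """Hypothetical factory: handles the two required arguments."""
    signature = ("int", "int")


class AddThreeInts:
    """Hypothetical factory: handles the call that also passes the optional argument."""
    signature = ("int", "int", "int")


# Both factories are registered under the same SQL function name.
REGISTRY = {"add_ints": [AddTwoInts, AddThreeInts]}


def resolve(function_name, call_types):
    """Return the factory whose declared signature equals the call's argument types."""
    for factory in REGISTRY.get(function_name, []):
        if factory.signature == tuple(call_types):
            return factory
    raise LookupError(f"no {function_name} overload accepts {tuple(call_types)}")


print(resolve("add_ints", ["int", "int"]).__name__)
print(resolve("add_ints", ["int", "int", "int"]).__name__)
```

Each overload needs its own factory class, which is why this approach suits a small, fixed set of signatures.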
Alternatively, you can write a polymorphic function, writing one factory method instead of several and declaring that it accepts any number and type of arguments. When a user uses the function name in a query, Vertica calls your function regardless of the signature. In exchange for this flexibility, your UDx's main "process" method has to determine whether it can accept the arguments and emit errors if not.
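The check that a polymorphic UDx performs on its own behalf might look like the following Python sketch. It is independent of the Vertica SDK: check_arguments, NUMERIC_TYPES, and the type-name strings are hypothetical, and a real UDx would report the failure through the SDK's own error mechanism rather than a Python exception.

```python
NUMERIC_TYPES = {"int", "float", "numeric"}


def check_arguments(arg_type_names):
    """Accept any number of arguments (at least one), provided all are numeric."""
    if not arg_type_names:
        raise ValueError("expected at least one argument")
    for position, type_name in enumerate(arg_type_names, start=1):
        if type_name not in NUMERIC_TYPES:
            raise ValueError(
                f"argument {position} has unsupported type {type_name!r}"
            )


check_arguments(["int", "float", "int"])  # passes silently

try:
    check_arguments(["int", "varchar"])
except ValueError as err:
    print(err)
```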
All UDx types can use polymorphic inputs. Transform functions and analytic functions can also use polymorphic outputs. This means that getPrototype() can declare a return type of "any" and set the actual return type at runtime. For example, a function that returns the largest value in an input would return the same type as the input type.
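The "largest value" case can be sketched in Python as follows. This models only the type rule, not the SDK calls; largest_return_type is a hypothetical helper, and the requirement that all inputs share one type is an assumption of this sketch.

```python
def largest_return_type(arg_type_names):
    """Pick the return type at call time: the same type as the inputs."""
    if not arg_type_names:
        raise ValueError("expected at least one argument")
    distinct = set(arg_type_names)
    if len(distinct) != 1:
        raise ValueError(f"arguments must share one type, got {sorted(distinct)}")
    return arg_type_names[0]


print(largest_return_type(["int", "int", "int"]))
print(largest_return_type(["varchar"]))
```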