This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Analytic functions (UDAnFs)
User-defined analytic functions (UDAnFs) are used for analytics.
User-defined analytic functions (UDAnFs) are used for analytics. See SQL analytics for an overview of Vertica's built-in analytics. Like user-defined scalar functions (UDSFs), UDAnFs must output a single value for each row of data read and can have no more than 9800 arguments.
Unlike UDSFs, the UDAnF's input reader and output reader can be advanced independently. This feature lets you create analytic functions where the output value is calculated over multiple rows of data. By advancing the reader and writer independently, you can create functions similar to the built-in analytic functions such as LAG, which uses data from prior rows to output a value for the current row.
1 - AnalyticFunction class
The AnalyticFunction class performs the analytic processing.
The AnalyticFunction
class performs the analytic processing. Your subclass must define the processPartition()
method to perform the operation. It may define methods to set up and tear down the function.
The processPartition()
method reads a partition of data, performs some sort of processing, and outputs a single value for each input row.
Vertica calls processPartition()
once for each partition of data. It supplies the partition using an AnalyticPartitionReader
object from which you read its input data. In addition, there is a unique method on this object named isNewOrderByKey()
, which returns a Boolean value indicating whether your function has seen a row with the same ORDER BY key (or keys). This method is very useful for analytic functions (such as the example RANK function) which need to handle rows with identical ORDER BY keys differently than rows with different ORDER BY keys.
Note
You can specify multiple ORDER BY columns in the SQL query you use to call your UDAnF. The isNewOrderByKey
method returns true if any of the ORDER BY keys are different than the previous row.
Once your method has finished processing the row of data, you advance it to the next row of input by calling next()
on AnalyticPartitionReader
.
Your method writes its output value using an AnalyticPartitionWriter
object that Vertica supplies as a parameter to processPartition()
. This object has data-type-specific methods to write the output value (such as setInt()
). After setting the output value, call next()
on AnalyticPartitionWriter
to advance to the next row in the output.
Note
You must be sure that your function produces a row of output for each row of input in the partition. You must also not output more rows than are in the partition, otherwise the zygote size process (if running in
Fenced and unfenced modes) or Vertica itself could generate an out of bounds error.
Setting up and tearing down
The AnalyticFunction
class defines two additional methods that you can optionally implement to allocate and free resources: setup()
and destroy()
. You should use these methods to allocate and deallocate resources that you do not allocate through the UDx API (see Allocating resources for UDxs for details).
API
The AnalyticFunction API provides the following methods for extension by subclasses:
The AnalyticFunction API provides the following methods for extension by subclasses:
2 - AnalyticFunctionFactory class
The AnalyticFunctionFactory class tells Vertica metadata about your UDAnF: its number of parameters and their data types, as well as the data type of its return value.
The AnalyticFunctionFactory
class tells Vertica metadata about your UDAnF: its number of parameters and their data types, as well as the data type of its return value. It also instantiates a subclass of AnalyticFunction
.
Your AnalyticFunctionFactory
subclass must implement the following methods:
-
getPrototype()
describes the input parameters and output value of your function. You set these values by calling functions on two ColumnTypes
objects that are passed to your method.
-
createAnalyticFunction()
supplies an instance of your AnalyticFunction
that Vertica can call to process a UDAnF function call.
-
getReturnType()
provides details about your function's output. This method is where you set the width of the output value if your function returns a variable-width value (such as VARCHAR) or the precision of the output value if it has a settable precision (such as TIMESTAMP).
API
The AnalyticFunctionFactory API provides the following methods for extension by subclasses:
The AnalyticFunctionFactory API provides the following methods for extension by subclasses:
3 - C++ example: rank
The Rank analytic function ranks rows based on how they are ordered.
The Rank
analytic function ranks rows based on how they are ordered. A Java version of this UDx is included in /opt/vertica/sdk/examples
.
Loading and using the example
The following example shows how to load the function into Vertica. It assumes that the AnalyticFunctions.so
library that contains the function has been copied to the dbadmin user's home directory on the initiator node.
=> CREATE LIBRARY AnalyticFunctions AS '/home/dbadmin/AnalyticFunctions.so';
CREATE LIBRARY
=> CREATE ANALYTIC FUNCTION an_rank AS LANGUAGE 'C++'
NAME 'RankFactory' LIBRARY AnalyticFunctions;
CREATE ANALYTIC FUNCTION
An example of running this rank function, named an_rank
, is:
=> SELECT * FROM hits;
site | date | num_hits
-----------------+------------+----------
www.example.com | 2012-01-02 | 97
www.vertica.com | 2012-01-01 | 343435
www.example.com | 2012-01-01 | 123
www.example.com | 2012-01-04 | 112
www.vertica.com | 2012-01-02 | 503695
www.vertica.com | 2012-01-03 | 490387
www.example.com | 2012-01-03 | 123
(7 rows)
=> SELECT site,date,num_hits,an_rank()
OVER (PARTITION BY site ORDER BY num_hits DESC)
AS an_rank FROM hits;
site | date | num_hits | an_rank
-----------------+------------+----------+---------
www.example.com | 2012-01-03 | 123 | 1
www.example.com | 2012-01-01 | 123 | 1
www.example.com | 2012-01-04 | 112 | 3
www.example.com | 2012-01-02 | 97 | 4
www.vertica.com | 2012-01-02 | 503695 | 1
www.vertica.com | 2012-01-03 | 490387 | 2
www.vertica.com | 2012-01-01 | 343435 | 3
(7 rows)
As with the built-in RANK analytic function, rows that have the same value for the ORDER BY column (num_hits in this example) have the same rank, but the rank continues to increase, so that the next row that has a different ORDER BY key gets a rank value based on the number of rows that preceded it.
AnalyticFunction implementation
The following code defines an AnalyticFunction
subclass named Rank
. It is based on example code distributed in the examples directory of the SDK.
In this example, the processPartition()
method does not actually read any of the data from the input row; it just advances through the rows. It does not need to read data; it just counts the rows that have been read and determine whether those rows have the same ORDER BY key as the previous row. If the current row is a new ORDER BY key, then the rank is set to the total number of rows that have been processed. If the current row has the same ORDER BY value as the previous row, then the rank remains the same.
Note that the function has a top-level try-catch block. All of your UDx functions should always have one to prevent stray exceptions from being passed back to Vertica (if you run the function unfenced) or the side process.
AnalyticFunctionFactory implementation
The following code defines the AnalyticFunctionFactory
that corresponds with the Rank
analytic function.
The first method defined by the RankFactory
subclass, getPrototype()
, sets the data type of the return value. Because the Rank UDAnF does not read input, it does not define any arguments by calling methods on the ColumnTypes
object passed in the argTypes
parameter.
The next method is getReturnType()
. If your function returns a data type that needs to define a width or precision, your implementation of the getReturnType()
method calls a method on the SizedColumnType
object passed in as a parameter to tell Vertica the width or precision. Rank
returns a fixed-width data type (an INTEGER) so it does not need to set the precision or width of its output; it just calls addInt()
to report its output data type.
Finally, RankFactory
defines the createAnalyticFunction()
method that returns an instance of the AnalyticFunction
class that Vertica can call. This code is mostly boilerplate. All you need to do is add the name of your analytic function class in the call to vt_createFuncObj()
, which takes care of allocating the object for you.