C++ example: rank
The Rank
analytic function ranks rows based on how they are ordered. A Java version of this UDx is included in /opt/vertica/sdk/examples
.
Loading and using the example
The following example shows how to load the function into Vertica. It assumes that the AnalyticFunctions.so
library that contains the function has been copied to the dbadmin user's home directory on the initiator node.
=> CREATE LIBRARY AnalyticFunctions AS '/home/dbadmin/AnalyticFunctions.so';
CREATE LIBRARY
=> CREATE ANALYTIC FUNCTION an_rank AS LANGUAGE 'C++'
NAME 'RankFactory' LIBRARY AnalyticFunctions;
CREATE ANALYTIC FUNCTION
An example of running this rank function, named an_rank
, is:
=> SELECT * FROM hits;
site | date | num_hits
-----------------+------------+----------
www.example.com | 2012-01-02 | 97
www.vertica.com | 2012-01-01 | 343435
www.example.com | 2012-01-01 | 123
www.example.com | 2012-01-04 | 112
www.vertica.com | 2012-01-02 | 503695
www.vertica.com | 2012-01-03 | 490387
www.example.com | 2012-01-03 | 123
(7 rows)
=> SELECT site,date,num_hits,an_rank()
OVER (PARTITION BY site ORDER BY num_hits DESC)
AS an_rank FROM hits;
site | date | num_hits | an_rank
-----------------+------------+----------+---------
www.example.com | 2012-01-03 | 123 | 1
www.example.com | 2012-01-01 | 123 | 1
www.example.com | 2012-01-04 | 112 | 3
www.example.com | 2012-01-02 | 97 | 4
www.vertica.com | 2012-01-02 | 503695 | 1
www.vertica.com | 2012-01-03 | 490387 | 2
www.vertica.com | 2012-01-01 | 343435 | 3
(7 rows)
As with the built-in RANK analytic function, rows that have the same value for the ORDER BY column (num_hits in this example) have the same rank, but the rank continues to increase, so that the next row that has a different ORDER BY key gets a rank value based on the number of rows that preceded it.
AnalyticFunction implementation
The following code defines an AnalyticFunction
subclass named Rank
. It is based on example code distributed in the examples directory of the SDK.
/**
* User-defined analytic function: Rank - works mostly the same as SQL-99 rank
* with the ability to define as many order by columns as desired
*
*/
class Rank : public AnalyticFunction
{
virtual void processPartition(ServerInterface &srvInterface,
AnalyticPartitionReader &inputReader,
AnalyticPartitionWriter &outputWriter)
{
// Always use a top-level try-catch block to prevent exceptions from
// leaking back to Vertica or the fenced-mode side process.
try {
rank = 1; // The rank to assign a row
rowCount = 0; // Number of rows processed so far
do {
rowCount++;
// Do we have a new order by row?
if (inputReader.isNewOrderByKey()) {
// Yes, so set rank to the total number of rows that have been
// processed. Otherwise, the rank remains the same value as
// the previous iteration.
rank = rowCount;
}
// Write the rank
outputWriter.setInt(0, rank);
// Move to the next row of the output
outputWriter.next();
} while (inputReader.next()); // Loop until no more input
} catch(exception& e) {
// Standard exception. Quit.
vt_report_error(0, "Exception while processing partition: %s", e.what());
}
}
private:
vint rank, rowCount;
};
In this example, the processPartition()
method does not actually read any of the data from the input row; it just advances through the rows. It does not need to read data; it just counts the rows that have been read and determine whether those rows have the same ORDER BY key as the previous row. If the current row is a new ORDER BY key, then the rank is set to the total number of rows that have been processed. If the current row has the same ORDER BY value as the previous row, then the rank remains the same.
Note that the function has a top-level try-catch block. All of your UDx functions should always have one to prevent stray exceptions from being passed back to Vertica (if you run the function unfenced) or the side process.
AnalyticFunctionFactory implementation
The following code defines the AnalyticFunctionFactory
that corresponds with the Rank
analytic function.
class RankFactory : public AnalyticFunctionFactory
{
virtual void getPrototype(ServerInterface &srvInterface,
ColumnTypes &argTypes, ColumnTypes &returnType)
{
returnType.addInt();
}
virtual void getReturnType(ServerInterface &srvInterface,
const SizedColumnTypes &inputTypes,
SizedColumnTypes &outputTypes)
{
outputTypes.addInt();
}
virtual AnalyticFunction *createAnalyticFunction(ServerInterface
&srvInterface)
{ return vt_createFuncObj(srvInterface.allocator, Rank); }
};
The first method defined by the RankFactory
subclass, getPrototype()
, sets the data type of the return value. Because the Rank UDAnF does not read input, it does not define any arguments by calling methods on the ColumnTypes
object passed in the argTypes
parameter.
The next method is getReturnType()
. If your function returns a data type that needs to define a width or precision, your implementation of the getReturnType()
method calls a method on the SizedColumnType
object passed in as a parameter to tell Vertica the width or precision. Rank
returns a fixed-width data type (an INTEGER) so it does not need to set the precision or width of its output; it just calls addInt()
to report its output data type.
Finally, RankFactory
defines the createAnalyticFunction()
method that returns an instance of the AnalyticFunction
class that Vertica can call. This code is mostly boilerplate. All you need to do is add the name of your analytic function class in the call to vt_createFuncObj()
, which takes care of allocating the object for you.