Python example: multi-phase calculation
The following example shows a multi-phase transform function that computes the average of a column of numbers in an input table. It first defines two transform functions, and then defines a factory that creates the phases using them.
See AvgMultiPhaseUDT.py in the examples distribution for the complete code.
Loading and using the example
Create the library and function:
=> CREATE LIBRARY pylib_avg AS '/home/dbadmin/udx/AvgMultiPhaseUDT.py' LANGUAGE 'Python';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION myAvg AS NAME 'MyAvgFactory' LIBRARY pylib_avg;
CREATE TRANSFORM FUNCTION
You can then use the function in SELECT statements:
=> CREATE TABLE IF NOT EXISTS numbers(num FLOAT);
CREATE TABLE
=> COPY numbers FROM STDIN delimiter ',';
1
2
3
4
\.
=> SELECT myAvg(num) OVER() FROM numbers;
 average | ignored_rows | total_rows
---------+--------------+------------
     2.5 |            0 |          4
(1 row)
Setup
All Python UDxs must import the Vertica SDK. This example also imports another library.
Component transform functions
A multi-phase transform function must define two or more TransformFunction subclasses to be used in the phases. This example uses two classes: LocalCalculation, which does calculations on local partitions, and GlobalCalculation, which aggregates the results of all LocalCalculation instances to calculate a final result.
In both functions, the calculation is done in the processPartition() function.
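The split of work between the two classes can be sketched without the SDK. The plain-Python functions below are only an illustration of the arithmetic, not the code in AvgMultiPhaseUDT.py: the names local_calculation and global_calculation are hypothetical, and the sketch assumes that NULL, NaN, and infinite inputs are the rows counted as ignored, which the ignored_rows output column suggests but this page does not state.

```python
import math
from typing import Iterable, Optional, Tuple

# (local_sum, ignored_rows, total_rows) produced for one partition
Partial = Tuple[float, int, int]


def local_calculation(rows: Iterable[Optional[float]]) -> Partial:
    """First phase (sketch): summarize a single local partition."""
    local_sum, ignored, total = 0.0, 0, 0
    for value in rows:
        total += 1
        # Assumption: NULL (None), NaN, and infinite values are ignored.
        if value is None or math.isnan(value) or math.isinf(value):
            ignored += 1
        else:
            local_sum += value
    return local_sum, ignored, total


def global_calculation(
    partials: Iterable[Partial],
) -> Tuple[Optional[float], int, int]:
    """Second phase (sketch): combine the partial results of every partition."""
    global_sum, ignored, total = 0.0, 0, 0
    for part_sum, part_ignored, part_total in partials:
        global_sum += part_sum
        ignored += part_ignored
        total += part_total
    counted = total - ignored
    # Avoid dividing by zero when every row was ignored or there was no input.
    average = global_sum / counted if counted else None
    return average, ignored, total


# Two partitions holding the four rows loaded into the numbers table.
partitions = [[1.0, 2.0], [3.0, 4.0]]
result = global_calculation(local_calculation(p) for p in partitions)
print(result)  # (2.5, 0, 4)
```

Passing partial sums and counts between phases, rather than partial averages, keeps the final result correct when partitions differ in size.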
Multi-phase factory
A MultiPhaseTransformFunctionFactory ties together the individual functions as phases. The factory defines a TransformFunctionPhase for each function. Each phase defines createTransformFunction(), which calls the constructor for the corresponding TransformFunction, and getReturnType().
The first phase, LocalPhase, follows.
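A phase's getReturnType() is a natural place to validate the arguments and declare the columns the phase passes on. The following is a minimal, SDK-free sketch of that idea, not the code from AvgMultiPhaseUDT.py. The function name local_phase_return_type, the use of plain strings for types, and the intermediate column name local_sum are all hypothetical; the real method works with the SDK's type objects.

```python
from typing import List, Tuple


def local_phase_return_type(input_types: List[str]) -> List[Tuple[str, str]]:
    """Validate the input columns and describe the first phase's output.

    input_types is a hypothetical list of SQL type names, one per argument.
    """
    # The example function is called as myAvg(num) on a FLOAT column,
    # so anything other than exactly one FLOAT argument is rejected.
    if len(input_types) != 1 or input_types[0].upper() != "FLOAT":
        raise ValueError("myAvg accepts exactly one FLOAT argument")

    # Intermediate columns handed to the next phase (names are assumptions,
    # apart from ignored_rows and total_rows, which appear in the output).
    return [
        ("local_sum", "FLOAT"),
        ("ignored_rows", "INT"),
        ("total_rows", "INT"),
    ]


columns = local_phase_return_type(["FLOAT"])
print([name for name, _ in columns])  # ['local_sum', 'ignored_rows', 'total_rows']
```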
The second phase, GlobalPhase, does not check its inputs because the first phase already did. As with the first phase, createTransformFunction() merely constructs and returns the corresponding TransformFunction.
After defining the TransformFunctionPhase subclasses, the factory instantiates them and chains them together in getPhases().
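To illustrate what chaining phases means, here is a small plain-Python model of an ordered phase list and a runner that feeds each phase's output into the next. Everything in it (Phase, make_local, make_global, get_phases, run) is a hypothetical stand-in rather than the SDK's API, and it simplifies by treating all rows that reach a later phase as a single partition, as in the OVER() query in this example.

```python
from typing import Callable, List, Sequence

Rows = List[tuple]
RowFunction = Callable[[Rows], Rows]


class Phase:
    """Hypothetical stand-in for a phase: a name plus a function factory."""

    def __init__(self, name: str, create_function: Callable[[], RowFunction]):
        self.name = name
        self.create_function = create_function


def make_local() -> RowFunction:
    # Emits one (sum, count) row for the partition it is given.
    return lambda rows: [(sum(r[0] for r in rows), len(rows))]


def make_global() -> RowFunction:
    # Combines every (sum, count) row into a single (average,) row.
    def combine(rows: Rows) -> Rows:
        total = sum(r[0] for r in rows)
        count = sum(r[1] for r in rows)
        return [(total / count,)] if count else []

    return combine


def get_phases() -> List[Phase]:
    # Order matters: the output of each phase is the input of the next.
    return [Phase("local", make_local), Phase("global", make_global)]


def run(phases: Sequence[Phase], partitions: List[Rows]) -> Rows:
    first, *rest = phases
    # The first phase runs once per input partition.
    rows = [out for part in partitions for out in first.create_function()(part)]
    # Simplification: each later phase sees all prior output as one partition.
    for phase in rest:
        rows = phase.create_function()(rows)
    return rows


print(run(get_phases(), [[(1.0,), (2.0,)], [(3.0,), (4.0,)]]))  # [(2.5,)]
```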