Python example: explode
The following example details a UDTF that accepts a one-dimensional array as input and outputs each element of the array as a separate row, similar to functions commonly known as EXPLODE. Because this UDTF always accepts one array as input, you can explicitly mark it as a one-to-many UDTF in the factory function, which helps Vertica optimize query plans and allows users to write SELECT queries that include any expression and do not require an OVER clause.
The complete source code is in /opt/vertica/sdk/examples/python/TransformFunctions.py
.
Loading and using the example
Load the library and create the transform function as follows:
=> CREATE OR REPLACE LIBRARY PyTransformFunctions AS '/opt/vertica/sdk/examples/python/TransformFunctions.py' LANGUAGE 'Python';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION py_explode AS LANGUAGE 'Python' NAME 'ExplodeFactory' LIBRARY TransformFunctions;
CREATE TRANSFORM FUNCTION
You can then use the function in SQL statements, for example:
=> CREATE TABLE reviews (id INTEGER PRIMARY KEY, sentiment VARCHAR(16), review ARRAY[VARCHAR(16), 32]);
CREATE TABLE
=> INSERT INTO reviews VALUES(1, 'Very Negative', string_to_array('This was the worst restaurant I have ever had the misfortune of eating at' USING PARAMETERS collection_delimiter = ' ')),
(2, 'Neutral', string_to_array('This restaurant is pretty decent' USING PARAMETERS collection_delimiter = ' ')),
(3, 'Very Positive', string_to_array('Best restaurant in the Western Hemisphere' USING PARAMETERS collection_delimiter = ' ')),
(4, 'Positive', string_to_array('Prices low for the area' USING PARAMETERS collection_delimiter = ' '));
OUTPUT
--------
4
(1 row)
=> COMMIT;
COMMIT
=> SELECT id, sentiment, py_explode(review) FROM reviews; --no OVER clause because "is_exploder = True", see below
id | sentiment | element
----+---------------+------------
1 | Very Negative | This
1 | Very Negative | was
1 | Very Negative | the
1 | Very Negative | worst
1 | Very Negative | restaurant
1 | Very Negative | I
1 | Very Negative | have
...
3 | Very Positive | Western
3 | Very Positive | Hemisphere
4 | Positive | Prices
4 | Positive | low
4 | Positive | for
4 | Positive | the
4 | Positive | area
(30 rows)
Setup
All Python UDxs must import the Vertica SDK library:
import vertica_sdk
Factory implementation
The following code shows the ExplodeFactory
class.
class ExplodeFactory(vertica_sdk.TransformFunctionFactory):
def getPrototype(self, srv_interface, arg_types, return_type):
arg_types.addAny()
return_type.addAny()
def getTransformFunctionProperties(cls, server_interface, arg_types):
props = vertica_sdk.TransformFunctionFactory.Properties()
props.is_exploder = True
return props
def getReturnType(self, srv_interface, arg_types, return_type):
if arg_types.getColumnCount() != 1:
srv_interface.reportError(1, 'explode UDT should take exactly one argument')
if not arg_types.getColumnType(0).isArrayType():
srv_interface.reportError(2, 'Argument to explode UDT should be an ARRAY')
return_type.addColumn(arg_types.getColumnType(0).getElementType(), 'element')
def createTransformFunction(cls, server_interface):
return Explode()
In this example:
-
The
getTransformFunctionProperties
method sets theis_exploder
class property to True, explicitly marking the UDTF as one-to-many. This indicates that the function uses an OVER(PARTITION ROW) clause by default and thereby removes the need to specify an OVER clause when calling the UDTF. Withis_exploder
set to True, users can write SELECT queries that include any expression, unlike queries that use PARTITION BEST or PARTITION NODES. -
The
getReturnType
method verifies that the input contains only one argument and is of type ARRAY. The method also sets the return type to that of the elements in the input array.
Function implementation
The following code shows the Explode
class:
class Explode(vertica_sdk.TransformFunction):
"""
Transform function that turns an array into one row
for each array element.
"""
def processPartition(self, srv_interface, arg_reader, res_writer):
while True:
arr = arg_reader.getArrayReader(0)
for elt in arr:
res_writer.copyRow(elt)
res_writer.next()
if not arg_reader.next():
break;
The processPartition()
method accepts a single row of the input data, processes each element of the array, and then breaks the loop. The method accesses the elements of the array with an ArrayReader
object and then uses an ArrayWriter
object to write each element of the array to a separate output row. The UDTF calls processPartition()
for each row of the input data.