Python example: explode

The following example details a UDTF that accepts a one-dimensional array as input and outputs each element of the array as a separate row, similar to functions commonly known as EXPLODE.

The following example details a UDTF that accepts a one-dimensional array as input and outputs each element of the array as a separate row, similar to functions commonly known as EXPLODE. Because this UDTF always accepts one array as input, you can explicitly mark it as a one-to-many UDTF in the factory function, which helps Vertica optimize query plans and allows users to write SELECT queries that include any expression and do not require an OVER clause.

The complete source code is in /opt/vertica/sdk/examples/python/TransformFunctions.py.

Loading and using the example

Load the library and create the transform function as follows:

=> CREATE OR REPLACE LIBRARY PyTransformFunctions AS '/opt/vertica/sdk/examples/python/TransformFunctions.py' LANGUAGE 'Python';
CREATE LIBRARY

=> CREATE TRANSFORM FUNCTION py_explode AS LANGUAGE 'Python' NAME 'ExplodeFactory' LIBRARY TransformFunctions;
CREATE TRANSFORM FUNCTION

You can then use the function in SQL statements, for example:

=> CREATE TABLE reviews (id INTEGER PRIMARY KEY, sentiment VARCHAR(16), review ARRAY[VARCHAR(16), 32]);
CREATE TABLE

=> INSERT INTO reviews VALUES(1, 'Very Negative', string_to_array('This was the worst restaurant I have ever had the misfortune of eating at' USING PARAMETERS collection_delimiter = ' ')),
    (2, 'Neutral', string_to_array('This restaurant is pretty decent' USING PARAMETERS collection_delimiter = ' ')),
    (3, 'Very Positive', string_to_array('Best restaurant in the Western Hemisphere' USING PARAMETERS collection_delimiter = ' ')),
    (4, 'Positive', string_to_array('Prices low for the area' USING PARAMETERS collection_delimiter = ' '));
OUTPUT
--------
4
(1 row)

=> COMMIT;
COMMIT

=> SELECT id, sentiment, py_explode(review) FROM reviews; --no OVER clause because "is_exploder = True", see below
 id |   sentiment   |  element
----+---------------+------------
  1 | Very Negative | This
  1 | Very Negative | was
  1 | Very Negative | the
  1 | Very Negative | worst
  1 | Very Negative | restaurant
  1 | Very Negative | I
  1 | Very Negative | have
...
  3 | Very Positive | Western
  3 | Very Positive | Hemisphere
  4 | Positive      | Prices
  4 | Positive      | low
  4 | Positive      | for
  4 | Positive      | the
  4 | Positive      | area
(30 rows)

Setup

All Python UDxs must import the Vertica SDK library:

import vertica_sdk

Factory implementation

The following code shows the ExplodeFactory class.

class ExplodeFactory(vertica_sdk.TransformFunctionFactory):
    def getPrototype(self, srv_interface, arg_types, return_type):
        arg_types.addAny()
        return_type.addAny()

    def getTransformFunctionProperties(cls, server_interface, arg_types):
        props = vertica_sdk.TransformFunctionFactory.Properties()
        props.is_exploder = True
        return props

    def getReturnType(self, srv_interface, arg_types, return_type):
        if arg_types.getColumnCount() != 1:
            srv_interface.reportError(1, 'explode UDT should take exactly one argument')
        if not arg_types.getColumnType(0).isArrayType():
            srv_interface.reportError(2, 'Argument to explode UDT should be an ARRAY')

        return_type.addColumn(arg_types.getColumnType(0).getElementType(), 'element')

    def createTransformFunction(cls, server_interface):
        return Explode()

In this example:

  • The getTransformFunctionProperties method sets the is_exploder class property to True, explicitly marking the UDTF as one-to-many. This indicates that the function uses an OVER(PARTITION ROW) clause by default and thereby removes the need to specify an OVER clause when calling the UDTF. With is_exploder set to True, users can write SELECT queries that include any expression, unlike queries that use PARTITION BEST or PARTITION NODES.

  • The getReturnType method verifies that the input contains only one argument and is of type ARRAY. The method also sets the return type to that of the elements in the input array.

Function implementation

The following code shows the Explode class:

class Explode(vertica_sdk.TransformFunction):
"""
Transform function that turns an array into one row
for each array element.
"""
    def processPartition(self, srv_interface, arg_reader, res_writer):
        while True:
            arr = arg_reader.getArrayReader(0)
            for elt in arr:
                res_writer.copyRow(elt)
                res_writer.next()
            if not arg_reader.next():
                break;

The processPartition() method accepts a single row of the input data, processes each element of the array, and then breaks the loop. The method accesses the elements of the array with an ArrayReader object and then uses an ArrayWriter object to write each element of the array to a separate output row. The UDTF calls processPartition() for each row of the input data.

See also