Python example: complex types JSON parser
The following example details a UDParser that takes a JSON object and parses it into complex types. For this example, the parser assumes the input data are arrays of rows with two integer fields. The input records should be separated by newline characters. If any row fields aren't specified by the JSON input, the function parses those fields as NULL.
The source code for this UDParser also contains a factory method for parsing rows that have an integer and an array of integer fields. The implementation of the parser is independent of the return type in the factory, so you can create factories with different return types that all point to the ComplexJsonParser()
class in the prepare()
method. The complete source code is in /opt/vertica/sdk/examples/python/UDParsers.py
.
Loading and using the example
Load the library and create the parser as follows:
=> CREATE OR REPLACE LIBRARY UDParsers AS '/home/dbadmin/examples/python/UDParsers.py' LANGUAGE 'Python';
=> CREATE PARSER ComplexJsonParser AS LANGUAGE 'Python' NAME 'ArrayJsonParserFactory' LIBRARY UDParsers;
You can now define a table and then use the JSON parser to load data into it, for example:
=> CREATE TABLE orders (a bool, arr array[row(a int, b int)]);
CREATE TABLE
=> COPY orders (arr) FROM STDIN WITH PARSER ComplexJsonParser();
[]
[{"a":1, "b":10}]
[{"a":1, "b":10}, {"a":null, "b":10}]
[{"a":1, "b":10},{"a":10, "b":20}]
[{"a":1, "b":10}, {"a":null, "b":null}]
[{"a":1, "b":2}, {"a":3, "b":4}, {"a":5, "b":6}, {"a":7, "b":8}, {"a":9, "b":10}, {"a":11, "b":12}, {"a":13, "b":14}]
\.
=> SELECT * FROM orders;
a | arr
--+--------------------------------------------------------------------------
| []
| [{"a":1,"b":10}]
| [{"a":1,"b":10},{"a":null,"b":10}]
| [{"a":1,"b":10},{"a":10,"b":20}]
| [{"a":1,"b":10},{"a":null,"b":null}]
| [{"a":1,"b":2},{"a":3,"b":4},{"a":5,"b":6},{"a":7,"b":8},{"a":9,"b":10},{"a":11,"b":12},{"a":13,"b":14}]
(6 rows)
Setup
All Python UDxs must import the Vertica SDK library. ComplexJsonParser()
also requires the json library.
import vertica_sdk
import json
Factory implementation
The prepare()
method instantiates and returns a parser:
def prepare(self, srvInterface, perColumnParamReader, planCtxt, returnType):
return ComplexJsonParser()
getParserReturnType()
declares that the return type must be an array of rows that each have two integer fields:
def getParserReturnType(self, rvInterface, perColumnParamReader, planCtxt, argTypes, returnType):
fieldTypes = vertica_sdk.SizedColumnTypes.makeEmpty()
fieldTypes.addInt('a')
fieldTypes.addInt('b')
returnType.addArrayType(vertica_sdk.SizedColumnTypes.makeRowType(fieldTypes, 'elements'), 64, 'arr')
Parser implementation
The process()
method reads in data with an InputBuffer
and then splits that input data on the newline character. The method then passes the processed data to the writeRows()
method. writeRows()
turns each data row into a JSON object, checks the type of that JSON object, and then writes the appropriate value or object to the output.
class ComplexJsonParser(vertica_sdk.UDParser):
leftover = ''
def process(self, srvInterface, input_buffer, input_state, writer):
input_buffer.setEncoding('utf-8')
self.count = 0
rec = self.leftover + input_buffer.read()
row_lst = rec.split('\n')
self.leftover = row_lst[-1]
self.writeRows(row_lst[:-1], writer)
if input_state == InputState.END_OF_FILE:
self.writeRows([self.leftover], writer)
return StreamState.DONE
else:
return StreamState.INPUT_NEEDED
def writeRows(self, str_lst, writer):
for s in str_lst:
stripped = s.strip()
if len(stripped) == 0:
return
elif len(stripped) > 1 and stripped[0:2] == "//":
continue
jsonValue = json.loads(stripped)
if type(jsonValue) is list:
writer.setArray(0, jsonValue)
elif jsonValue is None:
writer.setNull(0)
else:
writer.setRow(0, jsonValue)
writer.next()