ParserFactory class

If you write a parser, you must also write a factory to produce parser instances. To do so, subclass the ParserFactory class.

Parser factories are singletons. Your subclass must be stateless, with no fields containing data. Your subclass also must not modify any global variables.

The ParserFactory class defines the following methods. Your subclass must override the prepare() method. It may override the other methods.

Setting up

Vertica calls plan() once on the initiator node to perform the following tasks:

  • Check any parameters that have been passed from the function call in the COPY statement and provide error messages if there are any issues. You read the parameters by getting a ParamReader object from the instance of ServerInterface passed into your plan() method.

  • Store any information that the individual hosts need in order to parse the data. For example, you could store parameters in the PlanContext instance passed in through the planCtxt parameter. The plan() method runs only on the initiator node, and the prepare() method runs on each host that parses data. Therefore, this object is the only means of communication between them.

    You store data in the PlanContext by getting a ParamWriter object from the getWriter() method. You then write parameters by calling methods on the ParamWriter such as setString.
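The two tasks above can be sketched as follows. This is a minimal illustration, not SDK code: the ServerInterface, ParamReader, ParamWriter, and PlanContext classes below are small stand-ins that only mirror the method names mentioned in this section so that the example runs outside Vertica. The getParamReader(), containsParameter(), getString(), and getReader() methods, the DelimitedParserFactory class, the "delimiter" parameter, and the use of ValueError for error reporting are all assumptions made for illustration.

```python
# Stand-in classes: NOT the vertica_sdk API. They mirror only the method
# names used in this section so the sketch can run outside Vertica.
class ParamReader:
    def __init__(self, params):
        self._params = dict(params)

    def containsParameter(self, name):
        return name in self._params

    def getString(self, name):
        return self._params[name]


class ParamWriter:
    def __init__(self, store):
        self._store = store

    def setString(self, name, value):
        self._store[name] = str(value)


class PlanContext:
    def __init__(self):
        self._store = {}

    def getWriter(self):
        return ParamWriter(self._store)

    def getReader(self):
        return ParamReader(self._store)


class ServerInterface:
    def __init__(self, params):
        self._params = params

    def getParamReader(self):
        return ParamReader(self._params)


class DelimitedParserFactory:
    """Hypothetical factory. It has no fields, so it stays stateless."""

    def plan(self, srvInterface, perColumnParamReader, planCtxt):
        params = srvInterface.getParamReader()
        # Use a default when the caller omits the parameter.
        if params.containsParameter("delimiter"):
            delimiter = params.getString("delimiter")
        else:
            delimiter = ","
        # Reject bad input on the initiator, before any node starts parsing.
        if len(delimiter) != 1:
            raise ValueError("delimiter must be exactly one character")
        # The plan context carries the value to the per-node prepare() calls.
        planCtxt.getWriter().setString("delimiter", delimiter)


ctx = PlanContext()
DelimitedParserFactory().plan(ServerInterface({"delimiter": "|"}), None, ctx)
stored = ctx.getReader().getString("delimiter")
```

Validating in plan() means a bad parameter fails the COPY statement once, on the initiator, rather than separately on every node.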

Creating parsers

Vertica calls prepare() on each node to create and initialize your parser, using data stored by the plan() method.
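A rough sketch of this handoff, under stated assumptions: the PlanContext class here is a simplified stand-in (a single object acting as both reader and writer), and DelimitedParser, DelimitedParserFactory, and the "delimiter" key are hypothetical names. The loop only simulates Vertica calling prepare() once per node.

```python
class PlanContext:
    """Stand-in for the SDK's PlanContext; not the real class."""

    def __init__(self):
        self._values = {}

    def getWriter(self):
        return self

    def getReader(self):
        return self

    def setString(self, name, value):
        self._values[name] = value

    def getString(self, name):
        return self._values[name]


class DelimitedParser:
    """Hypothetical parser. Per-load state lives here, not in the factory."""

    def __init__(self, delimiter):
        self.delimiter = delimiter

    def split(self, line):
        return line.split(self.delimiter)


class DelimitedParserFactory:
    """Hypothetical stateless factory."""

    def plan(self, srvInterface, perColumnParamReader, planCtxt):
        planCtxt.getWriter().setString("delimiter", "|")

    def prepare(self, srvInterface, perColumnParamReader, planCtxt, returnType):
        # Everything the parser needs is read back from the plan context.
        delimiter = planCtxt.getReader().getString("delimiter")
        return DelimitedParser(delimiter)


factory = DelimitedParserFactory()
ctx = PlanContext()
factory.plan(None, None, ctx)  # simulated: once, on the initiator node
# Simulated: once per node, each call yielding its own parser instance.
parsers = [factory.prepare(None, None, ctx, None) for _ in range(3)]
fields = parsers[0].split("a|b|c")
```

Because the factory keeps no fields, every value a parser depends on has to travel through the plan context, which is what lets a single factory instance serve every node.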

Defining parameters

Implement getParameterType() to define the names and types of parameters that your parser uses. Vertica uses this information to warn callers about unknown or missing parameters. Vertica ignores unknown parameters and uses default values for missing parameters. While you should define the types and parameters for your function, you are not required to override this method.
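As an illustration, a parameter declaration might look like the sketch below. The SizedColumnTypes class is a stand-in that only records what is declared; its addVarchar(length, name) and addInt(name) methods are assumed to mirror the SDK's naming, and the "delimiter" and "skip_lines" parameters are hypothetical.

```python
class SizedColumnTypes:
    """Stand-in that records declared entries; not the SDK class."""

    def __init__(self):
        self.columns = []

    def addVarchar(self, length, name):
        self.columns.append((name, "VARCHAR", length))

    def addInt(self, name):
        self.columns.append((name, "INTEGER", None))


class DelimitedParserFactory:
    """Hypothetical factory declaring two parameters."""

    def getParameterType(self, srvInterface, parameterTypes):
        # One entry per parameter that the parser accepts in COPY.
        parameterTypes.addVarchar(1, "delimiter")
        parameterTypes.addInt("skip_lines")


declared = SizedColumnTypes()
DelimitedParserFactory().getParameterType(None, declared)
```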

Defining parser outputs

Implement getParserReturnType() to define the data types of the table columns that the parser outputs. If applicable, getParserReturnType() also defines the size, precision, or scale of the data types. Usually, this method reads the data types of the output table from the argTypes and perColumnParamReader arguments and verifies that the parser can output those data types. If it can, the method declares each output column by calling methods on the SizedColumnTypes object passed in the returnType argument. In addition to the data type of the output column, your method should also specify any additional information about the column's data type:

  • For binary and string data types (such as CHAR, VARCHAR, and LONG VARBINARY), specify the maximum length.

  • For NUMERIC types, specify the precision and scale.

  • For Time/Timestamp types (with or without time zone), specify the precision (-1 means unspecified).

  • For ARRAY types, specify the maximum number of elements.

  • For all other types, no length or precision specification is required.
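The rules above can be sketched as a method that copies each target column into returnType with its size information, and rejects types the parser cannot produce. This is an assumption-laden illustration: the SizedColumnTypes class is a stand-in that stores plain tuples, its add* methods are assumed to mirror the SDK's naming, the factory name is hypothetical, and ValueError stands in for whatever error reporting your SDK language uses.

```python
class SizedColumnTypes:
    """Stand-in for the SDK class; stores (name, type, size info) tuples."""

    def __init__(self):
        self.columns = []

    def addInt(self, name):
        self.columns.append((name, "INTEGER", ()))

    def addVarchar(self, length, name):
        self.columns.append((name, "VARCHAR", (length,)))

    def addNumeric(self, precision, scale, name):
        self.columns.append((name, "NUMERIC", (precision, scale)))

    def addTimestamp(self, precision, name):
        self.columns.append((name, "TIMESTAMP", (precision,)))


class DelimitedParserFactory:
    """Hypothetical factory supporting four column types."""

    def getParserReturnType(self, srvInterface, perColumnParamReader,
                            planCtxt, argTypes, returnType):
        for name, type_name, size in argTypes.columns:
            if type_name == "INTEGER":
                returnType.addInt(name)                         # no size needed
            elif type_name == "VARCHAR":
                returnType.addVarchar(size[0], name)            # maximum length
            elif type_name == "NUMERIC":
                returnType.addNumeric(size[0], size[1], name)   # precision, scale
            elif type_name == "TIMESTAMP":
                returnType.addTimestamp(size[0], name)          # -1 = unspecified
            else:
                raise ValueError(f"unsupported column type: {type_name}")


# Simulated target table: id INTEGER, name VARCHAR(80),
# price NUMERIC(10,2), created TIMESTAMP.
target = SizedColumnTypes()
target.addInt("id")
target.addVarchar(80, "name")
target.addNumeric(10, 2, "price")
target.addTimestamp(-1, "created")

output = SizedColumnTypes()
DelimitedParserFactory().getParserReturnType(None, None, None, target, output)
```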

Supporting cooperative parse

To support Cooperative parse, implement prepareChunker() and return an instance of your UDChunker subclass. If isChunkerApportionable() returns true, then it is an error for this method to return null.

Cooperative parse is currently supported only in the C++ API.

Supporting apportioned load

To support Apportioned load, your parser, chunker, or both must support apportioning. To indicate that the parser can apportion a load, implement isParserApportionable() and return true. To indicate that the chunker can apportion a load, implement isChunkerApportionable() and return true.

The isChunkerApportionable() method takes a ServerInterface as an argument, so you have access to the parameters supplied in the COPY statement. You might need this information if the user can specify a record delimiter, for example. Return true from this method if and only if the factory can create a chunker for this input.

API

In C++, the ParserFactory API provides the following methods for extension by subclasses:

virtual void plan(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader, PlanContext &planCtxt);

virtual UDParser * prepare(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader,
            PlanContext &planCtxt, const SizedColumnTypes &returnType)=0;

virtual void getParameterType(ServerInterface &srvInterface, SizedColumnTypes &parameterTypes);

virtual void getParserReturnType(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader,
            PlanContext &planCtxt, const SizedColumnTypes &argTypes,
            SizedColumnTypes &returnType);

virtual bool isParserApportionable();

// C++ API only:
virtual bool isChunkerApportionable(ServerInterface &srvInterface);

virtual UDChunker * prepareChunker(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader,
            PlanContext &planCtxt, const SizedColumnTypes &returnType);

If you are using Apportioned load to divide a single input into multiple load streams, implement isParserApportionable() and/or isChunkerApportionable() and return true. Returning true from these methods does not guarantee that Vertica will apportion the load. However, returning false from both guarantees that Vertica will not try to do so.

If you are using Cooperative parse, implement prepareChunker() and return an instance of your UDChunker subclass. Cooperative parse is supported only for the C++ API.

Vertica calls the prepareChunker() method only for unfenced functions. This method is not available when you use the function in fenced mode.

If you want your chunker to be available for apportioned load, implement isChunkerApportionable() and return true.

After creating your ParserFactory, you must register it with the RegisterFactory macro.

In Java, the ParserFactory API provides the following methods for extension by subclasses:

public void plan(ServerInterface srvInterface, PerColumnParamReader perColumnParamReader, PlanContext planCtxt)
    throws UdfException;

public abstract UDParser prepare(ServerInterface srvInterface, PerColumnParamReader perColumnParamReader,
                PlanContext planCtxt, SizedColumnTypes returnType)
    throws UdfException;

public void getParameterType(ServerInterface srvInterface, SizedColumnTypes parameterTypes);

public void getParserReturnType(ServerInterface srvInterface, PerColumnParamReader perColumnParamReader,
                PlanContext planCtxt, SizedColumnTypes argTypes, SizedColumnTypes returnType)
    throws UdfException;

In Python, the ParserFactory API provides the following methods for extension by subclasses:

class PyParserFactory(vertica_sdk.ParserFactory):
    def __init__(self):
        pass
    def plan(self):
        pass
    def prepare(self, srvInterface, perColumnParamReader, planCtxt, returnType):
        # Implement this method to create and return your UDParser instance.
        pass