ParserFactory class
If you write a parser, you must also write a factory to produce parser instances. To do so, subclass the ParserFactory class.
Parser factories are singletons. Your subclass must be stateless, with no fields containing data. Your subclass also must not modify any global variables.
The ParserFactory class defines the following methods. Your subclass must override the prepare() method. It may override the other methods.
Setting up
Vertica calls plan() once on the initiator node to perform the following tasks:
-
Check any parameters that have been passed from the function call in the COPY statement, and report an error if there are any issues. You read the parameters by getting a ParamReader object from the instance of ServerInterface passed into your plan() method.
-
Store any information that the individual hosts need in order to parse the data. For example, you could store parameters in the PlanContext instance passed in through the planCtxt parameter. The plan() method runs only on the initiator node, and the prepare() method runs on each host reading from a data source. Therefore, this object is the only means of communication between them.
You store data in the PlanContext by getting a ParamWriter object from the getWriter() method. You then write parameters by calling methods on the ParamWriter, such as setString.
Note: ParamWriter offers only the ability to store simple data types. For complex types, you need to serialize the data in some manner and store it as a string or long string.
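The hand-off from plan() to prepare() can be sketched as follows. This is a minimal illustration that compiles without the Vertica SDK, so it relies on assumptions: ParamStore and the PlanContext struct below are simplified stand-ins that only mimic the role of the SDK classes of similar names, and the parameter names (delimiter, column_names), the ParserConfig struct, and the serializeList/deserializeList helpers are hypothetical. It also shows one possible way to serialize a complex value (a list of strings) into a single string.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ParamReader/ParamWriter: stores simple values only.
struct ParamStore {
    std::map<std::string, std::string> values;
    void setString(const std::string &name, const std::string &value) { values[name] = value; }
    bool containsParameter(const std::string &name) const { return values.count(name) > 0; }
    const std::string &getString(const std::string &name) const { return values.at(name); }
};

// Hypothetical stand-in for PlanContext: the only state shared between
// plan() (initiator node) and prepare() (every node).
struct PlanContext {
    ParamStore store;
    ParamStore &getWriter() { return store; }
    const ParamStore &getReader() const { return store; }
};

// Hypothetical settings a parser instance would be constructed with.
struct ParserConfig {
    char delimiter;
    std::vector<std::string> columnNames;
};

// Serialize a list as length-prefixed fields ("3:a:b1:c") so that the
// items themselves may contain any character, including ':'.
std::string serializeList(const std::vector<std::string> &items) {
    std::ostringstream out;
    for (const std::string &item : items) out << item.size() << ':' << item;
    return out.str();
}

std::vector<std::string> deserializeList(const std::string &data) {
    std::vector<std::string> items;
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::size_t colon = data.find(':', pos);
        if (colon == std::string::npos) throw std::runtime_error("malformed list");
        std::size_t len = std::stoul(data.substr(pos, colon - pos));
        if (colon + 1 + len > data.size()) throw std::runtime_error("malformed list");
        items.push_back(data.substr(colon + 1, len));
        pos = colon + 1 + len;
    }
    return items;
}

// plan(): validate the caller's parameters, then store what every node needs.
void plan(const ParamStore &callerParams, const std::vector<std::string> &columnNames,
          PlanContext &planCtxt) {
    if (!callerParams.containsParameter("delimiter"))
        throw std::invalid_argument("missing parameter: delimiter");
    const std::string &delimiter = callerParams.getString("delimiter");
    if (delimiter.size() != 1)
        throw std::invalid_argument("delimiter must be a single character");
    ParamStore &writer = planCtxt.getWriter();
    writer.setString("delimiter", delimiter);
    writer.setString("column_names", serializeList(columnNames));
}

// prepare(): runs on each node and sees only what plan() stored.
ParserConfig prepare(const PlanContext &planCtxt) {
    const ParamStore &reader = planCtxt.getReader();
    ParserConfig config;
    config.delimiter = reader.getString("delimiter")[0];
    config.columnNames = deserializeList(reader.getString("column_names"));
    return config;
}
```

Validating in plan() rather than in prepare() means a bad parameter is rejected once, on the initiator node, before any node starts creating parsers.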
Creating parsers
Vertica calls prepare() on each node to create and initialize your parser, using data stored by the plan() method.
Defining parameters
Implement getParameterType() to define the names and types of parameters that your parser uses. Vertica uses this information to warn callers about unknown or missing parameters. Vertica ignores unknown parameters and uses default values for missing parameters. While you should define the types and parameters for your function, you are not required to override this method.
Defining parser outputs
Implement getParserReturnType() to define the data types of the table columns that the parser outputs. If applicable, getParserReturnType() also defines the size, precision, or scale of the data types. Usually, this method reads the data types of the output table from the argTypes and perColumnParamReader arguments and verifies that it can output the appropriate data types. If getParserReturnType() is prepared to output the data types, it calls methods on the SizedColumnTypes object passed in the returnType argument. In addition to the data type of the output column, your method should also specify any additional information about the column's data type:
-
For binary and string data types (such as CHAR, VARCHAR, and LONG VARBINARY), specify the maximum length.
-
For NUMERIC types, specify the precision and scale.
-
For Time/Timestamp types (with or without time zone), specify the precision (-1 means unspecified).
-
For all other types, no length or precision specification is required.
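The rules above can be sketched as a function that copies each requested column into the return type along with its size information. This is only an illustration under stated assumptions: the BaseType enum, the ColumnType struct, and this SizedColumnTypes struct are simplified stand-ins written for the example, not the SDK classes, and the function takes fewer arguments than the real getParserReturnType().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified type model (not the SDK's).
enum class BaseType { Int, Varchar, Numeric, Timestamp };

struct ColumnType {
    BaseType base;
    std::string name;
    int length;     // maximum length, string and binary types only
    int precision;  // NUMERIC or time precision; -1 means unspecified
    int scale;      // NUMERIC only
};

// Stand-in for SizedColumnTypes: each add method records a column together
// with the extra information its data type requires.
struct SizedColumnTypes {
    std::vector<ColumnType> cols;

    void add(BaseType base, const std::string &name, int length, int precision, int scale) {
        ColumnType c;
        c.base = base;
        c.name = name;
        c.length = length;
        c.precision = precision;
        c.scale = scale;
        cols.push_back(c);
    }
    void addInt(const std::string &name) { add(BaseType::Int, name, 0, -1, 0); }
    void addVarchar(int length, const std::string &name) { add(BaseType::Varchar, name, length, -1, 0); }
    void addNumeric(int precision, int scale, const std::string &name) {
        add(BaseType::Numeric, name, 0, precision, scale);
    }
    void addTimestamp(int precision, const std::string &name) {
        add(BaseType::Timestamp, name, 0, precision, 0);
    }
};

// Verify that every requested column can be produced, then describe it in
// returnType with the size information its type needs.
void getParserReturnType(const SizedColumnTypes &argTypes, SizedColumnTypes &returnType) {
    for (const ColumnType &col : argTypes.cols) {
        switch (col.base) {
        case BaseType::Int:
            returnType.addInt(col.name);  // no length or precision required
            break;
        case BaseType::Varchar:
            if (col.length <= 0)
                throw std::invalid_argument("string column needs a maximum length: " + col.name);
            returnType.addVarchar(col.length, col.name);
            break;
        case BaseType::Numeric:
            returnType.addNumeric(col.precision, col.scale, col.name);
            break;
        case BaseType::Timestamp:
            returnType.addTimestamp(col.precision, col.name);  // -1 is allowed
            break;
        }
    }
}
```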
Supporting cooperative parse
To support cooperative parse, implement prepareChunker() and return an instance of your UDChunker subclass. If isChunkerApportionable() returns true, then it is an error for this method to return null.
Cooperative parse is currently supported only in the C++ API.
Supporting apportioned load
To support apportioned load, your parser, chunker, or both must support apportioning. To indicate that the parser can apportion a load, implement isParserApportionable() and return true. To indicate that the chunker can apportion a load, implement isChunkerApportionable() and return true.
The isChunkerApportionable() method takes a ServerInterface as an argument, so you have access to the parameters supplied in the COPY statement. You might need this information if the user can specify a record delimiter, for example. Return true from this method if and only if the factory can create a chunker for this input.
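One way to keep isChunkerApportionable() and prepareChunker() consistent is to route both through a single check, so that the first never returns true when the second would return null. The sketch below illustrates that idea only. It assumes simplified stand-ins: Params replaces the parameters read through ServerInterface, Chunker replaces a UDChunker subclass, and the record_terminator parameter and the canChunk() helper are hypothetical. The real prepareChunker() returns a raw UDChunker pointer and takes more arguments.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for the parameters supplied in the COPY statement.
using Params = std::map<std::string, std::string>;

// Stand-in for a UDChunker subclass that splits input on one character.
struct Chunker {
    char recordDelimiter;
};

// Single decision point: this chunker handles only single-character record
// terminators (or the default when the parameter is absent).
bool canChunk(const Params &params) {
    auto it = params.find("record_terminator");
    return it == params.end() || it->second.size() == 1;
}

// True if and only if the factory can create a chunker for this input.
bool isChunkerApportionable(const Params &params) {
    return canChunk(params);
}

// Returns null exactly when canChunk() is false, so a true result from
// isChunkerApportionable() is never followed by a null chunker.
std::unique_ptr<Chunker> prepareChunker(const Params &params) {
    if (!canChunk(params)) return nullptr;
    auto it = params.find("record_terminator");
    char delimiter = (it == params.end()) ? '\n' : it->second[0];
    return std::unique_ptr<Chunker>(new Chunker{delimiter});
}
```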
API
In C++, the ParserFactory API provides the following methods for extension by subclasses:
virtual void plan(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader, PlanContext &planCtxt);
virtual UDParser * prepare(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader,
        PlanContext &planCtxt, const SizedColumnTypes &returnType)=0;
virtual void getParameterType(ServerInterface &srvInterface, SizedColumnTypes &parameterTypes);
virtual void getParserReturnType(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader,
        PlanContext &planCtxt, const SizedColumnTypes &argTypes,
        SizedColumnTypes &returnType);
virtual bool isParserApportionable();
// C++ API only:
virtual bool isChunkerApportionable(ServerInterface &srvInterface);
virtual UDChunker * prepareChunker(ServerInterface &srvInterface, PerColumnParamReader &perColumnParamReader,
        PlanContext &planCtxt, const SizedColumnTypes &returnType);
If you are using apportioned load to divide a single input into multiple load streams, implement isParserApportionable() and/or isChunkerApportionable() and return true. Returning true from these methods does not guarantee that Vertica will apportion the load. However, returning false from both indicates that it will not try to do so.
If you are using cooperative parse, implement prepareChunker() and return an instance of your UDChunker subclass. Cooperative parse is supported only for the C++ API.
Vertica calls the prepareChunker() method only for unfenced functions. This method is not available when you use the function in fenced mode.
If you want your chunker to be available for apportioned load, implement isChunkerApportionable() and return true.
After creating your ParserFactory, you must register it with the RegisterFactory macro.
In Java, the ParserFactory API provides the following methods for extension by subclasses:
public void plan(ServerInterface srvInterface, PerColumnParamReader perColumnParamReader, PlanContext planCtxt)
        throws UdfException;
public abstract UDParser prepare(ServerInterface srvInterface, PerColumnParamReader perColumnParamReader,
        PlanContext planCtxt, SizedColumnTypes returnType)
        throws UdfException;
public void getParameterType(ServerInterface srvInterface, SizedColumnTypes parameterTypes);
public void getParserReturnType(ServerInterface srvInterface, PerColumnParamReader perColumnParamReader,
        PlanContext planCtxt, SizedColumnTypes argTypes, SizedColumnTypes returnType)
        throws UdfException;
In Python, the ParserFactory API provides the following methods for extension by subclasses:
class PyParserFactory(vertica_sdk.SourceFactory):
    def __init__(self):
        pass
    def plan(self):
        pass
    def prepareUDSources(self, srvInterface):
        # Implement this method to create the PyUDParser.
        pass