C++ SDK Documentation  23.3.0
Vertica::UDParser Class Referenceabstract

Parses an input stream into Vertica tuples (rows to be inserted into a table). More...

Inheritance diagram for Vertica::UDParser:
Inheritance graph
Collaboration diagram for Vertica::UDParser:
Collaboration graph

Public Member Functions

void cancelUDX (ServerInterface &srvInterface)
 
virtual void destroy (ServerInterface &srvInterface, SizedColumnTypes &returnType)
 
virtual void destroy (ServerInterface &srvInterface, SizedColumnTypes &returnType, SessionParamWriterMap &udSessionParams)
 
virtual RejectedRecord getRejectedRecord ()
 
bool isCanceled () const
 
virtual bool isReadyToCooperate (ServerInterface &srvInterface)
 
virtual void prepareToCooperate (ServerInterface &srvInterface, bool isLeader)
 
virtual StreamState process (ServerInterface &srvInterface, DataBuffer &input, InputState input_state)=0
 
virtual StreamState processWithMetadata (ServerInterface &srvInterface, DataBuffer &input, LengthBuffer &input_lengths, InputState input_state)
 
virtual void setup (ServerInterface &srvInterface, SizedColumnTypes &returnType)
 
virtual bool useSideChannel ()
 

Protected Member Functions

virtual void cancel (ServerInterface &srvInterface)
 
Portion getPortion ()
 

Protected Attributes

Portion portion
 
vint recordsAcceptedInBatch
 
bool seen_eob
 
std::string srcname
 
StreamWriterwriter
 

Detailed Description

Parses an input stream into Vertica tuples (rows to be inserted into a table).

A UDParser can be used with up to one UDSource and any number of UDfilters.

Note that it is UNSAFE to maintain pointers or references to any of these arguments (or any other argument passed by reference into any other function in this API) beyond the scope of the function call in question. For example, do not store a reference to the server interface or the input block on an instance variable. Vertica may free and replace these objects.

Member Function Documentation

virtual void Vertica::UDXObject::cancel ( ServerInterface srvInterface)
inlineprotectedvirtualinherited

Cancel callback to be overridden by the UDX. Called when the query running the UDX has been canceled.

Note
  • This method will be invoked at most once per UDX object. Once a UDX object has been canceled, it will never be un-canceled.
  • This method may be called from a separate thread, concurrently with other methods of this UDX object (but never the constructor or destructor). Implementations must be thread-safe with all methods of this UDX.
  • This method will be invoked for either an explicit user cancel, or in the event of an error during query execution.

Referenced by Vertica::UDXObject::cancelUDX().

void Vertica::UDXObject::cancelUDX ( ServerInterface srvInterface)
inlineinherited

Cancel callback invoked when the query running the UDX has been canceled. See cancel().

virtual void Vertica::UDParser::destroy ( ServerInterface srvInterface,
SizedColumnTypes returnType 
)
inlinevirtual

UDParser::destroy()

Will be invoked during query execution, after the last time that process() is called on this UDParser instance for a particular input file.

May optionally be overridden to perform tear-down/destruction.

See UDParser::setup() for a note about the restartability of UDParsers.

Portion Vertica::UDParser::getPortion ( )
inlineprotected

Return a description of the portion of the input source which is being processed by the active load stack. This can be called by the UDParser implementation to determine how much data to process.

This can be invoked during setup(), process() or destroy().

The Portion object returned will be a valid portion if any part of the load stack is processing portions. This UDParser instance will see input states START_OF_PORTION and/or END_OF_PORTION if it is the stack element directly processing portions.

virtual RejectedRecord Vertica::UDParser::getRejectedRecord ( )
inlinevirtual

Returns information about the rejected record

bool Vertica::UDXObject::isCanceled ( ) const
inlineinherited
Returns
true iff this UDX has been canceled
virtual bool Vertica::UDParser::isReadyToCooperate ( ServerInterface srvInterface)
inlinevirtual

UDParser::isReadyToCooperate()

Called after UDParser::prepareToCooperate(), returns false if this parser is not yet ready to cooperate. Once this method returns true the parser can begin to cooperate. Default implementation returns true, override if some preparation is required before the parser can cooperate (e.g. a certain # of rows must be skipped).

virtual void Vertica::UDParser::prepareToCooperate ( ServerInterface srvInterface,
bool  isLeader 
)
inlinevirtual

UDParser::prepareToCooperate()

Notification to this parser that it should prepare to share parsing input with another. This can only happen when a parser has an associated chunker. Default implementation does nothing.

virtual StreamState Vertica::UDParser::process ( ServerInterface srvInterface,
DataBuffer input,
InputState  input_state 
)
pure virtual

UDParser::process()

Reads data from the input stream and produces tuples to load. Vertica invokes this method repeatedly during query execution until it returns DONE, or until the query is canceled by the user.

Input: a stream of bytes.
Output: a stream of tuples.

Returns
INPUT_NEEDED
DONE
REJECT
KEEP_GOING
See Vertica::StreamState for details about return values.

On each invocation, process() is given an input buffer. It must read data from the input buffer, convert the data to fields and tuples, then write those tuples with writer. After it has consumed as much as it reasonably can (for example, after it consumes the last complete row in the input buffer), it must return INPUT_NEEDED to indicate that more data is needed, or DONE to indicate that it has completed parsing the input stream and will not read more bytes.

If input_state == END_OF_FILE, then the last byte in input is the last byte in the input stream. Returning INPUT_NEEDED does not result in any new input. In this case, process() must return DONE as soon as this operator finishes producing all output that it is going to produce.

Note that input might contain null bytes, if the source file contains null bytes. Additionally, note that input is NOT automatically null-terminated.

process() must not block indefinitely. If it cannot proceed for an extended period of time, it should return KEEP_GOING so that it can be called again. Failure to do this causes issues, including preventing the user from canceling the query.

Note that unless INPUT_NEEDED is returned, input is UNMODIFIED the next time process() is called. This means that pointers into the buffer are still valid. It also means that input.offset may be set. In general, process() code should assume that buffers start at input.buf[input.offset].

Row Rejection

process() can "reject" a row, which logs it using Vertica's rejected-rows mechanism. Rejected rows should not be emitted as tuples. All previous input must have been consumed by a call to process(). To reject a row, set input.offset to the position of the row, and return REJECT.

Referenced by processWithMetadata().

virtual StreamState Vertica::UDParser::processWithMetadata ( ServerInterface srvInterface,
DataBuffer input,
LengthBuffer input_lengths,
InputState  input_state 
)
inlinevirtual

UDParser::processWithMetadata()

Reads data from the input stream and record length metadata from the side channel and produces tuples to load. To implement processWithMetadata(), you must override useSideChannel() to return true. Vertica invokes this method repeately during query execution until it returns DONE, or until the query is canceled by the user.

Input: a stream of data bytes, and a stream of bytes containing message length metadata.
Output: a stream of tuples.

Returns
INPUT_NEEDED
DONE
REJECT
KEEP_GOING
See Vertica::StreamState for details about return values.

On each invocation, processWithMetadata() is given an input buffer and a length buffer. It must read data from the input buffer and metadata from the length buffer, convert the data to fields and tuples, then write the tuples with writer. After it has consumed as much as it reasonably can (for example, after it consumes the last complete row in the input buffer), it must return INPUT_NEEDED to indicate that more data is needed, or DONE to indicate that parsing this input stream is complete and it will not read any more bytes.

If input_state == END_OF_FILE, then the last byte in input is the last byte in the input stream. Returning INPUT_NEEDED does not result in any new input. In this case, processWithMetadata() must return DONE as soon as this operator finishes producing all output that it is going to produce.

Note that input might contain null bytes if the source file contains null bytes. Additionally, note that input are NOT automatically null-terminated.

processWithMetadata() must not block indefinitely. If it cannot proceed for an extended period of time, it should return KEEP_GOING so that it can be called again. Failure to do this causes issues, including preventing the user from canceling the query.

In general, processWithMetadata() code should assume that the DataBuffer starts at input.buf[input.offset] and the LengthBuffer starts at `input_lengths.buf[input_lengths.offset].

Row Rejection

processWithMetadata() can "reject" a row, which logs it using Vertica's rejected-rows mechanism. Rejected rows should not be emitted as tuples. All previous input must have been consumed by a call to process(). To reject a row, set input.offset to the position of the row, and return REJECT.

virtual void Vertica::UDParser::setup ( ServerInterface srvInterface,
SizedColumnTypes returnType 
)
inlinevirtual

UDParser::setup()

Will be invoked during query execution, prior to the first time that process() is called on this UDParser instance for a particular input source.

May optionally be overridden to perform setup/initialzation.

Note that UDParsers MUST BE RESTARTABLE! If loading large numbers of files, a given UDParsers may be re-used for multiple files. Vertica follows the worker-pool design pattern: At the start of COPY execution, several Parsers and several Filters are instantiated per node, by calling the corresponding prepare() method multiple times. Each Filter/Parser pair is then internally assigned to an initial Source (UDSource or internal). At that point, setup() is called; then process() is called until it is finished; then destroy() is called. If there are still sources in the pool waiting to be processed, then the UDFilter/UDSource pair will be given a second Source; setup() will be called a second time, then process() until it is finished, then destroy(). This repeats until all sources have been read.

virtual bool Vertica::UDParser::useSideChannel ( )
inlinevirtual

UDParser::useSideChannel()

Provides access to the side channel containing record length metadata, when the data source has metadata about record boundaries available in a structured format that is separate from the data payload, and it is retained throughout the load stack.

Override and return true to indicate that processWithMetadata() should be called instead of process().

Return false to implement process()

Returns
false by default.

Member Data Documentation

StreamWriter* Vertica::UDParser::writer
protected

Writer to write parsed tuples to. Has the same API as PartitionWriter, from the UDT framework.