Java SDK Documentation  24.2.0
com.vertica.sdk.UDParser Class Reference (abstract)

Public Member Functions

final void cancelUDX (ServerInterface srvInterface)
void destroy (ServerInterface srvInterface, SizedColumnTypes returnType) throws UdfException
void destroy (ServerInterface srvInterface, SizedColumnTypes returnType, SessionParamWriterMap udSessionParams) throws UdfException
int getRecordsAcceptedInBatch ()
RejectedRecord getRejectedRecord () throws UdfException
boolean getSeenEOB ()
StreamWriter getStreamWriter ()
void increRecordsAcceptedInBatch ()
boolean isCanceled ()
abstract StreamState process (ServerInterface srvInterface, DataBuffer input, InputState input_state) throws UdfException, DestroyInvocation
void setRecordsAcceptedInBatch (int i)
void setSeenEOB (Boolean b)
void setStreamWriter (StreamWriter writer)
void setup (ServerInterface srvInterface, SizedColumnTypes returnType) throws UdfException

Protected Member Functions

void cancel (ServerInterface srvInterface)

Protected Attributes

int recordsAcceptedInBatch
boolean seen_eob
StreamWriter writer

Detailed Description

Parses an input stream into Vertica tuples (rows to be inserted into a table).

A UDParser can be used with up to one UDSource and any number of UDFilters.

Member Function Documentation

◆ cancel()

void com.vertica.sdk.UDXObject.cancel ( ServerInterface  srvInterface)
protected, inherited

Cancel callback to be overridden by the UDX implementation. Called when the query running the UDX has been canceled.

  • This method will be invoked at most once per UDX object. Once a UDX object has been canceled, it will never be un-canceled.
  • This method may be called from a separate thread, concurrently with other methods of this UDX object (but never the constructor). Implementations must be thread-safe with all methods of this UDX.
  • This method will be invoked for either an explicit user cancel, or in the event of an error during query execution.
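The rules above mean that any state cancel() shares with the rest of the UDX must be safe to touch from two threads. The following is a minimal sketch that keeps such a flag in an AtomicBoolean, so a cancel arriving on another thread stops a work loop promptly. CancelAwareWorker is a hypothetical stand-in, not part of the SDK. In a real UDX, the inherited isCanceled() already reports cancellation, so a flag like this matters only for extra state your implementation shares with cancel().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a UDX object; it is NOT com.vertica.sdk.UDXObject.
final class CancelAwareWorker {
    // Written by the canceling thread, read by the processing thread.
    private final AtomicBoolean canceled = new AtomicBoolean(false);

    // Safe to call from any thread; the flag only ever moves from false to true.
    void cancel() {
        canceled.set(true);
    }

    boolean isCanceled() {
        return canceled.get();
    }

    // Does up to `units` units of work, checking the flag before each one
    // so that a concurrent cancel() stops the loop promptly.
    int runUnits(int units) {
        int done = 0;
        while (done < units && !isCanceled()) {
            done++;
        }
        return done;
    }
}
```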
Parameters
srvInterface: a ServerInterface object used to communicate with Vertica

Referenced by com.vertica.sdk.UDXObject.cancelUDX().

◆ cancelUDX()

final void com.vertica.sdk.UDXObject.cancelUDX ( ServerInterface  srvInterface)
inherited

Cancel callback invoked when the query running the UDX has been canceled.

See cancel().

Parameters
srvInterface: a ServerInterface object used to communicate with Vertica

◆ destroy()

void com.vertica.sdk.UDParser.destroy ( ServerInterface  srvInterface,
SizedColumnTypes  returnType 
) throws UdfException

UDParser::destroy()

Will be invoked during query execution, after the last time that process() is called on this UDParser instance for a particular input file.

May write user-defined session parameters in the public and library namespaces.

May optionally be overridden to perform tear-down/destruction.

See UDParser::setup() for a note about the restartability of UDParsers.

Parameters
srvInterface: a ServerInterface object used to communicate with Vertica
returnType: the sized column types of the rows this parser produces
Exceptions
UdfException: if a UDF problem occurs

◆ getRejectedRecord()

RejectedRecord com.vertica.sdk.UDParser.getRejectedRecord ( ) throws UdfException

Returns information about the rejected record.

Returns
a rejected record
Exceptions
UdfException: if a UDF problem occurs

◆ isCanceled()

boolean com.vertica.sdk.UDXObject.isCanceled ( )
inherited
Returns
true if execution was canceled.

Referenced by com.vertica.sdk.UDXObject.cancelUDX().

◆ process()

abstract StreamState com.vertica.sdk.UDParser.process ( ServerInterface  srvInterface,
DataBuffer  input,
InputState  input_state 
) throws UdfException, DestroyInvocation
abstract

UDParser::prepareToCooperate()

Notifies this parser that it should prepare to share parsing input with another parser. This can only happen when a parser has an associated chunker. The default implementation does nothing.

UDParser::isReadyToCooperate()

Called after UDParser::prepareToCooperate(); returns false if this parser is not yet ready to cooperate. Once this method returns true, the parser can begin to cooperate. The default implementation returns true; override it if some preparation is required before the parser can cooperate (for example, if a certain number of rows must be skipped first).

UDParser::process()

Will be invoked repeatedly during query execution, until it returns DONE or until the query is canceled by the user.

On each invocation, process() will be given an input buffer. It should read data from that buffer, converting it to fields and tuples and writing those tuples via writer. Once it has consumed as much as it reasonably can (for example, once it has consumed the last complete row in the input buffer), it should return INPUT_NEEDED to indicate that more data is needed, or DONE to indicate that it has completed parsing this input stream and will not be reading more bytes from it.
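The following is a minimal sketch of that consumption loop for newline-delimited rows. It does not use the SDK. InBuffer, RowState, and NewlineRowParser are hypothetical stand-ins that only mirror the buf/offset fields and return codes described here, and a List<String> takes the place of the writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the buf/offset fields described above; not the SDK DataBuffer.
final class InBuffer {
    final byte[] buf;
    int offset;

    InBuffer(byte[] buf) { this.buf = buf; }
}

// Hypothetical mirror of two StreamState values; not the SDK enum.
enum RowState { INPUT_NEEDED, DONE }

final class NewlineRowParser {
    // Stands in for the StreamWriter: collects each parsed row.
    final List<String> rows = new ArrayList<>();

    // Consumes every complete '\n'-terminated row starting at buf[offset],
    // leaves any trailing partial row unconsumed, and asks for more input.
    RowState process(InBuffer input) {
        int start = input.offset;
        for (int i = input.offset; i < input.buf.length; i++) {
            if (input.buf[i] == '\n') {
                rows.add(new String(input.buf, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        input.offset = start; // bytes before offset count as consumed
        return RowState.INPUT_NEEDED;
    }
}
```

Given the bytes `a,1\nb,2\nc,`, the sketch writes two rows, leaves offset at 8 so that the partial row `c,` stays unconsumed, and returns INPUT_NEEDED.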

If input_state == END_OF_FILE, then the last byte in input is the last byte in the input stream. Returning INPUT_NEEDED will not result in any new input appearing. process() should return DONE in this case as soon as this operator has finished producing all output that it is going to produce.
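At END_OF_FILE, the final row may have no trailing delimiter, so it has to be written before process() returns DONE. The following is a minimal sketch of that decision. EofState and EofHandling are hypothetical stand-ins rather than SDK types, and a List<String> plays the role of the writer.

```java
import java.util.List;

// Hypothetical mirror of two StreamState values; not the SDK enum.
enum EofState { INPUT_NEEDED, DONE }

final class EofHandling {
    // Called after all complete rows are consumed. `tail` is the unterminated
    // remainder still in the buffer; `out` stands in for the writer.
    static EofState finish(String tail, boolean endOfFile, List<String> out) {
        if (!endOfFile) {
            return EofState.INPUT_NEEDED; // more bytes may still arrive
        }
        if (!tail.isEmpty()) {
            out.add(tail); // last row of the stream had no trailing newline
        }
        return EofState.DONE; // asking for input again would never yield any
    }
}
```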

Note that input may contain null bytes if the source file contains them. Note also that input is NOT automatically null-terminated.

process() must not block indefinitely. If it cannot proceed for an extended period of time, it should return KEEP_GOING. It will be called again shortly. Failure to do this will, among other things, prevent the query from being canceled by the user.
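One way to honor this rule is to give each call a time budget and return KEEP_GOING when it runs out. This approach is an assumption and is not something the SDK prescribes. WorkState and BudgetedParser are hypothetical stand-ins, and the "work" is a simple counter.

```java
// Hypothetical mirror of two StreamState values; not the SDK enum.
enum WorkState { KEEP_GOING, INPUT_NEEDED }

// Hypothetical parser that does a bounded amount of work per call.
final class BudgetedParser {
    private final long budgetNanos;
    private int unitsDone = 0;

    BudgetedParser(long budgetNanos) { this.budgetNanos = budgetNanos; }

    // Works through `totalUnits` units of parsing, but hands control back with
    // KEEP_GOING once this call's time budget is spent.
    WorkState step(int totalUnits) {
        long deadline = System.nanoTime() + budgetNanos;
        while (unitsDone < totalUnits) {
            if (System.nanoTime() - deadline >= 0) {
                return WorkState.KEEP_GOING; // caller invokes step() again shortly
            }
            unitsDone++; // one unit of (hypothetical) parsing work
        }
        return WorkState.INPUT_NEEDED;
    }

    int completed() { return unitsDone; }
}
```

Progress is kept in a field, so the next call resumes where the previous one stopped.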

Note that, unless INPUT_NEEDED is returned, input will be UNMODIFIED the next time process() is called. This means that pointers into the buffer will continue to be valid. It also means that input.offset may be set. So, in general, process() code should assume that buffers start at input.buf[input.offset].

Row Rejection

process() can "reject" a row, causing it to be logged by Vertica's rejected-rows mechanism. Rejected rows should not be emitted as tuples. A rejected row must start at the first byte of input (meaning all previous input must have been consumed by a previous call to process()). To reject a row, set input.offset to the size of the row, and return REJECT.
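The following is a minimal sketch of the rejection protocol for newline-delimited rows. RejectSketch and its State enum are hypothetical stand-ins, and the validity rule (exactly two comma-separated fields) is invented for illustration. Two details are assumptions: the row's size is taken to include its newline, and after INPUT_NEEDED the unconsumed bytes are assumed to reappear at the start of the next buffer. Valid rows ahead of a bad one are consumed first, so the bad row is rejected only once it begins at byte 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not the SDK UDParser or DataBuffer.
final class RejectSketch {
    enum State { INPUT_NEEDED, REJECT }

    final byte[] buf;                               // mirrors DataBuffer.buf
    int offset;                                     // mirrors DataBuffer.offset
    final List<String> written = new ArrayList<>(); // stands in for the writer

    RejectSketch(byte[] buf) { this.buf = buf; }

    // Writes valid rows; a bad row is rejected only once it sits at byte 0.
    State process() {
        int start = offset;
        for (int i = offset; i < buf.length; i++) {
            if (buf[i] != '\n') continue;
            String row = new String(buf, start, i - start, StandardCharsets.UTF_8);
            if (!isValid(row)) {
                if (start == 0) {
                    offset = i + 1;          // size of the rejected row, newline included
                    return State.REJECT;
                }
                offset = start;              // consume only the rows before the bad one
                return State.INPUT_NEEDED;
            }
            written.add(row);
            start = i + 1;
        }
        offset = start;
        return State.INPUT_NEEDED;
    }

    // Hypothetical rule: a row needs exactly two comma-separated fields.
    static boolean isValid(String row) {
        return row.split(",", -1).length == 2;
    }
}
```

For `a,1\nbad\nb,2\n`, the first call writes `a,1` and returns INPUT_NEEDED with offset 4. A call on the remaining bytes `bad\nb,2\n` then sets offset to 4 and returns REJECT.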

Returns
INPUT_NEEDED if this UDParser needs more input data; DONE if it has no more data to produce; KEEP_GOING if it cannot proceed yet and should be called again; REJECT to reject a row

Note that it is UNSAFE to maintain pointers or references to any of these arguments (or any other argument passed by reference into any other function in this API) beyond the scope of the function call in question. For example, do not store a reference to the server interface or the input block on an instance variable. Vertica may free and replace these objects.
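In Java, this mostly means not keeping the input byte array itself between calls. If a parser needs to remember bytes it has already consumed (for example, part of a field), it can keep its own copy instead. The following is a minimal sketch of that approach, with RetainCopy as a hypothetical helper rather than an SDK class.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the SDK.
final class RetainCopy {
    private byte[] pending = new byte[0]; // owned copy, safe to keep across calls

    // Copies buf[from..end) instead of storing `buf`, because the server may
    // free or reuse the input buffer after process() returns.
    void saveTail(byte[] buf, int from) {
        pending = Arrays.copyOfRange(buf, from, buf.length);
    }

    byte[] pending() { return pending; }
}
```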

Parameters
srvInterface: a ServerInterface object used to communicate with Vertica
input: the input data buffer
input_state: the state of the input stream (for example, END_OF_FILE)
Exceptions
UdfException: if a UDF problem occurs
DestroyInvocation: if the invocation needs to be destroyed

◆ setup()

void com.vertica.sdk.UDParser.setup ( ServerInterface  srvInterface,
SizedColumnTypes  returnType 
) throws UdfException

UDParser::setup()

Will be invoked during query execution, prior to the first time that process() is called on this UDParser instance for a particular input source.

May optionally be overridden to perform setup/initialization.

Note that UDParsers MUST BE RESTARTABLE! When loading large numbers of files, a given UDParser may be reused for multiple files. Vertica follows the worker-pool design pattern: at the start of COPY execution, several Parsers and several Filters are instantiated per node by calling the corresponding prepare() method multiple times. Each Filter/Parser pair is then internally assigned to an initial Source (UDSource or internal). At that point, setup() is called; then process() is called until it is finished; then destroy() is called. If sources are still waiting in the pool to be processed, the Filter/Parser pair is given a second Source: setup() is called a second time, then process() until it is finished, then destroy(). This repeats until all sources have been read.
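Restartability mostly means resetting per-source state in setup() rather than only in a constructor. The following is a minimal sketch of the worker-pool sequence. It uses a hypothetical RestartableParser stand-in whose setup(), process(), and destroy() only mirror the names above and are not the SDK signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not com.vertica.sdk.UDParser.
final class RestartableParser {
    private long rowsThisSource;              // per-source state: must be reset
    final List<Long> totals = new ArrayList<>();

    void setup() { rowsThisSource = 0; }      // runs once per source
    void process(int rows) { rowsThisSource += rows; }
    void destroy() { totals.add(rowsThisSource); }

    // Mimics the worker-pool loop: one instance handles many sources in turn.
    static List<Long> runPool(int[] rowsPerSource) {
        RestartableParser p = new RestartableParser();
        for (int rows : rowsPerSource) {
            p.setup();
            p.process(rows);
            p.destroy();
        }
        return p.totals;
    }
}
```

`runPool(new int[] {3, 5})` returns `[3, 5]`. Without the reset in setup(), the second total would be 8.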

Parameters
srvInterface: a ServerInterface object used to communicate with Vertica
returnType: the sized column types of the rows this parser produces
Exceptions
UdfException: if a UDF problem occurs

Member Data Documentation

◆ writer

StreamWriter com.vertica.sdk.UDParser.writer
protected

Writer to which parsed tuples are written. It has the same API as PartitionWriter from the UDT framework.