[Inheritance and collaboration diagrams for com.vertica.sdk.UDFilter]

Public Member Functions
final void	cancelUDX (ServerInterface srvInterface)

void	destroy (ServerInterface srvInterface) throws UdfException

void	destroy (ServerInterface srvInterface, SessionParamWriterMap udSessionParams) throws UdfException

boolean	isCanceled ()

abstract StreamState	process (ServerInterface srvInterface, DataBuffer input, InputState input_state, DataBuffer output) throws UdfException

void	setup (ServerInterface srvInterface) throws UdfException

Protected Member Functions
void	cancel (ServerInterface srvInterface)

Detailed Description

Reads input data from a UDSource or another UDFilter and transforms it.

For example, a UDFilter might unzip a file, convert UTF-16 to UTF-8, or remove personally identifying information such as social security numbers. UDFilters can be chained, for example unzipping, converting encodings, and then stripping personal information. The first UDFilter in a chain receives its input from a UDSource, and the output of the last one in the chain is sent to a UDParser.

UDFilter is part of the load pipeline. The load pipeline consists of up to one UDSource, any number of UDFilters, and up to one UDParser.

Member Function Documentation

◆ cancel()

void com.vertica.sdk.UDXObject.cancel ( ServerInterface srvInterface )

protected inherited

Cancel callback to be overridden by the UDX implementation. Called when the query running the UDX has been canceled.

This method will be invoked at most once per UDX object. Once a UDX object has been canceled, it will never be un-canceled.
This method may be called from a separate thread, concurrently with other methods of this UDX object (but never the constructor). Implementations must be thread-safe with all methods of this UDX.
This method will be invoked for either an explicit user cancel, or in the event of an error during query execution.
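The thread-safety requirement above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not an SDK class: its cancel() plays the role of cancel(ServerInterface) and its step() plays the role of one process() call. It assumes a single atomic flag is the only state the two threads share.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CancelSketch {
    /** Hypothetical stand-in for a UDFilter subclass; not the SDK class. */
    static class CancellableWork {
        private final AtomicBoolean canceled = new AtomicBoolean(false);
        private int cleanupCount = 0; // guarded by 'this'

        /** Plays the role of cancel(ServerInterface); may run on another thread. */
        void cancel() {
            // compareAndSet keeps the cleanup single-shot even if called twice.
            if (canceled.compareAndSet(false, true)) {
                synchronized (this) {
                    cleanupCount++;
                }
            }
        }

        /** Plays the role of one process() call; false means "stop working". */
        boolean step() {
            return !canceled.get();
        }

        synchronized int cleanupCount() {
            return cleanupCount;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableWork work = new CancellableWork();
        Thread worker = new Thread(() -> {
            while (work.step()) {
                Thread.yield(); // bounded unit of work per iteration
            }
        });
        worker.start();
        work.cancel(); // simulated cancel arriving from a different thread
        worker.join(5000);
        System.out.println("worker stopped: " + !worker.isAlive());
    }
}
```

The atomic flag lets the working thread observe the cancel without locking on every iteration; any heavier cleanup would need its own synchronization.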

Parameters

srvInterface a ServerInterface object used to communicate with Vertica

Referenced by com.vertica.sdk.UDXObject.cancelUDX().

◆ cancelUDX()

final void com.vertica.sdk.UDXObject.cancelUDX ( ServerInterface srvInterface )

inherited

Cancel callback invoked when the query running the UDX has been canceled.

See cancel().

Parameters

srvInterface a ServerInterface object used to communicate with Vertica

◆ destroy()

void com.vertica.sdk.UDFilter.destroy ( ServerInterface srvInterface ) throws UdfException

UDFilter::destroy()

Will be invoked during query execution, after the last time that process() is called on this UDFilter instance for a particular input file.

May write UD Session Parameters for the public and library namespaces. May optionally be overridden to perform tear-down/destruction.

See UDFilter::setup() for a note about the restartability of UDFilters.

Parameters

srvInterface a ServerInterface object used to communicate with Vertica

Exceptions

UdfException UDF problem

◆ isCanceled()

boolean com.vertica.sdk.UDXObject.isCanceled ( )

inherited

Returns: true if execution was canceled.

Referenced by com.vertica.sdk.UDXObject.cancelUDX().

◆ process()

abstract StreamState com.vertica.sdk.UDFilter.process ( ServerInterface srvInterface, DataBuffer input, InputState input_state, DataBuffer output ) throws UdfException

abstract

UDFilter::process()

Will be invoked repeatedly during query execution, until it returns DONE or until the query is canceled by the user.

On each invocation, process() is handed some input data and a buffer to write output data into. It is expected to read and process some amount of the input data, write some amount of output data, and return a value that informs Vertica what needs to happen next.

process() must set input.offset to the number of bytes that were successfully read from the input buffer, and that will not need to be re-consumed by a subsequent invocation of process(). This may not be larger than input.size. (input.size is the size of the buffer.) If it is set to 0, this indicates that process() cannot process any part of an input buffer of this size, and requires more data per invocation. (For example, a block-based decompression algorithm might return 0 if the input buffer does not contain a complete block.)
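As a rough illustration of the block-based case, the sketch below consumes only whole blocks and leaves a trailing partial block unread. DataBuffer here is a simplified, hypothetical stand-in with just buf and offset fields, buf.length is assumed to play the role of input.size, and BLOCK_SIZE is an arbitrary value chosen for the example. It also assumes that input.offset marks the first unread byte on entry.

```java
class BlockInputSketch {
    /** Simplified, hypothetical stand-in for the SDK's DataBuffer. */
    static final class DataBuffer {
        byte[] buf;
        int offset;

        DataBuffer(byte[] buf, int offset) {
            this.buf = buf;
            this.offset = offset;
        }
    }

    static final int BLOCK_SIZE = 4; // hypothetical block size

    /**
     * Consumes as many whole blocks as are available starting at input.offset,
     * advances input.offset past them, and returns the number of blocks consumed.
     * When no whole block is available, input.offset is not advanced, which
     * signals that more data per invocation is required.
     */
    static int consumeWholeBlocks(DataBuffer input) {
        int available = input.buf.length - input.offset;
        int blocks = available / BLOCK_SIZE;
        input.offset += blocks * BLOCK_SIZE;
        return blocks;
    }

    public static void main(String[] args) {
        DataBuffer input = new DataBuffer(new byte[10], 0);
        int blocks = consumeWholeBlocks(input);
        System.out.println(blocks + " blocks consumed, offset=" + input.offset);
    }
}
```

Because the offset never moves past an incomplete block, the unread tail is presented again on the next invocation together with any newly arrived bytes.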

Note that input may contain null bytes, if the source file contains null bytes. Note also that input is NOT automatically null-terminated.

If input_state == END_OF_FILE, then the last byte in input is the last byte in the input stream. Returning INPUT_NEEDED will not result in any new input appearing. process() should return DONE in this case as soon as this operator has finished producing all output that it is going to produce.

process() must set output.offset to the number of bytes that were written to the output buffer. This may not be larger than output.size. If it is set to 0, this indicates that process() requires a larger output buffer.

Note that, unless OUTPUT_NEEDED is returned, output will be UNMODIFIED the next time process() is called. This means that pointers into the buffer will continue to be valid. It also means that output.offset may already be set to a nonzero value. So, in general, process() code should assume that buffers start at output.buf[output.offset]. The same goes for input and INPUT_NEEDED. Note also that, as a performance optimization, downstream operators may start processing emitted data (data between output.buf[0] and output.buf[output.offset]) before OUTPUT_NEEDED is returned. For this reason, output.offset must be strictly increasing.
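The offset rules above can be put together in a small sketch of a process()-style method that upper-cases ASCII bytes. The types below are simplified, hypothetical stand-ins for the SDK classes (the ServerInterface argument is omitted, the enums list only the values used here, and buf.length is assumed to play the role of size). It reads from input.buf[input.offset], writes from output.buf[output.offset], and only ever advances both offsets.

```java
class UppercaseFilterSketch {
    /** Hypothetical subsets of the SDK's state enums. */
    enum StreamState { INPUT_NEEDED, OUTPUT_NEEDED, DONE }
    enum InputState { OK, END_OF_FILE }

    /** Simplified, hypothetical stand-in for the SDK's DataBuffer. */
    static final class DataBuffer {
        byte[] buf;
        int offset;

        DataBuffer(byte[] buf) {
            this.buf = buf;
        }
    }

    static StreamState process(DataBuffer input, InputState inputState, DataBuffer output) {
        int inRemaining = input.buf.length - input.offset;
        int outRemaining = output.buf.length - output.offset;
        int n = Math.min(inRemaining, outRemaining);

        // Copy n bytes, starting at the current offsets rather than at index 0.
        for (int i = 0; i < n; i++) {
            byte b = input.buf[input.offset + i];
            output.buf[output.offset + i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
        }
        input.offset += n;
        output.offset += n;

        if (input.offset < input.buf.length) {
            return StreamState.OUTPUT_NEEDED; // output is full, input remains
        }
        // All input consumed: finish at end of file, otherwise ask for more.
        return inputState == InputState.END_OF_FILE ? StreamState.DONE : StreamState.INPUT_NEEDED;
    }

    public static void main(String[] args) {
        DataBuffer in = new DataBuffer("abc, Def".getBytes(java.nio.charset.StandardCharsets.US_ASCII));
        DataBuffer out = new DataBuffer(new byte[5]);
        StreamState state = process(in, InputState.END_OF_FILE, out);
        System.out.println(state + " in.offset=" + in.offset + " out.offset=" + out.offset);
    }
}
```

A one-to-one byte transform keeps the example short; a filter whose output size differs from its input size would need to track pending output separately.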

process() must not block indefinitely. If it cannot proceed for an extended period of time, it should return KEEP_GOING. It will be called again shortly. Failure to do this will, among other things, prevent the query from being canceled by the user.
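One way to honor this rule is to wait on a slow resource with a bounded timeout and hand control back when nothing arrived. The sketch below assumes a hypothetical producer that delivers chunks on a queue; the queue, the sink list, and the pollOnce helper are all illustrative names, and OUTPUT_NEEDED merely stands in for whatever state a real filter would return after making progress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class KeepGoingSketch {
    /** Hypothetical subset of the SDK's StreamState values. */
    enum StreamState { OUTPUT_NEEDED, KEEP_GOING }

    /**
     * Waits at most timeoutMillis for a chunk from a slow, hypothetical producer.
     * Returns KEEP_GOING when nothing arrived in time, so the caller regains
     * control; otherwise stores the chunk in sink and returns OUTPUT_NEEDED.
     */
    static StreamState pollOnce(BlockingQueue<byte[]> pending, long timeoutMillis, List<byte[]> sink) {
        try {
            byte[] chunk = pending.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (chunk == null) {
                return StreamState.KEEP_GOING;
            }
            sink.add(chunk);
            return StreamState.OUTPUT_NEEDED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return StreamState.KEEP_GOING;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();
        List<byte[]> sink = new ArrayList<>();
        System.out.println(pollOnce(pending, 10, sink)); // queue is empty
        pending.add(new byte[] {1, 2, 3});
        System.out.println(pollOnce(pending, 10, sink)); // a chunk is waiting
    }
}
```

Keeping the timeout short bounds how long a cancel request can go unnoticed.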

Note that it is UNSAFE to maintain pointers or references to any of these arguments (or any other argument passed by reference into any other function in this API) beyond the scope of the function call in question. For example, do not store a reference to the server interface or the input block on an instance variable. Vertica may free and replace these objects.

Parameters

srvInterface	a ServerInterface object used to communicate with Vertica
input	DataBuffer for input
input_state	InputState
output	DataBuffer for output

Returns: OUTPUT_NEEDED if this UDFilter has more data to produce; INPUT_NEEDED if it needs more data to continue working; KEEP_GOING if it cannot proceed for an extended period and should be called again shortly; DONE if it has no more data to produce.

Exceptions

UdfException UDF problem

◆ setup()

void com.vertica.sdk.UDFilter.setup ( ServerInterface srvInterface ) throws UdfException

UDFilter::setup()

Will be invoked during query execution, prior to the first time that process() is called on this UDFilter instance for a particular input file.

May optionally be overridden to perform setup/initialization.

Note that UDFilters MUST BE RESTARTABLE! If loading large numbers of files, a given UDFilter may be re-used for multiple files. Vertica follows the worker-pool design pattern: at the start of COPY execution, several Parsers and several Filters are instantiated per node, by calling the corresponding prepare() method multiple times. Each Filter/Parser pair is then internally assigned to an initial Source (UDSource or internal). At that point, setup() is called; then process() is called until it is finished; then destroy() is called. If there are still sources in the pool waiting to be processed, then the Filter/Parser pair will be given a second Source; setup() will be called a second time, then process() until it is finished, then destroy(). This repeats until all sources have been read.
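The restart cycle described above implies that all per-source state should be reset in setup() rather than in the constructor. A minimal sketch follows, using a hypothetical byte-counting class that stands in for a UDFilter subclass; its method names mirror the lifecycle, but the signatures are simplified and take no SDK arguments.

```java
import java.util.ArrayList;
import java.util.List;

class RestartableFilterSketch {
    /** Hypothetical stand-in for a UDFilter subclass that counts bytes per source. */
    static class ByteCountingFilter {
        private long bytesSeen;   // per-source state, reset in setup()
        private boolean active;   // true between setup() and destroy()
        final List<Long> totals = new ArrayList<>(); // one entry per finished source

        void setup() {
            bytesSeen = 0; // never rely on the field initializer for a reused object
            active = true;
        }

        void process(byte[] chunk) {
            if (!active) {
                throw new IllegalStateException("process() called outside setup()/destroy()");
            }
            bytesSeen += chunk.length;
        }

        void destroy() {
            totals.add(bytesSeen);
            active = false;
        }
    }

    public static void main(String[] args) {
        ByteCountingFilter filter = new ByteCountingFilter();
        byte[][][] sources = { { new byte[3] }, { new byte[2], new byte[3] } };
        for (byte[][] source : sources) {
            filter.setup();
            for (byte[] chunk : source) {
                filter.process(chunk);
            }
            filter.destroy();
        }
        System.out.println(filter.totals);
    }
}
```

If the counter were initialized only in the constructor, the second source would inherit the first source's total.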

Parameters

srvInterface a ServerInterface object used to communicate with Vertica

Exceptions

UdfException UDF problem
