C++ SDK Documentation  23.3.0
Vertica::UDSource Class Reference (abstract)

Acquires data from an external source (such as a file or URL) and produces that data in a streaming manner.

Public Member Functions

void cancelUDX (ServerInterface &srvInterface)
 
virtual void destroy (ServerInterface &srvInterface) __override__
 
virtual void destroy (ServerInterface &srvInterface, SessionParamWriterMap &udSessionParams) __override__
 
virtual Portion getPortion () __override__
 
virtual vint getSize ()
 
virtual std::string getUri ()
 
bool isCanceled () const
 
virtual StreamState process (ServerInterface &srvInterface, DataBuffer &output) __override__=0
 
virtual StreamState processWithMetadata (ServerInterface &srvInterface, DataBuffer &output, LengthBuffer &output_lengths)
 
virtual size_t requestMinBufferSize (ServerInterface &srvInterface)
 
virtual void setup (ServerInterface &srvInterface) __override__
 
virtual bool useSideChannel ()
 

Static Public Attributes

static const size_t DEFAULT_MIN_BUFFER_SIZE = 1024 * 1024
 

Protected Member Functions

virtual void cancel (ServerInterface &srvInterface)
 

Detailed Description

Acquires data from an external source (such as a file or URL) and produces that data in a streaming manner.

The output of a UDSource can be sent through one or more UDFilters followed by one UDParser.

Note that it is UNSAFE to maintain pointers or references to any argument passed by reference into any function in this API beyond the scope of that function call. For example, do not store a reference to the server interface or the input block in an instance variable. Vertica may free and replace these objects.

Member Function Documentation

virtual void Vertica::UDXObject::cancel ( ServerInterface &srvInterface)
inline, protected, virtual, inherited

Cancel callback to be overridden by the UDX. Called when the query running the UDX has been canceled.

Note
  • This method will be invoked at most once per UDX object. Once a UDX object has been canceled, it will never be un-canceled.
  • This method may be called from a separate thread, concurrently with other methods of this UDX object (but never the constructor or destructor). Implementations must be thread-safe with all methods of this UDX.
  • This method will be invoked for either an explicit user cancel, or in the event of an error during query execution.

Referenced by Vertica::UDXObject::cancelUDX().
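
The thread-safety note above can be illustrated with a minimal sketch. It does not use the real SDK headers: `ServerInterface` and `StreamState` are simplified stand-ins, and `CancelAwareSource` is a hypothetical class whose `cancel()` and `process()` only mirror the shape of the documented methods. The idea is that `cancel()` touches nothing but an atomic flag, which `process()` polls before doing more work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Simplified stand-ins so the sketch compiles without the Vertica SDK.
struct ServerInterface {};
enum StreamState { OUTPUT_NEEDED, DONE, KEEP_GOING };

// Hypothetical class; a real source would derive from Vertica::UDSource.
class CancelAwareSource {
public:
    explicit CancelAwareSource(std::size_t total)
        : stopRequested(false), chunksProduced(0), totalChunks(total) {
        assert(total > 0);  // the sketch expects at least one chunk of work
    }

    // Mirrors cancel(): may run concurrently with process(), so it only
    // sets an atomic flag and touches no other state.
    void cancel(ServerInterface &) { stopRequested.store(true); }

    // Mirrors process(): polls the flag before doing more work.
    StreamState process(ServerInterface &) {
        if (stopRequested.load()) {
            return DONE;  // what to return after a cancel is a choice made for this sketch
        }
        ++chunksProduced;  // placeholder for producing one chunk of data
        return chunksProduced < totalChunks ? OUTPUT_NEEDED : DONE;
    }

    std::size_t chunks() const { return chunksProduced; }

private:
    std::atomic<bool> stopRequested;
    std::size_t chunksProduced;
    std::size_t totalChunks;
};
```

In a real UDx the inherited isCanceled() may already cover this case; the explicit flag here only makes the concurrency concern visible.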

void Vertica::UDXObject::cancelUDX ( ServerInterface &srvInterface)
inline, inherited

Cancel callback invoked when the query running the UDX has been canceled. See cancel().

virtual void Vertica::UDSource::destroy ( ServerInterface &srvInterface)
inline, virtual

UDSource::destroy()

Will be invoked during query execution, after the last time that process() is called on this UDSource instance.

May optionally be overridden to perform tear-down/destruction.

Reimplemented from Vertica::UnsizedUDSource.

virtual Portion Vertica::UDSource::getPortion ( )
inline, virtual

UDSource::getPortion()

Gets this source's split if the source is apportioned. The default implementation returns a default-constructed Portion instance {o=-1, s=-1, t=false}.

Implements Vertica::UnsizedUDSource.

virtual vint Vertica::UDSource::getSize ( )
inline, virtual

UDSource::getSize()

Returns the estimated number of bytes that process() will return.

This value is treated as advisory only. It is used to indicate the file size in the LOAD_STREAMS and LOAD_SOURCES tables.

IMPORTANT: getSize() can be called at any time, even before setup() is called! (Though not before or during the constructor.)

In the case of Sources whose factories can potentially produce many UDSource instances, getSize() should avoid acquiring resources that last for the life of the object. Doing otherwise can defeat Vertica's attempts to limit the maximum number of Sources that are consuming system resources at any given time. For example, if getSize() opens a file handle and leaves that file handle open for use by process(), and if a large number of UDSources are loaded in a single statement, the query may exceed the operating system limit on file handles and crash, even though Vertica only operates on a small number of files at once. This doesn't apply to singleton Sources, that is, Sources whose factory will only ever produce one UDSource instance.
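
One way to follow this advice, sketched under assumptions: query the file's metadata with POSIX `stat()`, which does not open a descriptor, so nothing remains held after getSize() returns. `FileSizeProbe` is a hypothetical class, `vint` is a stand-in typedef for the SDK's integer type, and the `-1` returned on failure is a sentinel chosen for this sketch, not a documented SDK value.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <string>

typedef long long vint;  // stand-in for the SDK's integer type

// Hypothetical class; a real source would derive from Vertica::UDSource.
class FileSizeProbe {
public:
    explicit FileSizeProbe(const std::string &filePath) : path(filePath) {
        assert(!filePath.empty());
    }

    // Mirrors getSize(): stat() reads metadata without opening a file
    // descriptor, so no resource outlives this call.
    vint getSize() const {
        struct stat info;
        if (::stat(path.c_str(), &info) != 0) {
            return -1;  // size unknown; sentinel chosen for this sketch only
        }
        return static_cast<vint>(info.st_size);
    }

private:
    std::string path;
};
```

Because the constructor stores only the path, getSize() is safe to call before setup(), and many instances can exist at once without consuming file handles.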

virtual std::string Vertica::UnsizedUDSource::getUri ( )
inline, virtual, inherited

UnsizedUDSource::getUri()

Return the URI of the current source of data.

This function will be invoked during execution to fill in monitoring information.

bool Vertica::UDXObject::isCanceled ( ) const
inline, inherited

Returns
true if and only if this UDX has been canceled.

virtual StreamState Vertica::UDSource::process ( ServerInterface &srvInterface,
DataBuffer &output 
)
pure virtual

UDSource::process()

Reads data from the input source and processes it. Vertica invokes this method repeatedly until it returns DONE or the query is canceled by the user.

Input: an external data source.
Output: a stream of bytes.

Returns
OUTPUT_NEEDED
DONE
KEEP_GOING
See Vertica::StreamState for details about return values.

On each invocation, process() should acquire more data and write that data to the buffer specified by output.

process() should set output.offset (an output parameter) to the number of bytes that were written to the output buffer. It is common, though not necessary, for this to be the same as output.size (an input parameter). When process() is called, output.offset is uninitialized. To indicate that the buffer is too small to hold a record, process() should set output.offset to 0 and return OUTPUT_NEEDED. Then, process() is called again with a larger buffer.

In general, process() code should assume that buffers start at output.buf[output.offset]. As a performance optimization, upstream operators might start processing emitted data (data between output.buf[0] and output.buf[output.offset]) before OUTPUT_NEEDED is returned. For this reason, output.offset must be strictly increasing.

Implements Vertica::UnsizedUDSource.
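
The buffer contract described above can be sketched without the SDK. `ServerInterface`, `StreamState`, and `DataBuffer` below are simplified stand-ins (the `buf`, `size`, and `offset` fields follow the names used in this section), and `InMemorySource` is a hypothetical class that streams a string instead of an external source. The sketch treats `output.offset` as uninitialized on entry, writes from the start of the buffer, and reports the number of bytes written; returning OUTPUT_NEEDED when more data remains is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Simplified stand-ins so the sketch compiles without the Vertica SDK.
struct ServerInterface {};
enum StreamState { OUTPUT_NEEDED, DONE, KEEP_GOING };
struct DataBuffer {
    char *buf;           // destination memory
    std::size_t size;    // capacity in bytes (input parameter)
    std::size_t offset;  // bytes written (output parameter)
};

// Hypothetical class; a real source would derive from Vertica::UDSource.
class InMemorySource {
public:
    explicit InMemorySource(const std::string &payload)
        : data(payload), consumed(0) {}

    // Mirrors process(): copies as much pending data as fits into output.
    StreamState process(ServerInterface &, DataBuffer &output) {
        assert(output.buf != nullptr && output.size > 0);
        const std::size_t remaining = data.size() - consumed;
        const std::size_t n = std::min(remaining, output.size);
        std::memcpy(output.buf, data.data() + consumed, n);
        consumed += n;
        output.offset = n;  // number of bytes written during this call
        return consumed == data.size() ? DONE : OUTPUT_NEEDED;
    }

private:
    std::string data;
    std::size_t consumed;
};
```

With an 8-byte payload and a 5-byte buffer, the first call fills the buffer and returns OUTPUT_NEEDED, and the second call writes the remaining 3 bytes and returns DONE.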

virtual StreamState Vertica::UnsizedUDSource::processWithMetadata ( ServerInterface &srvInterface,
DataBuffer &output,
LengthBuffer &output_lengths 
)
inline, virtual, inherited

UnsizedUDSource::processWithMetadata()

Reads data from the input source and record length metadata from the side channel and processes it. To implement processWithMetadata(), you must override useSideChannel() to return true. Vertica invokes this method repeatedly until it returns DONE or the query is canceled by the user.

Input: an external data source.
Output: a stream of data bytes, and a stream of bytes containing message length metadata from the data source.

Returns
OUTPUT_NEEDED
DONE
KEEP_GOING
See Vertica::StreamState for details about return values.

On each invocation, processWithMetadata() should acquire more data and write that data to the buffer specified by output, and write the message lengths to output_lengths.

For the DataBuffer, processWithMetadata() should set output.offset (an output parameter) to the number of bytes that were written to the output buffer. It is common, though not necessary, for this to be the same as output.size (an input parameter). When processWithMetadata() is called, output.offset is uninitialized. To indicate that the buffer is too small to hold a record, processWithMetadata() should set output.offset and output_lengths.offset to 0 and return OUTPUT_NEEDED. Then processWithMetadata() is called again with a larger buffer.

For the LengthBuffer, processWithMetadata() should set output_lengths.offset to the number of length values that were written to the output_lengths buffer. If output.offset is set to 0, then output_lengths.offset should also be set to 0.

In general, processWithMetadata() code should assume that data buffers start at output.buf[output.offset] and length buffers start at output_lengths.buf[output_lengths.offset].

As a performance optimization, upstream operators might start processing emitted data (data between output.buf[0] and output.buf[output.offset] in the DataBuffer, and between output_lengths.buf[0] and output_lengths.buf[output_lengths.offset] in the LengthBuffer) before OUTPUT_NEEDED is returned. For this reason, output.offset and output_lengths.offset must be strictly increasing.
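
A minimal sketch of the two-buffer contract, under stated assumptions: the types below are simplified stand-ins rather than the SDK definitions (in particular, the element type of `LengthBuffer::buf` is assumed to be a 32-bit unsigned integer), and `RecordSource` is a hypothetical class that emits whole records from memory. It writes only complete records, keeps the two offsets in step, and sets both to 0 with OUTPUT_NEEDED when not even one record fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified stand-ins so the sketch compiles without the Vertica SDK.
struct ServerInterface {};
enum StreamState { OUTPUT_NEEDED, DONE, KEEP_GOING };
struct DataBuffer { char *buf; std::size_t size; std::size_t offset; };
// Element type is an assumption made for this sketch.
struct LengthBuffer { std::uint32_t *buf; std::size_t size; std::size_t offset; };

// Hypothetical class; a real source would derive from Vertica::UDSource.
class RecordSource {
public:
    explicit RecordSource(const std::vector<std::string> &recs)
        : records(recs), next(0) {}

    bool useSideChannel() const { return true; }

    StreamState processWithMetadata(ServerInterface &, DataBuffer &output,
                                    LengthBuffer &output_lengths) {
        assert(output.buf != nullptr && output_lengths.buf != nullptr);
        std::size_t bytes = 0;  // data bytes written in this call
        std::size_t count = 0;  // length values written in this call
        while (next < records.size() && count < output_lengths.size &&
               bytes + records[next].size() <= output.size) {
            const std::string &rec = records[next];
            std::memcpy(output.buf + bytes, rec.data(), rec.size());
            output_lengths.buf[count++] = static_cast<std::uint32_t>(rec.size());
            bytes += rec.size();
            ++next;
        }
        // If nothing fit, both offsets are 0, which signals "buffer too small".
        output.offset = bytes;
        output_lengths.offset = count;
        return next == records.size() ? DONE : OUTPUT_NEEDED;
    }

private:
    std::vector<std::string> records;
    std::size_t next;
};
```

Writing only whole records keeps each length value aligned with the bytes it describes, so a consumer can split the data buffer using the lengths alone.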

virtual size_t Vertica::UnsizedUDSource::requestMinBufferSize ( ServerInterface &srvInterface)
inline, virtual, inherited

Request a minimum buffer size into which this source will read data. This will be called after setup(), before the first call to process().

virtual void Vertica::UDSource::setup ( ServerInterface &srvInterface)
inline, virtual

UDSource::setup()

Will be invoked during query execution, prior to the first time that process() is called on this UDSource instance.

May optionally be overridden to perform setup/initialization.

Reimplemented from Vertica::UnsizedUDSource.
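
The setup(), process(), destroy() lifecycle can be sketched as follows. The types are simplified stand-ins for the SDK ones, and `FileSource` is a hypothetical class. The sketch acquires the file handle in `setup()` rather than in the constructor or getSize(), reads in `process()`, and releases the handle in `destroy()`. Error reporting is omitted: a real UDx would report a failed open through the SDK, while this sketch simply ends the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Simplified stand-ins so the sketch compiles without the Vertica SDK.
struct ServerInterface {};
enum StreamState { OUTPUT_NEEDED, DONE, KEEP_GOING };
struct DataBuffer { char *buf; std::size_t size; std::size_t offset; };

// Hypothetical class; a real source would derive from Vertica::UDSource.
class FileSource {
public:
    explicit FileSource(const std::string &filePath)
        : path(filePath), handle(nullptr) {}

    // Mirrors setup(): acquire the resource just before the first process().
    void setup(ServerInterface &) { handle = std::fopen(path.c_str(), "rb"); }

    // Mirrors process(): read up to output.size bytes per call.
    StreamState process(ServerInterface &, DataBuffer &output) {
        assert(output.buf != nullptr && output.size > 0);
        if (handle == nullptr) {  // open failed; the sketch just ends the stream
            output.offset = 0;
            return DONE;
        }
        output.offset = std::fread(output.buf, 1, output.size, handle);
        return (std::feof(handle) || std::ferror(handle)) ? DONE : OUTPUT_NEEDED;
    }

    // Mirrors destroy(): release the resource after the last process().
    void destroy(ServerInterface &) {
        if (handle != nullptr) {
            std::fclose(handle);
            handle = nullptr;
        }
    }

    bool isOpen() const { return handle != nullptr; }

private:
    std::string path;
    std::FILE *handle;
};
```

Keeping the handle's lifetime between setup() and destroy() matches the getSize() guidance: an instance that has been constructed but not yet set up holds no operating system resources.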

virtual bool Vertica::UnsizedUDSource::useSideChannel ( )
inline, virtual, inherited

UnsizedUDSource::useSideChannel()

Provides access to the side channel containing record length metadata, when the UnsizedUDSource has metadata about record boundaries available in a structured format that is separate from the data payload.

Override and return true to indicate that processWithMetadata() should be called instead of process().

Return false to have process() called instead.

Returns
false by default.