Buffer classes

Buffer classes are used as handles to the raw data stream for all UDL functions.

Buffer classes are used as handles to the raw data stream for all UDL functions. The C++ and Java APIs use a single DataBuffer class for both input and output. The Python API has two classes, InputBuffer and OutputBuffer.

DataBuffer API

The DataBuffer class has a pointer to a buffer and size, and an offset indicating how much of the stream has been consumed.

/**
* A contiguous in-memory buffer of char *
*/
    struct DataBuffer {
    /// Pointer to the start of the buffer
    char * buf;

    /// Size of the buffer in bytes
    size_t size;

    /// Number of bytes that have been processed by the UDL
    size_t offset;
};

The DataBuffer class has an offset indicating how much of the stream has been consumed. Because Java is a language whose strings require attention to character encodings, the UDx must decode or encode buffers. A parser can interact with the stream by accessing the buffer directly.

/**
* DataBuffer is a a contiguous in-memory buffer of data.
*/
public class DataBuffer {

/**
* The buffer of data.
*/
public byte[] buf;

/**
* An offset into the buffer that is typically used to track progress
* through the DataBuffer. For example, a UDParser advances the
* offset as it consumes data from the DataBuffer.
*/
public int offset;}

InputBuffer and OutputBuffer APIs

The Python InputBuffer and OutputBuffer classes replace the DataBuffer class in the C++ and Java APIs.

InputBuffer class

The InputBuffer class decodes and translates raw data streams depending on the specified encoding. Python natively supports a wide range of languages and codecs. The InputBuffer is an argument to the process() method for both UDFilters and UDParsers. A user interacts with the UDL's data stream by calling methods of the InputBuffer

If you do not specify a value for setEncoding(), Vertica assumes a value of NONE.

class InputBuffer:
    def getSize(self):
        ...
    def getOffset(self):
    ...

    def setEncoding(self, encoding):
        """
        Set the encoding of the data contained in the underlying buffer
        """
        pass

    def peek(self, length = None):
        """
        Copy data from the input buffer into Python.
        If no encoding has been specified, returns a Bytes object containing raw data.
        Otherwise, returns data decoded into an object corresponding to the specified encoding
        (for example, 'utf-8' would return a string).
        If length is None, returns all available data.
        If length is not None then the length of the returned object is at most what is requested.
        This method does not advance the buffer offset.
        """
        pass

    def read(self, length = None):
        """
        See peek().
        This method does the same thing as peek(), but it also advances the
        buffer offset by the number of bytes consumed.
        """
        pass

        # Advances the DataBuffer offset by a number of bytes equal to the result
        # of calling "read" with the same arguments.
        def advance(self, length = None):
        """
        Advance the buffer offset by the number of bytes indicated by
        the length and encoding arguments.  See peek().
    Returns the new offset.
        """
        pass

OutputBuffer class

The OutputBuffer class encodes and outputs data from Python to Vertica. The OutputBuffer is an argument to the process() method for both UDFilters and UDParsers. A user interacts with the UDL's data stream by calling methods of the OutputBuffer to manipulate and encode data.

The write() method transfers all data from the Python client to Vertica. The output buffer can accept any size object. If a user writes an object to the OutputBuffer larger than Vertica can immediately process, Vertica stores the overflow. During the next call to process(),Vertica checks for leftover data. If there is any, Vertica copies it to the DataBuffer before determining whether it needs to call process() from the Python UDL.

If you do not specify a value for setEncoding(), Vertica assumes a value of NONE.

class OutputBuffer:
def setEncoding(self, encoding):
"""
Specify the encoding of the data which will be written to the underlying buffer
"""
pass
def write(self, data):
"""
Transfer bytes from the data object into Vertica.
If an encoding was specified via setEncoding(), the data object will be converted to bytes using the specified encoding.
Otherwise, the data argument is expected to be a Bytes object and is copied to the underlying buffer.
"""
pass