Buffer classes
Buffer classes are used as handles to the raw data stream for all UDL functions. The C++ and Java APIs use a single DataBuffer class for both input and output. The Python API has two classes, InputBuffer and OutputBuffer.
DataBuffer API (C++, Java)
In C++, the DataBuffer struct has a pointer to a buffer, the size of that buffer, and an offset indicating how much of the stream has been consumed.
/**
 * A contiguous in-memory buffer of char *
 */
struct DataBuffer {
    /// Pointer to the start of the buffer
    char * buf;
    /// Size of the buffer in bytes
    size_t size;
    /// Number of bytes that have been processed by the UDL
    size_t offset;
};
In Java, the DataBuffer class has an offset indicating how much of the stream has been consumed. Because Java strings require attention to character encodings, the UDx must decode or encode buffers itself. A parser can interact with the stream by accessing the buffer directly.
/**
 * DataBuffer is a contiguous in-memory buffer of data.
 */
public class DataBuffer {
    /**
     * The buffer of data.
     */
    public byte[] buf;
    /**
     * An offset into the buffer that is typically used to track progress
     * through the DataBuffer. For example, a UDParser advances the
     * offset as it consumes data from the DataBuffer.
     */
    public int offset;
}
InputBuffer and OutputBuffer APIs (Python)
The Python InputBuffer and OutputBuffer classes replace the DataBuffer class in the C++ and Java APIs.
InputBuffer class
The InputBuffer class decodes and translates raw data streams depending on the specified encoding. Python natively supports a wide range of languages and codecs. The InputBuffer is an argument to the process() method for both UDFilters and UDParsers. A user interacts with the UDL's data stream by calling methods of the InputBuffer.
If you do not specify a value for setEncoding(), Vertica assumes a value of NONE.
class InputBuffer:
    def getSize(self):
        ...
    def getOffset(self):
        ...
    def setEncoding(self, encoding):
        """
        Set the encoding of the data contained in the underlying buffer
        """
        pass
    def peek(self, length = None):
        """
        Copy data from the input buffer into Python.
        If no encoding has been specified, returns a Bytes object containing raw data.
        Otherwise, returns data decoded into an object corresponding to the specified encoding
        (for example, 'utf-8' would return a string).
        If length is None, returns all available data.
        If length is not None then the length of the returned object is at most what is requested.
        This method does not advance the buffer offset.
        """
        pass
    def read(self, length = None):
        """
        See peek().
        This method does the same thing as peek(), but it also advances the
        buffer offset by the number of bytes consumed.
        """
        pass
    # Advances the DataBuffer offset by a number of bytes equal to the result
    # of calling "read" with the same arguments.
    def advance(self, length = None):
        """
        Advance the buffer offset by the number of bytes indicated by
        the length and encoding arguments. See peek().
        Returns the new offset.
        """
        pass
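The difference between peek(), read(), and advance() is easiest to see on a small in-memory stand-in. The sketch below is not the Vertica SDK class: MockInputBuffer is a hypothetical name, and it assumes that length is counted in bytes and that the data is ASCII, so a slice never splits a multi-byte character.

```python
class MockInputBuffer:
    """Toy stand-in for InputBuffer (hypothetical, not the Vertica SDK class)."""

    def __init__(self, data: bytes):
        self._data = data
        self._offset = 0
        self._encoding = None  # None means "return raw bytes"

    def getSize(self):
        return len(self._data)

    def getOffset(self):
        return self._offset

    def setEncoding(self, encoding):
        self._encoding = encoding

    def _slice(self, length):
        # Assumption: length is a byte count; None means "everything left".
        if length is None:
            end = len(self._data)
        else:
            end = min(self._offset + length, len(self._data))
        return self._data[self._offset:end]

    def _decode(self, raw):
        return raw if self._encoding is None else raw.decode(self._encoding)

    def peek(self, length=None):
        # Look ahead without moving the offset.
        return self._decode(self._slice(length))

    def read(self, length=None):
        # Same data as peek(), but the offset moves past it.
        raw = self._slice(length)
        self._offset += len(raw)
        return self._decode(raw)

    def advance(self, length=None):
        # Move the offset without returning data; return the new offset.
        self._offset += len(self._slice(length))
        return self._offset


buf = MockInputBuffer(b"id,name\n1,Ada\n")
header = buf.peek(7)                 # look ahead; nothing is consumed
offset_after_peek = buf.getOffset()
line = buf.read(8)                   # consume the header line and its newline
buf.setEncoding("utf-8")
rest = buf.peek()                    # remaining data, now decoded to str
end = buf.advance()                  # skip whatever is left
```

After this sequence the offset equals the buffer size, which is how a parser would signal that it has consumed everything it was given.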
OutputBuffer class
The OutputBuffer class encodes and outputs data from Python to Vertica. The OutputBuffer is an argument to the process() method for both UDFilters and UDParsers. A user interacts with the UDL's data stream by calling methods of the OutputBuffer to manipulate and encode data.
The write() method transfers all data from the Python client to Vertica. The output buffer can accept any size object. If a user writes an object to the OutputBuffer larger than Vertica can immediately process, Vertica stores the overflow. During the next call to process(), Vertica checks for leftover data. If there is any, Vertica copies it to the DataBuffer before determining whether it needs to call process() from the Python UDL.
If you do not specify a value for setEncoding(), Vertica assumes a value of NONE.
class OutputBuffer:
    def setEncoding(self, encoding):
        """
        Specify the encoding of the data which will be written to the underlying buffer
        """
        pass
    def write(self, data):
        """
        Transfer bytes from the data object into Vertica.
        If an encoding was specified via setEncoding(), the data object will be converted to bytes using the specified encoding.
        Otherwise, the data argument is expected to be a Bytes object and is copied to the underlying buffer.
        """
        pass
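The overflow behavior described for write() can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Vertica implements it: MockOutputBuffer, its capacity parameter, and its drain() method are hypothetical names, with drain() standing in for the server consuming the buffer and copying leftover data in before the next call to process().

```python
class MockOutputBuffer:
    """Toy stand-in for OutputBuffer (hypothetical, not the Vertica SDK class)."""

    def __init__(self, capacity):
        self.capacity = capacity     # assumed fixed size of the underlying buffer
        self.buf = bytearray()       # data the server can take right now
        self.overflow = bytearray()  # data held back until there is room
        self._encoding = None

    def setEncoding(self, encoding):
        self._encoding = encoding

    def write(self, data):
        # Encode str input if an encoding was set; otherwise expect bytes.
        raw = data.encode(self._encoding) if self._encoding is not None else bytes(data)
        # If overflow already exists, new data must queue behind it to keep order.
        room = 0 if self.overflow else self.capacity - len(self.buf)
        self.buf += raw[:room]
        self.overflow += raw[room:]

    def drain(self):
        # Simulate the server taking the buffer, then moving leftovers into it.
        consumed = bytes(self.buf)
        self.buf = bytearray(self.overflow[:self.capacity])
        self.overflow = self.overflow[self.capacity:]
        return consumed


out = MockOutputBuffer(capacity=4)
out.setEncoding("utf-8")
out.write("abcdefg")     # 7 bytes into a 4-byte buffer: 3 bytes overflow
first = out.drain()      # the 4 bytes that fit
second = out.drain()     # the 3 leftover bytes, delivered on the next round
```

The point of the model is that the caller never has to split its writes: a single write() of any size succeeds, and the excess is delivered on later rounds in the original order.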