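Buffer classes

Buffer classes serve as handles to the raw data stream for all UDL functions. The C++ and Java APIs use a single DataBuffer class for both input and output. The Python API has two classes: InputBuffer and OutputBuffer.

DataBuffer API (C++, Java)

The C++ DataBuffer struct holds a pointer to the buffer, the size of the buffer, and an offset that indicates how much of the stream has been consumed.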

/**
 * A contiguous in-memory buffer of char *
 */
struct DataBuffer {
    /// Pointer to the start of the buffer
    char * buf;

    /// Size of the buffer in bytes
    size_t size;

    /// Number of bytes that have been processed by the UDL
    size_t offset;
};

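The Java DataBuffer class has an offset that indicates how much of the stream has been consumed. Because Java strings require attention to character encoding, the UDx must decode or encode the buffer itself. A parser can interact with the stream by accessing the buffer directly.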

/**
 * DataBuffer is a contiguous in-memory buffer of data.
 */
public class DataBuffer {

    /**
     * The buffer of data.
     */
    public byte[] buf;

    /**
     * An offset into the buffer that is typically used to track progress
     * through the DataBuffer. For example, a UDParser advances the
     * offset as it consumes data from the DataBuffer.
     */
    public int offset;
}
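The buffer-plus-offset pattern described above can be sketched in a few lines. This is a minimal illustration written in Python for brevity, not Java SDK code: DataBufferSketch and consume_lines are hypothetical names that only mirror the buf and offset fields. It decodes each complete line explicitly and leaves an incomplete trailing line unconsumed, which is the kind of bookkeeping a parser working directly on the raw buffer has to do.

```python
from dataclasses import dataclass


@dataclass
class DataBufferSketch:
    """Hypothetical mirror of the buf/offset pair; not a Vertica class."""
    buf: bytes
    offset: int = 0


def consume_lines(data, encoding="utf-8"):
    """Decode every complete line after data.offset and advance the offset past it."""
    lines = []
    while True:
        end = data.buf.find(b"\n", data.offset)
        if end == -1:
            break  # the incomplete trailing line stays in the buffer
        lines.append(data.buf[data.offset:end].decode(encoding))
        data.offset = end + 1  # step over the newline as well
    return lines


db = DataBufferSketch(b"alpha\nbeta\ngam")
lines = consume_lines(db)
remaining = db.buf[db.offset:]  # bytes the caller has not consumed yet
```

After the call, the offset points just past the last complete line, so the unconsumed bytes can be presented again once more data arrives.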

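InputBuffer and OutputBuffer APIs (Python)

The Python InputBuffer and OutputBuffer classes replace the DataBuffer class in the C++ and Java APIs.

InputBuffer class

The InputBuffer class decodes and translates raw data streams according to the specified encoding. Python natively supports a wide range of languages and codecs. The InputBuffer is an argument to the process() method of both UDFilters and UDParsers. You interact with the UDL's data stream by calling methods of the InputBuffer.

If you do not specify a value for setEncoding(), Vertica assumes a value of NONE.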

class InputBuffer:
    def getSize(self):
        ...

    def getOffset(self):
        ...

    def setEncoding(self, encoding):
        """
        Set the encoding of the data contained in the underlying buffer
        """
        pass

    def peek(self, length=None):
        """
        Copy data from the input buffer into Python.
        If no encoding has been specified, returns a Bytes object containing raw data.
        Otherwise, returns data decoded into an object corresponding to the specified encoding
        (for example, 'utf-8' would return a string).
        If length is None, returns all available data.
        If length is not None then the length of the returned object is at most what is requested.
        This method does not advance the buffer offset.
        """
        pass

    def read(self, length=None):
        """
        See peek().
        This method does the same thing as peek(), but it also advances the
        buffer offset by the number of bytes consumed.
        """
        pass

    # Advances the DataBuffer offset by a number of bytes equal to the result
    # of calling "read" with the same arguments.
    def advance(self, length=None):
        """
        Advance the buffer offset by the number of bytes indicated by
        the length and encoding arguments.  See peek().
        Returns the new offset.
        """
        pass
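To make the difference between peek(), read(), and advance() concrete, the following sketch uses a small in-memory stand-in. FakeInputBuffer is a hypothetical class written for this illustration and is not part of the Vertica SDK. It assumes that length counts bytes and that the data is ASCII-only, so it does not show how the real class treats multi-byte characters.

```python
class FakeInputBuffer:
    """Hypothetical in-memory stand-in for InputBuffer; not the SDK class."""

    def __init__(self, data):
        self._data = data
        self._offset = 0
        self._encoding = None  # None plays the role of the NONE default

    def getSize(self):
        return len(self._data)

    def getOffset(self):
        return self._offset

    def setEncoding(self, encoding):
        self._encoding = encoding

    def _raw(self, length):
        # Assumption: length is a byte count; None means "everything left".
        end = len(self._data) if length is None else self._offset + length
        return self._data[self._offset:end]

    def peek(self, length=None):
        raw = self._raw(length)
        return raw if self._encoding is None else raw.decode(self._encoding)

    def read(self, length=None):
        raw = self._raw(length)
        self._offset += len(raw)  # unlike peek(), read() consumes the bytes
        return raw if self._encoding is None else raw.decode(self._encoding)

    def advance(self, length=None):
        self._offset += len(self._raw(length))
        return self._offset


buf = FakeInputBuffer(b"id,name\n1,Ada\n")
header = buf.peek(7)                  # raw bytes, offset untouched
offset_after_peek = buf.getOffset()
new_offset = buf.advance(8)           # skip the header and its newline
buf.setEncoding("utf-8")
rest = buf.read()                     # decoded str, offset moves to the end
```

In this sketch, peek() returns bytes because no encoding is set yet, while read() returns a str once setEncoding("utf-8") has been called.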

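OutputBuffer class

The OutputBuffer class encodes data in Python and outputs it to Vertica. The OutputBuffer is an argument to the process() method of both UDFilters and UDParsers. You interact with the UDL's data stream by calling methods of the OutputBuffer to manipulate and encode data.

The write() method transfers all data from the Python client to Vertica. The output buffer can accept an object of any size. If you write an object to the OutputBuffer that is larger than Vertica can immediately process, Vertica stores the overflow. During the next call to process(), Vertica checks for leftover data. If there is any, Vertica copies it to the DataBuffer before determining whether it needs to call process() from the Python UDL.

If you do not specify a value for setEncoding(), Vertica assumes a value of NONE.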

class OutputBuffer:
    def setEncoding(self, encoding):
        """
        Specify the encoding of the data which will be written to the underlying buffer
        """
        pass

    def write(self, data):
        """
        Transfer bytes from the data object into Vertica.
        If an encoding was specified via setEncoding(), the data object will be converted to bytes using the specified encoding.
        Otherwise, the data argument is expected to be a Bytes object and is copied to the underlying buffer.
        """
        pass
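The encoding and overflow behavior described for write() can be modeled with a small stand-in. FakeOutputBuffer, its capacity parameter, and its drain() method are hypothetical and exist only for this sketch. The real class does not expose its capacity, and Vertica handles the overflow internally.

```python
class FakeOutputBuffer:
    """Hypothetical stand-in for OutputBuffer with a fixed-size underlying buffer."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._encoding = None
        self.buffer = b""    # bytes that fit in the underlying buffer right now
        self.overflow = b""  # bytes held back until the buffer has room

    def setEncoding(self, encoding):
        self._encoding = encoding

    def write(self, data):
        # With an encoding set, data is a str; otherwise it must already be bytes.
        raw = data if self._encoding is None else data.encode(self._encoding)
        pending = self.overflow + raw  # keep earlier overflow ahead of new data
        room = self._capacity - len(self.buffer)
        self.buffer += pending[:room]
        self.overflow = pending[room:]

    def drain(self):
        """Simulate the consumer emptying the buffer, then refill it from the overflow."""
        taken = self.buffer
        self.buffer = self.overflow[:self._capacity]
        self.overflow = self.overflow[self._capacity:]
        return taken


out = FakeOutputBuffer(capacity=4)
out.setEncoding("utf-8")
out.write("h\u00e9llo")  # 5 characters, 6 bytes once encoded as UTF-8
first = out.drain()      # the 4 bytes that fit immediately
second = out.drain()     # the stored overflow, delivered on the next pass
```

The caller writes the whole object once. Splitting it across passes is the buffer's job, which is why write() can accept objects larger than the underlying buffer.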