User-defined load (UDL)
COPY offers extensive options and settings to control how to load data. However, you may find that these options do not suit the type of data load that you want to perform. The user-defined load (UDL) feature lets you develop one or more functions that change how the COPY statement operates. You can create custom libraries using the Vertica SDK to handle various steps in the loading process. .
You use three types of UDL functions during development, one for each stage of the data-load process:
-
User-defined source (UDSource): Controls how COPY obtains the data it loads into the database. For example, COPY might obtain data by fetching it through HTTP or cURL. Up to one UDSource reads data from a file or input stream. Your UDSource can read from more than one source, but COPY invokes only one UDSource.
API support: C++, Java.
-
User-defined filter (UDFilter): Preprocesses the data. For example, a filter might unzip a file or convert UTF-16 to UTF-8. You can chain multiple user-defined filters together, for example unzipping and then converting.
API support: C++, Java, Python.
-
User-defined parser (UDParser): Up to one parser parses the data into tuples that are ready to be inserted into a table. For example, a parser could extract data from an XML-like format. You can optionally define a user-defined chunker (UDChunker, C++ only), to have the parser perform parallel parsing.
API support: C++, Java, Python.
After the final step, COPY inserts the data into a table, or rejects it if the format is incorrect.