User-defined parser
A parser takes a stream of bytes and passes a corresponding sequence of tuples to the Vertica load process. You can use user-defined parser functions to parse:
-
Data in formats not understood by the Vertica built-in parser.
-
Data that requires more specific control than the built-in parser supplies.
For example, you can load a CSV file using a specific CSV library. See the Vertica SDK for two CSV examples.
COPY supports a single user-defined parser that you can use with a user-defined source and zero or more instances of a user-defined filter. If you implement a UDParser class, you must also implement a corresponding ParserFactory.
Sometimes, you can improve the performance of your parser by adding a chunker. A chunker divides up the input and uses multiple threads to parse it. Chunkers are available only in the C++ API. For details, see Cooperative parse and UDChunker class. Under special circumstances you can further improve performance by using apportioned load, an approach where multiple Vertica nodes parse the input.