Transform functions (UDTFs)
A user-defined transform function (UDTF) lets you transform a table of data into another table. It reads one or more arguments (treated as a row of data), and returns zero or more rows of data consisting of one or more columns. A UDTF can produce any number of rows as output. However, each row it outputs must be complete. Advancing to the next row without having added a value for each column produces incorrect results.
The schema of the output table does not need to correspond to the schema of the input table—they can be totally different. The UDTF can return any number of output rows for each row of input.
Unless a UDTF is marked as one-to-many in its factory function, it can only be used in a SELECT list that contains the UDTF call and a required OVER clause. A multi-phase UDTF can make use of partition columns (PARTITION BY), but other UDTFs cannot.
UDTFs are run after GROUP BY, but before the final ORDER BY, when used in conjunction with GROUP BY and ORDER BY in a statement. The ORDER BY clause may contain only columns or expressions that are in a window partition clause (see Window partitioning).
UDTFs can take up to 9800 parameters (input columns). Attempts to pass more parameters to a UDTF return an error.
In this section
- TransformFunction class
- TransformFunctionFactory class
- MultiPhaseTransformFunctionFactory class
- Improving query performance (C++ only)
- Partitioning options for UDTFs
- C++ example: string tokenizer
- Python example: string tokenizer
- R example: log tokenizer
- C++ example: multi-phase indexer
- Python example: multi-phase calculation
- Python example: count elements
- Python example: explode