UDAF performance in statements containing a GROUP BY clause
You may see slower-than-expected performance from your UDAF if the SQL statement calling it also contains a GROUP BY clause. For example:
=> SELECT a, MYUDAF(b) FROM sampletable GROUP BY a;
In statements like this one, Vertica does not consolidate row data together before calling your UDAF's aggregate()
method. Instead, it calls aggregate()
once for each row of data. Usually, the overhead of having Vertica consolidate the row data is greater than the overhead of calling aggregate()
for each row of data. However, if your UDAF's aggregate()
method has significant overhead, then you might notice an impact on your UDAF's performance.
For example, suppose aggregate()
allocates memory. When called in a statement with a GROUP BY clause, it performs this memory allocation for each row of data. Because memory allocation is a relatively expensive process, this allocation can impact the overall performance of your UDAF and the query.
There are two ways you can address UDAF performance in a statement containing a GROUP BY clause:
-
Reduce the overhead of each call to
aggregate()
. If possible, move any allocation or other setup operations to the UDAF'ssetup()
function. -
Declare a special parameter that tells Vertica to group row data together when calling a UDAF. This technique is explained below.
Using the _minimizeCallCount parameter
Your UDAF can tell Vertica to always batch row data together to reduce the number of calls to its aggregate()
method. To trigger this behavior, your UDAF must declare an integer parameter named _minimizeCallCount
. You do not need to set a value for this parameter in your SQL statement. The fact that your UDAF declares this parameter triggers Vertica to group row data together when calling aggregate()
.
You declare the _minimizeCallCount
parameter the same way you declare other UDx parameters. See UDx parameters for more information.
Important
Always test the performance of your UDAF before and after implementing the_minimizeCallCount
parameter to ensure that it improves performance. You might find that the overhead of having Vertica group row data for your UDAF is greater than the cost of the repeated calls to aggregate()
.