Improving query performance (C++ only)

When evaluating a query, Vertica can take advantage of available information about the ranges of values.

When evaluating a query, Vertica can take advantage of available information about the ranges of values. For example, if data is partitioned and a query restricts output by the partitioned value, Vertica can ignore partitions that cannot possibly contain data that would satisfy the query. Similarly, for a scalar function, Vertica can skip processing rows in the data where the value returned from the function cannot possibly affect the results.

Consider a table with millions of rows of data on customer orders and a scalar function that computes the total price paid for everything in an order. A query uses a WHERE clause to restrict results to orders above a given value. A scalar function is called on a block of data; if no rows within that block could produce the target value, skipping the processing of the block could improve query performance.

A scalar function written in C++ can implement the getOutputRange method. Before calling processBlock, Vertica calls getOutputRange to determine the minimum and maximum return values from this block given the input ranges. It then decides whether to call processBlock to perform the computations.

The Add2Ints example implements this function. The minimum output value is the sum of the smallest values of each of the two inputs, and the maximum output is the sum of the largest values of each of the inputs. This function does not consider individual rows. Consider the following inputs:

   a  |  b
------+------
  21  | 92
 500  | 19
 111  | 11

The smallest values of the two inputs are 21 and 11, so the function reports 32 as the low end of the output range. The largest input values are 500 and 92, so it reports 592 as the high end of the output range. 592 is larger than the value returned for any of the input rows and 32 is smaller than any row's return value.

The purpose of getOutputRange is to quickly eliminate calls where outputs would definitely be out of range. For example, if the query included "WHERE Add2Ints(a,b) > 600", this block of data could be skipped. There can still be cases where, after calling getOutputRange, processBlock returns no results. If the query included "WHERE Add2Ints(a,b) > 500", getOutputRange would not eliminate this block of data.

Add2Ints implements getOutputRange as follows:


    /*
     * This method computes the output range for this scalar function from
     *   the ranges of its inputs in a single invocation.
     *
     * The input ranges are retrieved via inRange
     * The output range is returned via outRange
     */
    virtual void getOutputRange(Vertica::ServerInterface &srvInterface,
                                Vertica::ValueRangeReader &inRange,
                                Vertica::ValueRangeWriter &outRange)
    {
        if (inRange.hasBounds(0) && inRange.hasBounds(1)) {
            // Input ranges have bounds defined
            if (inRange.isNull(0) || inRange.isNull(1)) {
                // At least one range has only NULL values.
                // Output range can only have NULL values.
                outRange.setNull();
                outRange.setHasBounds();
                return;
            } else {
                // Compute output range
                const vint& a1LoBound = inRange.getIntRefLo(0);
                const vint& a2LoBound = inRange.getIntRefLo(1);
                outRange.setIntLo(a1LoBound + a2LoBound);

                const vint& a1UpBound = inRange.getIntRefUp(0);
                const vint& a2UpBound = inRange.getIntRefUp(1);
                outRange.setIntUp(a1UpBound + a2UpBound);
            }
        } else {
            // Input ranges are unbounded. No output range can be defined
            return;
        }

        if (!inRange.canHaveNulls(0) && !inRange.canHaveNulls(1)) {
            // There cannot be NULL values in the output range
            outRange.setCanHaveNulls(false);
        }

        // Let Vertica know that the output range is bounded
        outRange.setHasBounds();
    }

If getOutputRange produces an error, Vertica issues a warning and does not call the method again for the current query.