This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

C++ SDK

The Vertica SDK supports writing both fenced and unfenced UDxs in C++ 11.

The Vertica SDK supports writing both fenced and unfenced UDxs in C++ 11. You can download, compile, and run the examples; see Downloading and running UDx example code. Running the examples is a good way to verify that your development environment has all needed libraries.

If you do not have access to a Vertica test environment, you can install Vertica on your development machine and run a single node. Each time you rebuild your UDx library, you need to re-install it into Vertica. The following diagram illustrates the typical development cycle.

This section covers C++-specific topics that apply to all UDx types. For information that applies to all languages, see Arguments and return values, UDx parameters, Errors, warnings, and logging, Handling cancel requests and the sections for specific UDx types. For full API documentation, see the C++ SDK Documentation.

1 - Setting up the C++ SDK

The Vertica C++ Software Development Kit (SDK) is distributed as part of the server installation.

The Vertica C++ Software Development Kit (SDK) is distributed as part of the server installation. It contains the source and header files you need to create your UDx library. For examples that you can compile and run, see Downloading and running UDx example code.

Requirements

At a minimum, install the following on your development machine:

  • gcc version 8 or higher. Vertica has not tested UDx builds with versions higher than 9.

  • g++ and its associated toolchain, such as ld. Some Linux distributions package g++ separately from gcc.

  • A copy of the Vertica SDK.

You must compile with a std flag value of c++11 or later.

The following optional software packages can simplify development:

  • make, or some other build-management tool.

  • gdb, or some other debugger.

  • Valgrind, or similar tools that detect memory leaks.

If you want to use any third-party libraries, such as statistical analysis libraries, you need to install them on your development machine. If you do not statically link these libraries into your UDx library, you must install them on every node in the cluster. See Compiling your C++ library for details.

SDK files

The SDK files are located in the sdk subdirectory under the root Vertica server directory (usually, /opt/vertica/sdk). This directory contains a subdirectory, include, which contains the headers and source files needed to compile UDx libraries.

There are two files in the include directory you need when compiling your UDx:

  • Vertica.h is the main header file for the SDK. Your UDx code needs to include this file in order to find the SDK's definitions.

  • Vertica.cpp contains support code that needs to be compiled into the UDx library.

Much of the Vertica SDK API is defined in the VerticaUDx.h header file (which is included by the Vertica.h file). If you're curious, you might want to review the contents of this file in addition to reading the API documentation.

Finding the current SDK version

You must develop your UDx using the same SDK version as the database in which you plan to use it. To display the SDK version currently installed on your system, run the following command in vsql:

=> SELECT sdk_version();

Running the examples

You can download the examples from the GitHub repository (see Downloading and running UDx example code). Compiling and running the examples helps you to ensure that your development environment is properly set up.

To compile all of the examples, including the Java examples, issue the following command in the Java-and-C++ directory under the examples directory:

$ make

2 - Compiling your C++ library

GNU g++ is the only supported compiler for compiling UDx libraries.

GNU g++ is the only supported compiler for compiling UDx libraries. Always compile your UDx code on the same version of Linux that you use on your Vertica cluster.

When compiling your library, you must always:

  • Compile with a -std flag value of c++11 or later.

  • Pass the -shared and -fPIC flags to the linker. The simplest method is to just pass these flags to g++ when you compile and link your library.

  • Use the -Wno-unused-value flag to suppress warnings when macro arguments are not used. If you do not use this flag, you may get "left-hand operand of comma has no effect" warnings.

  • Compile sdk/include/Vertica.cpp and link it into your library. This file contains support routines that help your UDx communicate with Vertica. The easiest way to do this is to include it in the g++ command to compile your library. Vertica supplies this file as C++ source rather than a library to limit library compatibility issues.

  • Add the Vertica SDK include directory in the include search path using the g++ -I flag.

The SDK examples include a working makefile. See Downloading and running UDx example code.

Example of compiling a UDx

The following command compiles a UDx contained in a single source file named MyUDx.cpp into a shared library named MyUDx.so:

g++ -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value \
      -fPIC -o MyUDx.so MyUDx.cpp /opt/vertica/sdk/include/Vertica.cpp

After you debug your UDx, you are ready to deploy it. Recompile your UDx using the -O3 flag to enable compiler optimization.

You can add additional source files to your library by adding them to the command line. You can also compile them separately and then link them together.

Handling external libraries

You must link your UDx library to any supporting libraries that your UDx code relies on.These libraries might be either ones you developed or others provided by third parties. You have two options for linking:

  • Statically link the support libraries into your UDx. The benefit of this method is that your UDx library does not rely on external files. Having a single UDx library file simplifies deployment because you just transfer a single file to your Vertica cluster. This method's main drawback is that it increases the size of your UDx library file.

  • Dynamically link the library to your UDx. You must sometimes use dynamic linking if a third-party library does not allow static linking. In this case, you must copy the libraries to your Vertica cluster in addition to your UDx library file.

3 - Adding metadata to C++ libraries

For example, the following code demonstrates adding metadata to the Add2Ints example (see C++ Example: Add2Ints).

You can add metadata, such as author name, the version of the library, a description of your library, and so on to your library. This metadata lets you track the version of your function that is deployed on a Vertica Analytic Database cluster and lets third-party users of your function know who created the function. Your library's metadata appears in the USER_LIBRARIES system table after your library has been loaded into the Vertica Analytic Database catalog.

You declare the metadata for your library by calling the RegisterLibrary() function in one of the source files for your UDx. If there is more than one function call in the source files for your UDx, whichever gets interpreted last as Vertica Analytic Database loads the library is used to determine the library's metadata.

The RegisterLibrary() function takes eight string parameters:

RegisterLibrary(author,
                library_build_tag,
                library_version,
                library_sdk_version,
                source_url,
                description,
                licenses_required,
                signature);
  • author contains whatever name you want associated with the creation of the library (your own name or your company's name for example).

  • library_build_tag is a string you want to use to represent the specific build of the library (for example, the SVN revision number or a timestamp of when the library was compiled). This is useful for tracking instances of your library as you are developing them.

  • library_version is the version of your library. You can use whatever numbering or naming scheme you want.

  • library_sdk_version is the version of the Vertica Analytic Database SDK Library for which you've compiled the library.

  • source_url is a URL where users of your function can find more information about it. This can be your company's website, the GitHub page hosting your library's source code, or whatever site you like.

  • description is a concise description of your library.

  • licenses_required is a placeholder for licensing information. You must pass an empty string for this value.

  • signature is a placeholder for a signature that will authenticate your library. You must pass an empty string for this value.

For example, the following code demonstrates adding metadata to the Add2Ints example (see C++ example: Add2Ints).

// Register the factory with Vertica
RegisterFactory(Add2IntsFactory);

// Register the library's metadata.
RegisterLibrary("Whizzo Analytics Ltd.",
                "1234",
                "2.0",
                "7.0.0",
                "http://www.example.com/add2ints",
                "Add 2 Integer Library",
                "",
                "");

Loading the library and querying the USER_LIBRARIES system table shows the metadata supplied in the call to RegisterLibrary():

=> CREATE LIBRARY add2intslib AS '/home/dbadmin/add2ints.so';
CREATE LIBRARY
=> \x
Expanded display is on.
=> SELECT * FROM USER_LIBRARIES WHERE lib_name = 'add2intslib';
-[ RECORD 1 ]-----+----------------------------------------
schema_name       | public
lib_name          | add2intslib
lib_oid           | 45035996273869808
author            | Whizzo Analytics Ltd.
owner_id          | 45035996273704962
lib_file_name     | public_add2intslib_45035996273869808.so
md5_sum           | 732c9e145d447c8ac6e7304313d3b8a0
sdk_version       | v7.0.0-20131105
revision          | 125200
lib_build_tag     | 1234
lib_version       | 2.0
lib_sdk_version   | 7.0.0
source_url        | http://www.example.com/add2ints
description       | Add 2 Integer Library
licenses_required |
signature         |

4 - C++ SDK data types

The Vertica SDK has typedefs and classes for representing Vertica data types within your UDx code.

The Vertica SDK has typedefs and classes for representing Vertica data types within your UDx code. Using these typedefs ensures data type compatibility between the data your UDx processes and generates and the Vertica database. The following table describes some of the typedefs available. Consult the C++ SDK Documentation for a complete list, as well as lists of helper functions to convert and manipulate these data types.

For information about SDK support for complex data types, see Complex Types as Arguments and Return Values.

Type Definition Description
Interval A Vertica interval
IntervalYM A Vertica year-to-month interval.
Timestamp A Vertica timestamp
vint A standard Vertica 64-bit integer
vbool A Boolean value in Vertica
vbool_null A null value for a Boolean data types
vfloat A Vertica floating point value
VString

String data types (such as varchar and char)

Note: Do not use a VString object to hold an intermediate result. Use a std::string or char[] instead.

VNumeric Fixed-point data types from Vertica
VUuid A Vertica universally unique identifier

Notes

  • When making some Vertica SDK API calls (such as VerticaType::getNumericLength()) on objects, make sure they have the correct data type. To minimize overhead and improve performance, most of the APIs do not check the data types of the objects on which they are called. Calling a function on an incorrect data type can result in an error.

  • You cannot create instances of VString or VNumeric yourself. You can manipulate the values of existing objects of these classes that Vertica passes to your UDx, and extract values from them. However, only Vertica can instantiate these classes.

5 - Resource use for C++ UDxs

Your UDxs consume at least a small amount of memory by instantiating classes and creating local variables.

Your UDxs consume at least a small amount of memory by instantiating classes and creating local variables. This basic memory usage by UDxs is small enough that you do not need to be concerned about it.

If your UDx needs to allocate more than one or two megabytes of memory for data structures, or requires access to additional resources such as files, you must inform Vertica about its resource use. Vertica can then ensure that the resources your UDx requires are available before running a query that uses it. Even moderate memory use (10MB per invocation of a UDx, for example) can become an issue if there are many simultaneous queries that call it.

5.1 - Allocating resources for UDxs

You have two options for allocating memory and file handles for your user-defined extensions (UDxs):.

You have two options for allocating memory and file handles for your user-defined extensions (UDxs):

  • Use Vertica SDK macros to allocate resources. This is the best method, since it uses Vertica's own resource manager, and guarantees that resources used by your UDx are reclaimed. See Allocating resources with the SDK macros.

  • While not the recommended option, you can allocate resources in your UDxs yourself using standard C++ methods (instantiating objects using new, allocating memory blocks using malloc(), etc.). You must manually free these resources before your UDx exits.

Whichever method you choose, you usually allocate resources in a function named setup() in your UDx class. This function is called after your UDx function object is instantiated, but before Vertica calls it to process data.

If you allocate memory on your own in the setup() function, you must free it in a corresponding function named destroy(). This function is called after your UDx has performed all of its processing. This function is also called if your UDx returns an error (see Handling errors).

The following code fragment demonstrates allocating and freeing memory using a setup() and destroy() function.

class MemoryAllocationExample : public ScalarFunction
{
public:
    uint64* myarray;
    // Called before running the UDF to allocate memory used throughout
    // the entire UDF processing.
    virtual void setup(ServerInterface &srvInterface, const SizedColumnTypes
                        &argTypes)
    {
        try
        {
            // Allocate an array. This memory is directly allocated, rather than
            // letting Vertica do it. Remember to properly calculate the amount
            // of memory you need based on the data type you are allocating.
            // This example divides 500MB by 8, since that's the number of
            // bytes in a 64-bit unsigned integer.
            myarray = new uint64[1024 * 1024 * 500 / 8];
        }
        catch (std::bad_alloc &ba)
        {
            // Always check for exceptions caused by failed memory
            // allocations.
            vt_report_error(1, "Couldn't allocate memory :[%s]", ba.what());
        }

    }

    // Called after the UDF has processed all of its information. Use to free
    // any allocated resources.
    virtual void destroy(ServerInterface &srvInterface, const SizedColumnTypes
                          &argTypes)
    {
        // srvInterface.log("RowNumber processed %d records", *count_ptr);
        try
        {
            // Properly dispose of the allocated memory.
            delete[] myarray;
        }
        catch (std::bad_alloc &ba)
        {
            // Always check for exceptions caused by failed memory
            // allocations.
            vt_report_error(1, "Couldn't free memory :[%s]", ba.what());
        }

    }

5.2 - Allocating resources with the SDK macros

The Vertica SDK provides three macros to allocate memory:.

The Vertica SDK provides three macros to allocate memory:

  • vt_alloc allocates a block of memory to fit a specific data type (vint, struct, etc.).

  • vt_allocArray allocates a block of memory to hold an array of a specific data type.

  • vt_allocSize allocates an arbitrarily-sized block of memory.

All of these macros allocate their memory from memory pools managed by Vertica. The main benefit of allowing Vertica to manage your UDx's memory is that the memory is automatically reclaimed after your UDx has finished. This ensures there is no memory leaks in your UDx.

Because Vertica frees this memory automatically, do not attempt to free any of the memory you allocate through any of these macros. Attempting to free this memory results in run-time errors.

5.3 - Informing Vertica of resource requirements

When you run your UDx in fenced mode, Vertica monitors its use of memory and file handles.

When you run your UDx in fenced mode, Vertica monitors its use of memory and file handles. If your UDx uses more than a few megabytes of memory or any file handles, it should tell Vertica about its resource requirements. Knowing the resource requirements of your UDx allows Vertica to determine whether it can run the UDx immediately or needs to queue the request until enough resources become available to run it.

Determining how much memory your UDx requires can be difficult in some cases. For example, if your UDx extracts unique data elements from a data set, there is potentially no bound on the number of data items. In this case, a useful technique is to run your UDx in a test environment and monitor its memory use on a node as it handles several differently-sized queries, then extrapolate its memory use based on the worst-case scenario it may face in your production environment. In all cases, it's usually a good idea to add a safety margin to the amount of memory you tell Vertica your UDx uses.

Your UDx informs Vertica of its resource needs by implementing the getPerInstanceResources() function in its factory class (see Vertica::UDXFactory::getPerInstanceResources() in the SDK documentation). If your UDx's factory class implements this function, Vertica calls it to determine the resources your UDx requires.

The getPerInstanceResources() function receives an instance of the Vertica::VResources struct. This struct contains fields that set the amount of memory and the number of file handles your UDx needs. By default, the Vertica server allocates zero bytes of memory and 100 file handles for each instance of your UDx.

Your implementation of the getPerInstanceResources() function sets the fields in the VResources struct based on the maximum resources your UDx may consume for each instance of the UDx function. So, if your UDx's processBlock() function creates a data structure that uses at most 100MB of memory, your UDx must set the VResources.scratchMemory field to at least 104857600 (the number of bytes in 100MB). Leave yourself a safety margin by increasing the number beyond what your UDx should normally consume. In this example, allocating 115000000 bytes (just under 110MB) is a good idea.

The following ScalarFunctionFactory class demonstrates calling getPerInstanceResources() to inform Vertica about the memory requirements of the MemoryAllocationExample class shown in Allocating resources for UDxs. It tells Vertica that the UDSF requires 510MB of memory (which is a bit more than the UDSF actually allocates, to be on the safe size).

class MemoryAllocationExampleFactory : public ScalarFunctionFactory
{
    virtual Vertica::ScalarFunction *createScalarFunction(Vertica::ServerInterface
                                                            &srvInterface)
    {
        return vt_createFuncObj(srvInterface.allocator, MemoryAllocationExample);
    }
    virtual void getPrototype(Vertica::ServerInterface &srvInterface,
                              Vertica::ColumnTypes &argTypes,
                              Vertica::ColumnTypes &returnType)
    {
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }
    // Tells Vertica the amount of resources that this UDF uses.
    virtual void getPerInstanceResources(ServerInterface &srvInterface,
                                          VResources &res)
    {
        res.scratchMemory += 1024LL * 1024 * 510; // request 510MB of memory
    }
};

5.4 - Setting memory limits for fenced-mode UDxs

Vertica calls a fenced-mode UDx's implementation of Vertica::UDXFactory::getPerInstanceResources() to determine if there are enough free resources to run the query containing the UDx (see Informing [%=Vertica.DBMS_SHORT%] of Resource Requirements).

Vertica calls a fenced-mode UDx's implementation of Vertica::UDXFactory::getPerInstanceResources() to determine if there are enough free resources to run the query containing the UDx (see Informing Vertica of resource requirements). Since these reports are not generated by actual memory use, they can be inaccurate. Once started by Vertica, a UDx could allocate far more memory or file handles than it reported it needs.

The FencedUDxMemoryLimitMB configuration parameter lets you create an absolute memory limit for UDxs. Any attempt by a UDx to allocate more memory than this limit results in a bad_alloc exception. For an example of setting FencedUDxMemoryLimitMB, see How resource limits are enforced.

5.5 - How resource limits are enforced

Before running a query, Vertica determines how much memory it requires to run.

Before running a query, Vertica determines how much memory it requires to run. If the query contains a fenced-mode UDx which implements the getPerInstanceResources() function in its factory class, Vertica calls it to determine the amount of memory the UDx needs and adds this to the total required for the query. Based on these requirements, Vertica decides how to handle the query:

  • If the total amount of memory required (including the amount that the UDxs report that they need) is larger than the session's MEMORYCAP or resource pool's MAXMEMORYSIZE setting, Vertica rejects the query. For more information about resource pools, see Resource pool architecture.

  • If the amount of memory is below the limit set by the session and resource pool limits, but there is currently not enough free memory to run the query, Vertica queues it until enough resources become available.

  • If there are enough free resources to run the query, Vertica executes it.

If the process executing your UDx attempts to allocate more memory than the limit set by the FencedUDxMemoryLimitMB configuration parameter, it receives a bad_alloc exception. For more information about FencedUDxMemoryLimitMB, see Setting memory limits for fenced-mode UDxs.

Below is the output of loading a UDSF that consumes 500MB of memory, then changing the memory settings to cause out-of-memory errors. The MemoryAllocationExample UDSF in the following example is just the Add2Ints UDSF example altered as shown in Allocating resources for UDxs and Informing Vertica of resource requirements to allocate 500MB of RAM.

=> CREATE LIBRARY mylib AS '/home/dbadmin/MemoryAllocationExample.so';
CREATE LIBRARY
=> CREATE FUNCTION usemem AS NAME 'MemoryAllocationExampleFactory' LIBRARY mylib
-> FENCED;
CREATE FUNCTION
=> SELECT usemem(1,2);
 usemem
--------
      3
(1 row)

The following statements demonstrate setting the session's MEMORYCAP to lower than the amount of memory that the UDSF reports it uses. This causes Vertica to return an error before it executes the UDSF.

=> SET SESSION MEMORYCAP '100M';
SET
=> SELECT usemem(1,2);
ERROR 3596:  Insufficient resources to execute plan on pool sysquery
[Request exceeds session memory cap: 520328KB > 102400KB]
=> SET SESSION MEMORYCAP = default;
SET

The resource pool can also prevent a UDx from running if it requires more memory than is available in the pool. The following statements demonstrate the effect of creating and using a resource pool that has too little memory for the UDSF to run. Similar to the session's MAXMEMORYCAP limit, the pool's MAXMEMORYSIZE setting prevents Vertica from executing the query containing the UDSF.

=> CREATE RESOURCE POOL small MEMORYSIZE '100M' MAXMEMORYSIZE '100M';
CREATE RESOURCE POOL
=> SET SESSION RESOURCE POOL small;
SET
=> CREATE TABLE ExampleTable(a int, b int);
CREATE TABLE
=> INSERT /*+direct*/ INTO ExampleTable VALUES (1,2);
 OUTPUT
--------
      1
(1 row)
=> SELECT usemem(a, b) FROM ExampleTable;
ERROR 3596:  Insufficient resources to execute plan on pool small
[Request Too Large:Memory(KB) Exceeded: Requested = 523136, Free = 102400 (Limit = 102400, Used = 0)]
=> DROP RESOURCE POOL small; --Dropping the pool resets the session's pool
DROP RESOURCE POOL

Finally, setting the FencedUDxMemoryLimitMB configuration parameter to lower than the UDx actually allocates results in the UDx throwing an exception. This is a different case than either of the previous two examples, since the query actually executes. The UDx's code needs to catch and handle the exception. In this example, it uses the vt_report_error macro to report the error back to Vertica and exit.

=> ALTER DATABASE DEFAULT SET FencedUDxMemoryLimitMB = 300;

=> SELECT usemem(1,2);
    ERROR 3412:  Failure in UDx RPC call InvokeSetup(): Error calling setup() in
    User Defined Object [usemem] at [MemoryAllocationExample.cpp:32], error code:
     1, message: Couldn't allocate memory :[std::bad_alloc]

=> ALTER DATABASE DEFAULT SET FencedUDxMemoryLimitMB = -1;

=> SELECT usemem(1,2);
 usemem
--------
      3
(1 row)

See also