Developing user-defined extensions (UDx)

The primary strengths of UDx are:

You can use them wherever you use internal functions.
They take full advantage of the distributed computing feature of the system. The extensions usually execute in parallel on each node in the cluster.
The complicated aspects of developing a distributed piece of analytic code are handled by the system. Your main programming task is to read in data, process it, and then write it out using the SDK APIs.
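
The read, process, and write pattern described above can be sketched in plain Python. This is a minimal model, not the SDK: RowReader, RowWriter, process_block, and run_on_nodes are hypothetical names invented for illustration, and the per-node loop runs sequentially where a real cluster would run in parallel.

```python
class RowReader:
    """Hypothetical stand-in for an SDK object that supplies input rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        return iter(self._rows)


class RowWriter:
    """Hypothetical stand-in for an SDK object that collects output rows."""

    def __init__(self):
        self.rows = []

    def write(self, row):
        self.rows.append(row)


def process_block(reader, writer):
    """The developer's part: read each row, process it, write the result."""
    for a, b in reader:           # read
        total = a + b             # process
        writer.write((total,))    # write


def run_on_nodes(rows_per_node):
    """Simulate the system's part: run the same code on each node's rows.

    A real cluster would run these in parallel; this loop is sequential.
    """
    output = []
    for rows in rows_per_node:
        writer = RowWriter()
        process_block(RowReader(rows), writer)
        output.extend(writer.rows)
    return output


print(run_on_nodes([[(1, 2), (3, 4)], [(10, 20)]]))  # [(3,), (7,), (30,)]
```

The point of the split is that process_block knows nothing about nodes or data distribution. Only run_on_nodes, which stands in for the system, does.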

Fenced mode

Fenced mode runs the UDx code outside of the main system process. Fenced UDx crashes do not impact the core system process. There is a small performance impact when running UDx code in fenced mode. On average, using fenced mode adds about 10% more time to execution.
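
As an analogy for the isolation that fenced mode provides, the following sketch runs code in a separate operating system process using Python's standard subprocess module. It does not show how the database implements fenced mode; run_fenced is a hypothetical helper, and the example only demonstrates that a crash in a child process leaves the parent process running.

```python
import subprocess
import sys


def run_fenced(code):
    """Run a snippet in a separate Python process and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


# A well-behaved snippet exits with status 0 and returns its output.
ok_status, ok_output = run_fenced("print(2 + 3)")

# A snippet that aborts takes down only the child process.
crash_status, _ = run_fenced("import os; os.abort()")

# The parent is still running and can inspect both outcomes.
print("ok:", ok_status, ok_output)
print("crashed child exited with non-zero status:", crash_status != 0)
```

Starting a separate process and passing data across the process boundary takes extra work, which is consistent with the performance cost of fenced mode noted above.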

Fenced mode is currently available for all UDx with the exception of user-defined aggregates. UDx developed in Python must run in fenced mode, since the Python runtime cannot run directly within the system process. Using fenced mode does not affect how you develop your UDx. Fenced mode is enabled by default for all UDx.

OTCAD does not support the unfenced mode.

Developing with the database SDK

Before you develop your UDx, you need to configure your development environment. The development environment must use the same database version as the production environment. For guidance on obtaining the OpenText SDK, contact Technical Support. For more information about configuring your development environment, see Developing with the OpenText Analytics Database SDK.

Database upgrade

When the database is upgraded, the SDK is also upgraded. Your UDx libraries depend on that SDK. If the SDK changes, your existing libraries may no longer work. You need to recompile your library using the new SDK, delete the old library, and upload the newly compiled one. If you do not recompile your library using the new SDK, the error "Library built with incompatible SDK version. Rebuild with SDK version [string] and recreate the library" appears when loading or running the UDx.

If you make any changes to your UDx code, you must compile a new version of the library and upload it again. This also requires removing the existing library and replacing it with your updated one. The database then runs the latest version of your UDx library.

UDx types

OTCAD supports four types of user-defined extensions:

User-defined aggregate functions
User-defined analytic functions
User-defined scalar functions
User-defined transform functions

User-defined aggregate functions (UDAFs)

UDAFs allow you to create custom aggregate functions specific to your needs. An aggregate function performs an operation on a set of values and returns one value. The system provides standard built-in aggregate functions such as AVG, MAX, and MIN. UDAFs read one column of data and return one output column. UDAFs can be developed in C++ only. For more information about UDAFs, see Aggregate functions (UDAFs).
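
To illustrate what "a set of values in, one value out" means when the data is spread across nodes, here is a minimal sketch in plain Python. It is a conceptual model only: UDAFs themselves are written in C++, and the class and method names (AverageAggregate, aggregate, combine, terminate) are assumptions chosen for illustration, not SDK signatures.

```python
class AverageAggregate:
    """Keeps a running sum and count so partial results can be merged."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def aggregate(self, values):
        """Fold one chunk of the input column into the running state."""
        for value in values:
            self.total += value
            self.count += 1

    def combine(self, other):
        """Merge the state computed elsewhere (for example, on another node)."""
        self.total += other.total
        self.count += other.count

    def terminate(self):
        """Produce the single output value, or None for empty input."""
        return self.total / self.count if self.count else None


def distributed_average(chunks):
    """Aggregate each chunk separately, then combine the partial states."""
    final = AverageAggregate()
    for chunk in chunks:
        partial = AverageAggregate()
        partial.aggregate(chunk)
        final.combine(partial)
    return final.terminate()


print(distributed_average([[1, 2, 3], [4, 5]]))  # 3.0
```

The state holds a sum and a count rather than an average because averages of chunks cannot be merged directly. Averaging 2.0 and 4.5 would give 3.25 rather than the correct 3.0.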

User-defined analytic functions (UDAnFs)

UDAnFs are similar to UDSFs in that they read a row of data and return a single row. However, the function can read input rows independently of outputting rows, so the output values can be calculated over several input rows. UDAnFs can be developed in C++ and Java and are used for analytics. A UDAnF must output a single value for each row of data read and can have no more than 9800 arguments. For more information about UDAnFs, see Analytic functions (UDAnFs).
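
The difference from a scalar function can be seen in a small plain-Python model. The function below reads every input row before writing any output, yet still returns exactly one value per input row. It is a conceptual sketch only; share_of_total is a hypothetical name, and real UDAnFs are written in C++ or Java against the SDK.

```python
def share_of_total(partition):
    """Return each row's value as a fraction of the partition total.

    All rows are read first (to compute the total), and only then is one
    output value written for each input row.
    """
    values = list(partition)          # read phase: consume all input rows
    total = sum(values)
    if total == 0:
        return [None for _ in values]
    return [value / total for value in values]   # write phase: one per row


print(share_of_total([10, 10, 20]))  # [0.25, 0.25, 0.5]
```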

User-defined scalar functions (UDSFs)

A user-defined scalar function (UDSF) takes in a single row of data and returns a single value for that row. You can use a UDSF anywhere you can use a built-in database function. UDSFs can be developed in C++, Python, or Java. For more information about UDSFs, see Scalar functions (UDSFs).
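
The one-row-in, one-value-out contract can be sketched in plain Python as follows. This models the behavior only and does not use the SDK; to_fahrenheit and apply_scalar are hypothetical names for illustration.

```python
def to_fahrenheit(celsius):
    """Scalar logic: one input row (a single argument) gives one value."""
    return celsius * 9 / 5 + 32


def apply_scalar(func, rows):
    """Call the scalar function once per row; output length equals input length."""
    return [func(*row) for row in rows]


print(apply_scalar(to_fahrenheit, [(0,), (100,), (-40,)]))  # [32.0, 212.0, -40.0]
```

Because each result depends only on its own row, the rows can be processed in any order and on any node.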

User-defined transform functions (UDTFs)

UDTFs operate on table partitions and return zero or more rows of data. The data they return can be an entirely new table, unrelated to the schema of the input table, with its own ordering and segmentation expressions. They can only be used in the SELECT list of a query. UDTFs can be developed in C++, Python, and Java.

A user-defined transform function (UDTF) lets you transform a table of data into another table. It reads one or more arguments (as a row of data) and returns zero or more rows of data comprising one or more columns. To optimize query performance, you can use live aggregate projections to pre-aggregate the data that a UDTF returns. For more information about UDTFs, see Transform functions (UDTFs).
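
A tokenizer is a common way to picture a transform function: each input row can produce zero, one, or many output rows, and the output columns differ from the input columns. The following plain-Python sketch models that behavior without using the SDK; tokenize_partition and its column layout are assumptions made for illustration.

```python
def tokenize_partition(partition):
    """Turn (doc_id, text) rows into (doc_id, position, word) rows.

    A row with empty text yields no output rows at all.
    """
    for doc_id, text in partition:
        for position, word in enumerate(text.split()):
            yield (doc_id, position, word.lower())


rows = [(1, "Hello big world"), (2, ""), (3, "Bye")]
for out_row in tokenize_partition(rows):
    print(out_row)
```

Here three input rows with two columns become four output rows with three columns, and the row with doc_id 2 contributes nothing.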

While each UDx type has a unique base class, developing them is similar in many ways. Different UDx types can also share the same library.

UDx types and supported languages

The UDx types and their supported languages are as follows:

UDx type	Java	C++	Python
User-defined scalar functions	Yes	Yes	Yes
User-defined aggregate functions	No	Yes	No
User-defined analytic functions	Yes	Yes	No
User-defined transform functions	Yes	Yes	Yes

For more information, see Developing user-defined extensions (UDxs).