This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
User-defined extensions
A user-defined extension (UDx) is a component that expands Vertica functionality—for example, new types of data analysis and the ability to parse and load new types of data.
A user-defined extension (UDx) is a component that expands Vertica functionality—for example, new types of data analysis and the ability to parse and load new types of data.
This section provides an overview of how to install and use a UDx. If you are using a UDx developed by a third party, consult its documentation for detailed installation and usage instructions.
1 - Loading UDxs
User-defined extensions (UDxs) are contained in libraries.
User-defined extensions (UDxs) are contained in libraries. A library can contain multiple UDxs. To add UDxs to Vertica, you must:
-
Deploy the library (once per library).
-
Create each UDx (once per UDx).
If you are using UDxs written in Java, you must also set up a Java runtime environment. See Installing Java on Vertica hosts.
Deploying libraries
To deploy a library to your Vertica database:
-
Copy the UDx shared library file (.so
), Python file, Java JAR file, or R functions file that contains your function to a node on your Vertica cluster. You do not need to copy it to every node.
-
Connect to the node where you copied the library (for example, using vsql).
-
Add your library to the database catalog using the CREATE LIBRARY statement.
=> CREATE LIBRARY libname AS '/path_to_lib/filename'
LANGUAGE 'language';
libname
is the name you want to use to reference the library. path_to_lib/filename
is the fully-qualified path to the library or JAR file you copied to the host. language
is the implementation language.
For example, if you created a JAR file named TokenizeStringLib.jar
and copied it to the dbadmin account's home directory, you would use this command to load the library:
=> CREATE LIBRARY tokenizelib AS '/home/dbadmin/TokenizeStringLib.jar'
LANGUAGE 'Java';
You can load any number of libraries into Vertica.
Privileges
Superusers can create, modify, and drop any library. Users with the UDXDEVELOPER role or explicit grants can also act on libraries, as shown in the following table:
Creating UDx functions
After the library is loaded, define individual UDxs using SQL statements such as CREATE FUNCTION and CREATE SOURCE. These statements assign SQL function names to the extension classes in the library. They add the UDx to the database catalog and remain available after a database restart.
The statement you use depends on the type of UDx you are declaring, as shown in the following table:
If a UDx of the given name already exists, you can replace it or instruct Vertica to not replace it. To replace it, use the OR REPLACE syntax, as in the following example:
=> CREATE OR REPLACE TRANSFORM FUNCTION tokenize
AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions;
CREATE TRANSFORM FUNCTION
You might want to replace an existing function to change between fenced and unfenced modes.
Alternatively, you can use IF NOT EXISTS to prevent the function from being created again if it already exists. You might want to use this in upgrade or test scripts that require, and therefore load, UDxs. By using IF NOT EXISTS, you preserve the original definition including fenced status. The following example shows this syntax:
--- original creation:
=> CREATE TRANSFORM FUNCTION tokenize
AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions NOT FENCED;
CREATE TRANSFORM FUNCTION
--- function is not replaced (and is still unfenced):
=> CREATE TRANSFORM FUNCTION IF NOT EXISTS tokenize
AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions FENCED;
CREATE TRANSFORM FUNCTION
After you add the UDx to the database, you can use your extension within SQL statements. The database superuser can grant access privileges to the UDx for users. See GRANT (user defined extension) for details.
When you call a UDx, Vertica creates an instance of the UDx class on each node in the cluster and provides it with the data it needs to process.
2 - Installing Java on Vertica hosts
If you are using UDxs written in Java, follow the instructions in this section.
If you are using UDxs written in Java, follow the instructions in this section.
You must install a Java Virtual Machine (JVM) on every host in your cluster in order for Vertica to be able to execute your Java UDxs.
Installing Java on your Vertica cluster is a two-step process:
-
Install a Java runtime on all of the hosts in your cluster.
-
Set the JavaBinaryForUDx configuration parameter to tell Vertica the location of the Java executable.
Installing a Java runtime
For Java-based features, Vertica requires a 64-bit Java 6 (Java version 1.6) or later Java runtime. Vertica supports runtimes from either Oracle or OpenJDK. You can choose to install either the Java Runtime Environment (JRE) or Java Development Kit (JDK), since the JDK also includes the JRE.
Many Linux distributions include a package for the OpenJDK runtime. See your Linux distribution's documentation for information about installing and configuring OpenJDK.
To install the Oracle Java runtime, see the Java Standard Edition (SE) Download Page. You usually run the installation package as root in order to install it. See the download page for instructions.
Once you have installed a JVM on each host, ensure that the java
command is in the search path and calls the correct JVM by running the command:
$ java -version
This command should print something similar to:
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
Note
Any previously installed Java VM on your hosts may interfere with a newly installed Java runtime. See your Linux distribution's documentation for instructions on configuring which JVM is the default. Unless absolutely required, you should uninstall any incompatible version of Java before installing the Java 6 or Java 7 runtime.
Setting the JavaBinaryForUDx configuration parameter
The JavaBinaryForUDx configuration parameter tells Vertica where to look for the JRE to execute Java UDxs. After you have installed the JRE on all of the nodes in your cluster, set this parameter to the absolute path of the Java executable. You can use the symbolic link that some Java installers create (for example /usr/bin/java
). If the Java executable is in your shell search path, you can get the path of the Java executable by running the following command from the Linux command line shell:
$ which java
/usr/bin/java
If the java
command is not in the shell search path, use the path to the Java executable in the directory where you installed the JRE. Suppose you installed the JRE in /usr/java/default
(which is where the installation package supplied by Oracle installs the Java 1.6 JRE). In this case the Java executable is /usr/java/default/bin/java
.
You set the configuration parameter by executing the following statement as a database superuser:
=> ALTER DATABASE DEFAULT SET PARAMETER JavaBinaryForUDx = '/usr/bin/java';
See ALTER DATABASE for more information on setting configuration parameters.
To view the current setting of the configuration parameter, query the CONFIGURATION_PARAMETERS system table:
=> \x
Expanded display is on.
=> SELECT * FROM CONFIGURATION_PARAMETERS WHERE parameter_name = 'JavaBinaryForUDx';
-[ RECORD 1 ]-----------------+----------------------------------------------------------
node_name | ALL
parameter_name | JavaBinaryForUDx
current_value | /usr/bin/java
default_value |
change_under_support_guidance | f
change_requires_restart | f
description | Path to the java binary for executing UDx written in Java
Once you have set the configuration parameter, Vertica can find the Java executable on each node in your cluster.
Note
Since the location of the Java executable is set by a single configuration parameter for the entire cluster, you must ensure that the Java executable is installed in the same path on all of the hosts in the cluster.
3 - UDx restrictions
Some UDx types have special considerations or restrictions.
Some UDx types have special considerations or restrictions.
UDxs written in Java and R do not support complex types.
Aggregate functions
You cannot use the DISTINCT clause in queries with more than one aggregate function or provide inputs or return values containing complex types.
Analytic functions
UDAnFs do not support framing windows using ROWS.
Only UDAnFs written in C++ can use complex types.
As with Vertica's built-in analytic functions, UDAnFs cannot be used with MATCH clause functions.
Scalar functions
If the result of applying a UDSF is an invalid record, COPY aborts the load even if CopyFaultTolerantExpressions is set to true.
A ROW returned from a UDSF cannot be used as an argument to COUNT.
A query that includes a UDTF cannot:
-
Include statements other than the SELECT statement that calls the UDTF and a PARTITION BY expression unless the UDTF is marked as a one-to-many UDTF
-
Call an analytic function
-
Call another UDTF
-
Include one of the following clauses:
Load functions
Installing an untrusted UDL function can compromise the security of the server. UDxs can contain arbitrary code. In particular, user-defined parser functions can read data from any arbitrary location. It is up to the developer of the function to enforce proper security limitations. Superusers must not grant access to UDxs to untrusted users.
You cannot ALTER UDL functions.
UDFilter and UDSource functions do not support complex types.
4 - Fenced and unfenced modes
User-defined extensions (UDxs) written in the C++ programming language have the option of running in fenced or unfenced mode.
User-defined extensions (UDxs) written in the C++ programming language have the option of running in fenced or unfenced mode. Fenced mode runs the UDx code outside of the main Vertica process in a separate zygote process. UDxs that use unfenced mode run directly within the Vertica process.
Fenced mode
You can run most C++ UDxs in fenced mode. Fenced mode uses a separate zygote process, so fenced UDx crashes do not impact the core Vertica process. There is a small performance impact when running UDx code in fenced mode. On average, using fenced mode adds about 10% more time to execution compared to unfenced mode.
Fenced mode is currently available for all C++ UDxs with the exception of user-defined aggregates. All UDxs developed in the Python, R, and Java programming languages must run in fenced mode, since the Python, R, and Java runtimes cannot run directly within the Vertica process.
Using fenced mode does not affect the development of your UDx. Fenced mode is enabled by default for UDxs that support fenced mode. Optionally, you can issue the CREATE FUNCTION command with the NOT FENCED
modifier to disable fenced mode for the function. Additionally, you can enable or disable fenced mode on any fenced mode-supported C++ UDx by using the ALTER FUNCTION command.
Unfenced mode
Unfenced UDxs run within Vertica, so they have little overhead, and can perform almost as fast as Vertica's own built-in functions. However, because they run within Vertica directly, any bugs in their code (memory leaks, for example) can destabilize the main Vertica process and bring one or more database nodes down.
Important
Always carefully review code downloaded from third-party sources and test UDxs in a test or staging environment before deciding to run them in unfenced mode in production.
About the zygote process
The Vertica zygote process starts when Vertica starts. Each node has a single zygote process. Side processes are created "on demand". The zygote listens for requests and spawns a UDx side session that runs the UDx in fenced mode when a UDx is called by the user.
About fenced mode logging:
UDx code that runs in fenced mode is logged in the UDxZygote.log
and is stored in the UDxLogs
directory in the catalog directory of Vertica. Log entries for the side process are denoted by the UDx language (for example, C++), node, zygote process ID, and the UDxSideProcess ID.
For example, for the following query returns the current fenced processes:
=> SELECT * FROM UDX_FENCED_PROCESSES;
node_name | process_type | session_id | pid | port | status
------------------+------------------+----------------------------------+-------+-------+--------
v_vmart_node0001 | UDxZygoteProcess | | 27468 | 51900 | UP
v_vmart_node0001 | UDxSideProcess | localhost.localdoma-27465:0x800b | 5677 | 44123 | UP
Below is the corresponding log file for the fenced processes returned in the previous query:
2016-05-16 11:24:43.990 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 UDx side process started
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 Finished setting up signal handlers.
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 My port: 44123
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 My address: 0.0.0.0
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 Vertica port: 51900
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 Vertica address: 127.0.0.1
11:25:19.749 [C++-localhost.localdoma-27465:0x800b-5677] 0x41837940 Setting memory resource limit to -1
11:30:11.523 [C++-localhost.localdoma-27465:0x800b-5677] 0x41837940 Exiting UDx side process
The last line indicates that the side process was killed. In this case it was killed when the user session (vsql) closed.
About fenced mode configuration parameters
Fenced mode supports the following configuration parameters:
-
FencedUDxMemoryLimitMB
: The maximum memory size, in MB, to use for fenced mode processes. The default is -1
(no limit). The side process is killed if this limit is exceeded.
-
ForceUDxFencedMode
: When set to 1
, force all UDx's that support fenced mode to run in fenced mode even if their definition specified NOT FENCED. The default is 0
(disabled).
-
UDxFencedBlockTimeout
: The maximum time, in seconds, that the Vertica server waits for a UDx to return before aborting with ERROR 3399. The default is 60
.
See also
5 - Updating UDx libraries
There are two cases where you need to update libraries that you have already deployed:.
There are two cases where you need to update libraries that you have already deployed:
-
When you have upgraded Vertica to a new version that contains changes to the SDK API. For your libraries to work with the new server version, you need to recompile them with new version of the SDK. See UDx library compatibility with new server versions for more information.
-
When you have made changes to your UDxs and you want to deploy these changes. Before updating your UDx library, you need to determine if you have changed the signature of any of the functions contained in the library. If you have, you need to drop the functions from the Vertica catalog before you update the library.
5.1 - UDx library compatibility with new server versions
The Vertica SDK defines an application programming interface (API) that UDxs use to interact with the database.
The Vertica SDK defines an application programming interface (API) that UDxs use to interact with the database. When developers compile their UDx code, it is linked to the SDK code to form a library. This library is only compatible with Vertica servers that support the version of the SDK API used to compile the code. The library and servers that share the same API version are compatible on a binary level (referred to as "binary compatible").
The Vertica server returns an error message if you attempt to load a library that is not binary compatible with it. Similarly, if you upgrade your Vertica server to a version that supports a new SDK API, any existing UDx that relies on newly-incompatible libraries returns an error messages when you call it:
ERROR 2858: Could not find function definition
HINT:
This usually happens due to missing or corrupt libraries, libraries built
with the wrong SDK version, or due to a concurrent session dropping the library
or function. Try recreating the library and function
To resolve this issue, you must install UDx libraries that have been recompiled with the correct version of the SDK.
New versions of the Vertica server do not always change the SDK API version. The SDK API version changes whenever OpenText changes the components that make up the SDK. If the SDK API does not change in a new version of the server, then the old libraries remain compatible with the new server.
The SDK API almost always changes in Vertica releases (major, minor, service pack) as OpenText expands the SDK's features. Vertica will never change the API in a hotfix patch.
These policies mean that you must update UDx libraries when you upgrade between major versions. For example, if you upgrade from version 10.0 to 10.1, you must update your UDx libraries.
Note
A UDx written in a scripting language has no compiled binary, and so does not need to maintain binary compatibility from one version to another. UDxs written in scripting languages only become incompatible if the APIs used in the SDK actually change. For example, if the number of arguments to an API call changes, a UDx has to be changed to use the new number of arguments.
Pre-upgrade steps
Before upgrading your Vertica server, consider whether you have any UDx libraries that may be incompatible with the new version. Consult the release notes of the new server version to determine whether the SDK API has changed between the version of Vertica server you currently have installed and the new version. As mentioned previously, only upgrades from a previous major version or from the initial release of a major version to a service pack release can cause your currently-loaded UDx libraries to become incompatible with the server.
Any UDx libraries that are incompatible with the new version of the Vertica server must be recompiled. If you got the UDx library from a third party, you need to see if a new version has been released. If so, deploy the new version after you have upgraded the server (see Deploying a new version of your UDx library).
If you developed the UDx yourself (or if you have the source code) you must:
-
Recompile your UDx library using the new version of the Vertica SDK. See Compiling your C++ library or Compiling and packaging a Java library for more information.
-
Deploy the new version of your library. See Deploying a new version of your UDx library.
5.2 - Determining if a UDx signature has changed
You need to be careful when making changes to UDx libraries that contain functions you have already deployed in your Vertica database.
You need to be careful when making changes to UDx libraries that contain functions you have already deployed in your Vertica database. When you deploy a new version of your UDx library, Vertica does not ensure that the signatures of the functions that are defined in the library match the signature of the function that is already defined in the Vertica catalog. If you have changed the signature of a UDx in the library then update the library in the Vertica database, calls to the altered UDx will produce errors.
Making any of the following changes to a UDx alters its signature:
-
Changing the number of arguments accepted or the data type of any argument accepted by your function (not including polymorphic functions).
-
Changing the number or data types of any return values or output columns.
-
Changing the name of the factory class that Vertica uses to create an instance of your function code.
-
Changing the null handling or volatility behavior of your function.
-
Removed the function's factory class from the library completely.
The following changes do not alter the signature of your function, and do not require you to drop the function before updating the library:
-
Changing the number or type of arguments handled by a polymorphic function. Vertica does not process the arguments the user passes to a polymorphic function.
-
Changing the name, data type, or number of parameters accepted by your function. The parameters your function accepts are not determined by the function signature. Instead, Vertica passes all of the parameters the user included in the function call, and your function processes them at runtime. See UDx parameters for more information about parameters.
-
Changing any of the internal processing performed by your function.
-
Adding new UDxs to the library.
After you drop any functions whose signatures have changed, you load the new library file, then re-create your altered functions. If you have not made any changes to the signature of your UDxs, you can just update the library file in your Vertica database without having to drop or alter your function definitions. As long as the UDx definitions in the Vertica catalog match the signatures of the functions in your library, function calls will work transparently after you have updated the library. See Deploying a new version of your UDx library.
5.3 - Deploying a new version of your UDx library
You need to deploy a new version of your UDx library if:.
You need to deploy a new version of your UDx library if:
The process of deploying a new version of your library is similar to deploying it initially.
-
If you are deploying a UDx library developed in C++ or Java, you must compile it with the current version of the Vertica SDK.
-
Copy your UDx's library file (a .so
file for libraries developed in C++, a .py
file for libraries developed in Python, or a .jar
file for libraries developed in Java) or R source file to a host in your Vertica database.
-
Connect to the host using vsql.
-
If you have changed the signature of any of the UDxs in the shared library, you must drop them using DROP statements such as DROP FUNCTION or DROP SOURCE. If you are unsure whether any of the signatures of your functions have changed, see Determining if a UDx signature has changed.
Note
If all of the UDx signatures in your library have changed, you may find it more convenient to drop the library using the
DROP LIBRARY statement with the CASCADE option to drop the library and all of the functions and loaders that reference it. Droping the library can save you the time it would take to drop each UDx individually. You can then reload the library and recreate all of the extensions using the same process you used to deploy the library in the first place. See
CREATE LIBRARY.
-
Use the ALTER LIBRARY statement to update the UDx library definition with the file you copied in step 1. For example, if you want to update the library named ScalarFunctions
with a file named ScalarFunctions-2.0.so
in the dbadmin user's home directory, you could use the command:
=> ALTER LIBRARY ScalarFunctions AS '/home/dbadmin/ScalarFunctions-2.0.so';
After you have updated the UDx library definition to use the new version of your shared library, the UDxs that are defined using classes in your UDx library begin using the new shared library file without any further changes.
-
If you had to drop any functions in step 4, recreate them using the new signature defined by the factory classes in your library. See CREATE FUNCTION statements.
6 - Listing the UDxs contained in a library
Once a library has been loaded using the CREATE LIBRARY statement, you can find the UDxs and UDLs it contains by querying the USER_LIBRARY_MANIFEST system table:.
Once a library has been loaded using the CREATE LIBRARY statement, you can find the UDxs and UDLs it contains by querying the USER_LIBRARY_MANIFEST system table:
=> CREATE LIBRARY ScalarFunctions AS '/home/dbadmin/ScalarFunctions.so';
CREATE LIBRARY
=> \x
Expanded display is on.
=> SELECT * FROM USER_LIBRARY_MANIFEST WHERE lib_name = 'ScalarFunctions';
-[ RECORD 1 ]-------------------
schema_name | public
lib_name | ScalarFunctions
lib_oid | 45035996273792402
obj_name | RemoveSpaceFactory
obj_type | Scalar Function
arg_types | Varchar
return_type | Varchar
-[ RECORD 2 ]-------------------
schema_name | public
lib_name | ScalarFunctions
lib_oid | 45035996273792402
obj_name | Div2intsInfo
obj_type | Scalar Function
arg_types | Integer, Integer
return_type | Integer
-[ RECORD 3 ]-------------------
schema_name | public
lib_name | ScalarFunctions
lib_oid | 45035996273792402
obj_name | Add2intsInfo
obj_type | Scalar Function
arg_types | Integer, Integer
return_type | Integer
The obj_name
column lists the factory classes contained in the library. These are the names you use to define UDxs and UDLs in the database catalog using statements such as CREATE FUNCTION and CREATE SOURCE.
7 - Using wildcards in your UDx
Vertica supports wildcard * characters in the place of column names in user-defined functions.
Vertica supports wildcard * characters in the place of column names in user-defined functions.
You can use wildcards when:
-
Your query contains a table in the FROM clause
-
You are using a Vertica-supported development language
-
Your UDx is running in fenced or unfenced mode
Supported SQL statements
The following SQL statements can accept wildcards:
-
DELETE
-
INSERT
-
SELECT
-
UPDATE
Unsupported configurations
The following situations do not support wildcards:
-
You cannot pass a wildcard in the OVER clause of a query
-
You cannot us a wildcard with a DROP statement
-
You cannot use wildcards with any other arguments
Examples
These examples show wildcards and user-defined functions in a range of data manipulation operations.
DELETE statements:
=> DELETE FROM tablename WHERE udf(tablename.*) = 5;
INSERT statements:
=> INSERT INTO table1 SELECT udf(*) FROM table2;
SELECT statements:
=> SELECT udf(*) FROM tablename;
=> SELECT udf(tablename.*) FROM tablename;
=> SELECT udf(f.*) FROM table f;
=> SELECT udf(*) FROM table1,table2;
=> SELECT udf1( udf2(*) ) FROM table1,table2;
=> SELECT udf( db.schema.table.*) FROM tablename;
=> SELECT udf(sub.*) FROM (select col1, col2 FROM table) sub;
=> SELECT x FROM tablename WHERE udf(*) = y;
=> WITH sub as (SELECT * FROM tablename) select x, udf(*) FROM sub;
=> SELECT udf( * using parameters x=1) FROM tablename;
=> SELECT udf(table1.*, table2.col2) FROM table1,table2;
UPDATE statements:
=> UPDATE tablename set col1 = 4 FROM tablename WHERE udf(*) = 3;