This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

R SDK

The Vertica R SDK extends the capabilities of the Vertica Analytic Database so you can leverage additional R libraries.

The Vertica R SDK extends the capabilities of the Vertica Analytic Database so you can leverage additional R libraries. Before you can begin developing User Defined Extensions (UDxs) in R, you must install the R Language Pack for Vertica on each of the nodes in your cluster. The R SDK supports scalar and transform functions in fenced mode. Other UDx types are not supported.

The following workflow is typical for the R SDK:

You can find detailed documentation of all of the classes in the Vertica R SDK.

1 - Installing/upgrading the R language pack for Vertica

To create R UDxs in Vertica, install the R Language Pack package that matches your server version.

To create R UDxs in Vertica, install the R Language Pack package that matches your server version. The R Language Pack includes the R runtime and associated libraries for interfacing with Vertica. You must use this version of the R runtime; you cannot upgrade it.

You must install the R Language Pack on each node in the cluster. The Vertica R Language Pack must be the only R Language Pack installed on the node.

Vertica R language pack prerequisites

The R Language Pack package requires a number of packages for installation and execution. The names of these dependencies vary among Linux distributions. For Vertica-supported Linux platforms the packages are:

  • RHEL/CentOS: libfortran, xz-libs, libgomp

  • SUSE Linux Enterprise Server: libfortran3, liblzma5, libgomp1

  • Debian/Ubuntu: libfortran3, liblzma5, libgomp1

Vertica requires a version of the libgfortran4 library later than 7.1 to create R extensions. The libgfortran library is included by default with the devtool and gcc packages.

Installing the Vertica R language pack

If you use your operating systems package manager, rather than the rpm or dpkg command, for installation, you do not need to manually install the R Language Pack. The native package managers for each supported Linux version are:

  • RHEL/CentOS: yum

  • SUSE Linux Enterprise Server: zypper

  • Debian/Ubuntu: apt-get

  • Amazon Linux 2.0: yum

  1. Download the R language package by browsing to the Vertica website.

  2. On the Support tab, select Customer Downloads.

  3. When prompted, log in using your Micro Focus credentials.

  4. Located and select the vertica-R-lang_version.rpm or vertica-R-lang_version.deb file for your server version. The R language package version must match your server version to three decimal points.

  5. Install the package as root or using sudo:

    • RHEL/CentOS

      $ yum install vertica-R-lang-<version>.rpm
      
    • SUSE Linux Enterprise Server

      $ zypper install vertica-R-lang-<version>.rpm
      
    • Debian

      $ apt-get install ./vertica-R-lang_<version>.deb
      
    • Amazon Linux 2.0

       $ yum install vertica-R-lang-<version>.AMZN.rpm
      

The installer puts the R binary in /opt/vertica/R.

Upgrading the Vertica R language pack

When upgrading, some R packages you have manually installed may not work and may have to be reinstalled. If you do not update your package(s), then R returns an error if the package cannot be used. Instructions for upgrading these packages are below.

  1. You must uninstall the R Language package before upgrading Vertica. Any additional R packages you manually installed remain in /opt/vertica/R and are not removed when you uninstall the package.

  2. Upgrade your server package as detailed in Upgrading Vertica to a New Version.

  3. After the server package has been updated, install the new R Language package on each host.

If you have installed additional R packages, on each node:

  1. As root run /opt/vertica/R/bin/R and issue the command:

    > update.packages(checkBuilt=TRUE)
    
  2. Select a CRAN mirror from the list displayed.

  3. You are prompted to update each package that has an update available for it. You must update any packages that you manually installed and are not compatible with the current version of R in the R Language Pack.
    Do NOT update:

    • Rcpp

    • Rinside

    The packages you selected to be updated are installed. Quit R with the command:

    > quit()
    

Vertica UDx functions written in R do not need to be compiled and you do not need to reload your Vertica-R libraries and functions after an upgrade.

2 - R packages

The Vertica R Language Pack includes the following R packages in addition to the default packages bundled with R:.

The Vertica R Language Pack includes the following R packages in addition to the default packages bundled with R:

  • Rcpp

  • RInside

  • IpSolve

  • lpSolveAPI

You can install additional R packages not included in the Vertica R Language Pack by using one of two methods. You must install the same packages on all nodes.

Installing R packages

You can install additional R packages by using one of the two following methods.

Using the install.packages() R command:

$ sudo /opt/vertica/R/bin/R
> install.packages("Zelig");

Using CMD INSTALL:

/opt/vertica/R/bin/R CMD INSTALL <path-to-package-tgz>

The installed packages are located in: /opt/vertica/R/library.

3 - R and Vertica data types

The following data types are supported when passing data to/from an R UDx:.

The following data types are supported when passing data to/from an R UDx:

Vertica Data Type R Data Type
BOOLEAN logical
DATE, DATETIME, SMALLDATETIME, TIME, TIMESTAMP, TIMESTAMPTZ, TIMETZ numeric
DOUBLE PRECISION, FLOAT, REAL numeric
BIGINT, DECIMAL, INT, NUMERIC, NUMBER, MONEY numeric
BINARY, VARBINARY character
CHAR, VARCHAR character

NULL values in Vertica are translated to R NA values when sent to the R function. R NA values are translated into Vertica null values when returned from the R function to Vertica.

4 - Adding metadata to R libraries

The following example shows how to add metadata to an R UDx.

You can add metadata, such as author name, the version of the library, a description of your library, and so on to your library. This metadata lets you track the version of your function that is deployed on a Vertica Analytic Database cluster and lets third-party users of your function know who created the function. Your library's metadata appears in the USER_LIBRARIES system table after your library has been loaded into the Vertica Analytic Database catalog.

You declare the metadata for your library by calling the RegisterLibrary() function in one of the source files for your UDx. If there is more than one function call in the source files for your UDx, whichever gets interpreted last as Vertica Analytic Database loads the library is used to determine the library's metadata.

The RegisterLibrary() function takes eight string parameters:

RegisterLibrary(author,
                library_build_tag,
                library_version,
                library_sdk_version,
                source_url,
                description,
                licenses_required,
                signature);
  • author contains whatever name you want associated with the creation of the library (your own name or your company's name for example).

  • library_build_tag is a string you want to use to represent the specific build of the library (for example, the SVN revision number or a timestamp of when the library was compiled). This is useful for tracking instances of your library as you are developing them.

  • library_version is the version of your library. You can use whatever numbering or naming scheme you want.

  • library_sdk_version is the version of the Vertica Analytic Database SDK Library for which you've compiled the library.

  • source_url is a URL where users of your function can find more information about it. This can be your company's website, the GitHub page hosting your library's source code, or whatever site you like.

  • description is a concise description of your library.

  • licenses_required is a placeholder for licensing information. You must pass an empty string for this value.

  • signature is a placeholder for a signature that will authenticate your library. You must pass an empty string for this value.

The following example shows how to add metadata to an R UDx.


RegisterLibrary("Speedy Analytics Ltd.",
                "1234",
                "1.0",
                "8.1.0",
                "http://www.example.com/sales_tax_calculator.R",
                "Sales Tax R Library",
                "",
                "")

Loading the library and querying the USER_LIBRARIES system table shows the metadata supplied in the call to RegisterLibrary:

=> CREATE LIBRARY rLib AS '/home/dbadmin/sales_tax_calculator.R' LANGUAGE 'R';
CREATE LIBRARY
=> SELECT * FROM USER_LIBRARIES WHERE lib_name = 'rLib';
-[ RECORD 1 ]-----+---------------------------------------------------------
schema_name       | public
lib_name          | rLib
lib_oid           | 45035996273708350
author            | Speedy Analytics Ltd.
owner_id          | 45035996273704962
lib_file_name     | rLib_02552872a35d9352b4907d3fcd03cf9700a0000000000d3e.R
md5_sum           | 30da555537c4d93c352775e4f31332d2
sdk_version       |
revision          |
lib_build_tag     | 1234
lib_version       | 1.0
lib_sdk_version   | 8.1.0
source_url        | http://www.example.com/sales_tax_calculator.R
description       | Sales Tax R Library
licenses_required |
signature         |
dependencies      |
is_valid          | t
sal_storage_id    | 02552872a35d9352b4907d3fcd03cf9700a0000000000d3e

5 - Setting null input and volatility behavior for R functions

Vertica supports defining volatility and null-input settings for UDxs written in R.

Vertica supports defining volatility and null-input settings for UDxs written in R. Both settings aid in the performance of your R function.

Volatility settings

Volatility settings describe the behavior of the function to the Vertica optimizer. For example, if you have identical rows of input data and you know the UDx is immutable, then you can define the UDx as IMMUTABLE. This tells the Vertica optimizer that it can return a cached value for subsequent identical rows on which the function is called rather than having the function run on each identical row.

To indicate your UDx's volatility, set the volatility parameter of your R factory function to one of the following values:

Value Description
VOLATILE Repeated calls to the function with the same arguments always result in different values. Vertica always calls volatile functions for each invocation.
IMMUTABLE Calls to the function with the same arguments always results in the same return value.
STABLE Repeated calls to the function with the same arguments within the same statement returns the same output. For example, a function that returns the current user name is stable because the user cannot change within a statement. The user name could change between statements.
DEFAULT_VOLATILITY The default volatility. This is the same as VOLATILE.

If you do not define a volatility, then the function is considered to be VOLATILE.

The following example sets the volatility to STABLE in the multiplyTwoIntsFactory function:

multiplyTwoIntsFactory <- function() {
  list(name                  = multiplyTwoInts,
       udxtype               = c("scalar"),
       intype                = c("float","float"),
       outtype               = c("float"),
       volatility            = c("stable"),
       parametertypecallback = multiplyTwoIntsParameters)
}

Null input behavior

Null input setting determine how to respond to rows that have null input. For example, you can choose to return null if any inputs are null rather than calling the function and having the function deal with a NULL input.

To indicate how your UDx reacts to NULL input, set the strictness parameter of your R factory function to one of the following values:

Value Description
CALLED_ON_NULL_INPUT The function must be called, even if one or more arguments are NULL.
RETURN_NULL_ON_NULL_INPUT The function always returns a NULL value if any of its arguments are NULL.
STRICT A synonym for RETURN_NULL_ON_NULL_INPUT
DEFAULT_STRICTNESS The default strictness setting. This is the same as CALLED_ON_NULL_INPUT.

If you do not define a null input behavior, then the function is called on every row of data regardless of the presence of NULL values.

The following example sets the NULL input behavior to STRICT in the multiplyTwoIntsFactory function:

multiplyTwoIntsFactory <- function() {
  list(name                  = multiplyTwoInts,
       udxtype               = c("scalar"),
       intype                = c("float","float"),
       outtype               = c("float"),
       strictness            = c("strict"),
       parametertypecallback = multiplyTwoIntsParameters)
}