<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>OpenText Analytics Database 26.2.x – Accessing kerberized HDFS data</title>
    <link>/en/hadoop-integration/accessing-kerberized-hdfs-data/</link>
    <description>Recent content in Accessing kerberized HDFS data on OpenText Analytics Database 26.2.x</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/en/hadoop-integration/accessing-kerberized-hdfs-data/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Hadoop-Integration: Using Kerberos with OpenText™ Analytics Database</title>
      <link>/en/hadoop-integration/accessing-kerberized-hdfs-data/using-kerberos-with/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/hadoop-integration/accessing-kerberized-hdfs-data/using-kerberos-with/</guid>
      <description>
        
        
        &lt;p&gt;If you use Kerberos for your OpenText™ Analytics Database cluster and your principals have access to HDFS, then you can configure the database to use the same credentials for HDFS.&lt;/p&gt;
&lt;p&gt;The database authenticates with Hadoop in two ways that require different configurations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Authentication&lt;/strong&gt;—On behalf of the user, by passing along the user&#39;s existing Kerberos credentials. This method is also called user impersonation. Actions performed on behalf of particular users, like executing queries, generally use user authentication.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Database Authentication&lt;/strong&gt;—On behalf of system processes that access ROS data or the catalog, by using a special Kerberos credential stored in a keytab file.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;alert admonition note&#34; role=&#34;alert&#34;&gt;
&lt;h4 class=&#34;admonition-head&#34;&gt;Note&lt;/h4&gt;

The database and Hadoop must use the same Kerberos server or servers (KDCs).

&lt;/div&gt;
&lt;p&gt;The database can interact with more than one Kerberos realm. To configure multiple realms, see &lt;a href=&#34;../../../en/security-and-authentication/client-authentication/kerberos-authentication/#MultiRealmSupport&#34;&gt;Multi-realm Support&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The database attempts to automatically refresh Hadoop tokens before they expire. See &lt;a href=&#34;../../../en/hadoop-integration/accessing-kerberized-hdfs-data/token-expiration/#&#34;&gt;Token expiration&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;user-authentication&#34;&gt;User authentication&lt;/h2&gt;
&lt;p&gt;To use the database with Kerberos and Hadoop, the client user first authenticates with one of the Kerberos servers (Key Distribution Center, or KDC) being used by the Hadoop cluster. A user might run &lt;code&gt;kinit&lt;/code&gt; or sign in to Active Directory, for example.&lt;/p&gt;
&lt;p&gt;A user who authenticates to a Kerberos server receives a Kerberos ticket. At the beginning of a client session, the database automatically retrieves this ticket. The database then uses this ticket to get a Hadoop token, which Hadoop uses to grant access. The database uses this token to access HDFS, such as when executing a query on behalf of the user. When the token expires, the database automatically renews it, also renewing the Kerberos ticket if necessary.&lt;/p&gt;
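&lt;p&gt;For example, a client user might obtain a ticket with &lt;code&gt;kinit&lt;/code&gt; before starting a database session, then verify it with &lt;code&gt;klist&lt;/code&gt; (the principal and realm shown are illustrative):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;$ kinit alice@EXAMPLE.COM
$ klist
&lt;/code&gt;&lt;/pre&gt;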
&lt;p&gt;The user must have been granted permission to access the relevant files in HDFS. This permission is checked the first time the database reads HDFS data.&lt;/p&gt;
&lt;p&gt;The database can use multiple KDCs serving multiple Kerberos realms, if proper cross-realm trust has been set up between realms.&lt;/p&gt;
&lt;p&gt;&lt;a name=&#34;VerticaAuthentication&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;opentexttrade-analytics-database-authentication&#34;&gt;OpenText™ Analytics Database authentication&lt;/h2&gt;
&lt;p&gt;Automatic processes, such as the Tuple Mover or the processes that access Eon Mode communal storage, do not log in the way users do. Instead, the database uses a special identity (principal) stored in a keytab file on every database node. (This approach is also used for database clusters that use Kerberos but do not use Hadoop.) After you configure the keytab file, the database uses the principal residing there to automatically obtain and maintain a Kerberos ticket, much as in the client scenario. In this case, the client does not interact with Kerberos.&lt;/p&gt;
&lt;p&gt;Each database node uses its own principal; it is common to incorporate the name of the node into the principal name. You can either create one keytab per node, containing only that node&#39;s principal, or you can create a single keytab containing all the principals and distribute the file to all nodes. Either way, the node uses its principal to get a Kerberos ticket and then uses that ticket to get a Hadoop token.&lt;/p&gt;
&lt;p&gt;When creating HDFS storage locations, the database uses the principal in the keytab file, not the principal of the user issuing the CREATE LOCATION statement. The HCatalog Connector sometimes uses the principal in the keytab file, depending on how Hive authenticates users.&lt;/p&gt;
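&lt;p&gt;For example, a storage location created with a statement like the following is authenticated using the keytab principal, no matter which user runs it (the nameservice, path, and label are illustrative):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; CREATE LOCATION &amp;#39;webhdfs://hadoopNS/vertica/colddata&amp;#39; ALL NODES SHARED
   USAGE &amp;#39;data&amp;#39; LABEL &amp;#39;coldstorage&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;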
&lt;h2 id=&#34;configuring-users-and-the-keytab-file&#34;&gt;Configuring users and the keytab file&lt;/h2&gt;
&lt;p&gt;If you have not already configured Kerberos authentication for the database, follow the instructions in &lt;a href=&#34;../../../en/security-and-authentication/client-authentication/kerberos-authentication/configure-kerberos-authentication/#&#34;&gt;Configure OpenText™ Analytics Database for Kerberos authentication&lt;/a&gt;. Of particular importance for Hadoop integration:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create one Kerberos principal per node.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Place the keytab files in the same location on each database node and set the configuration parameter &lt;a href=&#34;../../../en/sql-reference/config-parameters/kerberos-parameters/&#34;&gt;KerberosKeytabFile&lt;/a&gt; to that location.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set KerberosServiceName to the name of the principal. (See &lt;a href=&#34;../../../en/security-and-authentication/client-authentication/kerberos-authentication/configure-kerberos-authentication/inform-about-kerberos-principal/#&#34;&gt;Inform OpenText™ Analytics Database about the Kerberos principal&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
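&lt;p&gt;For example, assuming a shared keytab file installed at the same path on every node (the keytab path and service name below are placeholders for your own values):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; ALTER DATABASE exampledb SET KerberosKeytabFile = &amp;#39;/etc/vertica/vertica.keytab&amp;#39;;
=&amp;gt; ALTER DATABASE exampledb SET KerberosServiceName = &amp;#39;vertica&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;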
&lt;p&gt;If you are using the HCatalog Connector, follow the additional steps in &lt;a href=&#34;../../../en/hadoop-integration/using-hcatalog-connector/configuring-security/#&#34;&gt;Configuring security&lt;/a&gt; in the HCatalog Connector documentation.&lt;/p&gt;
&lt;p&gt;If you are using HDFS storage locations, give all node principals read and write permission to the HDFS directory you will use as a storage location.&lt;/p&gt;
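&lt;p&gt;For example, if your node principals all map to a single Hadoop user (shown here as &lt;code&gt;dbadmin&lt;/code&gt;; the user name and path are illustrative), you could grant access with HDFS ACLs:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;$ hdfs dfs -mkdir -p /vertica/colddata
$ hdfs dfs -setfacl -R -m user:dbadmin:rwx /vertica/colddata
&lt;/code&gt;&lt;/pre&gt;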

      </description>
    </item>
    
    <item>
      <title>Hadoop-Integration: Proxy users and delegation tokens</title>
      <link>/en/hadoop-integration/accessing-kerberized-hdfs-data/proxy-users-and-delegation-tokens/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/hadoop-integration/accessing-kerberized-hdfs-data/proxy-users-and-delegation-tokens/</guid>
      <description>
        
        
        &lt;p&gt;An alternative to granting HDFS access to individual OpenText™ Analytics Database users is to use delegation tokens, either directly or with a proxy user. In this configuration, the database accesses HDFS on behalf of some other (Hadoop) user. The Hadoop users need not be database users at all.&lt;/p&gt;
&lt;p&gt;In OpenText™ Analytics Database, you can either specify the name of the Hadoop user to act on behalf of (doAs), or you can directly use a Kerberos delegation token that you obtain from HDFS (Bring Your Own Delegation Token). In the doAs case, the database obtains a delegation token for that user, so both approaches ultimately use delegation tokens to access files in HDFS.&lt;/p&gt;
&lt;p&gt;Use the &lt;a href=&#34;../../../en/sql-reference/config-parameters/hadoop-parameters/#HadoopImpersonationConfig&#34;&gt;HadoopImpersonationConfig&lt;/a&gt; session parameter to specify a user or delegation token to use for HDFS access. Each session can use a different user and can use either doAs or a delegation token. The value of HadoopImpersonationConfig is a set of JSON objects.&lt;/p&gt;
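&lt;p&gt;For example, a session could act on behalf of a Hadoop user for a particular nameservice by setting the parameter to a JSON array (the nameservice and user name are illustrative):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; ALTER SESSION SET HadoopImpersonationConfig = &amp;#39;[{&amp;#34;nameservice&amp;#34;:&amp;#34;hadoopNS&amp;#34;, &amp;#34;doAs&amp;#34;:&amp;#34;alice&amp;#34;}]&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;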
&lt;p&gt;To use delegation tokens of either type (more specifically, when HadoopImpersonationConfig is set), you must access HDFS through WebHDFS.&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Hadoop-Integration: Token expiration</title>
      <link>/en/hadoop-integration/accessing-kerberized-hdfs-data/token-expiration/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/en/hadoop-integration/accessing-kerberized-hdfs-data/token-expiration/</guid>
      <description>
        
        
        &lt;p&gt;OpenText™ Analytics Database uses Hadoop tokens when using Kerberos tickets (&lt;a href=&#34;../../../en/hadoop-integration/accessing-kerberized-hdfs-data/using-kerberos-with/#&#34;&gt;Using Kerberos with OpenText™ Analytics Database&lt;/a&gt;) or doAs (&lt;a href=&#34;../../../en/hadoop-integration/accessing-kerberized-hdfs-data/proxy-users-and-delegation-tokens/user-impersonation-doas/#&#34;&gt;User impersonation (doAs)&lt;/a&gt;). The database attempts to automatically refresh Hadoop tokens before they expire, but you can also set a minimum refresh frequency if you prefer. Use the HadoopFSTokenRefreshFrequency configuration parameter to specify the frequency in seconds:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;=&amp;gt; ALTER DATABASE exampledb SET HadoopFSTokenRefreshFrequency = &amp;#39;86400&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the current age of the token is greater than the value specified in this parameter, the database refreshes the token before accessing data stored in HDFS.&lt;/p&gt;
&lt;p&gt;The database does not refresh delegation tokens (&lt;a href=&#34;../../../en/hadoop-integration/accessing-kerberized-hdfs-data/proxy-users-and-delegation-tokens/bring-your-own-delegation-token/#&#34;&gt;Bring your own delegation token&lt;/a&gt;).&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
