User impersonation (doAs)
You can use user impersonation to access data in an HDFS cluster from OpenText™ Analytics Database. This approach is called "doAs" (for "do as") because the database uses a single proxy user to act on behalf of another (Hadoop) user. The impersonated Hadoop user does not also need to be a database user.
In the following illustration, Alice is a Hadoop user but not a database user. She connects to the database as the proxy user, vertica-etl. In her session, the database obtains a delegation token (DT) on behalf of the doAs user (Alice), and uses that delegation token to access HDFS.
You can use doAs with or without Kerberos, as long as HDFS and the database match: if HDFS uses Kerberos, the database must use it too.
User configuration
The Hadoop administrator must create a proxy user and allow it to access HDFS on behalf of other users. Set values in core-site.xml as in the following example:
<property>
  <name>hadoop.proxyuser.vertica-etl.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.vertica-etl.hosts</name>
  <value>*</value>
</property>
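As a quick sanity check of the proxy-user settings, the following minimal Python sketch parses a core-site.xml document and reports which hadoop.proxyuser entries are present for a given proxy user. The helper names (proxy_user_settings, missing_proxy_settings) and the inline sample are hypothetical and only for illustration. The sketch assumes the usual Hadoop layout of property elements inside a configuration element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration; a real file would be read from disk.
SAMPLE_CORE_SITE = """\
<configuration>
  <property>
    <name>hadoop.proxyuser.vertica-etl.users</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.vertica-etl.hosts</name>
    <value>*</value>
  </property>
</configuration>
"""


def proxy_user_settings(core_site_xml, proxy_user):
    """Return the hadoop.proxyuser.<proxy_user>.* settings found in the XML text."""
    prefix = f"hadoop.proxyuser.{proxy_user}."
    root = ET.fromstring(core_site_xml)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            # Key by the suffix after the prefix, for example "users" or "hosts".
            settings[name[len(prefix):]] = (prop.findtext("value") or "").strip()
    return settings


def missing_proxy_settings(core_site_xml, proxy_user, required=("users", "hosts")):
    """List the required suffixes that are not configured for the proxy user."""
    found = proxy_user_settings(core_site_xml, proxy_user)
    return [key for key in required if key not in found]


if __name__ == "__main__":
    print(proxy_user_settings(SAMPLE_CORE_SITE, "vertica-etl"))
    print(missing_proxy_settings(SAMPLE_CORE_SITE, "vertica-etl"))
```

An empty list from missing_proxy_settings means both entries from the example above are present. The sketch does not check whether the values are appropriately restrictive for your site.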
In the database, create a corresponding user with the same name as the proxy user (vertica-etl in this example).
Session configuration
To make requests on behalf of a Hadoop user, first set the HadoopImpersonationConfig session parameter to specify the user and HDFS cluster. The database will access HDFS as that user until the session ends or you change the parameter.
The value of this session parameter is a collection of JSON objects. Each object specifies an HDFS cluster and a Hadoop user. For the cluster, you can specify either a name service or an individual name node. If you are using high availability (HA) name nodes, then you must either use a name service or specify all name nodes. See HadoopImpersonationConfig format for the full JSON syntax.
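The following example shows access on behalf of different Hadoop users. The users "stephanie", "bob", and "rob" are Hadoop users, not database users; "vertica-etl" is a database user. The second ALTER SESSION statement sets two entries: "bob" for the hadoopNS name service and "rob" for the name node at hadoop2:50070.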
$ vsql -U vertica-etl
=> ALTER SESSION SET
HadoopImpersonationConfig = '[{"nameservice":"hadoopNS", "doAs":"stephanie"}]';
=> COPY nation FROM 'webhdfs:///user/stephanie/nation.dat';
=> ALTER SESSION SET
HadoopImpersonationConfig = '[{"nameservice":"hadoopNS", "doAs":"bob"}, {"authority":"hadoop2:50070", "doAs":"rob"}]';
=> COPY nation FROM 'webhdfs:///user/bob/nation.dat';
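Because the parameter value is JSON embedded in a SQL string literal, it can be easier to generate than to type by hand. The following is a minimal Python sketch that builds the value and the matching ALTER SESSION statement. The function names are hypothetical. The keys nameservice, authority, and doAs are taken from the example above, and the rule that each object names exactly one cluster identifier is an assumption of this sketch, not a statement of the full syntax.

```python
import json


def impersonation_entry(do_as, nameservice=None, authority=None):
    """Build one JSON object: a Hadoop user plus one cluster identifier.

    Assumption of this sketch: each object carries exactly one of
    "nameservice" or "authority".
    """
    if (nameservice is None) == (authority is None):
        raise ValueError("specify exactly one of nameservice or authority")
    if nameservice is not None:
        entry = {"nameservice": nameservice}
    else:
        entry = {"authority": authority}
    entry["doAs"] = do_as
    return entry


def alter_session_statement(entries):
    """Render the ALTER SESSION statement for a list of entries."""
    value = json.dumps(entries)
    # Double any single quotes so the JSON stays inside one SQL string literal.
    escaped = value.replace("'", "''")
    return f"ALTER SESSION SET HadoopImpersonationConfig = '{escaped}';"


if __name__ == "__main__":
    entries = [
        impersonation_entry("bob", nameservice="hadoopNS"),
        impersonation_entry("rob", authority="hadoop2:50070"),
    ]
    print(alter_session_statement(entries))
```

The printed statement can be pasted into a vsql session opened as the proxy user. Generating the value this way keeps the quoting and bracket structure consistent when several clusters are listed.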
The database uses Hadoop delegation tokens, obtained from the name node, to impersonate Hadoop users. In a long-running session, a token could expire. The database attempts to renew tokens automatically; see Token expiration.
Testing the configuration
You can use the HADOOP_IMPERSONATION_CONFIG_CHECK function to test your HDFS delegation tokens and HCATALOGCONNECTOR_CONFIG_CHECK to test your HCatalog Connector delegation token.