Hadoop parameters

The following table describes general parameters for configuring integration with Apache Hadoop.

The following table describes general parameters for configuring integration with Apache Hadoop. See Apache Hadoop integration for more information.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.
EnableHDFSBlockInfoCache
Boolean, whether to distribute block location metadata collected during planning on the initiator to all database nodes for execution. Distributing this metadata reduces name node accesses, and thus load, but can degrade database performance somewhat in deployments where the name node isn't contended. This performance effect is because the data must be serialized and distributed. Enable distribution if protecting the name node is more important than query performance; usually this applies to large HDFS clusters where name node contention is already an issue.

Default: 0 (disabled)

HadoopConfDir
Directory path containing the XML configuration files copied from Hadoop. The same path must be valid on every Vertica node. You can use the VERIFY_HADOOP_CONF_DIR meta-function to test that the value is set correctly. Setting this parameter is required to read data from HDFS.

For all Vertica users, the files are accessed by the Linux user under which the Vertica server process runs.

When you set this parameter, previously-cached configuration information is flushed.

You can set this parameter at the session level. Doing so overrides the database value; it does not append to it. For example:

  
=> ALTER SESSION SET HadoopConfDir='/test/conf:/hadoop/hcat/conf';

To append, get the current value and include it on the new path after your additions. Setting this parameter at the session level does not change how the files are accessed.

Default: obtained from environment if possible

HadoopFSAuthentication
How (or whether) to use Kerberos authentication with HDFS. By default, if KerberosKeytabFile is set, Vertica uses that credential for both Vertica and HDFS. Usually this is the desired behavior. However, if you are using a Kerberized Vertica cluster with a non-Kerberized HDFS cluster, set this parameter to "none" to indicate that Vertica should not use the Vertica Kerberos credential to access HDFS.

Default: "keytab" if KerberosKeytabFile is set, otherwise "none"

HadoopFSBlockSizeBytes
Block size to write to HDFS. Larger files are divided into blocks of this size.

Default: 64MB

HadoopFSNNOperationRetryTimeout
Number of seconds a metadata operation (such as list directory) waits for a response before failing. Accepts float values for millisecond precision.

Default: 6 seconds

HadoopFSReadRetryTimeout
Number of seconds a read operation waits before failing. Accepts float values for millisecond precision. If you are confident that your file system will fail more quickly, you can improve performance by lowering this value.

Default: 180 seconds

HadoopFSReplication
Number of replicas HDFS makes. This is independent of the replication that Vertica does to provide K-safety. Do not change this setting unless directed otherwise by Vertica support.

Default: 3

HadoopFSRetryWaitInterval
Initial number of seconds to wait before retrying read, write, and metadata operations. Accepts float values for millisecond precision. The retry interval increases exponentially with every retry.

Default: 3 seconds

HadoopFSTokenRefreshFrequency
How often, in seconds, to refresh the Hadoop tokens used to hold Kerberos tickets (see Token expiration).

Default: 0 (refresh when token expires)

HadoopFSWriteRetryTimeout
Number of seconds a write operation waits before failing. Accepts float values for millisecond precision. If you are confident that your file system will fail more quickly, you can improve performance by lowering this value.

Default: 180 seconds

HadoopImpersonationConfig
Session parameter specifying the delegation token or Hadoop user for HDFS access. See HadoopImpersonationConfig format for information about the value of this parameter and Proxy users and delegation tokens for more general context.
HDFSUseWebHDFS
Boolean, whether URLs in the hdfs scheme use WebHDFS instead of LibHDFS++ to access data.

Default: 1 (enabled)

WebhdfsClientCertConf
mTLS configurations for accessing one or more WebHDFS servers. The value is a JSON string; each member has the following properties:
  • nameservice: WebHDFS name service

  • authority: host:port

  • certName: name of a certificate defined by CREATE CERTIFICATE

nameservice and authority are mutually exclusive.

For example:

=> ALTER SESSION SET WebhdfsClientCertConf =
    '[{"authority" : "my.authority.com:50070", "certName" : "myCert"},
      {"nameservice" : "prod", "certName" : "prodCert"}]';

HCatalog Connector parameters

The following table describes the parameters for configuring the HCatalog Connector. See Using the HCatalog Connector for more information.

EnableHCatImpersonation
Boolean, whether the HCatalog Connector uses (impersonates) the current Vertica user when accessing Hive. If impersonation is enabled, the HCatalog Connector uses the Kerberos credentials of the logged-in Vertica user to access Hive data. Disable impersonation if you are using an authorization service to manage access without also granting users access to the underlying files. For more information, see Configuring security.

Default: 1 (enabled)

HCatalogConnectorUseHiveServer2
Boolean, whether Vertica internally uses HiveServer2 instead of WebHCat to get metadata from Hive.

Default: 1 (enabled)

HCatalogConnectorUseLibHDFSPP
Boolean, whether the HCatalog Connector should use the hdfs scheme instead of webhdfs with the HCatalog Connector.

Default: 1 (enabled)

HCatConnectionTimeout
The number of seconds the HCatalog Connector waits for a successful connection to the HiveServer2 (or WebHCat) server before returning a timeout error.

Default: 0 (Wait indefinitely)

HCatSlowTransferLimit
Lowest transfer speed (in bytes per second) that the HCatalog Connector allows when retrieving data from the HiveServer2 (or WebHCat) server. In some cases, the data transfer rate from the server to Vertica is below this threshold. In such cases, after the number of seconds specified in the HCatSlowTransferTime parameter pass, the HCatalog Connector cancels the query and closes the connection.

Default: 65536

HCatSlowTransferTime
Number of seconds the HCatalog Connector waits before testing whether the data transfer from the server is too slow. See the HCatSlowTransferLimit parameter.

Default: 60