Hadoop parameters
The following table describes general parameters for configuring integration with Apache Hadoop. See Apache Hadoop integration for more information.
Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.
- EnableHDFSBlockInfoCache
- Boolean, whether to distribute block location metadata collected during planning on the initiator to all database nodes for execution. Distributing this metadata reduces name node accesses, and thus name node load, but can somewhat degrade database performance in deployments where the name node is not under contention, because the metadata must be serialized and distributed. Enable distribution if protecting the name node is more important than query performance; this usually applies to large HDFS clusters where name node contention is already an issue.
Default: 0 (disabled)
- HadoopConfDir
- Directory path containing the XML configuration files copied from Hadoop. The same path must be valid on every Vertica node. You can use the VERIFY_HADOOP_CONF_DIR meta-function to test that the value is set correctly. Setting this parameter is required to read data from HDFS.
For all Vertica users, the files are accessed by the Linux user under which the Vertica server process runs.
When you set this parameter, previously-cached configuration information is flushed.
You can set this parameter at the session level. Doing so overrides the database value; it does not append to it. For example:
=> ALTER SESSION SET HadoopConfDir='/test/conf:/hadoop/hcat/conf';
To append, get the current value and include it on the new path after your additions. Setting this parameter at the session level does not change how the files are accessed.
Default: obtained from environment if possible
- HadoopFSAuthentication
- How (or whether) to use Kerberos authentication with HDFS. By default, if KerberosKeytabFile is set, Vertica uses that credential for both Vertica and HDFS. Usually this is the desired behavior. However, if you are using a Kerberized Vertica cluster with a non-Kerberized HDFS cluster, set this parameter to "none" to indicate that Vertica should not use the Vertica Kerberos credential to access HDFS.
Default: "keytab" if KerberosKeytabFile is set, otherwise "none"
- HadoopFSBlockSizeBytes
- Block size to write to HDFS. Larger files are divided into blocks of this size.
Default: 64MB
- HadoopFSNNOperationRetryTimeout
- Number of seconds a metadata operation (such as list directory) waits for a response before failing. Accepts float values for millisecond precision.
Default: 6 seconds
- HadoopFSReadRetryTimeout
- Number of seconds a read operation waits before failing. Accepts float values for millisecond precision. If you are confident that your file system will fail more quickly, you can improve performance by lowering this value.
Default: 180 seconds
- HadoopFSReplication
- Number of replicas HDFS makes. This is independent of the replication that Vertica does to provide K-safety. Do not change this setting unless directed to do so by Vertica support.
Default: 3
- HadoopFSRetryWaitInterval
- Initial number of seconds to wait before retrying read, write, and metadata operations. Accepts float values for millisecond precision. The retry interval increases exponentially with every retry.
Default: 3 seconds
- HadoopFSTokenRefreshFrequency
- How often, in seconds, to refresh the Hadoop tokens used to hold Kerberos tickets (see Token expiration).
Default: 0 (refresh when token expires)
- HadoopFSWriteRetryTimeout
- Number of seconds a write operation waits before failing. Accepts float values for millisecond precision. If you are confident that your file system will fail more quickly, you can improve performance by lowering this value.
Default: 180 seconds
- HadoopImpersonationConfig
- Session parameter specifying the delegation token or Hadoop user for HDFS access. See HadoopImpersonationConfig format for information about the value of this parameter, and see Proxy users and delegation tokens for more general context.
- WebhdfsClientCertConf
- mTLS configurations for accessing one or more WebHDFS servers. The value is a JSON string; each member has the following properties:
  - nameservice: WebHDFS name service
  - authority: host:port
  - certName: name of a certificate defined by CREATE CERTIFICATE
nameservice and authority are mutually exclusive. For example:
=> ALTER SESSION SET WebhdfsClientCertConf = '[{"authority" : "my.authority.com:50070", "certName" : "myCert"}, {"nameservice" : "prod", "certName" : "prodCert"}]';
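As a rough sketch of how the Hadoop parameters above might be inspected and changed, the following statements query the CONFIGURATION_PARAMETERS system table, append a directory to HadoopConfDir for the current session, and lower the read retry timeout. Several details are assumptions rather than documented facts: the allowed_levels column is assumed to be available in your Vertica version, '/etc/hadoop/conf' is a hypothetical stand-in for the value the first query returns, and the timeout of 60 seconds is only illustrative. Check the levels reported for each parameter before choosing between ALTER SESSION and ALTER DATABASE.

```sql
-- List the Hadoop file-system parameters with their current values and the
-- levels at which each can be set (allowed_levels is assumed to exist).
=> SELECT parameter_name, current_value, default_value, allowed_levels
   FROM CONFIGURATION_PARAMETERS
   WHERE parameter_name ILIKE 'HadoopFS%'
      OR parameter_name = 'HadoopConfDir';

-- Append to HadoopConfDir for this session: the addition comes first,
-- followed by the existing value. '/etc/hadoop/conf' is a hypothetical
-- placeholder for the current value returned by the query above.
=> ALTER SESSION SET HadoopConfDir = '/test/conf:/etc/hadoop/conf';

-- Confirm that the configuration directory is readable and valid.
=> SELECT VERIFY_HADOOP_CONF_DIR();

-- Illustrative only: fail reads after 60 seconds instead of the default 180.
-- Assumes the database level is valid for this parameter.
=> ALTER DATABASE DEFAULT SET HadoopFSReadRetryTimeout = 60;
```

Because a session-level HadoopConfDir replaces the database value rather than extending it, reading the current value first avoids accidentally dropping directories that other queries in the session depend on.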
HCatalog Connector parameters
The following table describes the parameters for configuring the HCatalog Connector. See Using the HCatalog Connector for more information.
Note
You can override HCatalog configuration parameters when you create an HCatalog schema with CREATE HCATALOG SCHEMA.
- EnableHCatImpersonation
- Boolean, whether the HCatalog Connector uses (impersonates) the current Vertica user when accessing Hive. If impersonation is enabled, the HCatalog Connector uses the Kerberos credentials of the logged-in Vertica user to access Hive data. Disable impersonation if you are using an authorization service to manage access without also granting users access to the underlying files. For more information, see Configuring security.
Default: 1 (enabled)
- HCatalogConnectorUseHiveServer2
- Boolean, whether Vertica internally uses HiveServer2 instead of WebHCat to get metadata from Hive.
Default: 1 (enabled)
- HCatConnectionTimeout
- The number of seconds the HCatalog Connector waits for a successful connection to the HiveServer2 (or WebHCat) server before returning a timeout error.
Default: 0 (wait indefinitely)
- HCatSlowTransferLimit
- Lowest transfer speed (in bytes per second) that the HCatalog Connector allows when retrieving data from the HiveServer2 (or WebHCat) server. If the data transfer rate from the server to Vertica is below this threshold after the number of seconds specified by the HCatSlowTransferTime parameter has passed, the HCatalog Connector cancels the query and closes the connection.
Default: 65536
- HCatSlowTransferTime
- Number of seconds the HCatalog Connector waits before testing whether the data transfer from the server is too slow. See the HCatSlowTransferLimit parameter.
Default: 60
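The HCatalog Connector parameters above could be reviewed and adjusted along the following lines. This is a sketch under assumptions, not a recommended configuration: the values 30, 131072, and 120 are illustrative, and setting these parameters at the database level is assumed to be valid in your deployment (the CONFIGURATION_PARAMETERS system table reports which levels apply).

```sql
-- Show the HCatalog Connector parameters and their current values.
=> SELECT parameter_name, current_value, default_value
   FROM CONFIGURATION_PARAMETERS
   WHERE parameter_name ILIKE '%HCat%';

-- Illustrative values: give up on a connection attempt after 30 seconds
-- instead of waiting indefinitely.
=> ALTER DATABASE DEFAULT SET HCatConnectionTimeout = 30;

-- Illustrative values: after 120 seconds, cancel a query whose transfer
-- rate is below 131072 bytes per second (128 KiB/s).
=> ALTER DATABASE DEFAULT SET HCatSlowTransferLimit = 131072;
=> ALTER DATABASE DEFAULT SET HCatSlowTransferTime = 120;
```

HCatSlowTransferLimit and HCatSlowTransferTime work as a pair, so it usually makes sense to change them together: the time sets when the check happens and the limit sets the rate that check enforces.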