Best practices for Vertica for SQL on Apache Hadoop

If you are using the Vertica for SQLxa0on Apache Hadoop product, Vertica recommends the following best practices for storage locations:.

If you are using the Vertica for SQL on Apache Hadoop product, Vertica recommends the following best practices for storage locations:

  • Place only data type storage locations on HDFS storage.

  • Place temp space directly on the local Linux file system, not in HDFS.

  • For the best performance, place the Vertica catalog directly on the local Linux file system.

  • Create the database first on a local Linux file system. Then, you can extend the database to HDFS storage locations and set storage policies that exclusively place data blocks on the HDFS storage location.

  • For better performance, if you are running Vertica only on a subset of the HDFS nodes, do not run the HDFS balancer on them. The HDFS balancer can move data blocks farther away, causing Vertica to read non-local data during query execution. Queries run faster if they do not require network I/O.

Generally, HDFS requires approximately 2 GB of memory for each node in the cluster. To support this requirement in your Vertica configuration:

  1. Create a 2-GB resource pool.

  2. Do not assign any Vertica execution resources to this pool. This approach reserves the space for use by HDFS.

Alternatively, use Ambari or Cloudera Manager to find the maximum heap size required by HDFS and set the size of the resource pool to that value.

For more about how to configure resource pools, see Managing workloads.