Best practices for SQL on Apache Hadoop

If you are using the OpenText™ Analytics Database for SQL on Apache Hadoop product, use the following best practices for storage locations:

  • Place only data type storage locations on HDFS storage.

  • Place temp space directly on the local Linux file system, not in HDFS.

  • For the best performance, place the database catalog directly on the local Linux file system.

  • Create the database first on a local Linux file system. Then, you can extend the database to HDFS storage locations and set storage policies that exclusively place data blocks on the HDFS storage location.

  • For better performance, if you are running the database only on a subset of the HDFS nodes, do not run the HDFS balancer on them. The HDFS balancer can move data blocks farther away, causing the database to read non-local data during query execution. Queries run faster if they do not require network I/O.
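The storage-location guidance above can be sketched in SQL. This is a minimal, hypothetical example: the HDFS path, label, and table name are illustrative assumptions, and the exact `CREATE LOCATION` options may differ in your product version, so check the reference documentation before running it.

```sql
-- Sketch: after creating the database on the local Linux file system,
-- extend it to an HDFS storage location for data files only
-- (temp space and the catalog stay on the local file system).
CREATE LOCATION 'hdfs://namenode:8020/analyticsdb/data'
    ALL NODES
    USAGE 'DATA'            -- data type storage location only
    LABEL 'hdfs_data';

-- Set a storage policy so a table's data blocks are placed
-- exclusively on the labeled HDFS location. The table name
-- (sales) is an illustrative assumption.
SELECT SET_OBJECT_STORAGE_POLICY('sales', 'hdfs_data');
```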

Generally, HDFS requires approximately 2 GB of memory for each node in the cluster. To support this requirement in your database configuration:

  1. Create a 2-GB resource pool.

  2. Do not assign any database execution resources to this pool. This approach reserves the space for use by HDFS.

Alternatively, use Ambari or Cloudera Manager to find the maximum heap size required by HDFS and set the size of the resource pool to that value.
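The reservation described above might look like the following sketch. The pool name is an illustrative assumption; adjust `MEMORYSIZE` to the heap size reported by Ambari or Cloudera Manager if it differs from 2 GB.

```sql
-- Sketch: reserve ~2 GB of memory per node for HDFS by creating a
-- resource pool that the database never runs queries on.
CREATE RESOURCE POOL hdfs_reserve
    MEMORYSIZE '2G'         -- reserve 2 GB on each node
    MAXMEMORYSIZE '2G';     -- do not let the pool borrow more
```

Because no users or queries are assigned to this pool, the memory it claims stays unused by the database and remains available to the HDFS processes on each node.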

For more about how to configure resource pools, see Managing workloads.