Co-located clusters
With co-located clusters, OpenText™ Analytics Database is installed on some or all of your Hadoop nodes. The database nodes use a private network in addition to the public network used by all Hadoop nodes, as the following figure shows:
You might choose to place the database on all of your Hadoop nodes or only on some of them. If you are using HDFS Storage Locations, use at least three database nodes, which is the minimum number for K-safety in an Enterprise Mode database.
Using more database nodes can improve performance because the HDFS data needed by a query is more likely to be local to the node.
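The effect of adding database nodes can be estimated with a rough calculation. The sketch below is illustrative only: it assumes that each HDFS block has a fixed number of replicas placed on distinct nodes chosen uniformly at random, which ignores rack-aware placement, and the function name `chance_block_is_local` is hypothetical rather than part of any product API.

```python
from math import comb


def chance_block_is_local(hadoop_nodes, database_nodes, replicas=3):
    """Estimate the chance that at least one replica of a block is stored
    on a node that also runs the database.

    Simplifying assumption: replicas are placed on distinct nodes chosen
    uniformly at random (real HDFS placement is rack-aware).
    """
    if not 0 <= database_nodes <= hadoop_nodes:
        raise ValueError("database_nodes must be between 0 and hadoop_nodes")
    if not 1 <= replicas <= hadoop_nodes:
        raise ValueError("replicas must be between 1 and hadoop_nodes")
    # Number of placements in which every replica misses the database nodes.
    none_local = comb(hadoop_nodes - database_nodes, replicas)
    return 1 - none_local / comb(hadoop_nodes, replicas)


if __name__ == "__main__":
    # Hypothetical 10-node Hadoop cluster with 3, 5, and 10 database nodes.
    for db_nodes in (3, 5, 10):
        print(db_nodes, round(chance_block_is_local(10, db_nodes), 3))
```

Under these assumptions, the estimate rises as database nodes are added and reaches 1 when the database runs on every Hadoop node.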
You can place Hadoop and the database clusters within a single rack, or you can span across many racks and nodes. If you do not co-locate the database on every node, you can improve performance by co-locating it on at least one node in each rack. See Configuring rack locality.
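If you plan a partial co-location, a quick check of the layout can confirm that every rack contains at least one database node. The following is a minimal sketch that works on a plain mapping of node names to rack names; the node names, rack names, and the `racks_without_database` helper are all hypothetical and do not query Hadoop or the database.

```python
from collections import defaultdict


def racks_without_database(node_racks, database_nodes):
    """Return the sorted names of racks that contain no database node.

    node_racks: dict mapping each Hadoop node name to its rack name.
    database_nodes: iterable of node names that also run the database.
    """
    db_nodes = set(database_nodes)
    nodes_by_rack = defaultdict(list)
    for node, rack in node_racks.items():
        nodes_by_rack[rack].append(node)
    return sorted(
        rack
        for rack, nodes in nodes_by_rack.items()
        if not any(node in db_nodes for node in nodes)
    )


if __name__ == "__main__":
    # Hypothetical layout: six Hadoop nodes across three racks.
    layout = {
        "node01": "rack-a", "node02": "rack-a",
        "node03": "rack-b", "node04": "rack-b",
        "node05": "rack-c", "node06": "rack-c",
    }
    print(racks_without_database(layout, ["node01", "node03", "node04"]))
```

An empty result means that each rack has at least one co-located database node.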
Normally, both Hadoop and the database use the entire node. Because this configuration uses shared nodes, you must address potential resource contention in your configuration on those nodes. See Configuring Hadoop for co-located clusters for more information. No changes are needed on Hadoop-only nodes.
Hardware recommendations
Hadoop clusters frequently do not have identical provisioning requirements or hardware configurations. However, the database nodes should be equivalent in size and capability, per the best-practice standards recommended in Platform and hardware requirements and recommendations.
Because Hadoop cluster specifications do not always meet these standards, use the following specifications for the database nodes in your Hadoop cluster.
Specifications for... | Recommendation |
---|---|
Processor | For best performance, run: |
Memory | Distribute the memory appropriately across all memory channels in the server: |
Storage | Read/write:<br>Storage post RAID: Each node should have 1–9 TB. For a production setting, RAID 10 is recommended. In some cases, RAID 50 is acceptable. Because the database performs heavy compression and encoding, SSDs are not required. In most cases, a RAID array of more, less-expensive HDDs performs just as well as a RAID array of fewer SSDs. If you intend to use RAID 50 for your data partition, keep a spare node in every rack so that you can manually fail over a database node if a drive fails. A database node recovery is faster than a RAID 50 rebuild. Also, never store more than 10 TB of compressed data on any node, so that node recovery times stay acceptable. |
Network | Use 10 Gb networking in almost every case. With the introduction of 10 Gb Ethernet over Cat 6a cabling, the cost difference is minimal. |
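The storage guidance above can be turned into a small planning check. This is a sketch under stated assumptions, not a vendor tool: it uses the standard capacity arithmetic for RAID 10 (half of the raw capacity) and RAID 50 (one disk of parity per RAID 5 group), and the function names, parameters, and example disk counts are hypothetical.

```python
def raid_usable_tb(level, disks, disk_tb, raid5_groups=2):
    """Usable capacity in TB for a RAID 10 or RAID 50 array of equal disks."""
    if level == "RAID10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Every disk is mirrored, so half of the raw capacity is usable.
        return disks * disk_tb / 2
    if level == "RAID50":
        if raid5_groups < 2 or disks % raid5_groups or disks // raid5_groups < 3:
            raise ValueError("RAID 50 needs 2+ equal RAID 5 groups of 3+ disks")
        # Each RAID 5 group gives up one disk of capacity to parity.
        return (disks - raid5_groups) * disk_tb
    raise ValueError("unsupported RAID level: " + level)


def check_node_storage(usable_tb, compressed_data_tb):
    """Return a list of warnings based on the storage recommendations."""
    warnings = []
    if not 1 <= usable_tb <= 9:
        warnings.append("post-RAID storage is outside the 1-9 TB range")
    if compressed_data_tb > 10:
        warnings.append("more than 10 TB of compressed data on one node")
    return warnings


if __name__ == "__main__":
    # Hypothetical node: eight 2 TB HDDs in RAID 10 holding 6 TB compressed.
    usable = raid_usable_tb("RAID10", 8, 2.0)
    print(usable, check_node_storage(usable, 6.0))
```

Running the same check for each planned database node also makes it easier to confirm that the nodes are equivalent in size.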