Apache Hadoop integration
Apache™ Hadoop™, like OpenText™ Analytics Database, uses a cluster of nodes for distributed processing. The primary component of interest is HDFS, the Hadoop Distributed File System.
You can use OpenText™ Analytics Database with HDFS in several ways:
- You can import HDFS data into locally stored ROS files.
- You can access HDFS data in place using external tables. You can define the tables yourself or get schema information from Hive, a Hadoop component.
- You can use HDFS as a storage location for ROS files.
- You can export data from the database to share with other Hadoop components using a Hadoop columnar format. See File export for more information.
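The first three access methods above can be sketched in SQL. The table names and HDFS paths here are hypothetical, and the exact options can vary by database version, so treat this as an outline rather than a definitive recipe:

```sql
-- Import HDFS data into locally stored ROS files
-- (assumes a pre-existing table named sales):
COPY sales FROM 'webhdfs:///data/sales/*.parquet' PARQUET;

-- Query HDFS data in place through an external table:
CREATE EXTERNAL TABLE sales_ext (id INT, amount FLOAT)
    AS COPY FROM 'webhdfs:///data/sales/*.parquet' PARQUET;

-- Use an HDFS directory as a shared storage location for ROS files:
CREATE LOCATION 'webhdfs:///vertica/ros' ALL NODES SHARED USAGE 'DATA';
```

External tables re-read the HDFS files at query time, while COPY and HDFS storage locations keep the data in the database's own ROS format.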
Hadoop file paths are expressed as URLs in the webhdfs or hdfs URL scheme. For more about using these schemes, see HDFS file system.
Hadoop distributions
The database can be used with Hadoop distributions from Hortonworks, Cloudera, and MapR. See Hadoop integrations for the specific versions that are supported.
If you are using Cloudera, you can manage your database cluster using Cloudera Manager. See Integrating with Cloudera Manager.
If you are using MapR, see Integrating OpenText™ Analytics Database with the MapR distribution of Hadoop.
In this section
- Cluster layout
- Configuring HDFS access
- Accessing kerberized HDFS data
- Using HDFS storage locations
- Using the HCatalog Connector
- Integrating with Cloudera Manager
- Integrating OpenText™ Analytics Database with the MapR distribution of Hadoop
- Hive primer for OpenText™ Analytics Database integration