Integrating Vertica with the MapR distribution of Hadoop

MapR is a distribution of Apache Hadoop produced by MapR Technologies that extends the standard Hadoop components with its own features.

MapR is a distribution of Apache Hadoop produced by MapR Technologies that extends the standard Hadoop components with its own features. Vertica can integrate with MapR in the following ways:

  • You can read data from MapR through an NFS mount point. After you mount the MapR file system as an NFS mount point, you can use CREATE EXTERNAL TABLE AS COPY or COPY to access the data as if it were on the local file system. This option provides the best performance for reading data.

  • You can use the HCatalog Connector to read Hive data. Do not use the HCatalog Connector with ORC or Parquet data in MapR for performance reasons. Instead, mount the MapR file system as an NFS mount point and create external tables without using the Hive schema. For more about reading Hive data, see Using the HCatalog Connector.

  • You can create a storage location to store data in MapR using the native Vertica format (ROS). Mount the MapR file system as an NFS mount point and then use CREATE LOCATION...ALL NODES SHARED to create a storage location. (CREATE LOCATION does not support NFS mount points in general, but does support them for MapR.)

Other Vertica integrations for Hadoop are not available for MapR.

For information on mounting the MapR file system as an NFS mount point, seeAccessing Data with NFS and Configuring Vertica Analytics Platform with MapR on the MapR website. In particular, you must configure MapR to add Vertica as a MapR service.

Examples

In the following examples, the MapR file system has been mounted as /mapr.

The following statement creates an external table from ORC data:

=> CREATE EXTERNAL TABLE t (a1 INT, a2 VARCHAR(20))
   AS COPY FROM '/mapr/data/file.orc' ORC;

The following statement creates an external table from Parquet data and takes advantage of partition pruning (see Using partition columns):

=> CREATE EXTERNAL TABLE t2 (id int, name varchar(50), created date, region varchar(50))
   AS COPY FROM '/mapr/*/*/*' PARQUET(hive_partition_cols='created,region');

The following statement loads ORC data from MapR into Vertica:

=> COPY t FROM '/mapr/data/*.orc' ON ANY NODE ORC;

The following statements create a storage location to hold ROS data in the MapR file system:

=> CREATE LOCATION '/mapr/my.cluster.com/data' SHARED USAGE 'DATA' LABEL 'maprfs';

=> SELECT ALTER_LOCATION_USE('/mapr/my.cluster.com/data', '', 'DATA');