Create an Eon Mode database on-premises with HDFS

Follow the steps in this section to create an Eon Mode database that uses HDFS as its communal storage location.

Step 1: satisfy HDFS environment prerequisites

To use HDFS as a communal storage location for an Eon Mode database you must:

  • Run the WebHDFS service.

  • If using Kerberos, create a Kerberos principal for the Vertica (system) user as described in Kerberos authentication, and grant it read and write access to the location in HDFS where you will place your communal storage. Vertica always uses this system principal to access communal storage.

  • If using High Availability Name Node or swebhdfs, distribute the HDFS configuration files to all Vertica nodes as described in Configuring HDFS access. This step is necessary even though you do not use the hdfs scheme for communal storage.

  • If using swebhdfs (wire encryption) instead of webhdfs, configure the HDFS cluster with certificates trusted by the Vertica hosts and set dfs.encrypt.data.transfer in hdfs-site.xml, as shown in the example after this list.

  • Vertica has no additional requirements for encryption at rest. Consult the documentation for your Hadoop distribution for information on how to configure encryption at rest for WebHDFS.
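
For the swebhdfs case described above, the relevant setting in hdfs-site.xml is dfs.encrypt.data.transfer. The following fragment is a minimal sketch of enabling it; your Hadoop distribution may require additional related properties, so consult its documentation for the complete configuration.

    <!-- hdfs-site.xml: encrypt data transferred between HDFS clients and DataNodes -->
    <property>
      <name>dfs.encrypt.data.transfer</name>
      <value>true</value>
    </property>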

Step 2: install Vertica on your cluster

To install Vertica:

  1. Ensure your nodes are configured properly by reviewing all of the content in the Before you install Vertica section.

  2. Use the install_vertica script to verify that your nodes are correctly configured and to install the Vertica binaries on all of your nodes. Follow the steps under Install Vertica using the command line to install Vertica.
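
The exact install_vertica invocation depends on your environment. The following is a minimal sketch in which the host names and RPM path are placeholders for your own values:

$ sudo /opt/vertica/sbin/install_vertica \
  --hosts vnode01,vnode02,vnode03 \
  --rpm /path/to/vertica-package.rpm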

Step 3: create a bootstrapping file

Before you create your Eon Mode on-premises database, create a bootstrapping file to specify parameters that are required for database creation. This step is necessary only if you are using Kerberos, High Availability Name Node, or TLS (wire encryption).

  1. On the Vertica node where you will run admintools to create your database, use a text editor to create a file. You can name this file anything you wish. In these steps, it is named bootstrap_params.conf. The location of this file isn't important, as long as it is readable by the Linux user you use to create the database (usually, dbadmin).

  2. Add the following lines to the file. HadoopConfDir is typically set to /etc/hadoop/conf; KerberosServiceName is usually set to vertica. A completed example of this file appears after these steps.

    HadoopConfDir = config-path
    KerberosServiceName = principal-name
    KerberosRealm = realm-name
    KerberosKeytabFile = keytab-path
    

    If you are not using HA Name Node, for example in a test environment, you can omit HadoopConfDir and use an explicit Name Node host and port when specifying the location of the communal storage.

  3. Save the file and exit the editor.
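
For reference, a completed bootstrap_params.conf for a Kerberized cluster with HA Name Node might look like the following. The realm and keytab path shown here are placeholder values; substitute your own.

    HadoopConfDir = /etc/hadoop/conf
    KerberosServiceName = vertica
    KerberosRealm = EXAMPLE.COM
    KerberosKeytabFile = /home/dbadmin/vertica.keytab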

Step 4: choose a depot path on all nodes

Choose or create a directory on each node for the depot storage path. The directory you supply for the depot storage path parameter must:

  • Have the same path on all nodes in the cluster (for example, /home/dbadmin/depot).

  • Be readable and writable by the dbadmin user.

  • Have sufficient storage. By default, Vertica uses 60% of the space in the filesystem that contains the directory for depot storage. You can limit the size of the depot by using the --depot-size argument in the create_db command. See Configuring your Vertica cluster for Eon Mode for guidelines on choosing a size for your depot.

The admintools create_db tool will attempt to create the depot path for you if it doesn't exist.
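
For example, you could create the directory on each node ahead of time with a loop such as the following, where the host names are placeholders for your own nodes:

$ for host in vnode01 vnode02 vnode03; do ssh dbadmin@$host 'mkdir -p /home/dbadmin/depot'; done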

Step 5: create the Eon on-premises database

Use the admintools create_db tool to create the database. You must pass this tool the following arguments:

  • -x: The path to the bootstrap configuration file (bootstrap_params.conf in the examples in this section).

  • --communal-storage-location: The webhdfs or swebhdfs URL for the HDFS location. You cannot use the hdfs scheme.

  • --depot-path: The absolute path to store the depot on the nodes in the cluster.

  • --shard-count: The number of shards for the database. This is an integer that is usually either a multiple of the number of nodes in your cluster or an even divisor of it. See Planning for Scaling Your Cluster for more information.

  • -s: A comma-separated list of the nodes in your database.

  • -d: The name for your database.

Some common optional arguments include:

  • -l: The absolute path to the Vertica license file to apply to the new database.

  • -p: The password for the new database.

  • --depot-size: The maximum size for the depot. Defaults to 60% of the filesystem containing the depot path. You can specify the size in two ways:

      • integer%: Percentage of the filesystem's disk space to allocate (for example, 40%).

      • integer{K|M|G|T}: Amount of disk space to allocate for the depot in kilobytes, megabytes, gigabytes, or terabytes (for example, 500G).

    However you specify this value, the depot size cannot exceed 80 percent of the disk space of the filesystem where the depot is stored.

To view all arguments for the create_db tool, run the command:

admintools -t create_db --help

The following example demonstrates creating a three-node database named verticadb, specifying that the depot will be stored in the home directory of the dbadmin user.

$ admintools -t create_db -x bootstrap_params.conf \
  --communal-storage-location=webhdfs://mycluster/verticadb \
  --depot-path=/home/dbadmin/depot  --shard-count=6 \
  -s vnode01,vnode02,vnode03 -d verticadb -p 'YourPasswordHere'

If you are not using HA Name Node, for example in a test environment, you can use an explicit Name Node host and port for --communal-storage-location, as in the following example.

$ admintools -t create_db -x bootstrap_params.conf \
  --communal-storage-location=webhdfs://namenode.hadoop.example.com:50070/verticadb \
  --depot-path=/home/dbadmin/depot  --shard-count=6 \
  -s vnode01,vnode02,vnode03 -d verticadb -p 'YourPasswordHere'