CREATE LOCATION
Creates a storage location where Vertica can store data. After you create the location, you create storage policies that assign it to the database objects that will store data there.
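For example, a minimal sketch of that workflow, pairing a labeled location with the SET_OBJECT_STORAGE_POLICY meta-function (the path, label, and table name here are hypothetical):
=> CREATE LOCATION '/ssd/fastdata' ALL NODES USAGE 'DATA' LABEL 'fastloc';
=> SELECT SET_OBJECT_STORAGE_POLICY('sales', 'fastloc');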
Caution
While no technical issue prevents you from using CREATE LOCATION to add one or more Network File System (NFS) storage locations, Vertica does not support NFS data or catalog storage except for MapR mount points. You will be unable to run queries against any other NFS data. When creating locations on MapR file systems, you must specify ALL NODES SHARED, as in the sketch below.
If you use HDFS storage locations, the HDFS data must be available when you start Vertica: the HDFS cluster must be operational and the ROS files must be present. If data files have been moved or corrupted, or if the HDFS cluster is unresponsive, Vertica cannot start.
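A sketch of a MapR location follows; the mount path is hypothetical and depends on how your MapR cluster is mounted:
=> CREATE LOCATION '/mapr/my.cluster.com/vertica/data' ALL NODES SHARED
   USAGE 'DATA' LABEL 'maprdata';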
Syntax
CREATE LOCATION 'path'
[NODE 'node' | ALL NODES]
[SHARED]
[USAGE 'usage']
[LABEL 'label']
[LIMIT 'size']
Arguments
path
- Where to store this location's data. The type of file system on which the location is based determines the path format:
  - Linux: Absolute path to the directory where Vertica can write the storage location's data.
  - Shared file systems: See the URL specifications in HDFS file system, S3 object store, Google Cloud Storage (GCS) object store, and Azure Blob Storage object store. HDFS storage locations have additional requirements.
ALL NODES | NODE 'node'
- The node or nodes on which the storage location is defined, one of the following:
  - ALL NODES (default): Create the storage location on each node. If SHARED is also specified, create the storage location once for use by all nodes.
  - NODE 'node': Create the storage location on a single node, where node is the name of the node in the NODES system table. You cannot use this option with SHARED. (See the sketches following this argument list.)
SHARED
- Indicates the location set by path is shared (used by all nodes) rather than local to each node. You cannot specify individual nodes with SHARED; you must use ALL NODES. Most remote file systems such as HDFS and S3 are shared. For these file systems, the path argument represents a single location in the remote file system where all nodes store data. If using a remote file system, you must specify SHARED, even for one-node clusters.
Note
If path is set to S3 communal storage, SHARED is always implied and can be omitted.
Deprecated
SHARED DATA and SHARED DATA,TEMP storage locations are deprecated.
USAGE 'usage'
- The type of data the storage location can hold, where usage is one of the following:
  - DATA,TEMP (default): The storage location can store persistent and temporary DML-generated data, and data for temporary tables.
  - TEMP: A path-specified location to store DML-generated temporary data. If path is set to S3, this location is used only when the RemoteStorageForTemp configuration parameter is set to 1, and TEMP must be qualified with ALL NODES SHARED (see the sketches following this argument list). For details, see S3 Storage of Temporary Data.
  - DATA: The storage location can only store persistent data.
  - USER: Users with READ and WRITE privileges can access data and external tables of this storage location.
  - DEPOT: The storage location is used in Eon Mode to store the depot. Create DEPOT storage locations only on local Linux file systems. Vertica allows a single DEPOT storage location per node. To move your depot to a different location (on a different file system, for example), first drop the old depot storage location, then create the new one.
LABEL 'label'
- A label for the storage location, which you use later when assigning the storage location to data objects.
Important
You must supply a label for depot storage locations.
LIMIT 'size'
- Valid only if the storage location usage type is set to DEPOT. Specifies the maximum amount of disk space that the depot can allocate from the storage location's file system. You can specify size in two ways:
  - integer%: Percentage of storage location disk size.
  - integer{K|M|G|T}: Amount of storage location disk size in kilobytes, megabytes, gigabytes, or terabytes.
Important
The depot size cannot exceed 80 percent of the file system disk space where the depot is stored. If you specify a value that is too large, Vertica issues a warning and automatically changes the value to 80 percent of the file system size. If you do not specify a limit, it is set to 60 percent.
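The following sketches illustrate the NODE, TEMP, and DEPOT/LIMIT arguments described above. All paths, node names, bucket names, and labels are hypothetical:
=> CREATE LOCATION '/home/dbadmin/node4data' NODE 'v_vmart_node0004'
   USAGE 'DATA' LABEL 'node4loc';

=> ALTER DATABASE DEFAULT SET RemoteStorageForTemp = 1;
=> CREATE LOCATION 's3://mybucket/temp' ALL NODES SHARED
   USAGE 'TEMP' LABEL 's3temp';

=> CREATE LOCATION '/home/dbadmin/depot' ALL NODES
   USAGE 'DEPOT' LABEL 'depotloc' LIMIT '60%';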
Privileges
Superuser
File system access
The Vertica process must have read and write permissions to the location where data is to be stored. Each file system has its own requirements:
File system | Requirements
---|---
Linux | The database superuser account (usually named dbadmin) must have full read and write access to the directory in the path argument.
HDFS without Kerberos | A Hadoop user whose username matches the Vertica database administrator username (usually dbadmin) must have read and write access to the HDFS directory specified in the path argument. The UseServerIdentityOverUserIdentity configuration parameter must be set to true in the user session; otherwise Vertica tries to use the identity associated with the logged-in user.
HDFS with Kerberos | A Hadoop user whose username matches the principal in the keytab file on each Vertica node must have read and write access to the HDFS directory stored in the path argument. This is not the same as the database administrator username. The UseServerIdentityOverUserIdentity configuration parameter must be set to true in the user session; otherwise Vertica tries to use the Kerberos principal associated with the logged-in user.
Object stores (S3, GCS, Azure) | Database-level credentials must be specified and must provide full read and write access to the location in the path argument. If session-level credentials are specified, they are used, directly overriding the use of the storage location.
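For the HDFS rows above, a minimal sketch of enabling the parameter for the current session (1 = true):
=> ALTER SESSION SET UseServerIdentityOverUserIdentity = 1;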
Examples
Create a storage location in the local Linux file system for temporary data storage:
=> CREATE LOCATION '/home/dbadmin/testloc' USAGE 'TEMP' LABEL 'tempfiles';
Create a storage location on HDFS. The HDFS cluster does not use Kerberos:
=> CREATE LOCATION 'hdfs://hadoopNS/vertica/colddata' ALL NODES SHARED
USAGE 'data' LABEL 'coldstorage';
Create the same storage location, but on a Hadoop cluster that uses Kerberos. Note the output that reports the principal being used:
=> CREATE LOCATION 'hdfs://hadoopNS/vertica/colddata' ALL NODES SHARED
USAGE 'data' LABEL 'coldstorage';
NOTICE 0: Performing HDFS operations using kerberos principal [vertica/hadoop.example.com]
CREATE LOCATION
Create a location for user data, grant access to it, and use it to create an external table:
=> CREATE LOCATION '/tmp' ALL NODES USAGE 'user';
CREATE LOCATION
=> GRANT ALL ON LOCATION '/tmp' to Bob;
GRANT PRIVILEGE
=> CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/data/ext1.dat' DELIMITER ',';
CREATE TABLE
Create a user storage location on S3 and a role, so that users without their own S3 credentials can read data from S3 using the server credential:
--- set database-level credential (once):
=> ALTER DATABASE DEFAULT SET AWSAuth = 'myaccesskeyid123456:mysecretaccesskey123456789012345678901234';
=> CREATE LOCATION 's3://datalake' SHARED USAGE 'USER' LABEL 's3user';
=> CREATE ROLE ExtUsers;
--- Assign users to this role using GRANT (Role).
=> GRANT READ ON LOCATION 's3://datalake' TO ExtUsers;