Azure Blob Storage object store

Azure has several interfaces for accessing data.

Azure has several interfaces for accessing data. Vertica reads and always writes Block Blobs in Azure Storage. Vertica can read external data created using ADLS Gen2, and data that Vertica exports can be read using ADLS Gen2.

URI format

One of the following:

  • azb://account/container/path
  • azb://[account@]host[:port]/container/path

In the first version, a URI like 'azb://myaccount/mycontainer/path' treats the first token after the '//' as the account name. In the second version, you can specify account and must specify host explicitly.

The following rules apply to the second form:

  • If account is not specified, the first label of the host is used. For example, if the URI is 'azb://myaccount.blob.core.windows.net/mycontainer/my/object', then 'myaccount' is used for account.
  • If account is not specified and host has a single label and no port, the endpoint is host.blob.core.windows.net. Otherwise, the endpoint is the host and port specified in the URI.

The protocol (HTTP or HTTPS) is specified in the AzureStorageEndpointConfig configuration parameter.

Authentication

If you are using Azure managed identities, no further configuration in Vertica is needed. If your Azure storage uses multiple managed identities, you must tag the one to be used. Vertica looks for an Azure tag with a key of VerticaManagedIdentityClientId, the value of which must be the client_id attribute of the managed identity to be used. If you update the Azure tag, call AZURE_TOKEN_CACHE_CLEAR.

If you are not using managed identities, use the AzureStorageCredentials configuration parameter to provide credentials to Azure. If loading data, you can set the parameter at the session level. If using Eon Mode communal storage on Azure, you must set this configuration parameter at the database level.

In Azure you must also grant access to the containers for the identities used from Vertica.

Configuration parameters

The following database configuration parameters apply to the Azure blob file system. You can set parameters at different levels with the appropriate ALTER statement, such as ALTER SESSION...SET PARAMETER. Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

For external tables using highly partitioned data in an object store, see the ObjectStoreGlobStrategy configuration parameter and Partitions on Object Stores.

AzureStorageCredentials
Collection of JSON objects, each of which specifies connection credentials for one endpoint. This parameter takes precedence over Azure managed identities.

The collection must contain at least one object and may contain more. Each object must specify at least one of accountName or blobEndpoint, and at least one of accountKey or sharedAccessSignature.

  • accountName: If not specified, uses the label of blobEndpoint.
  • blobEndpoint: Host name with optional port (host:port). If not specified, uses account.blob.core.windows.net.
  • accountKey: Access key for the account or endpoint.
  • sharedAccessSignature: Access token for finer-grained access control, if being used by the Azure endpoint.
AzureStorageEndpointConfig
Collection of JSON objects, each of which specifies configuration elements for one endpoint. Each object must specify at least one of accountName or blobEndpoint.
  • accountName: If not specified, uses the label of blobEndpoint.
  • blobEndpoint: Host name with optional port (host:port). If not specified, uses account.blob.core.windows.net.
  • protocol: HTTPS (default) or HTTP.
  • isMultiAccountEndpoint: true if the endpoint supports multiple accounts, false otherwise (default is false). To use multiple-account access, you must include the account name in the URI. If a URI path contains an account, this value is assumed to be true unless explicitly set to false.

Examples

The following examples use these values for the configuration parameters. AzureStorageCredentials contains sensitive information and is set at the session level in this example.

=> ALTER SESSION SET AzureStorageCredentials =
    '[{"accountName": "myaccount", "accountKey": "REAL_KEY"},
      {"accountName": "myaccount", "blobEndpoint": "localhost:8080", "accountKey": "TEST_KEY"}]';

=> ALTER DATABASE default SET AzureStorageEndpointConfig =
    '[{"accountName": "myaccount", "blobEndpoint": "localhost:8080", "protocol": "http"}]';

The following example creates an external table using data from Azure. The URI specifies an account name of "myaccount".

=> CREATE EXTERNAL TABLE users (id INT, name VARCHAR(20))
    AS COPY FROM 'azb://myaccount/mycontainer/my/object/*';

Vertica uses AzureStorageEndpointConfig and the account name to produce the following location for the files:

https://myaccount.blob.core.windows.net/mycontainer/my/object/*

Data is accessed using the REAL_KEY credential.

If the URI in the COPY statement is instead azb://myaccount.blob.core.windows.net/mycontainer/my/object, then the resulting location is https://myaccount.blob.core.windows.net/mycontainer/my/object, again using the REAL_KEY credential.

However, if the URI in the COPY statement is azb://myaccount@localhost:8080/mycontainer/my/object, then the host and port specify a different endpoint: http://localhost:8080/myaccount/mycontainer/my/object. This endpoint is configured to use a different credential, TEST_KEY.