Administrator's guide

Welcome to the Vertica Administrator's Guide. This document describes how to set up and maintain a Vertica Analytics Platform database.

Prerequisites

This document makes the following assumptions:

  • You are familiar with the concepts discussed in Architecture.

  • You have performed the following procedures as described in Setup:

    • Constructed a hardware platform.

    • Installed Linux.

    • Installed Vertica and configured a cluster of hosts.

1 - Administration overview

This document describes the functions performed by a Vertica database administrator (DBA). Perform these tasks using only the dedicated database administrator account that was created when you installed Vertica. The examples in this documentation set assume that the administrative account name is dbadmin.

  • To perform certain cluster configuration and administration tasks, the DBA (users of the administrative account) must be able to supply the root password for those hosts. If this requirement conflicts with your organization's security policies, these functions must be performed by your IT staff.

  • If you perform administrative functions using a different account from the account provided during installation, Vertica encounters file ownership problems.

  • If you share the administrative account password, make sure that only one user runs the Administration tools at any time. Otherwise, automatic configuration propagation does not work correctly.

  • The Administration Tools require that the calling user's shell be /bin/bash. Other shells give unexpected results and are not supported.

2 - Managing licenses

You must license Vertica in order to use it. Vertica supplies your license in the form of one or more license files, which encode the terms of your license.

To prevent introducing special characters that invalidate the license, do not open the license files in an editor. Opening the file in this way can introduce special characters, such as line endings and file terminators, that may not be visible within the editor. Whether visible or not, these characters invalidate the license.

Applying license files

Be careful not to change the license key file in any way when copying the file between Windows and Linux, or to any other location. To help prevent applications from trying to alter the file, enclose the license file in an archive file (such as a .zip or .tar file). You should keep a backup of your license key file. OpenText recommends that you keep the backup in /opt/vertica.

After copying the license file from one location to another, check that the copied file size is identical to that of the one you received from Vertica.

2.1 - Obtaining a license key file

Follow these steps to obtain a license key file:

  1. Log in to the Software Entitlement Key site using your passport login information. If you do not have a passport login, create one.

  2. On the Request Access page, enter your order number and select a role.

  3. Enter your request access reasoning.

  4. Click Submit.

  5. After your request is approved, you will receive a confirmation email. On the site, click the Entitlements tab to see your Vertica software.

  6. Under the Action tab, click Activate. You may select more than one product.

  7. The License Activation page opens. Enter your Target Name.

  8. Select your Vertica version and the quantity you want to activate.

  9. Click Next.

  10. Confirm your activation details and click Submit.

  11. The Activation Results page displays. Follow the instructions in New Vertica license installations or Vertica license changes to complete your installation or upgrade.

Your Vertica Community Edition download package includes the Community Edition license, which allows three nodes and 1TB of data. The Vertica Community Edition license does not expire.

2.2 - Understanding Vertica licenses

Vertica has flexible licensing terms. It can be licensed on the following bases:

  • Term-based (valid until a specific date).

  • Size-based (valid to store up to a specified amount of raw data).

  • Both term- and size-based.

  • Unlimited duration and data storage.

  • Node-based with an unlimited number of CPUs and users (one node is a server acting as a single computer system, whether physical or virtual).

  • A pay-as-you-go model where you pay for only the number of hours you use. This license is available on your cloud provider's marketplace.

Your license key has your licensing bases encoded into it. If you are unsure of your current license, you can view your license information from within Vertica.

Community edition license

Vertica Community Edition (CE) is free and allows customers to create databases with the following limits:

  • up to 3 nodes

  • up to 1 terabyte of data

Community Edition licenses cannot be installed co-located in a Hadoop infrastructure and used to query data stored in Hadoop formats.

As part of the CE license, you agree to the collection of some anonymous, non-identifying usage data. This data lets Vertica understand how customers use the product, and helps guide the development of new features. None of your personal data is collected. For details on what is collected, see the Community Edition End User License Agreement.

Vertica for SQL on Apache Hadoop license

Vertica for SQL on Apache Hadoop is a separate product with its own license. This documentation covers both products. Consult your license agreement for details about available features and limitations.

2.3 - Installing or upgrading a license key

The steps you follow to apply your Vertica license key vary, depending on the type of license you are applying and whether you are upgrading your license.

2.3.1 - New Vertica license installations

Follow these steps to install a new Vertica license:

  1. Copy the license key file you generated from the Software Entitlement Key site to your Administration host.

  2. Ensure the license key's file permissions are set to 400 (read permissions).

  3. Install Vertica as described in Installing Vertica if you have not already done so. The interface prompts you for the license key file.

  4. To install Community Edition, leave the default path blank and click OK. To apply your evaluation or Premium Edition license, enter the absolute path of the license key file you downloaded to your Administration Host and press OK. The first time you log in as the Database Superuser and run the Administration tools, the interface prompts you to accept the End-User License Agreement (EULA).

  5. Choose View EULA.

  6. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.

2.3.2 - Vertica license changes

If your license is expiring or you want your database to grow beyond your licensed data size, you must renew or upgrade your license. After you obtain your renewal or upgraded license key file, you can install it using Administration Tools or Management Console.

Upgrading does not require a new license unless you are increasing the capacity of your database. You can add capacity to your database using the Software Entitlement Key. You do not need to uninstall and reinstall the license to add capacity.

Uploading or upgrading a license key using administration tools

  1. Copy the license key file you generated from the Software Entitlement Key site to your Administration host.

  2. Ensure the license key's file permissions are set to 400 (read permissions).

  3. Start your database, if it is not already running.

  4. In the Administration Tools, select Advanced > Upgrade License Key and click OK.

  5. Enter the absolute path to your new license key file and click OK. The interface prompts you to accept the End-User License Agreement (EULA).

  6. Choose View EULA.

  7. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.

Uploading or upgrading a license key using Management Console

  1. From your database's Overview page in Management Console, click the License tab. The License page displays. You can view your installed licenses on this page.

  2. Click Install New License at the top of the License page.

  3. Browse to the location of the license key from your local computer and upload the file.

  4. Click Apply at the top of the page. Management Console prompts you to accept the End-User License Agreement (EULA).

  5. Select the check box to officially accept the EULA and continue installing the license, or click Cancel to exit.

Adding capacity

If you are adding capacity to your database, you do not need to uninstall and reinstall the license. Instead, you can install multiple licenses to increase the size of your database. This additive capacity only works for licenses with the same format, such as adding Premium license capacity to an existing Premium license. When you add capacity, the size of the license will be the total of both licenses; the previous license is not overwritten. You cannot add capacity using two different license formats, such as adding Hadoop license capacity to an existing Premium license.

You can run the AUDIT() function to verify that the license capacity was added. The added capacity is reflected in your license after the next automatic run of the audit function. To see the result of the added capacity immediately, run the AUDIT() function manually.
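
For example, a minimal sketch of installing an add-on license and then refreshing the audit (the license file path is hypothetical):

=> SELECT INSTALL_LICENSE('/opt/vertica/config/premium_addon_license.dat');
=> SELECT AUDIT('');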

2.4 - Viewing your license status

You can use several functions to display your license terms and current status.

Examining your license key

Use the DISPLAY_LICENSE SQL function to display the license information. This function displays the dates for which your license is valid (or Perpetual if your license does not expire) and any raw data allowance. For example:

=> SELECT DISPLAY_LICENSE();
                  DISPLAY_LICENSE
---------------------------------------------------
 Vertica Systems, Inc.
2007-08-03
Perpetual
500GB

(1 row)

You can also query the LICENSES system table to view information about your installed licenses. This table displays your license types, the dates for which your licenses are valid, and the size and node limits your licenses impose.
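
For example, to list all installed licenses from vsql:

=> SELECT * FROM v_catalog.licenses;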

Alternatively, use the LICENSES table in Management Console. On your database Overview page, click the License tab to view information about your installed licenses.

Viewing your license compliance

If your license includes a raw data size allowance, Vertica periodically audits your database's size to ensure it remains compliant with the license agreement. If your license has a term limit, Vertica also periodically checks to see if the license has expired. You can see the result of the latest audits using the GET_COMPLIANCE_STATUS function.


=> select GET_COMPLIANCE_STATUS();
                       GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
 Raw Data Size: 2.00GB +/- 0.003GB
 License Size : 4.000GB
 Utilization  : 50%
 Audit Time   : 2011-03-09 09:54:09.538704+00
 Compliance Status : The database is in compliance with respect to raw data size.
 License End Date: 04/06/2011
 Days Remaining: 28.59
(1 row)

To see how your ORC/Parquet data is affecting your license compliance, see Viewing license compliance for Hadoop file formats.

Viewing your license status through MC

Information about license usage is on the Settings page. See Monitoring database size for license compliance.

2.5 - Viewing license compliance for Hadoop file formats

You can use the EXTERNAL_TABLE_DETAILS system table to gather information about all of your tables based on Hadoop file formats. This information can help you understand how much of your license's data allowance is used by ORC and Parquet-based data.

Vertica computes the values in this table at query time, so to avoid performance problems, restrict your queries to filter by table_schema, table_name, or source_format. These three columns are the only columns you can use in a predicate, but you may use all of the usual predicate operators.

=> SELECT * FROM EXTERNAL_TABLE_DETAILS
    WHERE source_format = 'PARQUET' OR source_format = 'ORC';
-[ RECORD 1 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schema_oid            | 45035996273704978
table_schema          | public
table_oid             | 45035996273760390
table_name            | ORC_demo
source_format         | ORC
total_file_count      | 5
total_file_size_bytes | 789
source_statement      | COPY FROM 'ORC_demo/*' ORC
file_access_error     |
-[ RECORD 2 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schema_oid            | 45035196277204374
table_schema          | public
table_oid             | 45035996274460352
table_name            | Parquet_demo
source_format         | PARQUET
total_file_count      | 3
total_file_size_bytes | 498
source_statement      | COPY FROM 'Parquet_demo/*' PARQUET
file_access_error     |

When computing the size of an external table, Vertica counts all data found in the location specified by the COPY FROM clause. If you have a directory that contains ORC and delimited files, for example, and you define your external table with "COPY FROM *" instead of "COPY FROM *.orc", this table includes the size of the delimited files. (You would probably also encounter errors when querying that external table.) When you query this table, Vertica does not validate your table definition; it just uses the path to find files to report.
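
For example, a hedged sketch of an external table definition that restricts the COPY FROM glob to ORC files only (the table, columns, and path are hypothetical):

=> CREATE EXTERNAL TABLE sales_orc (store_key INT, sale_amount FLOAT)
    AS COPY FROM '/orc_data/sales/*.orc' ORC;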

You can also use the AUDIT function to find the size of a specific table or schema. When using the AUDIT function on ORC or PARQUET external tables, the error tolerance and confidence level parameters are ignored. Instead, the AUDIT always returns the size of the ORC or Parquet files on disk.

=> select AUDIT('customers_orc');
   AUDIT
-----------
 619080883
(1 row)

2.6 - Moving a cloud installation from by the hour (BTH) to bring your own license (BYOL)

Vertica offers two licensing options for some of the entries in the Amazon Web Services Marketplace and Google Cloud Marketplace:

  • Bring Your Own License (BYOL): a long-term license that you obtain through an online licensing portal. These deployments also work with a free Community Edition license. Vertica uses a community license automatically if you do not install a license that you purchased. (For more about Vertica licenses, see Managing licenses and Understanding Vertica licenses.)
  • Vertica by the Hour (BTH): a pay-as-you-go environment where you are charged an hourly fee for both the use of Vertica and the cost of the instances it runs on. The Vertica by the hour deployment offers an alternative to purchasing a term license. If you want to crunch large volumes of data within a short period of time, this option might work better for you. The BTH license is automatically applied to all clusters you create using a BTH MC instance.

If you start out with an hourly license, you can later decide to use a long-term license for your database. The support for an hourly versus a long-term license is built into the instances running your database. To move your database from an hourly license to a long-term license, you must create a new database cluster with a new set of instances.

To move from an hourly to a long-term license, follow these steps:

  1. Purchase a BYOL license. Follow the process described in Obtaining a license key file.

  2. Apply the new license to your database.

  3. Shut down your database.

  4. Create a new database cluster using a BYOL marketplace entry.

  5. Revive your database onto the new cluster.

The exact steps you must take depend on your database mode and your preferred tool for managing your database:

Moving an Eon Mode database from BTH to BYOL using the command line

Follow these steps to move an Eon Mode database from an hourly to a long-term license.

  1. Obtain a long-term BYOL license from the online licensing portal, described in Obtaining a license key file.

  2. Upload the license file to a node in your database. Note the absolute path in the node's filesystem, as you will need this later when installing the license.

  3. Connect to the node you uploaded the license file to in the previous step. Connect to your database using vsql and view the licenses table:

    => SELECT * FROM licenses;

  4. Note the name of the hourly license listed in the NAME column, so you can check if it is still present later.

  5. Install the license in the database using the INSTALL_LICENSE function with the absolute path to the license file you uploaded in step 2:

    => SELECT install_license('absolute path to BYOL license');

  6. View the licenses table again:

    => SELECT * FROM licenses;

    If only the new BYOL license appears in the table, skip to step 8. If the hourly license whose name you noted in step 4 is still in the table, copy the name and proceed to step 7.

  7. Call the DROP_LICENSE function to drop the hourly license:

    => SELECT drop_license('hourly license name');

  8. You will need the path for your cluster's communal storage in a later step. If you do not already know the path, you can find this information by executing this query:

    => SELECT location_path FROM V_CATALOG.STORAGE_LOCATIONS
       WHERE sharing_type = 'COMMUNAL';
    
  9. Synchronize your database's metadata. See Synchronizing metadata.

  10. Shut down the database by calling the SHUTDOWN function:

    => SELECT SHUTDOWN();
    
  11. You now need to create a new BYOL cluster onto which you will revive your database. Deploy a new cluster, including a new MC instance, using a BYOL entry in the marketplace of your chosen cloud platform.

  12. Revive your database onto the new cluster. For instructions, see Reviving an Eon Mode database cluster. Because you created the new cluster using a BYOL entry in the marketplace, the database uses the BYOL you applied earlier.

  13. After reviving the database on your new BYOL cluster, terminate the instances for your hourly license cluster and MC. For instructions, see your cloud provider's documentation.

Moving an Eon Mode database from BTH to BYOL using the MC

Follow this procedure to move to BYOL and revive your database using MC:

  1. Purchase a long-term BYOL license from the online licensing portal, following the steps detailed in Obtaining a license key file. Save the file to a location on your computer.

  2. You now need to install the new license on your database. Log into MC and click your database in the Recent Databases list.

  3. At the bottom of your database's Overview page, click the License tab.

  4. Under the Installed Licenses list, note the name of the BTH license in the License Name column. You will need this later to check whether it is still present after installing the new long-term license.

  5. In the ribbon at the top of the License History page, click the Install New License button. The Settings: License page opens.

  6. Click the Browse button next to the Upload a new license box.

  7. Locate the license file you obtained in step 1, and click Open.

  8. Click the Apply button on the top right of the page.

  9. Select the checkbox to agree to the EULA terms and click OK.

  10. After Vertica installs the license, click the Close button.

  11. Click the License tab at the bottom of the page.

  12. If only the new long-term license appears in the Installed Licenses list, skip to Step 16. If the by-the-hour license also appears in the list, copy down its name from the License Name column.

  13. You must drop the by-the-hour license before you can proceed. At the bottom of the page, click the Query Execution tab.

  14. In the query editor, enter the following statement:

    SELECT DROP_LICENSE('hourly license name');
    
  15. Click Execute Query. The query should complete indicating that the license has been dropped.

  16. You will need the path for your cluster's communal storage in a later step. If you do not already know the path, you can find this information by executing this query in the Query Execution tab:

    SELECT location_path FROM V_CATALOG.STORAGE_LOCATIONS
       WHERE sharing_type = 'COMMUNAL';
    
  17. Synchronize your database's metadata. See Synchronizing metadata.

  18. You must now stop your by-the-hour database cluster. At the bottom of the page, click the Manage tab.

  19. In the banner at the top of the page, click Stop Database and then click OK to confirm.

  20. From the Amazon Web Services Marketplace or the Google Cloud Marketplace, deploy a new Vertica Management Console using a BYOL entry. Do not deploy a full cluster. You just need an MC deployment.

  21. Log into your new MC instance and revive the database. See Reviving an Eon Mode database on AWS in MC for detailed instructions.

  22. After reviving the database on your new environment, terminate the instances for your hourly license environment. To do so, on the AWS CloudFormation Stacks page, select the hourly environment's stack (its collection of AWS resources) and click Actions > Delete Stack.

Moving an Enterprise Mode database from hourly to BYOL using backup and restore

In an Enterprise Mode database, follow this procedure to move to BYOL, and then back up and restore your database:

  1. Obtain a long-term BYOL license from the online licensing portal, described in Obtaining a license key file.

  2. Upload the license file to a node in your database. Note the absolute path in the node's filesystem, as you will need this later when installing the license.

  3. Connect to the node you uploaded the license file to in the previous step. Connect to your database using vsql and view the licenses table:

    => SELECT * FROM licenses;

  4. Note the name of the hourly license listed in the NAME column, so you can check if it is still present later.

  5. Install the license in the database using the INSTALL_LICENSE function with the absolute path to the license file you uploaded in step 2:

    => SELECT install_license('absolute path to BYOL license');

  6. View the licenses table again:

    => SELECT * FROM licenses;

    If only the new BYOL license appears in the table, skip to step 8. If the hourly license whose name you noted in step 4 is still in the table, copy the name and proceed to step 7.

  7. Call the DROP_LICENSE function to drop the hourly license:

    => SELECT drop_license('hourly license name');

  8. Back up the database. See Backing up and restoring the database.

  9. Deploy a new cluster for your database using one of the BYOL entries in the Amazon Web Services Marketplace.

  10. Restore the database from the backup you created earlier. See Backing up and restoring the database. When you restore the database, it will use the BYOL you loaded earlier.

  11. After restoring the database on your new environment, terminate the instances for your hourly license environment. To do so, on the AWS CloudFormation Stacks page, select the hourly environment's stack (its collection of AWS resources) and click Actions > Delete Stack.

After completing one of these procedures, see Viewing your license status to confirm the license drop and install were successful.

2.7 - Auditing database size

You can use your Vertica software until columnar data reaches the maximum raw data size that your license agreement allows. Vertica periodically runs an audit of the columnar data size to verify that your database complies with this agreement. You can also run your own audits of database size with two functions:

  • AUDIT: Estimates the raw data size of a database, schema, or table.

  • AUDIT_FLEX: Estimates the size of one or more flexible tables in a database, schema, or projection.

The following two examples audit the database and one schema:

=> SELECT AUDIT('', 'database');
  AUDIT
----------
 76376696
(1 row)
=> SELECT AUDIT('online_sales', 'schema');
  AUDIT
----------
 35716504
(1 row)

Raw data size

AUDIT and AUDIT_FLEX use statistical sampling to estimate the raw data size of data stored in tables—that is, the uncompressed data that the database stores. For most data types, Vertica evaluates the raw data size as if the data were exported from the database in text format, rather than as compressed data. For details, see Evaluating Data Type Footprint.

By using statistical sampling, the audit minimizes its impact on database performance. The tradeoff between accuracy and performance impact is a small margin of error. Reports on your database size include the margin of error, so you can assess the accuracy of the estimate.

Data in ORC and Parquet-based external tables are also audited whether they are stored locally in the Vertica cluster's file system or remotely in S3 or on a Hadoop cluster. AUDIT always uses the file size of the underlying data files as the amount of data in the table. For example, suppose you have an external table based on 1GB of ORC files stored in HDFS. Then an audit of the table reports it as being 1GB in size.

Unaudited data

Table data that appears in multiple projections is counted only once. An audit also excludes the following data:

  • Temporary table data.

  • Data in SET USING columns.

  • Non-columnar data accessible through external table definitions. Data in columnar formats such as ORC and Parquet count against your totals.

  • Data that was deleted but not yet purged.

  • Data stored in system and work tables such as monitoring tables, Data collector tables, and Database Designer tables.

  • Delimiter characters.

Evaluating data type footprint

Vertica evaluates the footprint of different data types as follows:

  • Strings and binary types—CHAR, VARCHAR, BINARY, VARBINARY—are counted as their actual size in bytes using UTF-8 encoding.

  • Numeric data types are evaluated as if they were printed. Each digit counts as a byte, as does any decimal point, sign, or scientific notation. For example, -123.456 counts as eight bytes—six digits plus the decimal point and minus sign.

  • Date/time data types are evaluated as if they were converted to text, including hyphens, spaces, and colons. For example, vsql prints a timestamp value of 2011-07-04 12:00:00 as 19 characters, or 19 bytes.

  • Complex types are evaluated as the sum of the sizes of their component parts. An array is counted as the total size of all elements, and a ROW is counted as the total size of all fields.

Controlling audit accuracy

AUDIT lets you specify an audit's error tolerance and confidence level, which default to 5 and 99 percent, respectively. For example, you can obtain a high level of audit accuracy by setting error tolerance and confidence level to 0 and 100 percent, respectively. With those settings, instead of estimating raw data size with statistical sampling, Vertica dumps all audited data to a raw format to calculate its size.

The following example audits the database with 25% error tolerance:

=> SELECT AUDIT('', 25);
  AUDIT
----------
 75797126
(1 row)

The following example audits the database with 25% level of tolerance and 90% confidence level:

=> SELECT AUDIT('',25,90);
  AUDIT
----------
 76402672
(1 row)

2.8 - Monitoring database size for license compliance

Your Vertica license can include a data storage allowance. The allowance can consist of data in columnar tables, flex tables, or both types of data. The AUDIT() function estimates the columnar table data size and any flex table materialized columns. The AUDIT_FLEX() function estimates the amount of __raw__ column data in flex or columnar tables. With respect to license data limits, data in __raw__ columns is calculated at 1/10th the size of structured data. Monitoring data sizes for columnar and flex tables lets you plan either to schedule deleting old data to keep your database in compliance with your license, or to consider a license upgrade for additional data storage.

Viewing your license compliance status

Vertica periodically runs an audit of the columnar data size to verify that your database is compliant with your license terms. You can view the results of the most recent audit by calling the GET_COMPLIANCE_STATUS function.


=> select GET_COMPLIANCE_STATUS();
                       GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
 Raw Data Size: 2.00GB +/- 0.003GB
 License Size : 4.000GB
 Utilization  : 50%
 Audit Time   : 2011-03-09 09:54:09.538704+00
 Compliance Status : The database is in compliance with respect to raw data size.
 License End Date: 04/06/2011
 Days Remaining: 28.59
(1 row)

Periodically running GET_COMPLIANCE_STATUS to monitor your database's license status is usually enough to ensure that your database remains compliant with your license. If your database begins to near its columnar data allowance, you can use the other auditing functions described below to determine where your database is growing and how recent deletes affect the database size.

Manually auditing columnar data usage

You can manually check license compliance for all columnar data in your database using the AUDIT_LICENSE_SIZE function. This function performs the same audit that Vertica periodically performs automatically. The AUDIT_LICENSE_SIZE check runs in the background, so the function returns immediately. You can then query the results using GET_COMPLIANCE_STATUS.
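
For example, to trigger a manual audit and then check the recorded result (because AUDIT_LICENSE_SIZE runs in the background, GET_COMPLIANCE_STATUS reflects it only after the audit completes):

=> SELECT AUDIT_LICENSE_SIZE();
=> SELECT GET_COMPLIANCE_STATUS();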

An alternative to AUDIT_LICENSE_SIZE is to use the AUDIT function to audit the size of the columnar tables in your entire database by passing an empty string to the function. This function operates synchronously, returning when it has estimated the size of the database.

=> SELECT AUDIT('');
  AUDIT
----------
 76376696
(1 row)

The size of the database is reported in bytes. The AUDIT function also allows you to control the accuracy of the estimated database size using additional parameters. See the entry for the AUDIT function for full details. Vertica does not count the AUDIT function results as an official audit. It takes no license compliance actions based on the results.

Manually auditing __raw__ column data

You can use the AUDIT_FLEX function to manually audit data usage for flex or columnar tables with a __raw__ column. The function calculates the encoded, compressed data stored in ROS containers for any __raw__ columns. Materialized columns in flex tables are calculated by the AUDIT function. The AUDIT_FLEX results do not include data in the __raw__ columns of temporary flex tables.
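
For example, a minimal sketch of auditing a single flex table (the table name mobile_events is hypothetical):

=> SELECT AUDIT_FLEX('public.mobile_events');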

Targeted auditing

If audits determine that the columnar table estimates are unexpectedly large, consider schemas, tables, or partitions that are using the most storage. You can use the AUDIT function to perform targeted audits of schemas, tables, or partitions by supplying the name of the entity whose size you want to find. For example, to find the size of the online_sales schema in the VMart example database, run the following command:

=> SELECT AUDIT('online_sales');
  AUDIT
----------
 35716504
(1 row)

You can also change the granularity of an audit to report the size of each object in a larger entity (for example, each table in a schema) by using the granularity argument of the AUDIT function. See the AUDIT function.
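
For example, a hedged sketch that audits the online_sales schema at table granularity and then reads the per-table results from the USER_AUDITS system table:

=> SELECT AUDIT('online_sales', 'table');
=> SELECT object_name, size_bytes FROM user_audits
       WHERE object_schema = 'online_sales';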

Using Management Console to monitor license compliance

You can also get information about data storage of columnar data (for columnar tables and for materialized columns in flex tables) through the Management Console. This information is available in the database Overview page, which displays a grid view of the database's overall health.

  • The needle in the license meter adjusts to reflect the amount used in megabytes.

  • The grace period represents the term portion of the license.

  • The Audit button returns the same information as the AUDIT() function in a graphical representation.

  • The Details link within the License grid (next to the Audit button) provides historical information about license usage. This page also shows a progress meter of percent used toward your license limit.

2.9 - Managing license warnings and limits

Term license warnings and expiration

The term portion of a Vertica license is easy to manage—you are licensed to use Vertica until a specific date. If the term of your license expires, Vertica alerts you with messages appearing in the Administration tools and vsql. For example:

=> CREATE TABLE T (A INT);
NOTICE 8723: Vertica license 432d8e57-5a13-4266-a60d-759275416eb2 is in its grace period; grace period expires in 28 days
HINT: Renew at https://softwaresupport.softwaregrp.com/
CREATE TABLE

Contact Vertica at https://softwaresupport.softwaregrp.com/ as soon as possible to renew your license, and then install the new license. After the grace period expires, Vertica stops processing DML queries and allows DDL queries with a warning message. If a license expires and one or more valid alternative licenses are installed, Vertica uses the alternative licenses.

Data size license warnings and remedies

If your Vertica columnar license includes a raw data size allowance, Vertica periodically audits the size of your database to ensure it remains compliant with the license agreement. For details of this audit, see Auditing database size. You should also monitor your database size to know when it will approach licensed usage. Monitoring the database size helps you plan to either upgrade your license to allow for continued database growth or delete data from the database so you remain compliant with your license. See Monitoring database size for license compliance for details.

If your database's size approaches your licensed usage allowance (above 75% of license limits), you will see warnings in the Administration tools, vsql, and Management Console. You have two options to eliminate these warnings:

  • Upgrade your license to a larger data size allowance.

  • Delete data from your database to remain under your licensed raw data size allowance. The warnings disappear after Vertica's next audit of the database size shows that it is no longer close to or over the licensed amount. You can also manually run a database audit (see Monitoring database size for license compliance for details).

If your database continues to grow after you receive warnings that its size is approaching your licensed size allowance, Vertica displays additional warnings in more parts of the system after a grace period passes. Use the GET_COMPLIANCE_STATUS function to check the status of your license.

If your Vertica premium edition database size exceeds your licensed limits

If your Premium Edition database size exceeds your licensed data allowance, all successful queries from ODBC and JDBC clients return with a status of SUCCESS_WITH_INFO instead of the usual SUCCESS. The message sent with the results contains a warning about the database size. Your ODBC and JDBC clients should be prepared to handle these messages instead of assuming that successful requests always return SUCCESS.

If your Vertica community edition database size exceeds 1 terabyte

If your Community Edition database size exceeds the limit of 1 terabyte, Vertica stops processing DML queries and allows DDL queries with a warning message.

To bring your database under compliance, you can choose to:

  • Drop database tables. You can also consider truncating a table or dropping a partition. See TRUNCATE TABLE or DROP_PARTITIONS; a brief example follows this list.

  • Upgrade to Vertica Premium Edition (or an evaluation license).
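
For example, a hedged sketch of the first option (the table name and partition range are hypothetical):

=> TRUNCATE TABLE public.web_clicks_staging;
=> SELECT DROP_PARTITIONS('public.web_clicks', '2023-01-01', '2023-06-30');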

2.10 - Exporting license audit results to CSV

You can use admintools to audit a database for license compliance and export the results in CSV format, as follows:

admintools -t license_audit [--password=password] --database=database [--file=csv-file] [--quiet]

where:

  • database must be a running database. If the database is password protected, you must also supply the password.

  • --file csv-file directs output to the specified file. If csv-file already exists, the tool returns an error message. If this option is unspecified, output is directed to stdout.

  • --quiet specifies that the tool should run in quiet mode; if unspecified, status messages are sent to stdout.

Running the license_audit tool is equivalent to invoking the following SQL statements:


select audit('');
select audit_flex('');
select * from dc_features_used;
select * from v_catalog.license_audits;
select * from v_catalog.user_audits;

Audit results include the following information:

  • Log of used Vertica features

  • Estimated database size

  • Raw data size allowed by your Vertica license

  • Percentage of licensed allowance that the database currently uses

  • Audit timestamps

The following truncated example shows the raw CSV output that license_audit generates:


FEATURES_USED
features_used,feature,date,sum
features_used,metafunction::get_compliance_status,2014-08-04,1
features_used,metafunction::bootstrap_license,2014-08-04,1
...

LICENSE_AUDITS
license_audits,database_size_bytes,license_size_bytes,usage_percent,audit_start_timestamp,audit_end_timestamp,confidence_level_percent,error_tolerance_percent,used_sampling,confidence_interval_lower_bound_bytes,confidence_interval_upper_bound_bytes,sample_count,cell_count,license_name
license_audits,808117909,536870912000,0.00150523690320551,2014-08-04 23:59:00.024874-04,2014-08-04 23:59:00.578419-04,99,5,t,785472097,830763721,10000,174754646,vertica
...

USER_AUDITS
user_audits,size_bytes,user_id,user_name,object_id,object_type,object_schema,object_name,audit_start_timestamp,audit_end_timestamp,confidence_level_percent,error_tolerance_percent,used_sampling,confidence_interval_lower_bound_bytes,confidence_interval_upper_bound_bytes,sample_count,cell_count
user_audits,812489249,45035996273704962,dbadmin,45035996273704974,DATABASE,,VMart,2014-10-14 11:50:13.230669-04,2014-10-14 11:50:14.069057-04,99,5,t,789022736,835955762,10000,174755178

AUDIT_SIZE_BYTES
audit_size_bytes,now,audit
audit_size_bytes,2014-10-14 11:52:14.015231-04,810584417

FLEX_SIZE_BYTES
flex_size_bytes,now,audit_flex
flex_size_bytes,2014-10-14 11:52:15.117036-04,11850

3 - Configuring the database

Before reading the topics in this section, you should be familiar with the material in Getting started and with creating and configuring a fully-functioning example database.

3.1 - Configuration procedure

This section describes the tasks required to set up a Vertica database. It assumes that you have a valid license key file, have installed the Vertica rpm package, and have run the installation script as described.

You complete the configuration procedure using:

Continuing configuring

Follow the configuration procedure sequentially as this section describes.

Vertica strongly recommends that you first experiment with creating and configuring a database.

You can use this generic configuration procedure several times during the development process, modifying it to fit your changing goals. You can omit steps such as preparing actual data files and sample queries, and run the Database Designer without optimizing for queries. For example, you can create, load, and query a database several times for development and testing purposes, then one final time to create and load the production database.

3.1.1 - Prepare disk storage locations

You must create and specify directories in which to store your catalog and data files (physical schema). You can specify these locations when you install or configure the database, or later during database operations. Both the catalog and data directories must be owned by the database superuser.

The directory you specify for database catalog files (the catalog path) is used across all nodes in the cluster. For example, if you specify /home/catalog as the catalog directory, Vertica uses that catalog path on all nodes. The catalog directory should always be separate from any data file directories.

The data path you designate is also used across all nodes in the cluster. If you specify that data should be stored in /home/data, for example, Vertica uses this path on all database nodes.

Do not use a single directory to contain both catalog and data files. You can store the catalog and data directories on different drives, which can be either on drives local to the host (recommended for the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.

Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your database. Creating a database in admintools also creates the catalog and data directories, but the parent directory must exist on each node.

You do not need to specify a disk storage location during installation. However, you can do so by using the --data-dir parameter to the install_vertica script. See Specifying disk storage location during installation.

3.1.1.1 - Specifying disk storage location during database creation

When you invoke the Create Database command in the Administration tools, a dialog box allows you to specify the catalog and data locations. These locations must exist on each host in the cluster and must be owned by the database administrator.

Database data directories

When you click OK, Vertica automatically creates the following subdirectories:

catalog-pathname/database-name/node-name_catalog/
data-pathname/database-name/node-name_data/

For example, if you use the default value (the database administrator's home directory) of /home/dbadmin for the Stock Exchange example database, the catalog and data directories are created on each node in the cluster as follows:

/home/dbadmin/Stock_Schema/stock_schema_node1_host01_catalog
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_data

Notes

  • Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions will result in database creation failure.

  • Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Managing storage locations for details.

3.1.1.2 - Specifying disk storage location on MC

You can use the MC interface to specify where you want to store database metadata on the cluster in the following ways:

  • When you configure MC the first time

  • When you create new databases on MC

See also

Configuring Management Console.

3.1.1.3 - Configuring disk usage to optimize performance

Once you have created your initial storage location, you can add additional storage locations to the database later. Not only does this provide additional space, it lets you control disk usage and increase I/O performance by isolating files that have different I/O or access patterns. For example, consider:

  • Isolating execution engine temporary files from data files by creating a separate storage location for temp space.

  • Creating labeled storage locations and storage policies, in which selected database objects are stored on different storage locations based on measured performance statistics or predicted access patterns.
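
For example, a hedged sketch of both approaches (the paths, the label, and the table name are hypothetical):

=> CREATE LOCATION '/vertica/temp' ALL NODES USAGE 'TEMP';
=> CREATE LOCATION '/vertica/ssd' ALL NODES USAGE 'DATA' LABEL 'ssd';
=> SELECT SET_OBJECT_STORAGE_POLICY('public.hot_sales', 'ssd');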

See also

Managing storage locations

3.1.1.4 - Using shared storage with Vertica

If using shared SAN storage, ensure there is no contention among the nodes for disk space or bandwidth.

  • Each host must have its own catalog and data locations. Hosts cannot share catalog or data locations.

  • Configure the storage so that there is enough I/O bandwidth for each node to access the storage independently.

3.1.1.5 - Viewing database storage information

You can view node-specific information on your Vertica cluster through the Management Console. See Monitoring Vertica Using Management Console for details.

3.1.1.6 - Anti-virus scanning exclusions

You should exclude the Vertica catalog and data directories from anti-virus scanning. Certain anti-virus products have been identified as targeting Vertica directories, and sometimes lock or delete files in them. This can adversely affect Vertica performance and data integrity.

Identified anti-virus products include the following:

  • ClamAV

  • SentinelOne

  • Sophos

  • Symantec

  • Twistlock

3.1.2 - Disk space requirements for Vertica

In addition to actual data stored in the database, Vertica requires disk space for several data reorganization operations, such as mergeout and managing nodes in the cluster. For best results, Vertica recommends that disk utilization per node be no more than sixty percent (60%) for a K-Safe=1 database to allow such operations to proceed.

In addition, disk space is temporarily required by certain query execution operators, such as hash joins and sorts, in the case when they cannot be completed in memory (RAM). Such operators might be encountered during queries, recovery, refreshing projections, and so on. The amount of disk space needed (known as temp space) depends on the nature of the queries, amount of data on the node and number of concurrent users on the system. By default, any unused disk space on the data disk can be used as temp space. However, Vertica recommends provisioning temp space separate from data disk space.

See also

Configuring disk usage to optimize performance.

3.1.3 - Disk space requirements for Management Console

You can install Management Console on any node in the cluster, so it has no special disk requirements other than disk space you allocate for your database cluster.

3.1.4 - Prepare the logical schema script

Designing a logical schema for a Vertica database is no different from designing one for any other SQL database. Details are described more fully in Designing a logical schema.

To create your logical schema, prepare a SQL script (plain text file, typically with an extension of .sql) that:

  1. Creates additional schemas (as necessary). See Using multiple schemas.

  2. Creates the tables and column constraints in your database using the CREATE TABLE command.

  3. Defines the necessary table constraints using the ALTER TABLE command.

  4. Defines any views on the table using the CREATE VIEW command.
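
For example, a minimal sketch of such a script (the schema, table, and view names are hypothetical):

CREATE SCHEMA store;
CREATE TABLE store.stock_dimension (
    stock_key     INTEGER NOT NULL,
    symbol        VARCHAR(10),
    company_name  VARCHAR(100)
);
ALTER TABLE store.stock_dimension ADD CONSTRAINT pk_stock_dimension PRIMARY KEY (stock_key);
CREATE VIEW store.stock_names AS
    SELECT symbol, company_name FROM store.stock_dimension;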

You can generate a script file using:

  • A schema designer application.

  • A schema extracted from an existing database.

  • A text editor.

  • One of the example database example-name_define_schema.sql scripts as a template. (See the example database directories in /opt/vertica/examples.)

In your script file, make sure that:

  • Each statement ends with a semicolon.

  • You use data types supported by Vertica, as described in the SQL Reference Manual.

Once you have created a database, you can test your schema script by executing it as described in Create the logical schema. If you encounter errors, drop all tables, correct the errors, and run the script again.

3.1.5 - Prepare data files

Prepare two sets of data files:

  • Test data files. Use test files to test the database after the partial data load. If possible, use part of the actual data files to prepare the test data files.

  • Actual data files. Once the database has been tested and optimized, use your data files for your initial Data load.

How to name data files

Name each data file to match the corresponding table in the logical schema. Case does not matter.

Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension, name the corresponding data file stock_dimension.tbl. When using multiple data files, append _nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example, stock_dimension.tbl_001, stock_dimension.tbl_002, and so on.

3.1.6 - Prepare load scripts

You can postpone this step if your goal is to test a logical schema design for validity.

Prepare SQL scripts to load data directly into physical storage using COPY on vsql, or through ODBC.
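
For example, a minimal sketch of a load statement for one file of a large table (the table name, file path, and delimiter are hypothetical):

=> COPY store_sales_fact FROM '/home/dbadmin/data/store_sales_fact.tbl_001' DELIMITER '|';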

You need scripts that load:

  • Large tables

  • Small tables

Vertica recommends that you load large tables using multiple files. To test the load process, use files of 10GB to 50GB in size. This size provides several advantages:

  • You can use one of the data files as a sample data file for the Database Designer.

  • You can load just enough data to Perform a partial data load before you load the remainder.

  • If a single load fails and rolls back, you do not lose an excessive amount of time.

  • Once the load process is tested, for multi-terabyte tables, break up the full load in file sizes of 250–500GB.

3.1.7 - Create an optional sample query script

The purpose of a sample query script is to test your schema and load scripts for errors.

Include a sample of queries your users are likely to run against the database. If you don't have any real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you can skip this step.
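
For example, a minimal sample query script might just collect row counts (the table names are hypothetical):

SELECT COUNT(*) FROM stock_dimension;
SELECT COUNT(*) FROM sales_fact;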

3.1.8 - Create an empty database

Two options are available for creating an empty database:

Although you can create more than one database (for example, one for production and one for testing), there can be only one active database for each installation of Vertica Analytic Database.

3.1.8.1 - Creating a database name and password

Database names

Database names must conform to the following rules:

  • Be between 1-30 characters

  • Begin with a letter

  • Follow with any combination of letters (upper and lowercase), numbers, and/or underscores.

Database names are case sensitive; however, Vertica strongly recommends that you do not create databases with names that differ only in case. For example, do not create a database called mydatabase and another called MyDataBase.

Database passwords

Database passwords can contain letters, digits, and the special characters listed below. Passwords cannot include non-ASCII Unicode characters.

The allowed password length is between 0-100 characters. The database superuser can change a Vertica user's maximum password length using ALTER PROFILE.

You use Profiles to specify and control password definitions. For instance, a profile can define the maximum length, reuse time, and the minimum number of required digits for a password, as well as other details.
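
For example, a hedged sketch of tightening password rules with the default profile (assumes the PASSWORD_MAX_LENGTH and PASSWORD_MIN_DIGITS profile parameters; check ALTER PROFILE for the parameters your version supports):

=> ALTER PROFILE default LIMIT PASSWORD_MAX_LENGTH 64;
=> ALTER PROFILE default LIMIT PASSWORD_MIN_DIGITS 1;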

The following special (ASCII) characters are valid in passwords. Special characters can appear anywhere in a password string. For example, mypas$word or $mypassword are both valid, while ±mypassword is not. Using special characters other than the ones listed below can cause database instability.

  • #

  • ?

  • =

  • _

  • '

  • )

  • (

  • @

  • \

  • /

  • !

  • ,

  • ~

  • :

  • %

  • ;

  • `

  • ^

  • +

  • .

  • -

  • space

  • &

  • <

  • >

  • [

  • ]

  • {

  • }

  • |

  • *

  • $

  • "

3.1.8.2 - Create a database using administration tools

  1. Run the Administration tools from your Administration host as follows:

    $ /opt/vertica/bin/admintools
    

    If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, see Notes for remote terminal users.

  2. Accept the license agreement and specify the location of your license file. See Managing licenses for more information.

    This step is necessary only if it is the first time you have run the Administration Tools.

  3. On the Main Menu, click Configuration Menu, and click OK.

  4. On the Configuration Menu, click Create Database, and click OK.

  5. Enter the name of the database and an optional comment, and click OK. See Creating a database name and password for naming guidelines and restrictions.

  6. Establish the superuser password for your database.

    • To provide a password enter the password and click OK. Confirm the password by entering it again, and then click OK.

    • If you don't want to provide the password, leave it blank and click OK. If you don't set a password, Vertica prompts you to verify that you truly do not want to establish a superuser password for this database. Click Yes to create the database without a password or No to establish the password.

  7. Select the hosts to include in the database from the list of hosts specified when Vertica was installed (install_vertica -s), and click OK.

  8. Specify the directories in which to store the data and catalog files, and click OK.

  9. Catalog and data path names must contain only alphanumeric characters and cannot have leading spaces. Failure to comply with these restrictions results in database creation failure.

    For example:

    Catalog pathname: /home/dbadmin

    Data Pathname: /home/dbadmin

  10. Review the Current Database Definition screen to verify that it represents the database you want to create, and then click Yes to proceed or No to modify the database definition.

  11. If you click Yes, Vertica creates the database you defined and then displays a message to indicate that the database was successfully created.

  12. Click OK to acknowledge the message.

3.1.9 - Create the logical schema

Connect to the database.
  1. Connect to the database.

    In the Administration Tools Main Menu, click Connect to Database and click OK.

    See Connecting to the Database for details.

    The vsql welcome script appears:

    Welcome to vsql, the Vertica Analytic Database interactive terminal.
    Type:  \h or \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    =>
    
  2. Run the logical schema script

    Use the \i meta-command in vsql to run the SQL logical schema script that you prepared earlier, as shown in the example after this list.

  3. Disconnect from the database

    Use the \q meta-command in vsql to return to the Administration Tools.
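
For example, assuming the logical schema script was saved as /home/dbadmin/vmart_schema.sql (a hypothetical path), the session might look like this:

=> \i /home/dbadmin/vmart_schema.sql
CREATE SCHEMA
CREATE TABLE
...
=> \q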

3.1.10 - Perform a partial data load

Vertica recommends that for large tables, you perform a partial data load and then test your database before completing a full data load.

Vertica recommends that for large tables, you perform a partial data load and then test your database before completing a full data load. This load should load a representative amount of data.

  1. Load the small tables.

    Load the small table data files using the SQL load scripts and data files you prepared earlier.

  2. Partially load the large tables.

    Load 10GB to 50GB of table data for each table using the SQL load scripts and data files that you prepared earlier.
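
For example, a partial load of one large fact table might use a COPY statement like the following sketch; the table name, file path, and delimiter are hypothetical:

=> COPY store.store_sales_fact FROM '/home/dbadmin/store_sales_10GB.dat'
   DELIMITER '|' DIRECT;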

For more information about projections, see Projections.

3.1.11 - Test the database

Test the database to verify that it is running as expected.

Test the database to verify that it is running as expected.

Check queries for syntax errors and execution times.

  1. Use the vsql \timing meta-command to enable the display of query execution time in milliseconds (see the example after this list).

  2. Execute the SQL sample query script that you prepared earlier.

  3. Execute several ad hoc queries.
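
For example, a quick timing check might look like the following; the table name, row count, and timings are illustrative only:

=> \timing
Timing is on.
=> SELECT COUNT(*) FROM store.store_sales_fact;
  count
---------
 5000000
(1 row)
Time: First fetch (1 row): 12.345 ms. All rows formatted: 12.380 ms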

3.1.12 - Optimize query performance

Optimizing the database consists of optimizing for compression and tuning for queries.

Optimizing the database consists of optimizing for compression and tuning for queries. (See Creating a database design.)

To optimize the database, use the Database Designer to create and deploy a design for optimizing the database. See Using Database Designer to create a comprehensive design.

After you run the Database Designer, use the techniques described in Query optimization to improve the performance of certain types of queries.

3.1.13 - Complete the data load

To complete the load:.

To complete the load:

  1. Monitor system resource usage.

    Continue to run the top, free, and df utilities and watch them while your load scripts are running (as described in Monitoring Linux resource usage). You can do this on any or all nodes in the cluster. Make sure that the system is not swapping excessively (watch kswapd in top) or running out of swap space (watch for a large amount of used swap space in free).

  2. Complete the large table loads.

    Run the remainder of the large table load scripts.

3.1.14 - Test the optimized database

Check query execution times to test your optimized design:.

Check query execution times to test your optimized design:

  1. Use the vsql \timing meta-command to enable the display of query execution time in milliseconds.

    Execute a SQL sample query script to test your schema and load scripts for errors.

  2. Execute several ad hoc queries

    1. Run Administration tools and select Connect to Database.

    2. Use the \i meta-command to execute the query script; for example:

      vmartdb=> \i vmart_query_03.sql
        customer_name   | annual_income
      ------------------+---------------
       James M. McNulty |        999979
       Emily G. Vogel   |        999998
      (2 rows)
      Time: First fetch (2 rows): 58.411 ms. All rows formatted: 58.448 ms
      vmartdb=> \i vmart_query_06.sql
       store_key | order_number | date_ordered
      -----------+--------------+--------------
              45 |       202416 | 2004-01-04
             113 |        66017 | 2004-01-04
             121 |       251417 | 2004-01-04
              24 |       250295 | 2004-01-04
               9 |       188567 | 2004-01-04
             166 |        36008 | 2004-01-04
              27 |       150241 | 2004-01-04
             148 |       182207 | 2004-01-04
             198 |        75716 | 2004-01-04
      (9 rows)
      Time: First fetch (9 rows): 25.342 ms. All rows formatted: 25.383 ms
      

Once the database is optimized, it should run queries efficiently. If you discover queries that you want to optimize, you can modify and update the design incrementally.

3.1.15 - Implement locales for international data sets

Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax.

Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY and ORDER BY clauses, joins, and the analytic ORDER BY clause.

The default locale for a Vertica database is en_US@collation=binary (English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is applied at query time.

If you set the locale to null, Vertica sets the locale to en_US_POSIX. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale.
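
For example, assuming the database default has not been changed from en_US@collation=binary, the following restores the default locale and collation for the current session:

=> \locale en_US@collation=binary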

You can set locale through ODBC, JDBC, and ADO.net.

ICU locale support

Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax. The locale used by the database session is not derived from the operating system (through the LANG variable), so Vertica recommends that you set the LANG for each node running vsql, as described in the next section.

While ICU library services can specify collation, currency, and calendar preferences, Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation regardless of the session collation. Any locale-specific collation is applied at query time.

The SET DATESTYLE TO ... command provides some aspects of the calendar, but Vertica supports only dollars as currency.

Changing DB locale for a session

This example sets the session locale to Thai.

  1. At the operating-system level for each node running vsql, set the LANG variable to the locale language as follows:

    export LANG=th_TH.UTF-8
    
  2. For each Vertica session (from ODBC/JDBC or vsql) set the language locale.

    From vsql:

    \locale th_TH
    
  3. From ODBC/JDBC:

    "SET LOCALE TO th_TH;"
    
  4. In PuTTY (or an ssh terminal), change the settings as follows:

    settings > window > translation > UTF-8
    
  5. Click Apply and then click Save.

All data loaded must be in UTF-8 format, not an ISO format, as described in Delimited data. Character sets like ISO 8859-1 (Latin1) are incompatible with UTF-8 and are not supported, so functions like SUBSTRING do not work correctly for multibyte characters, and locale settings cannot compensate for incorrectly encoded data. If the terminal translation setting ISO-8859-11:2001 (Latin/Thai) appears to work, the data was actually loaded incorrectly. To convert data to UTF-8 before loading, use a utility program such as Linux iconv.

3.1.15.1 - Specify the default locale for the database

After you start the database, the default locale configuration parameter, DefaultSessionLocale, sets the initial locale.

After you start the database, the default locale configuration parameter, DefaultSessionLocale, sets the initial locale. You can override this value for individual sessions.

To set the locale for the database, use the configuration parameter as follows:

=> ALTER DATABASE DEFAULT SET DefaultSessionLocale = 'ICU-locale-identifier';

For example:

=> ALTER DATABASE DEFAULT SET DefaultSessionLocale = 'en_GB';

3.1.15.2 - Override the default locale for a session

You can override the default locale for the current session in two ways:.

You can override the default locale for the current session in two ways:

  • VSQL command \locale. For example:

    => \locale en_GB
    INFO 2567:  Canonical locale: 'en_GB'
    Standard collation: 'LEN'
    English (United Kingdom)
    
  • SQL statement SET LOCALE. For example:

    
    => SET LOCALE TO en_GB;
    INFO 2567:  Canonical locale: 'en_GB'
    Standard collation: 'LEN'
    English (United Kingdom)
    

Both methods accept locale short and long forms. For example:

=> SET LOCALE TO LEN;
INFO 2567:  Canonical locale: 'en'
Standard collation: 'LEN'
English

=> \locale LEN
INFO 2567:  Canonical locale: 'en'
Standard collation: 'LEN'
English

3.1.15.3 - Server versus client locale settings

Vertica differentiates database server locale settings from client application locale settings:.

Vertica differentiates database server locale settings from client application locale settings:

  • Server locale settings only impact collation behavior for server-side query processing.

  • Client applications verify that locale is set appropriately in order to display characters correctly.

The following sections describe best practices to ensure predictable results.

Server locale

The server session locale should be set as described in Specify the default locale for the database. If locales vary across different sessions, set the server locale at the start of each session from your client.

vsql client

  • If the database does not have a default session locale, set the server locale for the session to the desired locale.

  • The locale setting in the terminal emulator where the vsql client runs should be set to be equivalent to session locale setting on the server side (ICU locale). By doing so, the data is collated correctly on the server and displayed correctly on the client.

  • All input data for vsql should be in UTF-8, and all output data is encoded in UTF-8.

  • Vertica does not support non-UTF-8 encodings and associated locale values.

  • For instructions on setting locale and encoding, refer to your terminal emulator documentation.

ODBC clients

  • ODBC applications can be either in ANSI or Unicode mode. If the user application is Unicode, the encoding used by ODBC is UCS-2. If the user application is ANSI, the data must be in single-byte ASCII, which is compatible with UTF-8 used on the database server. The ODBC driver converts UCS-2 to UTF-8 when passing to the Vertica server and converts data sent by the Vertica server from UTF-8 to UCS-2.

  • If the user application is not already in UCS-2, the application must convert the input data to UCS-2, or unexpected results could occur. For example:

    • For non-UCS-2 data passed to ODBC APIs, when it is interpreted as UCS-2, it could result in an invalid UCS-2 symbol being passed to the APIs, resulting in errors.

    • The symbol provided in the alternate encoding could be a valid UCS-2 symbol. If this occurs, incorrect data is inserted into the database.

  • If the database does not have a default session locale, ODBC applications should set the desired server session locale using SQLSetConnectAttr (if different from database wide setting). By doing so, you get the expected collation and string functions behavior on the server.

JDBC and ADO.NET clients

  • JDBC and ADO.NET applications use a UTF-16 character set encoding and are responsible for converting any non-UTF-16 encoded data to UTF-16. The same cautions apply as for ODBC if this encoding is violated.

  • The JDBC and ADO.NET drivers convert UTF-16 data to UTF-8 when passing to the Vertica server and convert data sent by Vertica server from UTF-8 to UTF-16.

  • If there is no default session locale at the database level, JDBC and ADO.NET applications should set the correct server session locale by executing the SET LOCALE TO command in order to get the expected collation and string functions behavior on the server. For more information, see SET LOCALE.

3.1.16 - Using time zones with Vertica

Vertica uses the public-domain tz database (time zone database), which contains code and data that represent the history of local time for locations around the globe.

Vertica uses the public-domain tz database (time zone database), which contains code and data that represent the history of local time for locations around the globe. This database organizes time zone and daylight saving time data by partitioning the world into timezones whose clocks all agree on timestamps that are later than the POSIX Epoch (1970-01-01 00:00:00 UTC). Each timezone has a unique identifier. Identifiers typically follow the convention area/location, where area is a continent or ocean, and location is a specific location within the area—for example, Africa/Cairo, America/New_York, and Pacific/Honolulu.

Vertica uses the TZ environment variable (if set) on each node for the default current time zone. Otherwise, Vertica uses the operating system time zone.

The TZ variable can be set by the operating system during login (see /etc/profile, /etc/profile.d, or /etc/bashrc) or by the user in .profile, .bashrc, or .bash_profile. TZ must be set to the same value on each node when you start Vertica.

The following command returns the current time zone for your database:

=> SHOW TIMEZONE;
   name   |     setting
----------+------------------
 timezone | America/New_York
(1 row)

You can also set the time zone for a single session with SET TIME ZONE.

Conversion and storage of date/time data

There is no database default time zone. TIMESTAMPTZ (TIMESTAMP WITH TIMEZONE) data is converted from the current local time and stored as GMT/UTC (Greenwich Mean Time/Coordinated Universal Time).

When TIMESTAMPTZ data is used, data is converted back to the current local time zone, which might be different from the local time zone where the data was stored. This conversion takes into account daylight saving time (summer time), depending on the year and date to determine when daylight saving time begins and ends.

TIMESTAMP WITHOUT TIMEZONE data stores the timestamp as given, and retrieves it exactly as given. The current time zone is ignored. The same is true for TIME WITHOUT TIMEZONE. For TIME WITH TIMEZONE (TIMETZ), however, the current time zone setting is stored along with the given time, and that time zone is used on retrieval.

Querying date/time data

TIMESTAMPTZ uses the current time zone on both input and output, as in the following example:

=> CREATE TEMP TABLE s (tstz TIMESTAMPTZ);
=> SET TIMEZONE TO 'America/New_York';
=> INSERT INTO s VALUES ('2009-02-01 00:00:00');
=> INSERT INTO s VALUES ('2009-05-12 12:00:00');
=> SELECT tstz AS 'Local timezone', tstz AT TIMEZONE 'America/New_York' AS 'America/New_York',
   tstz AT TIMEZONE 'GMT' AS 'GMT' FROM s;
     Local timezone     |  America/New_York   |         GMT
------------------------+---------------------+---------------------
 2009-02-01 00:00:00-05 | 2009-02-01 00:00:00 | 2009-02-01 05:00:00
 2009-05-12 12:00:00-04 | 2009-05-12 12:00:00 | 2009-05-12 16:00:00
(2 rows)

The -05 in the Local time zone column shows that the data is displayed in EST, while -04 indicates EDT. The other two columns show the TIMESTAMP WITHOUT TIMEZONE at the specified time zone.

The next example shows what happens if the current time zone is changed to GMT:

=> SET TIMEZONE TO 'GMT';
=> SELECT tstz AS 'Local timezone', tstz AT TIMEZONE 'America/New_York' AS
   'America/New_York', tstz AT TIMEZONE 'GMT' as 'GMT' FROM s;
     Local timezone     |  America/New_York   |         GMT
------------------------+---------------------+---------------------
 2009-02-01 05:00:00+00 | 2009-02-01 00:00:00 | 2009-02-01 05:00:00
 2009-05-12 16:00:00+00 | 2009-05-12 12:00:00 | 2009-05-12 16:00:00
(2 rows)

The +00 in the Local time zone column indicates that TIMESTAMPTZ is displayed in GMT.

The approach of using TIMESTAMPTZ fields to record events captures the GMT of the event, as expressed in terms of the local time zone. Later, it allows for easy conversion to any other time zone, either by setting the local time zone or by specifying an explicit AT TIMEZONE clause.

The following example shows how TIMESTAMP WITHOUT TIMEZONE fields work in Vertica.

=> CREATE TEMP TABLE tnoz (ts TIMESTAMP);
=> INSERT INTO tnoz VALUES('2009-02-01 00:00:00');
=> INSERT INTO tnoz VALUES('2009-05-12 12:00:00');
=> SET TIMEZONE TO 'GMT';
=> SELECT ts AS 'No timezone', ts AT TIMEZONE 'America/New_York' AS
   'America/New_York', ts AT TIMEZONE 'GMT' AS 'GMT' FROM tnoz;
      No timezone    |    America/New_York    |          GMT
---------------------+------------------------+------------------------
 2009-02-01 00:00:00 | 2009-02-01 05:00:00+00 | 2009-02-01 00:00:00+00
 2009-05-12 12:00:00 | 2009-05-12 16:00:00+00 | 2009-05-12 12:00:00+00
(2 rows)

The +00 at the end of a timestamp indicates that the setting is TIMESTAMP WITH TIMEZONE in GMT (the current time zone). The America/New_York column shows what the GMT setting was when you recorded the time, assuming you read a normal clock in the America/New_York time zone. What this shows is that if it is midnight in the America/New_York time zone, then it is 5 am GMT.

The GMT column displays the GMT time, assuming the input data was captured in GMT.

If you don't set the time zone to GMT, and you use another time zone, for example America/New_York, then the results display in America/New_York with a -05 and -04, showing the difference between that time zone and GMT.

=> SET TIMEZONE TO 'America/New_York';
=> SHOW TIMEZONE;
    name   |     setting
 ----------+------------------
  timezone | America/New_York
 (1 row)
=> SELECT ts AS 'No timezone', ts AT TIMEZONE 'America/New_York' AS
   'America/New_York', ts AT TIMEZONE 'GMT' AS 'GMT' FROM tnoz;
      No timezone    |    America/New_York    |          GMT
---------------------+------------------------+------------------------
 2009-02-01 00:00:00 | 2009-02-01 00:00:00-05 | 2009-01-31 19:00:00-05
 2009-05-12 12:00:00 | 2009-05-12 12:00:00-04 | 2009-05-12 08:00:00-04
(2 rows)

In this case, the last column is interesting in that it returns the time in New York, given that the data was captured in GMT.

3.1.17 - Change transaction isolation levels

By default, Vertica uses the READ COMMITTED isolation level for all sessions.

By default, Vertica uses the READ COMMITTED isolation level for all sessions. You can change the default isolation level for the database or for a given session.

A transaction retains its isolation level until it completes, even if the session's isolation level changes during the transaction. Vertica internal processes (such as the Tuple Mover and refresh operations) and DDL operations always run at the SERIALIZABLE isolation level to ensure consistency.

Database isolation level

The configuration parameter TransactionIsolationLevel specifies the database isolation level, and is used as the default for all sessions. Use ALTER DATABASE to change the default isolation level. For example:

=> ALTER DATABASE DEFAULT SET TransactionIsolationLevel = 'SERIALIZABLE';
ALTER DATABASE
=> ALTER DATABASE DEFAULT SET TransactionIsolationLevel = 'READ COMMITTED';
ALTER DATABASE

Changes to the database isolation level only apply to future sessions. Existing sessions and their transactions continue to use their original isolation level.

Use SHOW CURRENT to view the database isolation level:

=> SHOW CURRENT TransactionIsolationLevel;
  level   |           name            |    setting
----------+---------------------------+----------------
 DATABASE | TransactionIsolationLevel | READ COMMITTED
(1 row)

Session isolation level

SET SESSION CHARACTERISTICS AS TRANSACTION changes the isolation level for a specific session. For example:

=> SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET

Use SHOW to view the current session's isolation level:

=> SHOW TRANSACTION_ISOLATION;

See also

Transactions

3.2 - Configuration parameter management

For details about individual configuration parameters grouped by category, see Configuration Parameters.

Vertica supports a wide variety of configuration parameters that affect many facets of database behavior. These parameters can be set with the appropriate ALTER statements at one or more levels, listed here in descending order of precedence:

  1. User (ALTER USER)

  2. Session (ALTER SESSION)

  3. Node (ALTER NODE)

  4. Database (ALTER DATABASE)

Not all parameters can be set at all levels. Consult the documentation of individual parameters for restrictions.

You can query the CONFIGURATION_PARAMETERS system table to obtain the current settings for all user-accessible parameters. For example, the following query returns settings for partitioning parameters: their current and default values, which levels they can be set at, and whether changes require a database restart to take effect:

=> SELECT parameter_name, current_value, default_value, allowed_levels, change_requires_restart
      FROM configuration_parameters  WHERE parameter_name ILIKE '%partitioncount%';
    parameter_name    | current_value | default_value | allowed_levels | change_requires_restart
----------------------+---------------+---------------+----------------+-------------------------
 MaxPartitionCount    | 1024          | 1024          | NODE, DATABASE | f
 ActivePartitionCount | 1             | 1             | NODE, DATABASE | f
(2 rows)

For details about individual configuration parameters grouped by category, see Configuration parameters.

Setting and clearing configuration parameters

You change specific configuration parameters with the appropriate ALTER statements; the same statements also let you reset configuration parameters to their default values. For example, the following ALTER statements change ActivePartitionCount at the database level from 1 to 2, and DisableAutopartition at the session level from 0 to 1:

=> ALTER DATABASE DEFAULT SET ActivePartitionCount = 2;
ALTER DATABASE
=> ALTER SESSION SET DisableAutopartition = 1;
ALTER SESSION
=> SELECT parameter_name, current_value, default_value FROM configuration_parameters
      WHERE parameter_name IN ('ActivePartitionCount', 'DisableAutopartition');
    parameter_name    | current_value | default_value
----------------------+---------------+---------------
 ActivePartitionCount | 2             | 1
 DisableAutopartition | 1             | 0
(2 rows)

You can later reset the same configuration parameters to their default values:

=> ALTER DATABASE DEFAULT CLEAR ActivePartitionCount;
ALTER DATABASE
=> ALTER SESSION CLEAR DisableAutopartition;
ALTER SESSION
=> SELECT parameter_name, current_value, default_value FROM configuration_parameters
      WHERE parameter_name IN ('ActivePartitionCount', 'DisableAutopartition');
    parameter_name    | current_value | default_value
----------------------+---------------+---------------
 DisableAutopartition | 0             | 0
 ActivePartitionCount | 1             | 1
(2 rows)

3.2.1 - Viewing configuration parameter values

You can view active configuration parameter values in two ways:.

You can view active configuration parameter values in two ways:

SHOW statements

Use the following SHOW statements to view active configuration parameters:

  • SHOW CURRENT: Returns settings of active configuration parameter values. Vertica checks settings at all levels, in the following descending order of precedence:

    • session

    • node

    • database

    If no values are set at any scope, SHOW CURRENT returns the parameter's default value.

  • SHOW DATABASE: Displays configuration parameter values set for the database.

  • SHOW USER: Displays configuration parameters set for the specified user, and for all users.

  • SHOW SESSION: Displays configuration parameter values set for the current session.

  • SHOW NODE: Displays configuration parameter values set for a node.

If a configuration parameter requires a restart to take effect, the values in a SHOW CURRENT statement might differ from values in other SHOW statements. To see which parameters require restart, query the CONFIGURATION_PARAMETERS system table.

System tables

You can also query the CONFIGURATION_PARAMETERS system table for current parameter settings, default values, and related metadata, as shown earlier in this section.

3.3 - Designing a logical schema

Designing a logical schema for a Vertica database is the same as designing for any other SQL database.

Designing a logical schema for a Vertica database is the same as designing for any other SQL database. A logical schema consists of objects such as schemas, tables, views, and referential integrity constraints that are visible to SQL users. Vertica supports any relational schema design that you choose.

3.3.1 - Using multiple schemas

Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database.

Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database. In many cases, however, it makes sense to use additional schemas to allow users and their applications to create and access tables in separate namespaces. For example, using additional schemas allows:

  • Many users to access the database without interfering with one another.

    Individual schemas can be configured to grant specific users access to the schema and its tables while restricting others.

  • Third-party applications to create tables that have the same name in different schemas, preventing table collisions.

Unlike other RDBMSs, a schema in a Vertica database is not a collection of objects bound to one user.

3.3.1.1 - Multiple schema examples

This section provides examples of when and how you might want to use multiple schemas to separate database users.

This section provides examples of when and how you might want to use multiple schemas to separate database users. These examples fall into two categories: using multiple private schemas and using a combination of private schemas (i.e. schemas limited to a single user) and shared schemas (i.e. schemas shared across multiple users).

Using multiple private schemas

Using multiple private schemas is an effective way of separating database users from one another when sensitive information is involved. Typically a user is granted access to only one schema and its contents, thus providing database security at the schema level. Database users can be running different applications, multiple copies of the same application, or even multiple instances of the same application. This enables you to consolidate applications on one database to reduce management overhead and use resources more effectively. The following examples highlight using multiple private schemas.

Using multiple schemas to separate users and their unique applications

In this example, both database users work for the same company. One user (HRUser) uses a Human Resource (HR) application with access to sensitive personal data, such as salaries, while another user (MedUser) accesses information regarding company healthcare costs through a healthcare management application. HRUser should not be able to access company healthcare cost information and MedUser should not be able to view personal employee data.

To grant these users access to data they need while restricting them from data they should not see, two schemas are created with appropriate user access, as follows:

  • HRSchema—A schema owned by HRUser that is accessed by the HR application.

  • HealthSchema—A schema owned by MedUser that is accessed by the healthcare management application.

Using multiple schemas to support multitenancy

This example is similar to the last example in that access to sensitive data is limited by separating users into different schemas. In this case, however, each user is using a virtual instance of the same application.

An example of this is a retail marketing analytics company that provides data and software as a service (SaaS) to large retailers to help them determine which promotional methods they use are most effective at driving customer sales.

In this example, each database user equates to a retailer, and each user only has access to its own schema. The retail marketing analytics company provides a virtual instance of the same application to each retail customer, and each instance points to the user’s specific schema in which to create and update tables. The tables in these schemas use the same names because they are created by instances of the same application, but they do not conflict because they are in separate schemas.

Examples of schemas in this database could be:

  • MartSchema—A schema owned by MartUser, a large department store chain.

  • PharmSchema—A schema owned by PharmUser, a large drug store chain.

Using multiple schemas to migrate to a newer version of an application

Using multiple schemas is an effective way of migrating to a new version of a software application. In this case, a new schema is created to support the new version of the software, and the old schema is kept as long as necessary to support the original version of the software. This is called a “rolling application upgrade.”

For example, a company might use a HR application to store employee data. The following schemas could be used for the original and updated versions of the software:

  • HRSchema—A schema owned by HRUser, the schema user for the original HR application.

  • V2HRSchema—A schema owned by V2HRUser, the schema user for the new version of the HR application.

Combining private and shared schemas

The previous examples illustrate cases in which all schemas in the database are private and no information is shared between users. However, users might want to share common data. In the retail case, for example, MartUser and PharmUser might want to compare their per store sales of a particular product against the industry per store sales average. Since this information is an industry average and is not specific to any retail chain, it can be placed in a schema on which both users are granted USAGE privileges.

Examples of schemas in this database might be:

  • MartSchema—A schema owned by MartUser, a large department store chain.

  • PharmSchema—A schema owned by PharmUser, a large drug store chain.

  • IndustrySchema—A schema owned by DBUser (from the retail marketing analytics company) on which both MartUser and PharmUser have USAGE privileges. It is unlikely that retailers would be given any privileges beyond USAGE on the schema and SELECT on one or more of its tables.
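
The corresponding grants might look like the following sketch; the table name industry_avg_store_sales is hypothetical:

=> GRANT USAGE ON SCHEMA IndustrySchema TO MartUser, PharmUser;
=> GRANT SELECT ON IndustrySchema.industry_avg_store_sales TO MartUser, PharmUser;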

3.3.1.2 - Creating schemas

You can create as many schemas as necessary for your database.

You can create as many schemas as necessary for your database. For example, you could create a schema for each database user. However, schemas and users are not synonymous as they are in Oracle.

By default, only a superuser can create a schema or give a user the right to create a schema. (See GRANT (database) in the SQL Reference Manual.)

To create a schema use the CREATE SCHEMA statement, as described in the SQL Reference Manual.
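
For example, the following sketch creates one schema owned by the current user and another owned by an existing user (the schema and user names are illustrative):

=> CREATE SCHEMA analytics;
=> CREATE SCHEMA reporting AUTHORIZATION agent007;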

3.3.1.3 - Specifying objects in multiple schemas

Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing.

Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing. You can specify an object within multiple schemas by:

  • Qualifying the object name by using the schema name and object name separated by a dot. For example, to specify MyTable, located in Schema1, qualify the name as Schema1.MyTable.

  • Using a search path that includes the desired schemas when a referenced object is unqualified. When you set search paths (see Setting search paths), Vertica automatically searches the specified schemas to find the object.
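
For example, the following sketch references MyTable both ways, using the Schema1.MyTable names from the first bullet:

=> SELECT * FROM Schema1.MyTable;   -- schema-qualified reference
=> SET SEARCH_PATH TO Schema1, public;
=> SELECT * FROM MyTable;           -- resolved through the search path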

3.3.1.4 - Setting search paths

Each user session has a search path of schemas.

Each user session has a search path of schemas. Vertica uses this search path to find tables and user-defined functions (UDFs) that are unqualified by their schema name. A session search path is initially set from the user's profile. You can change the session's search path at any time by calling SET SEARCH_PATH. This search path remains in effect until the next SET SEARCH_PATH statement, or the session ends.

Viewing the current search path

SHOW SEARCH_PATH returns the session's current search path. For example:


=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal

Schemas are listed in descending order of precedence. The first schema has the highest precedence in the search order. If this schema exists, it is also defined as the current schema, which is used for tables that are created with unqualified names. You can identify the current schema by calling the function CURRENT_SCHEMA:

=> SELECT CURRENT_SCHEMA;
 current_schema
----------------
 public
(1 row)

Setting the user search path

A session search path is initially set from the user's profile. If the search path in a user profile is not set by CREATE USER or ALTER USER, it is set to the database default:

=> CREATE USER agent007;
CREATE USER
=> \c - agent007
You are now connected as user "agent007".
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal

$user resolves to the session user name—in this case, agent007—and has the highest precedence. If a schema agent007 exists, Vertica begins searches for unqualified tables in that schema. Also, calls to CURRENT_SCHEMA return this schema. Otherwise, Vertica uses public as the current schema and begins searches in it.

Use ALTER USER to modify an existing user's search path. These changes overwrite all non-system schemas in the search path, including $USER. System schemas are untouched. Changes to a user's search path take effect only when the user starts a new session; current sessions are unaffected.

For example, the following statements modify agent007's search path, and grant access privileges to schemas and tables that are on the new search path:

=> ALTER USER agent007 SEARCH_PATH store, public;
ALTER USER
=> GRANT ALL ON SCHEMA store, public TO agent007;
GRANT PRIVILEGE
=> GRANT SELECT ON ALL TABLES IN SCHEMA store, public TO agent007;
GRANT PRIVILEGE
=> \c - agent007
You are now connected as user "agent007".
=> SHOW SEARCH_PATH;
    name     |                     setting
-------------+-------------------------------------------------
 search_path | store, public, v_catalog, v_monitor, v_internal
(1 row)

To verify a user's search path, query the system table USERS:

=> SELECT search_path FROM USERS WHERE user_name='agent007';
                   search_path
-------------------------------------------------
 store, public, v_catalog, v_monitor, v_internal
(1 row)

To revert a user's search path to the database default settings, call ALTER USER and set the search path to DEFAULT. For example:


=> ALTER USER agent007 SEARCH_PATH DEFAULT;
ALTER USER
=> SELECT search_path FROM USERS WHERE user_name='agent007';
                    search_path
---------------------------------------------------
 "$user", public, v_catalog, v_monitor, v_internal
(1 row)

Ignored search path schemas

Vertica only searches among existing schemas to which the current user has access privileges. If a schema in the search path does not exist or the user lacks access privileges to it, Vertica silently excludes it from the search. For example, if agent007 lacks SELECT privileges to schema public, Vertica silently skips this schema. Vertica returns an error only if it cannot find the table anywhere on the search path.

Setting session search path

Vertica initially sets a session's search path from the user's profile. You can change the current session's search path with SET SEARCH_PATH. You can use SET SEARCH_PATH in two ways:

  • Explicitly set the session search path to one or more schemas. For example:

    
    => \c - agent007
    You are now connected as user "agent007".
    dbadmin=> SHOW SEARCH_PATH;
        name     |                      setting
    -------------+---------------------------------------------------
     search_path | "$user", public, v_catalog, v_monitor, v_internal
    (1 row)
    
    => SET SEARCH_PATH TO store, public;
    SET
    => SHOW SEARCH_PATH;
        name     |                     setting
    -------------+-------------------------------------------------
     search_path | store, public, v_catalog, v_monitor, v_internal
    (1 row)
    
  • Set the session search path to the database default:

    
    => SET SEARCH_PATH TO DEFAULT;
    SET
    => SHOW SEARCH_PATH;
        name     |                      setting
    -------------+---------------------------------------------------
     search_path | "$user", public, v_catalog, v_monitor, v_internal
    (1 row)
    

SET SEARCH_PATH overwrites all non-system schemas in the search path, including $USER. System schemas are untouched.

3.3.1.5 - Creating objects that span multiple schemas

Vertica supports views that reference tables across multiple schemas.

Vertica supports views that reference tables across multiple schemas. For example, a user might need to compare employee salaries to industry averages. In this case, the application queries two schemas:

  • Shared schema IndustrySchema for salary averages

  • Private schema HRSchema for company-specific salary information

Best Practice: When creating objects that span schemas, use qualified table names. This naming convention avoids confusion if the query path or table structure within the schemas changes at a later date.
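
A minimal sketch of such a view follows, assuming hypothetical employee_salaries and salary_averages tables in the two schemas named above:

=> CREATE VIEW HRSchema.salary_vs_industry AS
       SELECT e.department, e.salary, i.industry_avg_salary
       FROM HRSchema.employee_salaries e
       JOIN IndustrySchema.salary_averages i ON e.department = i.department;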

3.3.2 - Tables in schemas

In Vertica you can create persistent and temporary tables, through CREATE TABLE and CREATE TEMPORARY TABLE, respectively.

In Vertica you can create persistent and temporary tables, through CREATE TABLE and CREATE TEMPORARY TABLE, respectively.

For detailed information on both types, see Creating Tables and Creating temporary tables.

Persistent tables

CREATE TABLE creates a table in the Vertica logical schema. For example:

CREATE TABLE vendor_dimension (
   vendor_key        INTEGER      NOT NULL PRIMARY KEY,
   vendor_name       VARCHAR(64),
   vendor_address    VARCHAR(64),
   vendor_city       VARCHAR(64),
   vendor_state      CHAR(2),
   vendor_region     VARCHAR(32),
   deal_size         INTEGER,
   last_deal_update  DATE
);

For detailed information, see Creating Tables.

Temporary tables

CREATE TEMPORARY TABLE creates a table whose data persists only during the current session. Temporary table data is never visible to other sessions.

Temporary tables can be used to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated—for example, the tool first gets a result set, then queries the result set, and so on.

CREATE TEMPORARY TABLE can create tables at two scopes, global and local, through the keywords GLOBAL and LOCAL, respectively:

  • GLOBAL (default): The table definition is visible to all sessions. However, table data is session-scoped.

  • LOCAL: The table definition is visible only to the session in which it is created. When the session ends, Vertica automatically drops the table.
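
For example, the following sketch creates a local temporary table to hold intermediate results for the current session only; the table and column names are hypothetical, and ON COMMIT PRESERVE ROWS keeps rows across transactions within the session:

=> CREATE LOCAL TEMPORARY TABLE report_intermediate (
       customer_key  INTEGER,
       total_sales   NUMERIC(12,2)
   ) ON COMMIT PRESERVE ROWS;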

For detailed information, see Creating temporary tables.

3.4 - Creating a database design

A design is a physical storage plan that optimizes query performance.

A design is a physical storage plan that optimizes query performance. Data in Vertica is physically stored in projections. When you initially load data into a table using INSERT, COPY (or COPY LOCAL), Vertica creates a default superprojection for the table. This superprojection ensures that all of the data is available for queries. However, these superprojections might not optimize database performance, resulting in slow query performance and low data compression.

To improve performance, create a design for your Vertica database that optimizes query performance and data compression. You can create a design with Database Designer, or you can create and modify one manually.

Database Designer can help you minimize how much time you spend on manual database tuning. You can also use Database Designer to redesign the database incrementally as requirements such as workloads change over time.

Database Designer runs as a background process. This is useful if you have a large design that you want to run overnight. An active SSH session is not required, so design and deploy operations continue to run uninterrupted if the session ends.

3.4.1 - About Database Designer

Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.

Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.

During the design process, Database Designer analyzes the logical schema definition, sample data, and sample queries, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. This script creates a minimal set of superprojections to ensure K-safety.

In most cases, the projections that Database Designer creates provide excellent query performance within physical constraints while using disk space efficiently.

General design options

When you run Database Designer, several general options are available:

  • Create a comprehensive or incremental design.

  • Optimize for query execution, load, or a balance of both.

  • Require K-safety.

  • Recommend unsegmented projections when feasible.

  • Analyze statistics before creating the design.

Design input

Database Designer bases its design on the following information that you provide:

  • Design queries that you typically run during normal database operations.

  • Design tables that contain sample data.

Output

Database Designer yields the following output:

  • Design script that creates the projections for the design in a way that meets the optimization objectives and distributes data uniformly across the cluster.

  • Deployment script that creates and refreshes the projections for your design. For comprehensive designs, the deployment script contains commands that remove non-optimized projections. The deployment script includes the full design script.

  • Backup script that contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design.

Design restrictions

Database Designer-generated designs:

  • Exclude live aggregate or Top-K projections. You must create these manually. See CREATE PROJECTION.

  • Do not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR columns.

  • Do not support operations on complex types.

Post-design options

While running Database Designer, you can choose to deploy your design automatically after the deployment script is created, or to deploy it manually, after you have reviewed and tested the design. Vertica recommends that you test the design on a non-production server before deploying the design to your production server.

3.4.2 - How Database Designer creates a design

Database Designer-generated designs can include the following recommendations:.

Design recommendations

Database Designer-generated designs can include the following recommendations:

  • Sorts buddy projections in the same order, which can significantly improve load, recovery, and site node performance. All buddy projections have the same base name so that they can be identified as a group.

  • Accepts unlimited queries for a comprehensive design.

  • Identifies similar design queries and assigns them a signature.

    For queries with the same signature, Database Designer weights the queries, depending on how many queries have that signature. It then considers the weighted query when creating a design.

  • Recommends and creates projections in a way that minimizes data skew by distributing data uniformly across the cluster.

  • Produces higher quality designs by considering UPDATE, DELETE, and SELECT statements.

3.4.3 - Database Designer access requirements

By default, only users with the DBADMIN role can run Database Designer.

By default, only users with the DBADMIN role can run Database Designer. Non-DBADMIN users can run Database Designer only if they are granted the necessary privileges and DBDUSER role, as described below. You can also enable users to run Database Designer on the Management Console (see Enabling Users to run Database Designer on Management Console).

  1. Add a temporary folder to all cluster nodes with CREATE LOCATION:

    => CREATE LOCATION '/tmp/dbd' ALL NODES;
    
  2. Grant the desired user CREATE privileges to create schemas on the current (DEFAULT) database, with GRANT DATABASE:

    => GRANT CREATE ON DATABASE DEFAULT TO dbd-user;
    
  3. Grant the DBDUSER role to dbd-user with GRANT ROLE:

    => GRANT DBDUSER TO dbd-user;
    
  4. On all nodes in the cluster, grant dbd-user access to the temporary folder with GRANT LOCATION:

    => GRANT ALL ON LOCATION '/tmp/dbd' TO dbd-user;
    
  5. Grant dbd-user privileges on one or more database schemas and their tables, with GRANT SCHEMA and GRANT TABLE, respectively:

    => GRANT ALL ON SCHEMA this-schema[,...] TO dbd-user;
    => GRANT ALL ON ALL TABLES IN SCHEMA this-schema[,...] TO dbd-user;
    
  6. Enable the DBDUSER role on dbd-user in one of the following ways:

    • As dbd-user, enable the DBDUSER role with SET ROLE:

      => SET ROLE DBDUSER;
      
    • As DBADMIN, automatically enable the DBDUSER role for dbd-user on each login, with ALTER USER:

      => ALTER USER dbd-user DEFAULT ROLE DBDUSER;
      

Enabling users to run Database Designer on Management Console

Users who are already granted the DBDUSER role and required privileges, as described above, can also be enabled to run Database Designer on Management Console:

  1. Log in as a superuser to Management Console.

  2. Click MC Settings.

  3. Click User Management.

  4. Specify an MC user:

    • To create an MC user, click Add.

    • To use an existing MC user, select the user and click Edit.

  5. Next to the DB access level window, click Add.

  6. In the Add Permissions window:

    1. From the Choose a database drop-down list, select the database on which to create a design.

    2. In the Database username field, enter the dbd-user user name that you created earlier.

    3. In the Database password field, enter the database password.

    4. In the Restrict access drop-down list, select the level of MC user for this user.

  7. Click OK to save your changes.

  8. Log out of the MC Super user account.

The MC user is now mapped to dbd-user. Log in as the MC user and use Database Designer to create an optimized design for your database.

DBDUSER capabilities and limitations

The following constraints apply to users with the DBDUSER role:

  • Designs must set K-safety to be equal to system K-safety. If a design violates K-safety by lacking enough buddy projections for tables, the design does not complete.

  • You cannot explicitly advance the ancient history mark (AHM)—for example, call MAKE_AHM_NOW—until after deploying the design.

When you create a design, you automatically have privileges to manipulate that design. Other tasks might require additional privileges:

Required privileges by task:
Submit design tables
  • USAGE on the design table schema

  • OWNER on the design table

Submit a single design query
  • EXECUTE on the design query
Submit a file of design queries
  • READ privilege on the storage location that contains the query file

  • EXECUTE privilege on all queries in the file

Submit design queries from results of a user query
  • EXECUTE privilege on the user queries

  • EXECUTE privilege on each design query retrieved from the results of the user query

Create design and deployment scripts
  • WRITE privilege on the storage location of the design script

  • WRITE privilege on the storage location of the deployment script

3.4.4 - Logging projection data for Database Designer

When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify.

When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify. When you deploy the design, Database Designer creates the design based on these projections. However, space or budget constraints may prevent Database Designer from creating all the proposed projections. In addition, Database Designer may not be able to implement the projections using ideal criteria.

To get information about the projections, first enable the Database Designer logging capability. When enabled, Database Designer stores information about the proposed projections in two Data Collector tables. After Database Designer deploys the design, these logs contain information about which proposed projections were actually created. After deployment, the logs contain information about:

  • Projections that the Optimizer proposed

  • Projections that Database Designer actually created when the design was deployed

  • Projections that Database Designer created, but not with the ideal criteria that the Optimizer identified.

  • The DDL used to create all the projections

  • Column optimizations

If you do not deploy the design immediately, review the log to determine if you want to make any changes. If the design has been deployed, you can still manually create some of the projections that Database Designer did not create.

To enable the Database Designer logging capability, see Enabling logging for Database Designer.

To view the logged information, see Viewing Database Designer logs.

3.4.4.1 - Enabling logging for Database Designer

By default, Database Designer does not log information about the projections that the Optimizer proposes and that Database Designer deploys.

By default, Database Designer does not log information about the projections that the Optimizer proposes and that Database Designer deploys.

To enable Database Designer logging, enter the following command:

=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 1;

To disable Database Designer logging, enter the following command:

=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 0;

3.4.4.2 - Viewing Database Designer logs

You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:.

You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:

  • DC_DESIGN_PROJECTION_CANDIDATES

  • DC_DESIGN_QUERY_PROJECTION_CANDIDATES

DC_DESIGN_PROJECTION_CANDIDATES

The DC_DESIGN_PROJECTION_CANDIDATES table contains information about all the projections that the Optimizer proposed. This table also includes the DDL that creates them. The is_a_winner field indicates if that projection was part of the actual deployed design. To view the DC_DESIGN_PROJECTION_CANDIDATES table, enter:

=> SELECT *  FROM DC_DESIGN_PROJECTION_CANDIDATES;
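
For example, to focus on proposed projections that were not included in the deployed design, you might filter on the is_a_winner column (a minimal sketch):

=> SELECT projection_name, is_a_winner
   FROM DC_DESIGN_PROJECTION_CANDIDATES
   WHERE NOT is_a_winner;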

DC_DESIGN_QUERY_PROJECTION_CANDIDATES

The DC_DESIGN_QUERY_PROJECTION_CANDIDATES table lists plan features for all design queries.

Possible features are:

  • FULLY DISTRIBUTED JOIN

  • MERGE JOIN

  • GROUPBY PIPE

  • FULLY DISTRIBUTED GROUPBY

  • RLE PREDICATE

  • VALUE INDEX PREDICATE

  • LATE MATERIALIZATION

For all design queries, the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table includes the following plan feature information:

  • Optimizer path cost.

  • Database Designer benefits.

  • Ideal plan feature and its description, which identifies how the referenced projection should be optimized.

  • If the design was deployed, the actual plan feature and its description is included in the table. This information identifies how the referenced projection was actually optimized.

Because most projections have multiple optimizations, each projection usually has multiple rows. To view the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table, enter:

=> SELECT *  FROM DC_DESIGN_QUERY_PROJECTION_CANDIDATES;

To see example data from these tables, see Database Designer logs: example data.

3.4.4.3 - Database Designer logs: example data

In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database.

In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database. The output shows two records from the DC_DESIGN_PROJECTION_CANDIDATES table.

The first record contains information about the customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$ projection. The record includes the CREATE PROJECTION statement that Database Designer used to create the projection. The is_a_winner column is t, indicating that Database Designer created this projection when it deployed the design.

The second record contains information about the product_dimension_dbd_2_sort_$product_version$__$product_key$ projection. For this projection, the is_a_winner column is f. The Optimizer recommended that Database Designer create this projection as part of the design. However, Database Designer did not create the projection when it deployed the design. The log includes the DDL for the CREATE PROJECTION statement. If you want to add the projection manually, you can use that DDL. For more information, see Creating a design manually.

=> SELECT * FROM dc_design_projection_candidates;
-[ RECORD 1 ]--------+---------------------------------------------------------------
time                 | 2014-04-11 06:30:17.918764-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720620
projection_id        | 45035996273726626
iteration_number     | 1
projection_name      | customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."customer_dimension_dbd_1_
            sort_$customer_gender$__$annual_income$"
(
customer_key ENCODING AUTO,
customer_type ENCODING AUTO,
customer_name ENCODING AUTO,
customer_gender ENCODING RLE,
title ENCODING AUTO,
household_id ENCODING AUTO,
customer_address ENCODING AUTO,
customer_city ENCODING AUTO,
customer_state ENCODING AUTO,
customer_region ENCODING AUTO,
marital_status ENCODING AUTO,
customer_age ENCODING AUTO,
number_of_children ENCODING AUTO,
annual_income ENCODING AUTO,
occupation ENCODING AUTO,
largest_bill_amount ENCODING AUTO,
store_membership_card ENCODING AUTO,
customer_since ENCODING AUTO,
deal_stage ENCODING AUTO,
deal_size ENCODING AUTO,
last_deal_update ENCODING AUTO
)
AS
SELECT customer_key,
customer_type,
customer_name,
customer_gender,
title,
household_id,
customer_address,
customer_city,
customer_state,
customer_region,
marital_status,
customer_age,
number_of_children,
annual_income,
occupation,
largest_bill_amount,
store_membership_card,
customer_since,
deal_stage,
deal_size,
last_deal_update
FROM public.customer_dimension
ORDER BY customer_gender,
annual_income
UNSEGMENTED ALL NODES;
is_a_winner          | t
-[ RECORD 2 ]--------+-------------------------------------------------------------
time                 | 2014-04-11 06:30:17.961324-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720624
projection_id        | 45035996273726714
iteration_number     | 1
projection_name      | product_dimension_dbd_2_sort_$product_version$__$product_key$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."product_dimension_dbd_2_
        sort_$product_version$__$product_key$"
(
product_key ENCODING AUTO,
product_version ENCODING RLE,
product_description ENCODING AUTO,
sku_number ENCODING AUTO,
category_description ENCODING AUTO,
department_description ENCODING AUTO,
package_type_description ENCODING AUTO,
package_size ENCODING AUTO,
fat_content ENCODING AUTO,
diet_type ENCODING AUTO,
weight ENCODING AUTO,
weight_units_of_measure ENCODING AUTO,
shelf_width ENCODING AUTO,
shelf_height ENCODING AUTO,
shelf_depth ENCODING AUTO,
product_price ENCODING AUTO,
product_cost ENCODING AUTO,
lowest_competitor_price ENCODING AUTO,
highest_competitor_price ENCODING AUTO,
average_competitor_price ENCODING AUTO,
discontinued_flag ENCODING AUTO
)
AS
SELECT product_key,
product_version,
product_description,
sku_number,
category_description,
department_description,
package_type_description,
package_size,
fat_content,
diet_type,
weight,
weight_units_of_measure,
shelf_width,
shelf_height,
shelf_depth,
product_price,
product_cost,
lowest_competitor_price,
highest_competitor_price,
average_competitor_price,
discontinued_flag
FROM public.product_dimension
ORDER BY product_version,
product_key
UNSEGMENTED ALL NODES;
is_a_winner          | f
.
.
.

The next example shows the contents of two records in the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table. Both of these rows apply to projection ID 45035996273726626.

In the first record, the Optimizer recommends that Database Designer optimize the customer_gender column for the GROUP BY PIPE algorithm.

In the second record, the Optimizer recommends that Database Designer optimize the public.customer_dimension table for late materialization. Late materialization can improve the performance of joins that might spill to disk.

=> SELECT * FROM dc_design_query_projection_candidates;
-[ RECORD 1 ]-----------------+------------------------------------------------------------
time                           | 2014-04-11 06:30:17.482377-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | GROUP BY PIPE
ideal_plan_feature_description | Group-by pipelined on column(s) customer_gender
dbd_benefits                   | 5
opt_path_cost                  | 211
-[ RECORD 2 ]-----------------+------------------------------------------------------------
time                           | 2014-04-11 06:30:17.48276-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | LATE MATERIALIZATION
ideal_plan_feature_description | Late materialization on table public.customer_dimension
dbd_benefits                   | 4
opt_path_cost                  | 669
.
.
.

You can view the actual plan features that Database Designer implemented for the projections it created. To do so, query the V_INTERNAL.DC_DESIGN_QUERY_PROJECTIONS table:

=> select * from v_internal.dc_design_query_projections;
-[ RECORD 1 ]-------------------+-------------------------------------------------------------
time                            | 2014-04-11 06:31:41.19199-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | RLE PREDICATE
actual_plan_feature_description | RLE on predicate column(s) department_description
dbd_benefits                    | 2
opt_path_cost                   | 141
-[ RECORD 2 ]-------------------+-------------------------------------------------------------
time                            | 2014-04-11 06:31:41.192292-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | GROUP BY PIPE
actual_plan_feature_description | Group-by pipelined on column(s) fat_content
dbd_benefits                    | 5
opt_path_cost                   | 155

3.4.5 - General design settings

Before you run Database Designer, you must provide specific information on the design to create.

Before you run Database Designer, you must provide specific information on the design to create.

Design name

All designs that you create with Database Designer must have unique names that conform to the conventions described in Identifiers, and are no more than 32 characters long (16 characters if you use Database Designer in Administration Tools or Management Console).

The design name is incorporated into the names of files that Database Designer generates, such as its deployment script. This can help you differentiate files that are associated with different designs.

Design type

Database Designer can create two distinct design types: comprehensive or incremental.

Comprehensive design

A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database.

To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, Vertica creates an unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it.

Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections.

During a comprehensive design, Database Designer creates deployment scripts that:

  • Create projections to optimize query performance.

  • Create replacement buddy projections when Database Designer changes the encoding of existing projections that it decides to keep.

Incremental design

After you create and deploy a comprehensive database design, your database is likely to change over time in various ways. Consider using Database Designer periodically to create incremental designs that address these changes. Changes that warrant an incremental design can include:

  • Significant data additions or updates

  • New or modified queries that you run regularly

  • Performance issues with one or more queries

  • Schema changes

Optimization objective

Database Designer can optimize the design for one of three objectives:

  • Load: Designs that are optimized for loads minimize database size, potentially at the expense of query performance.
  • Query: Designs that are optimized for query performance. These designs typically favor fast query execution over load optimization, and thus result in a larger storage footprint.
  • Balanced: Designs that are balanced between database size and query performance.

A fully optimized query has an optimization ratio of 0.99. Optimization ratio is the ratio of a query's benefits achieved in the design produced by the Database Designer to that achieved in the ideal plan. The optimization ratio is set in the OptRatio parameter in designer.log.

Design tables

Database Designer needs one or more tables with a moderate amount of sample data—approximately 10 GB—to create optimal designs. Design tables with large amounts of data adversely affect Database Designer performance. Design tables with too little data prevent Database Designer from creating an optimized design. If a design table has no data, Database Designer ignores it.

Design queries

A database design that is optimized for query performance requires a set of representative queries, or design queries. Design queries are required for incremental designs, and optional for comprehensive designs. You list design queries in a SQL file that you supply as input to Database Designer. Database Designer checks the validity of the queries when you add them to your design, and again when it builds the design. If a query is invalid, Database Designer ignores it.

If you use Management Console to create a database design, you can submit queries either from an input file or from the system table QUERY_REQUESTS. For details, see Creating a design manually.

The maximum number of design queries depends on the design type: ≤200 queries for a comprehensive design, ≤100 queries for an incremental design. Optionally, you can assign weights to the design queries that signify their relative importance. Database Designer uses those weights to prioritize the queries in its design.

Segmented and unsegmented projections

When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or unsegmented (replicated on all cluster nodes).

By default, Database Designer recommends only segmented projections. You can enable Database Designer to recommend unsegmented projections. In this case, Database Designer recommends segmented superprojections for large tables when deploying to multi-node clusters, and unsegmented superprojections for smaller tables.

Database Designer uses the following algorithm to determine whether to recommend unsegmented projections. Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends unsegmented projections if any of the following conditions is true:

  • largest-row-count < 1,000,000 AND number-table-rows ≤ 10% of largest-row-count

  • largest-row-count ≥ 10,000,000 AND number-table-rows ≤ 1% of largest-row-count

  • 1,000,000 ≤ largest-row-count < 10,000,000 AND number-table-rows ≤ 100,000

Database Designer does not segment projections on:

  • Single-node clusters

  • LONG VARCHAR and LONG VARBINARY columns

For more information, see High availability with projections.

Statistics analysis

By default, Database Designer analyzes statistics for design tables when they are added to the design. Accurate statistics help Database Designer optimize compression and query performance.

Analyzing statistics takes time and resources. If you are certain that design table statistics are up to date, you can specify to skip this step and avoid the overhead otherwise incurred.
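
For example, when adding design tables programmatically, you can skip statistics collection by passing 'false' as the optional analyze-statistics argument of DESIGNER_ADD_DESIGN_TABLES. In this sketch, the design and table names are placeholders, and the optional third argument is an assumption about the function signature:

=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.customer_dimension', 'false');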

For more information, see Collecting Statistics.

3.4.6 - Building a design

After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.

After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.

When you build a database design, Vertica generates two scripts:

  • Deployment script: design-name_deploy.sql—Contains the SQL statements that create projections for the design you are deploying, deploy the design, and drop unused projections. When the deployment script runs, it creates the optimized design. For details about how to run this script and deploy the design, see Deploying a Design.

  • Design script: design-name_design.sql—Contains the CREATE PROJECTION statements that Database Designer uses to create the design. Review this script to make sure you are happy with the design.

    The design script is a subset of the deployment script. It serves as a backup of the DDL for the projections that the deployment script creates.

When you create a design using Management Console:

  • If you submit a large number of queries to your design and build it immediately, a timing issue could cause the queries not to load before deployment starts. If this occurs, you might see one of the following errors:

    • No queries to optimize for

    • No tables to design projections for

    To accommodate this timing issue, you may need to reset the design, check the Queries tab to make sure the queries have been loaded, and then rebuild the design. For details, see Resetting a design.

  • The scripts are deleted when deployment completes. To save a copy of the deployment script after the design is built but before the deployment completes, go to the Output window and copy and paste the SQL statements to a file.

3.4.7 - Resetting a design

You must reset a design when:.

You must reset a design when:

  • You build a design and the output scripts described in Building a Design are not created.

  • You build a design but Database Designer cannot complete the design because the queries it expects are not loaded.

Resetting a design discards all the run-specific information of the previous Database Designer build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables and queries.

After you reset a design, review the design to see what changes you need to make. For example, you can fix errors, change parameters, or check for and add additional tables or queries. Then you can rebuild the design.

You can only reset a design in Management Console or by using the DESIGNER_RESET_DESIGN function.
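
For example, to reset a hypothetical design named my_design programmatically:

=> SELECT DESIGNER_RESET_DESIGN('my_design');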

3.4.8 - Deploying a design

After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.

After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.

Both the design and deployment processes run in the background. This is useful if you have a large design that you want to run overnight. Because an active SSH session is not required, the design/deploy operations continue to run uninterrupted, even if the session is terminated.

Database Designer runs as a background process. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. However, if multiple users are deploying a design on the same tables at the same time, Database Designer may not be able to complete the deployment. To avoid problems, consider the following:

  • Schedule potentially conflicting Database Designer processes to run sequentially overnight so that there are no concurrency problems.

  • Avoid scheduling Database Designer runs on the same set of tables at the same time.

There are two ways to deploy your design: let Database Designer deploy it automatically when it runs, or deploy it manually later using the deployment script.

3.4.8.1 - Deploying designs using Database Designer

OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.

OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.

If you choose to allow Database Designer to automatically deploy your script during a comprehensive design and are running Administrative Tools, Database Designer creates a backup script of your database's current design. This script helps you re-create the design of projections that may have been dropped by the new design. The backup script is located in the output directory you specified during the design process.

If you choose not to have Database Designer automatically run the deployment script (for example, if you want to maintain projections from a pre-existing deployment), you can manually run the deployment script later. See Deploying designs manually.

To deploy a design while running Database Designer, do one of the following:

  • In Management Console, select the design and click Deploy Design.

  • In the Administration Tools, select Deploy design in the Design Options window.

If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy parameter to 'true'.
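
For example, the following sketch builds and deploys a design by setting the deploy parameter to 'true'. The design name and output paths are placeholders, and the argument order follows the workflow example in Workflow for running Database Designer programmatically:

=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
   ('my_design',
    '/tmp/examples/my_design_projections.sql',
    '/tmp/examples/my_design_deploy.sql',
    'true',   -- analyze statistics
    'true',   -- deploy the design
    'false',  -- do not drop the design workspace after deployment
    'false'   -- stop if an error is encountered
   );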

Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps that the deployment took:

vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;

3.4.8.2 - Deploying designs manually

If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:.

If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:

  1. Make sure that the target database contains the same tables and projections as the database where you ran Database Designer. The database should also contain sample data.

  2. To deploy the projections to a test or production environment, execute the deployment script in vsql with the meta-command \i as follows, where design-name is the name of the database design:

    => \i design-name_deploy.sql
    
  3. For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe (a combined sketch of steps 3 through 7 appears after this list).

  4. If you create projections for tables that already contain data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.

  5. Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.

  6. Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.

  7. Call ANALYZE_STATISTICS on all database projections:

    => SELECT ANALYZE_STATISTICS ('');
    

    This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.
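
The following sketch combines steps 3 through 7 for a hypothetical deployment whose new projections anchor on public.customer_dimension; the table name and the name of the obsolete projection are placeholders:

=> SELECT GET_PROJECTIONS('public.customer_dimension');  -- verify buddy projections are safe
=> SELECT START_REFRESH();                               -- refresh new projections in the background
=> SELECT MAKE_AHM_NOW();                                -- advance the Ancient History Mark
=> DROP PROJECTION public.customer_dimension_old_super;  -- hypothetical projection that is no longer needed
=> SELECT ANALYZE_STATISTICS('');                        -- collect statistics on all projections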

3.4.9 - How to create a design

There are three ways to create a design using Database Designer:.

There are three ways to create a design using Database Designer: with Management Console, programmatically, or with Administrative Tools.

The following table shows what Database Designer capabilities are available in each tool:

Database Designer Capability Management Console Running Database Designer Programmatically Administrative Tools
Create design Yes Yes Yes
Design name length (# of characters) 16 32 16
Build design (create design and deployment scripts) Yes Yes Yes
Create backup script Yes
Set design type (comprehensive or incremental) Yes Yes Yes
Set optimization objective Yes Yes Yes
Add design tables Yes Yes Yes
Add design queries file Yes Yes Yes
Add single design query Yes
Use query repository Yes Yes
Set K-safety Yes Yes Yes
Analyze statistics Yes Yes Yes
Require all unsegmented projections Yes Yes
View event history Yes Yes
Set correlation analysis mode (Default = 0) Yes

3.4.9.1 - Using administration tools to create a design

To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user.

To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user. Follow these steps:

  1. Log in as the dbadmin user and start Administration Tools.

  2. From the main menu, start the database for which you want to create a design. The database must be running before you can create a design for it.

  3. On the main menu, select Configuration Menu and click OK.

  4. On the Configuration Menu, select Run Database Designer and click OK.

  5. On the Select a database to design window, enter the name of the database for which you are creating a design and click OK.

  6. On the Enter the directory for Database Designer output window, enter the full path to the directory to contain the design script, deployment script, backup script, and log files, and click OK.

    For information about the scripts, see Building a design.

  7. On the Database Designer window, enter a name for the design and click OK.

  8. On the Design Type window, choose which type of design to create and click OK.

    For details, see Design Types.

  9. The Select schema(s) to add to query search path window lists all the schemas in the database that you selected. Select the schemas that contain representative data that you want Database Designer to consider when creating the design and click OK.

    For details about choosing schema and tables to submit to Database Designer, see Design Tables with Sample Data.

  10. On the Optimization Objectives window, select the objective you want for the database optimization:

  11. The final window summarizes the choices you have made and offers you two choices:

    • Proceed with building the design, and deploying it if you specified to deploy it immediately. If you did not specify to deploy, you can review the design and deployment scripts and deploy them manually, as described in Deploying designs manually.

    • Cancel the design and go back to change some of the parameters as needed.

  12. Creating a design can take a long time. To cancel a running design from the Administration Tools window, enter Ctrl+C.

To create a design for the VMart example database, see Using Database Designer to create a comprehensive design in Getting Started.

3.4.10 - Running Database Designer programmatically

Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality.

Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality. Run Database Designer programmatically to perform the following tasks:

  • Optimize performance on tables that you own.

  • Create or update a design without requiring superuser or DBADMIN intervention.

  • Add individual queries and tables, or add data to your design, and then rerun Database Designer to update the design based on this new information.

  • Customize the design.

  • Use recently executed queries to set up your database to run Database Designer automatically on a regular basis.

  • Assign each design query a query weight that indicates the importance of that query in creating the design. Assign a higher weight to queries that you run frequently so that Database Designer prioritizes those queries in creating the design, as shown in the sketch after this list.
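
For example, assuming that DESIGNER_ADD_DESIGN_QUERY accepts an optional query-weight argument, the following hypothetical call gives a frequently run query twice the default weight:

=> SELECT DESIGNER_ADD_DESIGN_QUERY(
      'my_design',
      'SELECT customer_name FROM public.customer_dimension WHERE annual_income > 500000;',
      2.0);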

For more details about Database Designer functions, see Database Designer function categories.

3.4.10.1 - Database Designer function categories

Database Designer functions perform the following operations, generally in the following order:

  1. Create a design.

  2. Set design properties.

  3. Populate a design.

  4. Create design and deployment scripts.

  5. Get design data.

  6. Clean up.

For detailed information, see Workflow for running Database Designer programmatically. For information on required privileges, see Privileges for running Database Designer functions.

Create a design

DESIGNER_CREATE_DESIGN directs Database Designer to create a design.

Set design properties

The following functions let you specify design properties:

Populate a design

The following functions let you add tables and queries to your Database Designer design:

Create design and deployment scripts

The following functions populate the Database Designer workspace and create design and deployment scripts. You can also analyze statistics, deploy the design automatically, and drop the workspace after the deployment:

Reset a design

DESIGNER_RESET_DESIGN discards all the run-specific information of the previous Database Designer build or deployment of the specified design but retains its configuration.

Get design data

The following functions display information about projections and scripts that the Database Designer created:

Cleanup

The following functions cancel any running Database Designer operation or drop a Database Designer design and all its contents:

3.4.10.2 - Workflow for running Database Designer programmatically

The following example shows the steps you take to create a design by running Database Designer programmatically.

The following example shows the steps you take to create a design by running Database Designer programmatically.

Before you run this example, you should have the DBDUSER role, and you should have enabled that role using the SET ROLE DBDUSER command:

  1. Create a table in the public schema:

    => CREATE TABLE T(
       x INT,
       y INT,
       z INT,
       u INT,
       v INT,
       w INT PRIMARY KEY
       );
    
  2. Add data to the table:

    \! perl -e 'for ($i=0; $i<100000; ++$i)   {printf("%d, %d, %d, %d, %d, %d\n", $i/10000, $i/100, $i/10, $i/2, $i, $i);}'
       | vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"
    
  3. Create a second table in the public schema:

    => CREATE TABLE T2(
       x INT,
       y INT,
       z INT,
       u INT,
       v INT,
       w INT PRIMARY KEY
       );
    
  4. Copy the data from table T to table T2 and commit the changes:

    => INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
    => COMMIT;
    
  5. Create a new design:

    => SELECT DESIGNER_CREATE_DESIGN('my_design');
    

    This command adds information to the DESIGNS system table in the V_MONITOR schema.

  6. Add tables from the public schema to the design :

    => SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
    => SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');
    

    These commands add information to the DESIGN_TABLES system table.

  7. Create a file named queries.txt in /tmp/examples, or another directory where you have READ and WRITE privileges. Add the following two queries in that file and save it. Database Designer uses these queries to create the design:

    SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
    SELECT DISTINCT w FROM T;
    
  8. Add the queries file to the design and display the results—the numbers of accepted queries, non-design queries, and unoptimizable queries:

    => SELECT DESIGNER_ADD_DESIGN_QUERIES
         ('my_design',
         '/tmp/examples/queries.txt',
         'true'
         );
    

    The results show that both queries were accepted:

    Number of accepted queries                      =2
    Number of queries referencing non-design tables =0
    Number of unsupported queries                   =0
    Number of illegal queries                       =0
    

    The DESIGNER_ADD_DESIGN_QUERIES function populates the DESIGN_QUERIES system table.

  9. Set the design type to comprehensive. (This is the default.) A comprehensive design creates an initial or replacement design for all the design tables:

    => SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
    
  10. Set the optimization objective to query. This setting creates a design that focuses on faster query performance, which might recommend additional projections. These projections could result in a larger database storage footprint:

    => SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');
    
  11. Create the design and save the design and deployment scripts in /tmp/examples, or another directory where you have READ and WRITE privileges. The following command:

    • Analyzes statistics

    • Doesn't deploy the design.

    • Doesn't drop the design after deployment.

    • Stops if it encounters an error.

    => SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
       ('my_design',
        '/tmp/examples/my_design_projections.sql',
        '/tmp/examples/my_design_deploy.sql',
        'True',
        'False',
        'False',
        'False'
        );
    

    This command adds information to the following system tables:

  12. Examine the status of the Database Designer run to see what projections Database Designer recommends. In the deployment_projection_name column:

    • rep indicates a replicated projection

    • super indicates a superprojection

      The deployment_status column is pending because the design has not yet been deployed.

      For this example, Database Designer recommends four projections:

      => \x
      Expanded display is on.
      => SELECT * FROM OUTPUT_DEPLOYMENT_STATUS;
      -[ RECORD 1 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 1
      deployment_projection_name | T_DBD_1_rep_my_design
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 2 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 2
      deployment_projection_name | T2_DBD_2_rep_my_design
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 3 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 3
      deployment_projection_name | T_super
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 4 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 4
      deployment_projection_name | T2_super
      deployment_status          | pending
      error_message              | N/A
      
  13. View the script /tmp/examples/my_design_deploy.sql to see how these projections are created when you run the deployment script. In this example, the script also assigns the encoding schemes RLE and COMMONDELTA_COMP to columns where appropriate.

  14. Deploy the design from the directory where you saved it:

    => \i /tmp/examples/my_design_deploy.sql
    
  15. Now that the design is deployed, delete the design:

    => SELECT DESIGNER_DROP_DESIGN('my_design');
    

3.4.10.3 - Privileges for running Database Designer functions

Non-DBADMIN users with the DBDUSER role can run Database Designer functions.

Non-DBADMIN users with the DBDUSER role can run Database Designer functions. Two steps are required to enable users to run these functions:

  1. A DBADMIN or superuser grants the user the DBDUSER role:

    => GRANT DBDUSER TO username;
    

    This role persists until the DBADMIN revokes it.

  2. Before the DBDUSER can run Database Designer functions, one of the following must occur:

    • The user enables the DBDUSER role:

      => SET ROLE DBDUSER;
      
    • The superuser sets the user's default role to DBDUSER:

      => ALTER USER username DEFAULT ROLE DBDUSER;
      

General DBDUSER limitations

As a DBDUSER, the following restrictions apply:

  • You can set a design's K-safety to a value less than or equal to system K-safety. You cannot change system K-safety.

  • You cannot explicitly change the ancient history mark (AHM), even during design deployment.

Design dependencies and privileges

Individual design tasks are likely to have dependencies that require specific privileges:

Task Required privileges
Add tables to a design
  • USAGE privilege on the design table schema

  • OWNER privilege on the design table

Add a single design query to the design
  • Privilege to execute the design query
Add a query file to the design
  • Read privilege on the storage location that contains the query file

  • Privilege to execute all the queries in the file

Add queries from the result of a user query to the design
  • Privilege to execute the user query

  • Privilege to execute each design query retrieved from the results of the user query

Create design and deployment scripts
  • WRITE privilege on the storage location of the design script

  • WRITE privilege on the storage location of the deployment script

3.4.10.4 - Resource pool for Database Designer users

When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs.

When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs. This allows multiple users to run Database Designer concurrently without interfering with each other or using up all cluster resources.
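
A minimal sketch, assuming a hypothetical pool named design_pool and a user named dbd_user:

=> CREATE RESOURCE POOL design_pool MEMORYSIZE '4G' MAXMEMORYSIZE '8G';
=> GRANT USAGE ON RESOURCE POOL design_pool TO dbd_user;
=> ALTER USER dbd_user RESOURCE POOL design_pool;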

3.4.11 - Creating custom designs

Vertica strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space.

Vertica strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space. If any queries run less efficiently than you expect, consider using the Database Designer incremental design process to optimize the database design for the query.

If the projections created by Database Designer still do not meet your needs, you can write custom projections, from scratch or based on projection designs created by Database Designer.

If you are unfamiliar with writing custom projections, start by modifying an existing design generated by Database Designer.

3.4.11.1 - Custom design process

To create a custom design or customize an existing one:.

To create a custom design or customize an existing one:

  1. Plan the new design or modifications to an existing one. See Planning your design.

  2. Create or modify projections. See Design fundamentals and CREATE PROJECTION for more detail.

  3. Deploy projections to a test environment. See Writing and deploying custom projections.

  4. Test and modify projections as needed.

  5. After you finalize the design, deploy projections to the production environment.

3.4.11.2 - Planning your design

The syntax for creating a design is easy for anyone who is familiar with SQL.

The syntax for creating a design is easy for anyone who is familiar with SQL. As with any successful project, however, a successful design requires some initial planning. Before you create your first design:

  • Become familiar with standard design requirements and plan your design to include them. See Design requirements.

  • Determine how many projections you need to include in the design. See Determining the number of projections to use.

  • Determine the type of compression and encoding to use for columns. See Architecture.

  • Determine whether or not you want the database to be K-safe. Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and 2. See Designing for K-safety.

3.4.11.2.1 - Design requirements

A physical schema design is a script that contains CREATE PROJECTION statements.

A physical schema design is a script that contains CREATE PROJECTION statements. These statements determine which columns are included in projections and how they are optimized.

If you use Database Designer as a starting point, it automatically creates designs that meet all fundamental design requirements. If you intend to create or modify designs manually, be aware that all designs must meet the following requirements:

  • Every design must create at least one superprojection for every table in the database that is used by the client application. These projections provide complete coverage that enables users to perform ad-hoc queries as needed. They can contain joins and they are usually configured to maximize performance through sort order, compression, and encoding.

  • Query-specific projections are optional. If you are satisfied with the performance provided through superprojections, you do not need to create additional projections. However, you can maximize performance by tuning for specific query work loads.

  • Vertica recommends that all production databases have a minimum K-safety of one (K=1) to support high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High availability with projections and Designing for K-safety.

  • If your cluster has more than 20 nodes but your tables are small, Vertica recommends that you do not create replicated projections. If you create replicated projections, the catalog becomes very large and performance may degrade. Instead, consider segmenting those projections.

3.4.11.2.2 - Determining the number of projections to use

In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding.

In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding. This is especially true if the sort orders for the projections have been used to maximize performance for one or more query predicates (WHERE clauses).

However, you might want to add additional query-specific projections to increase the performance of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The number of additional projections (and their buddies) that you create should be determined by:

  • Your organization's needs

  • The amount of disk space you have available on each node in the cluster

  • The amount of time available for loading data into the database

As the number of projections that are tuned for specific queries increases, the performance of these queries improves. However, the amount of disk space used and the amount of time required to load data increases as well. Therefore, you should create and test designs to determine the optimum number of projections for your database configuration. On average, organizations that choose to implement query-specific projections achieve optimal performance through the addition of a few query-specific projections.

3.4.11.2.3 - Designing for K-safety

Vertica recommends that all production databases have a minimum K-safety of one (K=1).

Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 and 2. Non-production databases do not have to be K-safe and can be set to 0.

A K-safe database must have at least three nodes, as shown in the following table:

K-safety level Number of required nodes
1 3+
2 5+

You can set K-safety to 1 or 2 only when the physical schema design meets certain redundancy requirements. See Requirements for a K-safe physical schema design.

Using Database Designer

To create designs that are K-safe, Vertica recommends that you use the Database Designer. When creating projections with Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with a K-safety level. Database Designer creates a script that uses the MARK_DESIGN_KSAFE function to set the K-safety of the physical schema to 1. For example:

=> \i VMart_Schema_design_opt_1.sql
CREATE PROJECTION
CREATE PROJECTION
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)

By default, Vertica creates K-safe superprojections when database K-safety is greater than 0.

Monitoring K-safety

Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by querying the SYSTEM table for settings in columns DESIGNED_FAULT_TOLERANCE and CURRENT_FAULT_TOLERANCE.
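
For example, the following query returns both settings:

=> SELECT designed_fault_tolerance, current_fault_tolerance FROM v_monitor.system;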

Loss of K-safety

When K nodes in your cluster fail, your database continues to run, although performance is affected. Further node failures could potentially cause the database to shut down if the failed node's data is not available from another functioning node in the cluster.

See also

K-safety in an Enterprise Mode database

3.4.11.2.3.1 - Requirements for a K-safe physical schema design

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes.

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0.) You can modify a design created for a three-node (or greater) cluster, and the K-safe requirements are already set.

If you create custom projections, your physical schema design must meet the following requirements to be able to successfully recover the database in the event of a failure:

You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets requirements for K-safety.
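
For example, the following call succeeds only if the physical schema design meets the requirements for K-safety 1; otherwise, it returns an error:

=> SELECT MARK_DESIGN_KSAFE(1);
  mark_design_ksafe
----------------------
 Marked design 1-safe
(1 row)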

3.4.11.2.3.2 - Requirements for a physical schema design with no K-safety

If you use Database Designer to generate a comprehensive design that you can modify and you do not want the design to be K-safe, set the K-safety level to 0 (zero).

If you use Database Designer to generate a comprehensive design that you can modify and you do not want the design to be K-safe, set the K-safety level to 0 (zero).

If you want to start from scratch, do the following to establish minimal projection requirements for a functioning database with no K-safety (K=0):

  1. Define at least one superprojection for each table in the logical schema.

  2. Replicate (define an exact copy of) each dimension table superprojection on each node.

3.4.11.2.3.3 - Designing segmented projections for K-safety

Projections must comply with database K-safety requirements.

Projections must comply with database K-safety requirements. In general, you must create buddy projections for each segmented projection, where the number of buddy projections is K+1. Thus, if system K-safety is set to 1, each projection segment must be duplicated by one buddy; if K-safety is set to 2, each segment must be duplicated by two buddies.

Automatic creation of buddy projections

You can use CREATE PROJECTION so it automatically creates the number of buddy projections required to satisfy K-safety, by including SEGMENTED BY ... ALL NODES. If CREATE PROJECTION specifies K-safety (KSAFE=n), Vertica uses that setting; if the statement omits KSAFE, Vertica uses system K-safety.

In the following example, CREATE PROJECTION creates segmented projection ttt_p1 for table ttt. Because system K-safety is set to 1, Vertica requires a buddy projection for each segmented projection. The CREATE PROJECTION statement omits KSAFE, so Vertica uses system K-safety and creates two buddy projections: ttt_p1_b0 and ttt_p1_b1:

=> SELECT mark_design_ksafe(1);

  mark_design_ksafe
----------------------
 Marked design 1-safe
(1 row)

=> CREATE TABLE ttt (a int, b int);
WARNING 6978:  Table "ttt" will include privileges from schema "public"
CREATE TABLE

=> CREATE PROJECTION ttt_p1 as SELECT * FROM ttt SEGMENTED BY HASH(a) ALL NODES;
CREATE PROJECTION

=> SELECT projection_name from projections WHERE anchor_table_name='ttt';
 projection_name
-----------------
 ttt_p1_b0
 ttt_p1_b1
(2 rows)

Vertica automatically names buddy projections by appending the suffix _bn to the projection base name—for example ttt_p1_b0.

Manual creation of buddy projections

If you create a projection on a single node, and system K-safety is greater than 0, you must manually create the number of buddies required for K-safety. For example, you can create projection xxx_p1 for table xxx on a single node, as follows:

=> CREATE TABLE xxx (a int, b int);
WARNING 6978:  Table "xxx" will include privileges from schema "public"
CREATE TABLE

=> CREATE PROJECTION xxx_p1 AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0001;
CREATE PROJECTION

Because K-safety is set to 1, a single instance of this projection is not K-safe. Attempts to insert data into its anchor table xxx return with an error like this:

=> INSERT INTO xxx VALUES (1, 2);
ERROR 3586:  Insufficient projections to answer query
DETAIL:  No projections that satisfy K-safety found for table xxx
HINT:  Define buddy projections for table xxx

In order to comply with K-safety, you must create a buddy projection for projection xxx_p1. For example:

=> CREATE PROJECTION xxx_p1_buddy AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0002;
CREATE PROJECTION

Table xxx now complies with K-safety and accepts DML statements such as INSERT:

VMart=> INSERT INTO xxx VALUES (1, 2);
 OUTPUT
--------
      1
(1 row)

See also

For general information about segmented projections and buddies, see Segmented projections. For information about designing for K-safety, see Designing for K-safety and Designing for segmentation.

3.4.11.2.3.4 - Designing unsegmented projections for K-Safety

In many cases, dimension tables are relatively small, so you do not need to segment them.

In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so projections for its dimension tables are replicated without segmentation on all cluster nodes. You create these projections with a CREATE PROJECTION statement that includes the keywords UNSEGMENTED ALL NODES. These keywords specify to create identical instances of the projection on all cluster nodes.

The following example shows how to create an unsegmented projection for the table store.store_dimension:


=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
             AS SELECT store_key, store_name, store_city, store_state
             FROM store.store_dimension
             UNSEGMENTED ALL NODES;
CREATE PROJECTION

Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj. The keyword ALL NODES specifies to replicate the projection on all nodes:


=> \dj store.store_dimension_proj
                         List of projections
 Schema |         Name         |  Owner  |       Node       | Comment
--------+----------------------+---------+------------------+---------
 store  | store_dimension_proj | dbadmin | v_vmart_node0001 |
 store  | store_dimension_proj | dbadmin | v_vmart_node0002 |
 store  | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)

For more information about projection name conventions, see Projection naming.

3.4.11.2.4 - Designing for segmentation

You segment projections using hash segmentation.

You segment projections using hash segmentation. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria could be an excellent choice for hash segmentation.

When segmenting projections, determine which columns to use to segment the projection. Choose one or more columns that have a large number of unique data values and acceptable skew in their data distribution. Primary key columns are an excellent choice for hash segmentation. The columns must be unique across all the tables being used in a query.
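
For example, the following hypothetical projection segments the rows of an orders table by hashing its primary key column across all nodes:

=> CREATE PROJECTION public.orders_p1
   AS SELECT * FROM public.orders
   SEGMENTED BY HASH(order_key) ALL NODES KSAFE 1;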

3.4.11.3 - Design fundamentals

Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point.

Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point. This ensures that you have projections that meet basic requirements.

3.4.11.3.1 - Writing and deploying custom projections

Before you write custom projections, review the topics in Planning Your Design carefully.

Before you write custom projections, review the topics in Planning your design carefully. Failure to follow these considerations can result in non-functional projections.

To manually modify or create a projection:

  1. Write a script with CREATE PROJECTION statements to create the desired projections.

  2. Run the script in vsql with the meta-command \i.

  3. For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.

  4. If you create projections for tables that already contain data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.

  5. Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.

  6. Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.

  7. Call ANALYZE_STATISTICS on all database projections:

    => SELECT ANALYZE_STATISTICS ('');
    

    This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.

3.4.11.3.2 - Designing superprojections

Superprojections have the following requirements:.

Superprojections have the following requirements:

  • They must contain every column within the table.

  • For a K-safe design, superprojections must either be replicated on all nodes within the database cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very large tables and medium large tables). See Projections and High availability with projections for an overview of projections and how they are stored. See Designing for K-safety for design specifics.

To provide maximum usability, superprojections need to minimize storage requirements while maximizing query performance. To achieve this, the sort order for columns in superprojections is based on storage requirements and commonly used queries.

3.4.11.3.3 - Sort order benefits

Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.

Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.

Minimize storage requirements

Minimizing storage saves on physical resources and increases performance by reducing disk I/O. You can minimize projection storage by prioritizing low-cardinality columns in its sort order. This reduces the number of rows Vertica stores and accesses to retrieve query results.

After identifying projection sort columns, analyze their data and choose the most effective encoding method. The Vertica optimizer gives preference to columns with run-length encoding (RLE), so be sure to use it whenever appropriate. Run-length encoding replaces sequences (runs) of identical values with a single pair that contains the value and number of occurrences. Therefore, it is especially appropriate to use it for low-cardinality columns whose run length is large.
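
For example, a sketch of a projection on the VMart customer_dimension table that leads its sort order with the low-cardinality customer_state column and encodes that column with RLE (the projection name and column subset are illustrative):

=> CREATE PROJECTION public.customer_by_state
   (customer_key ENCODING AUTO,
    customer_state ENCODING RLE,
    customer_name ENCODING AUTO)
   AS SELECT customer_key, customer_state, customer_name
   FROM public.customer_dimension
   ORDER BY customer_state, customer_key
   UNSEGMENTED ALL NODES;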

Maximize query performance

You can facilitate query performance through column sort order as follows:

  • Where possible, sort order should prioritize columns with the lowest cardinality.

  • Do not sort projections on columns of type LONG VARBINARY and LONG VARCHAR.

See also

Choosing sort order: best practices

3.4.11.3.4 - Choosing sort order: best practices

When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

Combine RLE and sort order

When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to minimize storage requirements and maximize query performance.

Suppose you have a students table containing the following values and encoding types:

Column # of Distinct Values Encoded With
gender 2 (M or F) RLE
pass_fail 2 (P or F) RLE
class 4 (freshman, sophomore, junior, or senior) RLE
name 10000 (too many to list) Auto

You might have queries similar to this one:

SELECT name FROM students WHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';

The fastest way to access the data is to work through the low-cardinality columns with the smallest number of distinct values before the high-cardinality columns. The following sort order minimizes storage and maximizes query performance for queries that have equality restrictions on gender, class, pass_fail, and name. Specify the ORDER BY clause of the projection as follows:

ORDER BY students.gender, students.pass_fail, students.class, students.name

In this example, the gender column is represented by two RLE entries, the pass_fail column is represented by four entries, and the class column is represented by 16 entries, regardless of the cardinality of the students table. Vertica efficiently finds the set of rows that satisfy all the predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the previous example, put those columns early in the projection sort order, in increasing order of distinct cardinality (that is, in increasing order of the number of distinct values in each column).

If you sort this table with students.class first, you improve the performance of queries that restrict only on the students.class column, and you improve the compression of the students.class column (which contains the largest number of distinct values), but the other columns do not compress as well. Determining which projection is better depends on the specific queries in your workload, and their relative importance.

Storage savings with compression decrease as the cardinality of the column increases; however, storage savings with compression increase as the number of bytes required to store values in that column increases.
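
Putting this together, a hypothetical projection for the students table (the schema, projection name, and segmentation choice are assumptions) might look like this:

=> CREATE PROJECTION public.students_p1
   (gender    ENCODING RLE,
    pass_fail ENCODING RLE,
    class     ENCODING RLE,
    name      ENCODING AUTO)
   AS SELECT gender, pass_fail, class, name
   FROM public.students
   ORDER BY gender, pass_fail, class, name
   SEGMENTED BY HASH(name) ALL NODES;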

Maximize the advantages of RLE

To maximize the advantages of RLE encoding, use it only when the average run length of a column is greater than 10 when sorted. For example, suppose you have a table with the following columns, sorted in order of cardinality from low to high:

address.country, address.region, address.state, address.city, address.zipcode

The zipcode column might not have 10 sorted entries in a row with the same zip code, so there is probably no advantage to run-length encoding that column, and it could make compression worse. But each country value is likely to appear in runs of more than 10 sorted rows, so applying RLE to the country column can improve performance.

Put lower cardinality column first for functional dependencies

In general, put columns that you use for local predicates (as in the previous example) earlier in the sort order to make predicate evaluation more efficient. In addition, if a lower cardinality column is uniquely determined by a higher cardinality column (like city_id uniquely determining a state_id), it is always better to put the lower cardinality, functionally determined column earlier in the sort order than the higher cardinality column.

For example, in the following sort order, the Area_Code column is sorted before the Number column in the customer_info table:

ORDER BY customer_info.Area_Code, customer_info.Number, customer_info.Address

Because the Area_Code column appears first in the sort order, the following query scans only the values in the Number column that fall within area code 978:

=> SELECT Address FROM customer_info WHERE Area_Code='978' AND Number='9780123457';

Sort for merge joins

When processing a join, the Vertica optimizer chooses from two algorithms:

  • Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses a merge join, which is faster and uses less memory.

  • Hash join—Using the hash join algorithm, Vertica uses the smaller (inner) joined table to build an in-memory hash table on the join column. A hash join has no sort requirement, but it consumes more memory because Vertica builds a hash table with the values in the inner table. The optimizer chooses a hash join when projections are not sorted on the join columns.

If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join perform faster. Vertica uses the term sort-merge join to refer to the case when at least one of the inputs must be sorted prior to the merge join. Vertica sorts the inner input side but only if the outer input side is already sorted on the join columns.

To give the Vertica query optimizer the option to use an efficient merge join for a particular join, create projections on both sides of the join that put the join column first in their respective projections. This is most important when both tables are so large that neither fits into memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously, the benefits of merge join over hash join are small enough that it probably is not worth creating a projection for any one join column.
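
As an illustration only, the following sketch creates projections for two hypothetical tables, orders and customers, each sorted on the shared join column customer_id so that the optimizer can consider a merge join between them:

=> CREATE PROJECTION orders_p AS
     SELECT customer_id, order_date, amount
     FROM orders
     ORDER BY customer_id
     SEGMENTED BY HASH(customer_id) ALL NODES;

=> CREATE PROJECTION customers_p AS
     SELECT customer_id, customer_name
     FROM customers
     ORDER BY customer_id
     SEGMENTED BY HASH(customer_id) ALL NODES;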

Sort on columns in important queries

If you have an important query, one that you run on a regular basis, you can save time by putting the columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort order.

If that query uses a high-cardinality column such as Social Security number, you may sacrifice storage by placing this column early in the sort order of a projection, but your most important query will be optimized.

Sort columns of equal cardinality by size

If you have two columns of equal cardinality, put the column that is larger first in the sort order. For example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.

Sort foreign key columns first, from low to high distinct cardinality

Suppose you have a fact table where the first four columns in the sort order make up a foreign key to another table. For best compression, choose a sort order for the fact table such that the foreign keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the design of projections for fact tables, such as partitioning by a time dimension, if any.

In the following example, the table inventory stores inventory data, and product_key and warehouse_key are foreign keys to the product_dimension and warehouse_dimension tables:

=> CREATE TABLE inventory (
 date_key INTEGER NOT NULL,
 product_key INTEGER NOT NULL,
 warehouse_key INTEGER NOT NULL,
 ...
);
=> ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
   REFERENCES warehouse_dimension(warehouse_key);
ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
   REFERENCES product_dimension(product_key);

The inventory table should be sorted by warehouse_key and then product_key, because the cardinality of the warehouse_key column is probably lower than the cardinality of the product_key column.
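
A projection sketch that reflects this ordering might look like the following; the projection name is an assumption, and the column list is abbreviated to the columns shown above:

=> CREATE PROJECTION inventory_p AS
     SELECT date_key, product_key, warehouse_key
     FROM inventory
     ORDER BY warehouse_key, product_key, date_key
     SEGMENTED BY HASH(warehouse_key, product_key) ALL NODES;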

3.4.11.3.5 - Prioritizing column access speed

If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank.

If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank. For more information, see Setting storage performance.

How columns are ranked

Vertica stores columns included in the projection sort order on the fastest available storage locations. Columns not included in the projection sort order are stored on slower disks. Columns for each projection are ranked as follows:

  • Columns in the sort order are given the highest priority (numbers > 1000).

  • The last column in the sort order is given the rank number 1001.

  • The next-to-last column in the sort order is given the rank number 1002, and so on until the first column in the sort order is given 1000 + # of sort columns.

  • The remaining columns are given numbers from 1000–1, starting with 1000 and decrementing by one per column.

Vertica then stores columns on disk from the highest ranking to the lowest ranking. It places highest-ranking columns on the fastest disks and the lowest-ranking columns on the slowest disks.
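
For example, in a projection with three sort columns and five additional columns, the sort columns are ranked 1003, 1002, and 1001 (first to last), and the five remaining columns are ranked 1000, 999, 998, 997, and 996.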

Overriding default column ranking

You can modify which columns are stored on fast disks by manually overriding the default ranks for these columns. To accomplish this, set the ACCESSRANK keyword in the column list. Make sure to use an integer that is not already being used for another column. For example, if you want to give a column the fastest access rank, use a number that is significantly higher than 1000 + the number of sort columns. This allows you to enter more columns over time without bumping into the access rank you set.

The following example sets column store_key's access rank to 1500:

CREATE PROJECTION retail_sales_fact_p (
     store_key ENCODING RLE ACCESSRANK 1500,
     pos_transaction_number ENCODING RLE,
     sales_dollar_amount,
     cost_dollar_amount )
AS SELECT
     store_key,
     pos_transaction_number,
     sales_dollar_amount,
     cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;

4 - Database users and privileges

Database users should only have access to the database resources that they need to perform their tasks.

Database users should only have access to the database resources that they need to perform their tasks. For example, most users should be able to read data but not modify or insert new data. A smaller number of users typically need permission to perform a wider range of database tasks—for example, create and modify schemas, tables, and views. A very small number of users can perform administrative tasks, such as rebalance nodes on a cluster, or start or stop a database. You can also let certain users extend their own privileges to other users.

Client authentication controls what database objects users can access and change in the database. You specify access for specific users or roles with GRANT statements.

4.1 - Database users

Every Vertica database has one or more users.

Every Vertica database has one or more users. When users connect to a database, they must log on with valid credentials (username and password) that a superuser defined in the database.

Database users own the objects they create in a database, such as tables, procedures, and storage locations.

4.1.1 - Types of database users

In a Vertica database, there are three types of users:.

In a Vertica database, there are three types of users:

  • Database administrator (DBADMIN)

  • Object owner

  • Everyone else (PUBLIC)

4.1.1.1 - Database administration user

On installation, a new Vertica database automatically contains a user with superuser privileges.

On installation, a new Vertica database automatically contains a user with superuser privileges. Unless explicitly named during installation, this user is identified as dbadmin. This user cannot be dropped and has the following irrevocable roles:

  • DBADMIN

  • PSEUDOSUPERUSER

  • DBDUSER

With these roles, the dbadmin user can perform all database operations. This user can also create other users with administrative privileges.

Creating additional database administrators

As the dbadmin user, you can create other users with the same privileges:

  1. Create a user:

    => CREATE USER DataBaseAdmin2;
    CREATE USER
    
  2. Grant the appropriate roles to new user DataBaseAdmin2:

    => GRANT dbduser, dbadmin, pseudosuperuser to DataBaseAdmin2;
    GRANT ROLE
    

    User DataBaseAdmin2 now has the same privileges granted to the original dbadmin user.

  3. As DataBaseAdmin2, enable your assigned roles with SET ROLE:

    => \c - DataBaseAdmin2;
    You are now connected to database "VMart" as user "DataBaseAdmin2".
    => SET ROLE dbadmin, dbduser, pseudosuperuser;
    SET ROLE
    
  4. Confirm the roles are enabled:

    => SHOW ENABLED ROLES;
    name          | setting
    -------------------------------------------------
    enabled roles | dbduser, dbadmin, pseudosuperuser
    

4.1.1.2 - Object owner

An object owner is the user who creates a particular database object and can perform any operation on that object.

An object owner is the user who creates a particular database object and can perform any operation on that object. By default, only an owner (or a superuser) can act on a database object. In order to allow other users to use an object, the owner or superuser must grant privileges to those users using one of the GRANT statements.

See Database privileges for more information.

4.1.1.3 - PUBLIC user

All non-DBA (superuser) or object owners are PUBLIC users.

All non-DBA (superuser) or object owners are PUBLIC users.

Newly-created users do not have access to schema PUBLIC by default. Make sure to GRANT USAGE ON SCHEMA PUBLIC to all users you create.

See also

4.1.2 - Creating a database user

To create a database user:.

To create a database user:

  1. From vsql, connect to the database as a superuser.

  2. Issue the CREATE USER statement with optional parameters.

  3. Run a series of GRANT statements to grant the new user privileges.

To create a user on MC, see User administration in MC in Management Console.

New user privileges

By default, new database users have the right to create temporary tables in the database.

New users do not have access to schema PUBLIC by default. Be sure to call GRANT USAGE ON SCHEMA PUBLIC to all users you create.

Modifying users

You can change information about a user, such as their password, by using the ALTER USER statement. If you want to configure a user to not have any password authentication, you can set the empty password '' in CREATE USER or ALTER USER statements, or omit the IDENTIFIED BY parameter in CREATE USER.
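
For example, assuming an existing user named Sam (a hypothetical user), the following statement sets an empty password so that Sam no longer authenticates with a password:

=> ALTER USER Sam IDENTIFIED BY '';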

Example

The following series of commands adds user Fred to a database with the password 'password'. The second command grants USAGE privileges to Fred on the public schema:

=> CREATE USER Fred IDENTIFIED BY 'password';
=> GRANT USAGE ON SCHEMA PUBLIC to Fred;

User names created with double-quotes are case sensitive. For example:

=> CREATE USER "FrEd1";

In the above example, the logon name must be an exact match. If the user name was created without double-quotes (for example, FRED1), then the user can log on as FRED1, FrEd1, fred1, and so on.

See also

4.1.3 - User-level configuration parameters

ALTER USER lets you set user-level configuration parameters on individual users.

ALTER USER lets you set user-level configuration parameters on individual users. These settings override database- or session-level settings on the same parameters. For example, the following ALTER USER statement sets DepotOperationsForQuery for users Yvonne and Ahmed to FETCHES, thus overriding the default setting of ALL:

=> SELECT user_name, parameter_name, current_value, default_value FROM user_configuration_parameters
   WHERE user_name IN('Ahmed', 'Yvonne') AND parameter_name = 'DepotOperationsForQuery';
 user_name |     parameter_name      | current_value | default_value
-----------+-------------------------+---------------+---------------
 Ahmed     | DepotOperationsForQuery | ALL           | ALL
 Yvonne    | DepotOperationsForQuery | ALL           | ALL
(2 rows)

=> ALTER USER Ahmed SET DepotOperationsForQuery='FETCHES';
ALTER USER
=> ALTER USER Yvonne SET DepotOperationsForQuery='FETCHES';
ALTER USER

Identifying user-level parameters

To identify user-level configuration parameters, query the allowed_levels column of system table CONFIGURATION_PARAMETERS. For example, the following query identifies user-level parameters that affect depot usage:

=> SELECT parameter_name, allowed_levels, default_value, current_level, current_value
    FROM configuration_parameters WHERE allowed_levels ilike '%USER%' AND parameter_name ilike '%depot%';
     parameter_name      |     allowed_levels      | default_value | current_level | current_value
-------------------------+-------------------------+---------------+---------------+---------------
 UseDepotForReads        | SESSION, USER, DATABASE | 1             | DEFAULT       | 1
 DepotOperationsForQuery | SESSION, USER, DATABASE | ALL           | DEFAULT       | ALL
 UseDepotForWrites       | SESSION, USER, DATABASE | 1             | DEFAULT       | 1
(3 rows)

Viewing user parameter settings

You can obtain user settings in two ways:

  • Query system table USER_CONFIGURATION_PARAMETERS:

    => SELECT * FROM user_configuration_parameters;
     user_name |      parameter_name       | current_value | default_value
    -----------+---------------------------+---------------+---------------
     Ahmed     | DepotOperationsForQuery   | FETCHES       | ALL
     Yvonne    | DepotOperationsForQuery   | FETCHES       | ALL
     Yvonne    | LoadSourceStatisticsLimit | 512           | 256
    (3 rows)
    
  • Use SHOW USER:

    => SHOW USER Yvonne PARAMETER ALL;
      user  |         parameter         | setting
    --------+---------------------------+---------
     Yvonne | DepotOperationsForQuery   | FETCHES
     Yvonne | LoadSourceStatisticsLimit | 512
    (2 rows)
    
    => SHOW USER ALL PARAMETER ALL;
      user  |         parameter         | setting
    --------+---------------------------+---------
     Yvonne | DepotOperationsForQuery   | FETCHES
     Yvonne | LoadSourceStatisticsLimit | 512
     Ahmed  | DepotOperationsForQuery   | FETCHES
    (3 rows)
    

4.1.4 - Locking user accounts

As a superuser, you can manually lock and unlock a database user account with ALTER USER...ACCOUNT LOCK and ALTER USER...ACCOUNT UNLOCK, respectively.

As a superuser, you can manually lock and unlock a database user account with ALTER USER...ACCOUNT LOCK and ALTER USER...ACCOUNT UNLOCK, respectively. For example, the following command prevents user Fred from logging in to the database:

=> ALTER USER Fred ACCOUNT LOCK;
=> \c - Fred
FATAL 4974: The user account "Fred" is locked
HINT: Please contact the database administrator

The following example unlocks access to Fred's user account:

=> ALTER USER Fred ACCOUNT UNLOCK;
=> \c - Fred
You are now connected as user "Fred".

Locking new accounts

CREATE USER can create a new account in a locked state. Like any locked account, it can be unlocked with ALTER USER...ACCOUNT UNLOCK.

=> CREATE USER Bob ACCOUNT LOCK;
CREATE USER

Locking accounts for failed login attempts

A user's profile can specify that the account be locked after a certain number of failed login attempts.
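
As a sketch only (the profile name and the limit of five attempts are assumptions), the following statements create such a profile and assign it to user Fred:

=> CREATE PROFILE secure_profile LIMIT FAILED_LOGIN_ATTEMPTS 5;
=> ALTER USER Fred PROFILE secure_profile;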

4.1.5 - Setting and changing user passwords

As a superuser, you can set any user's password when you create that user with CREATE USER, or later with ALTER USER.

As a superuser, you can set any user's password when you create that user with CREATE USER, or later with ALTER USER. Non-superusers can also change their own passwords with ALTER USER. One exception applies: users who are added to the Vertica database with the LDAPLink service cannot change their passwords with ALTER USER.

You can also give a user a pre-hashed password if you provide its associated salt. The salt must be a hex string. This method bypasses password complexity requirements.

To view password hashes and salts of existing users, see the PASSWORDS system table.

Changing a user's password has no effect on their current session.

Setting user passwords in VSQL

In this example, the user 'Bob' is created with the password 'mypassword':

=> CREATE USER Bob IDENTIFIED BY 'mypassword';
CREATE USER

The password is then changed to 'Orca':

=> ALTER USER Bob IDENTIFIED BY 'Orca' REPLACE 'mypassword';
ALTER USER

In this example, the user 'Alice' is created with a pre-hashed password and salt.

=> CREATE USER Alice IDENTIFIED BY
'sha512e0299de83ecfaa0b6c9cbb1feabfbe0b3c82a1495875cd9ec1c4b09016f09b42c1'
SALT '465a4aec38a85d6ecea5a0ac8f2d36d8';

Setting user passwords in Management Console

On Management Console, users with ADMIN or IT privileges can reset a user's non-LDAP password:

  1. Sign in to Management Console and navigate to MC Settings > User management.

  2. Click to select the user to modify and click Edit.

  3. Click Edit password and enter the new password twice.

  4. Click OK and then Save.

4.2 - Database roles

A role is a collection of privileges that can be granted to one or more users or other roles.

A role is a collection of privileges that can be granted to one or more users or other roles. Roles help you grant and manage sets of privileges for various categories of users, rather than grant those privileges to each user individually.

For example, several users might require administrative privileges. You can grant these privileges to them as follows:

  1. Create an administrator role with CREATE ROLE:

    CREATE ROLE administrator;
    
  2. Grant the role to the appropriate users.

  3. Grant the appropriate privileges to this role with one or more GRANT statements. You can later add and remove privileges as needed. Changes in role privileges are automatically propagated to the users who have that role.

After users are assigned roles, they can either enable those roles themselves, or you can automatically enable their roles for them.
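
The following sketch ties these steps together; the user name alice and the table applog are assumptions for this example:

=> GRANT administrator TO alice;
=> GRANT SELECT ON TABLE applog TO administrator;
=> \c - alice
=> SET ROLE administrator;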

4.2.1 - Predefined database roles

Vertica has the following predefined roles:.

Vertica has the following predefined roles:

  • DBADMIN

  • PSEUDOSUPERUSER

  • DBDUSER

  • SYSMONITOR

  • UDXDEVELOPER

  • MLSUPERVISOR

  • PUBLIC

Automatic role grants

On installation, Vertica automatically grants and enables predefined roles as follows:

  • The DBADMIN, PSEUDOSUPERUSER, and DBDUSER roles are irrevocably granted to the dbadmin user. These roles are always enabled for dbadmin, and can never be dropped.

  • PUBLIC is granted to dbadmin, and to all other users as they are created. This role is always enabled and cannot be dropped or revoked.

Granting predefined roles

After installation, the dbadmin user and users with the PSEUDOSUPERUSER role can grant one or more predefined roles to any user or non-predefined role. For example, the following set of statements creates the userdba role and grants it the predefined role DBADMIN:

=> CREATE ROLE userdba;
CREATE ROLE
=> GRANT DBADMIN TO userdba WITH ADMIN OPTION;
GRANT ROLE

Users and roles that are granted a predefined role can extend that role to other users, if the original GRANT (Role) statement includes WITH ADMIN OPTION. One exception applies: if you grant a user the PSEUDOSUPERUSER role and omit WITH ADMIN OPTION, the grantee can grant any role, including all predefined roles, to other users.

For example, the userdba role was previously granted the DBADMIN role. Because the GRANT statement includes WITH ADMIN OPTION, users who are assigned the userdba role can grant the DBADMIN role to other users:

=> GRANT userdba TO fred;
GRANT ROLE
=> \c - fred
You are now connected as user "fred".
=> SET ROLE userdba;
SET
=> GRANT dbadmin TO alice;
GRANT ROLE

Modifying predefined roles

Excluding SYSMONITOR, you can grant predefined roles privileges on individual database objects, such as tables or schemas. For example:

=> CREATE SCHEMA s1;
CREATE SCHEMA
=> GRANT ALL ON SCHEMA s1 to PUBLIC;
GRANT PRIVILEGE

You can grant PUBLIC any role, including predefined roles. For example:


=> CREATE ROLE r1;
CREATE ROLE
=> GRANT r1 TO PUBLIC;
GRANT ROLE

You cannot modify any other predefined role by granting another role to it. Attempts to do so return a rollback error:

=> CREATE ROLE r2;
CREATE ROLE
=> GRANT r2 TO PSEUDOSUPERUSER;
ROLLBACK 2347:  Cannot alter predefined role "pseudosuperuser"

4.2.1.1 - DBADMIN

The DBADMIN role is a predefined role that is assigned to the dbadmin user on database installation.

The DBADMIN role is a predefined role that is assigned to the dbadmin user on database installation. Thereafter, the dbadmin user and users with the PSEUDOSUPERUSER role can grant any role to any user or non-predefined role.

For example, superuser dbadmin creates user fred and grants fred the DBADMIN role:

=> CREATE USER fred;
CREATE USER
=> GRANT DBADMIN TO fred WITH ADMIN OPTION;
GRANT ROLE

After user fred enables his DBADMIN role, he can exercise his DBADMIN privileges by creating user alice. Because the GRANT statement includes WITH ADMIN OPTION, fred can also grant the DBADMIN role to user alice:


=> \c - fred
You are now connected as user "fred".
=> SET ROLE dbadmin;
SET
=> CREATE USER alice;
CREATE USER
=> GRANT DBADMIN TO alice;
GRANT ROLE

DBADMIN privileges

The following privileges are supported for the DBADMIN role:

  • Create users and roles, and grant them roles and privileges

  • Create and drop schemas

  • View all system tables

  • View and terminate user sessions

  • Access all data created by any user

4.2.1.2 - PSEUDOSUPERUSER

The PSEUDOSUPERUSER role is a predefined role that is automatically assigned to the dbadmin user on database installation.

The PSEUDOSUPERUSER role is a predefined role that is automatically assigned to the dbadmin user on database installation. The dbadmin can grant this role to any user or non-predefined role. Thereafter, PSEUDOSUPERUSER users can grant any role, including predefined roles, to other users.

PSEUDOSUPERUSER privileges

Users with the PSEUDOSUPERUSER role are entitled to complete administrative privileges, which cannot be revoked. Role privileges include:

  • Bypass all GRANT/REVOKE authorization

  • Create schemas and tables

  • Create users and roles, and grant privileges to them

  • Modify user accounts—for example, set user passwords, and lock or unlock accounts

  • Create or drop a UDF library and function, or any external procedure

4.2.1.3 - DBDUSER

The DBDUSER role is a predefined role that is assigned to the dbadmin user on database installation.

The DBDUSER role is a predefined role that is assigned to the dbadmin user on database installation. The dbadmin and any PSEUDOSUPERUSER can grant this role to any user or non-predefined role. Users who have this role and enable it can call Database Designer functions from the command line.

Associating DBDUSER with resource pools

Be sure to associate a resource pool with the DBDUSER role, to facilitate resource management when you run Database Designer. Multiple users can run Database Designer concurrently without interfering with each other or exhausting all the cluster resources. Whether you run Database Designer programmatically or with Administration Tools, design execution is generally contained by the user's resource pool, but might spill over into system resource pools for less-intensive tasks.
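
One way to do this is sketched below, assuming a hypothetical pool design_pool and a user dbd_user1 who holds the DBDUSER role; the memory sizes are illustrative only:

=> CREATE RESOURCE POOL design_pool MEMORYSIZE '1G' MAXMEMORYSIZE '4G';
=> GRANT USAGE ON RESOURCE POOL design_pool TO dbd_user1;
=> ALTER USER dbd_user1 RESOURCE POOL design_pool;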

4.2.1.4 - SYSMONITOR

An organization's database administrator may have many responsibilities outside of maintaining Vertica as a DBADMIN user.

An organization's database administrator may have many responsibilities outside of maintaining Vertica as a DBADMIN user. In this case, as the DBADMIN you may want to delegate some Vertica administrative tasks to another Vertica user.

The DBADMIN can assign a delegate the SYSMONITOR role to grant access to system tables without granting full DBADMIN access.

The SYSMONITOR role provides the following privileges:

  • View all system tables that are marked as monitorable. You can see a list of all the monitorable tables by issuing the statement:

    => SELECT * FROM system_tables WHERE is_monitorable='t';
    
  • If WITH ADMIN OPTION was included when granting SYSMONITOR to the user or role, that user or role can then grant SYSMONITOR privileges to other users and roles.

Grant a SYSMONITOR role

To grant a user or role the SYSMONITOR role, you must be one of the following:

  • a DBADMIN user

  • a user granted the SYSMONITOR role with the ADMIN OPTION

Use the GRANT (Role) SQL statement to assign a user the SYSMONITOR role. The following example grants the SYSMONITOR role to user1 with the WITH ADMIN OPTION clause, which gives user1 administrative privileges on the role:

=> GRANT SYSMONITOR TO user1 WITH ADMIN OPTION;

This example shows how to revoke the ADMIN OPTION from the SYSMONITOR role for user1:

=> REVOKE ADMIN OPTION for SYSMONITOR FROM user1;

Use CASCADE to revoke ADMIN OPTION privileges for all users assigned the SYSMONITOR role:

=> REVOKE ADMIN OPTION for SYSMONITOR FROM PUBLIC CASCADE;

Example

This example shows how to:

  • Create a user

  • Create a role

  • Grant SYSMONITOR privileges to the new role

  • Grant the role to the user

=> CREATE USER user1;
=> CREATE ROLE monitor;
=> GRANT SYSMONITOR TO monitor;
=> GRANT monitor TO user1;

Assign SYSMONITOR privileges

This example uses the user and role created in the Grant SYSMONITOR Role example and shows how to:

  • Create a table called personal_data

  • Log in as user1

  • Enable the monitor role for user1 with SET ROLE. (The monitor role was granted SYSMONITOR privileges and granted to user1 in the previous example.)

  • Run a SELECT statement as user1

The results of the operations are based on the privileges already granted to user1.

=> CREATE TABLE personal_data (SSN varchar (256));
=> \c - user1
=> SET ROLE monitor;
=> SELECT COUNT(*) FROM TABLES;
 COUNT
-------
 1
(1 row)

Because you assigned the SYSMONITOR role, user1 can see the number of rows in the TABLES system table. In this simple example, there is only one table (personal_data) in the database, so SELECT COUNT returns one row. In a real database, the SYSMONITOR role can see all the tables in the database.

Check if a table is accessible by SYSMONITOR

To check if a system table can be accessed by a user assigned the SYSMONITOR role:

=> SELECT table_name, is_monitorable FROM system_tables WHERE table_name='table_name';

For example, the following statement shows that the CURRENT_SESSION system table is accessible by the SYSMONITOR:

=> SELECT table_name, is_monitorable FROM system_tables WHERE table_name='current_session';
   table_name    | is_monitorable
-----------------+----------------
 current_session | t
(1 row)

4.2.1.5 - UDXDEVELOPER

The UDXDEVELOPER role is a predefined role that enables users to create and replace user-defined libraries.

The UDXDEVELOPER role is a predefined role that enables users to create and replace user-defined libraries. The dbadmin can grant this role to any user or non-predefined role.

UDXDEVELOPER privileges

Users with the UDXDEVELOPER role can perform the following actions:

  • Create user-defined libraries

  • Replace user-defined libraries

To use the privileges of this role, you must explicitly enable it using SET ROLE.

Security considerations

A user with the UDXDEVELOPER role can create libraries and, therefore, can install any UDx function in the database. UDx functions run as the Linux user that owns the database, and therefore have access to resources that Vertica has access to.

A poorly-written function can degrade database performance. Give this role only to users you trust to use UDxs responsibly. You can limit the memory that a UDx can consume by running UDxs in fenced mode and by setting the FencedUDxMemoryLimitMB configuration parameter.
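
For example, the following statement caps fenced-mode UDx memory at the database level; the 1024 MB value is arbitrary and used here only for illustration:

=> ALTER DATABASE database_name SET FencedUDxMemoryLimitMB = 1024;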

4.2.1.6 - MLSUPERVISOR

The MLSUPERVISOR role is a predefined role to which all the ML-model management privileges of DBADMIN are delegated.

The MLSUPERVISOR role is a predefined role to which all the ML-model management privileges of DBADMIN are delegated. An MLSUPERVISOR can manage all models in the V_CATALOG.MODELS table on behalf of dbadmin.

In the following example, user alice uses her MLSUPERVISOR privileges to reassign ownership of the model my_model from user bob to user nina:


=> \c - alice
You are now connected as user "alice".

=> SELECT model_name, schema_name, owner_name FROM models;
 model_name  | schema_name | owner_name
-------------+-------------+------------
 my_model    | public      | bob
 mylinearreg | myschema2   | alice
 (2 rows)

=> SET ROLE MLSUPERVISOR;

=> ALTER MODEL my_model OWNER to nina;

=> SELECT model_name, schema_name, owner_name FROM models;
 model_name  | schema_name | owner_name
-------------+-------------+------------
 my_model    | public      | nina
 mylinearreg | myschema2   | alice
 (2 rows)

=> DROP MODEL my_model;

MLSUPERVISOR privileges

The following privileges are supported for the MLSUPERVISOR role:

  • ML-model management privileges of DBADMIN

  • Management (USAGE, ALTER, DROP) of all models in V_CATALOG.MODELS

To use the privileges of this role, you must explicitly enable it using SET ROLE.

See also

4.2.1.7 - PUBLIC

The PUBLIC role is a predefined role that is automatically assigned to all new users.

The PUBLIC role is a predefined role that is automatically assigned to all new users. It is always enabled and cannot be dropped or revoked. Use this role to grant all database users the same minimum set of privileges.

Like any role, the PUBLIC role can be granted privileges to individual objects and other roles. The following example grants the PUBLIC role INSERT and SELECT privileges on table publicdata. This enables all users to read data in that table and insert new data:

=> CREATE TABLE publicdata (a INT, b VARCHAR);
CREATE TABLE
=> GRANT INSERT, SELECT ON publicdata TO PUBLIC;
GRANT PRIVILEGE
=> CREATE PROJECTION publicdataproj AS (SELECT * FROM publicdata);
CREATE PROJECTION
=> \c - bob
You are now connected as user "bob".
=> INSERT INTO publicdata VALUES (10, 'Hello World');
OUTPUT
--------
      1
(1 row)

The following example grants PUBLIC the employee role, so all database users have employee privileges:

=> GRANT employee TO public;
GRANT ROLE

4.2.2 - Role hierarchy

By granting roles to other roles, you can build a hierarchy of roles, where roles lower in the hierarchy have a narrow range of privileges, while roles higher in the hierarchy are granted combinations of roles and their privileges.

By granting roles to other roles, you can build a hierarchy of roles, where roles lower in the hierarchy have a narrow range of privileges, while roles higher in the hierarchy are granted combinations of roles and their privileges. When you organize roles hierarchically, any privileges that you add to lower-level roles are automatically propagated to the roles above them.

Creating hierarchical roles

The following example creates two roles, assigns them privileges, then assigns both roles to another role.

  1. Create table applog:

    => CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(256));
    
  2. Create the logreader role and grant it read-only privileges on table applog:

    => CREATE ROLE logreader;
    CREATE ROLE
    => GRANT SELECT ON applog TO logreader;
    GRANT PRIVILEGE
    
  3. Create the logwriter role and grant it write privileges on table applog:

    => CREATE ROLE logwriter;
    CREATE ROLE
    => GRANT INSERT, UPDATE ON applog to logwriter;
    GRANT PRIVILEGE
    
  4. Create the logadmin role and grant it DELETE privilege on table applog:

    => CREATE ROLE logadmin;
    CREATE ROLE
    => GRANT DELETE ON applog to logadmin;
    GRANT PRIVILEGE
    
  5. Grant the logreader and logwriter roles to role logadmin:

    => GRANT logreader, logwriter TO logadmin;
    
  6. Create user bob and grant him the logadmin role:

    => CREATE USER bob;
    CREATE USER
    => GRANT logadmin TO bob;
    GRANT ROLE
    
  7. Modify user bob's account so his logadmin role is automatically enabled on login:

    
    => ALTER USER bob DEFAULT ROLE logadmin;
    ALTER USER
    => \c - bob
    You are now connected as user "bob".
    => SHOW ENABLED_ROLES;
         name      | setting
    ---------------+----------
     enabled roles | logadmin
    (1 row)
    

Enabling hierarchical roles

Only roles that are explicitly granted to a user can be enabled for that user. In the previous example, roles logreader and logwriter cannot be enabled directly for bob. They can only be enabled indirectly, by enabling logadmin.

Hierarchical role grants and WITH ADMIN OPTION

If one or more roles are granted to another role using WITH ADMIN OPTION, then users who are granted the 'higher' role inherit administrative access to the subordinate roles.

For example, you might modify the earlier grants of roles logreader and logwriter to logadmin as follows:

=> GRANT logreader, logwriter TO logadmin WITH ADMIN OPTION;
NOTICE 4617:  Role "logreader" was already granted to role "logadmin"
NOTICE 4617:  Role "logwriter" was already granted to role "logadmin"
GRANT ROLE

User bob, through his logadmin role, is now authorized to grant its two subordinate roles to other users—in this case, role logreader to user Alice:


=> \c - bob;
You are now connected as user "bob".
=> GRANT logreader TO Alice;
GRANT ROLE
=> \c - alice;
You are now connected as user "alice".
=> show available_roles;
      name       |  setting
-----------------+-----------
 available roles | logreader
(1 row)

4.2.3 - Creating and dropping roles

As a superuser with the DBADMIN or PSEUDOSUPERUSER role, you can create and drop roles with CREATE ROLE and DROP ROLE, respectively.

As a superuser with the DBADMIN or PSEUDOSUPERUSER role, you can create and drop roles with CREATE ROLE and DROP ROLE, respectively.

=> CREATE ROLE administrator;
CREATE ROLE

A new role has no privileges or roles granted to it. Only superusers can grant privileges and access to the role.

Dropping database roles with dependencies

If you try to drop a role that is granted to users or other roles, Vertica returns a rollback message:

=> DROP ROLE administrator;
NOTICE:  User Bob depends on Role administrator
ROLLBACK:  DROP ROLE failed due to dependencies
DETAIL:  Cannot drop Role administrator because other objects depend on it
HINT:  Use DROP ROLE ... CASCADE to remove granted roles from the dependent users/roles

To force the drop operation, qualify the DROP ROLE statement with CASCADE:

=> DROP ROLE administrator CASCADE;
DROP ROLE

4.2.4 - Granting privileges to roles

You can use GRANT statements to assign privileges to a role, just as you assign privileges to users.

You can use GRANT statements to assign privileges to a role, just as you assign privileges to users. See Database privileges for information about which privileges can be granted.

Granting a privilege to a role immediately affects active user sessions. When you grant a privilege to a role, it becomes immediately available to all users with that role enabled.

The following example creates two roles and assigns them different privileges on the same table.

  1. Create table applog:

    => CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(256));
    
  2. Create roles logreader and logwriter:

    => CREATE ROLE logreader;
    CREATE ROLE
    => CREATE ROLE logwriter;
    CREATE ROLE
    
  3. Grant read-only privileges on applog to logreader, and write privileges to logwriter:

    => GRANT SELECT ON applog TO logreader;
    GRANT PRIVILEGE
    => GRANT INSERT ON applog TO logwriter;
    GRANT PRIVILEGE
    

Revoking privileges from roles

Use REVOKE statements to revoke a privilege from a role. Revoking a privilege from a role immediately affects active user sessions. When you revoke a privilege from a role, it is no longer available to users who have the privilege through that role.

For example:

=> REVOKE INSERT ON applog FROM logwriter;
REVOKE PRIVILEGE

4.2.5 - Granting database roles

You can assign one or more roles to a user or another role with GRANT (Role):.

You can assign one or more roles to a user or another role with GRANT (Role):

GRANT role[,...] TO grantee[,...] [ WITH ADMIN OPTION ]

For example, you might create three roles—appdata, applogs, and appadmin—and grant appadmin to user bob:

=> CREATE ROLE appdata;
CREATE ROLE
=> CREATE ROLE applogs;
CREATE ROLE
=> CREATE ROLE appadmin;
CREATE ROLE
=> GRANT appadmin TO bob;
GRANT ROLE

Granting roles to another role

GRANT can assign one or more roles to another role. For example, the following GRANT statement grants roles appdata and applogs to role appadmin:

=> GRANT appdata, applogs TO appadmin;
GRANT ROLE

Because user bob was previously assigned the role appadmin, he now has all privileges that are granted to roles appdata and applogs.

When you grant one role to another role, Vertica checks for circular references. In the previous example, role appdata is assigned to the appadmin role. Thus, subsequent attempts to assign appadmin to appdata fail and return the following warning:

=> GRANT appadmin TO appdata;
WARNING:  Circular assignation of roles is not allowed
HINT:  Cannot grant appadmin to appdata
GRANT ROLE

Enabling roles

After granting a role to a user, the role must be enabled. You can enable a role for the current session:


=> SET ROLE appdata;
SET ROLE

You can also enable a role as part of the user's login, by modifying the user's profile with ALTER USER...DEFAULT ROLE:


=> ALTER USER bob DEFAULT ROLE appdata;
ALTER USER

For details, see Enabling roles and Enabling roles automatically.

Granting administrative privileges

You can delegate administrative access to a role to non-superusers by qualifying the GRANT (Role) statement with the option WITH ADMIN OPTION. Users with administrative access can manage access to the role for other users, including granting them administrative access. In the following example, a superuser grants the appadmin role with administrative privileges to users bob and alice.

=> GRANT appadmin TO bob, alice WITH ADMIN OPTION;
GRANT ROLE

Now, both users can exercise their administrative privileges to grant the appadmin role to other users, or revoke it. For example, user bob can now revoke the appadmin role from user alice:


=> \connect - bob
You are now connected as user "bob".
=> REVOKE appadmin FROM alice;
REVOKE ROLE

Example

The following example creates a role called commenter and grants that role to user bob:

  1. Create the comments table:

    => CREATE TABLE comments (id INT, comment VARCHAR);
    
  2. Create the commenter role:

    => CREATE ROLE commenter;
    
  3. Grant to commenter INSERT and SELECT privileges on the comments table:

    => GRANT INSERT, SELECT ON comments TO commenter;
    
  4. Grant the commenter role to user bob.

    => GRANT commenter TO bob;
    
  5. In order to access the role and its associated privileges, bob enables the newly-granted role for himself:

    => \c - bob
    => SET ROLE commenter;
    
  6. Because bob has INSERT and SELECT privileges on the comments table, he can perform the following actions:

    => INSERT INTO comments VALUES (1, 'Hello World');
     OUTPUT
    --------
          1
    (1 row)
    => SELECT * FROM comments;
     id |   comment
    ----+-------------
      1 | Hello World
    (1 row)
    => COMMIT;
    COMMIT
    
  7. Because bob's role lacks DELETE privileges, the following statement returns an error:

    
    => DELETE FROM comments WHERE id=1;
    ERROR 4367:  Permission denied for relation comments
    

See also

Database privileges

4.2.6 - Revoking database roles

REVOKE (Role) can revoke roles from one or more grantees—that is, from users or roles:.

REVOKE (Role) can revoke roles from one or more grantees—that is, from users or roles:

REVOKE [ ADMIN OPTION FOR ] role[,...] FROM grantee[,...] [ CASCADE ]

For example, the following statement revokes the commenter role from user bob:

=> \c
You are now connected as user "dbadmin".
=> REVOKE commenter FROM bob;
REVOKE ROLE

Revoking administrative access from a role

You can qualify REVOKE (Role) with the clause ADMIN OPTION FOR. This clause revokes from the grantees the authority (granted by an earlier GRANT (Role)...WITH ADMIN OPTION statement) to grant the specified roles to other users or roles. Current roles for the grantees are unaffected.

The following example revokes user Alice's authority to grant and revoke the commenter role:

=> \c
You are now connected as user "dbadmin".
=> REVOKE ADMIN OPTION FOR commenter FROM alice;
REVOKE ROLE

4.2.7 - Enabling roles

When you enable a role in a session, you obtain all privileges assigned to that role.

When you enable a role in a session, you obtain all privileges assigned to that role. You can enable multiple roles simultaneously, thereby gaining all privileges of those roles, plus any privileges that are already granted to you directly.

By default, only predefined roles are enabled automatically for users. Otherwise, on starting a session, you must explicitly enable assigned roles with the Vertica function SET ROLE.

For example, the dbadmin creates the logreader role and assigns it to user alice:

=> \c
You are now connected as user "dbadmin".
=> CREATE ROLE logreader;
CREATE ROLE
=> GRANT SELECT ON TABLE applog to logreader;
GRANT PRIVILEGE
=> GRANT logreader TO alice;
GRANT ROLE

User alice must enable the new role before she can view the applog table:


=> \c - alice
You are now connected as user "alice".
=> SELECT * FROM applog;
ERROR:  permission denied for relation applog
=> SET ROLE logreader;
SET
=> SELECT * FROM applog;
 id | sourceID |            data            |                    event
----+----------+----------------------------+----------------------------------------------
  1 | Loader   | 2011-03-31 11:00:38.494226 | Error: Failed to open source file
  2 | Reporter | 2011-03-31 11:00:38.494226 | Warning: Low disk space on volume /scratch-a
(2 rows)

Enabling all user roles

You can enable all roles available to your user account with SET ROLE ALL:

=> SET ROLE ALL;
SET
=> SHOW ENABLED_ROLES;
     name      |           setting
---------------+------------------------------
 enabled roles | logreader, logwriter
(1 row)

Disabling roles

A user can disable all roles with SET ROLE NONE. This statement disables all roles for the current session, excluding predefined roles:

=> SET ROLE NONE;
=> SHOW ENABLED_ROLES;
     name      | setting
---------------+---------
 enabled roles |
(1 row)

4.2.8 - Enabling roles automatically

By default, new users are assigned the PUBLIC role, which is automatically enabled when a new session starts.

By default, new users are assigned the PUBLIC role, which is automatically enabled when a new session starts. Typically, other roles are created and users are assigned to them, but these are not automatically enabled. Instead, users must explicitly enable their assigned roles with each new session, with SET ROLE.

You can automatically enable roles for users in two ways:

  • Enable roles for individual users on login

  • Enable all roles for all users on login

Enable roles for individual users

After assigning roles to users, you can set one or more default roles for each user by modifying their profiles, with ALTER USER...DEFAULT ROLE. User default roles are automatically enabled at the start of the user session. You should consider setting default roles for users if they typically rely on the privileges of those roles to carry out routine tasks.

The following example shows how to set regional_manager as the default role for user LilyCP:

=> \c
You are now connected as user "dbadmin".
=> GRANT regional_manager TO LilyCP;
GRANT ROLE
=> ALTER USER LilyCP DEFAULT ROLE regional_manager;
ALTER USER
=> \c - LilyCP
You are now connected as user "LilyCP".
=> SHOW ENABLED_ROLES;
     name      |     setting
---------------+------------------
 enabled roles | regional_manager
(1 row)

Enable all roles for all users

Configuration parameter EnableAllRolesOnLogin specifies whether to enable all roles for all database users on login. By default, this parameter is set to 0. If set to 1, Vertica enables the roles of all users when they log in to the database.
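
For example, using the same database_name placeholder as elsewhere in this document, the following statement enables all roles for all users on login:

=> ALTER DATABASE database_name SET EnableAllRolesOnLogin = 1;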

Clearing default roles

You can clear all default role assignments for a user with ALTER USER...DEFAULT ROLE NONE. For example:

=> ALTER USER fred DEFAULT ROLE NONE;
ALTER USER
=> SELECT user_name, default_roles, all_roles FROM users WHERE user_name = 'fred';
 user_name | default_roles | all_roles
-----------+---------------+-----------
 fred      |               | logreader
(1 row)

4.2.9 - Viewing user roles

You can obtain information about roles in three ways:.

You can obtain information about roles in three ways:

Verifying role assignments

The function HAS_ROLE checks whether a Vertica role is granted to the specified user or role. Non-superusers can use this function to check their own role membership. Superusers can use it to determine role assignments for other users and roles. You can also use Management Console to check role assignments.

In the following example, a dbadmin user checks whether user MikeL is assigned the administrator role:

=> \c
You are now connected as user "dbadmin".
=> SELECT HAS_ROLE('MikeL', 'administrator');
 HAS_ROLE
----------
 t
(1 row)

User MikeL checks whether he has the regional_manager role:

=> \c - MikeL
You are now connected as user "MikeL".
=> SELECT HAS_ROLE('regional_manager');
 HAS_ROLE
----------
 f
(1 row)

The dbadmin grants the regional_manager role to the administrator role. On checking again, MikeL verifies that he now has the regional_manager role:

dbadmin=> \c
You are now connected as user "dbadmin".
dbadmin=> GRANT regional_manager to administrator;
GRANT ROLE
dbadmin=> \c - MikeL
You are now connected as user "MikeL".
dbadmin=> SELECT HAS_ROLE('regional_manager');
 HAS_ROLE
----------
 t
(1 row)

Viewing available and enabled roles

SHOW AVAILABLE ROLES lists all roles granted to you:

=> SHOW AVAILABLE ROLES;
      name       |           setting
-----------------+-----------------------------
 available roles | logreader, logwriter
(1 row)

SHOW ENABLED ROLES lists the roles enabled in your session:

=> SHOW ENABLED ROLES;
     name      | setting
---------------+----------
 enabled roles | logreader
(1 row)

Querying system tables

You can query the system tables ROLES, USERS, and GRANTS, either separately or joined, to obtain detailed information about user roles, users assigned to those roles, and the privileges granted explicitly to users and implicitly through roles.

The following query on ROLES returns the names of all roles users can access, and the roles granted (assigned) to those roles. An asterisk (*) appended to a role indicates that the user can grant the role to other users:

=> SELECT * FROM roles;
      name       | assigned_roles
-----------------+----------------
 public          |
 dbduser         |
 dbadmin         | dbduser*
 pseudosuperuser | dbadmin*
 logreader       |
 logwriter       |
 logadmin        | logreader, logwriter
(7 rows)

The following query on system table USERS returns all users with the DBADMIN role. An asterisk (*) appended to a role indicates that the user can grant the role to other users:

=> SELECT user_name, is_super_user, default_roles, all_roles FROM v_catalog.users WHERE all_roles ILIKE '%dbadmin%';
 user_name | is_super_user |            default_roles             |              all_roles
-----------+---------------+--------------------------------------+--------------------------------------
 dbadmin   | t             | dbduser*, dbadmin*, pseudosuperuser* | dbduser*, dbadmin*, pseudosuperuser*
 u1        | f             |                                      | dbadmin*
 u2        | f             |                                      | dbadmin
(3 rows)

The following query on system table GRANTS returns the privileges granted to user Jane or role R1. An asterisk (*) appended to a privilege indicates that the user can grant the privilege to other users:

=> SELECT grantor,privileges_description,object_name,object_type,grantee FROM grants WHERE grantee='Jane' OR grantee='R1';
grantor | privileges_description | object_name | object_type  |  grantee
--------+------------------------+-------------+--------------+-----------
dbadmin | USAGE                  | general     | RESOURCEPOOL | Jane
dbadmin |                        | R1          | ROLE         | Jane
dbadmin | USAGE*                 | s1          | SCHEMA       | Jane
dbadmin | USAGE, CREATE*         | s1          | SCHEMA       | R1
(4 rows)

4.3 - Database privileges

When a database object is created, such as a schema, table, or view, ownership of that object is assigned to the user who created it.

When a database object is created, such as a schema, table, or view, ownership of that object is assigned to the user who created it. By default, only the object's owner, and users with superuser privileges such as database administrators, have privileges on a new object. Only these users (and other users whom they explicitly authorize) can grant object privileges to other users.

Privileges are granted and revoked by GRANT and REVOKE statements, respectively. The privileges that can be granted on a given object are specific to its type. For example, table privileges include SELECT, INSERT, and UPDATE, while library and resource pool privileges have USAGE privileges only. For a summary of object privileges, see Database object privileges.

Because privileges on database objects can come from several different sources like explicit grants, roles, and inheritance, privileges can be difficult to monitor. Use the GET_PRIVILEGES_DESCRIPTION meta-function to check the current user's effective privileges across all sources on a specified database object.
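
For example, the following sketch checks the current user's effective privileges on a hypothetical table my_schema.my_table:

=> SELECT GET_PRIVILEGES_DESCRIPTION('table', 'my_schema.my_table');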

4.3.1 - Ownership and implicit privileges

All users have implicit privileges on the objects that they own.

All users have implicit privileges on the objects that they own. On creating an object, its owner automatically is granted all privileges associated with the object's type (see Database object privileges). Regardless of object type, the following privileges are inseparable from ownership and cannot be revoked, not even by the owner:

  • Authority to grant all object privileges to other users, and revoke them

  • ALTER (where applicable) and DROP

  • Extension of privilege granting authority on their objects to other users, and revoking that authority

Object owners can revoke all non-implicit, or ordinary, privileges from themselves. For example, on creating a table, its owner is automatically granted all implicit and ordinary privileges:

Implicit table privileges:

  • ALTER

  • DROP

Ordinary table privileges:

  • DELETE

  • INSERT

  • REFERENCES

  • SELECT

  • TRUNCATE

  • UPDATE

If user Joan creates table t1, she can revoke ordinary privileges UPDATE and INSERT from herself, which effectively makes this table read-only:


=> \c - Joan
You are now connected as user "Joan".
=> CREATE TABLE t1 (a int);
CREATE TABLE
=> INSERT INTO t1 VALUES (1);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> REVOKE UPDATE, INSERT ON TABLE t1 FROM Joan;
REVOKE PRIVILEGE
=> INSERT INTO t1 VALUES (3);
ERROR 4367:  Permission denied for relation t1
=> SELECT * FROM t1;
 a
---
 1
(1 row)

Joan can subsequently restore UPDATE and INSERT privileges to herself:


=> GRANT UPDATE, INSERT on TABLE t1 TO Joan;
GRANT PRIVILEGE
=> INSERT INTO t1 VALUES (3);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM t1;
 a
---
 1
 3
(2 rows)

4.3.2 - Inherited privileges

You can manage inheritance of privileges at three levels:.

You can manage inheritance of privileges at three levels:

  • Database

  • Schema

  • Tables, views, and models

By default, inherited privileges are enabled at the database level and disabled at the schema level. If privilege inheritance is enabled at both levels, newly created tables, views, and models automatically inherit their parent schema's privileges. You can also disable privilege inheritance on individual tables, views, and models with the EXCLUDE SCHEMA PRIVILEGES clause of ALTER TABLE, ALTER VIEW, and ALTER MODEL.

4.3.2.1 - Enabling database inheritance

By default, inherited privileges are enabled at the database level.

By default, inherited privileges are enabled at the database level. You can toggle database-level inherited privileges with the DisableInheritedPrivileges configuration parameter.

To enable inherited privileges:

=> ALTER DATABASE database_name SET DisableInheritedPrivileges = 0;

To disable inherited privileges:

=> ALTER DATABASE database_name SET DisableInheritedPrivileges = 1;

4.3.2.2 - Enabling schema inheritance

By default, inherited privileges are disabled at the schema level.

By default, inherited privileges are disabled at the schema level. If inherited privileges are enabled at the database level, you can enable inheritance at the schema level with CREATE SCHEMA and ALTER SCHEMA.

To create a schema with schema inheritance enabled:

=> CREATE SCHEMA my_schema DEFAULT INCLUDE PRIVILEGES;

To enable schema inheritance for an existing schema:

=> ALTER SCHEMA my_schema DEFAULT INCLUDE SCHEMA PRIVILEGES;

After schema-level privilege inheritance is enabled, privileges granted on the schema are automatically inherited by all newly created tables, views, and models in that schema. You can explicitly exclude a table, view, or model from privilege inheritance with the EXCLUDE SCHEMA PRIVILEGES clause of ALTER TABLE, ALTER VIEW, or ALTER MODEL.

For example, to prevent my_table from inheriting the privileges of my_schema:

=> ALTER TABLE my_schema.my_table EXCLUDE SCHEMA PRIVILEGES;

For information about which objects inherit privileges from which schemas, see INHERITING_OBJECTS.

For information about which privileges each object inherits, see INHERITED_PRIVILEGES.

Schema inheritance for existing objects

Enabling schema inheritance on an existing schema only affects newly created tables, views, and models in that schema. To allow existing objects to inherit privileges from their parent schema, you must explicitly set schema inheritance on each object with ALTER TABLE, ALTER VIEW, or ALTER MODEL.

For example, my_schema contains my_table, my_view, and my_model. Enabling schema inheritance on my_schema does not affect the privileges of my_table and my_view. The following statements explicitly set schema inheritance on these objects:

=> ALTER VIEW my_schema.my_view INCLUDE SCHEMA PRIVILEGES;
=> ALTER TABLE my_schema.my_table INCLUDE SCHEMA PRIVILEGES;
=> ALTER MODEL my_schema.my_model INCLUDE SCHEMA PRIVILEGES;

After enabling inherited privileges on a schema, you can grant privileges on it to users and roles with GRANT (schema). The specified user or role then implicitly has these same privileges on the objects in the schema:

=> GRANT USAGE, CREATE, SELECT, INSERT ON SCHEMA my_schema TO PUBLIC;
GRANT PRIVILEGE

See also

4.3.2.3 - Setting privilege inheritance on tables and views

If inherited privileges are enabled for the database and a schema, privileges granted to the schema are automatically granted to all new tables and views in it.

If inherited privileges are enabled for the database and a schema, privileges granted to the schema are automatically granted to all new tables and views in it. You can also explicitly exclude tables and views from inheriting schema privileges.

For information about which tables and views inherit privileges from which schemas, see INHERITING_OBJECTS.

For information about which privileges each table or view inherits, see INHERITED_PRIVILEGES.

Set privileges inheritance on tables and views

CREATE TABLE/ALTER TABLE and CREATE VIEW/ALTER VIEW can allow tables and views to inherit privileges from their parent schemas. For example, the following statements enable inheritance on schema s1, so new table s1.t1 and view s1.myview automatically inherit the privileges set on that schema as applicable:

=> CREATE SCHEMA s1 DEFAULT INCLUDE PRIVILEGES;
CREATE SCHEMA
=> GRANT USAGE, CREATE, SELECT, INSERT ON SCHEMA S1 TO PUBLIC;
GRANT PRIVILEGE
=> CREATE TABLE s1.t1 ( ID int, f_name varchar(16), l_name varchar(24));
WARNING 6978:  Table "t1" will include privileges from schema "s1"
CREATE TABLE
=> CREATE VIEW s1.myview AS SELECT ID, l_name FROM s1.t1;
WARNING 6978:  View "myview" will include privileges from schema "s1"
CREATE VIEW

If the schema already exists, you can use ALTER SCHEMA to have all newly created tables and views inherit the privileges of the schema. Tables and views created on the schema before this statement, however, are not affected:

=> CREATE SCHEMA s2;
CREATE SCHEMA
=> CREATE TABLE s2.t22 ( a int );
CREATE TABLE
...
=> ALTER SCHEMA S2 DEFAULT INCLUDE PRIVILEGES;
ALTER SCHEMA

In this case, inherited privileges were enabled on schema s2 after it already contained table s2.t22. To set inheritance on this table and other existing tables and views, you must explicitly set schema inheritance on them with ALTER TABLE and ALTER VIEW:

=> ALTER TABLE s2.t22 INCLUDE SCHEMA PRIVILEGES;

Exclude privileges inheritance from tables and views

You can use CREATE TABLE/ALTER TABLE and CREATE VIEW/ALTER VIEW to prevent table and views from inheriting schema privileges.

The following example shows how to create a table that does not inherit schema privileges:

=> CREATE TABLE s1.t1 ( x int) EXCLUDE SCHEMA PRIVILEGES;

You can modify an existing table so it does not inherit schema privileges:

=> ALTER TABLE s1.t1 EXCLUDE SCHEMA PRIVILEGES;

4.3.2.4 - Example usage: implementing inherited privileges

The following steps show how user Joe enables inheritance of privileges on a given schema so other users can access tables in that schema.

The following steps show how user Joe enables inheritance of privileges on a given schema so other users can access tables in that schema.

  1. Joe creates schema schema1, and creates table table1 in it:

    
    =>\c - Joe
    You are now connected as user Joe
    => CREATE SCHEMA schema1;
    CREATE SCHEMA
    => CREATE TABLE schema1.table1 (id int);
    CREATE TABLE
    
  2. Joe grants USAGE and CREATE privileges on schema1 to Myra:

    
    => GRANT USAGE, CREATE ON SCHEMA schema1 to Myra;
    GRANT PRIVILEGE
    
  3. Myra queries schema1.table1, but the query fails:

    
    =>\c - Myra
    You are now connected as user Myra
    => SELECT * FROM schema1.table1;
    ERROR 4367: Permission denied for relation table1
    
  4. Joe grants Myra SELECT ON SCHEMA privileges on schema1:

    
    =>\c - Joe
    You are now connected as user Joe
    => GRANT SELECT ON SCHEMA schema1 to Myra;
    GRANT PRIVILEGE
    
  5. Joe uses ALTER TABLE to include SCHEMA privileges for table1:

    
    => ALTER TABLE schema1.table1 INCLUDE SCHEMA PRIVILEGES;
    ALTER TABLE
    
  6. Myra's query now succeeds:

    
    =>\c - Myra
    You are now connected as user Myra
    => SELECT * FROM schema1.table1;
    id
    ---
    (0 rows)
    
  7. Joe modifies schema1 to include privileges so all tables created in schema1 inherit schema privileges:

    
    =>\c - Joe
    You are now connected as user Joe
    => ALTER SCHEMA schema1 DEFAULT INCLUDE PRIVILEGES;
    ALTER SCHEMA
    => CREATE TABLE schema1.table2 (id int);
    CREATE TABLE
    
  8. With inherited privileges enabled, Myra can query table2 without Joe having to explicitly grant privileges on the table:

    
    =>\c - Myra
    You are now connected as user Myra
    => SELECT * FROM schema1.table2;
    id
    ---
    (0 rows)
    

4.3.3 - Default user privileges

To set the minimum level of privilege for all users, Vertica has the special PUBLIC role, which it grants to each user automatically.

To set the minimum level of privilege for all users, Vertica has the special PUBLIC role, which it grants to each user automatically. This role is automatically enabled, but the database administrator or a superuser can also grant higher privileges to users separately using GRANT statements.

Default privileges for MC users

Privileges on Management Console (MC) are managed through roles, which determine a user's access to MC and to MC-managed Vertica databases through the MC interface. MC privileges do not alter or override Vertica privileges or roles. See Users, roles, and privileges in MC for details.

4.3.4 - Effective privileges

A user's effective privileges on an object encompass privileges of all types, including:.

A user's effective privileges on an object encompass privileges of all types, including:

You can view your effective privileges on an object with the GET_PRIVILEGES_DESCRIPTION meta-function.

4.3.5 - Privileges required for common database operations

This topic lists the required privileges for database objects in Vertica.

This topic lists the required privileges for database objects in Vertica.

Unless otherwise noted, superusers can perform all operations shown in the following tables. Object owners always can perform operations on their own objects.

Schemas

The PUBLIC schema is present in any newly-created Vertica database. Newly-created users must be granted access to this schema:

=> GRANT USAGE ON SCHEMA public TO user;

A database superuser must also explicitly grant new users CREATE privileges, as well as grant them individual object privileges so the new users can create or look up objects in the PUBLIC schema.
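
For example, the following statement (using user as a placeholder name, as in the previous example) grants CREATE privileges on the PUBLIC schema:

=> GRANT CREATE ON SCHEMA public TO user;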

Operation Required Privileges
CREATE SCHEMA Database: CREATE
DROP SCHEMA Schema: owner
ALTER SCHEMA Database: CREATE

Tables

Operation Required Privileges
CREATE TABLE

Schema: CREATE

DROP TABLE Schema: USAGE or schema owner
TRUNCATE TABLE Schema: USAGE or schema owner
ALTER TABLE ADD/DROP/RENAME/ALTER-TYPE COLUMN Schema: USAGE
ALTER TABLE ADD/DROP CONSTRAINT Schema: USAGE
ALTER TABLE PARTITION (REORGANIZE) Schema: USAGE
ALTER TABLE RENAME USAGE and CREATE privilege on the schema that contains the table
ALTER TABLE...SET SCHEMA
  • New schema: CREATE

  • Old Schema: USAGE

SELECT
  • Schema: USAGE

  • SELECT privilege on table

INSERT
  • Table: INSERT

  • Schema: USAGE

DELETE
  • Schema: USAGE

  • Table: DELETE, SELECT when executing DELETE that references table column values in a WHERE or SET clause

UPDATE
  • Schema: USAGE

  • Table: UPDATE, SELECT when executing UPDATE that references table column values in a WHERE or SET clause

REFERENCES
  • Schema: USAGE on schema that contains constrained table and source of foreign key

  • Table: REFERENCES to create foreign key constraints that reference this table

ANALYZE_STATISTICS
ANALYZE_STATISTICS_PARTITION
  • Schema: USAGE

  • Table: One of INSERT, DELETE, or UPDATE

DROP_STATISTICS
  • Schema: USAGE

  • Table: One of INSERT, DELETE, or UPDATE

DROP_PARTITIONS Schema: USAGE

Views

Operation Required Privileges
CREATE VIEW
  • Schema: CREATE on view schema, USAGE on schema with base objects

  • Base objects: SELECT

DROP VIEW
  • Schema: USAGE or owner

  • View: Owner

SELECT
  • Base table: View owner must have SELECT...WITH GRANT OPTION

  • Schema: USAGE

  • View: SELECT

Projections

Operation Required Privileges
CREATE PROJECTION
  • Anchor table: SELECT

  • Schema: USAGE and CREATE, or owner

AUTO/DELAYED PROJECTION

On projections created during INSERT...SELECT or COPY operations:

  • Schema: USAGE

  • Anchor table: SELECT

ALTER PROJECTION Schema: USAGE and CREATE
DROP PROJECTION Schema: USAGE or owner

External procedures

Operation Required Privileges
CREATE PROCEDURE (external) Superuser
DROP PROCEDURE (external) Superuser
EXECUTE
  • Schema: USAGE

  • Procedure: EXECUTE

Stored procedures

Operation Required Privileges
CREATE PROCEDURE (stored) Schema: CREATE

Triggers

Operation Required Privileges
CREATE TRIGGER Superuser

Schedules

Operation Required Privileges
CREATE SCHEDULE Superuser

Libraries

Operation Required Privileges
CREATE LIBRARY Superuser
DROP LIBRARY Superuser

User-defined functions

Operation Required Privileges
CREATE FUNCTION (SQL)
CREATE FUNCTION (scalar)
CREATE TRANSFORM FUNCTION
CREATE ANALYTIC FUNCTION (UDAnF)
CREATE AGGREGATE FUNCTION (UDAF)
  • Schema: CREATE

  • Base library: USAGE (if applicable)

DROP FUNCTION
DROP TRANSFORM FUNCTION
DROP AGGREGATE FUNCTION
DROP ANALYTIC FUNCTION
  • Schema: USAGE privilege

  • Function: owner

ALTER FUNCTION (scalar)...RENAME TO Schema: USAGE and CREATE
ALTER FUNCTION (scalar)...SET SCHEMA
  • Old schema: USAGE

  • New Schema: CREATE

EXECUTE (SQL/UDF/UDT/UDAF/UDAnF) function
  • Schema: USAGE

  • Function: EXECUTE

Sequences

Operation Required Privileges
CREATE SEQUENCE Schema: CREATE
DROP SEQUENCE Schema: USAGE or owner
ALTER SEQUENCE Schema: USAGE and CREATE
ALTER SEQUENCE...SET SCHEMA
  • Old schema: USAGE

  • New schema: CREATE

CURRVAL
NEXTVAL
  • Sequence schema: USAGE

  • Sequence: SELECT

Resource pools

Operation Required Privileges
CREATE RESOURCE POOL Superuser
ALTER RESOURCE POOL

Superuser to alter:

  • MAXMEMORYSIZE

  • PRIORITY

  • QUEUETIMEOUT

Non-superuser, UPDATE to alter:

  • PLANNEDCONCURRENCY

  • SINGLEINITIATOR

  • MAXCONCURRENCY

SET SESSION RESOURCE_POOL
  • Resource pool: USAGE

  • Users can only change their own resource pool setting using ALTER USER syntax

DROP RESOURCE POOL Superuser

Users/profiles/roles

Operation Required Privileges
CREATE USER
CREATE PROFILE
CREATE ROLE
Superuser
ALTER USER
ALTER PROFILE
ALTER ROLE
Superuser
DROP USER
DROP PROFILE
DROP ROLE
Superuser

Object visibility

You can use vsql \d meta-commands, SQL system tables, or a combination of both to view the objects on which you have privileges.

  • Use \dn to view schema names and owners

  • Use \dt to view all tables in the database, or query the system table V_CATALOG.TABLES

  • Use \dj to view projections, showing the schema, projection name, owner, and node, or query the system table V_CATALOG.PROJECTIONS
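
For example, assuming you have USAGE on schema store and at least one privilege on its tables, a query like the following lists the tables you can see in that schema:

=> SELECT table_schema, table_name, owner_name FROM v_catalog.tables WHERE table_schema = 'store';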

Operation Required Privileges
Look up schema Schema: At least one privilege
Look up object in schema or in system tables
  • Schema: USAGE

  • At least one privilege on any of the following objects:

    • TABLE

    • VIEW

    • FUNCTION

    • PROCEDURE

    • SEQUENCE

Look up projection

All anchor tables: At least one privilege

Schema (all anchor tables): USAGE

Look up resource pool Resource pool: SELECT
Existence of object Schema: USAGE

I/O operations

Operation Required Privileges
CONNECT TO VERTICA
DISCONNECT
None
EXPORT TO VERTICA
  • Source table: SELECT

  • Source schema: USAGE

  • Destination table: INSERT

  • Destination schema: USAGE

COPY FROM VERTICA
  • Source/destination schema: USAGE

  • Source table: SELECT

  • Destination table: INSERT

COPY FROM file Superuser
COPY FROM STDIN
  • Schema: USAGE

  • Table: INSERT

COPY LOCAL
  • Schema: USAGE

  • Table: INSERT

Comments

Operation Required Privileges

COMMENT ON (any object type)

Object owner or superuser

Transactions

Operation Required Privileges
COMMIT None
ROLLBACK None
RELEASE SAVEPOINT None
SAVEPOINT None

Sessions

Operation Required Privileges

SET (any SET statement)

None
SHOW { name | ALL } None

Tuning operations

Operation Required Privileges
PROFILE Same privileges required to run the query being profiled
EXPLAIN Same privileges required to run the query for which you use the EXPLAIN keyword

TLS configuration

Operation Required Privileges
ALTER ALTER privileges on the TLS Configuration
DROP DROP privileges on the TLS Configuration
Add certificates to a TLS Configuration USAGE on the certificate's private key

Cryptographic key

Operation Required Privileges
Create a certificate from the key USAGE privileges on the key
DROP DROP privileges on the key

Certificate

Operation Required Privileges
Add certificate to TLS Configuration

ALTER privileges on the TLS Configuration and one of the following:

  • USAGE privileges on the certificate

  • USAGE privileges on the certificate's private key

DROP DROP privileges on the certificate's private key

4.3.6 - Database object privileges

Privileges can be granted explicitly on most user-visible objects in a Vertica database, such as tables and models.

Privileges can be granted explicitly on most user-visible objects in a Vertica database, such as tables and models. For some objects such as projections, privileges are implicitly derived from other objects.

Explicitly granted privileges

The following table provides an overview of privileges that can be explicitly granted on Vertica database objects:

Database Object Privileges
Depending on the object type (Database, Schema, Table, View, Sequence, Procedure, User-defined function, Model, Library, Resource Pool, Storage Location, Key, or TLS Configuration), the grantable privileges are drawn from ALTER, DROP, CREATE, DELETE, EXECUTE, INSERT, READ, REFERENCES, SELECT, TEMP, TRUNCATE, UPDATE, USAGE, and WRITE. See the GRANT statement for each object type for the exact set of privileges it supports.

Implicitly granted privileges

Metadata privileges

Superusers have unrestricted access to all non-cryptographic database metadata. For non-superusers, access to the metadata of specific objects depends on their privileges on those objects:

Metadata User access

Catalog objects:

  • Tables

  • Columns

  • Constraints

  • Sequences

  • External procedures

  • Projections

  • ROS containers

Users must possess USAGE privilege on the schema and any type of access (SELECT) or modify privilege on the object to see catalog metadata about the object.

For internal objects such as projections and ROS containers, which have no access privileges directly associated with them, you must have the requisite privileges on the associated schema and tables to view their metadata. For example, to determine whether a table has any projection data, you must have USAGE on the table schema and SELECT on the table.
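
For example, assuming you hold those privileges on schema s1 and table s1.t1, a query like the following shows whether any projections are anchored on that table:

=> SELECT projection_schema, projection_name, anchor_table_name FROM v_catalog.projections WHERE anchor_table_name = 't1';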

User sessions and functions, and system tables related to these sessions

Non-superusers can access information about their own (current) sessions only, using the following functions:
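
For example, CURRENT_USER and CURRENT_SCHEMA are two such functions; they report details about the current session only:

=> SELECT CURRENT_USER(), CURRENT_SCHEMA();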

Projection privileges

Projections, which store table data, do not have an owner or privileges directly associated with them. Instead, the privileges to create, access, or alter a projection are derived from the privileges that are set on its anchor tables and respective schemas.

Cryptographic privileges

Unless they have ownership, superusers only have implicit DROP privileges on keys, certificates, and TLS Configurations. This allows superusers to see the existence of these objects in their respective system tables (CRYPTOGRAPHIC_KEYS, CERTIFICATES, and TLS_CONFIGURATIONS) and DROP them, but does not allow them to see the key or certificate texts.

For details on granting additional privileges, see GRANT (key) and GRANT (TLS configuration).

4.3.7 - Granting and revoking privileges

Vertica supports GRANT and REVOKE statements to control user access to database objects—for example, GRANT (Schema) and REVOKE (Schema), GRANT (Table) and REVOKE (Table), and so on.

Vertica supports GRANT and REVOKE statements to control user access to database objects—for example, GRANT (schema) and REVOKE (schema), GRANT (table) and REVOKE (table), and so on. Typically, a superuser creates users and roles shortly after creating the database, and then uses GRANT statements to assign them privileges.

Where applicable, GRANT statements require USAGE privileges on the object schema. The following users can grant and revoke privileges:

  • Superusers: all privileges on all database objects, including the database itself

  • Non-superusers: all privileges on objects that they own

  • Grantees of privileges that include WITH GRANT OPTION: the same privileges on that object

In the following example, a dbadmin (with superuser privileges) creates user Carol. Subsequent GRANT statements grant Carol schema and table privileges:

  • CREATE and USAGE privileges on schema PUBLIC

  • SELECT, INSERT, and UPDATE privileges on table public.applog. This GRANT statement also includes WITH GRANT OPTION. This enables Carol to grant the same privileges on this table to other users—in this case, SELECT privileges to user Tom:

=> CREATE USER Carol;
CREATE USER
=> GRANT CREATE, USAGE ON SCHEMA PUBLIC to Carol;
GRANT PRIVILEGE
=> GRANT SELECT, INSERT, UPDATE ON TABLE public.applog TO Carol WITH GRANT OPTION;
GRANT PRIVILEGE
=> GRANT SELECT ON TABLE public.applog TO Tom;
GRANT PRIVILEGE

4.3.7.1 - Superuser privileges

A Vertica superuser is a database user—by default, named dbadmin—that is automatically created on installation.

A Vertica superuser is a database user—by default, named dbadmin—that is automatically created on installation. Vertica superusers have complete and irrevocable authority over database users, privileges, and roles.

Superusers can change the privileges of any user and role, as well as override any privileges that are granted by users with the PSEUDOSUPERUSER role. They can also grant and revoke privileges on any user-owned object, and reassign object ownership.

Cryptographic privileges

For most catalog objects, superusers have all possible privileges. However, for keys, certificates, and TLS Configurations superusers only get DROP privileges by default and must be granted the other privileges by their owners. For details, see GRANT (key) and GRANT (TLS configuration).

Superusers can see the existence of all keys, certificates, and TLS Configurations, but they cannot see the text of keys or certificates unless they are granted USAGE privileges.

See also

DBADMIN

4.3.7.2 - Schema owner privileges

The schema owner is typically the user who creates the schema.

The schema owner is typically the user who creates the schema. By default, the schema owner has privileges to create objects within a schema. The owner can also alter the schema: reassign ownership, rename it, and enable or disable inheritance of schema privileges.

Schema ownership does not necessarily grant the owner access to objects in that schema. Access to objects depends on the privileges that are granted on them.

All other users and roles must be explicitly granted access to a schema by its owner or a superuser.

4.3.7.3 - Object owner privileges

The database, along with every object in it, has an owner.

The database, along with every object in it, has an owner. The object owner is usually the person who created the object, although a superuser can alter ownership of objects such as tables and sequences.

An object owner with the appropriate schema privileges can access, alter, rename, move, or drop any object they own without any additional privileges.

An object owner can also:

  • Grant privileges on their own object to other users

    The WITH GRANT OPTION clause specifies that a user can grant the permission to other users. For example, if user Bob creates a table, Bob can grant privileges on that table to users Ted, Alice, and so on.

  • Grant privileges to roles

    Users who are granted the role gain the privilege.

4.3.7.4 - Granting privileges

As described in Granting and Revoking Privileges, specific users grant privileges using the GRANT statement with or without the optional WITH GRANT OPTION, which allows the user to grant the same privileges to other users.

As described in Granting and revoking privileges, specific users grant privileges using the GRANT statement with or without the optional WITH GRANT OPTION, which allows the user to grant the same privileges to other users.

  • A superuser can grant privileges on all object types to other users.

  • A superuser or object owner can grant privileges to roles. Users who have been granted the role then gain the privilege.

  • An object owner can grant privileges on the object to other users using the optional WITH GRANT OPTION clause.

  • The user must have USAGE privilege on the schema and appropriate privileges on the object.

When a user grants an explicit list of privileges, such as GRANT INSERT, DELETE, REFERENCES ON applog TO Bob:

  • The GRANT statement succeeds only if all the listed privileges are granted successfully. If any grant operation fails, the entire statement rolls back.

  • Vertica returns an error if the user does not have grant options for the privileges listed.

When a user grants ALL privileges, such as GRANT ALL ON applog TO Bob, the statement always succeeds. Vertica grants all the privileges on which the grantor has the WITH GRANT OPTION and skips those privileges without the optional WITH GRANT OPTION.

For example, suppose user Bob is granted only DELETE privileges, with the grant option, on the applog table:

=> GRANT DELETE ON applog TO Bob WITH GRANT OPTION;
GRANT PRIVILEGE
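
If Bob then grants ALL privileges on applog to another user (a sketch, assuming a user Ted exists), Vertica grants only DELETE, because that is the only privilege Bob holds with the grant option, and the statement still succeeds:

=> \c - Bob
You are now connected as user "Bob".
=> GRANT ALL ON applog TO Ted;
GRANT PRIVILEGE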

For details, see the GRANT statements.

4.3.7.5 - Revoking privileges

The following non-superusers can revoke privileges on an object:.

The following non-superusers can revoke privileges on an object:

  • Object owner

  • Grantor of the object privileges

The user also must have USAGE privilege on the object's schema.

For example, the following query on system table V_CATALOG.GRANTS shows that users u1, u2, and u3 have the following privileges on schema s1 and table s1.t1:

 => SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
     WHERE object_name IN ('s1', 't1') AND grantee IN ('u1', 'u2', 'u3');
object_type | object_name | grantee | grantor |  privileges_description
-------------+-------------+---------+---------+---------------------------
 SCHEMA      | s1          | u1      | dbadmin | USAGE, CREATE
 SCHEMA      | s1          | u2      | dbadmin | USAGE, CREATE
 SCHEMA      | s1          | u3      | dbadmin | USAGE
 TABLE       | t1          | u1      | dbadmin | INSERT*, SELECT*, UPDATE*
 TABLE       | t1          | u2      | u1      | INSERT*, SELECT*, UPDATE*
 TABLE       | t1          | u3      | u2      | SELECT*
(6 rows)

In the following statements, u2 revokes the SELECT privileges that it granted on s1.t1 to u3. Subsequent attempts by u3 to query this table return an error:

=> \c - u2
You are now connected as user "u2".
=> REVOKE SELECT ON s1.t1 FROM u3;
REVOKE PRIVILEGE
=> \c - u3
You are now connected as user "u2".
=> SELECT * FROM s1.t1;
ERROR 4367:  Permission denied for relation t1

Revoking grant option

If you revoke privileges on an object from a user, that user can no longer act as grantor of those same privileges to other users. If that user previously granted the revoked privileges to other users, the REVOKE statement must include the CASCADE option to revoke the privilege from those users too; otherwise, it returns with an error.

For example, user u2 can grant SELECT, INSERT, and UPDATE privileges, and grants those privileges to user u4:

=> \c - u2
You are now connected as user "u2".
=> GRANT SELECT, INSERT, UPDATE on TABLE s1.t1 to u4;
GRANT PRIVILEGE

If you query V_CATALOG.GRANTS for privileges on table s1.t1, it returns the following result set:

=> \c
You are now connected as user "dbadmin".
=> SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
     WHERE object_name IN ('t1') ORDER BY grantee;
 object_type | object_name | grantee | grantor |                   privileges_description
-------------+-------------+---------+---------+------------------------------------------------------------
 TABLE       | t1          | dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES*, TRUNCATE*
 TABLE       | t1          | u1      | dbadmin | INSERT*, SELECT*, UPDATE*
 TABLE       | t1          | u2      | u1      | INSERT*, SELECT*, UPDATE*
 TABLE       | t1          | u4      | u2      | INSERT, SELECT, UPDATE
(4 rows)

Now, if user u1 wants to revoke UPDATE privileges from user u2, the revoke operation must cascade to user u4, who also has UPDATE privileges that were granted by u2; otherwise, the REVOKE statement returns with an error:

=> \c - u1
=> REVOKE update ON TABLE s1.t1 FROM u2;
ROLLBACK 3052:  Dependent privileges exist
HINT:  Use CASCADE to revoke them too
=> REVOKE update ON TABLE s1.t1 FROM u2 CASCADE;
REVOKE PRIVILEGE
=> \c
You are now connected as user "dbadmin".
=>  SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
     WHERE object_name IN ('t1') ORDER BY grantee;
 object_type | object_name | grantee | grantor |                   privileges_description
-------------+-------------+---------+---------+------------------------------------------------------------
 TABLE       | t1          | dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES*, TRUNCATE*
 TABLE       | t1          | u1      | dbadmin | INSERT*, SELECT*, UPDATE*
 TABLE       | t1          | u2      | u1      | INSERT*, SELECT*
 TABLE       | t1          | u4      | u2      | INSERT, SELECT
(4 rows)

You can also revoke a user's grant option on a privilege without revoking the privilege itself. For example, user u1 can prevent user u2 from granting INSERT privileges to other users, but allow user u2 to retain that privilege:

=> \c - u1
You are now connected as user "u1".
=> REVOKE GRANT OPTION FOR INSERT ON TABLE s1.t1 FROM u2 CASCADE;
REVOKE PRIVILEGE

You can confirm results of the revoke operation by querying V_CATALOG.GRANTS for privileges on table s1.t1:


=> \c
You are now connected as user "dbadmin".
=> SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
      WHERE object_name IN ('t1') ORDER BY grantee;
 object_type | object_name | grantee | grantor |                   privileges_description
-------------+-------------+---------+---------+------------------------------------------------------------
 TABLE       | t1          | dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES*, TRUNCATE*
 TABLE       | t1          | u1      | dbadmin | INSERT*, SELECT*, UPDATE*
 TABLE       | t1          | u2      | u1      | INSERT, SELECT*
 TABLE       | t1          | u4      | u2      | SELECT
(4 rows)

The query results show:

  • User u2 retains INSERT privileges on the table but can no longer grant INSERT privileges to other users (as indicated by absence of an asterisk).

  • The revoke operation cascaded down to grantee u4, who now lacks INSERT privileges.

See also

REVOKE (table)

4.3.7.6 - Privilege ownership chains

The ability to revoke privileges on objects can cascade throughout an organization.

The ability to revoke privileges on objects can cascade throughout an organization. If the grant option was revoked from a user, the privilege that this user granted to other users will also be revoked.

If a privilege was granted to a user or role by multiple grantors, to completely revoke this privilege from the grantee the privilege has to be revoked by each original grantor. The only exception is that a superuser can revoke privileges granted by an object owner, and vice versa.

In the following example, the SELECT privilege on table t1 is granted through a chain of users, from a superuser through User3.

  • A superuser grants User1 CREATE privileges on the schema s1:

    => \c - dbadmin
    You are now connected as user "dbadmin".
    => CREATE USER User1;
    CREATE USER
    => CREATE USER User2;
    CREATE USER
    => CREATE USER User3;
    CREATE USER
    => CREATE SCHEMA s1;
    CREATE SCHEMA
    => GRANT USAGE on SCHEMA s1 TO User1, User2, User3;
    GRANT PRIVILEGE
    => CREATE ROLE reviewer;
    CREATE ROLE
    => GRANT CREATE ON SCHEMA s1 TO User1;
    GRANT PRIVILEGE
    
  • User1 creates new table t1 within schema s1 and then grants SELECT WITH GRANT OPTION privilege on s1.t1 to User2:

    => \c - User1
    You are now connected as user "User1".
    => CREATE TABLE s1.t1(id int, sourceID VARCHAR(8));
    CREATE TABLE
    => GRANT SELECT on s1.t1 to User2 WITH GRANT OPTION;
    GRANT PRIVILEGE
    
  • User2 grants SELECT WITH GRANT OPTION privilege on s1.t1 to User3:

    => \c - User2
    You are now connected as user "User2".
    => GRANT SELECT on s1.t1 to User3 WITH GRANT OPTION;
    GRANT PRIVILEGE
    
  • User3 grants SELECT privilege on s1.t1 to the reviewer role:

    => \c - User3
    You are now connected as user "User3".
    => GRANT SELECT on s1.t1 to reviewer;
    GRANT PRIVILEGE
    

Users cannot revoke privileges upstream in the chain. For example, User2 did not grant User1's CREATE privilege on schema s1 (a superuser did), so when User2 runs the following REVOKE command, Vertica rolls back the command:

=> \c - User2
You are now connected as user "User2".
=> REVOKE CREATE ON SCHEMA s1 FROM User1;
ROLLBACK 0:  "CREATE" privilege(s) for schema "s1" could not be revoked from "User1"

Users can revoke privileges indirectly from users who received privileges through a cascading chain, like the one shown in the example above. Here, users can use the CASCADE option to revoke privileges from all users "downstream" in the chain. A superuser or User1 can use the CASCADE option to revoke the SELECT privilege on table s1.t1 from all users. For example, a superuser or User1 can execute the following statement to revoke the SELECT privilege from all users and roles within the chain:

=> \c - User1
You are now connected as user "User1".
=> REVOKE SELECT ON s1.t1 FROM User2 CASCADE;
REVOKE PRIVILEGE

When a superuser or User1 executes the above statement, the SELECT privilege on table s1.t1 is revoked from User2, User3, and the reviewer role. The GRANT privilege is also revoked from User2 and User3, which a superuser can verify by querying the V_CATALOG.GRANTS system table.

=> SELECT * FROM grants WHERE object_name = 's1' AND grantee ILIKE 'User%';
 grantor | privileges_description | object_schema | object_name | grantee
---------+------------------------+---------------+-------------+---------
 dbadmin | USAGE                  |               | s1          | User1
 dbadmin | USAGE                  |               | s1          | User2
 dbadmin | USAGE                  |               | s1          | User3
(3 rows)

4.3.8 - Modifying privileges

A superuser or object owner can use one of the ALTER statements to modify a privilege, such as changing a sequence owner or table owner.

A superuser or object owner can use one of the ALTER statements to modify a privilege, such as changing a sequence owner or table owner. Reassignment to the new owner does not transfer grants from the original owner to the new owner; grants made by the original owner are dropped.
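
For example, a superuser or the current owner can reassign table and sequence ownership as follows. This is a sketch that assumes table public.applog and sequence public.my_seq exist and that Alice is the new owner:

=> ALTER TABLE public.applog OWNER TO Alice;
ALTER TABLE
=> ALTER SEQUENCE public.my_seq OWNER TO Alice;
ALTER SEQUENCE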

4.3.9 - Viewing privileges granted on objects

You can view information about privileges, grantors, grantees, and objects by querying these system tables:.

You can view information about privileges, grantors, grantees, and objects by querying these system tables:

An asterisk (*) appended to a privilege indicates that the user can grant the privilege to other users.

You can also view the effective privileges on a specified database object by using the GET_PRIVILEGES_DESCRIPTION meta-function.

Viewing explicitly granted privileges

To view explicitly granted privileges on objects, query the GRANTS table.

The following query returns the explicit privileges for the schema, myschema.

=> SELECT grantee, privileges_description FROM grants WHERE object_name='myschema';
 grantee | privileges_description
---------+------------------------
 Bob     | USAGE, CREATE
 Alice   | CREATE
 (2 rows)

Viewing inherited privileges

To view which tables and views inherit privileges from which schemas, query the INHERITING_OBJECTS table.

The following query returns the tables and views that inherit their privileges from their parent schema, customers.

=> SELECT * FROM inheriting_objects WHERE object_schema='customers';
     object_id     |     schema_id     | object_schema |  object_name  | object_type
-------------------+-------------------+---------------+---------------+-------------
 45035996273980908 | 45035996273980902 | customers     | cust_info     | table
 45035996273980984 | 45035996273980902 | customers     | shipping_info | table
 45035996273980980 | 45035996273980902 | customers     | cust_set      | view
 (3 rows)

To view the specific privileges inherited by tables and views and information on their associated grant statements, query the INHERITED_PRIVILEGES table.

The following query returns the privileges that the tables and views inherit from their parent schema, customers.

=> SELECT object_schema,object_name,object_type,privileges_description,principal,grantor FROM inherited_privileges WHERE object_schema='customers';
 object_schema |  object_name  | object_type |                          privileges_description                           | principal | grantor
---------------+---------------+-------------+---------------------------------------------------------------------------+-----------+---------
 customers     | cust_info     | Table       | INSERT*, SELECT*, UPDATE*, DELETE*, ALTER*, REFERENCES*, DROP*, TRUNCATE* | dbadmin   | dbadmin
 customers     | shipping_info | Table       | INSERT*, SELECT*, UPDATE*, DELETE*, ALTER*, REFERENCES*, DROP*, TRUNCATE* | dbadmin   | dbadmin
 customers     | cust_set      | View        | SELECT*, ALTER*, DROP*                                                    | dbadmin   | dbadmin
 customers     | cust_info     | Table       | SELECT                                                                    | Val       | dbadmin
 customers     | shipping_info | Table       | SELECT                                                                    | Val       | dbadmin
 customers     | cust_set      | View        | SELECT                                                                    | Val       | dbadmin
 customers     | cust_info     | Table       | INSERT                                                                    | Pooja     | dbadmin
 customers     | shipping_info | Table       | INSERT                                                                    | Pooja     | dbadmin
 (8 rows)

Viewing effective privileges on an object

To view the current user's effective privileges on a specified database object, use the GET_PRIVILEGES_DESCRIPTION meta-function.

In the following example, user Glenn has set the REPORTER role and wants to check his effective privileges on schema s1 and table s1.articles.

  • Table s1.articles inherits privileges from its schema (s1).

  • The REPORTER role has the following privileges:

    • SELECT on schema s1

    • INSERT WITH GRANT OPTION on table s1.articles

  • User Glenn has the following privileges:

    • UPDATE and USAGE on schema s1.

    • DELETE on table s1.articles.

GET_PRIVILEGES_DESCRIPTION returns the following effective privileges for Glenn on schema s1:

=> SELECT GET_PRIVILEGES_DESCRIPTION('schema', 's1');
   GET_PRIVILEGES_DESCRIPTION
--------------------------------
 SELECT, UPDATE, USAGE
(1 row)

GET_PRIVILEGES_DESCRIPTION returns the following effective privileges for Glenn on table s1.articles:


=> SELECT GET_PRIVILEGES_DESCRIPTION('table', 's1.articles');
   GET_PRIVILEGES_DESCRIPTION
--------------------------------
 INSERT*, SELECT, UPDATE, DELETE
(1 row)

See also

4.4 - Access policies

CREATE ACCESS POLICY lets you create access policies on tables that specify how much data certain users and roles can query from those tables.

CREATE ACCESS POLICY lets you create access policies on tables that specify how much data certain users and roles can query from those tables. Access policies typically prevent these users from viewing the data of specific columns and rows of a table. You can apply access policies to table columns and rows. If a table has access policies on both, Vertica applies the row access policies first, then the column access policies.

You can create most access policies for any table type—columnar, external, or flex. (You cannot create column access policies on flex tables.) You can also create access policies on any column type, including joins.

4.4.1 - Creating column access policies

CREATE ACCESS POLICY can create access policies on individual table columns, one policy per column.

CREATE ACCESS POLICY can create access policies on individual table columns, one policy per column. Each column access policy lets you specify, for different users and roles, various levels of access to the data of that column. The column access expression can also specify how to render column data for users and roles.

The following example creates an access policy on the customer_address column in the customer_dimension table. This access policy gives non-superusers with the administrator role full access to all data in that column, but masks customer address data from all other users:

=> CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address
-> CASE
-> WHEN ENABLED_ROLE('administrator') THEN customer_address
-> ELSE '**************'
-> END ENABLE;
CREATE ACCESS POLICY

Vertica uses this policy to determine the access it gives to users MaxineT and MikeL, who are assigned employee and administrator roles, respectively. When these users query the customer_dimension table, Vertica applies the column access policy expression as follows:

=> \c - MaxineT;
You are now connected as user "MaxineT".
=> SET ROLE employee;
SET
=> SELECT customer_type, customer_name, customer_gender, customer_address, customer_city FROM customer_dimension;
 customer_type |      customer_name      | customer_gender | customer_address |  customer_city
---------------+-------------------------+-----------------+------------------+------------------
 Individual    | Craig S. Robinson       | Male            | **************   | Fayetteville
 Individual    | Mark M. Kramer          | Male            | **************   | Joliet
 Individual    | Barbara S. Farmer       | Female          | **************   | Alexandria
 Individual    | Julie S. McNulty        | Female          | **************   | Grand Prairie
 ...

=> \c - MikeL
You are now connected as user "MikeL".
=> SET ROLE administrator;
SET
=> SELECT customer_type, customer_name, customer_gender, customer_address, customer_city FROM customer_dimension;
 customer_type |      customer_name      | customer_gender | customer_address |  customer_city
---------------+-------------------------+-----------------+------------------+------------------
 Individual    | Craig S. Robinson       | Male            | 138 Alden Ave    | Fayetteville
 Individual    | Mark M. Kramer          | Male            | 311 Green St     | Joliet
 Individual    | Barbara S. Farmer       | Female          | 256 Cherry St    | Alexandria
 Individual    | Julie S. McNulty        | Female          | 459 Essex St     | Grand Prairie
 ...

Restrictions

The following limitations apply to access policies:

  • A column can have only one access policy.

  • Column access policies cannot be set on columns of complex types other than native arrays.

  • Column access policies cannot be set for materialized columns on flex tables. While it is possible to set an access policy for the __raw__ column, doing so restricts access to the whole table.

  • Row access policies are invalid on temporary tables and tables with aggregate projections.

  • Access policy expressions cannot contain:

    • Subqueries

    • Aggregate functions

    • Analytic functions

    • User-defined transform functions (UDTF)

  • If the query optimizer cannot replace a deterministic expression that involves only constants with their computed values, it blocks all DML operations such as INSERT.

4.4.2 - Creating row access policies

CREATE ACCESS POLICY can create a single row access policy for a given table.

CREATE ACCESS POLICY can create a single row access policy for a given table. This policy lets you specify, for different users and roles, various levels of access to table row data. When a user launches a query, Vertica evaluates the access policy's WHERE expression against all table rows. The query returns only those rows where the expression evaluates to true for the current user or role.

For example, you might want to specify different levels of access to table store.store_sales_fact for four roles:

  • employee: Users with this role should only access sales records that identify them as the employee, in column employee_key. The following query shows how many sales records (in store.store_sales_fact) are associated with each user (in public.emp_dimension):

    => SELECT COUNT(sf.employee_key) AS 'Total Sales', sf.employee_key, ed.user_name FROM store.store_sales_fact sf
         JOIN emp_dimension ed ON sf.employee_key=ed.employee_key
         WHERE ed.job_title='Sales Associate' GROUP BY sf.employee_key, ed.user_name ORDER BY sf.employee_key
    
     Total Sales | employee_key |  user_name
    -------------+--------------+-------------
             533 |          111 | LucasLC
             442 |          124 | JohnSN
             487 |          127 | SamNS
             477 |          132 | MeghanMD
             545 |          140 | HaroldON
             ...
             563 |         1991 | MidoriMG
             367 |         1993 | ThomZM
    (318 rows)
    
  • regional_manager: Users with this role (public.emp_dimension) should only access sales records for the sales region that they manage (store.store_dimension):

    => SELECT distinct sd.store_region, ed.user_name, ed.employee_key, ed.job_title FROM store.store_dimension sd
         JOIN emp_dimension ed ON sd.store_region=ed.employee_region WHERE ed.job_title = 'Regional Manager';
     store_region | user_name | employee_key |    job_title
    --------------+-----------+--------------+------------------
     West         | JamesGD   |         1070 | Regional Manager
     South        | SharonDM  |         1710 | Regional Manager
     East         | BenOV     |          593 | Regional Manager
     MidWest      | LilyCP    |          611 | Regional Manager
     NorthWest    | CarlaTG   |         1058 | Regional Manager
     SouthWest    | MarcusNK  |          150 | Regional Manager
    (6 rows)
    
  • dbadmin and administrator: Users with these roles have unlimited access to all table data.

Given these users and the data associated with them, you can create a row access policy on store.store_sales_fact that looks like this:

CREATE ACCESS POLICY ON store.store_sales_fact FOR ROWS WHERE
   (ENABLED_ROLE('employee')) AND (store.store_sales_fact.employee_key IN
     (SELECT employee_key FROM public.emp_dimension WHERE user_name=CURRENT_USER()))
   OR
   (ENABLED_ROLE('regional_manager')) AND (store.store_sales_fact.store_key IN
     (SELECT sd.store_key FROM store.store_dimension sd
      JOIN emp_dimension ed ON sd.store_region=ed.employee_region WHERE ed.user_name = CURRENT_USER()))
   OR ENABLED_ROLE('dbadmin')
   OR ENABLED_ROLE ('administrator')
ENABLE;

The following examples indicate the different levels of access that are available to users with the specified roles:

  • dbadmin has access to all rows in store.store_sales_fact:

    => \c
    You are now connected as user "dbadmin".
    => SELECT count(*) FROM store.store_sales_fact;
      count
    ---------
     5000000
    (1 row)
    
  • User LilyCP has the role of regional_manager, so she can access all sales data of the Midwest region that she manages:

    
    => \c - LilyCP;
    You are now connected as user "LilyCP".
    => SET ROLE regional_manager;
    SET
    => SELECT count(*) FROM store.store_sales_fact;
     count
    --------
     782272
    (1 row)
    
  • User SamRJ has the role of employee, so he can access only the sales data that he is associated with:

    
    => \c - SamRJ;
    You are now connected as user "SamRJ".
    => SET ROLE employee;
    SET
    => SELECT count(*) FROM store.store_sales_fact;
     count
    -------
       417
    (1 row)
    

Restrictions

The following limitations apply to row access policies:

  • A table can have only one row access policy.

  • Row access policies are invalid on the following tables:

    • Tables with aggregate projections

    • Temporary tables

    • System tables

    • Views

  • You cannot create directed queries on a table with a row access policy.

4.4.3 - Access policies and DML operations

By default, Vertica abides by a rule that a user can only edit what they can see.

By default, Vertica abides by a rule that a user can only edit what they can see. That is, you must be able to view all rows and columns in the table in their original values (as stored in the table) and in their originally defined data types to perform actions that modify data on a table. For example, if a column is defined as VARCHAR(9) and an access policy on that column specifies the same column as VARCHAR(10), users using the access policy will be unable to perform the following operations:

  • INSERT

  • UPDATE

  • DELETE

  • MERGE

  • COPY

You can override this behavior by specifying GRANT TRUSTED in a new or existing access policy. This option forces the access policy to defer entirely to explicit GRANT statements when assessing whether a user can perform the above operations.

You can view existing access policies with the ACCESS_POLICY system table.

Row access

On tables where a row access policy is enabled, you can only perform DML operations when the condition in the row access policy evaluates to TRUE. For example:

t1 appears as follows:

 A | B
---+---
 1 | 1
 2 | 2
 3 | 3

Create the following row access policy on t1:

=> CREATE ACCESS POLICY ON t1 for ROWS
WHERE enabled_role('manager')
OR
A<2
ENABLE;

With this policy enabled, the following behavior exists for users who want to perform DML operations:

  • A user with the manager role can perform DML on all rows in the table, because the WHERE clause in the policy evaluates to TRUE.

  • Users with non-manager roles can only perform a SELECT to return data in column A that has a value of less than two. If the access policy has to read the data in the table to confirm a condition, it does not allow DML operations.

Column access

On tables where a column access policy is enabled, you can perform DML operations if you can view the entire column in its originally defined type.

Suppose table t1 is created with the following data types and values:

=> CREATE TABLE t1 (A int, B int);
=> INSERT INTO t1 VALUES (1,2);
=> SELECT * FROM t1;
 A | B
---+---
 1 | 2
(1 row)

Suppose the following access policy is created, which coerces the data type of column A from INT to VARCHAR(20) at execution time.

=> CREATE ACCESS POLICY on t1 FOR column A A::VARCHAR(20) ENABLE;
Column "A" is of type int but expression in Access Policy is of type varchar(20). It will be coerced at execution time

In this case, u1 can view column A in its entirety, but because the active access policy doesn't specify column A's original data type, u1 cannot perform DML operations on column A.

=> \c - u1
You are now connected as user "u1".
=> SELECT A FROM t1;
 A
---
 1
(1 row)

=> INSERT INTO t1 VALUES (3);
ERROR 6538:  Unable to INSERT: "Access denied due to active access policy on table "t1" for column "A""
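
One way to preserve DML access for privileged users, sketched here on the assumption that masking the column for everyone else is acceptable, is to keep the policy expression in the column's original data type. Users who see the true values in that type (here, users with the manager role) can still modify the table:

=> DROP ACCESS POLICY ON t1 FOR COLUMN A;
=> CREATE ACCESS POLICY ON t1 FOR COLUMN A
   CASE WHEN enabled_role('manager') THEN A ELSE 0 END ENABLE;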

Overriding default behavior with GRANT TRUSTED

Specifying GRANT TRUSTED in an access policy overrides the default behavior ("users can only edit what they can see") and instructs the access policy to defer entirely to explicit GRANT statements when assessing whether a user can perform a DML operation.

GRANT TRUSTED is useful in cases where the form the data is stored in doesn't match its semantically "true" form.

For example, when integrating with Voltage SecureData, a common use case is storing encrypted data with VoltageSecureProtect, where decryption is left to a case expression in an access policy that calls VoltageSecureAccess. In this case, while the decrypted form is intuitively understood to be the data's "true" form, it's still stored in the table in its encrypted form; users who can view the decrypted data wouldn't see the data as it was stored and therefore wouldn't be able to perform DML operations. You can use GRANT TRUSTED to override this behavior and allow users to perform these operations if they have the grants.

In this example, the customer_info table contains columns for the customer first and last name and SSN. SSNs are sensitive and access to it should be controlled, so it is encrypted with VoltageSecureProtect as it is inserted into the table:


=> CREATE TABLE customer_info(first_name VARCHAR, last_name VARCHAR, ssn VARCHAR);
=> INSERT INTO customer_info SELECT 'Alice', 'Smith', VoltageSecureProtect('998-42-4910' USING PARAMETERS format='ssn');
=> INSERT INTO customer_info SELECT 'Robert', 'Eve', VoltageSecureProtect('899-28-1303' USING PARAMETERS format='ssn');
=> SELECT * FROM customer_info;
 first_name | last_name |     ssn
------------+-----------+-------------
 Alice      | Smith     | 967-63-8030
 Robert     | Eve       | 486-41-3371
(2 rows)

In this system, the role "trusted_ssn" identifies privileged users for which Vertica will decrypt the values of the "ssn" column with VoltageSecureAccess. To allow these privileged users to perform DML operations for which they have grants, you might use the following access policy:

=> CREATE ACCESS POLICY ON customer_info FOR COLUMN ssn
  CASE WHEN enabled_role('trusted_ssn') THEN VoltageSecureAccess(ssn USING PARAMETERS format='ssn')
  ELSE ssn END
  GRANT TRUSTED
  ENABLE;

Again, note that GRANT TRUSTED allows all users with GRANTs on the table to perform the specified operations, including users without the "trusted_ssn" role.

4.4.4 - Access policies and query optimization

Access policies affect the projection designs that the Vertica Database Designer produces, and the plans that the optimizer creates for query execution.

Access policies affect the projection designs that the Vertica Database Designer produces, and the plans that the optimizer creates for query execution.

Projection designs

When Database Designer creates projections for a given table, it takes into account access policies that apply to the current user. The set of projections that Database Designer produces for the table is optimized for that user's access privileges, and other users with similar access privileges. However, these projections might be less than optimal for users with different access privileges. These differences might have some effect on how efficiently Vertica processes queries for the second group of users. When you evaluate projection designs for a table, choose a design that optimizes access for all authorized users.

Query rewrite

The Vertica optimizer enforces access policies by rewriting user queries in its query plan, which can affect query performance. For example, the clients table has row and column access policies, both enabled. When a user queries this table, the query optimizer produces a plan that rewrites the query so it includes both policies:

=> SELECT * FROM clients;

The query optimizer produces a query plan that rewrites the query as follows:

SELECT * FROM (
SELECT custID, password, CASE WHEN enabled_role('manager') THEN SSN ELSE substr(SSN, 8, 4) END AS SSN FROM clients
WHERE enabled_role('broker') AND
  clients.clientID IN (SELECT brokers.clientID FROM brokers WHERE broker_name = CURRENT_USER())
) clients;

4.4.5 - Managing access policies

By default, you can only manage access policies on tables that you own.

By default, you can only manage access policies on tables that you own. You can optionally restrict access policy management to superusers with the AccessPolicyManagementSuperuserOnly parameter (false by default):

=> ALTER DATABASE DEFAULT SET PARAMETER AccessPolicyManagementSuperuserOnly = 1;
ALTER DATABASE

You can view and manage access policies for tables in several ways:

Viewing access policies

You can view access policies in two ways:

  • Query system table ACCESS_POLICY. For example, the following query returns all access policies on table public.customer_dimension:

    => \x
    => SELECT policy_type, is_policy_enabled, table_name, column_name, expression FROM access_policy WHERE table_name = 'public.customer_dimension';
    -[ RECORD 1 ]-----+----------------------------------------------------------------------------------------
    policy_type       | Column Policy
    is_policy_enabled | Enabled
    table_name        | public.customer_dimension
    column_name       | customer_address
    expression        | CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END
    
  • Export table DDL from the database catalog with EXPORT_TABLES, EXPORT_OBJECTS, or EXPORT_CATALOG. For example:

    => SELECT export_tables('','customer_dimension');
                                    export_tables
    -----------------------------------------------------------------------------
    CREATE TABLE public.customer_dimension
    (
        customer_key int NOT NULL,
        customer_type varchar(16),
        customer_name varchar(256),
        customer_gender varchar(8),
        ...
        CONSTRAINT C_PRIMARY PRIMARY KEY (customer_key) DISABLED
    );
    
    CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END ENABLE;
    

Modifying access policy expression

ALTER ACCESS POLICY can modify the expression of an existing access policy. For example, you can modify the access policy in the earlier example by extending access to the dbadmin role:

=> ALTER ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address
    CASE WHEN enabled_role('dbadmin') THEN customer_address
         WHEN enabled_role('administrator') THEN customer_address
         ELSE '**************' END ENABLE;
ALTER ACCESS POLICY

Querying system table ACCESS_POLICY confirms this change:

=> SELECT policy_type, is_policy_enabled, table_name, column_name, expression FROM access_policy
  WHERE table_name = 'public.customer_dimension' AND column_name='customer_address';
-[ RECORD 1 ]-----+-------------------------------------------------------------------------------------------------------------------------------------------
policy_type       | Column Policy
is_policy_enabled | Enabled
table_name        | public.customer_dimension
column_name       | customer_address
expression        | CASE WHEN enabled_role('dbadmin') THEN customer_address WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END

Enabling and disabling access policies

Owners of a table can enable and disable its row and column access policies.

Row access policies

You enable and disable row access policies on a table:

ALTER ACCESS POLICY ON [schema.]table FOR ROWS { ENABLE | DISABLE }

The following examples disable and then re-enable the row access policy on table customer_dimension:

=> ALTER ACCESS POLICY ON customer_dimension FOR ROWS DISABLE;
ALTER ACCESS POLICY
=> ALTER ACCESS POLICY ON customer_dimension FOR ROWS ENABLE;
ALTER ACCESS POLICY

Column access policies

You enable and disable access policies on a table column as follows:

ALTER ACCESS POLICY ON [schema.]table FOR COLUMN column { ENABLE | DISABLE }

The following examples disable and then re-enable the same column access policy on customer_dimension.customer_address:

=> ALTER ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address DISABLE;
ALTER ACCESS POLICY
=> ALTER ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address ENABLE;
ALTER ACCESS POLICY

Copying access policies

You copy access policies from one table to another as follows. Non-superusers must have ownership of both the source and destination tables:

ALTER ACCESS POLICY ON [schema.]table { FOR COLUMN column | FOR ROWS } COPY TO TABLE table

When you create a copy of a table or move its contents with the following functions (but not CREATE TABLE AS SELECT or CREATE TABLE LIKE), the access policies of the original table are copied to the new/destination table:

To copy access policies to another table, use ALTER ACCESS POLICY.

For example, you can copy a row access policy as follows:

=> ALTER ACCESS POLICY ON public.emp_dimension FOR ROWS COPY TO TABLE public.regional_managers_dimension;

The following statement copies the access policy on column employee_key from table public.emp_dimension to store.store_sales_fact:

=> ALTER ACCESS POLICY ON public.emp_dimension FOR COLUMN employee_key COPY TO TABLE store.store_sales_fact;

5 - Using the administration tools

The Vertica Administration tools allow you to easily perform administrative tasks.

The Vertica Administration tools allow you to easily perform administrative tasks. You can perform most Vertica database administration tasks with Administration Tools.

Run Administration Tools using the Database Superuser account on the Administration host, if possible. Make sure that no other Administration Tools processes are running.

If the Administration host is unresponsive, run Administration Tools on a different node in the cluster. That node permanently takes over the role of Administration host.

5.1 - Running the administration tools

Administration tools, or "admintools," supports various commands to manage your database.

Administration tools, or "admintools," supports various commands to manage your database.

To run admintools, you must have SSH and local connections enabled for the dbadmin user.

Syntax

/opt/vertica/bin/admintools [ --debug ] [
     { -h | --help }
   | { -a | --help_all }
   | { -t | --tool } name_of_tool [options]
]
--debug

If you include this option, Vertica logs debug information.

-h
--help
Outputs abbreviated help.
-a
--help_all
Outputs verbose help, which lists all command-line sub-commands and options.
{ -t | --tool } name_of_tool [options]

Specifies the tool to run, where name_of_tool is one of the tools described in the help output, and options are one or more comma-delimited tool arguments.
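
For example, the following commands print the abbreviated help and then run one of the tools listed in that help output (list_allnodes) directly:

$ /opt/vertica/bin/admintools --help
$ /opt/vertica/bin/admintools -t list_allnodes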

An unqualified admintools command displays the Main Menu dialog box.

Admin Tools screen

If you are unfamiliar with this type of interface, read Using the administration tools interface.

Privileges

dbadmin user

5.2 - First login as database administrator

The first time you log in as the Database Superuser and run the Administration Tools, the user interface displays.

The first time you log in as the Database Superuser and run the Administration Tools, the user interface displays.

  1. In the end-user license agreement (EULA) window, type accept to proceed.

    A window displays, requesting the location of the license key file you downloaded from the Vertica website. The default path is /tmp/vlicense.dat.

  2. Type the absolute path to your license key (for example, /tmp/vlicense.dat).

5.3 - Using the administration tools interface

The Vertica Administration Tools are implemented using Dialog, a graphical user interface that works in terminal (character-cell) windows. The interface responds to mouse clicks in some terminal windows, particularly local Linux windows, but you might find that it responds only to keystrokes.

The Vertica Administration Tools are implemented using Dialog, a graphical user interface that works in terminal (character-cell) windows. The interface responds to mouse clicks in some terminal windows, particularly local Linux windows, but you might find that it responds only to keystrokes. Thus, this section describes how to use the Administration Tools using only keystrokes.

Enter [return]

In all dialogs, when you are ready to run a command, select a file, or cancel the dialog, press the Enter key. The command descriptions in this section do not explicitly instruct you to press Enter.

OK - Cancel - Help

The OK, Cancel, and Help buttons are present on virtually all dialogs. Use the tab, space bar, or right and left arrow keys to select an option and then press Enter. The same keystrokes apply to dialogs that present a choice of Yes or No.

Menu dialogs

Some dialogs require that you choose one command from a menu. Type the alphanumeric character shown or use the up and down arrow keys to select a command and then press Enter.

List dialogs

In a list dialog, use the up and down arrow keys to highlight items, then use the space bar to select the items (which marks them with an X). Some list dialogs allow you to select multiple items. When you have finished selecting items, press Enter.

Form dialogs

In a form dialog (also referred to as a dialog box), use the tab key to cycle between OK, Cancel, Help, and the form field area. Once the cursor is in the form field area, use the up and down arrow keys to select an individual field (highlighted) and enter information. When you have finished entering information in all fields, press Enter.

Help buttons

Online help is provided in the form of text dialogs. If you have trouble viewing the help, see Notes for remote terminal users.

5.4 - Notes for remote terminal users

The appearance of the graphical interface depends on the color and font settings used by your terminal window. The screen captures in this document were made using the default color and font settings in a PuTTy terminal application running on a Windows platform.

If you are using PuTTY, you can make the Administration Tools look like the screen captures in this document:

  1. In a PuTTY window, right click the title area and select Change Settings.

  2. Create or load a saved session.

  3. In the Category dialog, click Window > Appearance.

  4. In the Font settings, click the Change... button.

  5. Select Font: Courier New, Font style: Regular, Size: 10.

  6. Click Apply.

Repeat these steps for each existing session that you use to run the Administration Tools.

You can also change the translation to support UTF-8:

  1. In a PuTTY window, right click the title area and select Change Settings.

  2. Create or load a saved session.

  3. In the Category dialog, click Window > Translation.

  4. In the "Received data assumed to be in which character set" drop-down menu, select UTF-8.

  5. Click Apply.

5.5 - Using administration tools help

The Help on Using the Administration Tools command displays a help screen about using the Administration Tools.

Most of the online help in the Administration Tools is context-sensitive. For example, if you use up/down arrows to select a command, press tab to move to the Help button, and press return, you get help on the selected command.

In a menu dialog

  1. Use the up and down arrow keys to choose the command for which you want help.

  2. Use the Tab key to move the cursor to the Help button.

  3. Press Enter (Return).

In a dialog box

  1. Use the up and down arrow keys to choose the field on which you want help.

  2. Use the Tab key to move the cursor to the Help button.

  3. Press Enter (Return).

Scrolling

Some help files are too long for a single screen. Use the up and down arrow keys to scroll through the text.

5.6 - Distributing changes made to the administration tools metadata

Administration Tools-specific metadata for a failed node will fall out of synchronization with other cluster nodes if you make the following changes:

  • Modify the restart policy

  • Add one or more nodes

  • Drop one or more nodes.

When you restore the node to the database cluster, you can use the Administration Tools to update the node with the latest Administration Tools metadata:

  1. Log on to a host that contains the metadata you want to transfer and start the Administration Tools. (See Using the administration tools.)

  2. On the Main Menu in the Administration Tools, select Configuration Menu and click OK.

  3. On the Configuration Menu, select Distribute Config Files and click OK.

  4. Select AdminTools Meta-Data.

    The Administration Tools metadata is distributed to every host in the cluster.

  5. Restart the database.
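
If you prefer the command line, the distribute_config_files tool (see Writing administration tools scripts) performs an equivalent push, sending admintools.conf from the local host to all other hosts in the cluster:

$ admintools -t distribute_config_files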

5.7 - Administration tools and Management Console

You can perform most database administration tasks using the Administration Tools, but you have the additional option of using the more visual and dynamic Management Console.

The following table compares the functionality available in both interfaces. Continue to use Administration Tools and the command line to perform actions not yet supported by Management Console.

Vertica Functionality | Management Console | Administration Tools
Use a Web interface for the administration of Vertica | Yes | No
Manage/monitor one or more databases and clusters through a UI | Yes | No
Manage multiple databases on different clusters | Yes | Yes
View database cluster state | Yes | Yes
View multiple cluster states | Yes | No
Connect to the database | Yes | Yes
Start/stop an existing database | Yes | Yes
Stop/restart Vertica on host | Yes | Yes
Kill a Vertica process on host | No | Yes
Create one or more databases | Yes | Yes
View databases | Yes | Yes
Remove a database from view | Yes | No
Drop a database | Yes | Yes
Create a physical schema design (Database Designer) | Yes | Yes
Modify a physical schema design (Database Designer) | Yes | Yes
Set the restart policy | No | Yes
Roll back database to the Last Good Epoch | No | Yes
Manage clusters (add, replace, remove hosts) | Yes | Yes
Rebalance data across nodes in the database | Yes | Yes
Configure database parameters dynamically | Yes | No
View database activity in relation to physical resource usage | Yes | No
View alerts and messages dynamically | Yes | No
View current database size usage statistics | Yes | No
View database size usage statistics over time | Yes | No
Upload/upgrade a license file | Yes | Yes
Warn users about license violation on login | Yes | Yes
Create, edit, manage, and delete users/user information | Yes | No
Use LDAP to authenticate users with company credentials | Yes | Yes
Manage user access to MC through roles | Yes | No
Map Management Console users to a Vertica database | Yes | No
Enable and disable user access to MC and/or the database | Yes | No
Audit user activity on database | Yes | No
Hide features unavailable to a user through roles | Yes | No
Generate new user (non-LDAP) passwords | Yes | No

Management Console provides some, but not all, of the functionality provided by the Administration Tools. MC also provides functionality not available in the Administration Tools.

5.8 - Administration tools reference

Administration Tools, or "admintools," uses the open-source vertica-python client to perform operations on the database.

The following sections explain in detail all the operations you can perform with the Vertica Administration Tools:

5.8.1 - Viewing database cluster state

This tool shows the current state of the nodes in the database.

  1. On the Main Menu, select View Database Cluster State, and click OK.
    The normal state of a running database is ALL UP. The normal state of a stopped database is ALL DOWN.

  2. If some hosts are UP and some DOWN, restart the specific host that is down using Restart Vertica on Host from the Administration Tools, or you can start the database as described in Starting and Stopping the Database (unless you have a known node failure and want to continue in that state.)

    Nodes shown as INITIALIZING or RECOVERING indicate that Failure recovery is in progress.

Nodes in other states (such as NEEDS_CATCHUP) are transitional and can be ignored unless they persist.
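
To check node state from the command line instead, you can use the list_allnodes tool (see Writing administration tools scripts):

$ admintools -t list_allnodes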

5.8.2 - Connecting to the database

This tool connects to a running database with vsql. You can use the Administration Tools to connect to a database from any node within the database while logged in to any user account with access privileges. You cannot use the Administration Tools to connect from a host that is not a database node. To connect from other hosts, run vsql as described in Connecting from the command line.

  1. On the Main Menu, click Connect to Database, and then click OK.

  2. Supply the database password if asked:

    Password:
    

    When you create a new user with the CREATE USER command, you can configure the password or leave it empty. You cannot bypass the password if the user was created with a password configured. You can change a user's password using the ALTER USER command.

    The Administration Tools connect to the database and transfer control to vsql.

    Welcome to vsql, the Vertica Analytic Database interactive terminal.
    Type:  \h or \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    =>
    

See Using vsql for more information.
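
In scripts, you can make the same connection non-interactively with the connect_db tool (options shown under Writing administration tools scripts); the database name and password below are placeholders:

$ admintools -t connect_db -d verticadb -p 'password'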

5.8.3 - Restarting Vertica on host

This tool restarts the Vertica process on one or more hosts in a running database. Use this tool if the Vertica process stopped or was killed on the host.

  1. To view the current state of the nodes in the cluster, on the Main Menu, select View Database Cluster State.

  2. Click OK to return to the Main Menu.

  3. If one or more nodes are down, select Restart Vertica on Host, and click OK.

  4. Select the database that contains the host that you want to restart, and click OK.

  5. Select the one or more hosts to restart, and click OK.

  6. Enter the database password.

  7. Select View Database Cluster State again to verify all nodes are up.
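
The command-line equivalent is the restart_node tool. For example, the following sketch restarts the Vertica process on two hosts of a database (the host names, database name, and password are placeholders):

$ admintools -t restart_node -d verticadb -s host01,host02 -p 'password'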

5.8.4 - Configuration menu options

The Configuration Menu allows you to perform the following tasks:

5.8.4.1 - Creating a database

Use the procedures below to create either an Enterprise Mode or Eon Mode database with admintools. To create a database with an in-browser wizard in Management Console, see Creating a database using MC. For details about creating a database with admintools through the command line, see Writing administration tools scripts.

Create an Enterprise Mode database

  1. On the Configuration Menu, click Create Database. Click OK.

  2. Select Enterprise Mode as your database mode.

  3. Enter the name of the database and an optional comment. Click OK.

  4. Enter a password. See Creating a database name and password for rules.

    If you do not enter a password, you are prompted to confirm: Yes to enter a superuser password, No to create a database without one.

  5. If you entered a password, enter the password again.

  6. Select the hosts to include in the database. The hosts in this list are the ones that were specified at installation time ( install_vertica -s).

    Database Hosts

  7. Specify the directories in which to store the catalog and data files.

    Database data directories

  8. Check the current database definition for correctness. Click Yes to proceed.

    Current database definition

  9. A message indicates that you have successfully created a database. Click OK.

You can also create an Enterprise Mode database using admintools through the command line, for example:

$ admintools -t create_db --data_path=/home/dbadmin --catalog_path=/home/dbadmin --database=verticadb --password=password --hosts=localhost

For more information, see Writing administration tools scripts.

Create an Eon Mode database

  1. On the Configuration Menu, click Create Database. Click OK.

  2. Select Eon Mode as your database mode.

  3. Enter the name of the database and an optional comment. Click OK.

  4. Enter a password. See Creating a database name and password for rules.

    AWS only: If you do not enter a password, you are prompted to confirm: Yes to enter a superuser password, No to create a database without one.

  5. If you entered a password, enter the password again.

  6. Select the hosts to include in the database. The hosts in this list are those specified at installation time ( install_vertica -s).

  7. Specify the directories in which to store the catalog and depot, depot size, communal storage location, and number of shards.

    • Depot Size: Use an integer followed by %, K, G, or T. The default is 60% of the total disk space of the filesystem storing the depot.

    • Communal Storage: Use an existing Amazon S3 bucket in the same region as your instances. Specify a new subfolder name, which Vertica will dynamically create within the existing S3 bucket. For example, s3://existingbucket/newstorage1. You can create a new subfolder within existing ones, but database creation will roll back if you do not specify any new subfolder name.

    • Number of Shards: Use a whole number. The default is equal to the number of nodes. For optimal performance, the number of shards should be no greater than 2x the number of nodes. When the number of nodes is greater than the number of shards (with ETS), the throughput of dashboard queries improves. When the number of shards exceeds the number of nodes, you can expand the cluster in the future to improve the performance of long analytic queries.

  8. Check the current database definition for correctness. Click Yes to proceed.

  9. A message indicates that you successfully created a database. Click OK.

In on-premises, AWS, and Azure environments, you can create an Eon Mode database using admintools through the command line. For instructions specific to your environment, see Create a database in Eon Mode.
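
As a rough sketch of such a command-line invocation, the following uses the create_db options listed under Writing administration tools scripts; the hosts, S3 location, depot path, shard count, password, and parameter file name are placeholders that you must adapt to your environment:

$ admintools -t create_db --database=verticadb --hosts=host01,host02,host03 \
  --communal-storage-location=s3://existingbucket/newstorage1 \
  --depot-path=/home/dbadmin/depot --shard-count=6 \
  -x /home/dbadmin/auth_params.conf --password=password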

5.8.4.2 - Dropping a database

This tool drops an existing database. Only the Database Superuser is allowed to drop a database.

  1. Stop the database.

  2. On the Configuration Menu, click Drop Database and then click OK.

  3. Select the database to drop and click OK.

  4. Click Yes to confirm that you want to drop the database.

  5. Type yes and click OK to reconfirm that you really want to drop the database.

  6. A message indicates that you have successfully dropped the database. Click OK.

When Vertica drops the database, it also automatically drops the node definitions that refer to the database. The following exceptions apply:

  • Another database uses a node definition. If another database refers to any of these node definitions, none of the node definitions are dropped.

  • A node definition is the only node defined for the host. (Vertica uses node definitions to locate hosts that are available for database creation, so removing the only node defined for a host would make the host unavailable for new databases.)
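
From the command line, the drop_db tool drops a stopped database (the database name is a placeholder):

$ admintools -t drop_db -d verticadb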

5.8.4.3 - Viewing a database

This tool displays the characteristics of an existing database.

  1. On the Configuration Menu, select View Database and click OK.

  2. Select the database to view.

  3. Vertica displays the following information about the database:

    • The name of the database.

    • The name and location of the log file for the database.

    • The hosts within the database cluster.

    • The value of the restart policy setting.

      Note: This setting determines whether nodes within a K-Safe database are restarted when they are rebooted. See Setting the restart policy.

    • The database port.

    • The name and location of the catalog directory.
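
You can display similar information from the command line with the list_db tool (see Writing administration tools scripts):

$ admintools -t list_db -d verticadb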

5.8.4.4 - Setting the restart policy

The Restart Policy enables you to determine whether or not nodes in a K-Safe database are automatically restarted when they are rebooted. Since this feature does not automatically restart nodes if the entire database is DOWN, it is not useful for databases that are not K-Safe.

To set the Restart Policy for a database:

  1. Open the Administration Tools.

  2. On the Main Menu, select Configuration Menu, and click OK.

  3. In the Configuration Menu, select Set Restart Policy, and click OK.

  4. Select the database for which you want to set the Restart Policy, and click OK.

  5. Select one of the following policies for the database:

    • Never — Nodes are never restarted automatically.

    • K-Safe — Nodes are automatically restarted if the database cluster is still UP. This is the default setting.

    • Always — Nodes are always restarted automatically, even the node of a single-node database.

  6. Click OK.
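
The same setting is available from the command line through the set_restart_policy tool; for example, to set the default ksafe policy on a database named verticadb (a placeholder):

$ admintools -t set_restart_policy -d verticadb -p ksafe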

Best practice for restoring failed hardware

Following this procedure prevents Vertica from misdiagnosing a missing disk or bad mounts as data corruption, which would otherwise result in a time-consuming full-node recovery.

If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing the hardware:

  1. Reboot the machine into runlevel 1, which is a root and console-only mode.

    Runlevel 1 prevents network connectivity and keeps Vertica from attempting to reconnect to the cluster.

  2. In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any RAID recovery is able to proceed.

  3. Only after you confirm that the hardware is consistent, reboot to runlevel 3 or higher.

At this point, the network activates, and Vertica rejoins the cluster and automatically recovers any missing data. Note that, on a single-node database, if any files that were associated with a projection have been deleted or corrupted, Vertica will delete all files associated with that projection, which could result in data loss.

5.8.4.5 - Installing external procedure executable files

  1. Run the Administration tools.

    $ /opt/vertica/bin/adminTools
    
  2. On the AdminTools Main Menu, click Configuration Menu, and then click OK.

  3. On the Configuration Menu, click Install External Procedure and then click OK.

  4. Select the database on which you want to install the external procedure.

  5. Either select the file to install or manually type the complete file path, and then click OK.

  6. If you are not the superuser, you are prompted to enter your password and click OK.

    The Administration Tools automatically create the database-name/procedures directory on each node in the database and install the external procedure in these directories for you.

  7. Click OK in the dialog that indicates that the installation was successful.
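
The command-line equivalent is the install_procedure tool; the file path, database name, and password below are placeholders:

$ admintools -t install_procedure -d verticadb -f /home/dbadmin/myprocedure.sh -p 'ownerpassword'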

5.8.5 - Advanced menu options

The Advanced Menu options allow you to perform the following tasks:

5.8.5.1 - Rolling back the database to the last good epoch

Vertica provides the ability to roll the entire database back to a specific epoch primarily to assist in the correction of human errors during data loads or other accidental corruptions. For example, suppose that you have been performing a bulk load and the cluster went down during a particular COPY command. You might want to discard all epochs back to the point at which the previous COPY command committed and run the one that did not finish again. You can determine that point by examining the log files (see Monitoring the Log Files).

  1. On the Advanced Menu, select Roll Back Database to Last Good Epoch.

  2. Select the database to roll back. The database must be stopped.

  3. Accept the suggested restart epoch or specify a different one.

  4. Confirm that you want to discard the changes after the specified epoch.

The database restarts successfully.
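
From the command line, the restart_db tool accepts 'last' as the epoch argument to restart from the Last Good Epoch; the database name and password below are placeholders:

$ admintools -t restart_db -d verticadb -e last -p 'password'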

5.8.5.2 - Stopping Vertica on host

This command attempts to gracefully shut down the Vertica process on a single node.

  1. On the Advanced Menu, select Stop Vertica on Host and click OK.

  2. Select the hosts to stop.

  3. Confirm that you want to stop the hosts.

    If the command succeeds, View Database Cluster State shows that the selected hosts are DOWN.

    If the command fails to stop any selected nodes, proceed to Killing Vertica Process on Host.
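
The command-line equivalent is the stop_host tool, which sends a SIGTERM to the Vertica process on the listed hosts (the host names are placeholders):

$ admintools -t stop_host -s host01,host02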

5.8.5.3 - Killing the Vertica process on host

This command sends a kill signal to the Vertica process on a node.

  1. On the Advanced menu, select Kill Vertica Process on Host and click OK.

  2. Select the hosts on which to kill the Vertica process.

  3. Confirm that you want to stop the processes.

  4. If the command succeeds, View Database Cluster State shows that the selected hosts are DOWN.
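
The command-line equivalent is the kill_host tool, which sends a SIGKILL to the Vertica process on the listed hosts (the host name is a placeholder):

$ admintools -t kill_host -s host01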

5.8.5.4 - Upgrading a Vertica license key

The following steps are for licensed Vertica users. Completing the steps copies a license key file into the database. See Managing licenses for more information.

  1. On the Advanced menu, select Upgrade License Key and click OK.

  2. Select the database for which to upgrade the license key.

  3. Enter the absolute pathname of your downloaded license key file (for example, /tmp/vlicense.dat). Click OK.

  4. Click OK when you see a message indicating that the upgrade succeeded.
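
A corresponding upgrade_license_key tool is available for scripted use (it appears in the tool list under Writing administration tools scripts); because its options can vary by version, check them before use:

$ admintools -t upgrade_license_key --help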

5.8.5.5 - Managing clusters

Cluster Management lets you add, replace, or remove hosts from a database cluster. These processes are usually part of a larger process of adding, removing, or replacing a database node.

Using cluster management

To use Cluster Management:

  1. From the Main Menu, select Advanced Menu, and then click OK.

  2. In the Advanced Menu, select Cluster Management, and then click OK.

  3. Select the cluster management operation that you want to perform (add, remove, or replace hosts), and then click OK.
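
The corresponding command-line tools are db_add_node, db_replace_node, and db_remove_node (see Writing administration tools scripts). For example, the following sketch adds a host to an existing database, where the host name, database name, and password are placeholders:

$ admintools -t db_add_node -d verticadb -a host04 -p 'password'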

5.8.5.6 - Getting help on administration tools

The Help Using the Administration Tools command displays a help screen about using the Administration Tools.

Most of the online help in the Administration Tools is context-sensitive. For example, if you use the up/down arrows to select a command, press tab to move to the Help button, and press return, you get help on the selected command.

5.8.5.7 - Administration tools metadata

The Administration Tools configuration data (metadata) contains information that databases need to start, such as the hostname/IP address of each participating host in the database cluster.

To facilitate hostname resolution within the Administration Tools, at the command line, and inside the installation utility, Vertica converts all hostnames that you provide through the Administration Tools to IP addresses:

  • During installation

    Vertica immediately converts any hostname you provide through the command-line options --hosts, --add-hosts, or --remove-hosts to its IP address equivalent.

    • If you provide a hostname during installation that resolves to multiple IP addresses (such as in multi-homed systems), the installer prompts you to choose one IP address.

    • Vertica retains the name you give for messages and prompts only; internally it stores these hostnames as IP addresses.

  • Within the Administration Tools

    All hosts are in IP form to allow for direct comparisons (for example db = database = database.example.com).

  • At the command line

    Vertica converts any hostname value to an IP address that it uses to look up the host in the configuration metadata. If a host has multiple IP addresses that are resolved, Vertica tests each IP address to see if it resides in the metadata, choosing the first match. No match indicates that the host is not part of the database cluster.

Metadata is more portable because Vertica does not require the names of the hosts in the cluster to be exactly the same when you install or upgrade your database.

5.8.6 - Administration tools connection behavior and requirements

The behavior of admintools when it connects to and performs operations on a database may vary based on your configuration. In particular, admintools considers its connection to other nodes, the status of those nodes, and the authentication method used by dbadmin.

Connection requirements and authentication

  • admintools uses passwordless SSH connections between cluster hosts for most operations, which is configured or confirmed during installation with the install_vertica script

  • For most situations, when issuing commands to the database, admintools prefers to use its SSH connection to a target host and uses a localhost client connection to the Vertica database

  • The incoming IP address determines the authentication method used. That is, a client connection may have different behavior from a local connection, which may be trusted by default

  • dbadmin should have a local trust or password-based authentication method

  • When deciding which host to use for multi-step operations, admintools prefers localhost, and then reconnects to known-to-be-good nodes

K-safety support

The Administration Tools allow certain operations on a K-Safe database, even if some nodes are unresponsive.

The database must have been marked as K-Safe using the MARK_DESIGN_KSAFE function.
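
For example, you would typically mark a design as K-safe with K=1 by running the following in vsql:

=> SELECT MARK_DESIGN_KSAFE(1);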

The following management functions within the Administration Tools are operational when some nodes are unresponsive:

  • View database cluster state

  • Connect to database

  • Start database (including manual recovery)

  • Stop database

  • Replace node (assuming node that is down is the one being replaced)

  • View database parameters

  • Upgrade license key

The following management functions within the Administration Tools require that all nodes be UP in order to be operational:

  • Create database

  • Run the Database Designer

  • Drop database

  • Set restart policy

  • Roll back database to Last Good Epoch

5.8.7 - Writing administration tools scripts

You can invoke most Administration Tools from the command line or a shell script.

Syntax

/opt/vertica/bin/admintools {
     { -h | --help }
   | { -a | --help_all}
   | { [--debug ] { -t | --tool } toolname [ tool-args ] }
}

Parameters

-h
--help
    Outputs abbreviated help.

-a
--help_all
    Outputs verbose help, which lists all command-line sub-commands and options.

[--debug] { -t | --tool } toolname [args]
    Specifies the tool to run, where toolname is one of the tools listed in the help output described below, and args is one or more comma-delimited toolname arguments. If you include the --debug option, Vertica logs debug information during tool execution.

Tools

To return a list of all available tools, enter admintools -h at a command prompt.

To display help for a specific tool and its options or commands, qualify the specified tool name with --help or -h, as shown in the example below:

$ admintools -t connect_db --help
Usage: connect_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to connect
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes

To list all available tools and their commands and options in individual help text, enter admintools -a.

Usage:
    adminTools [-t | --tool] toolName [options]
Valid tools are:
                command_host
                connect_db
                create_db
                db_add_node
                db_add_subcluster
                db_remove_node
                db_remove_subcluster
                db_replace_node
                db_status
                distribute_config_files
                drop_db
                host_to_node
                install_package
                install_procedure
                kill_host
                kill_node
                license_audit
                list_allnodes
                list_db
                list_host
                list_node
                list_packages
                logrotate
                node_map
                re_ip
                rebalance_data
                restart_db
                restart_node
                restart_subcluster
                return_epoch
                revive_db
                set_restart_policy
                set_ssl_params
                show_active_db
                start_db
                stop_db
                stop_host
                stop_node
                stop_subcluster
                uninstall_package
                upgrade_license_key
                view_cluster
-------------------------------------------------------------------------
Usage: command_host [options]

Options:
  -h, --help            show this help message and exit
  -c CMD, --command=CMD
                        Command to run
  -F, --force           Provide the force cleanup flag. Only applies to start,
                        restart, condrestart. For other options it is ignored.
-------------------------------------------------------------------------
Usage: connect_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to connect
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
-------------------------------------------------------------------------
Usage: create_db [options]

Options:
  -h, --help            show this help message and exit
  -D DATA, --data_path=DATA
                        Path of data directory[optional] if not using compat21
  -c CATALOG, --catalog_path=CATALOG
                        Path of catalog directory[optional] if not using
                        compat21
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
  -d DB, --database=DB  Name of database to be created
  -l LICENSEFILE, --license=LICENSEFILE
                        Database license [optional]
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes [optional]
  -P POLICY, --policy=POLICY
                        Database restart policy [optional]
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts to participate in
                        database
  --shard-count=SHARD_COUNT
                        [Eon only] Number of shards in the database
  --communal-storage-location=COMMUNAL_STORAGE_LOCATION
                        [Eon only] Location of communal storage
  -x COMMUNAL_STORAGE_PARAMS, --communal-storage-params=COMMUNAL_STORAGE_PARAMS
                        [Eon only] Location of communal storage parameter file
  --depot-path=DEPOT_PATH
                        [Eon only] Path to depot directory
  --depot-size=DEPOT_SIZE
                        [Eon only] Size of depot
  --force-cleanup-on-failure
                        Force removal of existing directories on failure of
                        command
  --force-removal-at-creation
                        Force removal of existing directories before creating
                        the database
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: db_add_node [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of the database
  -s HOSTS, --hosts=HOSTS
                        Comma separated list of hosts to add to database
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -a AHOSTS, --add=AHOSTS
                        Comma separated list of hosts to add to database
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster for the new node
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
-------------------------------------------------------------------------
Usage: db_add_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be modified
  -s HOSTS, --hosts=HOSTS
                        Comma separated list of hosts to add to the subcluster
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -c SCNAME, --subcluster=SCNAME
                        Name of the new subcluster for the new node
  --is-primary          Create primary subcluster
  --is-secondary        Create secondary subcluster
  --control-set-size=CONTROLSETSIZE
                        Set the number of nodes that will run spread within
                        the subcluster
  --like=CLONESUBCLUSTER
                        Name of an existing subcluster from which to clone
                        properties for the new subcluster
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: db_remove_node [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be modified
  -s HOSTS, --hosts=HOSTS
                        Name of the host to remove from the db
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
  --skip-directory-cleanup
                        Caution: this option will force you to do a manual
                        cleanup. This option skips directory deletion during
                        remove node. This is best used in a cloud environment
                        where the hosts being removed will be subsequently
                        discarded.
-------------------------------------------------------------------------
Usage: db_remove_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be modified
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be removed
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  --skip-directory-cleanup
                        Caution: this option will force you to do a manual
                        cleanup. This option skips directory deletion during
                        remove subcluster. This is best used in a cloud
                        environment where the hosts being removed will be
                        subsequently discarded.
-------------------------------------------------------------------------
Usage: db_replace_node [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of the database
  -o ORIGINAL, --original=ORIGINAL
                        Name of host you wish to replace
  -n NEWHOST, --new=NEWHOST
                        Name of the replacement host
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: db_status [options]

Options:
  -h, --help            show this help message and exit
  -s STATUS, --status=STATUS
                        Database status UP,DOWN or ALL(list running dbs -
                        UP,list down dbs - DOWN list all dbs - ALL
-------------------------------------------------------------------------
Usage: distribute_config_files
Sends admintools.conf from local host to all other hosts in the cluster

Options:
  -h, --help  show this help message and exit
-------------------------------------------------------------------------
Usage: drop_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Database to be dropped
-------------------------------------------------------------------------
Usage: host_to_node [options]

Options:
  -h, --help            show this help message and exit
  -s HOST, --host=HOST  comma separated list of hostnames which is to be
                        converted into its corresponding nodenames
  -d DB, --database=DB  show only node/host mapping for this database.
-------------------------------------------------------------------------
Usage: admintools -t install_package --package PACKAGE -d DB -p PASSWORD
Examples:
admintools -t install_package -d mydb -p 'mypasswd' --package default
    # (above) install all default packages that aren't currently installed

admintools -t install_package -d mydb -p 'mypasswd' --package default --force-reinstall
   # (above) upgrade (re-install) all default packages to the current version

admintools -t install_package -d mydb -p 'mypasswd' --package hcat
   # (above) install package hcat

See also: admintools -t list_packages

Options:
  -h, --help            show this help message and exit
  -d DBNAME, --dbname=DBNAME
                        database name
  -p PASSWORD, --password=PASSWORD
                        database admin password
  -P PACKAGE, --package=PACKAGE
                        specify package or 'all' or 'default'
  --force-reinstall     Force a package to be re-installed even if it is
                        already installed.
-------------------------------------------------------------------------
Usage: install_procedure [options]

Options:
  -h, --help            show this help message and exit
  -d DBNAME, --database=DBNAME
                        Name of database for installed procedure
  -f PROCPATH, --file=PROCPATH
                        Path of procedure file to install
  -p OWNERPASSWORD, --password=OWNERPASSWORD
                        Password of procedure file owner
-------------------------------------------------------------------------
Usage: kill_host [options]

Options:
  -h, --help            show this help message and exit
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts on which the vertica
                        process is to be killed using a SIGKILL signal
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
-------------------------------------------------------------------------
Usage: kill_node [options]

Options:
  -h, --help            show this help message and exit
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts on which the vertica
                        process is to be killed using a SIGKILL signal
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
-------------------------------------------------------------------------
Usage: license_audit --dbname DB_NAME [OPTIONS]
Runs audit and collects audit results.

Options:
  -h, --help            show this help message and exit
  -d DATABASE, --database=DATABASE
                        Name of the database to retrieve audit results
  -p PASSWORD, --password=PASSWORD
                        Password for database admin
  -q, --quiet           Do not print status messages.
  -f FILE, --file=FILE  Output results to FILE.
-------------------------------------------------------------------------
Usage: list_allnodes [options]

Options:
  -h, --help  show this help message and exit
-------------------------------------------------------------------------
Usage: list_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be listed
-------------------------------------------------------------------------
Usage: list_host [options]

Options:
  -h, --help  show this help message and exit
-------------------------------------------------------------------------
Usage: list_node [options]

Options:
  -h, --help            show this help message and exit
  -n NODENAME, --node=NODENAME
                        Name of the node to be listed
-------------------------------------------------------------------------
Usage: admintools -t list_packages [OPTIONS]
Examples:
admintools -t list_packages                               # lists all available packages
admintools -t list_packages --package all                 # lists all available packages
admintools -t list_packages --package default             # list all packages installed by default
admintools -t list_packages -d mydb --password 'mypasswd' # list the status of all packages in mydb

Options:
  -h, --help            show this help message and exit
  -d DBNAME, --dbname=DBNAME
                        database name
  -p PASSWORD, --password=PASSWORD
                        database admin password
  -P PACKAGE, --package=PACKAGE
                        specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: logrotateconfig [options]

Options:
  -h, --help            show this help message and exit
  -d DBNAME, --dbname=DBNAME
                        database name
  -r ROTATION, --rotation=ROTATION
                        set how often the log is rotated.[
                        daily|weekly|monthly ]
  -s MAXLOGSZ, --maxsize=MAXLOGSZ
                        set maximum log size before rotation is forced.
  -k KEEP, --keep=KEEP  set # of old logs to keep
-------------------------------------------------------------------------
Usage: node_map [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  List only data for this database.
-------------------------------------------------------------------------
Usage: re_ip [options]

Replaces the IP addresses of hosts and databases in a cluster, or changes the
control messaging mode/addresses of a database.

Options:
  -h, --help            show this help message and exit
  -f MAPFILE, --file=MAPFILE
                        A text file with IP mapping information. If the -O
                        option is not used, the command replaces the IP
                        addresses of the hosts in the cluster and all
                        databases for those hosts. In this case, the format of
                        each line in MAPFILE is: [oldIPaddress newIPaddress]
                        or [oldIPaddress newIPaddress, newControlAddress,
                        newControlBroadcast]. If the former,
                        'newControlAddress' and 'newControlBroadcast' would
                        set to default values. Usage: $ admintools -t re_ip -f
                        <mapfile>
  -O, --db-only         Updates the control messaging addresses of a database.
                        Also used for error recovery (when re_ip encounters
                        some certain errors, a mapfile is auto-generated).
                        Format of each line in MAPFILE: [NodeName
                        AssociatedNodeIPaddress, newControlAddress,
                        newControlBrodcast]. 'NodeName' and
                        'AssociatedNodeIPaddress' must be consistent with
                        admintools.conf. Usage: $ admintools -t re_ip -f
                        <mapfile> -O -d <db_name>
  -i, --noprompts       System does not prompt for the validation of the new
                        settings before performing the re_ip operation. Prompting is on
                        by default.
  -T, --point-to-point  Sets the control messaging mode of a database to
                        point-to-point. Usage: $ admintools -t re_ip -d
                        <db_name> -T
  -U, --broadcast       Sets the control messaging mode of a database to
                        broadcast. Usage: $ admintools -t re_ip -d <db_name>
                        -U
  -d DB, --database=DB  Name of a database. Required with the following
                        options: -O, -T, -U.
-------------------------------------------------------------------------
Usage: rebalance_data [options]

Options:
  -h, --help            show this help message and exit
  -d DBNAME, --dbname=DBNAME
                        database name
  -k KSAFETY, --ksafety=KSAFETY
                        specify the new k value to use
  -p PASSWORD, --password=PASSWORD
  --script              Don't re-balance the data, just provide a script for
                        later use.
-------------------------------------------------------------------------
Usage: restart_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be restarted
  -e EPOCH, --epoch=EPOCH
                        Epoch at which the database is to be restarted. If
                        'last' is given as argument the db is restarted from
                        the last good epoch.
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -k, --allow-fallback-keygen
                        Generate spread encryption key from Vertica. Use under
                        support guidance only.
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: restart_node [options]

Options:
  -h, --help            show this help message and exit
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts to be restarted
  -d DB, --database=DB  Name of database whose node is to be restarted
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  --new-host-ips=NEWHOSTS
                        comma-separated list of new IPs for the hosts to be
                        restarted
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  -F, --force           force the node to start and auto recover if necessary
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
  --waitfordown-timeout=WAITTIME
                        Seconds to wait until nodes to be restarted are down
-------------------------------------------------------------------------
Usage: restart_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database whose subcluster is to be restarted
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be restarted
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -s NEWHOSTS, --hosts=NEWHOSTS
                        Comma separated list of new hosts to rebind to the
                        nodes
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  -F, --force           Force the nodes in the subcluster to start and auto
                        recover if necessary
-------------------------------------------------------------------------
Usage: return_epoch [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database
  -p PASSWORD, --password=PASSWORD
                        Database password in single quotes
-------------------------------------------------------------------------
Usage: revive_db [options]

Options:
  -h, --help            show this help message and exit
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts to participate in
                        database
  -n NODEHOST, --node-and-host=NODEHOST
                        pair of nodename-hostname values delimited by "|" eg:
                        "v_testdb_node0001|10.0.0.1"Note: Each node-host pair
                        has to be specified as a new argument
  --communal-storage-location=COMMUNAL_STORAGE_LOCATION
                        Location of communal storage
  -x COMMUNAL_STORAGE_PARAMS, --communal-storage-params=COMMUNAL_STORAGE_PARAMS
                        Location of communal storage parameter file
  -d DBNAME, --database=DBNAME
                        Name of database to be revived
  --force               Force cleanup of existing catalog directory
  --display-only        Describe the database on communal storage, and exit
  --strict-validation   Print warnings instead of raising errors while
                        validating cluster_config.json
-------------------------------------------------------------------------
Usage: sandbox_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be modified
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be sandboxed
  -b SBNAME, --sandbox=SBNAME
                        Name of the sandbox
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: set_restart_policy [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database for which to set policy
  -p POLICY, --policy=POLICY
                        Restart policy: ('never', 'ksafe', 'always')
-------------------------------------------------------------------------
Usage: set_ssl_params [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database whose parameters will be set
  -k KEYFILE, --ssl-key-file=KEYFILE
                        Path to SSL private key file
  -c CERTFILE, --ssl-cert-file=CERTFILE
                        Path to SSL certificate file
  -a CAFILE, --ssl-ca-file=CAFILE
                        Path to SSL CA file
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
-------------------------------------------------------------------------
Usage: show_active_db [options]

Options:
  -h, --help  show this help message and exit
-------------------------------------------------------------------------
Usage: start_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be started
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
  -F, --force           force the database to start at an epoch before data
                        consistency problems were detected.
  -U, --unsafe          Start database unsafely,  skipping recovery. Use under
                        support guidance only.
  -k, --allow-fallback-keygen
                        Generate spread encryption key from Vertica. Use under
                        support guidance only.
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts to be started
  --fast                Attempt fast startup on un-encrypted eon db. Fast
                        startup will use startup information from
                        cluster_config.json
-------------------------------------------------------------------------
Usage: stop_db [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be stopped
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -F, --force           Force the databases to shutdown, even if users are
                        connected.
  -z, --if-no-users     Only shutdown if no users are connected.
                        If any users are connected, exit with an error.
  -n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
                        Eon db only: seconds to wait for user connections to close.
                        Default value is 60 seconds.
                        When the time expires, connections will be forcibly closed
                        and the db will shut down.
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: stop_host [options]

Options:
  -h, --help            show this help message and exit
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts on which the vertica
                        process is to be killed using a SIGTERM signal
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
-------------------------------------------------------------------------
Usage: stop_node [options]

Options:
  -h, --help            show this help message and exit
  -s HOSTS, --hosts=HOSTS
                        comma-separated list of hosts on which the vertica
                        process is to be killed using a SIGTERM signal
  --compat21            (deprecated) Use Vertica 2.1 method using node names
                        instead of hostnames
-------------------------------------------------------------------------
Usage: stop_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database whose subcluster is to be stopped
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be stopped
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
                        Seconds to wait for user connections to close.
                        Default value is 60 seconds.
                        When the time expires, connections will be forcibly closed
                        and the db will shut down.
  -F, --force           Force the subcluster to shutdown immediately,
                        even if users are connected.
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: uninstall_package [options]

Options:
  -h, --help            show this help message and exit
  -d DBNAME, --dbname=DBNAME
                        database name
  -p PASSWORD, --password=PASSWORD
                        database admin password
  -P PACKAGE, --package=PACKAGE
                        specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: unsandbox_subcluster [options]

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database to be modified
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes
  -c SCNAME, --subcluster=SCNAME
                        Name of subcluster to be un-sandboxed
  --timeout=NONINTERACTIVE_TIMEOUT
                        set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i)
  -i, --noprompts       do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: upgrade_license_key --database mydb --license my_license.key
upgrade_license_key --install --license my_license.key

Updates the vertica license.

Without '--install', updates the license used by the database and
the admintools license cache.

With '--install', updates the license cache in admintools that
is used for creating new databases.

Options:
  -h, --help            show this help message and exit
  -d DB, --database=DB  Name of database. Cannot be used with --install.
  -l LICENSE, --license=LICENSE
                        Required - path to the license.
  -i, --install         When option is included, command will only update the
                        admintools license cache. Cannot be used with
                        --database.
  -p PASSWORD, --password=PASSWORD
                        Database password.
-------------------------------------------------------------------------
Usage: view_cluster [options]

Options:
  -h, --help            show this help message and exit
  -x, --xpand           show the full cluster state, node by node
  -d DB, --database=DB  filter the output for a single database

6 - Operating the database

This topic explains how to start and stop your Vertica database, and how to use the database index tool.

6.1 - Starting the database

You can start a database through one of the following:

Administration tools

You can start a database with the Vertica Administration Tools:

  1. Open the Administration Tools and select View Database Cluster State to make sure all nodes are down and no other database is running.

  2. Open the Administration Tools. See Using the administration tools for information about accessing the Administration Tools.

  3. On the Main Menu, select Start Database, and then select OK.

  4. Select the database to start, and then click OK.

  5. Enter the database password and click OK.

  6. When prompted that the database started successfully, click OK.

  7. Check the log files to make sure that no startup problems occurred.

Command line

You can start a database with the command line tool start_db:

$ /opt/vertica/bin/admintools -t start_db -d db-name
     [-p password]
     [-s host1[,...] | --hosts=host1[,...]]
     [--timeout seconds]
     [-i | --noprompts]
     [--fast]
     [-F | --force]
-d, --database
    Name of database to start.

-p, --password
    Required only during database creation or when you install a new license.

    If the license is valid, the option -p (or --password) is not required to start the database and is silently ignored. This is by design, as the database can only be started by the user who (as part of the verticadba UNIX user group) initially created the database or who has root or su privileges.

    If the license is invalid, Vertica uses the -p password argument to attempt to upgrade the license with the license file stored in /opt/vertica/config/share/license.key.

-s, --hosts
    (Eon Mode only) Comma-delimited list of primary node host names or IP addresses. If you use this option, start_db attempts to start the database using just the nodes in the list. If omitted, start_db starts all database nodes.

    For details, see Starting the database with a subset of primary nodes below.

--timeout
    Number of seconds to wait for startup to complete. If set to never, start_db never times out (implicitly sets -i).

-i, --noprompts
    Startup does not pause to await user input. Setting -i implies a timeout of 1200 seconds.

--fast
    (Eon Mode only) Attempts fast startup on a database using startup information from cluster_config.json. This option can only be used with databases that do not use Spread encryption.

-F, --force
    Forces the database to start at an epoch before data consistency problems were detected.

The following example uses start_db to start a single-node database:

$ /opt/vertica/bin/admintools -t start_db -d VMart
Info:
no password specified, using none
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (UP)
Database VMart started successfully

Eon Mode database node startup

On starting an Eon Mode database, you can start all primary nodes, or a subset of them. In both cases, pass start_db the list of the primary nodes to start with the -s option.

The following requirements apply:

  • Primary node hosts must already be up to start the database.
  • The start_db tool cannot start stopped hosts such as cloud-based VMs. You must either manually start the hosts or use the MC to start the cluster.

The following example starts the three primary nodes in a six-node Eon Mode database:

$ admintools -t start_db -d verticadb -p 'password' \
   -s 10.11.12.10,10.11.12.20,10.11.12.30
    Starting nodes:
        v_verticadb_node0001 (10.11.12.10)
        v_verticadb_node0002 (10.11.12.20)
        v_verticadb_node0003 (10.11.12.30)
    Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
    Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
    Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
    Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
    Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
    Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
    Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
    Node Status: v_verticadb_node0001: (UP) v_verticadb_node0002: (UP) v_verticadb_node0003: (UP)
Syncing catalog on verticadb with 2000 attempts.
Database verticadb: Startup Succeeded.  All Nodes are UP

After the database starts, the secondary subclusters are down. You can choose to start them as needed. See Starting a Subcluster.

Starting the database with a subset of primary nodes

As a best practice, Vertica recommends that you always start an Eon Mode database with all primary nodes. Occasionally, you might be unable to start the hosts for all primary nodes. In that case, you might need to start the database with a subset of its primary nodes.

If start_db specifies a subset of database primary nodes, the following requirements apply:

  • The nodes must comprise a quorum: at least 50% + 1 of all primary nodes in the cluster.
  • Collectively, the nodes must provide coverage for all shards in communal storage. The primary nodes you use to start the database do not attempt to rebalance shard subscriptions while starting up.

If either or both of these conditions are not met, start_db returns an error. In the following example, start_db specifies three primary nodes in a database with nine primary nodes. The command returns an error that it cannot start the database with fewer than five primary nodes:

$ admintools -t start_db -d verticadb -p 'password' \
    -s 10.11.12.10,10.11.12.20,10.11.12.30
    Starting nodes:
        v_verticadb_node0001 (10.11.12.10)
        v_verticadb_node0002 (10.11.12.20)
        v_verticadb_node0003 (10.11.12.30)
Error: Quorum not satisfied for verticadb.
    3 < minimum 5  of 9 primary nodes.
Attempted to start the following nodes:
Primary
        v_verticadb_node0001 (10.11.12.10)
        v_verticadb_node0003 (10.11.12.30)
        v_verticadb_node0002 (10.11.12.20)
Secondary

 hint: you may want to start all primary nodes in the database
Database start up failed.  Cluster partitioned.

If you try to start the database with fewer than the full set of primary nodes and the cluster fails to start, Vertica processes might continue to run on some of the hosts. If so, subsequent attempts to start the database will return with an error like this:

Error: the vertica process for the database is running on the following hosts:
10.11.12.10
10.11.12.20
10.11.12.30
This may be because the process has not completed previous shutdown activities. Please wait and retry again.
Database start up failed.  Processes still running.
Database verticadb did not start successfully: Processes still running.. Hint: you may need to start all primary nodes.

Before you can start the database, you must stop the Vertica server process on the hosts listed in the error message, either with the admintools menus or the admintools command line's stop_host tool:

$ admintools -t stop_host -s 10.11.12.10,10.11.12.20,10.11.12.30

6.2 - Stopping the database

There are many occasions when you must stop a database, for example, before an upgrade or performing various maintenance tasks. You can stop a running database through one of the following:

You cannot stop a running database if any users are connected or Database Designer is building or deploying a database design.

Administration tools

To stop a running database with admintools:

  1. Verify that all cluster nodes are up. If any nodes are down, identify and restart them.

  2. Close all user sessions:

    • Identify all users with active sessions by querying the SESSIONS system table. Notify users of the impending shutdown and request them to shut down their sessions.

    • Prevent users from starting new sessions by temporarily resetting configuration parameter MaxClientSessions to 0:

      => ALTER DATABASE DEFAULT SET MaxClientSessions = 0;
      
    • Close all remaining user sessions with Vertica functions CLOSE_SESSION and CLOSE_ALL_SESSIONS (see the example after these steps).

  3. Open Vertica Administration Tools.

  4. From the Main Menu:

    • Select Stop Database

    • Click OK

  5. Select the database to stop and click OK.

  6. Enter the password (if asked) and click OK.

  7. When prompted that database shutdown is complete, click OK.
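The following is a minimal vsql sketch of step 2. The columns selected from the SESSIONS system table are illustrative; see the SESSIONS reference for the full column list:

=> SELECT session_id, user_name, client_hostname FROM sessions;  -- identify active sessions
=> SELECT CLOSE_ALL_SESSIONS();                                   -- close any remaining sessions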

Vertica functions

You can stop a database with the SHUTDOWN function. By default, the shutdown fails if any users are connected. To force a shutdown regardless of active user connections, call SHUTDOWN with an argument of true:

=> SELECT SHUTDOWN('true');
      SHUTDOWN
-------------------------
Shutdown: sync complete
(1 row)

In Eon Mode databases, you can stop subclusters with the SHUTDOWN_SUBCLUSTER and SHUTDOWN_WITH_DRAIN functions. SHUTDOWN_SUBCLUSTER shuts down subclusters immediately, whereas SHUTDOWN_WITH_DRAIN performs a graceful shutdown that drains client connections from subclusters before shutting them down. For more information, see Starting and stopping subclusters.

The following example demonstrates how you can shut down all subclusters in an Eon Mode database using SHUTDOWN_WITH_DRAIN:

=> SELECT SHUTDOWN_WITH_DRAIN('', 0);
NOTICE 0:  Begin shutdown of subcluster (default_subcluster, analytics)
                SHUTDOWN_WITH_DRAIN
-----------------------------------------------------------------------
Shutdown message sent to subcluster (default_subcluster, analytics)

(1 row)
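To shut down a single subcluster immediately, without draining its client connections, you can instead call SHUTDOWN_SUBCLUSTER. The following sketch uses the analytics subcluster from the example above:

=> SELECT SHUTDOWN_SUBCLUSTER('analytics');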

Command line

You can stop a database with the admintools command stop_db:

$ admintools -t stop_db --help
Usage: stop_db [options]

Options:
  -h, --help            Show this help message and exit.
  -d DB, --database=DB  Name of database to be stopped.
  -p DBPASSWORD, --password=DBPASSWORD
                        Database password in single quotes.
  -F, --force           Force the database to shutdown, even if users are
                        connected.
  -z, --if-no-users     Only shutdown if no users are connected. If any users
                        are connected, exit with an error.
  -n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
                        Eon db only: seconds to wait for user connections to
                        close. Default value is 60 seconds. When the time
                        expires, connections will be forcibly closed and the
                        db will shut down.
  --timeout=NONINTERACTIVE_TIMEOUT
                        Set a timeout (in seconds) to wait for actions to
                        complete ('never') will wait forever (implicitly sets
                        -i).
  -i, --noprompts       Do not stop and wait for user input(default false).
                        Setting this implies a timeout of 20 min.

stop_db behavior depends on whether it stops an Eon Mode or Enterprise Mode database.

Stopping an Eon Mode database

In Eon Mode databases, the default behavior of stop_db is to call SHUTDOWN_WITH_DRAIN to gracefully shut down all subclusters in the database. This graceful shutdown process drains client connections from subclusters before shutting them down.

The stop_db option -n (--drain-seconds) lets you specify the number of seconds to wait—by default, 60—before forcefully closing client connections and shutting down all subclusters. If you set a negative -n value, the subclusters are marked as draining but do not shut down until all active user sessions disconnect.

In the following example, the database initially has an active client session, but the session closes before the timeout limit is reached and the database shuts down:

$ admintools -t stop_db -d verticadb --password password --drain-seconds 200
Shutdown will use connection draining.
Shutdown will wait for all client sessions to complete, up to 200 seconds
Then it will force a shutdown.
Poller has been running for 0:00:00.000025 seconds since 2022-07-27 17:10:08.292919


------------------------------------------------------------
client_sessions     |node_count          |node_names
--------------------------------------------------------------
0                   |5                   |v_verticadb_node0005,v_verticadb_node0006,v_verticadb_node0003,v_verticadb_node0...
1                   |1                   |v_verticadb_node0001
STATUS: vertica.engine.api.db_client.module is still running on 1 host: nodeIP as of 2022-07-27 17:10:18. See /opt/vertica/log/adminTools.log for full details.
Poller has been running for 0:00:11.371296 seconds since 2022-07-27 17:10:08.292919

...

------------------------------------------------------------
client_sessions     |node_count          |node_names
--------------------------------------------------------------
0                   |5                   |v_verticadb_node0002,v_verticadb_node0004,v_verticadb_node0003,v_verticadb_node0...
1                   |1                   |v_verticadb_node0001
Stopping poller drain_status because it was canceled
Shutdown metafunction complete. Polling until database processes have stopped.
Database verticadb stopped successfully

If you use the -z (--if-no-users) option, the database shuts down immediately if there are no active user sessions. Otherwise, the stop_db command returns an error:

$ admintools -t stop_db -d verticadb --password password --if-no-users
Running shutdown metafunction. Not using connection draining

             Active session details
| Session id                        | Host Ip        | Connected User |
| ------- --                        | ---- --        | --------- ---- |
| v_verticadb_node0001-107720:0x257 | 192.168.111.31 | analyst        |
Database verticadb not stopped successfully for the following reason:
Shutdown did not complete. Message: Shutdown: aborting shutdown
Active sessions prevented shutdown.
Omit the option --if-no-users to close sessions. See stop_db --help.

You can use the -F (or --force) option to shut down all subclusters immediately, without checking for active user sessions or draining the subclusters.

Stopping an Enterprise Mode database

In Enterprise Mode databases, the default behavior of stop_db is to shut down the database only if there are no active sessions. If users are connected to the database, the command aborts with an error message and lists all active sessions. For example:

$ /opt/vertica/bin/admintools -t stop_db -d VMart
Info: no password specified, using none

        Active session details
| Session id                     | Host Ip       | Connected User |
| ------- --                     | ---- --       | --------- ---- |
| v_vmart_node0001-91901:0x162   | 10.20.100.247 | analyst        |
Database VMart not stopped successfully for the following reason:
Unexpected output from shutdown: Shutdown: aborting shutdown
NOTICE: Cannot shut down while users are connected

You can use the -F (or --force) option to override user connections and force a shutdown.

6.3 - CRC and sort order check

As a superuser, you can run the Index tool on a Vertica database to perform two tasks:

  • Run a cyclic redundancy check (CRC) on each block of existing data storage to check the data integrity of ROS data blocks.

  • Check that the sort order in ROS containers is correct.

If the database is down, invoke the Index tool from the Linux command line. If the database is up, invoke from VSQL with Vertica meta-function RUN_INDEX_TOOL:

Run CRC
  Database down: /opt/vertica/bin/vertica -D catalog-path -v
  Database up:   SELECT RUN_INDEX_TOOL ('checkcrc',... );

Check sort order
  Database down: /opt/vertica/bin/vertica -D catalog-path -I
  Database up:   SELECT RUN_INDEX_TOOL ('checksort',... );

If invoked from the command line, the Index tool runs only on the current node. However, you can run the Index tool on multiple nodes simultaneously.

Result output

The Index tool writes summary information about its operation to standard output; detailed information on results is logged in one of two locations, depending on the environment where you invoke the tool:

  • Linux command line: indextool.log in the database catalog directory

  • VSQL: vertica.log on the current node

For information about evaluating output for possible errors, see Evaluating CRC errors and Evaluating sort order errors.

Optimizing performance

You can optimize meta-function performance by narrowing the scope of the operation to one or more projections, and specifying the number of threads used to execute the function. For details, see RUN_INDEX_TOOL.

6.3.1 - Evaluating CRC errors

Vertica evaluates the CRC values in each ROS data block each time it fetches data from disk to process a query. If CRC errors occur while fetching data, the following information is written to the vertica.log file:

CRC Check Failure Details:
File Name:
File Offset:
Compressed size in file:
Memory Address of Read Buffer:
Pointer to Compressed Data:
Memory Contents:

The Event Manager is also notified of CRC errors, so you can use an SNMP trap to capture CRC errors:

"CRC mismatch detected on file <file_path>. File may be corrupted. Please check hardware and drivers."

If you run a query from vsql, ODBC, or JDBC, the query returns a FileColumnReader ERROR. This message indicates that a specific block's CRC does not match a given record as follows:

hint: Data file may be corrupt.  Ensure that all hardware (disk and memory) is working properly.
Possible solutions are to delete the file <pathname> while the node is down, and then allow the node
to recover, or truncate the table data.
code: ERRCODE_DATA_CORRUPTED

6.3.2 - Evaluating sort order errors

If ROS data is not sorted correctly in the projection's order, query results that rely on sorted data will be incorrect. You can use the Index tool to check the ROS sort order if you suspect or detect incorrect query results. The Index tool evaluates each ROS row to determine whether it is sorted correctly. If the check locates a row that is not in order, it writes an error message to the log file with the row number and contents of the unsorted row.

Reviewing errors

  1. Open the indextool.log file. For example:

    $ cd VMart/v_check_node0001_catalog
    
  2. Look for error messages that include an OID number and the string Sort Order Violation. For example:

    <INFO> ...on oid 45035996273723545: Sort Order Violation:
    
  3. Find detailed information about the sort order violation string by running grep on indextool.log. For example, the following command returns the line before each string (-B1), and the four lines that follow (-A4):

    [15:07:55][vertica-s1]: grep -B1 -A4 'Sort Order Violation:' /my_host/databases/check/v_check_node0001_catalog/indextool.log
    2012-06-14 14:07:13.686 unknown:0x7fe1da7a1950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723537:
    Sort Order Violation:
    Row Position: 624
    Column Index: 0
    Last Row: 2576000
    This Row: 2575000
    --
    2012-06-14 14:07:13.687 unknown:0x7fe1dafa2950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723545:
    Sort Order Violation:
    Row Position: 3
    Column Index: 0
    Last Row: 4
    This Row: 2
    --
    
  4. Find the projection where a sort order violation occurred by querying system table STORAGE_CONTAINERS. Use a storage_oid equal to the OID value listed in indextool.log. For example:

    => SELECT * FROM storage_containers WHERE storage_oid = 45035996273723545;
    

7 - Working with native tables

You can create two types of native tables in Vertica (ROS format), columnar and flexible. You can create both types as persistent or temporary. You can also create views that query a specific set of table columns.

The tables described in this section store their data in and are managed by the Vertica database. Vertica also supports external tables, which are defined in the database and store their data externally. For more information about external tables, see Working with external data.

7.1 - Creating tables

Use the CREATE TABLE statement to create a native table in the Vertica logical schema. You can specify the columns directly, as in the following example, or you can derive a table definition from another table using a LIKE or AS clause. You can specify constraints, partitioning, segmentation, and other factors. For details and restrictions, see the reference page.

The following example shows a basic table definition:

=> CREATE TABLE orders(
    orderkey    INT,
    custkey     INT,
    prodkey     ARRAY[VARCHAR(10)],
    orderprices ARRAY[DECIMAL(12,2)],
    orderdate   DATE
);

Table data storage

Unlike traditional databases that store data in tables, Vertica physically stores table data in projections, which are collections of table columns. Projections store data in a format that optimizes query execution. Similar to materialized views, they store result sets on disk rather than compute them each time they are used in a query.

In order to query or perform any operation on a Vertica table, the table must have one or more projections associated with it. For more information, see Projections.
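
For example, the following query lists the projections associated with the orders table by querying the PROJECTIONS system table. This sketch assumes a superprojection was automatically created for orders when data was first loaded:

=> SELECT projection_name, anchor_table_name
   FROM projections
   WHERE anchor_table_name = 'orders';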

Deriving a table definition from the data

You can use the INFER_TABLE_DDL function to inspect Parquet, ORC, JSON, or Avro data and produce a starting point for a table definition. This function returns a CREATE TABLE statement, which might require further editing. For columns where the function could not infer the data type, the function labels the type as unknown and emits a warning. For VARCHAR and VARBINARY columns, you might need to adjust the length. Always review the statement the function returns; especially for tables with many columns, this function can save time and effort.

Parquet, ORC, and Avro files include schema information, but JSON files do not. For JSON, the function inspects the raw data to produce one or more candidate table definitions. See the function reference page for JSON examples.

In the following example, the function infers a complete table definition from Parquet input, but the VARCHAR columns use the default size and might need to be adjusted:

=> SELECT INFER_TABLE_DDL('/data/people/*.parquet'
        USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311:  This generated statement contains one or more varchar/varbinary columns which default to length 80
                    INFER_TABLE_DDL
-------------------------------------------------------------------------
 create table "employees"(
  "employeeID" int,
  "personal" Row(
    "name" varchar,
    "address" Row(
      "street" varchar,
      "city" varchar,
      "zipcode" int
    ),
    "taxID" int
  ),
  "department" varchar
 );
(1 row)

For Parquet files, you can use the GET_METADATA function to inspect a file and report metadata including information about columns.

7.2 - Creating temporary tables

CREATE TEMPORARY TABLE creates a table whose data persists only during the current session. Temporary table data is never visible to other sessions.

By default, all temporary table data is transaction-scoped—that is, the data is discarded when a COMMIT statement ends the current transaction. If CREATE TEMPORARY TABLE includes the parameter ON COMMIT PRESERVE ROWS, table data is retained until the current session ends.

Temporary tables can be used to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated—for example, the tool first gets a result set, then queries the result set, and so on.

When you create a temporary table, Vertica automatically generates a default projection for it. For more information, see Auto-projections.

Global versus local tables

CREATE TEMPORARY TABLE can create tables at two scopes, global and local, through the keywords GLOBAL and LOCAL, respectively:

  • Global temporary tables: Vertica creates global temporary tables in the public schema. Definitions of these tables are visible to all sessions, and persist across sessions until they are explicitly dropped. Multiple users can access the table concurrently. Table data is session-scoped, so it is visible only to the session user, and is discarded when the session ends.

  • Local temporary tables: Vertica creates local temporary tables in the V_TEMP_SCHEMA namespace and inserts them transparently into the user's search path. These tables are visible only to the session where they are created. When the session ends, Vertica automatically drops the table and its data.
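
For example, the following statements (with hypothetical table names) create one temporary table at each scope:

=> CREATE GLOBAL TEMPORARY TABLE gtemp_sales (order_id INT, amount NUMERIC(10,2));
CREATE TABLE
=> CREATE LOCAL TEMPORARY TABLE ltemp_sales (order_id INT, amount NUMERIC(10,2));
CREATE TABLE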

Data retention

You can specify whether temporary table data is transaction- or session-scoped:

  • ON COMMIT DELETE ROWS (default): Vertica automatically removes all table data when each transaction ends.

  • ON COMMIT PRESERVE ROWS: Vertica preserves table data across transactions in the current session. Vertica automatically truncates the table when the session ends.

ON COMMIT DELETE ROWS

By default, Vertica removes all data from a temporary table, whether global or local, when the current transaction ends.

For example:

=> CREATE TEMPORARY TABLE tempDelete (a int, b int);
CREATE TABLE
=> INSERT INTO tempDelete VALUES(1,2);
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM tempDelete;
 a | b
---+---
 1 | 2
(1 row)

=> COMMIT;
COMMIT

=> SELECT * FROM tempDelete;
 a | b
---+---
(0 rows)

If desired, you can use DELETE within the same transaction multiple times, to refresh table data repeatedly.

ON COMMIT PRESERVE ROWS

You can specify that a temporary table retain data across transactions in the current session, by defining the table with the keywords ON COMMIT PRESERVE ROWS. Vertica automatically removes all data from the table only when the current session ends.

For example:

=> CREATE TEMPORARY TABLE tempPreserve (a int, b int) ON COMMIT PRESERVE ROWS;
CREATE TABLE
=> INSERT INTO tempPreserve VALUES (1,2);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM tempPreserve;
 a | b
---+---
 1 | 2
(1 row)

=> INSERT INTO tempPreserve VALUES (3,4);
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM tempPreserve;
 a | b
---+---
 1 | 2
 3 | 4
(2 rows)

Eon restrictions

The following Eon Mode restrictions apply to temporary tables:

  • K-safety of temporary tables is always set to 0, regardless of system K-safety. If a CREATE TEMPORARY TABLE statement sets k-num greater than 0, Vertica returns a warning.
  • If subscriptions to the current session change, temporary tables in that session become inaccessible. Causes for session subscription changes include:

    • A node left the list of participating nodes.

    • A new node appeared in the list of participating nodes.

    • An active node changed for one or more shards.

    • A mergeout operation in the same session that is triggered by a user explicitly invoking DO_TM_TASK('mergeout'), or changing a column data type with ALTER TABLE...ALTER COLUMN.

7.3 - Creating a table from other tables

You can create a table from other tables in two ways: by replicating an existing table's definition with the LIKE clause, or by creating a table from query results with the AS clause. The following sections describe both methods.

7.3.1 - Replicating a table

You can create a table from an existing one using CREATE TABLE with the LIKE clause:

CREATE TABLE [ IF NOT EXISTS ] [[ { namespace. | database. } ]schema.]table
LIKE [[ { namespace. | database. } ]schema.]existing-table
  [ { INCLUDING | EXCLUDING } PROJECTIONS ]
  [ { INCLUDE | EXCLUDE } [SCHEMA] PRIVILEGES ]

Creating a table with LIKE replicates the source table definition and any storage policy associated with it. Table data and expressions on columns are not copied to the new table.

The user performing the operation owns the new table.

The source table cannot have out-of-date projections and cannot be a temporary table.

Copying constraints

CREATE TABLE LIKE copies all table constraints except for:

  • Foreign key constraints.
  • Sequence column constraints.

For any column that obtains its values from a sequence, including IDENTITY columns, Vertica copies the column into the new table, but removes the original constraint. For example, the following table definition sets an IDENTITY constraint on the ID column:

=> CREATE TABLE public.Premium_Customer
  (
    ID IDENTITY,
    lname varchar(25),
    fname varchar(25),
    store_membership_card int
  );

The following CREATE TABLE LIKE statement uses the source table Premium_Customer to create the replica All_Customers. Vertica removes the IDENTITY constraint, changing the column to an integer column with a NOT NULL constraint:

=> CREATE TABLE All_Customers LIKE Premium_Customer;
   CREATE TABLE

=> SELECT export_tables('','All_Customers');
                   export_tables
---------------------------------------------------
CREATE TABLE public.All_Customers
(
    ID int NOT NULL,
    lname varchar(25),
    fname varchar(25),
    store_membership_card int
);
(1 row)

Including projections

You can qualify the LIKE clause with INCLUDING PROJECTIONS or EXCLUDING PROJECTIONS, which specify whether to copy projections from the source table:

  • EXCLUDING PROJECTIONS (default): Do not copy projections from the source table.

  • INCLUDING PROJECTIONS: Copy current projections from the source table. Vertica names the new projections according to Vertica naming conventions, to avoid name conflicts with existing objects.

Including schema privileges

You can specify default inheritance of schema privileges for the new table:

  • EXCLUDE [SCHEMA] PRIVILEGES (default) disables inheritance of privileges from the schema

  • INCLUDE [SCHEMA] PRIVILEGES grants the table the same privileges granted to its schema

For more information see Setting privilege inheritance on tables and views.
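
For example, the following statement (with a hypothetical table name) creates a replica of Premium_Customer that inherits the privileges granted on its schema:

=> CREATE TABLE Customer_Archive LIKE Premium_Customer INCLUDE SCHEMA PRIVILEGES;
CREATE TABLE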

Examples

  1. Create the table states:

    => CREATE TABLE states (
         state char(2) NOT NULL, bird varchar(20), tree varchar (20), tax float, stateDate char (20))
         PARTITION BY state;
    
  2. Populate the table with data:

    INSERT INTO states VALUES ('MA', 'chickadee', 'american_elm', 5.675, '07-04-1620');
    INSERT INTO states VALUES ('VT', 'Hermit_Thrasher', 'Sugar_Maple', 6.0, '07-04-1610');
    INSERT INTO states VALUES ('NH', 'Purple_Finch', 'White_Birch', 0, '07-04-1615');
    INSERT INTO states VALUES ('ME', 'Black_Cap_Chickadee', 'Pine_Tree', 5, '07-04-1615');
    INSERT INTO states VALUES ('CT', 'American_Robin', 'White_Oak', 6.35, '07-04-1618');
    INSERT INTO states VALUES ('RI', 'Rhode_Island_Red', 'Red_Maple', 5, '07-04-1619');
    
  3. View the table contents:

    => SELECT * FROM states;
     state |        bird         |     tree     |  tax  |      stateDate
    -------+---------------------+--------------+-------+----------------------
     VT    | Hermit_Thrasher     | Sugar_Maple  |     6 | 07-04-1610
     CT    | American_Robin      | White_Oak    |  6.35 | 07-04-1618
     RI    | Rhode_Island_Red    | Red_Maple    |     5 | 07-04-1619
     MA    | chickadee           | american_elm | 5.675 | 07-04-1620
     NH    | Purple_Finch        | White_Birch  |     0 | 07-04-1615
     ME    | Black_Cap_Chickadee | Pine_Tree    |     5 | 07-04-1615
    (6 rows)
    
  4. Create a sample projection and refresh:

    => CREATE PROJECTION states_p AS SELECT state FROM states;
    
    => SELECT START_REFRESH();
    
  5. Create a table like the states table and include its projections:

    => CREATE TABLE newstates LIKE states INCLUDING PROJECTIONS;
    
  6. View projections for the two tables. Vertica has copied projections from states to newstates:

    => \dj
                                                          List of projections
                Schema             |                   Name                    |  Owner  |       Node       | Comment
    -------------------------------+-------------------------------------------+---------+------------------+---------
     public                        | newstates_b0                              | dbadmin |                  |
     public                        | newstates_b1                              | dbadmin |                  |
     public                        | newstates_p_b0                            | dbadmin |                  |
     public                        | newstates_p_b1                            | dbadmin |                  |
     public                        | states_b0                                 | dbadmin |                  |
     public                        | states_b1                                 | dbadmin |                  |
     public                        | states_p_b0                               | dbadmin |                  |
     public                        | states_p_b1                               | dbadmin |                  |
    
  7. Query the new table:

    => SELECT * FROM newstates;
     state | bird | tree | tax | stateDate
    -------+------+------+-----+-----------
    (0 rows)
    

When you use the CREATE TABLE LIKE statement, storage policy objects associated with the table are also copied. Data added to the new table uses the same labeled storage location as the source table, unless you change the storage policy. For more information, see Working With Storage Locations.

7.3.2 - Creating a table from a query

CREATE TABLE can specify an AS clause to create a table from query results, as in the following example:

=> CREATE TABLE cust_basic_profile AS SELECT
     customer_key, customer_gender, customer_age, marital_status, annual_income, occupation
   FROM customer_dimension WHERE customer_age>18 AND customer_gender !='';
CREATE TABLE

=> SELECT customer_age, annual_income, occupation 
   FROM cust_basic_profile
   WHERE customer_age > 23 ORDER BY customer_age;
 customer_age | annual_income |     occupation
--------------+---------------+--------------------
           24 |        469210 | Hairdresser
           24 |        140833 | Butler
           24 |        558867 | Lumberjack
           24 |        529117 | Mechanic
           24 |        322062 | Acrobat
           24 |        213734 | Writer
           ...

Labeling the AS clause

You can embed a LABEL hint in an AS clause in two places:

  • Immediately after the AS keyword:

    => CREATE TABLE myTable AS /*+LABEL myLabel*/ ...
    
  • In the SELECT statement:

    => CREATE TABLE myTable AS SELECT /*+LABEL myLabel*/ ...
    

If the AS clause contains labels in both places, the first label has precedence.

Labels are invalid for external tables.

Loading historical data

You can specify that the query return historical data by adding AT followed by one of:

  • EPOCH LATEST: Return data up to but not including the current epoch. The result set includes data from the latest committed DML transaction.

  • EPOCH integer: Return data up to and including the specified epoch.

  • TIME 'timestamp': Return data from the epoch at the specified timestamp.

These options are ignored if used to query temporary or external tables.

See Epochs for additional information about how Vertica uses epochs.
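
For example, the following sketch creates a snapshot of customer_dimension as of the latest committed epoch; the new table name is hypothetical, and the clause placement follows the CREATE TABLE reference:

=> CREATE TABLE cust_dim_snapshot AS AT EPOCH LATEST
   SELECT * FROM customer_dimension;
CREATE TABLE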

Zero-width column handling

If the query returns a column with zero width, Vertica automatically converts it to a VARCHAR(80) column. For example:

=> CREATE TABLE example AS SELECT '' AS X;
CREATE TABLE

=> SELECT EXPORT_TABLES ('', 'example');
                       EXPORT_TABLES
----------------------------------------------------------
CREATE TABLE public.example
(
    X varchar(80)
);

Requirements and restrictions

  • If you create a temporary table from a query, you must specify ON COMMIT PRESERVE ROWS in order to load the result set into the table. Otherwise, Vertica creates an empty table.

  • If the query output has expressions other than simple columns, such as constants or functions, you must specify an alias for each expression, or list all columns in the column name list (see the example after this list).

  • You cannot use CREATE TABLE AS SELECT with a SELECT that returns values of complex types. You can, however, use CREATE TABLE LIKE.
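
The following sketch illustrates the alias requirement noted above; the table name and alias are hypothetical:

=> CREATE TABLE cust_counts AS
   SELECT customer_gender, COUNT(*) AS customer_count
   FROM customer_dimension
   GROUP BY customer_gender;
CREATE TABLE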

7.4 - Immutable tables

Many secure systems contain records that must be provably immune to change. Protective strategies such as row and block checksums incur high overhead. Moreover, these approaches are not foolproof against unauthorized changes, whether deliberate or inadvertent, by database administrators or other users with sufficient privileges.

Immutable tables are insert-only tables in which existing data cannot be modified, regardless of user privileges. Updating row values and deleting rows are prohibited. Certain changes to table metadata—for example, renaming table columns—are also prohibited, in order to prevent attempts to circumvent these restrictions. Flattened or external tables, which obtain their data from outside sources, cannot be set to be immutable.

You define an existing table as immutable with ALTER TABLE:

ALTER TABLE table SET IMMUTABLE ROWS;

Once set, table immutability cannot be reverted, and is immediately applied to all existing table data, and all data that is loaded thereafter. In order to modify the data of an immutable table, you must copy the data to a new table—for example, with COPY, CREATE TABLE...AS, or COPY_TABLE.

When you execute ALTER TABLE...SET IMMUTABLE ROWS on a table, Vertica sets two columns for that table in the system table TABLES. Together, these columns show when the table was made immutable:

  • immutable_rows_since_timestamp: Server system time when immutability was applied. This is valuable for long-term timestamp retrieval and efficient comparison.
  • immutable_rows_since_epoch: The epoch that was current when immutability was applied. This setting can help protect the table from attempts to pre-insert records with a future timestamp, so that the row's epoch is less than the table's immutability epoch.
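
For example, after making a table immutable, you can check both columns in the TABLES system table. The table name ledger in this sketch is hypothetical:

=> SELECT table_name, immutable_rows_since_timestamp, immutable_rows_since_epoch
   FROM tables
   WHERE table_name = 'ledger';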

Enforcement

The following operations are prohibited on immutable tables:

The following partition management functions are disallowed when the target table is immutable:

Allowed operations

In general, you can execute any DML operation on an immutable table that does not affect existing row data—for example, add rows with COPY or INSERT. After you add data to an immutable table, it cannot be changed.

Other allowed operations fall generally into two categories:

7.5 - Disk quotas

By default, schemas and tables are limited only by available disk space and license capacity. You can set disk quotas for schemas or individual tables, for example, to support multi-tenancy. Setting, modifying, or removing a disk quota requires superuser privileges.

Most user operations that increase storage size enforce disk quotas. A table can temporarily exceed its quota during some operations such as recovery. If you lower a quota below the current usage, no data is lost but you cannot add more. Treat quotas as advisory, not as hard limits.

A schema quota, if set, must be at least as large as the largest table quota within it.

A disk quota is a string composed of an integer and a unit of measure (K, M, G, or T), such as '15G' or '1T'. Do not use a space between the number and the unit. No other units of measure are supported.

To set a quota at creation time, use the DISK_QUOTA option for CREATE SCHEMA or CREATE TABLE:

=> CREATE SCHEMA internal DISK_QUOTA '10T';
CREATE SCHEMA

=> CREATE TABLE internal.sales (...) DISK_QUOTA '5T';
CREATE TABLE

=> CREATE TABLE internal.leads (...) DISK_QUOTA '12T';
ROLLBACK 0:  Table can not have a greater disk quota than its Schema

To modify, add, or remove a quota on an existing schema or table, use ALTER SCHEMA or ALTER TABLE:

=> ALTER SCHEMA internal DISK_QUOTA '20T';
ALTER SCHEMA

=> ALTER TABLE internal.sales DISK_QUOTA SET NULL;
ALTER TABLE

You can set a quota that is lower than the current usage. The ALTER operation succeeds, the schema or table is temporarily over quota, and you cannot perform operations that increase data usage.

Data that is counted

In Eon Mode, disk usage is an aggregate of all space used by all shards for the schema or table. This value is computed for primary subscriptions only.

In Enterprise Mode, disk usage is the sum of the space used by all storage containers on all nodes for the schema or table. This sum excludes buddy projections but includes all other projections.

Disk usage is calculated based on compressed size.

When quotas are applied

Quotas, if present, affect most DML and ILM operations, including:

The following example shows a failure caused by exceeding a table's quota:

=> CREATE TABLE stats(score int) DISK_QUOTA '1k';
CREATE TABLE

=> COPY stats FROM STDIN;
1
2
3
4
5
\.
ERROR 0: Disk Quota Exceeded for the Table object public.stats
HINT: Delete data and PURGE or increase disk quota at the table level

DELETE does not free space, because deleted data is still preserved in the storage containers. The delete vector that is added by a delete operation does not count against a quota, so deleting is a quota-neutral operation. Disk space for deleted data is reclaimed when you purge it; see Removing table data.
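
For example, after deleting rows from the stats table above, you might reclaim their space by purging the table (the AHM must first advance past the delete epoch; see Removing table data):

=> SELECT PURGE_TABLE('public.stats');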

Some uncommon operations, such as ADD COLUMN, RESTORE, and SWAP PARTITION, can create new storage containers during the transaction. These operations clean up the extra locations upon completion, but while the operation is in progress, a table or schema could exceed its quota. If you get disk-quota errors during these operations, you can temporarily increase the quota, perform the operation, and then reset it.

Quotas do not affect recovery, rebalancing, or Tuple Mover operations.

Monitoring

The DISK_QUOTA_USAGES system table shows current disk usage for tables and schemas that have quotas. This table does not report on objects that do not have quotas.

You can use this table to monitor usage and make decisions about adjusting quotas:

=> SELECT * FROM DISK_QUOTA_USAGES;
    object_oid     | object_name | is_schema | total_disk_usage_in_bytes | disk_quota_in_bytes
-------------------+-------------+-----------+---------------------------+---------------------
 45035996273705100 | s           | t         |                       307 |               10240
 45035996273705104 | public.t    | f         |                       614 |                1024
 45035996273705108 | s.t         | f         |                       307 |                2048
(3 rows)

7.6 - Managing table columns

After you define a table, you can use ALTER TABLE to modify existing table columns. You can perform the following operations on a column:

7.6.1 - Renaming columns

You rename a column with ALTER TABLE as follows:

ALTER TABLE [schema.]table-name  RENAME [ COLUMN ] column-name TO new-column-name

The following example renames a column in the Retail.Product_Dimension table from Product_description to Item_description:

=> ALTER TABLE Retail.Product_Dimension
    RENAME COLUMN Product_description TO Item_description;

If you rename a column that is referenced by a view, the column does not appear in the result set of the view even if the view uses the wild card (*) to represent all columns in the table. Recreate the view to incorporate the column's new name.
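
For example, a view over the renamed column might be recreated as follows; the view name is hypothetical:

=> CREATE OR REPLACE VIEW item_descriptions AS
   SELECT Item_description FROM Retail.Product_Dimension;
CREATE VIEW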

7.6.2 - Changing scalar column data type

In general, you can change a column's data type with ALTER TABLE if doing so does not require storage reorganization. After you modify a column's data type, data that you load conforms to the new definition.

The sections that follow describe requirements and restrictions associated with changing a column with a scalar (primitive) data type. For information on modifying complex type columns, see Adding a new field to a complex type column.

Supported data type conversions

Vertica supports conversion for the following data types:

  • Binary: Expansion and contraction.

  • Character: All conversions between CHAR, VARCHAR, and LONG VARCHAR.

  • Exact numeric: All conversions between the following numeric data types: integer data types—INTEGER, INT, BIGINT, TINYINT, INT8, SMALLINT—and NUMERIC values of scale <=18 and precision 0. You cannot modify the scale of NUMERIC data types; however, you can change precision in the ranges (0-18), (19-37), and so on.

  • Collection: The following conversions are supported:

    • Collection of one element type to collection of another element type, if the source element type can be coerced to the target element type.
    • Between arrays and sets.
    • Collection type to the same type (array to array or set to set), to change bounds or binary size.

For details, see Changing Collection Columns.

Unsupported data type conversions

Vertica does not allow data type conversion on types that require storage reorganization:

  • Boolean

  • DATE/TIME

  • Approximate numeric type

  • BINARY to VARBINARY and vice versa

You also cannot change a column's data type if the column is one of the following:

  • Primary key

  • Foreign key

  • Included in the SEGMENTED BY clause of any projection for that table.

You can work around some of these restrictions. For details, see Working with column data conversions.

7.6.2.1 - Changing column width

You can expand columns within the same class of data type. Doing so is useful for storing larger items in a column. Vertica validates the data before it performs the conversion.

In general, you can also reduce column widths within the data type class. This is useful to reclaim storage if the original declaration was longer than you need, particularly with strings. You can reduce column width only if the following conditions are true:

  • Existing column data is no greater than the new width.

  • All nodes in the database cluster are up.

Otherwise, Vertica returns an error and the conversion fails. For example, if you try to convert a column from varchar(25) to varchar(10), Vertica allows the conversion as long as all column data is no more than 10 characters.

In the following example, columns y and z are initially defined as VARCHAR data types, and loaded with values 12345 and 654321, respectively. The attempt to reduce column z's width to 5 fails because it contains six-character data. The attempt to reduce column y's width to 5 succeeds because its content conforms with the new width:

=> CREATE TABLE t (x int, y VARCHAR, z VARCHAR);
CREATE TABLE
=> CREATE PROJECTION t_p1 AS SELECT * FROM t SEGMENTED BY hash(x) ALL NODES;
CREATE PROJECTION
=> INSERT INTO t values(1,'12345','654321');
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM t;
 x |   y   |   z
---+-------+--------
 1 | 12345 | 654321
(1 row)

=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ROLLBACK 2378:  Cannot convert column "z" to type "char(5)"
HINT:  Verify that the data in the column conforms to the new type
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE char(5);
ALTER TABLE

Changing collection columns

If a column is a collection data type, you can use ALTER TABLE to change either its bounds or its maximum binary size. These properties are set at table creation time and can then be altered.

You can make a collection bounded, setting its maximum number of elements, as in the following example.

=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int,10];
ALTER TABLE

=> \d test.t1
                                     List of Fields by Tables
 Schema | Table | Column |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
  test  |  t1   | arr    | array[int8, 10] |   80 |         | f        | f           |
(1 row)

Alternatively, you can set the binary size for the entire collection instead of setting bounds. Binary size is set either explicitly or from the DefaultArrayBinarySize configuration parameter. The following example creates an array column from the default, changes the default, and then uses ALTER TABLE to change it to the new default.

=> SELECT get_config_parameter('DefaultArrayBinarySize');
 get_config_parameter
----------------------
 100
(1 row)

=> CREATE TABLE test.t1 (arr array[int]);
CREATE TABLE

=> \d test.t1
                                     List of Fields by Tables
 Schema | Table | Column |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
  test  |  t1   | arr    | array[int8](96) |   96 |         | f        | f           |
(1 row)

=> ALTER DATABASE DEFAULT SET DefaultArrayBinarySize=200;
ALTER DATABASE

=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int];
ALTER TABLE

=> \d test.t1
                                     List of Fields by Tables
 Schema | Table | Column |      Type       | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
  test  |  t1   | arr    | array[int8](200)|  200 |         | f        | f           |
(1 row)

Alternatively, you can set the binary size explicitly instead of using the default value.

=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int](300);

Purging historical data

You cannot reduce a column's width if Vertica retains any historical data that exceeds the new width. To reduce the column width, first remove that data from the table:

  1. Advance the AHM to an epoch more recent than the historical data that needs to be removed from the table.

  2. Purge the table of all historical data that precedes the AHM with the function PURGE_TABLE.

Continuing the previous example, you can update the data in column t.z as follows:

=> UPDATE t SET z = '54321';
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM t;
 x |   y   |   z
---+-------+-------
 1 | 12345 | 54321
(1 row)

Although no data in column z now exceeds 5 characters, Vertica retains the history of its earlier data, so attempts to reduce the column width to 5 return an error:

=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ROLLBACK 2378:  Cannot convert column "z" to type "char(5)"
HINT:  Verify that the data in the column conforms to the new type

You can reduce the column width by purging the table's historical data as follows:

=> SELECT MAKE_AHM_NOW();
         MAKE_AHM_NOW
-------------------------------
 AHM set (New AHM Epoch: 6350)
(1 row)

=> SELECT PURGE_TABLE('t');
                                                     PURGE_TABLE
----------------------------------------------------------------------------------------------------------------------
 Task: purge operation
(Table: public.t) (Projection: public.t_p1_b0)
(Table: public.t) (Projection: public.t_p1_b1)

(1 row)

=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ALTER TABLE

7.6.2.2 - Working with column data conversions

Vertica conforms to the SQL standard by disallowing certain data conversions for table columns. However, you sometimes need to work around this restriction when you convert data from a non-SQL database. The following examples describe one such workaround, using the following table:

=> CREATE TABLE sales(id INT, price VARCHAR) UNSEGMENTED ALL NODES;
CREATE TABLE
=> INSERT INTO sales VALUES (1, '$50.00');
 OUTPUT
--------
      1
(1 row)

=> INSERT INTO sales VALUES (2, '$100.00');
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> SELECT * FROM SALES;
 id |  price
----+---------
  1 | $50.00
  2 | $100.00
(2 rows)

To convert the price column's existing data type from VARCHAR to NUMERIC, complete these steps:

  1. Add a new column for temporary use. Assign the column a NUMERIC data type, and derive its default value from the existing price column.

  2. Drop the original price column.

  3. Rename the new column to the original column.

Add a new column for temporary use

  1. Add a column temp_price to table sales for temporary use. Set the new column's data type to NUMERIC, and derive its default value from the existing price column by casting it to NUMERIC. Then query the table:

    => ALTER TABLE sales ADD COLUMN temp_price NUMERIC(10,2) DEFAULT
    SUBSTR(sales.price, 2)::NUMERIC;
    ALTER TABLE
    
    => SELECT * FROM SALES;
     id |  price  | temp_price
    ----+---------+------------
      1 | $50.00  |      50.00
      2 | $100.00 |     100.00
    (2 rows)
    
  2. Use ALTER TABLE to drop the default expression from the new column temp_price. Vertica retains the values stored in this column:

    => ALTER TABLE sales ALTER COLUMN temp_price DROP DEFAULT;
    ALTER TABLE
    

Drop the original price column

Drop the extraneous price column. Before doing so, you must first advance the AHM to purge historical data that would otherwise prevent the drop operation:

  1. Advance the AHM:

    => SELECT MAKE_AHM_NOW();
             MAKE_AHM_NOW
    -------------------------------
     AHM set (New AHM Epoch: 6354)
    (1 row)
    
  2. Drop the original price column:

    => ALTER TABLE sales DROP COLUMN price CASCADE;
    ALTER TABLE
    

Rename the new column to the original column

You can now rename the temp_price column to price:

  1. Use ALTER TABLE to rename the column:

    => ALTER TABLE sales RENAME COLUMN temp_price to price;
    
  2. Query the sales table again:

    => SELECT * FROM sales;
     id | price
    ----+--------
      1 |  50.00
      2 | 100.00
    (2 rows)
    

7.6.3 - Adding a new field to a complex type column

You can add new fields to columns of complex types (any combination or nesting of arrays and structs) in native tables. To add a field to an existing table's column, use a single ALTER TABLE statement.

Requirements and restrictions

The following are requirements and restrictions associated with adding a new field to a complex type column:

  • New fields can only be added to rows/structs.
  • The new type definition must contain all of the existing fields in the complex type column. Dropping existing fields from the complex type is not allowed. All of the existing fields in the new type must exactly match their definitions in the old type. This requirement also means that existing fields cannot be renamed.
  • New fields can only be added to columns of native (non-external) tables.
  • New fields can be added at any level within a nested complex type. For example, if you have a column defined as ROW(id INT, name ROW(given_name VARCHAR(20), family_name VARCHAR(20))), you can add a middle_name field to the nested ROW.
  • New fields can be of any type, either complex or primitive.
  • Blank field names are not allowed when adding new fields. Note that blank field names in complex type columns are allowed when creating the table. Vertica automatically assigns a name to each unnamed field.
  • Existing fields can be reordered. If you change the ordering of existing fields using ALTER TABLE, the change affects existing data as well as new data.
  • When you call ALTER COLUMN ... SET DATA TYPE to add a field to a complex type column, Vertica places an O lock on the table until the operation completes. The lock prevents DELETE, UPDATE, INSERT, and COPY statements from accessing the table, and blocks SELECT statements issued at SERIALIZABLE isolation level.
  • Performance is slower when adding a field to an array element than when adding a field to an element not nested in an array.

Examples

Adding a field

Consider a company storing customer data:

=> CREATE TABLE customers(id INT, name VARCHAR, address ROW(street VARCHAR, city VARCHAR, zip INT));
CREATE TABLE

The company has just decided to expand internationally, so now needs to add a country field:

=> ALTER TABLE customers ALTER COLUMN address
SET DATA TYPE ROW(street VARCHAR, city VARCHAR, zip INT, country VARCHAR);
ALTER TABLE

You can view the table definition to confirm the change:


=> \d customers
List of Fields by Tables
 Schema |   Table   | Column  |                                 Type                                 | Size | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+----------------------------------------------------------------------+------+---------+----------+-------------+-------------
 public | customers | id      | int                                                                  |    8 |         | f        | f           |
 public | customers | name    | varchar(80)                                                          |   80 |         | f        | f           |
 public | customers | address | ROW(street varchar(80),city varchar(80),zip int,country varchar(80)) |   -1 |         | f        | f           |
 (3 rows)

You can also see that the country field remains null for existing customers:

=> SELECT * FROM customers;
 id | name |                                    address
----+------+--------------------------------------------------------------------------------
  1 | mina | {"street":"1 allegheny square east","city":"hamden","zip":6518,"country":null}
 (1 row)

Common error messages

While you can add one or more fields with a single ALTER TABLE statement, existing fields cannot be removed. The following example throws an error because the city field is missing:

=> ALTER TABLE customers ALTER COLUMN address SET DATA TYPE ROW(street VARCHAR, state VARCHAR, zip INT, country VARCHAR);
ROLLBACK 2377:  Cannot convert column "address" from "ROW(varchar(80),varchar(80),int,varchar(80))" to type "ROW(varchar(80),varchar(80),int,varchar(80))"

Similarly, you cannot alter the type of an existing field. The following example will throw an error because the zip field's type cannot be altered:

=> ALTER TABLE customers ALTER COLUMN address SET DATA TYPE ROW(street VARCHAR, city VARCHAR, zip VARCHAR, country VARCHAR);
ROLLBACK 2377:  Cannot convert column "address" from "ROW(varchar(80),varchar(80),int,varchar(80))" to type "ROW(varchar(80),varchar(80),varchar(80),varchar(80))"

Additional properties

A complex type column's field order follows the order specified in the ALTER command, allowing you to reorder a column's existing fields. The following example reorders the fields of the address column:

=> ALTER TABLE customers ALTER COLUMN address
SET DATA TYPE ROW(street VARCHAR, country VARCHAR, city VARCHAR, zip INT);
ALTER TABLE

The table definition shows the address column's fields have been reordered:


=> \d customers
List of Fields by Tables
 Schema |   Table   | Column  |                                 Type                                 | Size | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+----------------------------------------------------------------------+------+---------+----------+-------------+-------------
 public | customers | id      | int                                                                  |    8 |         | f        | f           |
 public | customers | name    | varchar(80)                                                          |   80 |         | f        | f           |
 public | customers | address | ROW(street varchar(80),country varchar(80),city varchar(80),zip int) |   -1 |         | f        | f           |
 (3 rows)

Note that you cannot add new fields with empty names. When creating a complex table, however, you can omit field names, and Vertica automatically assigns a name to each unnamed field:

=> CREATE TABLE products(name VARCHAR, description ROW(VARCHAR));
CREATE TABLE

Because the field created in the description column has not been named, Vertica assigns it a default name. This default name can be checked in the table definition:

=> \d products
List of Fields by Tables
 Schema |  Table   |   Column    |        Type         | Size | Default | Not Null | Primary Key | Foreign Key
--------+----------+-------------+---------------------+------+---------+----------+-------------+-------------
 public | products | name        | varchar(80)         |   80 |         | f        | f           |
 public | products | description | ROW(f0 varchar(80)) |   -1 |         | f        | f           |
(2 rows)

Above, we see that the VARCHAR field in the description column was automatically assigned the name f0. When adding new fields, you must specify the existing Vertica-assigned field name:

=> ALTER TABLE products ALTER COLUMN description
SET DATA TYPE ROW(f0 VARCHAR(80), expanded_description VARCHAR(200));
ALTER TABLE

7.6.4 - Defining column values

You can define a column so Vertica automatically sets its value from an expression through one of the following clauses:

  • DEFAULT

  • SET USING

  • DEFAULT USING

DEFAULT

The DEFAULT option sets column values to a specified value. It has the following syntax:

DEFAULT default-expression

Default values are set when you:

  • Load new rows into a table, for example, with INSERT or COPY. Vertica populates DEFAULT columns in new rows with their default values. Values in existing rows, including columns with DEFAULT expressions, remain unchanged.

  • Execute UPDATE on a table and set the value of a DEFAULT column to DEFAULT:

    => UPDATE table-name SET column-name=DEFAULT;
    
  • Add a column with a DEFAULT expression to an existing table. Vertica populates the new column with its default values when it is added to the table.

Restrictions

DEFAULT expressions cannot specify volatile functions with ALTER TABLE...ADD COLUMN. To specify volatile functions, use CREATE TABLE or ALTER TABLE...ALTER COLUMN statements.
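
For example, the following sketch (the table name t_volatile is hypothetical) works around this restriction by adding the column first and then setting its volatile default with ALTER COLUMN:

=> CREATE TABLE public.t_volatile (id INT, r FLOAT DEFAULT RANDOM()); -- volatile default is allowed in CREATE TABLE
=> ALTER TABLE public.t_volatile ADD COLUMN r2 FLOAT;                 -- add the column without a default
=> ALTER TABLE public.t_volatile ALTER COLUMN r2 SET DEFAULT RANDOM(); -- then set the volatile default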

SET USING

The SET USING option sets the column value to an expression when the function REFRESH_COLUMNS is invoked on that column. This option has the following syntax:

SET USING using-expression

This approach is useful for large denormalized (flattened) tables, where multiple columns get their values by querying other tables.

Restrictions

SET USING has the following restrictions:

  • Volatile functions are not allowed.

  • The expression cannot specify a sequence.

  • Vertica limits the use of several meta-functions that copy table data: COPY_TABLE, COPY_PARTITIONS_TO_TABLE, MOVE_PARTITIONS_TO_TABLE, and SWAP_PARTITIONS_BETWEEN_TABLES:

    • If the source and target tables both have SET USING columns, the operation is permitted only if each source SET USING column has a corresponding target SET USING column.

    • If only the source table has SET USING columns, SWAP_PARTITIONS_BETWEEN_TABLES is disallowed.

    • If only the target table has SET USING columns, the operation is disallowed.

DEFAULT USING

The DEFAULT USING option sets DEFAULT and SET USING constraints on a column, equivalent to using DEFAULT and SET USING separately with the same expression on the same column. It has the following syntax:

DEFAULT USING expression

For example, the following column definitions are effectively identical:

=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
     DEFAULT USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));
=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
     DEFAULT (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid))
     SET USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));

DEFAULT USING supports the same expressions as SET USING and is subject to the same restrictions.

Supported expressions

DEFAULT and SET USING generally support the same expressions and are subject to the same restrictions, described below.

Expression restrictions

The following restrictions apply to DEFAULT and SET USING expressions:

  • The return value data type must match or be cast to the column data type.

  • The expression must return a value that conforms to the column bounds. For example, a column that is defined as a VARCHAR(1) cannot be set to a default string of abc.

  • In a temporary table, DEFAULT and SET USING do not support subqueries. If you try to create a temporary table where DEFAULT or SET USING uses a subquery expression, Vertica returns an error (see the sketch after this list).

  • A column's SET USING expression cannot specify another column in the same table that also sets its value with SET USING. Similarly, a column's DEFAULT expression cannot specify another column in the same table that also sets its value with DEFAULT, or whose value is automatically set to a sequence. However, a column's SET USING expression can specify another column that sets its value with DEFAULT.

  • DEFAULT and SET USING expressions only support one SELECT statement; attempts to include multiple SELECT statements in the expression return an error. For example, given table t1:

    => SELECT * FROM t1;
     a |    b
    ---+---------
     1 | hello
     2 | world
    (2 rows)
    

    Attempting to create table t2 with the following DEFAULT expression returns with an error:

    => CREATE TABLE t2 (aa int, bb varchar(30) DEFAULT (SELECT 'I said ')||(SELECT b FROM t1 where t1.a = t2.aa));
    ERROR 9745:  Expressions with multiple SELECT statements cannot be used in 'set using' query definitions
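
The temporary-table restriction behaves the same way. In the following sketch (the table and column names are hypothetical), the subquery in the DEFAULT expression causes Vertica to reject the CREATE TEMPORARY TABLE statement with an error instead of creating the table:

=> CREATE TEMPORARY TABLE temp_orders (
      cid INT,
      cust_name VARCHAR(20) DEFAULT (SELECT name FROM public.custDim WHERE custDim.cid = temp_orders.cid));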
    

Disambiguating predicate columns

If a SET USING or DEFAULT query expression joins two columns with the same name, the column names must include their table names. Otherwise, Vertica assumes that both columns reference the dimension table, and the predicate always evaluates to true.

For example, tables orderFact and custDim both include column cid. Flattened table orderFact defines column cust_name with a SET USING query expression. Because the query predicate references columns cid from both tables, the column names are fully qualified:

=> CREATE TABLE public.orderFact
(
    ...
    cid int REFERENCES public.custDim(cid),
    cust_name varchar(20) SET USING (
        SELECT name FROM public.custDim WHERE (custDIM.cid = orderFact.cid)),
    ...
)

Examples

Derive a column's default value from another column

  1. Create table t with two columns, date and state, and insert a row of data:

    => CREATE TABLE t (date DATE, state VARCHAR(2));
    CREATE TABLE
    => INSERT INTO t VALUES (CURRENT_DATE, 'MA');
     OUTPUT
    --------
          1
    (1 row)
    
    => COMMIT;
    COMMIT
    => SELECT * FROM t;
        date    | state
    ------------+-------
     2017-12-28 | MA
    (1 row)
    
  2. Use ALTER TABLE to add a third column that extracts the integer month value from column date:

    => ALTER TABLE t ADD COLUMN month INTEGER DEFAULT date_part('month', date);
    ALTER TABLE
    
  3. When you query table t, Vertica returns the number of the month in column date:

    => SELECT * FROM t;
        date    | state | month
    ------------+-------+-------
     2017-12-28 | MA    |    12
    (1 row)
    

Update default column values

  1. Update table t by subtracting 30 days from date:

    => UPDATE t SET date = date-30;
     OUTPUT
    --------
          1
    (1 row)
    
    => COMMIT;
    COMMIT
    => SELECT * FROM t;
        date    | state | month
    ------------+-------+-------
     2017-11-28 | MA    |    12
    (1 row)
    

    The value in month remains unchanged.

  2. Refresh the default value in month from column date:

    => UPDATE t SET month=DEFAULT;
     OUTPUT
    --------
          1
    (1 row)
    
    => COMMIT;
    COMMIT
    => SELECT * FROM t;
        date    | state | month
    ------------+-------+-------
     2017-11-28 | MA    |    11
    (1 row)
    

Derive a default column value from user-defined scalar function

This example shows a user-defined scalar function that adds two integer values. The function is called add2ints and takes two arguments.

  1. Develop and deploy the function, as described in Scalar functions (UDSFs).

  2. Create a sample table, t1, with two integer columns:

    => CREATE TABLE t1 ( x int, y int );
    CREATE TABLE
    
  3. Insert some values into t1:

    => insert into t1 values (1,2);
    OUTPUT
    --------
          1
    (1 row)
    => insert into t1 values (3,4);
     OUTPUT
    --------
          1
    (1 row)
    
  4. Use ALTER TABLE to add a column to t1, with the default column value derived from the UDSF add2ints:

    => alter table t1 add column z int default add2ints(x,y);
    ALTER TABLE
    
  5. List the new column:

    => select z from t1;
     z
    ----
      3
      7
    (2 rows)
    

Table with a SET USING column that queries another table for its values

  1. Define tables t1 and t2. Column t2.b is defined to get its data from column t1.b, through the query in its SET USING clause:

    => CREATE TABLE t1 (a INT PRIMARY KEY ENABLED, b INT);
    CREATE TABLE
    
    => CREATE TABLE t2 (a INT, alpha VARCHAR(10),
          b INT SET USING (SELECT t1.b FROM t1 WHERE t1.a=t2.a))
          ORDER BY a SEGMENTED BY HASH(a) ALL NODES;
    CREATE TABLE
    
  2. Populate the tables with data:

    => INSERT INTO t1 VALUES(1,11),(2,22),(3,33),(4,44);
    => INSERT INTO t2 VALUES (1,'aa'),(2,'bb');
    => COMMIT;
    COMMIT
    
  3. View the data in table t2. The SET USING column b is empty, pending invocation of the Vertica function REFRESH_COLUMNS:

    => SELECT * FROM t2;
     a | alpha | b
    ---+-------+---
     1 | aa    |
     2 | bb    |
    (2 rows)
    
  4. Refresh the column data in table t2 by calling function REFRESH_COLUMNS:

    => SELECT REFRESH_COLUMNS ('t2','b', 'REBUILD');
          REFRESH_COLUMNS
    ---------------------------
     refresh_columns completed
    (1 row)
    

    In this example, REFRESH_COLUMNS is called with the optional argument REBUILD. This argument specifies to replace all data in SET USING column b. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column. For details, see REFRESH_COLUMNS.

  5. View data in refreshed column b, whose data is obtained from table t1 as specified in the column's SET USING query:

    => SELECT * FROM t2 ORDER BY a;
      a | alpha | b
    ---+-------+----
     1 | aa    | 11
     2 | bb    | 22
    (2 rows)
    

Expressions with correlated subqueries

DEFAULT and SET USING expressions support subqueries that can obtain values from other tables, and use those with values in the current table to compute column values. The following example adds a column gmt_delivery_time to fact table customer_orders. The column specifies a DEFAULT expression to set values in the new column as follows:

  1. Calls the function NEW_TIME, which performs the following tasks:

    • Uses customer keys in customer_orders to query the customers dimension table for customer time zones.

    • Uses the queried time zone data to convert local delivery times to GMT.

  2. Populates the gmt_delivery_time column with the converted values.

=> CREATE TABLE public.customers(
    customer_key int,
    customer_name varchar(64),
    customer_address varchar(64),
    customer_tz varchar(5),
    ...);

=> CREATE TABLE public.customer_orders(
    customer_key int,
    order_number int,
    product_key int,
    product_version int,
    quantity_ordered int,
    store_key int,
    date_ordered date,
    date_shipped date,
    expected_delivery_date date,
    local_delivery_time timestamptz,
    ...);

=> ALTER TABLE customer_orders ADD COLUMN gmt_delivery_time timestamp
   DEFAULT NEW_TIME(customer_orders.local_delivery_time,
      (SELECT c.customer_tz FROM customers c WHERE (c.customer_key = customer_orders.customer_key)),
      'GMT');

7.7 - Altering table definitions

You can modify a table's definition with ALTER TABLE, in response to evolving database schema requirements. Changing a table definition is often more efficient than staging data in a temporary table, consuming fewer resources and less storage.

For information on making column-level changes, see Managing table columns. For details about changing and reorganizing table partitions, see Partitioning existing table data.

7.7.1 - Adding table columns

You add a column to a persistent table with ALTER TABLE...ADD COLUMN:

ALTER TABLE ...
ADD COLUMN [IF NOT EXISTS] column datatype
  [column-constraint]
  [ENCODING encoding-type]
  [PROJECTIONS (projections-list) | ALL PROJECTIONS ]

An ALTER TABLE statement can include more than one ADD COLUMN clause, separated by commas:

ALTER TABLE...
  ADD COLUMN pid INT NOT NULL,
  ADD COLUMN desc VARCHAR(200),
  ADD COLUMN region INT DEFAULT 1

Columns that use DEFAULT with static values, as shown in the previous example, can be added in a single ALTER TABLE statement. Columns that use non-static DEFAULT values must be added in separate ALTER TABLE statements.
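
For example, the following sketch (assuming a table public.orders with an order_date column; the new column names are hypothetical) adds a column with a static default, and then adds a column with a derived default in its own statement:

=> ALTER TABLE public.orders ADD COLUMN priority INT DEFAULT 1;
=> ALTER TABLE public.orders ADD COLUMN order_year INT DEFAULT date_part('year', order_date);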

Before you add columns to a table, verify that all its superprojections are up to date.

Table locking

When you use ADD COLUMN to alter a table, Vertica takes an O lock on the table until the operation completes. The lock prevents DELETE, UPDATE, INSERT, and COPY statements from accessing the table. The lock also blocks SELECT statements issued at SERIALIZABLE isolation level, until the operation completes.

Adding a column to a table does not affect K-safety of the physical schema design.

You can add columns when nodes are down.

Adding new columns to projections

When you add a column to a table, Vertica automatically adds the column to superprojections of that table. The ADD COLUMN clause can also specify to add the column to one or more non-superprojections, with one of these options:

  • PROJECTIONS (projections-list): Adds the new column to one or more projections of this table, specified as a comma-delimited list of projection base names. Vertica adds the column to all buddies of each projection. The projection list cannot include projections with pre-aggregated data such as live aggregate projections; otherwise, Vertica rolls back the ALTER TABLE statement.

  • ALL PROJECTIONS adds the column to all projections of this table, excluding projections with pre-aggregated data.

For example, the store_orders table has two projections, a superprojection (store_orders_super) and a user-created projection (store_orders_p). The following ALTER TABLE...ADD COLUMN statement adds a column to the store_orders table. Because the statement omits the PROJECTIONS option, Vertica adds the column only to the table's superprojection:

=> ALTER TABLE public.store_orders ADD COLUMN expected_ship_date date;
ALTER TABLE
=> SELECT projection_column_name, projection_name FROM projection_columns WHERE table_name ILIKE 'store_orders'
     ORDER BY projection_name , projection_column_name;
 projection_column_name |  projection_name
------------------------+--------------------
 order_date             | store_orders_p_b0
 order_no               | store_orders_p_b0
 ship_date              | store_orders_p_b0
 order_date             | store_orders_p_b1
 order_no               | store_orders_p_b1
 ship_date              | store_orders_p_b1
 expected_ship_date     | store_orders_super
 order_date             | store_orders_super
 order_no               | store_orders_super
 ship_date              | store_orders_super
 shipper                | store_orders_super
(11 rows)

The following ALTER TABLE...ADD COLUMN statement includes the PROJECTIONS option. This specifies to include projection store_orders_p in the add operation. Vertica adds the new column to this projection and the table's superprojection:

=> ALTER TABLE public.store_orders ADD COLUMN delivery_date date PROJECTIONS (store_orders_p);
=> SELECT projection_column_name, projection_name FROM projection_columns WHERE table_name ILIKE 'store_orders'
     ORDER BY projection_name, projection_column_name;
 projection_column_name |  projection_name
------------------------+--------------------
 delivery_date          | store_orders_p_b0
 order_date             | store_orders_p_b0
 order_no               | store_orders_p_b0
 ship_date              | store_orders_p_b0
 delivery_date          | store_orders_p_b1
 order_date             | store_orders_p_b1
 order_no               | store_orders_p_b1
 ship_date              | store_orders_p_b1
 delivery_date          | store_orders_super
 expected_ship_date     | store_orders_super
 order_date             | store_orders_super
 order_no               | store_orders_super
 ship_date              | store_orders_super
 shipper                | store_orders_super
(14 rows)
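
The ALL PROJECTIONS option adds the column to every projection of the table, excluding projections with pre-aggregated data. The following sketch (the column name backorder_qty is hypothetical) adds a column to all projections of store_orders:

=> ALTER TABLE public.store_orders ADD COLUMN backorder_qty INT ALL PROJECTIONS;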

Updating associated table views

Adding new columns to a table that has an associated view does not update the view's result set, even if the view uses a wildcard (*) to represent all table columns. To incorporate new columns, you must recreate the view.
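
For example, if a view over store_orders was created with SELECT * before the new columns were added, recreating the view picks up the new columns. A sketch, where the view name is hypothetical:

=> CREATE OR REPLACE VIEW public.store_orders_view AS SELECT * FROM public.store_orders;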

7.7.2 - Dropping table columns

ALTER TABLE...DROP COLUMN drops the specified table column and the ROS containers that correspond to the dropped column:

ALTER TABLE [schema.]table DROP [ COLUMN ] [IF EXISTS] column [CASCADE | RESTRICT]

After the drop operation completes, data backed up from the current epoch onward recovers without the column. Data recovered from a backup that precedes the current epoch re-adds the table column. Because drop operations physically purge object storage and catalog definitions (table history) from the table, AT EPOCH (historical) queries return nothing for the dropped column.

The altered table retains its object ID.

Restrictions

  • You cannot drop or alter a primary key column or a column that participates in the table partitioning clause.

  • You cannot drop the first column of any projection sort order, or columns that participate in a projection segmentation expression.

  • In Enterprise Mode, all nodes must be up. This restriction does not apply to Eon Mode.

  • You cannot drop a column associated with an access policy. Attempts to do so produce the following error:
    ERROR 6482: Failed to parse Access Policies for table "t1"

Using CASCADE to force a drop

If the table column to drop has dependencies, you must qualify the DROP COLUMN clause with the CASCADE option. For example, the target column might be specified in a projection sort order. In this and other cases, DROP COLUMN...CASCADE handles the dependency by reorganizing catalog definitions or dropping a projection. In all cases, CASCADE performs the minimal reorganization required to drop the column.

Use CASCADE to drop a column with the following dependencies:

  • Any constraint: Vertica drops the column when a FOREIGN KEY constraint depends on a UNIQUE or PRIMARY KEY constraint on the referenced columns.

  • Specified in projection sort order: Vertica truncates the projection's sort order up to and including the dropped column, without impact on physical storage for other columns, and then drops the specified column. For example, if a projection's columns are in sort order (a,b,c), dropping column b causes the projection's sort order to become just (a), omitting column (c).

  • Specified in a projection segmentation expression: The column to drop is integral to the projection definition. If possible, Vertica drops the projection, as long as doing so does not compromise K-safety; otherwise, the transaction rolls back.

  • Referenced as default value of another column: See Dropping a column referenced as default, below.

Dropping a column referenced as default

You might want to drop a table column that is referenced by another column as its default value. For example, the following table is defined with two columns, a and b, where b gets its default value from column a:

=> CREATE TABLE x (a int) UNSEGMENTED ALL NODES;
CREATE TABLE
=> ALTER TABLE x ADD COLUMN b int DEFAULT a;
ALTER TABLE

In this case, dropping column a requires the following procedure:

  1. Remove the default dependency through ALTER COLUMN...DROP DEFAULT:

    => ALTER TABLE x ALTER COLUMN b DROP DEFAULT;
    
  2. Create a replacement superprojection for the target table if one or both of the following conditions is true:

    • The target column is the table's first sort order column. If the table has no explicit sort order, the default table sort order specifies the first table column as the first sort order column. In this case, the new superprojection must specify a sort order that excludes the target column.

    • If the table is segmented, the target column is specified in the segmentation expression. In this case, the new superprojection must specify a segmentation expression that excludes the target column.

    Given the previous example, table x has a default sort order of (a,b). Because column a is the table's first sort order column, you must create a replacement superprojection that is sorted on column b:

    => CREATE PROJECTION x_p1 as select * FROM x ORDER BY b UNSEGMENTED ALL NODES;
    
  3. Run START_REFRESH:

    
    => SELECT START_REFRESH();
                  START_REFRESH
    ----------------------------------------
     Starting refresh background process.
    
    (1 row)
    
  4. Run MAKE_AHM_NOW:

    => SELECT MAKE_AHM_NOW();
             MAKE_AHM_NOW
    -------------------------------
     AHM set (New AHM Epoch: 1231)
    (1 row)
    
  5. Drop the column:

    => ALTER TABLE x DROP COLUMN a CASCADE;
    

Vertica implements the CASCADE directive as follows:

  • Drops the original superprojection for table x (x_super).

  • Updates the replacement superprojection x_p1 by dropping column a.

Examples

The following series of commands successfully drops a BYTEA data type column:

=> CREATE TABLE t (x BYTEA(65000), y BYTEA, z BYTEA(1));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624:  Column "y" does not exist
=> ALTER TABLE t DROP COLUMN x RESTRICT;
ALTER TABLE
=> SELECT x FROM t;
ERROR 2624:  Column "x" does not exist
=> SELECT * FROM t;
 z
---
(0 rows)
=> DROP TABLE t CASCADE;
DROP TABLE

The following series of commands tries to drop a FLOAT(8) column and fails because there are not enough projections to maintain K-safety.

=> CREATE TABLE t (x FLOAT(8),y FLOAT(08));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y RESTRICT;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624:  Column "y" does not exist
=> ALTER TABLE t DROP x CASCADE;
ROLLBACK 2409:  Cannot drop any more columns in t
=> DROP TABLE t CASCADE;

7.7.3 - Altering constraint enforcement

ALTER TABLE...ALTER CONSTRAINT can enable or disable enforcement of primary key, unique, and check constraints. You must qualify this clause with the keyword ENABLED or DISABLED:

  • ENABLED enforces the specified constraint.

  • DISABLED disables enforcement of the specified constraint.

For example:

ALTER TABLE public.new_sales ALTER CONSTRAINT C_PRIMARY ENABLED;
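
For example, to disable enforcement of the same constraint:

ALTER TABLE public.new_sales ALTER CONSTRAINT C_PRIMARY DISABLED;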

For details, see Constraint enforcement.

7.7.4 - Renaming tables

ALTER TABLE...RENAME TO renames one or more tables. Renamed tables retain their original OIDs.

You rename multiple tables by supplying two comma-delimited lists. Vertica maps the names according to their order in the two lists. Only the first list can qualify table names with a schema. For example:

=> ALTER TABLE S1.T1, S1.T2 RENAME TO U1, U2;

The RENAME TO parameter is applied atomically: all tables are renamed, or none of them. For example, if the number of tables to rename does not match the number of new names, none of the tables is renamed.

Using rename to swap tables within a schema

You can use ALTER TABLE...RENAME TO to swap tables within the same schema, without actually moving data. You cannot swap tables across schemas.

The following example swaps the data in tables T1 and T2 through intermediary table temp:

  1. t1 to temp

  2. t2 to t1

  3. temp to t2

=> DROP TABLE IF EXISTS temp, t1, t2;
DROP TABLE
=> CREATE TABLE t1 (original_name varchar(24));
CREATE TABLE
=> CREATE TABLE t2 (original_name varchar(24));
CREATE TABLE
=> INSERT INTO t1 VALUES ('original name t1');
 OUTPUT
--------
      1
(1 row)

=> INSERT INTO t2 VALUES ('original name t2');
 OUTPUT
--------
      1
(1 row)

=> COMMIT;
COMMIT
=> ALTER TABLE t1, t2, temp RENAME TO temp, t1, t2;
ALTER TABLE
=> SELECT * FROM t1, t2;
  original_name   |  original_name
------------------+------------------
 original name t2 | original name t1
(1 row)

7.7.5 - Moving tables to another schema

ALTER TABLE...SET SCHEMA moves a table from one schema to another. Vertica automatically moves all projections that are anchored to the source table to the destination schema. It also moves all IDENTITY columns to the destination schema.

Moving a table across schemas requires that you have USAGE privileges on the current schema and CREATE privileges on destination schema. You can move only one table between schemas at a time. You cannot move temporary tables across schemas.

Name conflicts

If a table of the same name or any of the projections that you want to move already exist in the new schema, the statement rolls back and does not move either the table or any projections. To work around name conflicts:

  1. Rename any conflicting table or projections that you want to move.

  2. Run ALTER TABLE...SET SCHEMA again.
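
For example, the following sketch (assuming the conflicting table in schema S2 is also named T1) renames the conflicting table and then retries the move:

=> ALTER TABLE S2.T1 RENAME TO T1_archive;
=> ALTER TABLE S1.T1 SET SCHEMA S2;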

Example

The following example moves table T1 from schema S1 to schema S2. All projections that are anchored on table T1 automatically move to schema S2:

=> ALTER TABLE S1.T1 SET SCHEMA S2;

7.7.6 - Changing table ownership

As a superuser or table owner, you can reassign table ownership with ALTER TABLE...OWNER TO, as follows:

ALTER TABLE [schema.]table-name OWNER TO owner-name

Changing table ownership is useful when moving a table from one schema to another. It is also useful when a table owner leaves the company or changes job responsibilities. Because you can reassign ownership, the table does not need to be completely rewritten, so you can avoid a loss in productivity.

Changing table ownership automatically causes the following changes:

  • Grants on the table that were made by the original owner are dropped, and all existing privileges on the table are revoked from the previous owner. Changes in table ownership have no effect on schema privileges.

  • Ownership of dependent IDENTITY sequences is transferred with the table. However, ownership does not change for named sequences created with CREATE SEQUENCE. To transfer ownership of these sequences, use ALTER SEQUENCE.

  • New table ownership is propagated to its projections.

Example

In this example, user Bob connects to the database, looks up the tables, and transfers ownership of table t33 from himself to user Alice.

=> \c - Bob
You are now connected as user "Bob".
=> \d
 Schema |  Name  | Kind  |  Owner  | Comment
--------+--------+-------+---------+---------
 public | applog | table | dbadmin |
 public | t33    | table | Bob     |
(2 rows)
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE

When Bob looks up database tables again, he no longer sees table t33:

=> \d
               List of tables
 Schema |  Name  | Kind  |  Owner  | Comment
--------+--------+-------+---------+---------
 public | applog | table | dbadmin |
(1 row)

When user Alice connects to the database and looks up tables, she sees she is the owner of table t33.

=> \c - Alice
You are now connected as user "Alice".
=> \d
             List of tables
 Schema | Name | Kind  | Owner | Comment
--------+------+-------+-------+---------
 public | t33  | table | Alice |
(1 row)

Alice or a superuser can transfer table ownership back to Bob. In the following case a superuser performs the transfer.

=> \c - dbadmin
You are now connected as user "dbadmin".
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> \d
                List of tables
 Schema |   Name   | Kind  |  Owner  | Comment
--------+----------+-------+---------+---------
 public | applog   | table | dbadmin |
 public | comments | table | dbadmin |
 public | t33      | table | Bob     |
 s1     | t1       | table | User1   |
(4 rows)

You can also query system table TABLES to view table and owner information. Note that a change in ownership does not change the table ID.

In the below series of commands, the superuser changes table ownership back to Alice and queries the TABLES system table.


=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273724576 | Alice
(4 rows)

Now the superuser changes table ownership back to Bob and queries the TABLES table again. Nothing changes except the owner_name value for t33, which changes from Alice to Bob.

=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273714428 | Bob
(5 rows)

7.8 - Sequences

Sequences can be used to set the default values of columns to sequential integer values. Sequences guarantee uniqueness, and help avoid constraint enforcement problems and overhead. Sequences are especially useful for primary key columns.

While sequence object values are guaranteed to be unique, they are not guaranteed to be contiguous. For example, two nodes can increment a sequence at different rates. The node with a heavier processing load increments the sequence, but the values are not contiguous with those being incremented on a node with less processing. For details, see Distributing sequences.

Vertica supports the following sequence types:

  • Named sequences are database objects that generate unique numbers in sequential ascending or descending order. Named sequences are defined independently through CREATE SEQUENCE statements, and are managed independently of the tables that reference them. A table can set the default values of one or more columns to named sequences.
  • IDENTITY column sequences increment or decrement a column's value as new rows are added. Unlike named sequences, IDENTITY sequence types are defined in a table's DDL, so they do not persist independently of that table. A table can contain only one IDENTITY column.

7.8.1 - Sequence types compared

The following table lists the differences between the two sequence types:

Supported Behavior Named Sequence IDENTITY
Default cache value 250K
Set initial cache
Define start value
Specify increment unit
Exists as an independent object
Exists only as part of table
Create as column constraint
Requires name
Use in expressions
Unique across tables
Change parameters
Move to different schema
Set to increment or decrement
Grant privileges to object
Specify minimum value
Specify maximum value

7.8.2 - Named sequences

Named sequences are sequences that are defined by CREATE SEQUENCE. Unlike IDENTITY sequences, which are defined in a table's DDL, you create a named sequence as an independent object, and then set it as the default value of a table column.

Named sequences are used most often when an application requires a unique identifier in a table or an expression. After a named sequence returns a value, it never returns the same value again in the same session.

7.8.2.1 - Creating and using named sequences

You create a named sequence with CREATE SEQUENCE. The statement requires only a sequence name; all other parameters are optional. To create a sequence, a user must have CREATE privileges on a schema that contains the sequence.

The following example creates an ascending named sequence, my_seq, starting at the value 100:

=> CREATE SEQUENCE my_seq START 100;
CREATE SEQUENCE

Incrementing and decrementing a sequence

When you create a named sequence object, you can also specify its increment or decrement value by setting its INCREMENT parameter. If you omit this parameter, as in the previous example, the default is set to 1.
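
For example, the following sketch (the sequence name is hypothetical) creates a descending sequence that starts at 100 and decrements by 1 on each call to NEXTVAL, down to a minimum of 1:

=> CREATE SEQUENCE countdown_seq INCREMENT BY -1 MINVALUE 1 START WITH 100;
CREATE SEQUENCE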

You increment or decrement a sequence by calling the function NEXTVAL on it—either directly on the sequence itself, or indirectly by adding new rows to a table that references the sequence. When called for the first time on a new sequence, NEXTVAL initializes the sequence to its start value. Vertica also creates a cache for the sequence. Subsequent NEXTVAL calls on the sequence increment its value.

The following call to NEXTVAL initializes the new my_seq sequence to 100:

=> SELECT NEXTVAL('my_seq');
 nextval
---------
     100
(1 row)

Getting a sequence's current value

You can obtain the current value of a sequence by calling CURRVAL on it. For example:

=> SELECT CURRVAL('my_seq');
 CURRVAL
---------
     100
(1 row)

Referencing sequences in tables

A table can set the default values of any column to a named sequence. The table creator must have the following privileges: SELECT on the sequence, and USAGE on its schema.

In the following example, column id gets its default values from named sequence my_seq:

=> CREATE TABLE customer(id INTEGER DEFAULT my_seq.NEXTVAL,
  lname VARCHAR(25),
  fname VARCHAR(25),
  membership_card INTEGER
);

For each row that you insert into table customer, the sequence invokes the NEXTVAL function to set the value of the id column. For example:

=> INSERT INTO customer VALUES (default, 'Carr', 'Mary', 87432);
=> INSERT INTO customer VALUES (default, 'Diem', 'Nga', 87433);
=> COMMIT;

For each row, the insert operation invokes NEXTVAL on the sequence my_seq, which increments the sequence to 101 and 102, and sets the id column to those values:

=> SELECT * FROM customer;
 id  | lname | fname | membership_card
-----+-------+-------+-----------------
 101 | Carr  | Mary  |           87432
 102 | Diem  | Nga   |           87433
(2 rows)

7.8.2.2 - Distributing sequences

When you create a sequence, its CACHE parameter determines the number of sequence values each node maintains during a session. The default cache value is 250K, so each node reserves 250,000 values per session for each sequence. The default cache size provides an efficient means for large insert or copy operations.

If sequence caching is set to a low number, nodes are liable to request a new set of cache values more frequently. While it supplies a new cache, Vertica must lock the catalog. Until Vertica releases the lock, other database activities such as table inserts are blocked, which can adversely affect overall performance.

When a new session starts, node caches are initially empty. By default, the initiator node requests and reserves cache for all nodes in a cluster. You can change this default so each node requests its own cache, by setting configuration parameter ClusterSequenceCacheMode to 0.
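
For example, a sketch of changing this setting at the database level:

=> ALTER DATABASE DEFAULT SET ClusterSequenceCacheMode = 0;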

For information on how Vertica requests and distributes cache among all nodes in a cluster, refer to Sequence caching.

Effects of distributed sessions

Vertica distributes a session across all nodes. The first time a cluster node calls the function NEXTVAL on a sequence to increment (or decrement) its value, the node requests its own cache of sequence values. The node then maintains that cache for the current session. As other nodes call NEXTVAL, they too create and maintain their own cache of sequence values.

During a session, nodes call NEXTVAL independently and at different frequencies. Each node uses its own cache to populate the sequence. All sequence values are guaranteed to be unique, but can be out of order with a NEXTVAL statement executed on another node. As a result, sequence values are often non-contiguous.

In all cases, Vertica increments a sequence only once per row. Thus, if the same sequence is referenced by multiple columns, NEXTVAL sets all columns in that row to the same value. This applies to rows of joined tables.

Calculating sequences

Vertica calculates the current value of a sequence as follows:

  • At the end of every statement, the state of all sequences used in the session is returned to the initiator node.

  • The initiator node calculates the maximum CURRVAL of each sequence across all states on all nodes.

  • This maximum value is used as CURRVAL in subsequent statements until another NEXTVAL is invoked.

Losing sequence values

Sequence values in cache can be lost in the following situations:

  • If a statement fails after NEXTVAL is called (thereby consuming a sequence value from the cache), the value is lost.

  • If a disconnect occurs (for example, dropped session), any remaining values in cache that have not been returned through NEXTVAL are lost.

  • When the initiator node distributes a new block of cache to each node while one or more nodes have not yet used up their current cache allotment, the unused values are lost. For information on this scenario, refer to Sequence caching.

You can recover lost sequence values by using ALTER SEQUENCE...RESTART, which resets the sequence to the specified value in the next session.
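
For example, the following sketch (the restart value is illustrative) resets my_seq so that, in the next session, the sequence resumes at 101:

=> ALTER SEQUENCE my_seq RESTART WITH 101;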

7.8.2.3 - Altering sequences

ALTER SEQUENCE can change sequences in two ways:

  • Changes values that control sequence behavior—for example, its start value and range of minimum and maximum values. These changes take effect only when you start a new database session.
  • Changes sequence name, schema, or ownership. These changes take effect immediately.

Changing sequence behavior

ALTER SEQUENCE can change one or more sequence attributes through the following parameters:

  • INCREMENT: How much to increment or decrement the sequence on each call to NEXTVAL.
  • MINVALUE/MAXVALUE: Range of valid integers.
  • RESTART: Sequence value on its next call to NEXTVAL.
  • CACHE/NO CACHE: How many sequence numbers are pre-allocated and stored in memory for faster access.
  • CYCLE/NO CYCLE: Whether the sequence wraps when its minimum or maximum values are reached.

These changes take effect only when you start a new database session. For example, if you create a named sequence my_sequence that starts at 10 and increments by 1 (the default), each sequence call to NEXTVAL increments its value by 1:

=> CREATE SEQUENCE my_sequence START 10;
=> SELECT NEXTVAL('my_sequence');
 nextval
---------
      10
(1 row)
=> SELECT NEXTVAL('my_sequence');
 nextval
---------
      11
(1 row)

The following ALTER SEQUENCE statement specifies to restart the sequence at 50:

=> ALTER SEQUENCE my_sequence RESTART WITH 50;

However, this change has no effect in the current session. The next call to NEXTVAL increments the sequence to 12:

=> SELECT NEXTVAL('my_sequence');
 NEXTVAL
---------
      12
(1 row)

The sequence restarts at 50 only after you start a new database session:

=> \q
$ vsql
Welcome to vsql, the Vertica Analytic Database interactive terminal.

=> SELECT NEXTVAL('my_sequence');
 NEXTVAL
---------
      50
(1 row)

Changing sequence name, schema, and ownership

You can use ALTER SEQUENCE to make the following changes to a sequence:

  • Rename it (supported only for named sequences).

  • Move it to another schema (supported only for named sequences).

  • Reassign ownership.

Each of these changes requires separate ALTER SEQUENCE statements. These changes take effect immediately.

For example, the following statement renames a sequence from my_seq to serial:

=> ALTER SEQUENCE s1.my_seq RENAME TO serial;

This statement moves sequence s1.serial to schema s2:

=> ALTER SEQUENCE s1.serial SET SCHEMA s2;

The following statement reassigns ownership of s2.serial to another user:

=> ALTER SEQUENCE s2.serial OWNER TO bertie;

7.8.2.4 - Dropping sequences

Use DROP SEQUENCE to remove a named sequence. For example:

=> DROP SEQUENCE my_sequence;

You cannot drop a sequence if one of the following conditions is true:

  • Other objects depend on the sequence. DROP SEQUENCE does not support cascade operations.

  • A column's DEFAULT expression references the sequence. Before dropping the sequence, you must remove all column references to it.
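
For example, continuing the customer table sketch from Creating and using named sequences, you first remove the column's default reference to the sequence, and can then drop the sequence:

=> ALTER TABLE customer ALTER COLUMN id DROP DEFAULT;
=> DROP SEQUENCE my_seq;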

7.8.3 - IDENTITY sequences

IDENTITY (synonymous with AUTO_INCREMENT) columns are defined with a sequence that automatically increments column values as new rows are added. You define an IDENTITY column in a table as follows:

CREATE TABLE table-name...
  (column-name {IDENTITY | AUTO_INCREMENT}
      ( [ cache-size | start, increment [, cache-size ] ] ))

Settings

start

First value to set for this column.

Default: 1

increment

Positive or negative integer that specifies how much to increment or decrement the sequence from the previous row value on each new row insertion. To decrement sequence values, specify a negative value.

Default: 1

cache-size

How many unique numbers each node caches per session. A value of 0 or 1 disables sequence caching. For details, see Sequence caching.

Default: 250,000

Managing settings

As with named sequences, you can manage an IDENTITY column with ALTER SEQUENCE, for example to reset its start value. Two exceptions apply: because the sequence is defined as part of a table column, you cannot change the sequence name or schema. You can query the SEQUENCES system table for the name of an IDENTITY column's sequence. This name is automatically created when you define the table, and conforms to the following convention:

table-name_col-name_seq

For example, you can change the maximum value of an IDENTITY column that is defined in the testAutoId table:

=> SELECT * FROM sequences WHERE identity_table_name  = 'testAutoId';
-[ RECORD 1 ]-------+-------------------------
sequence_schema     | public
sequence_name       | testAutoId_autoIdCol_seq
owner_name          | dbadmin
identity_table_name | testAutoId
session_cache_count | 250000
allow_cycle         | f
output_ordered      | f
increment_by        | 1
minimum             | 1
maximum             | 1000
current_value       | 1
sequence_schema_id  | 45035996273704980
sequence_id         | 45035996274278950
owner_id            | 45035996273704962
identity_table_id   | 45035996274278948

=> ALTER SEQUENCE testAutoId_autoIdCol_seq maxvalue 10000;
ALTER SEQUENCE

This change, like other changes to a sequence, takes effect only when you start a new database session. One exception applies: changes to the sequence owner take effect immediately.

You can obtain the last value generated for an IDENTITY column by calling LAST_INSERT_ID.

Restrictions

The following restrictions apply to IDENTITY columns:

  • A table can contain only one IDENTITY column.
  • IDENTITY column values automatically increment before the current transaction is committed; rolling back the transaction does not revert the change.
  • You cannot change the value of an IDENTITY column.

Examples

The following example shows how to use the IDENTITY column-constraint to create a table with an ID column. The ID column has an initial value of 1. It is incremented by 1 every time a row is inserted.

  1. Create table Premium_Customer:

    => CREATE TABLE Premium_Customer(
         ID IDENTITY(1,1),
         lname VARCHAR(25),
         fname VARCHAR(25),
         store_membership_card INTEGER
    );
    => INSERT INTO Premium_Customer (lname, fname, store_membership_card )
         VALUES ('Gupta', 'Saleem', 475987);
    

    The IDENTITY column has a seed of 1, which specifies the value for the first row loaded into the table, and an increment of 1, which specifies the value that is added to the IDENTITY value of the previous row.

  2. Confirm the row you added and see the ID value:

    => SELECT * FROM Premium_Customer;
     ID | lname | fname  | store_membership_card
    ----+-------+--------+-----------------------
      1 | Gupta | Saleem |                475987
    (1 row)
    
  3. Add another row:

    => INSERT INTO Premium_Customer (lname, fname, store_membership_card)
       VALUES ('Lee', 'Chen', 598742);
    
  4. Call the Vertica function LAST_INSERT_ID. The function returns value 2 because you previously inserted a new customer (Chen Lee), and this value is incremented each time a row is inserted:

    => SELECT LAST_INSERT_ID();
     last_insert_id
    ----------------
                   2
    (1 row)
    
  5. View all the ID values in the Premium_Customer table:

    => SELECT * FROM Premium_Customer;
     ID | lname | fname  | store_membership_card
    ----+-------+--------+-----------------------
      1 | Gupta | Saleem |                475987
      2 | Lee   | Chen   |                598742
    (2 rows)
    

The next three examples illustrate the three valid ways to use IDENTITY arguments.

The first example uses a cache of 100, and the defaults for start value (1) and increment value (1):

=> CREATE TABLE t1(x IDENTITY(100), y INT);

The next example specifies the start and increment values as 1, and defaults to a cache value of 250,000:

=> CREATE TABLE t2(y IDENTITY(1,1), x INT);

The third example specifies start and increment values of 1, and a cache value of 100:

=> CREATE TABLE t3(z IDENTITY(1,1,100), zx INT);
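
To confirm how these arguments map onto the underlying sequences, you can query the SEQUENCES system table, as in the earlier testAutoId example. The following query is a sketch that assumes tables t1, t2, and t3 exist as created above:

=> SELECT identity_table_name, session_cache_count, minimum, increment_by
     FROM sequences
     WHERE identity_table_name IN ('t1', 't2', 't3');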

7.8.4 - Sequence caching

Caching is similar for all sequence types: named sequences and IDENTITY column sequences.

Caching is similar for all sequence types: named sequences and IDENTITY column sequences. To allocate cache among the nodes in a cluster for a given sequence, Vertica uses the following process.

  1. By default, when a session begins, the cluster initiator node requests cache for itself and other nodes in the cluster.
  2. The initiator node distributes cache to other nodes when it distributes the execution plan.
  3. Because the initiator node requests caching for all nodes, only the initiator locks the global catalog for the cache request.

This approach is optimal for handling large INSERT-SELECT and COPY operations. The following figure shows how the initiator requests and distributes cache for a named sequence in a three-node cluster, where caching for that sequence is set to 250 K.

Nodes run out of cache at different times. While executing the same query, nodes individually request additional cache as needed.

For new queries in the same session, the initiator might have an empty cache if it used all of its cache to execute the previous query. In this case, the initiator requests cache for all nodes.

Configuring sequence caching

You can change how nodes obtain sequence caches by setting the configuration parameter ClusterSequenceCacheMode to 0 (disabled). When this parameter is set to 0, all nodes in the cluster request their own cache and catalog lock. However, for initial large INSERT-SELECT and COPY operations, when the cache is empty for all nodes, each node requests cache at the same time. These multiple requests result in simultaneous locks on the global catalog, which can adversely affect performance. For this reason, ClusterSequenceCacheMode should remain set to its default value of 1 (enabled).
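
If you do need to change this parameter, for example during testing, the following statements are a sketch of the pattern, using the same ALTER DATABASE syntax shown elsewhere in this guide:

=> ALTER DATABASE DEFAULT SET ClusterSequenceCacheMode = 0;  -- each node requests its own cache (not recommended for production)
=> ALTER DATABASE DEFAULT SET ClusterSequenceCacheMode = 1;  -- restore the default behavior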

The following example compares how different settings of ClusterSequenceCacheMode affect how Vertica manages sequence caching. The example assumes a three-node cluster, 250 K caches for each node (the default), and sequence ID values that increment by 1.

Workflow step 1

  • ClusterSequenceCacheMode = 1: Cache is empty for all nodes. The initiator node requests 250 K cache for each node.

  • ClusterSequenceCacheMode = 0: Cache is empty for all nodes. Each node, including the initiator, requests its own 250 K cache.

Workflow step 2

  • ClusterSequenceCacheMode = 1: Blocks of cache are distributed to each node as follows:

    • Node 1: 0–250 K

    • Node 2: 250 K + 1 to 500 K

    • Node 3: 500 K + 1 to 750 K

  • ClusterSequenceCacheMode = 0: Each node begins to use its cache as it processes sequence updates.

Workflow step 3 (both settings)

  • Initiator node and node 3 run out of cache. Node 2 has used only 250 K + 1 to 400 K; 100 K of cache remains, from 400 K + 1 to 500 K.

Workflow step 4

  • ClusterSequenceCacheMode = 1: While executing the same statement:

    • As each node uses up its cache, it requests a new cache allocation.

    • If node 2 never uses its cache, the 100 K unused cache becomes a gap in sequence IDs.

    While executing a new statement in the same session, if the initiator node's cache is empty:

    • The initiator requests and distributes new cache blocks for all nodes.

    • Nodes receive a new cache before the old cache is used, creating a gap in ID sequencing.

  • ClusterSequenceCacheMode = 0: While executing the same or a new statement:

    • As each node uses up its cache, it requests a new cache allocation.

    • If node 2 never uses its cache, the 100 K unused cache becomes a gap in sequence IDs.

7.9 - Merging table data

MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set.

MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set. The join can match a source row with only one target row; otherwise, Vertica returns an error.

MERGE has the following syntax:

MERGE INTO target-table USING source-dataset ON  join-condition
matching-clause[ matching-clause ]

Merge operations have at least three components: a target table, a source data set, and one or more matching clauses.

7.9.1 - Basic MERGE example

In this example, a merge operation involves two tables:.

In this example, a merge operation involves two tables:

  • visits_daily logs daily restaurant traffic, and is updated with each customer visit. Data in this table is refreshed every 24 hours.

  • visits_history stores the history of customer visits to various restaurants, accumulated over an indefinite time span.

Each night, you merge the daily visit count from visits_daily into visits_history. The merge operation modifies the target table in two ways:

  • Updates existing customer data.

  • Inserts new rows of data for first-time customers.

One MERGE statement executes both operations as a single (upsert) transaction.

Source and target tables

The source and target tables visits_daily and visits_history are defined as follows:

CREATE TABLE public.visits_daily
(
    customer_id int,
    location_name varchar(20),
    visit_time time(0) DEFAULT (now())::timetz(6)
);

CREATE TABLE public.visits_history
(
    customer_id int,
    location_name varchar(20),
    visit_count int
);

Table visits_history contains rows of three customers who between them visited two restaurants, Etoile and La Rosa:

=> SELECT * FROM visits_history ORDER BY customer_id, location_name;
 customer_id | location_name | visit_count
-------------+---------------+-------------
        1001 | Etoile        |           2
        1002 | La Rosa       |           4
        1004 | Etoile        |           1
(3 rows)

By close of business, table visits_daily contains three rows of restaurant visits:

=> SELECT * FROM visits_daily ORDER BY customer_id, location_name;
 customer_id | location_name | visit_time
-------------+---------------+------------
        1001 | Etoile        | 18:19:29
        1003 | Lux Cafe      | 08:07:00
        1004 | La Rosa       | 11:49:20
(3 rows)

Table data merge

The following MERGE statement merges visits_daily data into visits_history:

  • For matching customers, MERGE updates the occurrence count.

  • For non-matching customers, MERGE inserts new rows.

=> MERGE INTO visits_history h USING visits_daily d
    ON (h.customer_id=d.customer_id AND h.location_name=d.location_name)
    WHEN MATCHED THEN UPDATE SET visit_count = h.visit_count  + 1
    WHEN NOT MATCHED THEN INSERT (customer_id, location_name, visit_count)
    VALUES (d.customer_id, d.location_name, 1);
 OUTPUT
--------
      3
(1 row)

MERGE returns the number of rows updated and inserted. In this case, the returned value specifies three updates and inserts:

  • Customer 1001's third visit to Etoile

  • New customer 1003's first visit to new restaurant Lux Cafe

  • Customer 1004's first visit to La Rosa

If you now query table visits_history, the result set shows the merged (updated and inserted) data.
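
Based on the rows shown above, such a query should return the following result (reconstructed here to illustrate the outcome of the merge):

=> SELECT * FROM visits_history ORDER BY customer_id, location_name;
 customer_id | location_name | visit_count
-------------+---------------+-------------
        1001 | Etoile        |           3
        1002 | La Rosa       |           4
        1003 | Lux Cafe      |           1
        1004 | Etoile        |           1
        1004 | La Rosa       |           1
(5 rows)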

7.9.2 - MERGE source options

A MERGE operation joins the target table to one of the following data sources:.

A MERGE operation joins the target table to one of the following data sources:

  • Another table

  • View

  • Subquery result set

Merging from table and view data

You merge data from one table into another as follows:

MERGE INTO target-table USING { source-table | source-view } join-condition
   matching-clause[ matching-clause ]

If you specify a view, Vertica expands the view name to the query that it encapsulates, and uses the result set as the merge source data.

For example, the VMart table public.product_dimension contains current and discontinued products. You can move all discontinued products into a separate table public.product_dimension_discontinued, as follows:

=> CREATE TABLE public.product_dimension_discontinued (
     product_key int,
     product_version int,
     sku_number char(32),
     category_description char(32),
     product_description varchar(128));

=> MERGE INTO product_dimension_discontinued tgt
     USING product_dimension src ON tgt.product_key = src.product_key
                                AND tgt.product_version = src.product_version
     WHEN NOT MATCHED AND src.discontinued_flag='1' THEN INSERT VALUES
       (src.product_key,
        src.product_version,
        src.sku_number,
        src.category_description,
        src.product_description);
 OUTPUT
--------
   1186
(1 row)

Source table product_dimension uses two columns, product_key and product_version, to identify unique products. The MERGE statement joins the source and target tables on these columns in order to return single instances of non-matching rows. The WHEN NOT MATCHED clause includes a filter (src.discontinued_flag='1'), which reduces the result set to include only discontinued products. The remaining rows are inserted into target table product_dimension_discontinued.

Merging from a subquery result set

You can merge into a table the result set that is returned by a subquery, as follows:

MERGE INTO target-table USING (subquery) sq-alias join-condition
   matching-clause[ matching-clause ]

For example, the VMart table public.product_dimension is defined as follows (DDL truncated):

CREATE TABLE public.product_dimension
(
    product_key int NOT NULL,
    product_version int NOT NULL,
    product_description varchar(128),
    sku_number char(32),
    ...
)
ALTER TABLE public.product_dimension
    ADD CONSTRAINT C_PRIMARY PRIMARY KEY (product_key, product_version) DISABLED;

Columns product_key and product_version comprise the table's primary key. You can modify this table so it contains a single column that concatenates the values of these two columns. This column can be used to uniquely identify each product, while also maintaining the original values from product_key and product_version.

You populate the new column with a MERGE statement that queries the other two columns:

=> ALTER TABLE public.product_dimension ADD COLUMN product_ID numeric(8,2);
ALTER TABLE

=> MERGE INTO product_dimension tgt
     USING (SELECT (product_key||'.0'||product_version)::numeric(8,2) AS pid, sku_number
     FROM product_dimension) src
     ON tgt.product_key||'.0'||product_version::numeric=src.pid
     WHEN MATCHED THEN UPDATE SET product_ID = src.pid;
 OUTPUT
--------
  60000
(1 row)

The following query verifies that the new column values correspond to the values in product_key and product_version:

=> SELECT product_ID, product_key, product_version, product_description
   FROM product_dimension
   WHERE category_description = 'Medical'
     AND product_description ILIKE '%diabetes%'
     AND discontinued_flag = 1 ORDER BY product_ID;
 product_ID | product_key | product_version |           product_description
------------+-------------+-----------------+-----------------------------------------
    5836.02 |        5836 |               2 | Brand #17487 diabetes blood testing kit
   14320.02 |       14320 |               2 | Brand #43046 diabetes blood testing kit
   18881.01 |       18881 |               1 | Brand #56743 diabetes blood testing kit
(3 rows)

7.9.3 - MERGE matching clauses

MERGE supports one instance of the following matching clauses:.

MERGE supports one instance of each of the following matching clauses: WHEN MATCHED THEN UPDATE SET and WHEN NOT MATCHED THEN INSERT.

Each matching clause can specify an additional filter, as described in Update and insert filters.

WHEN MATCHED THEN UPDATE SET

Updates all target table rows that are joined to the source table, typically with data from the source table:

WHEN MATCHED [ AND update-filter ] THEN UPDATE
   SET { target-column = expression }[,...]

Vertica can execute the join only on unique values in the source table's join column. If the source table's join column contains more than one matching value, the MERGE statement returns with a run-time error.

WHEN NOT MATCHED THEN INSERT

WHEN NOT MATCHED THEN INSERT inserts into the target table a new row for each source table row that is excluded from the join:

WHEN NOT MATCHED [ AND insert-filter ] THEN INSERT
   [ ( column-list ) ] VALUES ( values-list )

column-list is a comma-delimited list of one or more target columns in the target table, listed in any order. MERGE maps column-list columns to values-list values in the same order, and each column-value pair must be compatible. If you omit column-list, Vertica maps values-list values to columns according to column order in the table definition.

For example, given the following source and target table definitions:

CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (x int, y int, z int);

The following WHEN NOT MATCHED clause implicitly sets the values of the target table columns a, b, and c in the newly inserted rows:

MERGE INTO t1 USING t2 ON t1.a=t2.x
   WHEN NOT MATCHED THEN INSERT VALUES (t2.x, t2.y, t2.z);

In contrast, the following WHEN NOT MATCHED clause excludes columns t1.b and t2.y from the merge operation. The WHEN NOT MATCHED clause explicitly pairs two sets of columns from the target and source tables: t1.a to t2.x, and t1.c to t2.z. Vertica sets excluded column t1.b to null:

MERGE INTO t1 USING t2 ON t1.a=t2.x
   WHEN NOT MATCHED THEN INSERT (a, c) VALUES (t2.x, t2.z);

7.9.4 - Update and insert filters

Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:.

Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:

WHEN MATCHED AND update-filter THEN UPDATE ...
WHEN NOT MATCHED AND insert-filter THEN INSERT ...

Vertica also supports Oracle syntax for specifying update and insert filters:

WHEN MATCHED THEN UPDATE SET column-updates WHERE update-filter
WHEN NOT MATCHED THEN INSERT column-values WHERE insert-filter

Each filter can specify multiple conditions. Vertica handles the filters as follows:

  • An update filter is applied to the set of matching rows in the target table that are returned by the MERGE join. For each row where the update filter evaluates to true, Vertica updates the specified columns.

  • An insert filter is applied to the set of source table rows that are excluded from the MERGE join. For each row where the insert filter evaluates to true, Vertica adds a new row to the target table with the specified values.

For example, given the following data in tables t11 and t22:


=> SELECT * from t11 ORDER BY pk;
 pk | col1 | col2 | SKIP_ME_FLAG
----+------+------+--------------
  1 |    2 |    3 | t
  2 |    3 |    4 | t
  3 |    4 |    5 | f
  4 |      |    6 | f
  5 |    6 |    7 | t
  6 |      |    8 | f
  7 |    8 |      | t
(7 rows)

=> SELECT * FROM t22 ORDER BY pk;
 pk | col1 | col2
----+------+------
  1 |    2 |    4
  2 |    4 |    8
  3 |    6 |
  4 |    8 |   16
(4 rows)

You can merge data from table t11 into table t22 with the following MERGE statement, which includes update and insert filters:

=> MERGE INTO t22 USING t11 ON ( t11.pk=t22.pk )
   WHEN MATCHED
       AND t11.SKIP_ME_FLAG=FALSE AND (
         COALESCE (t22.col1<>t11.col1, (t22.col1 is null)<>(t11.col1 is null))
       )
   THEN UPDATE SET col1=t11.col1, col2=t11.col2
   WHEN NOT MATCHED
      AND t11.SKIP_ME_FLAG=FALSE
   THEN INSERT (pk, col1, col2) VALUES (t11.pk, t11.col1, t11.col2);
 OUTPUT
--------
      3
(1 row)

=> SELECT * FROM t22 ORDER BY pk;
 pk | col1 | col2
----+------+------
  1 |    2 |    4
  2 |    4 |    8
  3 |    4 |    5
  4 |      |    6
  6 |      |    8
(5 rows)

Vertica uses the update and insert filters as follows:

  • Evaluates all matching rows against the update filter conditions. Vertica updates each row where the following two conditions both evaluate to true:

    • Source column t11.SKIP_ME_FLAG is set to false.

    • The COALESCE function evaluates to true.

  • Evaluates all non-matching rows in the source table against the insert filter. For each row where column t11.SKIP_ME_FLAG is set to false, Vertica inserts a new row in the target table.

7.9.5 - MERGE optimization

You can improve MERGE performance in several ways:.

You can improve MERGE performance in several ways, as described in the following sections.

Projections for MERGE operations

The Vertica query optimizer automatically chooses the best projections to implement a merge operation. A good projection design strategy provides projections that help the query optimizer avoid extra sort and data transfer operations, and facilitate MERGE performance.

For example, the following MERGE statement fragment joins source and target tables tgt and src, respectively, on columns tgt.a and src.b:

=> MERGE INTO tgt USING src ON tgt.a = src.b ...

Vertica can use a local merge join if projections for tables tgt and src use one of the following projection designs, where inputs are presorted by projection ORDER BY clauses:

  • Replicated projections are sorted on:

    • Column a for table tgt

    • Column b for table src

  • Segmented projections are identically segmented on:

    • Column a for table tgt

    • Column b for table src

    • Corresponding segmented columns

Optimizing MERGE query plans

Vertica prepares an optimized query plan if the following conditions are all true:

  • The MERGE statement contains both matching clauses WHEN MATCHED THEN UPDATE SET and WHEN NOT MATCHED THEN INSERT. If the MERGE statement contains only one matching clause, it uses a non-optimized query plan.

  • The MERGE statement excludes update and insert filters.

  • The target table join column has a unique or primary key constraint. This requirement does not apply to the source table join column.

  • Both matching clauses specify all columns in the target table.

  • Both matching clauses specify identical source values.

For details on evaluating an EXPLAIN-generated query plan, see MERGE path.

The examples that follow use a simple schema to illustrate some of the conditions under which Vertica prepares or does not prepare an optimized query plan for MERGE:

CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a;
CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a;
INSERT INTO target VALUES(1,2,3);
INSERT INTO target VALUES(2,4,7);
INSERT INTO source VALUES(3,4,5);
INSERT INTO source VALUES(4,6,9);
COMMIT;

Optimized MERGE statement

Vertica can prepare an optimized query plan for the following MERGE statement because:

  • The target table's join column t.a has a primary key constraint.

  • All columns in the target table (a,b,c) are included in the UPDATE and INSERT clauses.

  • The UPDATE and INSERT clauses specify identical source values: s.a, s.b, and s.c.

MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);

 OUTPUT
--------
2
(1 row)

The output value of 2 indicates success and denotes the number of rows updated/inserted from the source into the target.
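
To verify which plan Vertica chooses, you can prefix the MERGE statement with EXPLAIN and check for the optimized MERGE path in the generated plan (a sketch; the plan text itself varies by version and is omitted here):

=> EXPLAIN MERGE INTO target t USING source s ON t.a = s.a
     WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
     WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);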

Non-optimized MERGE statement

In the next example, the MERGE statement runs without optimization because the source values in the UPDATE/INSERT clauses are not identical. Specifically, the UPDATE clause applies constants to the source values for columns s.a and s.c (s.a + 1 and s.c - 1), while the INSERT clause uses the unmodified source values:


MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);

To make the previous MERGE statement eligible for optimization, rewrite the statement so that the source values in the UPDATE and INSERT clauses are identical:


MERGE INTO target t USING source s ON t.a = s.a
  WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c -1
  WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a + 1, s.b, s.c - 1);

7.9.6 - MERGE restrictions

The following restrictions apply to updating and inserting table data with MERGE.

The following restrictions apply to updating and inserting table data with MERGE.

Constraint enforcement

If primary key, unique key, or check constraints are enabled for automatic enforcement in the target table, Vertica enforces those constraints when you load new data. If a violation occurs, Vertica rolls back the operation and returns an error.

Columns prohibited from merge

The following columns cannot be specified in a merge operation; attempts to do so return with an error:

  • IDENTITY columns, or columns whose default value is set to a named sequence.

  • Vmap columns such as __raw__ in flex tables.

  • Columns of complex types ARRAY, SET, or ROW.

7.10 - Removing table data

Vertica provides several ways to remove data from a table:.

Vertica provides several ways to remove data from a table:

Delete operation Description
Drop a table Permanently remove a table and its definition, optionally remove associated views and projections.
Delete table rows Mark rows with delete vectors and store them so data can be rolled back to a previous epoch. The data must be purged to reclaim disk space.
Truncate table data Remove all storage and history associated with a table. The table structure is preserved for future use.
Purge data Permanently remove historical data from physical storage and free disk space for reuse.
Drop partitions Remove one or more partitions from a table. Each partition contains a related subset of data in the table. Dropping partitioned data is efficient, and provides query performance benefits.

7.10.1 - Data removal operations compared

The following table summarizes differences between various data removal operations.

The following table summarizes differences between various data removal operations.

Operations and options                                        Performance  Auto commits  Saves history
DELETE FROM table                                             Normal       No            Yes
DELETE FROM temp-table                                        High         No            No
DELETE FROM table where-clause                                Normal       No            Yes
DELETE FROM temp-table where-clause                           Normal       No            Yes
DELETE FROM temp-table where-clause ON COMMIT PRESERVE ROWS   Normal       No            Yes
DELETE FROM temp-table where-clause ON COMMIT DELETE ROWS     High         Yes           No
DROP table                                                    High         Yes           No
TRUNCATE table                                                High         Yes           No
TRUNCATE temp-table                                           High         Yes           No
SELECT DROP_PARTITIONS (...)                                  High         Yes           No

Choosing the best operation

The following table can help you decide which operation is best for removing table data:

If you want to... Use...
Delete both table data and definitions and start from scratch. DROP TABLE
Quickly drop data while preserving table definitions, and reload data. TRUNCATE TABLE
Regularly perform bulk delete operations on logical sets of data. DROP_PARTITIONS
Occasionally perform small deletes with the option to roll back or review history. DELETE

7.10.2 - Optimizing DELETE and UPDATE

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries.

Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries. A DELETE or UPDATE operation must update all projections, so the operation can only be as fast as the slowest projection.

To improve the performance of DELETE and UPDATE queries, consider the following issues:

  • Query performance after large DELETE operations: Vertica's implementation of DELETE differs from traditional databases: it does not delete data from disk storage; rather, it marks rows as deleted so they are available for historical queries. Deletion of 10% or more of the total rows in a table can adversely affect queries on that table. In that case, consider purging those rows to improve performance.
  • Recovery performance: Recovery is the action required for a cluster to restore K-safety after a crash. Large numbers of deleted records can degrade the performance of a recovery. To improve recovery performance, purge the deleted rows.
  • Concurrency: DELETE and UPDATE take exclusive locks on the table. Only one DELETE or UPDATE transaction on a table can be in progress at a time and only when no load operations are in progress. Delete and update operations on different tables can run concurrently.

Projection column requirements for optimized delete

A projection is optimized for delete and update operations if it contains all columns required by the query predicate. In general, DML operations are significantly faster when performed on optimized projections than on non-optimized projections.

For example, consider the following table and projections:

=> CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
=> CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
=> CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;

In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because column a is available:

=> DELETE from t WHERE a = 1;

In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because the b column is not available in p2:

=> DELETE from t WHERE b = 1;

Optimized DELETE in subqueries

To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE statement's WHERE clause must be in the projection definition.

For example, the following simple schema has two tables and three projections:

=> CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
=> CREATE TABLE tb2 (g INT, h INT, i INT, j INT);

The first projection references all columns in tb1 and sorts on column a:

=> CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;

The buddy projection references and sorts on column a in tb1:

=> CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;

This projection references all columns in tb2 and sorts on column i:

=> CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;

Consider the following DML statement, which references tb1.a in its WHERE clause. Since both projections on tb1 contain column a, both are eligible for the optimized DELETE:

=> DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);

Restrictions

Optimized DELETE operations are not supported under the following conditions:

  • With replicated projections if subqueries reference the target table. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
    
  • With subqueries that do not return multiple rows. For example, the following syntax is not supported:

    => DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
    

Projection sort order for optimizing DELETE

Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the sort order of all projections for large DELETE and UPDATE operations.

For example, suppose most of the DELETE queries you perform on a projection look like the following:

=> DELETE from t where time_key < '1-1-2007'

To optimize the delete operations, make time_key appear in the ORDER BY clause of all projections. This schema design results in better performance of the delete operation.

In addition, add sort columns to the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing sort order: best practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function.
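
For example, assuming a hypothetical table public.t with projections designed as described above, a call such as the following analyzes those projections for potential delete performance issues (a sketch; see the EVALUATE_DELETE_PERFORMANCE reference for the exact arguments and output):

=> SELECT EVALUATE_DELETE_PERFORMANCE('public.t');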

7.10.3 - Purging deleted data

In Vertica, delete operations do not remove rows from physical storage.

In Vertica, delete operations do not remove rows from physical storage. DELETE marks rows as deleted, as does UPDATE, which combines delete and insert operations. In both cases, Vertica retains discarded rows as historical data, which remains accessible to historical queries until it is purged.

The cost of retaining historical data is twofold:

  • Disk space is allocated to deleted rows and delete markers.

  • Typical (non-historical) queries must read and skip over deleted data, which can impact performance.

A purge operation permanently removes historical data from physical storage and frees disk space for reuse. Only historical data that precedes the Ancient History Mark (AHM) is eligible to be purged.

You can purge data in two ways:

In both cases, Vertica purges all historical data up to and including the AHM epoch and resets the AHM. See Epochs for additional information about how Vertica uses epochs.

7.10.3.1 - Setting a purge policy

The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged.

The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs mergeout operations.

Vertica provides two methods for determining when deleted data is eligible to be purged:

  • Specifying the time for which delete data is saved

  • Specifying the number of epochs that are saved

Specifying the time for which delete data is saved

Specifying the time for which delete data is saved is the preferred method for determining which deleted data can be purged. By default, Vertica saves historical data only when nodes are down.

To change the specified time for saving deleted data, use the HistoryRetentionTime configuration parameter:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = {seconds | -1};

In the above syntax:

  • seconds is the amount of time (in seconds) for which to save deleted data.

  • -1 indicates that you do not want to use the HistoryRetentionTime configuration parameter to determine which deleted data is eligible to be purged. Use this setting if you prefer to use the other method (HistoryRetentionEpochs) for determining which deleted data can be purged.

The following example sets the history epoch retention level to 240 seconds:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = 240;

Specifying the number of epochs that are saved

Unless you have a reason to limit the number of epochs, Vertica recommends that you specify the time over which delete data is saved.

To specify the number of historical epochs to save through the HistoryRetentionEpochs configuration parameter:

  1. Turn off the HistoryRetentionTime configuration parameter:

    => ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
    
  2. Set the history epoch retention level through the HistoryRetentionEpochs configuration parameter:

    => ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = {num_epochs | -1};
    
    • num_epochs is the number of historical epochs to save.

    • -1 indicates that you do not want to use the HistoryRetentionEpochs configuration parameter to trim historical epochs from the epoch map. By default, HistoryRetentionEpochs is set to -1.

The following example sets the number of historical epochs to save to 40:

=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = 40;

Modifications are immediately implemented across all nodes within the database cluster. You do not need to restart the database.

See Epoch management parameters for additional details. See Epochs for information about how Vertica uses epochs.

Disabling purge

If you want to preserve all historical data, set the value of both historical epoch retention parameters to -1, as follows:

=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = -1;

7.10.3.2 - Manually purging data

You manually purge deleted data as follows:.

You manually purge deleted data as follows:

  1. Set the cut-off date for purging deleted data. First, call one of the following functions to verify the current ancient history mark (AHM):

    • GET_AHM_TIME returns a TIMESTAMP value of the AHM.

    • GET_AHM_EPOCH returns the number of the epoch in which the AHM is located.

  2. Set the AHM to the desired cut-off date with SET_AHM_TIME or SET_AHM_EPOCH:

    If you call SET_AHM_TIME, keep in mind that the timestamp you specify is mapped to an epoch, which by default has a three-minute granularity. Thus, if you specify an AHM time of 2008-01-01 00:00:00.00, Vertica might purge data from the first three minutes of 2008, or retain data from last three minutes of 2007.

  3. Purge deleted data from the desired projections with a purge function such as PURGE, PURGE_TABLE, PURGE_PROJECTION, or PURGE_PARTITION:

    The tuple mover performs a mergeout operation to purge the data. Vertica periodically invokes the tuple mover to perform mergeout operations, as configured by tuple mover parameters. You can manually invoke the tuple mover by calling the function DO_TM_TASK.

See Epochs for additional information about how Vertica uses epochs.
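
A minimal sketch of these steps, using a hypothetical table public.store_sales and an arbitrary cutoff date, might look like the following:

=> SELECT GET_AHM_TIME();                        -- check the current AHM
=> SELECT SET_AHM_TIME('2023-01-01 00:00:00');   -- move the AHM to the cutoff date
=> SELECT PURGE_TABLE('public.store_sales');     -- purge deleted rows from the table's projections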

7.10.4 - Truncating tables

TRUNCATE TABLE removes all storage associated with the target table and its projections.

TRUNCATE TABLE removes all storage associated with the target table and its projections. Vertica preserves the table and the projection definitions. If the truncated table has out-of-date projections, those projections are cleared and marked up-to-date when TRUNCATE TABLE returns.

TRUNCATE TABLE commits the entire transaction after statement execution, even if truncating the table fails. You cannot roll back a TRUNCATE TABLE statement.

Use TRUNCATE TABLE for testing purposes. You can use it to remove all data from a table and load it with fresh data, without recreating the table and its projections.

Table locking

TRUNCATE TABLE takes an O (owner) lock on the table until the truncation process completes. The savepoint is then released.

If the operation cannot obtain an O lock on the target table, Vertica tries to close any internal Tuple Mover sessions that are running on that table. If successful, the operation can proceed. Explicit Tuple Mover operations that are running in user sessions do not close. If an explicit Tuple Mover operation is running on the table, the truncation proceeds only after that operation completes.

Restrictions

You cannot truncate an external table.

Examples
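
The example assumes a table definition along the following lines (hypothetical, because the original example does not show how sample_table was created):

=> CREATE TABLE sample_table (a INT);
CREATE TABLE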

=> INSERT INTO sample_table (a) VALUES (3);
=> SELECT * FROM sample_table;
a
---
3
(1 row)
=> TRUNCATE TABLE sample_table;
TRUNCATE TABLE
=> SELECT * FROM sample_table;
a
---
(0 rows)

7.11 - Rebuilding tables

You can reclaim disk space on a large scale by rebuilding tables, as follows:.

You can reclaim disk space on a large scale by rebuilding tables, as follows:

  1. Create a table with the same (or similar) definition as the table to rebuild.

  2. Create projections for the new table.

  3. Copy data from the target table into the new one with INSERT...SELECT.

  4. Drop the old table and its projections.

  5. Rename the new table with ALTER TABLE...RENAME, using the name of the old table.
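
The following statements sketch these steps for a hypothetical table named sales, using sales_rebuild as the replacement table. CREATE TABLE...LIKE with INCLUDING PROJECTIONS copies the table definition and projection designs; adapt the statements to your own schema and projection requirements:

=> CREATE TABLE sales_rebuild LIKE sales INCLUDING PROJECTIONS;  -- steps 1 and 2
=> INSERT INTO sales_rebuild SELECT * FROM sales;                -- step 3
=> COMMIT;
=> DROP TABLE sales CASCADE;                                     -- step 4
=> ALTER TABLE sales_rebuild RENAME TO sales;                    -- step 5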

Projection considerations

  • You must have enough disk space to contain the old and new projections at the same time. If necessary, you can drop some of the old projections before loading the new table. You must, however, retain at least one superprojection of the old table (or two buddy superprojections to maintain K-safety) until the new table is loaded. (See Prepare disk storage locations for disk space requirements.)

  • You can specify different names for the new projections or use ALTER TABLE...RENAME to change the names of the old projections.

  • The relationship between tables and projections does not depend on object names. Instead, it depends on object identifiers that are not affected by rename operations. Thus, if you rename a table, its projections continue to work normally.

7.12 - Dropping tables

DROP TABLE drops a table from the database catalog.

DROP TABLE drops a table from the database catalog. If any projections are associated with the table, DROP TABLE returns an error message unless it also includes the CASCADE option. One exception applies: when the table's only associated projection is an auto-generated superprojection (auto-projection).

Using CASCADE

In the following example, DROP TABLE tries to remove a table that has several projections associated with it. Because it omits the CASCADE option, Vertica returns an error:

=> DROP TABLE d1;
NOTICE: Constraint - depends on Table d1
NOTICE: Projection d1p1 depends on Table d1
NOTICE: Projection d1p2 depends on Table d1
NOTICE: Projection d1p3 depends on Table d1
NOTICE: Projection f1d1p1 depends on Table d1
NOTICE: Projection f1d1p2 depends on Table d1
NOTICE: Projection f1d1p3 depends on Table d1
ERROR: DROP failed due to dependencies: Cannot drop Table d1 because other objects depend on it
HINT: Use DROP ... CASCADE to drop the dependent objects too.

The next attempt includes the CASCADE option and succeeds:

=> DROP TABLE d1 CASCADE;
DROP TABLE

Using IF EXISTS

In the following example, DROP TABLE includes the option IF EXISTS. This option specifies not to report an error if one or more of the tables to drop does not exist. This clause is useful in SQL scripts—for example, to ensure that a table is dropped before you try to recreate it:

=> DROP TABLE IF EXISTS mytable;
DROP TABLE
=> DROP TABLE IF EXISTS mytable; -- Table doesn't exist
NOTICE:  Nothing was dropped
DROP TABLE

Dropping and restoring view tables

Views that reference a table that is dropped and then replaced by another table with the same name continue to function and use the contents of the new table. The new table must have the same column definitions.

8 - Managing client connections

Vertica provides several settings to control client connections:.

Vertica provides several settings to control client connections:

  • Limit the number of client connections a user can have open at the same time.

  • Limit the time a client connection can be idle before being automatically disconnected.

  • Use connection load balancing to spread the overhead of servicing client connections among nodes.

  • Detect unresponsive clients with TCP keepalive.

  • Drain a subcluster to reject any new client connections to that subcluster. For details, see Drain client connections.

  • Route client connections to subclusters based on their workloads. For details, see Workload routing.

Total client connections to a given node cannot exceed the limits set in MaxClientSessions.

Changes to a client's MAXCONNECTIONS property have no effect on current sessions; these changes apply only to new sessions. For example, if you change a user's connection mode from DATABASE to NODE, current node connections are unaffected. This change applies only to new sessions, which are reserved on the invoking node.

When Vertica closes a client connection, the client's ongoing operations, if any, are canceled.

8.1 - Limiting the number and length of client connections

You can manage how many active sessions a user can open to the server, and the duration of those sessions.

You can manage how many active sessions a user can open to the server, and the duration of those sessions. Doing so helps prevent overuse of available resources, and can improve overall throughput.

You can define connection limits at two levels:

  • Set the MAXCONNECTIONS property on individual users. This property specifies how many sessions a user can open concurrently on individual nodes, or across the database cluster. For example, the following ALTER USER statement allows user Joe up to 10 concurrent sessions:

    => ALTER USER Joe MAXCONNECTIONS 10 ON DATABASE;
    
  • Set the configuration parameter MaxClientSessions on the database or individual nodes. This parameter specifies the maximum number of client sessions that can run on nodes in the database cluster; the default is 50. An extra five sessions are always reserved for dbadmin users, so they can log in when the total number of client sessions equals MaxClientSessions. For an example of setting this parameter, see below.
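
For example, the following statements show the general pattern for changing and checking this parameter at the database level (the value 100 is arbitrary):

=> ALTER DATABASE DEFAULT SET MaxClientSessions = 100;
=> SHOW DATABASE DEFAULT MaxClientSessions;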

Total client connections to a given node cannot exceed the limits set in MaxClientSessions.

Changes to a client's MAXCONNECTIONS property have no effect on current sessions; these changes apply only to new sessions. For example, if you change a user's connection mode from DATABASE to NODE, current node connections are unaffected. This change applies only to new sessions, which are reserved on the invoking node.

Managing TCP keepalive settings

Vertica uses kernel TCP keepalive parameters to detect unresponsive clients and determine when the connection should be closed. Vertica also supports a set of equivalent KeepAlive parameters that can override TCP keepalive parameter settings. By default, all Vertica KeepAlive parameters are set to 0, which signifies to use TCP keepalive settings. To override TCP keepalive settings, set the equivalent parameters at the database level with ALTER DATABASE, or for the current session with ALTER SESSION.

TCP keepalive Parameter Vertica Parameter Description
tcp_keepalive_time KeepAliveIdleTime Length (in seconds) of the idle period before the first TCP keepalive probe is sent to ensure that the client is still connected.
tcp_keepalive_probes KeepAliveProbeCount Number of consecutive keepalive probes that must go unacknowledged by the client before the client connection is considered lost and closed
tcp_keepalive_intvl KeepAliveProbeInterval Time interval (in seconds) between keepalive probes.

Examples

The following examples show how to use Vertica KeepAlive parameters to override TCP keepalive parameters as follows:

  • After 600 seconds (ten minutes), the first keepalive probe is sent to the client.

  • Consecutive keepalive probes are sent every 30 seconds.

  • If the client fails to respond to 10 keepalive probes, the connection is considered lost and is closed.

To make this the default policy for client connections, use ALTER DATABASE:

=> ALTER DATABASE DEFAULT SET KeepAliveIdleTime = 600;
=> ALTER DATABASE DEFAULT SET KeepAliveProbeInterval = 30;
=> ALTER DATABASE DEFAULT SET KeepAliveProbeCount = 10;

To override database-level policies for the current session, use ALTER SESSION:

=> ALTER SESSION SET KeepAliveIdleTime = 400;
=> ALTER SESSION SET KeepAliveProbeInterval = 72;
=> ALTER SESSION SET KeepAliveProbeCount = 60;

Query system table CONFIGURATION_PARAMETERS to verify database and session settings of the three Vertica KeepAlive parameters:

=> SELECT parameter_name, database_value, current_value FROM configuration_parameters WHERE parameter_name ILIKE 'KeepAlive%';
     parameter_name     | database_value | current_value
------------------------+----------------+---------------
 KeepAliveProbeCount    | 10             | 60
 KeepAliveIdleTime      | 600            | 400
 KeepAliveProbeInterval | 30             | 72
(3 rows)

Limiting idle session length

If a client continues to respond to TCP keepalive probes, but is not running any queries, the client's session is considered idle. Idle sessions eventually time out. The maximum time that sessions are allowed to idle can be set at three levels, in descending order of precedence:

  • As dbadmin, set the IDLESESSIONTIMEOUT property for individual users. This property overrides all other session timeout settings.
  • Users can limit the idle time of the current session with SET SESSION IDLESESSIONTIMEOUT. Non-superusers can only set their session idle time to a value equal to or lower than their own IDLESESSIONTIMEOUT setting. If no session idle time is explicitly set for a user, the session idle time for that user is inherited from the node or database settings.
  • As dbadmin, set the configuration parameter DEFAULTIDLESESSIONTIMEOUT on the database or on individual nodes. This parameter sets the default timeout value for all non-superusers. For examples of all three settings, see below.
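
The following statements are a sketch of setting a timeout at each of these levels; the user name and interval values are arbitrary placeholders:

=> ALTER USER Joe IDLESESSIONTIMEOUT '10 minutes';                   -- per-user setting, highest precedence
=> SET SESSION IDLESESSIONTIMEOUT '5 minutes';                       -- current session only
=> ALTER DATABASE DEFAULT SET DEFAULTIDLESESSIONTIMEOUT = '1 day';   -- database-wide default for non-superusers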

All settings apply to sessions that are continuously idle—that is, sessions where no queries are running. If a client is slow or unresponsive during query execution, that time does not apply to timeouts. For example, the time that is required for a streaming batch insert is not counted towards timeout. The server identifies a session as idle starting from the moment it starts to wait for any type of message from that session.

Viewing session settings

The following sections demonstrate how you can query the database for details about the session and connection limits.

Session length limits

Use SHOW DATABASE to view the session length limit for the database:

=> SHOW DATABASE DEFAULT DEFAULTIDLESESSIONTIMEOUT;
           name            | setting
---------------------------+---------
 DefaultIdleSessionTimeout | 2 day
(1 row)

Use SHOW to view the length limit for the current session:

=> SHOW IDLESESSIONTIMEOUT;
        name        | setting
--------------------+---------
 idlesessiontimeout | 1
(1 row)

Connection limits

Use SHOW DATABASE to view the connection limits for the database:

=> SHOW DATABASE DEFAULT MaxClientSessions;
       name        | setting
-------------------+---------
 MaxClientSessions | 50
(1 row)

Query USERS to view the connection limits for users:

=> SELECT user_name, max_connections, connection_limit_mode FROM users
     WHERE user_name != 'dbadmin';
 user_name | max_connections | connection_limit_mode
-----------+-----------------+-----------------------
 SuzyX     | 3               | database
 Joe       | 10              | database
(2 rows)

Closing user sessions

To manually close a user session, use CLOSE_USER_SESSIONS:

=> SELECT CLOSE_USER_SESSIONS ('Joe');
                             close_user_sessions
------------------------------------------------------------------------------
 Close all sessions for user Joe sent. Check v_monitor.sessions for progress.
(1 row)

Example

A user executes a query, and for some reason the query takes an unusually long time to finish (for example, because of server traffic or query complexity). In this case, the user might think the query failed, and opens another session to run the same query. Now, two sessions run the same query, using an extra connection.

To prevent this situation, you can limit how many sessions individual users can run, by modifying their MAXCONNECTIONS user property. This can help minimize the chances of running redundant queries. It also helps prevent users from consuming all available connections, as set by the database configuration parameter MaxClientSessions. For example, the following setting on user SuzyQ limits her to two database sessions at any time:

=> CREATE USER SuzyQ MAXCONNECTIONS 2 ON DATABASE;

Limiting client connections also prevents another issue: a single user connecting to the server many times. Too many connections from one user can exhaust the number of allowable connections set by the database configuration parameter MaxClientSessions.

Cluster changes and connections

Behavior changes can occur with client connection limits when the following changes occur to a cluster:

  • You add or remove a node.

  • A node goes down or comes back up.

Changes in node availability between connection requests have little impact on connection limits.

In terms of honoring connection limits, no significant impact exists when nodes go down or come up between connection requests, and no special action is needed. However, if a node goes down, its active sessions exit and other nodes in the cluster also drop their sessions, which frees up connections. Queries running in those sessions might hang; in that case, the blocked sessions are expected behavior.

8.2 - Drain client connections

Draining client connections in a subcluster prepares the subcluster for shutdown by marking all nodes in the subcluster as draining.

Eon Mode only

Draining client connections in a subcluster prepares the subcluster for shutdown by marking all nodes in the subcluster as draining. Work from existing user sessions continues on draining nodes, but the nodes refuse new client connections and are excluded from load-balancing operations. If clients attempt to connect to a draining node, they receive an error that informs them of the draining status. Load balancing operations exclude draining nodes, so clients that opt in to connection load balancing should receive a connection error only if all nodes in the load balancing policy are draining. You do not need to change any connection load balancing configurations to use this feature. dbadmin can still connect to draining nodes.

To drain client connections before shutting down a subcluster, you can use the SHUTDOWN_WITH_DRAIN function. This function performs a graceful shutdown that marks a subcluster as draining until either the existing connections complete their work and close or a user-specified timeout is reached. When one of these conditions is met, the function proceeds to shut down the subcluster. Vertica provides several meta-functions that allow you to independently perform each step of the SHUTDOWN_WITH_DRAIN process. You can use the START_DRAIN_SUBCLUSTER function to mark a subcluster as draining and then the SHUTDOWN_SUBCLUSTER function to shut down a subcluster once its connections have closed.

You can use the CANCEL_DRAIN_SUBCLUSTER function to mark all nodes in a subcluster as not draining. As soon as a node is both UP and not draining, the node accepts new client connections. If all nodes in a draining subcluster are down, the draining status of its nodes is automatically reset to not draining.

You can query the DRAINING_STATUS system table to monitor the draining status of each node as well as client connection information, such as the number of active user sessions on each node.

The following example drains a subcluster named analytics, then cancels the draining of the subcluster.

To mark the analytics subcluster as draining, call SHUTDOWN_WITH_DRAIN with a negative timeout value:

=> SELECT SHUTDOWN_WITH_DRAIN('analytics', -1);
NOTICE 0:  Draining has started on subcluster (analytics)

You can confirm that the subcluster is draining by querying the DRAINING_STATUS system table:

=> SELECT node_name, subcluster_name, is_draining FROM draining_status ORDER BY 1;
node_name          | subcluster_name    | is_draining
-------------------+--------------------+--------------
verticadb_node0001 | default_subcluster | f
verticadb_node0002 | default_subcluster | f
verticadb_node0003 | default_subcluster | f
verticadb_node0004 | analytics          | t
verticadb_node0005 | analytics          | t
verticadb_node0006 | analytics          | t

If a client attempts to connect directly to a node in the draining subcluster, they receive the following error message:

$ /opt/vertica/bin/vsql -h nodeIP --password password verticadb analyst
vsql: FATAL 10611:  New session rejected because subcluster to which this node belongs is draining connections

To cancel the graceful shutdown of the analytics subcluster, you can type Ctrl+C:

=> SELECT SHUTDOWN_WITH_DRAIN('analytics', -1);
NOTICE 0:  Draining has started on subcluster (analytics)
^CCancel request sent
ERROR 0:  Cancel received after draining started and before shutdown issued. Nodes will not be shut down. The subclusters are still in the draining state.
HINT:  Run cancel_drain_subcluster('') to restore all nodes to the 'not_draining' state

As mentioned in the above hint, you can run CANCEL_DRAIN_SUBCLUSTER to reset the status of the draining nodes in the subcluster to not draining:

=> SELECT CANCEL_DRAIN_SUBCLUSTER('analytics');
              CANCEL_DRAIN_SUBCLUSTER
--------------------------------------------------------
Targeted subcluster: 'analytics'
Action: CANCEL DRAIN

(1 row)

To confirm that the subcluster is no longer draining, you can again query the DRAINING_STATUS system table:

=> SELECT node_name, subcluster_name, is_draining FROM draining_status ORDER BY 1;
node_name          | subcluster_name    | is_draining
-------------------+--------------------+-------
verticadb_node0001 | default_subcluster | f
verticadb_node0002 | default_subcluster | f
verticadb_node0003 | default_subcluster | f
verticadb_node0004 | analytics          | f
verticadb_node0005 | analytics          | f
verticadb_node0006 | analytics          | f
(6 rows)

8.3 - Connection load balancing

Each client connection to a host in the Vertica cluster requires a small overhead in memory and processor time.

Each client connection to a host in the Vertica cluster requires a small overhead in memory and processor time. If many clients connect to a single host, this overhead can begin to affect the performance of the database. You can spread the overhead of client connections by dictating that certain clients connect to specific hosts in the cluster. However, this manual balancing becomes difficult as new clients and hosts are added to your environment.

Connection load balancing helps automatically spread the overhead of client connections across the cluster by having hosts redirect client connections to other hosts. By redirecting connections, the overhead from client connections is spread across the cluster without having to manually assign particular hosts to individual clients. Clients can connect to a small handful of hosts, and they are naturally redirected to other hosts in the cluster. Load balancing does not redirect connections to draining hosts. For more information, see Drain client connections.

Native connection load balancing

Native connection load balancing is a feature built into the Vertica Analytic Database server and client libraries as well as vsql. Both the server and the client need to enable load balancing for it to function. If connection load balancing is enabled, a host in the database cluster can redirect a client's attempt to connect to it to another currently-active host in the cluster. This redirection is based on a load balancing policy. This redirection only takes place once, so a client is not bounced from one host to another.

Because native connection load balancing is incorporated into the Vertica client libraries, any client application that connects to Vertica transparently takes advantage of it simply by setting a connection parameter.

How you choose to implement connection load balancing depends on your network environment. Since native connection load balancing is easier to implement, you should use it unless your network configuration requires that clients be separated from the hosts in the Vertica database by a firewall.

For more about native connection load balancing, see About Native Connection Load Balancing.

Workload routing

Workload routing lets you create rules for routing client connections to particular subclusters based on their workloads.

The primary advantages of this type of load balancing are as follows:

  • Database administrators can associate certain subclusters with certain workloads (as opposed to client IP addresses).
  • Clients do not need to know anything about the subcluster they will be routed to, only the type of workload they have.
  • Database administrators can change workload routing policies at any time, and these changes are transparent to all clients.

For details, see Workload routing.

8.3.1 - About native connection load balancing

Native connection load balancing is a feature built into the Vertica server and client libraries that helps spread the CPU and memory overhead caused by client connections across the hosts in the database.

Native connection load balancing is a feature built into the Vertica server and client libraries that helps spread the CPU and memory overhead caused by client connections across the hosts in the database. It can prevent unequal distribution of client connections among hosts in the cluster.

There are two types of native connection load balancing:

  • Cluster-wide balancing—This method is the legacy method of connection load balancing. It was the only type of load balancing prior to Vertica version 9.2. Using this method, you apply a single load balancing policy across the entire cluster. All connections to the cluster are handled the same way.

  • Load balancing policies—This method lets you set different load balancing policies depending on the source of client connection. For example, you can have a policy that redirects connections from outside of your local network to one set of nodes in your cluster, and connections from within your local network to another set of nodes.

8.3.2 - Classic connection load balancing

The classic connection load balancing feature applies a single policy for all client connections to your database.

The classic connection load balancing feature applies a single policy for all client connections to your database. Both your database and the client must enable the load balancing option in order for connections to be load balanced. When both client and server enable load balancing, the following process takes place when the client attempts to open a connection to Vertica:

  1. The client connects to a host in the database cluster, with a connection parameter indicating that it is requesting a load-balanced connection.

  2. The host chooses a host from the list of currently up hosts in the cluster, according to the current load balancing scheme. Under all schemes, it is possible for a host to select itself.

  3. The host tells the client which host it selected to handle the client's connection.

  4. If the host chose another host in the database to handle the client connection, the client disconnects from the initial host. Otherwise, the client jumps to step 6.

  5. The client establishes a connection to the host that will handle its connection. The client sets this second connection request so that the second host does not interpret the connection as a request for load balancing.

  6. The client connection proceeds as usual (negotiating encryption if the connection has SSL enabled, and then authenticating the user).

This process is transparent to the client application. The client driver automatically disconnects from the initial host and reconnects to the host selected for load balancing.

Requirements

  • In mixed IPv4 and IPv6 environments, load balancing only works for the address family for which you have configured native load balancing. For example, if you have configured load balancing using an IPv4 address, IPv6 clients cannot use load balancing. The IPv6 clients can still connect, but their connections are not load balanced.

  • The native load balancer returns an IP address for the client to use. This address must be one that the client can reach. If your nodes are on a private network, native load-balancing requires you to publish a public address in one of two ways:

    • Set the public address on each node. Vertica saves that address in the export_address field in the NODES system table.

    • Set the subnet on the database. Vertica saves that address in the export_subnet field in the DATABASES system table.
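
You can check which address each node will hand back to load-balanced clients by querying the export_address column of the NODES system table. The output below is hypothetical; your node names and addresses will differ:

=> SELECT node_name, export_address FROM V_CATALOG.NODES;
    node_name     | export_address
------------------+----------------
 v_vmart_node0001 | 203.0.113.1
 v_vmart_node0002 | 203.0.113.2
 v_vmart_node0003 | 203.0.113.3
(3 rows)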

Load balancing schemes

The load balancing scheme controls how a host selects which host to handle a client connection. There are three available schemes:

  • NONE (default): Disables native connection load balancing.

  • ROUNDROBIN: Chooses the next host from a circular list of hosts in the cluster that are up—for example, in a three-node cluster, iterates over node1, node2, and node3, then wraps back to node1. Each host in the cluster maintains its own pointer to the next host in the circular list, rather than there being a single cluster-wide state.

  • RANDOM: Randomly chooses a host from among all hosts in the cluster that are up.

You set the native connection load balancing scheme using the SET_LOAD_BALANCE_POLICY function. See Enabling and disabling classic connection load balancing for instructions.

Driver notes

  • Native connection load balancing works with the ADO.NET driver's connection pooling. The connection the client makes to the initial host, and the final connection to the load-balanced host, use pooled connections if they are available.

  • If a client application uses the JDBC or ODBC driver with a third-party connection pooling solution, the initial connection is not pooled because it is not a full client connection. The final connection is pooled because it is a standard client connection.

Connection failover

The client libraries include a failover feature that allows them to connect to backup hosts if the host specified in the connection properties is unreachable. When using native connection load balancing, this failover feature is used only for the initial connection to the database. If the host to which the client was redirected does not respond to the client's connection request, the client does not attempt to connect to a backup host and instead returns a connection error to the user.

Clients are redirected only to hosts that are known to be up. Thus, this sort of connection failure should only occur if the targeted host goes down at the same moment the client is redirected to it. For more information, see ADO.NET connection failover, JDBC connection failover, and Connection failover.

8.3.2.1 - Enabling and disabling classic connection load balancing

Only a database superuser can enable or disable classic cluster-wide connection load balancing.

Only a database superuser can enable or disable classic cluster-wide connection load balancing. To enable or disable load balancing, use the SET_LOAD_BALANCE_POLICY function to set the load balance policy. Setting the load balance policy to anything other than 'NONE' enables load balancing on the server. The following example enables native connection load balancing by setting the load balancing policy to ROUNDROBIN.

=> SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');
                  SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: roundrobin
(1 row)

To disable native connection load balancing, use SET_LOAD_BALANCE_POLICY to set the policy to 'NONE':

=> SELECT SET_LOAD_BALANCE_POLICY('NONE');
SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: none
(1 row)

By default, client connections are not load balanced, even when connection load balancing is enabled on the server. Clients must set a connection parameter to indicate that they are willing to have their connection request load balanced. See Load balancing in ADO.NET, Load balancing in JDBC, and Load balancing, for information on enabling load balancing on the client. For vsql, use the -C command-line option to enable load balancing.

Resetting the load balancing state

When the load balancing policy is ROUNDROBIN, each host in the Vertica cluster maintains its own state of which host it will select to handle the next client connection. You can reset this state to its initial value (usually, the host with the lowest node ID) using the RESET_LOAD_BALANCE_POLICY function:

=> SELECT RESET_LOAD_BALANCE_POLICY();
RESET_LOAD_BALANCE_POLICY
-------------------------------------------------------------------------
Successfully reset stateful client load balance policies: "roundrobin".
(1 row)

8.3.2.2 - Monitoring legacy connection load balancing

Query the LOAD_BALANCE_POLICY column of the V_CATALOG.DATABASES system table to determine the state of native connection load balancing on your server.

Query the LOAD_BALANCE_POLICY column of the V_CATALOG.DATABASES system table to determine the state of native connection load balancing on your server:

=> SELECT LOAD_BALANCE_POLICY FROM V_CATALOG.DATABASES;
LOAD_BALANCE_POLICY
---------------------
roundrobin
(1 row)

Determining to which node a client has connected

A client can determine the node to which it has connected by querying the NODE_NAME column of the V_MONITOR.CURRENT_SESSION table:

=> SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION;
NODE_NAME
------------------
v_vmart_node0002
(1 row)

8.3.3 - Connection load balancing policies

Connection load balancing policies help spread the load of servicing client connections by redirecting connections based on the connection's origin.

Connection load balancing policies help spread the load of servicing client connections by redirecting connections based on the connection's origin. These policies can also help prevent nodes reaching their client connection limits and rejecting new connections by spreading connections among nodes. See Limiting the number and length of client connections for more information about client connection limits.

A load balancing policy consists of:

  • Network addresses that identify particular IP address and port number combinations on a node.

  • One or more connection load balancing groups that consist of the network addresses you want to handle client connections. You define load balancing groups using fault groups, subclusters, or a list of network addresses.

  • One or more routing rules that map a range of client IP addresses to a connection load balancing group.

When a client connects to a node in the database with load balancing enabled, the node evaluates all of the routing rules based on the client's IP address to determine if any match. If more than one rule matches the IP address, the node applies the most specific rule (the one that affects the fewest IP addresses).

If the node finds a matching rule, it uses the rule to determine the pool of potential nodes to handle the client connection. When evaluating potential target nodes, it always ensures that the nodes are currently up. The initially-contacted node then chooses one of the nodes in the group based on the group's distribution scheme. This scheme can be either choosing a node at random, or choosing a node in a rotating "round-robin" order. For example, in a three-node cluster, the round robin order would be node 1, then node 2, then node 3, and then back to node 1 again.

After it processes the rules, if the node determines that another node should handle the client's connection, it tells the client which node it has chosen. The client disconnects from the initial node and connects to the chosen node to continue with the connection process (either negotiating encryption if the connection has TLS/SSL enabled, or authentication).

If the initial node chooses itself based on the routing rules, it tells the client to proceed to the next step of the connection process.

If no routing rule matches the incoming IP address, the node checks to see if classic connection load balancing is enabled by both Vertica and the client. If so, it handles the connection according to the classic load balancing policy. See Classic connection load balancing for more information.

Finally, if the database is running in Eon Mode, the node tries to apply a default interior load balancing rule. See Default Subcluster Interior Load Balancing Policy below.

If no routing rule matches the incoming IP address, and neither classic load balancing nor the default subcluster interior load balancing rule applies, the node handles the connection itself. It also handles the connection itself if it cannot follow the matching load balancing rule. For example, if all nodes in the load balancing group targeted by the rule are down, the initially-contacted node handles the client connection itself. In this case, the node does not attempt to apply any other, less-restrictive load balancing rules that would match the incoming connection. It attempts to apply only a single load balancing rule.

Use cases

Using load balancing policies you can:

  • Ensure connections originating from inside or outside of your internal network are directed to a valid IP address for the client. For example, suppose your Vertica nodes have two IP addresses: one for the external network and another for the internal network. These networks are mutually exclusive. You cannot reach the private network from the public, and you cannot reach the public network from the private. Your load balancing rules need to provide the client with an IP address they can actually reach.


  • Enable access to multiple nodes of a Vertica cluster that are behind a NAT router. A NAT router is accessible from the outside network via a single IP address. Systems within the NAT router's private network can be accessed on this single IP address using different port numbers. You can create a load balancing policy that redirects a client connection to the NAT's IP address but with a different port number.


  • Designate sets of nodes to service client connections from an IP address range. For example, if your ETL systems have a set range of IP addresses, you could limit their client connections to an arbitrary set of Vertica nodes, a subcluster, or a fault group. This technique lets you isolate the overhead of servicing client connections to a few nodes. It is useful when you are using subclusters in an Eon Mode database to isolate workloads (see Subclusters for more information).

Using connection load balancing policies with IPv4 and IPv6

Connection load balancing policies work with both IPv4 and IPv6. As far as the load balancing policies are concerned, the two address families represent separate networks. If you want your load balancing policy to handle both IPv4 and IPv6 addresses, you must create separate sets of network addresses, load balancing groups, and rules for each protocol. When a client opens a connection to a node in the cluster, the addressing protocol it uses determines which set of rules Vertica consults when deciding whether and how to balance the connection.

Default subcluster interior load balancing policy

Databases running in Eon Mode have a default connection load balancing policy that helps spread the load of handling client connections among the nodes in a subcluster. When a client connects to a node while opting into connection load balancing, the node checks for load balancing policies that apply to the client's IP address. If it does not find any applicable load balancing rule, and classic load balancing is not enabled, the node falls back to the default interior load balancing rule. This rule distributes connections among the nodes in the same subcluster as the initially-contacted node.

As with other connection load balancing policies, the nodes in the subcluster must have a network address defined for them to be eligible to handle the client connection. If no nodes in the subcluster have a network address, the node does not apply the default subcluster interior load balancing rule, and the connection is not load balanced.

This default rule is convenient when you are primarily interested in load balancing connections within each subcluster. You just create network addresses for the nodes in your subcluster. You do not need to create load balancing groups or rules. Clients that opt-in to load balancing are then automatically balanced among the nodes in the subcluster.

Interior load balancing policy with multiple network addresses

If your nodes have multiple network addresses, the default subcluster interior load balancing rule chooses the address that was created first as its target. For example, suppose you create a network address on a node for the private IP address 192.168.1.10. Then you create another network address for the node for the public IP address 233.252.0.1. The default subcluster interior connection load balancing rule always selects 192.168.1.10 as the target of the rule.

If you want the default interior load balancing rule to choose a different network address as its target, drop the other network addresses on the node and then recreate them. Deleting and recreating other addresses makes the address you want the rule to select the oldest address. For example, suppose you want the rule to use a public address (233.252.0.1) that was created after a private address (192.168.1.10). In this case, you can drop the address for 192.168.1.10 and then recreate it. The rule then defaults to the older public 233.252.0.1 address.
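
The following is a minimal sketch of this drop-and-recreate approach, using a hypothetical address name (node01_private) and the example addresses above; the echoed command tags may differ slightly in your environment:

=> -- The public address (233.252.0.1) already exists on this node.
=> -- Recreating the private address makes the public address the older one,
=> -- so the default interior rule selects the public address.
=> DROP NETWORK ADDRESS node01_private;
DROP NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node01_private ON v_vmart_node0001 WITH '192.168.1.10';
CREATE NETWORK ADDRESS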

If you intend to create multiple network addresses for the nodes in your subcluster, create the network addresses you want the default subcluster interior load balancing rule to use first. For example, suppose you want to use the default subcluster interior load balancing rule to load balance most client connections, but you also want to create a connection load balancing policy to manage connections coming in from a group of ETL systems. In this case, create the network addresses you want to use for the default interior load balancing rule first, then create the network addresses for the ETL systems.

Load balancing policies vs. classic load balancing

There are several differences between the classic load balancing feature and the load balancing policy feature:

  • In classic connection load balancing, you just enable the load balancing option on both client and server, and load balancing is enabled. There are more steps to implement load balancing policies: you have to create addresses, groups, and rules and then enable load balancing on the client.

  • Classic connection load balancing only supports a single, cluster-wide policy for redirecting connections. With connection load balancing policies, you get to choose which nodes handle client connections based on the connection's origin. This gives you more flexibility to handle complex situations. Examples include routing connections through a NAT-based router or having nodes that are accessible via multiple IP addresses on different networks.

  • In classic connection load balancing, each node in the cluster can only be reached via a single IP address. This address is set in the EXPORT_ADDRESS column of the NODES system table. With connection load balancing policies, you can create a network address for each IP address associated with a node. Then you create rules that redirect to those addresses.

Steps to create a load balancing policy

There are three steps you must follow to create a load balancing policy:

  1. Create one or more network addresses for each node that you want to participate in the connection load balancing policies.

  2. Create one or more load balancing groups to be the target of the routing rules. Load balancing groups can target a collection of specific network addresses. Alternatively, you can create a group from a fault group or subcluster. You can limit the members of the load balance group to a subset of the fault group or subcluster using an IP address filter.

  3. Create one or more routing rules.

While not absolutely necessary, it is always a good idea to test your load balancing policy to ensure it works the way you expect.
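
The following condensed sketch shows the whole workflow using hypothetical node names, addresses, and object names. Each statement is covered in detail in the topics that follow:

=> CREATE NETWORK ADDRESS addr01 ON v_vmart_node0001 WITH '10.20.110.21';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr02 ON v_vmart_node0002 WITH '10.20.110.22';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP sketch_group WITH ADDRESS addr01, addr02;
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE sketch_rule ROUTE '10.20.0.0/16' TO sketch_group;
CREATE ROUTING RULE
=> -- Test how a connection from a given client IP address would be routed
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('10.20.1.5');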

After you complete these steps, Vertica applies the load balancing policies to client connections that opt in to connection load balancing. See Load balancing in ADO.NET, Load balancing in JDBC, and Load balancing, for information on enabling load balancing on the client. For vsql, use the -C command-line option to enable load balancing.

These steps are explained in the other topics in this section.

8.3.3.1 - Creating network addresses

Network addresses assign a name to an IP address and port number on a node.

Network addresses assign a name to an IP address and port number on a node. You use these addresses when you define load balancing groups. A node can have multiple network addresses associated with it. For example, suppose a node has one IP address that is only accessible from outside of the local network, and another that is accessible only from inside the network. In this case, you can define one network address using the external IP address, and another using the internal address. You can then create two different load balancing policies, one for external clients, and another for internal clients.
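
For example, a minimal sketch of that two-network scenario, using hypothetical addresses (the CREATE NETWORK ADDRESS statement itself is described next):

=> CREATE NETWORK ADDRESS node01_internal ON v_vmart_node0001 WITH '192.168.0.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node01_external ON v_vmart_node0001 WITH '203.0.113.1';
CREATE NETWORK ADDRESS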

You create a network address using the CREATE NETWORK ADDRESS statement. This statement takes:

  • The name to assign to the network address

  • The name of the node

  • The IP address of the node to associate with the network address

  • The port number the node uses to accept client connections (optional)

The following example demonstrates creating three network addresses, one for each node in a three-node database.

=> SELECT node_name,node_address,node_address_family FROM v_catalog.nodes;
    node_name     | node_address | node_address_family
------------------+--------------+----------------------
 v_vmart_node0001 | 10.20.110.21 | ipv4
 v_vmart_node0002 | 10.20.110.22 | ipv4
 v_vmart_node0003 | 10.20.110.23 | ipv4
(3 rows)


=> CREATE NETWORK ADDRESS node01 ON v_vmart_node0001 WITH '10.20.110.21';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_vmart_node0002 WITH '10.20.110.22';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 on v_vmart_node0003 WITH '10.20.110.23';
CREATE NETWORK ADDRESS

Creating network addresses for IPv6 addresses works the same way:

=> CREATE NETWORK ADDRESS node1_ipv6 ON v_vmart_node0001 WITH '2001:0DB8:7D5F:7433::';
CREATE NETWORK ADDRESS

Vertica does not perform any tests on the IP address you supply in the CREATE NETWORK ADDRESS statement. You must test the IP addresses you supply to this statement to confirm they correspond to the right node.

Vertica does not restrict the address you supply because it is often not aware of all the network addresses through which the node is accessible. For example, your node may be accessible from an external network via an IP address that Vertica is not configured to use. Or, your node can have both an IPv4 and an IPv6 address, only one of which Vertica is aware of.

For example, suppose v_vmart_node0003 from the previous example is not accessible via the IP address 192.168.1.5. You can still create a network address for it using that address:

=> CREATE NETWORK ADDRESS node04 ON v_vmart_node0003 WITH '192.168.1.5';
CREATE NETWORK ADDRESS

If you create a load balancing group and routing rule that target this address, client connections either connect to the wrong node or fail because they are directed to a host that is not part of a Vertica cluster.

Specifying a port number in a network address

By default, the CREATE NETWORK ADDRESS statement assumes the port number for the node's client connection is the default 5433. Sometimes, you may have a node listening for client connections on a different port. You can supply an alternate port number for the network address using the PORT keyword.

For example, suppose your nodes are behind a NAT router. In this case, you can have your nodes listen on different port numbers so the NAT router can route connections to them. When creating network addresses for these nodes, you supply the IP address of the NAT router and the port number the node is listening on. For example:

=> CREATE NETWORK ADDRESS node1_nat ON v_vmart_node0001 WITH '192.168.10.10' PORT 5433;
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node2_nat ON v_vmart_node0002 with '192.168.10.10' PORT 5434;
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node3_nat ON v_vmart_node0003 with '192.168.10.10' PORT 5435;
CREATE NETWORK ADDRESS

8.3.3.2 - Creating connection load balance groups

After you have created network addresses for nodes, you create collections of them so you can target them with routing rules.

After you have created network addresses for nodes, you create collections of them so you can target them with routing rules. These collections of network addresses are called load balancing groups. You have two ways to select the addresses to include in a load balancing group:

  • A list of network addresses

  • The name of one or more fault groups or subclusters, plus an IP address range in CIDR format. The address range selects which network addresses in the fault groups or subclusters Vertica adds to the load balancing group. Only the network addresses that are within the IP address range you supply are added to the load balance group. This filter lets you base your load balance group on a portion of the nodes that make up the fault group or subcluster.

You create a load balancing group using the CREATE LOAD BALANCE GROUP statement. When basing your group on a list of addresses, this statement takes the name for the group and the list of addresses. The following example demonstrates creating addresses for four nodes, and then creating two groups based on those nodes.

=> CREATE NETWORK ADDRESS addr01 ON v_vmart_node0001 WITH '10.20.110.21';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr02 ON v_vmart_node0002 WITH '10.20.110.22';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr03 on v_vmart_node0003 WITH '10.20.110.23';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr04 on v_vmart_node0004 WITH '10.20.110.24';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP group_1 WITH ADDRESS addr01, addr02;
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP group_2 WITH ADDRESS addr03, addr04;
CREATE LOAD BALANCE GROUP

=> SELECT * FROM LOAD_BALANCE_GROUPS;
    name    |   policy   |     filter      |         type          | object_name
------------+------------+-----------------+-----------------------+-------------
 group_1    | ROUNDROBIN |                 | Network Address Group | addr01
 group_1    | ROUNDROBIN |                 | Network Address Group | addr02
 group_2    | ROUNDROBIN |                 | Network Address Group | addr03
 group_2    | ROUNDROBIN |                 | Network Address Group | addr04
(4 rows)

A network address can be a part of as many load balancing groups as you like. However, each group can only have a single network address per node. You cannot add two network addresses belonging to the same node to the same load balancing group.

Creating load balancing groups from fault groups

To create a load balancing group from one or more fault groups, you supply:

  • The name for the load balancing group

  • The name of one or more fault groups

  • An IP address filter in CIDR format that filters the nodes in the fault groups based on their IP addresses. Vertica excludes any network addresses in the fault group that do not fall within this range. If you want all of the nodes in the fault groups to be added to the load balance group, specify the filter 0.0.0.0/0.

This example creates two load balancing groups from a fault group. The first includes all network addresses in the group by using the CIDR notation for all IP addresses. The second limits the fault group to three of the four nodes in the fault group by using the IP address filter.

=> CREATE FAULT GROUP fault_1;
CREATE FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE  v_vmart_node0001;
ALTER FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE  v_vmart_node0002;
ALTER FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE  v_vmart_node0003;
ALTER FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE  v_vmart_node0004;
ALTER FAULT GROUP
=> SELECT node_name,node_address,node_address_family,export_address
   FROM v_catalog.nodes;
    node_name     | node_address | node_address_family | export_address
------------------+--------------+---------------------+----------------
 v_vmart_node0001 | 10.20.110.21 | ipv4                | 10.20.110.21
 v_vmart_node0002 | 10.20.110.22 | ipv4                | 10.20.110.22
 v_vmart_node0003 | 10.20.110.23 | ipv4                | 10.20.110.23
 v_vmart_node0004 | 10.20.110.24 | ipv4                | 10.20.110.24
(4 rows)

=> CREATE LOAD BALANCE GROUP group_all WITH FAULT GROUP fault_1 FILTER
   '0.0.0.0/0';
CREATE LOAD BALANCE GROUP

=> CREATE LOAD BALANCE GROUP group_some WITH FAULT GROUP fault_1 FILTER
   '10.20.110.21/30';
CREATE LOAD BALANCE GROUP

=> SELECT * FROM LOAD_BALANCE_GROUPS;
      name      |   policy   |     filter      |         type          | object_name
----------------+------------+-----------------+-----------------------+-------------
 group_all      | ROUNDROBIN | 0.0.0.0/0       | Fault Group           | fault_1
 group_some     | ROUNDROBIN | 10.20.110.21/30 | Fault Group           | fault_1
(2 rows)

You can also supply multiple fault groups to the CREATE LOAD BALANCE GROUP statement:

=> CREATE LOAD BALANCE GROUP group_2_faults WITH FAULT GROUP
   fault_2, fault_3 FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP

Creating load balance groups from subclusters

Creating a load balance group from a subcluster is similar to creating a load balance group from a fault group. You just use WITH SUBCLUSTER instead of WITH FAULT GROUP in the CREATE LOAD BALANCE GROUP statement.

=> SELECT node_name,node_address,node_address_family,subcluster_name
   FROM v_catalog.nodes;
      node_name       | node_address | node_address_family |  subcluster_name
----------------------+--------------+---------------------+--------------------
 v_verticadb_node0001 | 10.11.12.10  | ipv4                | load_subcluster
 v_verticadb_node0002 | 10.11.12.20  | ipv4                | load_subcluster
 v_verticadb_node0003 | 10.11.12.30  | ipv4                | load_subcluster
 v_verticadb_node0004 | 10.11.12.40  | ipv4                | analytics_subcluster
 v_verticadb_node0005 | 10.11.12.50  | ipv4                | analytics_subcluster
 v_verticadb_node0006 | 10.11.12.60  | ipv4                | analytics_subcluster
(6 rows)

=> CREATE NETWORK ADDRESS node01 ON v_verticadb_node0001 WITH '10.11.12.10';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_verticadb_node0002 WITH '10.11.12.20';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 ON v_verticadb_node0003 WITH '10.11.12.30';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node04 ON v_verticadb_node0004 WITH '10.11.12.40';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node05 ON v_verticadb_node0005 WITH '10.11.12.50';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node06 ON v_verticadb_node0006 WITH '10.11.12.60';
CREATE NETWORK ADDRESS

=> CREATE LOAD BALANCE GROUP load_subcluster WITH SUBCLUSTER load_subcluster
   FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP analytics_subcluster WITH SUBCLUSTER
   analytics_subcluster FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP

Setting the group's distribution policy

A load balancing group has a policy setting that determines how the initially-contacted node chooses a target from the group. CREATE LOAD BALANCE GROUP supports three policies:

  • ROUNDROBIN (default) rotates among the available members of the load balancing group. The initially-contacted node keeps track of which node it chose last time, and chooses the next one in the group.

  • RANDOM chooses an available node from the group randomly.

  • NONE disables load balancing.

The following example demonstrates creating a load balancing group with a RANDOM distribution policy.

=> CREATE LOAD BALANCE GROUP group_random WITH ADDRESS node01, node02,
   node03, node04 POLICY 'RANDOM';
CREATE LOAD BALANCE GROUP

The next step

After creating the load balancing group, you must add a load balancing routing rule that tells Vertica how incoming connections should be redirected to the groups. See Creating load balancing routing rules.

8.3.3.3 - Creating load balancing routing rules

Once you have created one or more connection load balancing groups, you are ready to create load balancing routing rules.

Once you have created one or more connection load balancing groups, you are ready to create load balancing routing rules. These rules tell Vertica how to redirect client connections based on their IP addresses.

You create routing rules using the CREATE ROUTING RULE statement. You pass this statement:

  • The name for the rule

  • The source IP address range (either IPv4 or IPv6) in CIDR format the rule applies to

  • The name of the load balancing group to handle the connection

The following example creates two rules. The first redirects connections coming from the IP address range 192.168.1.0 through 192.168.1.255 to a load balancing group named group_1. The second routes connections from the IP range 10.20.1.0 through 10.20.1.255 to the load balancing group named group_2.

=> CREATE ROUTING RULE internal_clients ROUTE '192.168.1.0/24' TO group_1;
CREATE ROUTING RULE

=> CREATE ROUTING RULE external_clients ROUTE '10.20.1.0/24' TO group_2;
CREATE ROUTING RULE

Creating a catch-all routing rule

Vertica applies routing rules in most-specific to least-specific order. This behavior lets you create a "catch-all" rule that handles all incoming connections, and then create rules that handle smaller IP address ranges for specific purposes. For example, suppose you want to create a catch-all rule that works with the rules created in the previous example. You can create a new rule that routes 0.0.0.0/0 (the CIDR notation for all IP addresses) to a group that handles connections not handled by either of the previously-created rules. For example:

=> CREATE LOAD BALANCE GROUP group_all WITH ADDRESS node01, node02, node03, node04;
CREATE LOAD BALANCE GROUP

=> CREATE ROUTING RULE catch_all ROUTE '0.0.0.0/0' TO group_all;
CREATE ROUTING RULE

After running the above statements, any connection that does not originate from the IP address ranges 192.168.1.* or 10.20.1.* is routed to the group_all group.

8.3.3.4 - Testing connection load balancing policies

After creating your routing rules, you should test them to verify that they perform the way you expect.

After creating your routing rules, you should test them to verify that they perform the way you expect. The best way to test your rules is to call the DESCRIBE_LOAD_BALANCE_DECISION function with an IP address. This function evaluates the routing rules and reports back how Vertica would route a client connection from that IP address. It uses the same logic that Vertica uses when handling client connections, so the results reflect the actual connection load balancing behavior you will see from client connections. It also reflects the current state of your Vertica cluster, so it does not redirect connections to down nodes.

The following example demonstrates testing a set of three rules: one handles connections from the ETL subnet 10.20.100.*, another handles connections from the range 192.168.1.0 through 192.168.1.255, and a third handles all other connections originating from the 192 subnet. The third call demonstrates what happens when no rules apply to the IP address you supply.

=> SELECT describe_load_balance_decision('192.168.1.25');
                        describe_load_balance_decision
--------------------------------------------------------------------------------
 Describing load balance decision for address [192.168.1.25]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address matches this rule
Matched to load balance group [group_1] the group has policy [ROUNDROBIN]
number of addresses [2]
(0) LB Address: [10.20.100.247]:5433
(1) LB Address: [10.20.100.248]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [10.20.100.248] port [5433]

(1 row)

=> SELECT describe_load_balance_decision('192.168.2.25');
                        describe_load_balance_decision
--------------------------------------------------------------------------------
 Describing load balance decision for address [192.168.2.25]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address does not match source ip filter for this rule.
Considered rule [subnet_192] source ip filter [192.0.0.0/8]... input address
matches this rule
Matched to load balance group [group_all] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [10.20.100.247]:5433
(1) LB Address: [10.20.100.248]:5433
(2) LB Address: [10.20.100.249]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [10.20.100.248] port [5433]

(1 row)

=> SELECT describe_load_balance_decision('1.2.3.4');
                         describe_load_balance_decision
--------------------------------------------------------------------------------
 Describing load balance decision for address [1.2.3.4]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address does not match source ip filter for this rule.
Considered rule [subnet_192] source ip filter [192.0.0.0/8]... input address
does not match source ip filter for this rule.
Routing table decision: No matching routing rules: input address does not match
any routing rule source filters. Details: [Tried some rules but no matching]
No rules matched. Falling back to classic load balancing.
Classic load balance decision: Classic load balancing considered, but either
the policy was NONE or no target was available. Details: [NONE or invalid]

(1 row)

The DESCRIBE_LOAD_BALANCE_DECISION function also takes into account the classic cluster-wide load balancing settings:

=>  SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');
                            SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------------
 Successfully changed the client initiator load balancing policy to: roundrobin
(1 row)

=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('1.2.3.4');
                            describe_load_balance_decision
--------------------------------------------------------------------------------
 Describing load balance decision for address [1.2.3.4]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address does not match source ip filter for this rule.
Considered rule [subnet_192] source ip filter [192.0.0.0/8]... input address
does not match source ip filter for this rule.
Routing table decision: No matching routing rules: input address does not
match any routing rule source filters. Details: [Tried some rules but no matching]
No rules matched. Falling back to classic load balancing.
Classic load balance decision: Success. Load balance redirect to: [10.20.100.247]
port [5433]

(1 row)

The function can also help you debug connection issues you notice after going live with your load balancing policy. For example, if you notice that one node is handling a large number of client connections, you can test the client IP addresses against your policies to see why the connections are not being balanced.
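
For example, if a node seems to be handling a disproportionate share of connections, you can list the client addresses currently connected to each node and feed them back into DESCRIBE_LOAD_BALANCE_DECISION. This sketch assumes the client_hostname column of the SESSIONS system table, which reports the client address and port:

=> SELECT node_name, client_hostname FROM V_MONITOR.SESSIONS;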

8.3.3.5 - Load balancing policy examples

The following examples demonstrate some common use cases for connection load balancing policies.

The following examples demonstrate some common use cases for connection load balancing policies.

Enabling client connections from multiple networks

Suppose you have a Vertica cluster that is accessible from two (or more) different networks. Some examples of this situation are:

  • You have an internal and an external network. In this configuration, your database nodes usually have two or more IP addresses, each of which is accessible from only one of the networks. This configuration is common when running Vertica in a cloud environment. In many cases, you can create a catch-all rule that applies to all IP addresses, and then add additional routing rules for the internal subnets.

  • You want clients to be load balanced whether they use IPv4 or IPv6 protocols. From the database's perspective, IPv4 and IPv6 connections are separate networks because each node has a separate IPv4 and IPv6 IP address.

When creating a load balancing policy for a database that is accessible from multiple networks, client connections must be directed to IP addresses on the network they can access. The best solution is to create load balancing groups for each set of IP addresses assigned to a node. Then create routing rules that redirect client connections to the IP addresses that are accessible from their network.

The following example:

  1. Creates two sets of network addresses: one for the internal network and another for the external network.

  2. Creates two load balance groups: one for the internal network and one for the external.

  3. Creates two routing rules: one for the internal network, and a catch-all rule for the external network. The internal routing rule covers a subset of the addresses covered by the external rule.

  4. Tests the routing rules using internal and external IP addresses.

=> CREATE NETWORK ADDRESS node01_int ON v_vmart_node0001 WITH '192.168.0.1';
CREATE NETWORK ADDRESS

=> CREATE NETWORK ADDRESS node01_ext ON v_vmart_node0001 WITH '203.0.113.1';
CREATE NETWORK ADDRESS

=> CREATE NETWORK ADDRESS node02_int ON v_vmart_node0002 WITH '192.168.0.2';
CREATE NETWORK ADDRESS

=> CREATE NETWORK ADDRESS node02_ext ON v_vmart_node0002 WITH '203.0.113.2';
CREATE NETWORK ADDRESS

=> CREATE NETWORK ADDRESS node03_int ON v_vmart_node0003 WITH '192.168.0.3';
CREATE NETWORK ADDRESS

=> CREATE NETWORK ADDRESS node03_ext ON v_vmart_node0003 WITH '203.0.113.3';
CREATE NETWORK ADDRESS

=> CREATE LOAD BALANCE GROUP internal_group WITH ADDRESS node01_int, node02_int, node03_int;
CREATE LOAD BALANCE GROUP

=> CREATE LOAD BALANCE GROUP external_group WITH ADDRESS node01_ext, node02_ext, node03_ext;
CREATE LOAD BALANCE GROUP

=> CREATE ROUTING RULE internal_rule ROUTE '192.168.0.0/24' TO internal_group;
CREATE ROUTING RULE

=> CREATE ROUTING RULE external_rule ROUTE '0.0.0.0/0' TO external_group;
CREATE ROUTING RULE

=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('198.51.100.10');
                         DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
 Describing load balance decision for address [198.51.100.10]
Load balance cache internal version id (node-local): [3]
Considered rule [internal_rule] source ip filter [192.168.0.0/24]... input
address does not match source ip filter for this rule.
Considered rule [external_rule] source ip filter [0.0.0.0/0]... input
address matches this rule
Matched to load balance group [external_group] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [203.0.113.1]:5433
(1) LB Address: [203.0.113.2]:5433
(2) LB Address: [203.0.113.3]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [203.0.113.3] port [5433]

(1 row)

=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('192.168.0.79');
                         DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
 Describing load balance decision for address [192.168.0.79]
Load balance cache internal version id (node-local): [3]
Considered rule [internal_rule] source ip filter [192.168.0.0/24]... input
address matches this rule
Matched to load balance group [internal_group] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [192.168.0.1]:5433
(1) LB Address: [192.168.0.3]:5433
(2) LB Address: [192.168.0.2]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [192.168.0.2] port
[5433]

(1 row)

Isolating workloads

You may want to control which nodes in your cluster are used by specific types of clients. For example, you may want to limit clients that perform data-loading tasks to one set of nodes, and reserve the rest of the nodes for running queries. This separation of workloads is especially common for Eon Mode databases. See Controlling Where a Query Runs for an example of using load balancing policies in an Eon Mode database to control which subcluster a client connects to.

You can create client load balancing policies that support workload isolation if clients performing certain types of tasks always originate from a limited IP address range. For example, if the clients that load data into your system always fall into a specific subnet, you can create a policy that limits which nodes those clients can access.

In the following example:

  • There are two fault groups (group_a and group_b) that separate workloads in an Eon Mode database. These groups are used as the basis of the load balancing groups.

  • The ETL client connections all originate from the 203.0.113.0/24 subnet.

  • User connections originate in the range of 192.0.0.0 to 199.255.255.255.

=> CREATE NETWORK ADDRESS node01 ON v_vmart_node0001 WITH '192.0.2.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_vmart_node0002 WITH '192.0.2.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 ON v_vmart_node0003 WITH '192.0.2.3';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node04 ON v_vmart_node0004 WITH '192.0.2.4';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node05 ON v_vmart_node0005 WITH '192.0.2.5';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP lb_users WITH FAULT GROUP group_a FILTER '192.0.2.0/24';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP lb_etl WITH FAULT GROUP group_b FILTER '192.0.2.0/24';
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE users_rule ROUTE '192.0.0.0/5' TO lb_users;
CREATE ROUTING RULE
=> CREATE ROUTING RULE etl_rule ROUTE '203.0.113.0/24' TO lb_etl;
CREATE ROUTING RULE

=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('198.51.200.129');
                          DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
 Describing load balance decision for address [198.51.200.129]
Load balance cache internal version id (node-local): [6]
Considered rule [etl_rule] source ip filter [203.0.113.0/24]... input address
does not match source ip filter for this rule.
Considered rule [users_rule] source ip filter [192.0.0.0/5]... input address
matches this rule
Matched to load balance group [lb_users] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [192.0.2.1]:5433
(1) LB Address: [192.0.2.2]:5433
(2) LB Address: [192.0.2.3]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [192.0.2.2] port
[5433]

(1 row)

=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('203.0.113.24');
                             DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
 Describing load balance decision for address [203.0.113.24]
Load balance cache internal version id (node-local): [6]
Considered rule [etl_rule] source ip filter [203.0.113.0/24]... input address
matches this rule
Matched to load balance group [lb_etl] the group has policy [ROUNDROBIN] number
of addresses [2]
(0) LB Address: [192.0.2.4]:5433
(1) LB Address: [192.0.2.5]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [192.0.2.5] port
[5433]

(1 row)

=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('10.20.100.25');
                           DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
 Describing load balance decision for address [10.20.100.25]
Load balance cache internal version id (node-local): [6]
Considered rule [etl_rule] source ip filter [203.0.113.0/24]... input address
does not match source ip filter for this rule.
Considered rule [users_rule] source ip filter [192.0.0.0/5]... input address
does not match source ip filter for this rule.
Routing table decision: No matching routing rules: input address does not match
any routing rule source filters. Details: [Tried some rules but no matching]
No rules matched. Falling back to classic load balancing.
Classic load balance decision: Classic load balancing considered, but either the
policy was NONE or no target was available. Details: [NONE or invalid]

(1 row)

Enabling the default subcluster interior load balancing policy

Vertica attempts to apply the default subcluster interior load balancing policy if no other load balancing policy applies to an incoming connection and classic load balancing is not enabled. See Default Subcluster Interior Load Balancing Policy for a description of this rule.

To enable default subcluster interior load balancing, you must create network addresses for the nodes in a subcluster. Once you create the addresses, Vertica attempts to apply this rule to load balance connections within a subcluster when no other rules apply.

The following example confirms the database has no load balancing groups or rules. Then it adds publicly-accessible network addresses to the nodes in the primary subcluster. When these addresses are added, Vertica applies the default subcluster interior load balancing policy.

=> SELECT * FROM LOAD_BALANCE_GROUPS;
 name | policy | filter | type | object_name
------+--------+--------+------+-------------
(0 rows)

=> SELECT * FROM ROUTING_RULES;
 name | source_address | destination_name
------+----------------+------------------
(0 rows)

=> CREATE NETWORK ADDRESS node01_ext ON v_verticadb_node0001 WITH '203.0.113.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02_ext ON v_verticadb_node0002 WITH '203.0.113.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03_ext ON v_verticadb_node0003 WITH '203.0.113.3';
CREATE NETWORK ADDRESS

=> SELECT describe_load_balance_decision('11.0.0.100');
                                describe_load_balance_decision
-----------------------------------------------------------------------------------------------
Describing load balance decision for address [11.0.0.100] on subcluster [default_subcluster]
Load balance cache internal version id (node-local): [2]
Considered rule [auto_rr_default_subcluster] subcluster interior filter  [default_subcluster]...
current subcluster matches this rule
Matched to load balance group [auto_lbg_sc_default_subcluster] the group has policy
[ROUNDROBIN] number of addresses [3]
(0) LB Address: [203.0.113.1]:5433
(1) LB Address: [203.0.113.2]:5433
(2) LB Address: [203.0.113.3]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [203.0.113.2] port [5433]

(1 row)

Load balance both IPv4 and IPv6 connections

Connection load balancing policies regard IPv4 and IPv6 connections as separate networks. To load balance both types of incoming client connections, create two sets of network addresses, at least two load balancing groups, and two routing rules, one for each network address family.

This example creates two load balancing policies for the default subcluster: one for the IPv4 network addresses (192.168.111.31 to 192.168.111.33) and one for the IPv6 network addresses (fd9b:1fcc:1dc4:78d3::31 to fd9b:1fcc:1dc4:78d3::33).

=> SELECT node_name,node_address,subcluster_name FROM NODES;
      node_name       |  node_address  |  subcluster_name
----------------------+----------------+--------------------
 v_verticadb_node0001 | 192.168.111.31 | default_subcluster
 v_verticadb_node0002 | 192.168.111.32 | default_subcluster
 v_verticadb_node0003 | 192.168.111.33 | default_subcluster

=> CREATE NETWORK ADDRESS node01 ON v_verticadb_node0001 WITH
   '192.168.111.31';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node01_ipv6 ON v_verticadb_node0001 WITH
   'fd9b:1fcc:1dc4:78d3::31';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_verticadb_node0002 WITH
   '192.168.111.32';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02_ipv6 ON v_verticadb_node0002 WITH
   'fd9b:1fcc:1dc4:78d3::32';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 ON v_verticadb_node0003 WITH
   '192.168.111.33';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03_ipv6 ON v_verticadb_node0003 WITH
   'fd9b:1fcc:1dc4:78d3::33';
CREATE NETWORK ADDRESS

=> CREATE LOAD BALANCE GROUP group_ipv4 WITH SUBCLUSTER default_subcluster
   FILTER '192.168.111.0/24';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP group_ipv6 WITH SUBCLUSTER default_subcluster
   FILTER 'fd9b:1fcc:1dc4:78d3::0/64';
CREATE LOAD BALANCE GROUP

=> CREATE ROUTING RULE all_ipv4 route '0.0.0.0/0' TO group_ipv4;
CREATE ROUTING RULE
=> CREATE ROUTING RULE all_ipv6 route '0::0/0' TO group_ipv6;
CREATE ROUTING RULE

=> SELECT describe_load_balance_decision('203.0.113.50');
                                describe_load_balance_decision
-----------------------------------------------------------------------------------------------
Describing load balance decision for address [203.0.113.50] on subcluster [default_subcluster]
Load balance cache internal version id (node-local): [3]
Considered rule [all_ipv4] source ip filter [0.0.0.0/0]... input address matches this rule
Matched to load balance group [ group_ipv4] the group has policy [ROUNDROBIN] number of addresses [3]
(0) LB Address: [192.168.111.31]:5433
(1) LB Address: [192.168.111.32]:5433
(2) LB Address: [192.168.111.33]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [192.168.111.33] port [5433]

(1 row)

=> SELECT describe_load_balance_decision('2001:0DB8:EA04:8F2C::1');
                                describe_load_balance_decision
---------------------------------------------------------------------------------------------------------
Describing load balance decision for address [2001:0DB8:EA04:8F2C::1] on subcluster [default_subcluster]
Load balance cache internal version id (node-local): [3]
Considered rule [all_ipv4] source ip filter [0.0.0.0/0]... input address does not match source ip filter for this rule.
Considered rule [all_ipv6] source ip filter [0::0/0]... input address matches this rule
Matched to load balance group [ group_ipv6] the group has policy [ROUNDROBIN] number of addresses [3]
(0) LB Address: [fd9b:1fcc:1dc4:78d3::31]:5433
(1) LB Address: [fd9b:1fcc:1dc4:78d3::32]:5433
(2) LB Address: [fd9b:1fcc:1dc4:78d3::33]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [fd9b:1fcc:1dc4:78d3::33] port [5433]

(1 row)

Other examples

For other examples of using connection load balancing, see the following topics:

8.3.3.6 - Viewing load balancing policy configurations

Query the following system tables in the V_CATALOG schema to see the load balance policies defined in your database.

Query the following system tables in the V_CATALOG schema to see the load balance policies defined in your database:

  • NETWORK_ADDRESSES lists all of the network addresses defined in your database.

  • LOAD_BALANCE_GROUPS lists the contents of your load balance groups.

  • ROUTING_RULES lists all of the routing rules defined in your database.

This example demonstrates querying each of the load balancing policy system tables.

=> \x
Expanded display is on.
=> SELECT * FROM V_CATALOG.NETWORK_ADDRESSES;
-[ RECORD 1 ]----+-----------------
name             | node01
node             | v_vmart_node0001
address          | 10.20.100.247
port             | 5433
address_family   | ipv4
is_enabled       | t
is_auto_detected | f
-[ RECORD 2 ]----+-----------------
name             | node02
node             | v_vmart_node0002
address          | 10.20.100.248
port             | 5433
address_family   | ipv4
is_enabled       | t
is_auto_detected | f
-[ RECORD 3 ]----+-----------------
name             | node03
node             | v_vmart_node0003
address          | 10.20.100.249
port             | 5433
address_family   | ipv4
is_enabled       | t
is_auto_detected | f
-[ RECORD 4 ]----+-----------------
name             | alt_node1
node             | v_vmart_node0001
address          | 192.168.1.200
port             | 8080
address_family   | ipv4
is_enabled       | t
is_auto_detected | f
-[ RECORD 5 ]----+-----------------
name             | test_addr
node             | v_vmart_node0001
address          | 192.168.1.100
port             | 4000
address_family   | ipv4
is_enabled       | t
is_auto_detected | f

=> SELECT * FROM LOAD_BALANCE_GROUPS;
-[ RECORD 1 ]----------------------
name        | group_all
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node01
-[ RECORD 2 ]----------------------
name        | group_all
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node02
-[ RECORD 3 ]----------------------
name        | group_all
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node03
-[ RECORD 4 ]----------------------
name        | group_1
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node01
-[ RECORD 5 ]----------------------
name        | group_1
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node02
-[ RECORD 6 ]----------------------
name        | group_2
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node01
-[ RECORD 7 ]----------------------
name        | group_2
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node02
-[ RECORD 8 ]----------------------
name        | group_2
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node03
-[ RECORD 9 ]----------------------
name        | etl_group
policy      | ROUNDROBIN
filter      |
type        | Network Address Group
object_name | node01

=> SELECT * FROM ROUTING_RULES;
-[ RECORD 1 ]----+-----------------
name             | internal_clients
source_address   | 192.168.1.0/24
destination_name | group_1
-[ RECORD 2 ]----+-----------------
name             | etl_rule
source_address   | 10.20.100.0/24
destination_name | etl_group
-[ RECORD 3 ]----+-----------------
name             | subnet_192
source_address   | 192.0.0.0/8
destination_name | group_all

8.3.3.7 - Maintaining load balancing policies

Once you have created load balancing policies, you maintain them using the following statements:

  • ALTER NETWORK ADDRESS lets you rename a network address, change its IP address, and enable or disable it.

  • ALTER LOAD BALANCE GROUP lets you rename a load balance group, add or remove network addresses or fault groups, change the fault group IP address filter, or change the group's policy.

  • ALTER ROUTING RULE lets you rename a routing rule, change its source IP address, and change its target load balance group.

See the reference pages for these statements for examples.
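
For example, a short sketch of each statement, using objects created earlier in this section. The new IP address, policy, and target group shown here are placeholders; see the reference pages for the full syntax:

=> -- Point an existing network address at a different IP address
=> ALTER NETWORK ADDRESS node01 SET TO '10.20.100.250';
=> -- Add another address to a load balance group and change its policy
=> ALTER LOAD BALANCE GROUP group_1 ADD ADDRESS node03;
=> ALTER LOAD BALANCE GROUP group_1 SET POLICY TO 'RANDOM';
=> -- Send clients that match an existing rule to a different group
=> ALTER ROUTING RULE internal_clients SET GROUP TO group_2;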

Deleting load balancing policy objects

You can also delete existing load balance policy objects using the following statements:

  • DROP NETWORK ADDRESS

  • DROP LOAD BALANCE GROUP

  • DROP ROUTING RULE
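
For example, a minimal sketch that removes the sample objects from this section. Dependent objects are dropped first, because a routing rule references its group and a group references its addresses:

=> -- Drop the rule first, because it references the group
=> DROP ROUTING RULE internal_clients;
=> -- Then drop the group, and finally any address you no longer need
=> DROP LOAD BALANCE GROUP group_1;
=> DROP NETWORK ADDRESS test_addr;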

8.3.4 - Workload routing

Workload routing routes client connections to subclusters based on their workloads. This lets you reserve subclusters for certain types of tasks.

When a client connects to Vertica, it connects to a connection node, which routes the client to the appropriate subcluster based on the client's specified workload, user, or role and on the database's routing rules. If multiple subclusters are associated with the same workload, the client is routed randomly to one of those subclusters.

In this context, "routing" means that the connection node acts as a proxy between the client and the execution node in the target subcluster: all queries are first sent to the connection node and then passed to the execution node, and query results return along the same path.

The primary advantages of this type of load balancing are as follows:

  • Database administrators can associate certain subclusters with certain workloads and roles (as opposed to client IP addresses).
  • Clients do not need to know anything about the subcluster they will be routed to, only the type of workload they have or the role they should use.

Workload routing depends on actions from both the database administrator and the client:

  • The database administrator must create rules for handling various workloads.
  • The client must either specify their workload or have an enabled role (set manually with SET ROLE or enabled as a default role) that is associated with a routing rule, as shown in the sketch after this list.
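
For example, a minimal sketch of the client side, assuming the database administrator has already created a routing rule for a workload named analytics (the workload name is illustrative):

=> -- Declare the workload for this session so routing rules can match it
=> SET SESSION WORKLOAD analytics;

Alternatively, the client can rely on a default role that is associated with a routing rule, as described later in this section.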

View the current workload

To view the workload associated with the current session, use SHOW WORKLOAD:

=> SHOW WORKLOAD;
   name   |  setting
----------+------------
 workload | my_workload
(1 row)

Create workload routing rules

Routing rules apply to a client's specified workload and either user or role. If you specify more than one subcluster in a routing rule, the client is randomly routed to one of those subclusters.

If multiple routing rules could apply to a client's session, the rule with the highest priority is used. For details, see Priorities.

To view existing routing rules, see WORKLOAD_ROUTING_RULES.
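
For example, a quick way to list the rules is to query that system table directly (the columns you see depend on your Vertica version):

=> SELECT * FROM WORKLOAD_ROUTING_RULES;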

To view workloads available to you and your enabled roles, use SHOW AVAILABLE WORKLOADS:

=> SHOW AVAILABLE WORKLOADS;
                name | setting
---------------------+------------------------
 available workloads | reporting, analytics
(1 row) 

Workload-based routing

Workload-based routing rules apply to clients that specify a particular workload, routing them to one of the subclusters listed in the rule. In this example, when a client connects to the database and specifies the analytics workload, their connection is randomly routed to either sc_analytics or sc_analytics_2:

=> CREATE ROUTING RULE ROUTE WORKLOAD analytics TO SUBCLUSTER sc_analytics, sc_analytics_2;

To alter a routing rule, use ALTER ROUTING RULE. For example, to route analytics workloads to sc_analytics:

=> ALTER ROUTING RULE FOR WORKLOAD analytics SET SUBCLUSTER TO sc_analytics;

To add or remove a subcluster, use ALTER ROUTING RULE. For example:

  1. To add subcluster sc_01:
=> ALTER ROUTING RULE FOR WORKLOAD analytics ADD SUBCLUSTER sc_01;
  2. To remove subcluster sc_01:
=> ALTER ROUTING RULE FOR WORKLOAD analytics REMOVE SUBCLUSTER sc_01;

To drop a routing rule, use DROP ROUTING RULE and specify the workload. For example, to drop the routing rule for the analytics workload:

=> DROP ROUTING RULE FOR WORKLOAD analytics;

User- and role-based routing

You can grant USAGE privileges to a user or role to let them route their queries to one of the subclusters listed in the routing rule. In this example, when a client connects to the database and enables the analytics_role role, their connection is randomly routed to either sc_analytics or sc_analytics_2:

=> CREATE ROUTING RULE ROUTE WORKLOAD analytics TO SUBCLUSTER sc_analytics, sc_analytics_2;
=> GRANT USAGE ON ROUTING RULE analytics TO analytics_role;

Users can then enable the role and set their workload to analytics for the session:

=> SET ROLE analytics_role;
=> SET SESSION WORKLOAD analytics;

Users can also enable the role automatically by setting it as a default role and then specifying the workload when they connect. For details on default roles, see Enabling roles automatically.
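
For example, a sketch of making the role a default so that it is enabled automatically at connect time. This assumes the role has already been granted to the user, and uses the user and role names from this section's examples:

=> -- Enable analytics_role automatically whenever analytics_user connects
=> ALTER USER analytics_user DEFAULT ROLE analytics_role;

After connecting, the user only needs to set the session workload (or supply it through their client driver's workload connection property, if the driver provides one).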

Similarly, in this example, when a client connects to the database as user analytics_user, they are randomly routed to either sc_analytics or sc_analytics_2:

=> CREATE ROUTING RULE ROUTE WORKLOAD analytics TO SUBCLUSTER sc_analytics, sc_analytics_2;
=> GRANT USAGE ON ROUTING RULE analytics TO analytics_user;