Administrator's guide
Welcome to the Vertica Administrator's Guide. This document describes how to set up and maintain a Vertica Analytics Platform database.
Prerequisites
This document makes the following assumptions:
1 - Administration overview
This document describes the functions performed by a Vertica database administrator (DBA). Perform these tasks using only the dedicated database administrator account that was created when you installed Vertica. The examples in this documentation set assume that the administrative account name is dbadmin.
- To perform certain cluster configuration and administration tasks, the DBA (users of the administrative account) must be able to supply the root password for those hosts. If this requirement conflicts with your organization's security policies, these functions must be performed by your IT staff.
- If you perform administrative functions using a different account from the account provided during installation, Vertica encounters file ownership problems.
- If you share the administrative account password, make sure that only one user runs the Administration tools at any time. Otherwise, automatic configuration propagation does not work correctly.
- The Administration Tools require that the calling user's shell be /bin/bash. Other shells give unexpected results and are not supported.
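The shell requirement can be verified before launching the Administration Tools. The following sketch reads the login shell recorded in the current user's passwd entry; it assumes a standard Linux passwd database:

```shell
# Print the login shell recorded for the current user; the Administration
# Tools expect this to be /bin/bash
login_shell=$(getent passwd "$(id -un)" | cut -d: -f7)
echo "$login_shell"
```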
2 - Managing licenses
You must license Vertica in order to use it. Vertica supplies your license in the form of one or more license files, which encode the terms of your license.
Do not open the license files in an editor. Doing so can introduce special characters, such as line endings and file terminators, that may not be visible within the editor. Whether visible or not, these characters invalidate the license.
Applying license files
Be careful not to change the license key file in any way when copying the file between Windows and Linux, or to any other location. To help prevent applications from trying to alter the file, enclose the license file in an archive file (such as a .zip or .tar file). You should keep a back up of your license key file. OpenText recommends that you keep the backup in /opt/vertica.
After copying the license file from one location to another, check that the copied file size is identical to that of the one you received from Vertica.
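Both recommendations can be followed from the shell. A minimal sketch, using a placeholder file name (vlicense.dat is not the real name of your license file):

```shell
# Create a placeholder standing in for the license key file
printf 'LICENSE-KEY-DATA' > vlicense.dat

# Enclose it in a tar archive so transfer tools are less likely to alter it
tar -cf vlicense.tar vlicense.dat

# After copying, confirm the byte count matches the original
cp vlicense.dat vlicense_copy.dat
wc -c < vlicense.dat
wc -c < vlicense_copy.dat
```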
2.1 - Obtaining a license key file
Follow these steps to obtain a license key file:
1. Log in to the Software Entitlement Key site using your passport login information. If you do not have a passport login, create one.
2. On the Request Access page, enter your order number and select a role.
3. Enter your request access reasoning.
4. Click Submit.
5. After your request is approved, you will receive a confirmation email. On the site, click the Entitlements tab to see your Vertica software.
6. Under the Action tab, click Activate. You may select more than one product.
7. The License Activation page opens. Enter your Target Name.
8. Select your Vertica version and the quantity you want to activate.
9. Click Next.
10. Confirm your activation details and click Submit.
11. The Activation Results page displays. Follow the instructions in New Vertica license installations or Vertica license changes to complete your installation or upgrade.
Your Vertica Community Edition download package includes the Community Edition license, which allows three nodes and 1TB of data. The Vertica Community Edition license does not expire.
2.2 - Understanding Vertica licenses
Vertica has flexible licensing terms. It can be licensed on the following bases:
- Term-based (valid until a specific date).
- Size-based (valid to store up to a specified amount of raw data).
- Both term- and size-based.
- Unlimited duration and data storage.
- Node-based with an unlimited number of CPUs and users (one node is a server acting as a single computer system, whether physical or virtual).
- A pay-as-you-go model where you pay for only the number of hours you use. This license is available on your cloud provider's marketplace.
Your license key has your licensing bases encoded into it. If you are unsure of your current license, you can view your license information from within Vertica.
Note
Vertica does not support license downgrades.
Vertica Community Edition (CE) is free and allows customers to create databases with the following limits:
- Up to 3 nodes
- Up to 1 terabyte of data
Community Edition licenses cannot be installed co-located in a Hadoop infrastructure and used to query data stored in Hadoop formats.
As part of the CE license, you agree to the collection of some anonymous, non-identifying usage data. This data lets Vertica understand how customers use the product, and helps guide the development of new features. None of your personal data is collected. For details on what is collected, see the Community Edition End User License Agreement.
Vertica for SQL on Apache Hadoop license
Vertica for SQL on Apache Hadoop is a separate product with its own license. This documentation covers both products. Consult your license agreement for details about available features and limitations.
2.3 - Installing or upgrading a license key
The steps you follow to apply your Vertica license key vary, depending on the type of license you are applying and whether you are upgrading your license.
2.3.1 - New Vertica license installations
Follow these steps to install a new Vertica license:
1. Copy the license key file you generated from the Software Entitlement Key site to your Administration host.
2. Ensure the license key file's permissions are set to 400 (owner read-only).
3. Install Vertica as described in Installing Vertica if you have not already done so. The interface prompts you for the license key file.
4. To install Community Edition, leave the default path blank and click OK. To apply your evaluation or Premium Edition license, enter the absolute path of the license key file you downloaded to your Administration host and press OK. The first time you log in as the Database Superuser and run the Administration tools, the interface prompts you to accept the End-User License Agreement (EULA).
Note
If you installed Management Console, the MC administrator can point to the location of the license key during Management Console configuration.
5. Choose View EULA.
6. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.
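The mode-400 permission requirement above can be applied and verified from the shell. A sketch with a placeholder file name (vlicense.dat) and GNU stat:

```shell
# Placeholder file standing in for the downloaded license key
rm -f vlicense.dat
printf 'LICENSE-KEY-DATA' > vlicense.dat

# Mode 400: readable by the owner only
chmod 400 vlicense.dat

# Confirm the mode (GNU coreutils stat)
stat -c '%a' vlicense.dat
```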
2.3.2 - Vertica license changes
If your license is expiring or you want your database to grow beyond your licensed data size, you must renew or upgrade your license. After you obtain your renewal or upgraded license key file, you can install it using Administration Tools or Management Console.
Upgrading does not require a new license unless you are increasing the capacity of your database. You can add capacity to your database using the Software Entitlement Key site. You do not need to uninstall and reinstall the license to add capacity.
1. Copy the license key file you generated from the Software Entitlement Key site to your Administration host.
2. Ensure the license key file's permissions are set to 400 (owner read-only).
3. Start your database, if it is not already running.
4. In the Administration Tools, select Advanced > Upgrade License Key and click OK.
5. Enter the absolute path to your new license key file and click OK. The interface prompts you to accept the End-User License Agreement (EULA).
6. Choose View EULA.
7. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.
Uploading or upgrading a license key using Management Console
1. From your database's Overview page in Management Console, click the License tab. The License page displays. You can view your installed licenses on this page.
2. Click Install New License at the top of the License page.
3. Browse to the location of the license key from your local computer and upload the file.
4. Click Apply at the top of the page. Management Console prompts you to accept the End-User License Agreement (EULA).
5. Select the check box to officially accept the EULA and continue installing the license, or click Cancel to exit.
Note
As soon as you renew or upgrade your license key from either your Administration host or Management Console, Vertica applies the license update. No further warnings appear.
Adding capacity
If you are adding capacity to your database, you do not need to uninstall and reinstall the license. Instead, you can install multiple licenses to increase the size of your database. This additive capacity only works for licenses with the same format, such as adding Premium license capacity to an existing Premium license type. When you add capacity, the size of your license will be the total of both licenses; the previous license is not overwritten. You cannot add capacity using two different license formats, such as adding Hadoop license capacity to an existing Premium license.
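The additive behavior reduces to a simple sum. A sketch with invented capacities (the numbers are not from any real license):

```python
# Hypothetical capacities, in terabytes, for two licenses of the same
# (Premium) format
existing_license_tb = 10.0
add_on_license_tb = 5.0

# When capacity is added, the effective license size is the total of both;
# the previous license is not overwritten
total_capacity_tb = existing_license_tb + add_on_license_tb
print(total_capacity_tb)  # 15.0
```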
You can run the AUDIT() function to verify that the license capacity was added. The add-on capacity is reflected in your license during the next automatic run of the audit function. If you want to see the immediate result of the add-on capacity, run the AUDIT() function to refresh.
Note
If you have an expired license, you must drop the expired license before you can continue to use Vertica. For more information, see DROP_LICENSE.
2.4 - Viewing your license status
You can use several functions to display your license terms and current status.
Examining your license key
Use the DISPLAY_LICENSE SQL function to display the license information. This function displays the dates for which your license is valid (or Perpetual if your license does not expire) and any raw data allowance. For example:
=> SELECT DISPLAY_LICENSE();
DISPLAY_LICENSE
---------------------------------------------------
Vertica Systems, Inc.
2007-08-03
Perpetual
500GB
(1 row)
You can also query the LICENSES system table to view information about your installed licenses. This table displays your license types, the dates for which your licenses are valid, and the size and node limits your licenses impose.
Alternatively, use Management Console: on your database Overview page, click the License tab to view information about your installed licenses.
Viewing your license compliance
If your license includes a raw data size allowance, Vertica periodically audits your database's size to ensure it remains compliant with the license agreement. If your license has a term limit, Vertica also periodically checks to see if the license has expired. You can see the result of the latest audits using the GET_COMPLIANCE_STATUS function.
=> select GET_COMPLIANCE_STATUS();
GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
Raw Data Size: 2.00GB +/- 0.003GB
License Size : 4.000GB
Utilization : 50%
Audit Time : 2011-03-09 09:54:09.538704+00
Compliance Status : The database is in compliance with respect to raw data size.
License End Date: 04/06/2011
Days Remaining: 28.59
(1 row)
To see how your ORC/Parquet data is affecting your license compliance, see Viewing license compliance for Hadoop file formats.
Viewing your license status through MC
Information about license usage is on the Settings page. See Monitoring database size for license compliance.
2.5 - Viewing license compliance for Hadoop file formats
You can use the EXTERNAL_TABLE_DETAILS system table to gather information about all of your tables based on Hadoop file formats. This information can help you understand how much of your license's data allowance is used by ORC and Parquet-based data.
Vertica computes the values in this table at query time, so to avoid performance problems, restrict your queries to filter by table_schema, table_name, or source_format. These three columns are the only columns you can use in a predicate, but you may use all of the usual predicate operators.
=> SELECT * FROM EXTERNAL_TABLE_DETAILS
WHERE source_format = 'PARQUET' OR source_format = 'ORC';
-[ RECORD 1 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schema_oid | 45035996273704978
table_schema | public
table_oid | 45035996273760390
table_name | ORC_demo
source_format | ORC
total_file_count | 5
total_file_size_bytes | 789
source_statement | COPY FROM 'ORC_demo/*' ORC
file_access_error |
-[ RECORD 2 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schema_oid | 45035196277204374
table_schema | public
table_oid | 45035996274460352
table_name | Parquet_demo
source_format | PARQUET
total_file_count | 3
total_file_size_bytes | 498
source_statement | COPY FROM 'Parquet_demo/*' PARQUET
file_access_error |
When computing the size of an external table, Vertica counts all data found in the location specified by the COPY FROM clause. If you have a directory that contains ORC and delimited files, for example, and you define your external table with "COPY FROM *" instead of "COPY FROM *.orc", this table includes the size of the delimited files. (You would probably also encounter errors when querying that external table.) When you query this table, Vertica does not validate your table definition; it just uses the path to find files to report.
You can also use the AUDIT function to find the size of a specific table or schema. When using the AUDIT function on ORC or Parquet external tables, the error tolerance and confidence level parameters are ignored. Instead, AUDIT always returns the size of the ORC or Parquet files on disk.
=> select AUDIT('customers_orc');
AUDIT
-----------
619080883
(1 row)
2.6 - Moving a cloud installation from by the hour (BTH) to bring your own license (BYOL)
Vertica offers two licensing options for some of the entries in the Amazon Web Services Marketplace and Google Cloud Marketplace:
- Bring Your Own License (BYOL): a long-term license that you obtain through an online licensing portal. These deployments also work with a free Community Edition license. Vertica uses a community license automatically if you do not install a license that you purchased. (For more about Vertica licenses, see Managing licenses and Understanding Vertica licenses.)
- Vertica by the Hour (BTH): a pay-as-you-go environment where you are charged an hourly fee for both the use of Vertica and the cost of the instances it runs on. The Vertica by the hour deployment offers an alternative to purchasing a term license. If you want to crunch large volumes of data within a short period of time, this option might work better for you. The BTH license is automatically applied to all clusters you create using a BTH MC instance.
If you start out with an hourly license, you can later decide to use a long-term license for your database. The support for an hourly versus a long-term license is built into the instances running your database. To move your database from an hourly license to a long-term license, you must create a new database cluster with a new set of instances.
To move from an hourly to a long-term license, follow these steps:
1. Purchase a BYOL license. Follow the process described in Obtaining a license key file.
2. Apply the new license to your database.
3. Shut down your database.
4. Create a new database cluster using a BYOL marketplace entry.
5. Revive your database onto the new cluster.
The exact steps you must take depend on your database mode and your preferred tool for managing your database:
Moving an Eon Mode database from BTH to BYOL using the command line
Follow these steps to move an Eon Mode database from an hourly to a long-term license.
1. Obtain a long-term BYOL license from the online licensing portal, described in Obtaining a license key file.
2. Upload the license file to a node in your database. Note the absolute path in the node's filesystem, as you will need this later when installing the license.
3. Connect to the node you uploaded the license file to in the previous step.
4. Connect to your database using vsql and view the licenses table:
=> SELECT * FROM licenses;
Note the name of the hourly license listed in the NAME column, so you can check if it is still present later.
5. Install the license in the database using the INSTALL_LICENSE function with the absolute path to the license file you uploaded in step 2:
=> SELECT install_license('absolute path to BYOL license');
6. View the licenses table again:
=> SELECT * FROM licenses;
If only the new BYOL license appears in the table, skip to step 8. If the hourly license whose name you noted in step 4 is still in the table, copy the name and proceed to step 7.
7. Call the DROP_LICENSE function to drop the hourly license:
=> SELECT drop_license('hourly license name');
8. You will need the path for your cluster's communal storage in a later step. If you do not already know the path, you can find this information by executing this query:
=> SELECT location_path FROM V_CATALOG.STORAGE_LOCATIONS
   WHERE sharing_type = 'COMMUNAL';
9. Synchronize your database's metadata. See Synchronizing metadata.
10. Shut down the database by calling the SHUTDOWN function:
=> SELECT SHUTDOWN();
11. You now need to create a new BYOL cluster onto which you will revive your database. Deploy a new cluster, including a new MC instance, using a BYOL entry in the marketplace of your chosen cloud platform. See:
Important
Your new BYOL cluster must have the same number of primary nodes as your existing hourly license cluster.
12. Revive your database onto the new cluster. For instructions, see Reviving an Eon Mode database cluster. Because you created the new cluster using a BYOL entry in the marketplace, the database uses the BYOL you applied earlier.
13. After reviving the database on your new BYOL cluster, terminate the instances for your hourly license cluster and MC. For instructions, see your cloud provider's documentation.
Moving an Eon Mode database from BTH to BYOL using the MC
Follow this procedure to move to BYOL and revive your database using MC:
1. Purchase a long-term BYOL license from the online licensing portal, following the steps detailed in Obtaining a license key file. Save the file to a location on your computer.
2. You now need to install the new license on your database. Log into MC and click your database in the Recent Databases list.
3. At the bottom of your database's Overview page, click the License tab.
4. Under the Installed Licenses list, note the name of the BTH license in the License Name column. You will need this later to check whether it is still present after installing the new long-term license.
5. In the ribbon at the top of the License History page, click the Install New License button. The Settings: License page opens.
6. Click the Browse button next to the Upload a new license box.
7. Locate the license file you obtained in step 1, and click Open.
8. Click the Apply button on the top right of the page.
9. Select the checkbox to agree to the EULA terms and click OK.
10. After Vertica installs the license, click the Close button.
11. Click the License tab at the bottom of the page.
12. If only the new long-term license appears in the Installed Licenses list, skip to Step 16. If the by-the-hour license also appears in the list, copy down its name from the License Name column.
13. You must drop the by-the-hour license before you can proceed. At the bottom of the page, click the Query Execution tab.
14. In the query editor, enter the following statement:
SELECT DROP_LICENSE('hourly license name');
15. Click Execute Query. The query should complete, indicating that the license has been dropped.
16. You will need the path for your cluster's communal storage in a later step. If you do not already know the path, you can find this information by executing this query in the Query Execution tab:
SELECT location_path FROM V_CATALOG.STORAGE_LOCATIONS
WHERE sharing_type = 'COMMUNAL';
17. Synchronize your database's metadata. See Synchronizing metadata.
18. You must now stop your by-the-hour database cluster. At the bottom of the page, click the Manage tab.
19. In the banner at the top of the page, click Stop Database and then click OK to confirm.
20. From the Amazon Web Services Marketplace or the Google Cloud Marketplace, deploy a new Vertica Management Console using a BYOL entry. Do not deploy a full cluster. You just need an MC deployment.
21. Log into your new MC instance and revive the database. See Reviving an Eon Mode database on AWS in MC for detailed instructions.
22. After reviving the database on your new environment, terminate the instances for your hourly license environment. To do so, on the AWS CloudFormation Stacks page, select the hourly environment's stack (its collection of AWS resources) and click Actions > Delete Stack.
Moving an Enterprise Mode database from hourly to BYOL using backup and restore
Note
Currently, AWS is the only platform supported for Enterprise Mode databases using hourly licenses.
In an Enterprise Mode database, follow this procedure to move to BYOL, and then back up and restore your database:
1. Obtain a long-term BYOL license from the online licensing portal, described in Obtaining a license key file.
2. Upload the license file to a node in your database. Note the absolute path in the node's filesystem, as you will need this later when installing the license.
3. Connect to the node you uploaded the license file to in the previous step.
4. Connect to your database using vsql and view the licenses table:
=> SELECT * FROM licenses;
Note the name of the hourly license listed in the NAME column, so you can check if it is still present later.
5. Install the license in the database using the INSTALL_LICENSE function with the absolute path to the license file you uploaded in step 2:
=> SELECT install_license('absolute path to BYOL license');
6. View the licenses table again:
=> SELECT * FROM licenses;
If only the new BYOL license appears in the table, skip to step 8. If the hourly license whose name you noted in step 4 is still in the table, copy the name and proceed to step 7.
7. Call the DROP_LICENSE function to drop the hourly license:
=> SELECT drop_license('hourly license name');
8. Back up the database. See Backing up and restoring the database.
9. Deploy a new cluster for your database using one of the BYOL entries in the Amazon Web Services Marketplace.
10. Restore the database from the backup you created earlier. See Backing up and restoring the database. When you restore the database, it will use the BYOL you loaded earlier.
11. After restoring the database on your new environment, terminate the instances for your hourly license environment. To do so, on the AWS CloudFormation Stacks page, select the hourly environment's stack (its collection of AWS resources) and click Actions > Delete Stack.
After completing one of these procedures, see Viewing your license status to confirm the license drop and install were successful.
2.7 - Auditing database size
You can use your Vertica software until columnar data reaches the maximum raw data size that your license agreement allows. Vertica periodically runs an audit of the columnar data size to verify that your database complies with this agreement. You can also run your own audits of database size with two functions:
- AUDIT: Estimates the raw data size of a database, schema, or table.
- AUDIT_FLEX: Estimates the size of one or more flexible tables in a database, schema, or projection.
The following two examples audit the database and one schema:
=> SELECT AUDIT('', 'database');
AUDIT
----------
76376696
(1 row)
=> SELECT AUDIT('online_sales', 'schema');
AUDIT
----------
35716504
(1 row)
Raw data size
AUDIT and AUDIT_FLEX use statistical sampling to estimate the raw data size of data stored in tables—that is, the uncompressed data that the database stores. For most data types, Vertica evaluates the raw data size as if the data were exported from the database in text format, rather than as compressed data. For details, see Evaluating Data Type Footprint.
By using statistical sampling, the audit minimizes its impact on database performance. The tradeoff between accuracy and performance impact is a small margin of error. Reports on your database size include the margin of error, so you can assess the accuracy of the estimate.
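The general idea behind sampling-based estimation can be sketched in a few lines. This illustrates statistical sampling with a margin of error in general, not Vertica's actual audit implementation; the per-row sizes are randomly generated for the example:

```python
import random

random.seed(42)

# Fabricated per-row raw sizes in bytes; a real audit reads these from storage
row_sizes = [random.randint(50, 150) for _ in range(100_000)]

# Estimate the total from a 1% sample instead of scanning every row
sample = random.sample(row_sizes, 1_000)
estimated_total = sum(sample) / len(sample) * len(row_sizes)

actual_total = sum(row_sizes)
relative_error = abs(estimated_total - actual_total) / actual_total
print(f"estimate: {estimated_total:.0f}, actual: {actual_total}, "
      f"error: {relative_error:.2%}")
```

With a uniform sample of this size, the estimate typically lands within a few percent of the true total, which is the accuracy-for-speed tradeoff described above.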
Data in ORC and Parquet-based external tables is also audited, whether it is stored locally in the Vertica cluster's file system or remotely in S3 or on a Hadoop cluster. AUDIT always uses the file size of the underlying data files as the amount of data in the table. For example, suppose you have an external table based on 1GB of ORC files stored in HDFS. Then an audit of the table reports it as being 1GB in size.
Note
The Vertica audit does not verify that these files contain actual ORC or Parquet data. It just checks the size of the files that correspond to the external table definition.
Unaudited data
Table data that appears in multiple projections is counted only once. An audit also excludes the following data:
- Temporary table data.
- Data in SET USING columns.
- Non-columnar data accessible through external table definitions. Data in columnar formats such as ORC and Parquet counts against your totals.
- Data that was deleted but not yet purged.
- Data stored in system and work tables such as monitoring tables, Data collector tables, and Database Designer tables.
- Delimiter characters.
Vertica evaluates the footprint of different data types as follows:
- Strings and binary types—CHAR, VARCHAR, BINARY, VARBINARY—are counted as their actual size in bytes using UTF-8 encoding.
- Numeric data types are evaluated as if they were printed. Each digit counts as a byte, as does any decimal point, sign, or scientific notation. For example, -123.456 counts as eight bytes—six digits plus the decimal point and minus sign.
- Date/time data types are evaluated as if they were converted to text, including hyphens, spaces, and colons. For example, vsql prints a timestamp value of 2011-07-04 12:00:00 as 19 characters, or 19 bytes.
- Complex types are evaluated as the sum of the sizes of their component parts. An array is counted as the total size of all elements, and a ROW is counted as the total size of all fields.
Controlling audit accuracy
AUDIT lets you specify an audit's error tolerance and confidence level, by default set to 5 and 99 percent, respectively. For example, you can obtain the highest level of audit accuracy by setting error tolerance and confidence level to 0 and 100 percent, respectively. At these settings, instead of estimating raw data size with statistical sampling, Vertica dumps all audited data to a raw format to calculate its size.
Caution
Vertica discourages database-wide audits at this level. Doing so can have a significant adverse impact on database performance.
The following example audits the database with 25% error tolerance:
=> SELECT AUDIT('', 25);
AUDIT
----------
75797126
(1 row)
The following example audits the database with 25% error tolerance and 90% confidence level:
=> SELECT AUDIT('',25,90);
AUDIT
----------
76402672
(1 row)
Note
These accuracy settings have no effect on audits of external tables based on ORC or Parquet files. Audits of external tables based on these formats always use the file size of the ORC or Parquet files.
2.8 - Monitoring database size for license compliance
Your Vertica license can include a data storage allowance. The allowance can consist of data in columnar tables, flex tables, or both types of data. The AUDIT() function estimates the columnar table data size and any flex table materialized columns. The AUDIT_FLEX() function estimates the amount of __raw__ column data in flex or columnar tables. With regard to license data limits, data in __raw__ columns is calculated at 1/10th the size of structured data. Monitoring data sizes for columnar and flex tables lets you plan either to schedule deleting old data to keep your database in compliance with your license, or to consider a license upgrade for additional data storage.
Note
An audit of columnar data includes flex table real and materialized columns, but not __raw__ column data.
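The 1/10th rule reduces to simple arithmetic. A sketch with invented audit figures (not real output of AUDIT or AUDIT_FLEX):

```python
# Invented audit estimates, in gigabytes
columnar_gb = 500.0   # AUDIT(): columnar data plus materialized flex columns
flex_raw_gb = 200.0   # AUDIT_FLEX(): __raw__ column data

# __raw__ data counts at one tenth its size against the license allowance
license_usage_gb = columnar_gb + flex_raw_gb / 10
print(license_usage_gb)  # 520.0
```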
Viewing your license compliance status
Vertica periodically runs an audit of the columnar data size to verify that your database is compliant with your license terms. You can view the results of the most recent audit by calling the GET_COMPLIANCE_STATUS function.
=> select GET_COMPLIANCE_STATUS();
GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
Raw Data Size: 2.00GB +/- 0.003GB
License Size : 4.000GB
Utilization : 50%
Audit Time : 2011-03-09 09:54:09.538704+00
Compliance Status : The database is in compliance with respect to raw data size.
License End Date: 04/06/2011
Days Remaining: 28.59
(1 row)
Periodically running GET_COMPLIANCE_STATUS to monitor your database's license status is usually enough to ensure that your database remains compliant with your license. If your database begins to near its columnar data allowance, you can use the other auditing functions described below to determine where your database is growing and how recent deletes affect the database size.
Manually auditing columnar data usage
You can manually check license compliance for all columnar data in your database using the AUDIT_LICENSE_SIZE function. This function performs the same audit that Vertica periodically performs automatically. The AUDIT_LICENSE_SIZE check runs in the background, so the function returns immediately. You can then query the results using GET_COMPLIANCE_STATUS.
Note
When you audit columnar data, the results include any flex table real and materialized columns, but not data in the __raw__ column. Materialized columns are virtual columns that you have promoted to real columns. Columns that you define when creating a flex table, or which you add with ALTER TABLE...ADD COLUMN statements, are real columns. All __raw__ columns are real columns. However, since they consist of unstructured or semi-structured data, they are audited separately.
An alternative to AUDIT_LICENSE_SIZE is to use the AUDIT function to audit the size of the columnar tables in your entire database by passing an empty string to the function. This function operates synchronously, returning when it has estimated the size of the database.
=> SELECT AUDIT('');
AUDIT
----------
76376696
(1 row)
The size of the database is reported in bytes. The AUDIT function also allows you to control the accuracy of the estimated database size using additional parameters. See the entry for the AUDIT function for full details. Vertica does not count the AUDIT function results as an official audit. It takes no license compliance actions based on the results.
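For example, the optional accuracy parameters let you trade audit speed for precision. The following is a sketch, assuming the AUDIT('scope', error_tolerance, confidence_level) form described in the AUDIT function reference; the parameter values shown are illustrative:

```sql
-- Audit the whole database with a 2% error tolerance
-- at a 99% confidence level (values are illustrative)
=> SELECT AUDIT('', 2, 99);
```

Tighter tolerances and higher confidence levels produce a more accurate estimate but take longer to run.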
Note
The results of the AUDIT function do not include flex table data in __raw__ columns. Use the AUDIT_FLEX function to monitor data usage in flex tables.
Manually auditing __raw__ column data
You can use the AUDIT_FLEX function to manually audit data usage for flex or columnar tables with a __raw__ column. The function calculates the encoded, compressed data stored in ROS containers for any __raw__ columns. Materialized columns in flex tables are calculated by the AUDIT function. The AUDIT_FLEX results do not include data in the __raw__ columns of temporary flex tables.
Targeted auditing
If audits determine that the columnar table estimates are unexpectedly large, determine which schemas, tables, or partitions are using the most storage. You can use the AUDIT function to perform targeted audits of schemas, tables, or partitions by supplying the name of the entity whose size you want to find. For example, to find the size of the online_sales schema in the VMart example database, run the following command:
=> SELECT AUDIT('online_sales');
AUDIT
----------
35716504
(1 row)
You can also change the granularity of an audit to report the size of each object in a larger entity (for example, each table in a schema) by using the granularity argument of the AUDIT function. See the AUDIT function.
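For instance, to report each table in a schema separately, you can pass a granularity argument. The following is a sketch, assuming the AUDIT('scope', 'granularity') form; the granularity values named here are taken from the AUDIT function reference:

```sql
-- Report the size of each table in the online_sales schema
-- individually; granularity can be 'schema', 'table', or 'projection'
=> SELECT AUDIT('online_sales', 'table');
```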
Using Management Console to monitor license compliance
You can also get information about data storage of columnar data (for columnar tables and for materialized columns in flex tables) through the Management Console. This information is available in the database Overview page, which displays a grid view of the database's overall health.
- The needle in the license meter adjusts to reflect the amount used in megabytes.
- The grace period represents the term portion of the license.
- The Audit button returns the same information as the AUDIT() function in a graphical representation.
- The Details link within the License grid (next to the Audit button) provides historical information about license usage. This page also shows a progress meter of percent used toward your license limit.
2.9 - Managing license warnings and limits
Term license warnings and expiration
The term portion of a Vertica license is easy to manage—you are licensed to use Vertica until a specific date. If the term of your license expires, Vertica alerts you with messages appearing in the Administration tools and vsql. For example:
=> CREATE TABLE T (A INT);
NOTICE 8723: Vertica license 432d8e57-5a13-4266-a60d-759275416eb2 is in its grace period; grace period expires in 28 days
HINT: Renew at https://softwaresupport.softwaregrp.com/
CREATE TABLE
Contact Vertica at https://softwaresupport.softwaregrp.com/ as soon as possible to renew your license, and then install the new license. After the grace period expires, Vertica stops processing DML queries and allows DDL queries with a warning message. If a license expires and one or more valid alternative licenses are installed, Vertica uses the alternative licenses.
Data size license warnings and remedies
If your Vertica columnar license includes a raw data size allowance, Vertica periodically audits the size of your database to ensure it remains compliant with the license agreement. For details of this audit, see Auditing database size. You should also monitor your database size to know when it will approach licensed usage. Monitoring the database size helps you plan to either upgrade your license to allow for continued database growth or delete data from the database so you remain compliant with your license. See Monitoring database size for license compliance for details.
If your database's size approaches your licensed usage allowance (above 75% of license limits), you will see warnings in the Administration tools, vsql, and Management Console. You have two options to eliminate these warnings:
- Upgrade your license to a larger data size allowance.
- Delete data from your database to remain under your licensed raw data size allowance. The warnings disappear after Vertica's next audit of the database size shows that it is no longer close to or over the licensed amount. You can also manually run a database audit (see Monitoring database size for license compliance for details).
If your database continues to grow after you receive warnings that its size is approaching your licensed size allowance, Vertica displays additional warnings in more parts of the system after a grace period passes. Use the GET_COMPLIANCE_STATUS function to check the status of your license.
If your Vertica premium edition database size exceeds your licensed limits
If your Premium Edition database size exceeds your licensed data allowance, all successful queries from ODBC and JDBC clients return with a status of SUCCESS_WITH_INFO instead of the usual SUCCESS. The message sent with the results contains a warning about the database size. Your ODBC and JDBC clients should be prepared to handle these messages instead of assuming that successful requests always return SUCCESS.
Note
These warnings for Premium Edition are in addition to any warnings you see in Administration Tools, vsql, and Management Console.
If your Community Edition database size exceeds the limit of 1 terabyte, Vertica stops processing DML queries and allows DDL queries with a warning message.
To bring your database under compliance, you can either upgrade your license to a larger data allowance or delete data until the database is under its licensed size.
2.10 - Exporting license audit results to CSV
You can use admintools to audit a database for license compliance and export the results in CSV format, as follows:
admintools -t license_audit [--password=password] --database=database [--file=csv-file] [--quiet]
where:
- database must be a running database. If the database is password protected, you must also supply the password.
- --file csv-file directs output to the specified file. If csv-file already exists, the tool returns an error message. If this option is unspecified, output is directed to stdout.
- --quiet specifies that the tool should run in quiet mode; if unspecified, status messages are sent to stdout.
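For example, a typical invocation against a running database might look like the following; the database name and output path are illustrative:

```shell
# Audit the VMart database and write the CSV results to a file
admintools -t license_audit --database=VMart --file=/tmp/vmart_audit.csv
```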
Running the license_audit tool is equivalent to invoking the following SQL statements:
select audit('');
select audit_flex('');
select * from dc_features_used;
select * from v_catalog.license_audits;
select * from v_catalog.user_audits;
Audit results include the following information:
- Log of used Vertica features
- Estimated database size
- Raw data size allowed by your Vertica license
- Percentage of licensed allowance that the database currently uses
- Audit timestamps
The following truncated example shows the raw CSV output that license_audit generates:
FEATURES_USED
features_used,feature,date,sum
features_used,metafunction::get_compliance_status,2014-08-04,1
features_used,metafunction::bootstrap_license,2014-08-04,1
...
LICENSE_AUDITS
license_audits,database_size_bytes,license_size_bytes,usage_percent,audit_start_timestamp,audit_end_timestamp,confidence_level_percent,error_tolerance_percent,used_sampling,confidence_interval_lower_bound_bytes,confidence_interval_upper_bound_bytes,sample_count,cell_count,license_name
license_audits,808117909,536870912000,0.00150523690320551,2014-08-04 23:59:00.024874-04,2014-08-04 23:59:00.578419-04,99,5,t,785472097,830763721,10000,174754646,vertica
...
USER_AUDITS
user_audits,size_bytes,user_id,user_name,object_id,object_type,object_schema,object_name,audit_start_timestamp,audit_end_timestamp,confidence_level_percent,error_tolerance_percent,used_sampling,confidence_interval_lower_bound_bytes,confidence_interval_upper_bound_bytes,sample_count,cell_count
user_audits,812489249,45035996273704962,dbadmin,45035996273704974,DATABASE,,VMart,2014-10-14 11:50:13.230669-04,2014-10-14 11:50:14.069057-04,99,5,t,789022736,835955762,10000,174755178
AUDIT_SIZE_BYTES
audit_size_bytes,now,audit
audit_size_bytes,2014-10-14 11:52:14.015231-04,810584417
FLEX_SIZE_BYTES
flex_size_bytes,now,audit_flex
flex_size_bytes,2014-10-14 11:52:15.117036-04,11850
3 - Configuring the database
Before reading the topics in this section, you should be familiar with the material in Getting started and are familiar with creating and configuring a fully-functioning example database.
3.1 - Configuration procedure
This section describes the tasks required to set up a Vertica database. It assumes that you have a valid license key file, have installed the Vertica rpm package, and have run the installation script as described.
You complete the configuration procedure using:
Note
You can also perform certain tasks using
Management Console. Those tasks point to the appropriate topic.
Continuing configuring
Follow the configuration procedure sequentially as this section describes.
Vertica strongly recommends that you first experiment with creating and configuring a database.
You can use this generic configuration procedure several times during the development process, modifying it to fit your changing goals. You can omit steps such as preparing actual data files and sample queries, and run the Database Designer without optimizing for queries. For example, you can create, load, and query a database several times for development and testing purposes, then one final time to create and load the production database.
3.1.1 - Prepare disk storage locations
You must create and specify directories in which to store your catalog and data files (physical schema). You can specify these locations when you install or configure the database, or later during database operations. Both the catalog and data directories must be owned by the database superuser.
The directory you specify for database catalog files (the catalog path) is used across all nodes in the cluster. For example, if you specify /home/catalog as the catalog directory, Vertica uses that catalog path on all nodes. The catalog directory should always be separate from any data file directories.
Note
Do not use a shared directory for more than one node. Data and catalog directories must be distinct for each node. Multiple nodes must not be allowed to write to the same data or catalog directory.
The data path you designate is also used across all nodes in the cluster. For example, if you specify that data should be stored in /home/data, Vertica uses this path on all database nodes.
Do not use a single directory to contain both catalog and data files. You can store the catalog and data directories on different drives, which can be either on drives local to the host (recommended for the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.
Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your database. Creating a database in admintools also creates the catalog and data directories, but the parent directory must exist on each node.
You do not need to specify a disk storage location during installation. However, you can do so by using the --data-dir parameter to the install_vertica script. See Specifying disk storage location during installation.
3.1.1.1 - Specifying disk storage location during database creation
When you invoke the Create Database command in the Administration tools, a dialog box allows you to specify the catalog and data locations. These locations must exist on each host in the cluster and must be owned by the database administrator.
When you click OK, Vertica automatically creates the following subdirectories:
catalog-pathname/database-name/node-name_catalog/
data-pathname/database-name/node-name_data/
For example, if you use the default value (the database administrator's home directory) of /home/dbadmin for the Stock Exchange example database, the catalog and data directories are created on each node in the cluster as follows:
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_catalog
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_data
Notes
- Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions will result in database creation failure.
- Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Managing storage locations for details.
3.1.1.2 - Specifying disk storage location on MC
You can use the MC interface to specify where you want to store database metadata on the cluster in the following ways:
See also
Configuring Management Console.
3.1.1.3 - Configuring disk usage to optimize performance
Once you have created your initial storage location, you can add additional storage locations to the database later. Not only does this provide additional space, it lets you control disk usage and increase I/O performance by isolating files that have different I/O or access patterns. For example, consider:
- Isolating execution engine temporary files from data files by creating a separate storage location for temp space.
- Creating labeled storage locations and storage policies, in which selected database objects are stored on different storage locations based on measured performance statistics or predicted access patterns.
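For example, temp space can be isolated on its own volume with a dedicated storage location. The following is a sketch using the CREATE LOCATION statement; it assumes a /mnt/temp volume exists on every node (the path is illustrative):

```sql
-- Create a TEMP storage location on all nodes so execution engine
-- temporary files do not compete with data files for I/O
=> CREATE LOCATION '/mnt/temp' ALL NODES USAGE 'TEMP';
```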
See also
Managing storage locations
3.1.1.4 - Using shared storage with Vertica
If using shared SAN storage, ensure there is no contention among the nodes for disk space or bandwidth.
- Each host must have its own catalog and data locations. Hosts cannot share catalog or data locations.
- Configure the storage so that there is enough I/O bandwidth for each node to access the storage independently.
3.1.1.5 - Viewing database storage information
You can view node-specific information on your Vertica cluster through the Management Console. See Monitoring Vertica Using Management Console for details.
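You can also query storage usage directly from a vsql session. The following sketch uses the v_monitor.disk_storage system table; the column selection shown is illustrative:

```sql
-- Show per-node disk usage for each storage location
=> SELECT node_name, storage_path, storage_usage,
          disk_space_used_mb, disk_space_free_mb
   FROM v_monitor.disk_storage;
```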
3.1.1.6 - Anti-virus scanning exclusions
You should exclude the Vertica catalog and data directories from anti-virus scanning. Certain anti-virus products have been identified as targeting Vertica directories, and sometimes lock or delete files in them. This can adversely affect Vertica performance and data integrity.
Identified anti-virus products include the following:
- ClamAV
- SentinelOne
- Sophos
- Symantec
- Twistlock
Important
This list is not comprehensive.
3.1.2 - Disk space requirements for Vertica
In addition to actual data stored in the database, Vertica requires disk space for several data reorganization operations, such as mergeout and managing nodes in the cluster. For best results, Vertica recommends that disk utilization per node be no more than sixty percent (60%) for a K-Safe=1 database to allow such operations to proceed.
In addition, disk space is temporarily required by certain query execution operators, such as hash joins and sorts, in the case when they cannot be completed in memory (RAM). Such operators might be encountered during queries, recovery, refreshing projections, and so on. The amount of disk space needed (known as temp space) depends on the nature of the queries, amount of data on the node and number of concurrent users on the system. By default, any unused disk space on the data disk can be used as temp space. However, Vertica recommends provisioning temp space separate from data disk space.
See also
Configuring disk usage to optimize performance.
3.1.3 - Disk space requirements for Management Console
You can install Management Console on any node in the cluster, so it has no special disk requirements other than disk space you allocate for your database cluster.
3.1.4 - Prepare the logical schema script
Designing a logical schema for a Vertica database is no different from designing one for any other SQL database. Details are described more fully in Designing a logical schema.
To create your logical schema, prepare a SQL script (plain text file, typically with an extension of .sql) that:
- Creates additional schemas (as necessary). See Using multiple schemas.
- Creates the tables and column constraints in your database using the CREATE TABLE command.
- Defines the necessary table constraints using the ALTER TABLE command.
- Defines any views on the table using the CREATE VIEW command.
You can generate a script file using:
- A schema designer application.
- A schema extracted from an existing database.
- A text editor.
- One of the example database example-name_define_schema.sql scripts as a template. (See the example database directories in /opt/vertica/examples.)
In your script file, make sure that:
- Each statement ends with a semicolon.
- You use data types supported by Vertica, as described in the SQL Reference Manual.
Once you have created a database, you can test your schema script by executing it as described in Create the logical schema. If you encounter errors, drop all tables, correct the errors, and run the script again.
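A minimal schema script following these rules might look like the following; the schema, table, and column names are hypothetical:

```sql
-- sample_define_schema.sql: hypothetical logical schema
CREATE SCHEMA online_sales;

CREATE TABLE online_sales.call_center_dimension (
    call_center_key INTEGER NOT NULL,
    cc_name         VARCHAR(50),
    cc_open_date    DATE
);

ALTER TABLE online_sales.call_center_dimension
    ADD CONSTRAINT pk_call_center PRIMARY KEY (call_center_key);

CREATE VIEW online_sales.open_centers AS
    SELECT cc_name FROM online_sales.call_center_dimension
    WHERE cc_open_date IS NOT NULL;
```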
3.1.5 - Prepare data files
Prepare two sets of data files:
- Test data files. Use test files to test the database after the partial data load. If possible, use part of the actual data files to prepare the test data files.
- Actual data files. Once the database has been tested and optimized, use your data files for your initial Data load.
How to name data files
Name each data file to match the corresponding table in the logical schema. Case does not matter.
Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension, name the corresponding data file stock_dimension.tbl. When using multiple data files, append _nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example, stock_dimension.tbl_001, stock_dimension.tbl_002, and so on.
3.1.6 - Prepare load scripts
Note
You can postpone this step if your goal is to test a logical schema design for validity.
Prepare SQL scripts to load data directly into physical storage using COPY on vsql, or through ODBC.
You need scripts that load:
- Large tables
- Small tables
Vertica recommends that you load large tables using multiple files. To test the load process, use files of 10GB to 50GB in size. This size provides several advantages:
- You can use one of the data files as a sample data file for the Database Designer.
- You can load just enough data to Perform a partial data load before you load the remainder.
- If a single load fails and rolls back, you do not lose an excessive amount of time.
- Once the load process is tested, for multi-terabyte tables, break up the full load in file sizes of 250–500GB.
Tip
You can use the load scripts included in the example databases as templates.
3.1.7 - Create an optional sample query script
The purpose of a sample query script is to test your schema and load scripts for errors.
Include a sample of queries your users are likely to run against the database. If you don't have any real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you can skip this step.
3.1.8 - Create an empty database
Two options are available for creating an empty database:
Although you can create more than one database (for example, one for production and one for testing), there can be only one active database for each installation of Vertica Analytic Database.
3.1.8.1 - Creating a database name and password
Database names
Database names must conform to the following rules:
- Be between 1-30 characters
- Begin with a letter
- Follow with any combination of letters (upper and lowercase), numbers, and/or underscores.
Database names are case sensitive; however, Vertica strongly recommends that you do not create databases with names that differ only in case. For example, do not create a database called mydatabase and another called MyDataBase.
Database passwords
Database passwords can contain letters, digits, and special characters listed in the next table. Passwords cannot include non-ASCII Unicode characters.
The allowed password length is between 0-100 characters. The database superuser can change a Vertica user's maximum password length using ALTER PROFILE.
You use profiles to specify and control password definitions. For instance, a profile can define the maximum length, reuse time, and the minimum number of required digits for a password, as well as other details.
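For instance, password rules can be enforced with a profile. The following is a sketch using CREATE PROFILE and ALTER USER; the profile name and limit values are illustrative:

```sql
-- Require at least one digit, cap length at 32 characters,
-- and prevent reuse of the last 5 passwords
=> CREATE PROFILE strict_pw LIMIT
       PASSWORD_MAX_LENGTH 32
       PASSWORD_MIN_DIGITS 1
       PASSWORD_REUSE_MAX  5;

-- Assign the profile to a user
=> ALTER USER dbadmin PROFILE strict_pw;
```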
The following special (ASCII) characters are valid in passwords. Special characters can appear anywhere in a password string. For example, mypas$word or $mypassword are both valid, while ±mypassword is not. Using special characters other than the ones listed below can cause database instability.
#   ?   =   _   '   )   (   @   \   /   !   ,   ~   :   %   ;   `   ^   +   .   -   space   &   <   >   [   ]   {   }   |   *   $   "
3.1.8.2 - Create a database using administration tools
- Run the Administration tools from your Administration host as follows:
$ /opt/vertica/bin/admintools
If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, see Notes for remote terminal users.
- Accept the license agreement and specify the location of your license file. See Managing licenses for more information.
This step is necessary only if it is the first time you have run the Administration Tools.
- On the Main Menu, click Configuration Menu, and click OK.
- On the Configuration Menu, click Create Database, and click OK.
- Enter the name of the database and an optional comment, and click OK. See Creating a database name and password for naming guidelines and restrictions.
- Establish the superuser password for your database.
  - To provide a password, enter the password and click OK. Confirm the password by entering it again, and then click OK.
  - If you don't want to provide the password, leave it blank and click OK. If you don't set a password, Vertica prompts you to verify that you truly do not want to establish a superuser password for this database. Click Yes to create the database without a password or No to establish the password.
Caution
If you do not enter a password at this point, the superuser password is set to empty. Unless the database is for evaluation or academic purposes, Vertica strongly recommends that you enter a superuser password. See Creating a database name and password for guidelines.
- Select the hosts to include in the database from the list of hosts specified when Vertica was installed (install_vertica -s), and click OK.
- Specify the directories in which to store the data and catalog files, and click OK.
Note
Do not use a shared directory for more than one node. Data and catalog directories must be distinct for each node. Multiple nodes must not be allowed to write to the same data or catalog directory.
- Catalog and data path names must contain only alphanumeric characters and cannot have leading spaces. Failure to comply with these restrictions results in database creation failure.
For example:
Catalog pathname: /home/dbadmin
Data Pathname: /home/dbadmin
- Review the Current Database Definition screen to verify that it represents the database you want to create, and then click Yes to proceed or No to modify the database definition.
- If you click Yes, Vertica creates the database you defined and then displays a message to indicate that the database was successfully created.
Note
For databases created with 3 or more nodes, Vertica automatically sets K-safety to 1 to ensure that the database is fault tolerant in case a node fails. For more information, see Failure recovery in the Administrator's Guide and MARK_DESIGN_KSAFE.
- Click OK to acknowledge the message.
3.1.9 - Create the logical schema
- Connect to the database.
In the Administration Tools Main Menu, click Connect to Database and click OK. See Connecting to the Database for details.
The vsql welcome script appears:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>
- Run the logical schema script.
Use the \i meta-command in vsql to run the SQL logical schema script that you prepared earlier.
- Disconnect from the database.
Use the \q meta-command in vsql to return to the Administration Tools.
3.1.10 - Perform a partial data load
Vertica recommends that for large tables, you perform a partial data load and then test your database before completing a full data load. This load should load a representative amount of data.
- Load the small tables.
Load the small table data files using the SQL load scripts and data files you prepared earlier.
- Partially load the large tables.
Load 10GB to 50GB of table data for each table using the SQL load scripts and data files that you prepared earlier.
For more information about projections, see Projections.
3.1.11 - Test the database
Test the database to verify that it is running as expected.
Check queries for syntax errors and execution times.
- Use the vsql \timing meta-command to enable the display of query execution time in milliseconds.
- Execute the SQL sample query script that you prepared earlier.
- Execute several ad hoc queries.
3.1.12 - Optimize query performance
Optimizing the database consists of optimizing for compression and tuning for queries. (See Creating a database design.)
To optimize the database, use the Database Designer to create and deploy a design for optimizing the database. See Using Database Designer to create a comprehensive design.
After you run the Database Designer, use the techniques described in Query optimization to improve the performance of certain types of queries.
Note
The database response time depends on factors such as type and size of the application query, database design, data size and data types stored, available computational power, and network bandwidth. Adding nodes to a database cluster does not necessarily improve the system response time for every query, especially if the response time is already short, e.g., less than 10 seconds, or the response time is not hardware-bound.
3.1.13 - Complete the data load
To complete the load:
- Monitor system resource usage.
Continue to run the top, free, and df utilities and watch them while your load scripts are running (as described in Monitoring Linux resource usage). You can do this on any or all nodes in the cluster. Make sure that the system is not swapping excessively (watch kswapd in top) or running out of swap space (watch for a large amount of used swap space in free).
Note
Vertica requires a dedicated server. If your loader or other processes take up significant amounts of RAM, it can result in swapping.
- Complete the large table loads.
Run the remainder of the large table load scripts.
3.1.14 - Test the optimized database
Check query execution times to test your optimized design:
- Use the vsql \timing meta-command to enable the display of query execution time in milliseconds.
- Execute a SQL sample query script to test your schema and load scripts for errors.
Note
Include a sample of queries your users are likely to run against the database. If you don't have any real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you can skip this step.
- Execute several ad hoc queries:
  - Run Administration tools and select Connect to Database.
  - Use the \i meta-command to execute the query script; for example:
vmartdb=> \i vmart_query_03.sql
  customer_name   | annual_income
------------------+---------------
James M. McNulty | 999979
Emily G. Vogel | 999998
(2 rows)
Time: First fetch (2 rows): 58.411 ms. All rows formatted: 58.448 ms
vmartdb=> \i vmart_query_06.sql
store_key | order_number | date_ordered
-----------+--------------+--------------
45 | 202416 | 2004-01-04
113 | 66017 | 2004-01-04
121 | 251417 | 2004-01-04
24 | 250295 | 2004-01-04
9 | 188567 | 2004-01-04
166 | 36008 | 2004-01-04
27 | 150241 | 2004-01-04
148 | 182207 | 2004-01-04
198 | 75716 | 2004-01-04
(9 rows)
Time: First fetch (9 rows): 25.342 ms. All rows formatted: 25.383 ms
Once the database is optimized, it should run queries efficiently. If you discover queries that you want to optimize, you can modify and update the design incrementally.
3.1.15 - Implement locales for international data sets
Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY and ORDER BY clauses, joins, and the analytic ORDER BY clause.
The default locale for a Vertica database is en_US@collation=binary
(English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary
collation, regardless of the session collation. Any locale-specific collation is applied at query time.
If you set the locale to null, Vertica sets the locale to en_US_POSIX. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale. For example:
=> set locale to '';
INFO 2567: Canonical locale: 'en_US_POSIX'
Standard collation: 'LEN'
English (United States, Computer)
SET
=> \locale en_US@collation=binary;
INFO 2567: Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)
=> \locale
en_US@collation=binary;
You can set the locale through ODBC, JDBC, and ADO.NET.
ICU locale support
Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax. The locale used by the database session is not derived from the operating system (through the LANG variable), so Vertica recommends that you set LANG for each node running vsql, as described in the next section.
While ICU library services can specify collation, currency, and calendar preferences, Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is applied at query time.
The SET DATESTYLE TO ... command provides some aspects of the calendar, but Vertica supports only dollars as currency.
Changing DB locale for a session
This example sets the session locale to Thai.
-
At the operating-system level for each node running vsql, set the LANG
variable to the locale language as follows:
export LANG=th_TH.UTF-8
Note
If setting LANG as shown does not work, operating-system support for locales may not be installed.
-
For each Vertica session (from ODBC/JDBC or vsql) set the language locale.
From vsql:
\locale th_TH
-
From ODBC/JDBC:
"SET LOCALE TO th_TH;"
-
In PuTTY (or an ssh terminal), change the settings as follows:
settings > window > translation > UTF-8
-
Click Apply and then click Save.
All data loaded must be in UTF-8 format, not an ISO format, as described in Delimited data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not supported, so functions like SUBSTRING do not work correctly for multibyte characters, and locale settings do not work correctly on such data. If the terminal translation setting ISO-8859-11:2001 (Latin/Thai) works, the data was loaded incorrectly. To convert data correctly, use a utility program such as Linux iconv.
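To illustrate the kind of re-encoding that a tool like iconv performs, the following Python sketch (illustrative only, not part of Vertica) converts ISO 8859-11 (Latin/Thai) bytes to UTF-8 before load. Note that each Thai character occupies one byte in the ISO encoding but three bytes in UTF-8:

```python
# Thai text as single-byte ISO 8859-11 -- loading these bytes directly
# would corrupt the data, because Vertica expects UTF-8.
thai = "กข"  # two Thai characters
iso_bytes = thai.encode("iso8859_11")                        # 1 byte per character
utf8_bytes = iso_bytes.decode("iso8859_11").encode("utf-8")  # 3 bytes per character

assert len(iso_bytes) == 2
assert len(utf8_bytes) == 6
assert utf8_bytes.decode("utf-8") == thai
```

The shell equivalent would be something like `iconv -f ISO-8859-11 -t UTF-8 input.txt > output.txt`, if your iconv build supports that encoding name.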
Note
The maximum length parameter for VARCHAR and CHAR data type refers to the number of octets (bytes) that can be stored in that field, not the number of characters. When using multi-byte UTF-8 characters, make sure to size fields to accommodate from 1 to 4 bytes per character, depending on the data.
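The octet-versus-character distinction is easy to verify in any UTF-8-aware language. A short Python sketch (illustrative only):

```python
# VARCHAR/CHAR length limits count octets (bytes), not characters.
samples = {
    "abc":  3,   # ASCII: 1 byte per character
    "café": 5,   # é takes 2 bytes in UTF-8
    "ไทย":  9,   # Thai characters take 3 bytes each
}
for text, expected_bytes in samples.items():
    assert len(text.encode("utf-8")) == expected_bytes

# A VARCHAR(9) could therefore hold 9 ASCII characters but only 3 Thai ones.
```

Sizing a column as VARCHAR(24), for example, guarantees room for 8 characters even if every character needs 3 bytes.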
3.1.15.1 - Specify the default locale for the database
After you start the database, the default locale configuration parameter, DefaultSessionLocale, sets the initial locale.
After you start the database, the default locale configuration parameter, DefaultSessionLocale
, sets the initial locale. You can override this value for individual sessions.
To set the locale for the database, use the configuration parameter as follows:
=> ALTER DATABASE DEFAULT SET DefaultSessionLocale = 'ICU-locale-identifier';
For example:
=> ALTER DATABASE DEFAULT SET DefaultSessionLocale = 'en_GB';
3.1.15.2 - Override the default locale for a session
You can override the default locale for the current session in two ways.
You can override the default locale for the current session in two ways:
-
VSQL command
\locale
. For example:
=> \locale en_GB
INFO 2567: Canonical locale: 'en_GB'
Standard collation: 'LEN'
English (United Kingdom)
-
SQL statement
SET LOCALE
. For example:
=> SET LOCALE TO en_GB;
INFO 2567: Canonical locale: 'en_GB'
Standard collation: 'LEN'
English (United Kingdom)
Both methods accept locale short and long forms. For example:
=> SET LOCALE TO LEN;
INFO 2567: Canonical locale: 'en'
Standard collation: 'LEN'
English
=> \locale LEN
INFO 2567: Canonical locale: 'en'
Standard collation: 'LEN'
English
3.1.15.3 - Server versus client locale settings
Vertica differentiates database server locale settings from client application locale settings.
Vertica differentiates database server locale settings from client application locale settings.
The following sections describe best practices to ensure predictable results.
Server locale
The server session locale should be set as described in Specify the default locale for the database. If locales vary across different sessions, set the server locale at the start of each session from your client.
vsql client
-
If the database does not have a default session locale, set the server locale for the session to the desired locale.
-
The locale setting in the terminal emulator where the vsql client runs should be set to be equivalent to session locale setting on the server side (ICU locale). By doing so, the data is collated correctly on the server and displayed correctly on the client.
-
All input data for vsql should be in UTF-8, and all output data is encoded in UTF-8.
-
Vertica does not support non-UTF-8 encodings and associated locale values.
-
For instructions on setting locale and encoding, refer to your terminal emulator documentation.
ODBC clients
-
ODBC applications can be either in ANSI or Unicode mode. If the user application is Unicode, the encoding used by ODBC is UCS-2. If the user application is ANSI, the data must be in single-byte ASCII, which is compatible with UTF-8 used on the database server. The ODBC driver converts UCS-2 to UTF-8 when passing to the Vertica server and converts data sent by the Vertica server from UTF-8 to UCS-2.
-
If the user application is not already in UCS-2, the application must convert the input data to UCS-2, or unexpected results could occur. For example:
-
Non-UCS-2 data passed to ODBC APIs is interpreted as UCS-2. This can result in an invalid UCS-2 symbol being passed to the APIs, causing errors.
-
The symbol provided in the alternate encoding could be a valid UCS-2 symbol. If this occurs, incorrect data is inserted into the database.
-
If the database does not have a default session locale, ODBC applications should set the desired server session locale using SQLSetConnectAttr
(if different from database wide setting). By doing so, you get the expected collation and string functions behavior on the server.
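These failure modes can be reproduced outside ODBC. The following minimal Python sketch (illustrative only; the string and encodings are arbitrary) shows single-byte data being misread as UCS-2/UTF-16 code units. The result is not an error but different, possibly "valid", symbols, which is why wrong data can be stored silently:

```python
text = "café"

# A Unicode ODBC application should pass UCS-2/UTF-16 code units:
correct = text.encode("utf-16-le")

# Passing single-byte (here Latin-1) bytes instead, and having them
# interpreted as UTF-16, silently produces different characters:
latin1_bytes = text.encode("latin-1")      # 4 bytes
misread = latin1_bytes.decode("utf-16-le") # 2 wrong, but "valid", characters

assert misread != text
assert len(misread) == 2
```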
JDBC and ADO.NET clients
-
JDBC and ADO.NET applications use a UTF-16 character set encoding and are responsible for converting any non-UTF-16 encoded data to UTF-16. The same cautions apply as for ODBC if this encoding is violated.
-
The JDBC and ADO.NET drivers convert UTF-16 data to UTF-8 when passing to the Vertica server and convert data sent by Vertica server from UTF-8 to UTF-16.
-
If there is no default session locale at the database level, JDBC and ADO.NET applications should set the correct server session locale by executing the SET LOCALE TO command in order to get the expected collation and string functions behavior on the server. For more information, see SET LOCALE.
3.1.16 - Using time zones with Vertica
Vertica uses the public-domain tz database (time zone database), which contains code and data that represent the history of local time for locations around the globe.
Vertica uses the public-domain tz database (time zone database), which contains code and data that represent the history of local time for locations around the globe. This database organizes time zone and daylight saving time data by partitioning the world into timezones whose clocks all agree on timestamps that are later than the POSIX Epoch (1970-01-01 00:00:00 UTC). Each timezone has a unique identifier. Identifiers typically follow the convention area/location, where area is a continent or ocean, and location is a specific location within the area—for example, Africa/Cairo, America/New_York, and Pacific/Honolulu.
Important
IANA acknowledges that 1970 is an arbitrary cutoff, and notes the problems with moving the cutoff earlier "due to the wide variety of local practices before computer timekeeping became prevalent." IANA's own description of the tz database suggests that users should regard historical dates and times, especially those that predate the POSIX epoch date, with a healthy measure of skepticism. For details, see
Theory and pragmatics of the tz code and data.
Vertica uses the TZ environment variable (if set) on each node for the default current time zone. Otherwise, Vertica uses the operating system time zone.
The TZ variable can be set by the operating system during login (see /etc/profile, /etc/profile.d, or /etc/bashrc) or by the user in .profile, .bashrc, or .bash_profile. TZ must be set to the same value on each node when you start Vertica.
The following command returns the current time zone for your database:
=> SHOW TIMEZONE;
name | setting
----------+------------------
timezone | America/New_York
(1 row)
You can also set the time zone for a single session with SET TIME ZONE.
Conversion and storage of date/time data
There is no database default time zone. TIMESTAMPTZ (TIMESTAMP WITH TIMEZONE) data is converted from the current local time and stored as GMT/UTC (Greenwich Mean Time/Coordinated Universal Time).
When TIMESTAMPTZ data is used, data is converted back to the current local time zone, which might be different from the local time zone where the data was stored. This conversion takes into account daylight saving time (summer time), depending on the year and date to determine when daylight saving time begins and ends.
TIMESTAMP WITHOUT TIMEZONE data stores the timestamp as given, and retrieves it exactly as given. The current time zone is ignored. The same is true for TIME WITHOUT TIMEZONE. For TIME WITH TIMEZONE (TIMETZ), however, the current time zone setting is stored along with the given time, and that time zone is used on retrieval.
Note
Vertica recommends that you use TIMESTAMPTZ, not TIMETZ.
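The conversion model described above (input interpreted in the current time zone, stored as GMT/UTC, converted back on output) can be sketched with Python's zoneinfo module. This illustrates the arithmetic only, not Vertica's implementation, and uses the same dates as the query examples that follow:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A TIMESTAMPTZ input is interpreted in the current (session) time zone...
local = datetime(2009, 2, 1, 0, 0, 0, tzinfo=ZoneInfo("America/New_York"))

# ...and stored normalized to GMT/UTC:
stored = local.astimezone(timezone.utc)
assert stored.hour == 5            # midnight EST (UTC-5) is 05:00 UTC

# Daylight saving time is taken into account: the same conversion
# in May uses EDT (UTC-4) instead of EST.
summer = datetime(2009, 5, 12, 12, 0, 0, tzinfo=ZoneInfo("America/New_York"))
assert summer.astimezone(timezone.utc).hour == 16   # noon EDT is 16:00 UTC
```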
Querying data/time data
TIMESTAMPTZ uses the current time zone on both input and output, as in the following example:
=> CREATE TEMP TABLE s (tstz TIMESTAMPTZ);
=> SET TIMEZONE TO 'America/New_York';
=> INSERT INTO s VALUES ('2009-02-01 00:00:00');
=> INSERT INTO s VALUES ('2009-05-12 12:00:00');
=> SELECT tstz AS 'Local timezone', tstz AT TIMEZONE 'America/New_York' AS 'America/New_York',
tstz AT TIMEZONE 'GMT' AS 'GMT' FROM s;
Local timezone | America/New_York | GMT
------------------------+---------------------+---------------------
2009-02-01 00:00:00-05 | 2009-02-01 00:00:00 | 2009-02-01 05:00:00
2009-05-12 12:00:00-04 | 2009-05-12 12:00:00 | 2009-05-12 16:00:00
(2 rows)
The -05 in the Local time zone column shows that the data is displayed in EST, while -04 indicates EDT. The other two columns show the TIMESTAMP WITHOUT TIMEZONE at the specified time zone.
The next example shows what happens if the current time zone is changed to GMT:
=> SET TIMEZONE TO 'GMT';
=> SELECT tstz AS 'Local timezone', tstz AT TIMEZONE 'America/New_York' AS
'America/New_York', tstz AT TIMEZONE 'GMT' as 'GMT' FROM s;
Local timezone | America/New_York | GMT
------------------------+---------------------+---------------------
2009-02-01 05:00:00+00 | 2009-02-01 00:00:00 | 2009-02-01 05:00:00
2009-05-12 16:00:00+00 | 2009-05-12 12:00:00 | 2009-05-12 16:00:00
(2 rows)
The +00 in the Local time zone column indicates that TIMESTAMPTZ is displayed in GMT.
The approach of using TIMESTAMPTZ fields to record events captures the GMT of the event, as expressed in terms of the local time zone. Later, it allows for easy conversion to any other time zone, either by setting the local time zone or by specifying an explicit AT TIMEZONE clause.
The following example shows how TIMESTAMP WITHOUT TIMEZONE fields work in Vertica.
=> CREATE TEMP TABLE tnoz (ts TIMESTAMP);
=> INSERT INTO tnoz VALUES('2009-02-01 00:00:00');
=> INSERT INTO tnoz VALUES('2009-05-12 12:00:00');
=> SET TIMEZONE TO 'GMT';
=> SELECT ts AS 'No timezone', ts AT TIMEZONE 'America/New_York' AS
'America/New_York', ts AT TIMEZONE 'GMT' AS 'GMT' FROM tnoz;
No timezone | America/New_York | GMT
---------------------+------------------------+------------------------
2009-02-01 00:00:00 | 2009-02-01 05:00:00+00 | 2009-02-01 00:00:00+00
2009-05-12 12:00:00 | 2009-05-12 16:00:00+00 | 2009-05-12 12:00:00+00
(2 rows)
The +00 at the end of a timestamp indicates that the setting is TIMESTAMP WITH TIMEZONE in GMT (the current time zone). The America/New_York column shows what the GMT setting was when you recorded the time, assuming you read a normal clock in the America/New_York time zone. What this shows is that if it is midnight in the America/New_York time zone, then it is 5 am GMT.
Note
00:00:00 Sunday February 1, 2009 in America/New_York converts to 05:00:00 Sunday February 1, 2009 in GMT.
The GMT column displays the GMT time, assuming the input data was captured in GMT.
If you don't set the time zone to GMT, and you use another time zone, for example America/New_York, then the results display in America/New_York with a -05 and -04, showing the difference between that time zone and GMT.
=> SET TIMEZONE TO 'America/New_York';
=> SHOW TIMEZONE;
name | setting
----------+------------------
timezone | America/New_York
(1 row)
=> SELECT ts AS 'No timezone', ts AT TIMEZONE 'America/New_York' AS
'America/New_York', ts AT TIMEZONE 'GMT' AS 'GMT' FROM tnoz;
No timezone | America/New_York | GMT
---------------------+------------------------+------------------------
2009-02-01 00:00:00 | 2009-02-01 00:00:00-05 | 2009-01-31 19:00:00-05
2009-05-12 12:00:00 | 2009-05-12 12:00:00-04 | 2009-05-12 08:00:00-04
(2 rows)
In this case, the last column is interesting in that it returns the time in New York, given that the data was captured in GMT.
3.1.17 - Change transaction isolation levels
By default, Vertica uses the READ COMMITTED isolation level for all sessions.
By default, Vertica uses the READ COMMITTED
isolation level for all sessions. You can change the default isolation level for the database or for a given session.
A transaction retains its isolation level until it completes, even if the session's isolation level changes during the transaction. Vertica internal processes (such as the Tuple Mover and refresh operations) and DDL operations always run at the SERIALIZABLE isolation level to ensure consistency.
Database isolation level
The configuration parameter TransactionIsolationLevel specifies the database isolation level, and is used as the default for all sessions. Use ALTER DATABASE to change the default isolation level. For example:
=> ALTER DATABASE DEFAULT SET TransactionIsolationLevel = 'SERIALIZABLE';
ALTER DATABASE
=> ALTER DATABASE DEFAULT SET TransactionIsolationLevel = 'READ COMMITTED';
ALTER DATABASE
Changes to the database isolation level only apply to future sessions. Existing sessions and their transactions continue to use their original isolation level.
Use
SHOW CURRENT
to view the database isolation level:
=> SHOW CURRENT TransactionIsolationLevel;
level | name | setting
----------+---------------------------+----------------
DATABASE | TransactionIsolationLevel | READ COMMITTED
(1 row)
Session isolation level
SET SESSION CHARACTERISTICS AS TRANSACTION
changes the isolation level for a specific session. For example:
=> SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET
Use
SHOW
to view the current session's isolation level:
=> SHOW TRANSACTION_ISOLATION;
See also
Transactions
3.2 - Configuration parameter management
For details about individual configuration parameters grouped by category, see Configuration Parameters.
Vertica supports a wide variety of configuration parameters that affect many facets of database behavior. These parameters can be set with the appropriate ALTER statements at one or more levels, listed here in descending order of precedence:
-
User (ALTER USER)
-
Session (ALTER SESSION)
-
Node (ALTER NODE)
-
Database (ALTER DATABASE)
Not all parameters can be set at all levels. Consult the documentation of individual parameters for restrictions.
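The precedence rules above can be modeled as a simple lookup: check each scope from highest to lowest precedence and return the first value found, falling back to the parameter's default. A Python sketch with a hypothetical `effective_value` helper (not a Vertica API):

```python
# Scopes in descending order of precedence, matching the list above.
PRECEDENCE = ["user", "session", "node", "database"]

def effective_value(settings, default):
    """Return the value from the highest-precedence scope, else the default.

    settings maps a scope name to the value set at that scope.
    """
    for scope in PRECEDENCE:
        if scope in settings:
            return settings[scope]
    return default

# Set at both session and database level: the session value wins.
assert effective_value({"session": 2, "database": 8}, default=1) == 2
# Not set at any level: the default applies.
assert effective_value({}, default=1) == 1
```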
You can query the CONFIGURATION_PARAMETERS system table to obtain the current settings for all user-accessible parameters. For example, the following query returns settings for partitioning parameters: their current and default values, which levels they can be set at, and whether changes require a database restart to take effect:
=> SELECT parameter_name, current_value, default_value, allowed_levels, change_requires_restart
FROM configuration_parameters WHERE parameter_name ILIKE '%partitioncount%';
parameter_name | current_value | default_value | allowed_levels | change_requires_restart
----------------------+---------------+---------------+----------------+-------------------------
MaxPartitionCount | 1024 | 1024 | NODE, DATABASE | f
ActivePartitionCount | 1 | 1 | NODE, DATABASE | f
(2 rows)
For details about individual configuration parameters grouped by category, see Configuration parameters.
Setting and clearing configuration parameters
You change specific configuration parameters with the appropriate ALTER statements; the same statements also let you reset configuration parameters to their default values. For example, the following ALTER statements change ActivePartitionCount at the database level from 1 to 2, and DisableAutopartition at the session level from 0 to 1:
=> ALTER DATABASE DEFAULT SET ActivePartitionCount = 2;
ALTER DATABASE
=> ALTER SESSION SET DisableAutopartition = 1;
ALTER SESSION
=> SELECT parameter_name, current_value, default_value FROM configuration_parameters
WHERE parameter_name IN ('ActivePartitionCount', 'DisableAutopartition');
parameter_name | current_value | default_value
----------------------+---------------+---------------
ActivePartitionCount | 2 | 1
DisableAutopartition | 1 | 0
(2 rows)
You can later reset the same configuration parameters to their default values:
=> ALTER DATABASE DEFAULT CLEAR ActivePartitionCount;
ALTER DATABASE
=> ALTER SESSION CLEAR DisableAutopartition;
ALTER SESSION
=> SELECT parameter_name, current_value, default_value FROM configuration_parameters
WHERE parameter_name IN ('ActivePartitionCount', 'DisableAutopartition');
parameter_name | current_value | default_value
----------------------+---------------+---------------
DisableAutopartition | 0 | 0
ActivePartitionCount | 1 | 1
(2 rows)
Caution
Vertica is designed to operate with minimal configuration changes. Be careful to change configuration parameters according to documented guidelines.
3.2.1 - Viewing configuration parameter values
You can view active configuration parameter values in two ways.
You can view active configuration parameter values in two ways:
SHOW statements
Use the following SHOW
statements to view active configuration parameters:
-
SHOW CURRENT: Returns settings of active configuration parameter values. Vertica checks settings at all levels, in the following ascending order of precedence: database, node, session, user.
If no values are set at any scope, SHOW CURRENT returns the parameter's default value.
-
SHOW DATABASE: Displays configuration parameter values set for the database.
-
SHOW USER: Displays configuration parameters set for the specified user, and for all users.
-
SHOW SESSION: Displays configuration parameter values set for the current session.
-
SHOW NODE: Displays configuration parameter values set for a node.
If a configuration parameter requires a restart to take effect, the values in a SHOW CURRENT
statement might differ from values in other SHOW
statements. To see which parameters require restart, query the CONFIGURATION_PARAMETERS system table.
System tables
You can query system tables, such as CONFIGURATION_PARAMETERS, for configuration parameter settings.
3.3 - Designing a logical schema
Designing a logical schema for a Vertica database is the same as designing for any other SQL database.
Designing a logical schema for a Vertica database is the same as designing for any other SQL database. A logical schema consists of objects such as schemas, tables, views, and referential integrity constraints that are visible to SQL users. Vertica supports any relational schema design that you choose.
3.3.1 - Using multiple schemas
Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database.
Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database. In many cases, however, it makes sense to use additional schemas to allow users and their applications to create and access tables in separate namespaces. For example, using additional schemas allows:
-
Many users to access the database without interfering with one another.
Individual schemas can be configured to grant specific users access to the schema and its tables while restricting others.
-
Third-party applications to create tables that have the same name in different schemas, preventing table collisions.
Unlike in other RDBMSs, a schema in a Vertica database is not a collection of objects bound to one user.
3.3.1.1 - Multiple schema examples
This section provides examples of when and how you might want to use multiple schemas to separate database users.
This section provides examples of when and how you might want to use multiple schemas to separate database users. These examples fall into two categories: using multiple private schemas and using a combination of private schemas (i.e. schemas limited to a single user) and shared schemas (i.e. schemas shared across multiple users).
Using multiple private schemas
Using multiple private schemas is an effective way of separating database users from one another when sensitive information is involved. Typically a user is granted access to only one schema and its contents, thus providing database security at the schema level. Database users can be running different applications, multiple copies of the same application, or even multiple instances of the same application. This enables you to consolidate applications on one database to reduce management overhead and use resources more effectively. The following examples highlight using multiple private schemas.
Using multiple schemas to separate users and their unique applications
In this example, both database users work for the same company. One user (HRUser) uses a Human Resource (HR) application with access to sensitive personal data, such as salaries, while another user (MedUser) accesses information regarding company healthcare costs through a healthcare management application. HRUser should not be able to access company healthcare cost information and MedUser should not be able to view personal employee data.
To grant these users access to the data they need while restricting them from data they should not see, two schemas are created with appropriate user access.
Using multiple schemas to support multitenancy
This example is similar to the last example in that access to sensitive data is limited by separating users into different schemas. In this case, however, each user is using a virtual instance of the same application.
An example of this is a retail marketing analytics company that provides data and software as a service (SaaS) to large retailers, to help them determine which of their promotional methods are most effective at driving customer sales.
In this example, each database user equates to a retailer, and each user only has access to its own schema. The retail marketing analytics company provides a virtual instance of the same application to each retail customer, and each instance points to the user’s specific schema in which to create and update tables. The tables in these schemas use the same names because they are created by instances of the same application, but they do not conflict because they are in separate schemas.
Examples of schemas in this database could be:
-
MartSchema—A schema owned by MartUser, a large department store chain.
-
PharmSchema—A schema owned by PharmUser, a large drug store chain.
Using multiple schemas to migrate to a newer version of an application
Using multiple schemas is an effective way of migrating to a new version of a software application. In this case, a new schema is created to support the new version of the software, and the old schema is kept as long as necessary to support the original version of the software. This is called a “rolling application upgrade.”
For example, a company might use an HR application to store employee data. The following schemas could be used for the original and updated versions of the software:
-
HRSchema—A schema owned by HRUser, the schema user for the original HR application.
-
V2HRSchema—A schema owned by V2HRUser, the schema user for the new version of the HR application.
Combining private and shared schemas
The previous examples illustrate cases in which all schemas in the database are private and no information is shared between users. However, users might want to share common data. In the retail case, for example, MartUser and PharmUser might want to compare their per store sales of a particular product against the industry per store sales average. Since this information is an industry average and is not specific to any retail chain, it can be placed in a schema on which both users are granted USAGE privileges.
Examples of schemas in this database might be:
-
MartSchema—A schema owned by MartUser, a large department store chain.
-
PharmSchema—A schema owned by PharmUser, a large drug store chain.
-
IndustrySchema—A schema owned by DBUser (from the retail marketing analytics company) on which both MartUser and PharmUser have USAGE privileges. It is unlikely that retailers would be given any privileges beyond USAGE on the schema and SELECT on one or more of its tables.
3.3.1.2 - Creating schemas
You can create as many schemas as necessary for your database.
You can create as many schemas as necessary for your database. For example, you could create a schema for each database user. However, schemas and users are not synonymous as they are in Oracle.
By default, only a superuser can create a schema or give a user the right to create a schema. (See GRANT (database) in the SQL Reference Manual.)
To create a schema use the CREATE SCHEMA statement, as described in the SQL Reference Manual.
3.3.1.3 - Specifying objects in multiple schemas
Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing.
Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing. You can specify an object within multiple schemas by:
-
Qualifying the object name by using the schema name and object name separated by a dot. For example, to specify MyTable
, located in Schema1
, qualify the name as Schema1.MyTable
.
-
Using a search path that includes the desired schemas when a referenced object is unqualified. When you set search paths, Vertica automatically searches the specified schemas to find the object.
3.3.1.4 - Setting search paths
Each user session has a search path of schemas.
Each user session has a search path of schemas. Vertica uses this search path to find tables and user-defined functions (UDFs) that are unqualified by their schema name. A session search path is initially set from the user's profile. You can change the session's search path at any time by calling
SET SEARCH_PATH
. This search path remains in effect until the next SET SEARCH_PATH
statement, or the session ends.
Viewing the current search path
SHOW SEARCH_PATH
returns the session's current search path. For example:
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
Schemas are listed in descending order of precedence. The first schema has the highest precedence in the search order. If this schema exists, it is also defined as the current schema, which is used for tables that are created with unqualified names. You can identify the current schema by calling the function
CURRENT_SCHEMA
:
=> SELECT CURRENT_SCHEMA;
current_schema
----------------
public
(1 row)
Setting the user search path
A session search path is initially set from the user's profile. If the search path in a user profile is not set by
CREATE USER
or
ALTER USER
, it is set to the database default:
=> CREATE USER agent007;
CREATE USER
=> \c - agent007
You are now connected as user "agent007".
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
$user
resolves to the session user name—in this case, agent007
—and has the highest precedence. If a schema agent007
exists, Vertica begins searching for unqualified tables in that schema. Also, calls to
CURRENT_SCHEMA
return this schema. Otherwise, Vertica uses public
as the current schema and begins searches in it.
Use
ALTER USER
to modify an existing user's search path. These changes overwrite all non-system schemas in the search path, including $USER
. System schemas are untouched. Changes to a user's search path take effect only when the user starts a new session; current sessions are unaffected.
For example, the following statements modify agent007
's search path, and grant access privileges to schemas and tables that are on the new search path:
=> ALTER USER agent007 SEARCH_PATH store, public;
ALTER USER
=> GRANT ALL ON SCHEMA store, public TO agent007;
GRANT PRIVILEGE
=> GRANT SELECT ON ALL TABLES IN SCHEMA store, public TO agent007;
GRANT PRIVILEGE
=> \c - agent007
You are now connected as user "agent007".
=> SHOW SEARCH_PATH;
name | setting
-------------+-------------------------------------------------
search_path | store, public, v_catalog, v_monitor, v_internal
(1 row)
To verify a user's search path, query the system table
USERS
:
=> SELECT search_path FROM USERS WHERE user_name='agent007';
search_path
-------------------------------------------------
store, public, v_catalog, v_monitor, v_internal
(1 row)
To revert a user's search path to the database default settings, call ALTER USER
and set the search path to DEFAULT
. For example:
=> ALTER USER agent007 SEARCH_PATH DEFAULT;
ALTER USER
=> SELECT search_path FROM USERS WHERE user_name='agent007';
search_path
---------------------------------------------------
"$user", public, v_catalog, v_monitor, v_internal
(1 row)
Ignored search path schemas
Vertica only searches among existing schemas to which the current user has access privileges. If a schema in the search path does not exist or the user lacks access privileges to it, Vertica silently excludes it from the search. For example, if agent007
lacks SELECT privileges to schema public
, Vertica silently skips this schema. Vertica returns an error only if it cannot find the table anywhere on the search path.
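This resolution behavior can be sketched as: walk the search path in order, silently skip schemas that do not exist or that the user cannot access, and raise an error only if no remaining schema contains the table. A Python sketch with a hypothetical `resolve` helper (illustrative only, not a Vertica API):

```python
def resolve(table, search_path, schemas, accessible):
    """Return 'schema.table' for the first match on the search path.

    schemas: {schema_name: set_of_table_names}
    accessible: schema names the current user has privileges on
    """
    for schema in search_path:
        # Nonexistent or inaccessible schemas are skipped silently.
        if schema not in schemas or schema not in accessible:
            continue
        if table in schemas[schema]:
            return f"{schema}.{table}"
    # Error only when the table is nowhere on the search path.
    raise LookupError(f"relation {table!r} does not exist")

schemas = {"store": {"orders"}, "public": {"orders", "customers"}}

# "agent007" is on the path but does not exist, so it is skipped:
assert resolve("orders", ["agent007", "store", "public"],
               schemas, {"store", "public"}) == "store.orders"

# "store" exists but the user lacks privileges on it -- also skipped:
assert resolve("customers", ["store", "public"],
               schemas, {"public"}) == "public.customers"
```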
Setting session search path
Vertica initially sets a session's search path from the user's profile. You can change the current session's search path with
SET SEARCH_PATH
. You can use SET SEARCH_PATH
in two ways:
-
Explicitly set the session search path to one or more schemas. For example:
=> \c - agent007
You are now connected as user "agent007".
dbadmin=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
=> SET SEARCH_PATH TO store, public;
SET
=> SHOW SEARCH_PATH;
name | setting
-------------+-------------------------------------------------
search_path | store, public, v_catalog, v_monitor, v_internal
(1 row)
-
Set the session search path to the database default:
=> SET SEARCH_PATH TO DEFAULT;
SET
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
SET SEARCH_PATH overwrites all non-system schemas in the search path, including $USER. System schemas are untouched.
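The overwrite behavior can be modeled as follows (a sketch only; the system-schema list mirrors the SHOW SEARCH_PATH output above):

```python
SYSTEM_SCHEMAS = ["v_catalog", "v_monitor", "v_internal"]

def set_search_path(new_schemas):
    """SET SEARCH_PATH replaces every non-system schema (including
    "$user") with the given list; system schemas always remain."""
    return list(new_schemas) + SYSTEM_SCHEMAS

print(set_search_path(["store", "public"]))
# ['store', 'public', 'v_catalog', 'v_monitor', 'v_internal']
```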
3.3.1.5 - Creating objects that span multiple schemas
Vertica supports views that reference tables across multiple schemas.
Vertica supports views that reference tables across multiple schemas. For example, a user might need to compare employee salaries to industry averages. In this case, the application queries two schemas:
Best Practice: When creating objects that span schemas, use qualified table names. This naming convention avoids confusion if the query path or table structure within the schemas changes at a later date.
3.3.2 - Tables in schemas
In Vertica you can create persistent and temporary tables, through CREATE TABLE and CREATE TEMPORARY TABLE, respectively.
In Vertica you can create persistent and temporary tables, through CREATE TABLE and CREATE TEMPORARY TABLE, respectively.
For detailed information on both types, see Creating Tables and Creating temporary tables.
Persistent tables
CREATE TABLE creates a table in the Vertica logical schema. For example:
CREATE TABLE vendor_dimension (
vendor_key INTEGER NOT NULL PRIMARY KEY,
vendor_name VARCHAR(64),
vendor_address VARCHAR(64),
vendor_city VARCHAR(64),
vendor_state CHAR(2),
vendor_region VARCHAR(32),
deal_size INTEGER,
last_deal_update DATE
);
For detailed information, see Creating Tables.
Temporary tables
CREATE TEMPORARY TABLE creates a table whose data persists only during the current session. Temporary table data is never visible to other sessions.
Temporary tables can be used to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated—for example, the tool first gets a result set, then queries the result set, and so on.
CREATE TEMPORARY TABLE can create tables at two scopes, global and local, through the keywords GLOBAL and LOCAL, respectively:
- GLOBAL (default): The table definition is visible to all sessions. However, table data is session-scoped.
- LOCAL: The table definition is visible only to the session in which it is created. When the session ends, Vertica automatically drops the table.
For detailed information, see Creating temporary tables.
3.4 - Creating a database design
A design is a physical storage plan that optimizes query performance.
A design is a physical storage plan that optimizes query performance. Data in Vertica is physically stored in projections. When you initially load data into a table using INSERT, COPY (or COPY LOCAL), Vertica creates a default superprojection for the table. This superprojection ensures that all of the data is available for queries. However, these superprojections might not optimize database performance, resulting in slow query performance and low data compression.
To improve performance, create a design for your Vertica database that optimizes query performance and data compression. You can create a design in several ways:
Database Designer can help you minimize how much time you spend on manual database tuning. You can also use Database Designer to redesign the database incrementally as requirements such as workloads change over time.
Database Designer runs as a background process. This is useful if you have a large design that you want to run overnight. An active SSH session is not required, so design and deploy operations continue to run uninterrupted if the session ends.
Tip
Vertica recommends that you first globally optimize your database using the Comprehensive setting in Database Designer. If the performance of the comprehensive design is not adequate, you can design custom projections using an incremental design or manually, as described in Creating custom designs.
3.4.1 - About Database Designer
Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.
Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.
During the design process, Database Designer analyzes the logical schema definition, sample data, and sample queries, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. This script creates a minimal set of superprojections to ensure K-safety.
In most cases, the projections that Database Designer creates provide excellent query performance within physical constraints while using disk space efficiently.
General design options
When you run Database Designer, several general options are available:
- Create a comprehensive or incremental design.
- Optimize for query execution, load, or a balance of both.
- Require K-safety.
- Recommend unsegmented projections when feasible.
- Analyze statistics before creating the design.
Database Designer bases its design on the following information that you provide:
Output
Database Designer yields the following output:
- Design script that creates the projections for the design in a way that meets the optimization objectives and distributes data uniformly across the cluster.
- Deployment script that creates and refreshes the projections for your design. For comprehensive designs, the deployment script contains commands that remove non-optimized projections. The deployment script includes the full design script.
- Backup script that contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design.
Design restrictions
Database Designer-generated designs:
- Exclude live aggregate or Top-K projections. You must create these manually. See CREATE PROJECTION.
- Do not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR columns.
- Do not support operations on complex types.
Post-design options
While running Database Designer, you can choose to deploy your design automatically after the deployment script is created, or to deploy it manually, after you have reviewed and tested the design. Vertica recommends that you test the design on a non-production server before deploying the design to your production server.
3.4.2 - How Database Designer creates a design
Database Designer-generated designs can include the following recommendations:
Design recommendations
Database Designer-generated designs can include the following recommendations:
- Sort buddy projections in the same order, which can significantly improve load, recovery, and site node performance. All buddy projections have the same base name so that they can be identified as a group.
Note
If you manually create projections, Database Designer recommends a buddy with the same sort order, if one does not already exist. By default, Database Designer recommends both super and non-super segmented projections with a buddy of the same sort order and segmentation.
- Accepts unlimited queries for a comprehensive design.
- Identifies similar design queries and assigns them a signature. For queries with the same signature, Database Designer weights the queries, depending on how many queries have that signature. It then considers the weighted query when creating a design.
- Recommends and creates projections in a way that minimizes data skew by distributing data uniformly across the cluster.
- Produces higher quality designs by considering UPDATE, DELETE, and SELECT statements.
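The signature-based weighting described in the list above can be sketched in Python (a hypothetical model; Database Designer computes query signatures internally):

```python
from collections import Counter

def weight_queries(query_signatures):
    """Collapse design queries that share a signature into one weighted
    representative: weight = number of queries with that signature."""
    return dict(Counter(query_signatures))

# Three submitted queries; two share a signature, so one representative
# carries weight 2 when the design is created.
print(weight_queries(["sig_a", "sig_a", "sig_b"]))  # {'sig_a': 2, 'sig_b': 1}
```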
3.4.3 - Database Designer access requirements
By default, only users with the DBADMIN role can run Database Designer.
By default, only users with the DBADMIN role can run Database Designer. Non-DBADMIN users can run Database Designer only if they are granted the necessary privileges and DBDUSER role, as described below. You can also enable users to run Database Designer on the Management Console (see Enabling Users to run Database Designer on Management Console).
- Add a temporary folder to all cluster nodes with CREATE LOCATION:
=> CREATE LOCATION '/tmp/dbd' ALL NODES;
- Grant the desired user CREATE privileges to create schemas on the current (DEFAULT) database, with GRANT DATABASE:
=> GRANT CREATE ON DATABASE DEFAULT TO dbd-user;
- Grant the DBDUSER role to dbd-user with GRANT ROLE:
=> GRANT DBDUSER TO dbd-user;
- On all nodes in the cluster, grant dbd-user access to the temporary folder with GRANT LOCATION:
=> GRANT ALL ON LOCATION '/tmp/dbd' TO dbd-user;
- Grant dbd-user privileges on one or more database schemas and their tables, with GRANT SCHEMA and GRANT TABLE, respectively:
=> GRANT ALL ON SCHEMA this-schema[,...] TO dbd-user;
=> GRANT ALL ON ALL TABLES IN SCHEMA this-schema[,...] TO dbd-user;
- Enable the DBDUSER role on dbd-user in one of the following ways:
  - As dbd-user, enable the DBDUSER role with SET ROLE:
    => SET ROLE DBDUSER;
  - As DBADMIN, automatically enable the DBDUSER role for dbd-user on each login, with ALTER USER:
    => ALTER USER dbd-user DEFAULT ROLE DBDUSER;
Important
When you grant the DBDUSER role, be sure to associate a resource pool with that user to manage resources while Database Designer runs.
Multiple users can run Database Designer concurrently without interfering with each other or exhausting cluster resources. When a user runs Database Designer, either with Management Console or programmatically, execution is generally contained by the user's resource pool, but might spill over into system resource pools for less-intensive tasks.
Enabling users to run Database Designer on Management Console
Users who are already granted the DBDUSER role and required privileges, as described above, can also be enabled to run Database Designer on Management Console:
- Log in as a superuser to Management Console.
- Click MC Settings.
- Click User Management.
- Specify an MC user:
  - To create an MC user, click Add.
  - To use an existing MC user, select the user and click Edit.
- Next to the DB access level window, click Add.
- In the Add Permissions window:
  - From the Choose a database drop-down list, select the database on which to create a design.
  - In the Database username field, enter the dbd-user user name that you created earlier.
  - In the Database password field, enter the database password.
  - In the Restrict access drop-down list, select the level of MC user for this user.
- Click OK to save your changes.
- Log out of the MC Super user account.
The MC user is now mapped to dbd-user. Log in as the MC user and use Database Designer to create an optimized design for your database.
DBDUSER capabilities and limitations
When you run Database Designer as a DBDUSER, the following constraints apply:
- Designs must set K-safety to be equal to system K-safety. If a design violates K-safety by lacking enough buddy projections for tables, the design does not complete.
- You cannot explicitly advance the ancient history mark (AHM)—for example, call MAKE_AHM_NOW—until after deploying the design.
When you create a design, you automatically have privileges to manipulate that design. Other tasks might require additional privileges:
Task                                               | Required privileges
---------------------------------------------------+----------------------------
Submit design tables                               |
Submit a single design query                       | EXECUTE on the design query
Submit a file of design queries                    |
Submit design queries from results of a user query |
Create design and deployment scripts               |
3.4.4 - Logging projection data for Database Designer
When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify.
When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify. When you deploy the design, Database Designer creates the design based on these projections. However, space or budget constraints may prevent Database Designer from creating all the proposed projections. In addition, Database Designer may not be able to implement the projections using ideal criteria.
To get information about the projections, first enable the Database Designer logging capability. When enabled, Database Designer stores information about the proposed projections in two Data Collector tables. After Database Designer deploys the design, these logs contain information about:
- Projections that the Optimizer proposed
- Projections that Database Designer actually created when the design was deployed
- Projections that Database Designer created, but not with the ideal criteria that the Optimizer identified
- The DDL used to create all the projections
- Column optimizations
If you do not deploy the design immediately, review the log to determine if you want to make any changes. If the design has been deployed, you can still manually create some of the projections that Database Designer did not create.
To enable the Database Designer logging capability, see Enabling logging for Database Designer.
To view the logged information, see Viewing Database Designer logs.
3.4.4.1 - Enabling logging for Database Designer
By default, Database Designer does not log information about the projections that the Optimizer proposes and Database Designer deploys.
By default, Database Designer does not log information about the projections that the Optimizer proposes and Database Designer deploys.
To enable Database Designer logging, enter the following command:
=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 1;
To disable Database Designer logging, enter the following command:
=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 0;
See also
3.4.4.2 - Viewing Database Designer logs
You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:
You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:
DC_DESIGN_PROJECTION_CANDIDATES
The DC_DESIGN_PROJECTION_CANDIDATES table contains information about all the projections that the Optimizer proposed. This table also includes the DDL that creates them. The is_a_winner field indicates if that projection was part of the actual deployed design. To view the DC_DESIGN_PROJECTION_CANDIDATES table, enter:
=> SELECT * FROM DC_DESIGN_PROJECTION_CANDIDATES;
DC_DESIGN_QUERY_PROJECTION_CANDIDATES
The DC_DESIGN_QUERY_PROJECTION_CANDIDATES table lists plan features for all design queries.
Possible features are:
For all design queries, the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table includes the following plan feature information:
- Optimizer path cost.
- Database Designer benefits.
- Ideal plan feature and its description, which identifies how the referenced projection should be optimized.
- If the design was deployed, the actual plan feature and its description is included in the table. This information identifies how the referenced projection was actually optimized.
Because most projections have multiple optimizations, each projection usually has multiple rows. To view the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table, enter:
=> SELECT * FROM DC_DESIGN_QUERY_PROJECTION_CANDIDATES;
To see example data from these tables, see Database Designer logs: example data.
3.4.4.3 - Database Designer logs: example data
In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database.
In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database. The output shows two records from the DC_DESIGN_PROJECTION_CANDIDATES table.
The first record contains information about the customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$ projection. The record includes the CREATE PROJECTION statement that Database Designer used to create the projection. The is_a_winner column is t, indicating that Database Designer created this projection when it deployed the design.
The second record contains information about the product_dimension_dbd_2_sort_$product_version$__$product_key$ projection. For this projection, the is_a_winner column is f. The Optimizer recommended that Database Designer create this projection as part of the design. However, Database Designer did not create the projection when it deployed the design. The log includes the DDL for the CREATE PROJECTION statement. If you want to add the projection manually, you can use that DDL. For more information, see Creating a design manually.
=> SELECT * FROM dc_design_projection_candidates;
-[ RECORD 1 ]--------+---------------------------------------------------------------
time | 2014-04-11 06:30:17.918764-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_table_id | 45035996273720620
projection_id | 45035996273726626
iteration_number | 1
projection_name | customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."customer_dimension_dbd_1_
sort_$customer_gender$__$annual_income$"
(
customer_key ENCODING AUTO,
customer_type ENCODING AUTO,
customer_name ENCODING AUTO,
customer_gender ENCODING RLE,
title ENCODING AUTO,
household_id ENCODING AUTO,
customer_address ENCODING AUTO,
customer_city ENCODING AUTO,
customer_state ENCODING AUTO,
customer_region ENCODING AUTO,
marital_status ENCODING AUTO,
customer_age ENCODING AUTO,
number_of_children ENCODING AUTO,
annual_income ENCODING AUTO,
occupation ENCODING AUTO,
largest_bill_amount ENCODING AUTO,
store_membership_card ENCODING AUTO,
customer_since ENCODING AUTO,
deal_stage ENCODING AUTO,
deal_size ENCODING AUTO,
last_deal_update ENCODING AUTO
)
AS
SELECT customer_key,
customer_type,
customer_name,
customer_gender,
title,
household_id,
customer_address,
customer_city,
customer_state,
customer_region,
marital_status,
customer_age,
number_of_children,
annual_income,
occupation,
largest_bill_amount,
store_membership_card,
customer_since,
deal_stage,
deal_size,
last_deal_update
FROM public.customer_dimension
ORDER BY customer_gender,
annual_income
UNSEGMENTED ALL NODES;
is_a_winner | t
-[ RECORD 2 ]--------+-------------------------------------------------------------
time | 2014-04-11 06:30:17.961324-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_table_id | 45035996273720624
projection_id | 45035996273726714
iteration_number | 1
projection_name | product_dimension_dbd_2_sort_$product_version$__$product_key$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."product_dimension_dbd_2_
sort_$product_version$__$product_key$"
(
product_key ENCODING AUTO,
product_version ENCODING RLE,
product_description ENCODING AUTO,
sku_number ENCODING AUTO,
category_description ENCODING AUTO,
department_description ENCODING AUTO,
package_type_description ENCODING AUTO,
package_size ENCODING AUTO,
fat_content ENCODING AUTO,
diet_type ENCODING AUTO,
weight ENCODING AUTO,
weight_units_of_measure ENCODING AUTO,
shelf_width ENCODING AUTO,
shelf_height ENCODING AUTO,
shelf_depth ENCODING AUTO,
product_price ENCODING AUTO,
product_cost ENCODING AUTO,
lowest_competitor_price ENCODING AUTO,
highest_competitor_price ENCODING AUTO,
average_competitor_price ENCODING AUTO,
discontinued_flag ENCODING AUTO
)
AS
SELECT product_key,
product_version,
product_description,
sku_number,
category_description,
department_description,
package_type_description,
package_size,
fat_content,
diet_type,
weight,
weight_units_of_measure,
shelf_width,
shelf_height,
shelf_depth,
product_price,
product_cost,
lowest_competitor_price,
highest_competitor_price,
average_competitor_price,
discontinued_flag
FROM public.product_dimension
ORDER BY product_version,
product_key
UNSEGMENTED ALL NODES;
is_a_winner | f
.
.
.
The next example shows the contents of two records in the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table. Both of these rows apply to projection ID 45035996273726626.
In the first record, the Optimizer recommends that Database Designer optimize the customer_gender column for the GROUP BY PIPE algorithm.
In the second record, the Optimizer recommends that Database Designer optimize the public.customer_dimension table for late materialization. Late materialization can improve the performance of joins that might spill to disk.
=> SELECT * FROM dc_design_query_projection_candidates;
-[ RECORD 1 ]-----------------+------------------------------------------------------------
time | 2014-04-11 06:30:17.482377-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 3
iteration_number | 1
design_table_id | 45035996273720620
projection_id | 45035996273726626
ideal_plan_feature | GROUP BY PIPE
ideal_plan_feature_description | Group-by pipelined on column(s) customer_gender
dbd_benefits | 5
opt_path_cost | 211
-[ RECORD 2 ]-----------------+------------------------------------------------------------
time | 2014-04-11 06:30:17.48276-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 3
iteration_number | 1
design_table_id | 45035996273720620
projection_id | 45035996273726626
ideal_plan_feature | LATE MATERIALIZATION
ideal_plan_feature_description | Late materialization on table public.customer_dimension
dbd_benefits | 4
opt_path_cost | 669
.
.
.
You can view the actual plan features that Database Designer implemented for the projections it created. To do so, query the V_INTERNAL.DC_DESIGN_QUERY_PROJECTIONS table:
=> SELECT * FROM v_internal.dc_design_query_projections;
-[ RECORD 1 ]-------------------+-------------------------------------------------------------
time | 2014-04-11 06:31:41.19199-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 1
projection_id | 2
design_table_id | 45035996273720624
actual_plan_feature | RLE PREDICATE
actual_plan_feature_description | RLE on predicate column(s) department_description
dbd_benefits | 2
opt_path_cost | 141
-[ RECORD 2 ]-------------------+-------------------------------------------------------------
time | 2014-04-11 06:31:41.192292-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 1
projection_id | 2
design_table_id | 45035996273720624
actual_plan_feature | GROUP BY PIPE
actual_plan_feature_description | Group-by pipelined on column(s) fat_content
dbd_benefits | 5
opt_path_cost | 155
3.4.5 - General design settings
Before you run Database Designer, you must provide specific information about the design to create.
Before you run Database Designer, you must provide specific information about the design to create.
Design name
All designs that you create with Database Designer must have unique names that conform to the conventions described in Identifiers, and are no more than 32 characters long (16 characters if you use Database Designer in Administration Tools or Management Console).
The design name is incorporated into the names of files that Database Designer generates, such as its deployment script. This can help you differentiate files that are associated with different designs.
Design type
Database Designer can create two distinct design types: comprehensive or incremental.
Comprehensive design
A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database.
To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, Vertica creates an unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it.
Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections.
During a comprehensive design, Database Designer creates deployment scripts that:
Incremental design
After you create and deploy a comprehensive database design, your database is likely to change over time in various ways. Consider using Database Designer periodically to create incremental designs that address these changes. Changes that warrant an incremental design can include:
- Significant data additions or updates
- New or modified queries that you run regularly
- Performance issues with one or more queries
- Schema changes
Optimization objective
Database Designer can optimize the design for one of three objectives:
- Load: Designs that are optimized for loads minimize database size, potentially at the expense of query performance.
- Query: Designs that are optimized for query performance. These designs typically favor fast query execution over load optimization, and thus result in a larger storage footprint.
- Balanced: Designs that are balanced between database size and query performance.
A fully optimized query has an optimization ratio of 0.99. Optimization ratio is the ratio of a query's benefits achieved in the design produced by the Database Designer to that achieved in the ideal plan. The optimization ratio is set in the OptRatio parameter in designer.log.
Design tables
Database Designer needs one or more tables with a moderate amount of sample data—approximately 10 GB—to create optimal designs. Design tables with large amounts of data adversely affect Database Designer performance. Design tables with too little data prevent Database Designer from creating an optimized design. If a design table has no data, Database Designer ignores it.
Note
If you drop a table after adding it to the design, Database Designer cannot build or deploy the design.
Design queries
A database design that is optimized for query performance requires a set of representative queries, or design queries. Design queries are required for incremental designs, and optional for comprehensive designs. You list design queries in a SQL file that you supply as input to Database Designer. Database Designer checks the validity of the queries when you add them to your design, and again when it builds the design. If a query is invalid, Database Designer ignores it.
If you use Management Console to create a database design, you can submit queries either from an input file or from the system table QUERY_REQUESTS. For details, see Creating a design manually.
The maximum number of design queries depends on the design type: ≤200 queries for a comprehensive design, ≤100 queries for an incremental design. Optionally, you can assign weights to the design queries that signify their relative importance. Database Designer uses those weights to prioritize the queries in its design.
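The per-design-type query limits above can be expressed as a small check (a sketch; names are illustrative):

```python
# Limits stated in the section: at most 200 design queries for a
# comprehensive design, at most 100 for an incremental design.
MAX_DESIGN_QUERIES = {"comprehensive": 200, "incremental": 100}

def within_query_limit(design_type, n_queries):
    """True if the number of design queries is allowed for the design type."""
    return n_queries <= MAX_DESIGN_QUERIES[design_type]

print(within_query_limit("incremental", 100))  # True
```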
Segmented and unsegmented projections
When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or unsegmented (replicated on all cluster nodes).
By default, Database Designer recommends only segmented projections. You can enable Database Designer to recommend unsegmented projections. In this case, Database Designer recommends segmented superprojections for large tables when deploying to multi-node clusters, and unsegmented superprojections for smaller tables.
Database Designer uses the following algorithm to determine whether to recommend unsegmented projections. Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends unsegmented projections if any of the following conditions is true:
- largest-row-count < 1,000,000 AND number-table-rows ≤ 10% of largest-row-count
- largest-row-count ≥ 10,000,000 AND number-table-rows ≤ 1% of largest-row-count
- 1,000,000 ≤ largest-row-count < 10,000,000 AND number-table-rows ≤ 100,000
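The three conditions can be captured in a short Python sketch (function and parameter names are illustrative, not part of Vertica):

```python
def recommend_unsegmented(table_rows, largest_row_count):
    """Return True if, per the heuristic above, Database Designer would
    recommend an unsegmented (replicated) projection for the table."""
    if largest_row_count < 1_000_000:
        return table_rows <= 0.10 * largest_row_count
    if largest_row_count >= 10_000_000:
        return table_rows <= 0.01 * largest_row_count
    # 1,000,000 <= largest_row_count < 10,000,000
    return table_rows <= 100_000

# A 50,000-row dimension table alongside a 5M-row fact table: replicate it.
print(recommend_unsegmented(50_000, 5_000_000))  # True
```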
Database Designer does not segment projections on:
For more information, see High availability with projections.
Statistics analysis
By default, Database Designer analyzes statistics for design tables when they are added to the design. Accurate statistics help Database Designer optimize compression and query performance.
Analyzing statistics takes time and resources. If you are certain that design table statistics are up to date, you can specify to skip this step and avoid the overhead otherwise incurred.
For more information, see Collecting Statistics.
3.4.6 - Building a design
After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.
After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.
Note
You cannot stop a running database if Database Designer is building a database design.
When you build a database design, Vertica generates two scripts:
- Deployment script: design-name_deploy.sql—Contains the SQL statements that create projections for the design you are deploying, deploy the design, and drop unused projections. When the deployment script runs, it creates the optimized design. For details about how to run this script and deploy the design, see Deploying a Design.
- Design script: design-name_design.sql—Contains the CREATE PROJECTION statements that Database Designer uses to create the design. Review this script to make sure you are happy with the design.
The design script is a subset of the deployment script. It serves as a backup of the DDL for the projections that the deployment script creates.
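The naming convention for the two generated scripts can be expressed as (a sketch; only the names described in this section are modeled):

```python
def generated_scripts(design_name):
    """Names of the two SQL scripts Database Designer generates when it
    builds a design, following the design-name prefix convention above."""
    return {
        "deployment": f"{design_name}_deploy.sql",
        "design": f"{design_name}_design.sql",
    }

print(generated_scripts("vmart_comprehensive"))
```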
When you create a design using Management Console:
- If you submit a large number of queries to your design and build it immediately, a timing issue could cause the queries not to load before deployment starts. If this occurs, you might see one of the following errors:
To accommodate this timing issue, you may need to reset the design, check the Queries tab to make sure the queries have been loaded, and then rebuild the design. Detailed instructions are in:
- The scripts are deleted when deployment completes. To save a copy of the deployment script after the design is built but before the deployment completes, go to the Output window and copy and paste the SQL statements to a file.
3.4.7 - Resetting a design
You must reset a design when:
You must reset a design when:
- You build a design and the output scripts described in Building a Design are not created.
- You build a design but Database Designer cannot complete the design because the queries it expects are not loaded.
Resetting a design discards all the run-specific information of the previous Database Designer build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables and queries.
After you reset a design, review the design to see what changes you need to make. For example, you can fix errors, change parameters, or check for and add additional tables or queries. Then you can rebuild the design.
You can reset a design only in Management Console or by using the DESIGNER_RESET_DESIGN function.
3.4.8 - Deploying a design
After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.
Both the design and deployment processes run in the background. This is useful if you have a large design that you want to run overnight. Because an active SSH session is not required, the design/deploy operations continue to run uninterrupted, even if the session is terminated.
Note
You cannot stop a running database if Database Designer is building or deploying a database design.
Database Designer runs as a background process. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. However, if multiple users are deploying a design on the same tables at the same time, Database Designer may not be able to complete the deployment. To avoid problems, consider the following:
-
Schedule potentially conflicting Database Designer processes to run sequentially overnight so that there are no concurrency problems.
-
Avoid scheduling Database Designer runs on the same set of tables at the same time.
There are two ways to deploy your design:
3.4.8.1 - Deploying designs using Database Designer
OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.
If you choose to allow Database Designer to automatically deploy your script during a comprehensive design and are running Administration Tools, Database Designer creates a backup script of your database's current design. This script helps you re-create the design of projections that may have been dropped by the new design. The backup script is located in the output directory you specified during the design process.
If you choose not to have Database Designer automatically run the deployment script (for example, if you want to maintain projections from a pre-existing deployment), you can manually run the deployment script later. See Deploying designs manually.
To deploy a design while running Database Designer, do one of the following:
-
In Management Console, select the design and click Deploy Design.
-
In the Administration Tools, select Deploy design in the Design Options window.
If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy parameter to 'true'.
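For example, a call that builds a design and deploys it immediately might look like the following sketch. The design name and output file paths are placeholders, and the parameter order follows the pattern shown in Workflow for running Database Designer programmatically:

```sql
=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
     'my_design',
     '/tmp/my_design_projections.sql',
     '/tmp/my_design_deploy.sql',
     'true',   -- analyze statistics
     'true',   -- deploy the design
     'false',  -- keep the design workspace after deployment
     'false'   -- stop if an error occurs
   );
```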
Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps that the deployment took:
vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;
3.4.8.2 - Deploying designs manually
If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:
-
Make sure that the target database contains the same tables and projections as the database where you ran Database Designer. The database should also contain sample data.
-
To deploy the projections to a test or production environment, execute the deployment script in vsql with the meta-command \i as follows, where design-name is the name of the database design:
=> \i design-name_deploy.sql
-
For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.
-
If you create projections for tables that already contain data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.
-
Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.
-
Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.
-
Call ANALYZE_STATISTICS on all database projections:
=> SELECT ANALYZE_STATISTICS ('');
This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.
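Taken together, the steps above might look like the following vsql sketch. The script name and table name are placeholders for your own design and schema:

```sql
=> \i my_design_deploy.sql
=> SELECT GET_PROJECTIONS('public.my_table');
=> SELECT START_REFRESH();
=> SELECT MAKE_AHM_NOW();
=> SELECT ANALYZE_STATISTICS('');
```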
3.4.9 - How to create a design
There are three ways to create a design using Database Designer:
The following table shows what Database Designer capabilities are available in each tool:
| Database Designer Capability | Management Console | Running Database Designer Programmatically | Administrative Tools |
|---|---|---|---|
| Create design | Yes | Yes | Yes |
| Design name length (# of characters) | 16 | 32 | 16 |
| Build design (create design and deployment scripts) | Yes | Yes | Yes |
| Create backup script | | | Yes |
| Set design type (comprehensive or incremental) | Yes | Yes | Yes |
| Set optimization objective | Yes | Yes | Yes |
| Add design tables | Yes | Yes | Yes |
| Add design queries file | Yes | Yes | Yes |
| Add single design query | | Yes | |
| Use query repository | Yes | Yes | |
| Set K-safety | Yes | Yes | Yes |
| Analyze statistics | Yes | Yes | Yes |
| Require all unsegmented projections | Yes | Yes | |
| View event history | Yes | Yes | |
| Set correlation analysis mode (Default = 0) | | Yes | |
3.4.9.1 - Using administration tools to create a design
To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user. Follow these steps:
-
Log in as the dbadmin user and start Administration Tools.
-
From the main menu, start the database for which you want to create a design. The database must be running before you can create a design for it.
-
On the main menu, select Configuration Menu and click OK.
-
On the Configuration Menu, select Run Database Designer and click OK.
-
On the Select a database to design window, enter the name of the database for which you are creating a design and click OK.
-
On the Enter the directory for Database Designer output window, enter the full path to the directory to contain the design script, deployment script, backup script, and log files, and click OK.
For information about the scripts, see Building a design.
-
On the Database Designer window, enter a name for the design and click OK.
-
On the Design Type window, choose which type of design to create and click OK.
For details, see Design Types.
-
The Select schema(s) to add to query search path window lists all the schemas in the database that you selected. Select the schemas that contain representative data that you want Database Designer to consider when creating the design and click OK.
For details about choosing schema and tables to submit to Database Designer, see Design Tables with Sample Data.
-
On the Optimization Objectives window, select the objective you want for the database optimization:
-
The final window summarizes the choices you have made and offers you two choices:
-
Proceed with building the design, and deploying it if you specified to deploy it immediately. If you did not specify to deploy, you can review the design and deployment scripts and deploy them manually, as described in Deploying designs manually.
-
Cancel the design and go back to change some of the parameters as needed.
-
Creating a design can take a long time. To cancel a running design from the Administration Tools window, enter Ctrl+C.
To create a design for the VMart example database, see Using Database Designer to create a comprehensive design in Getting Started.
3.4.10 - Running Database Designer programmatically
Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality. Run Database Designer programmatically to perform the following tasks:
-
Optimize performance on tables that you own.
-
Create or update a design without requiring superuser or DBADMIN intervention.
-
Add individual queries and tables, or add data to your design, and then rerun Database Designer to update the design based on this new information.
-
Customize the design.
-
Use recently executed queries to set up your database to run Database Designer automatically on a regular basis.
-
Assign each design query a query weight that indicates the importance of that query in creating the design. Assign a higher weight to queries that you run frequently so that Database Designer prioritizes those queries in creating the design.
For more details about Database Designer functions, see Database Designer function categories.
3.4.10.1 - Database Designer function categories
Database Designer functions perform the following operations, generally in the following order:
-
Create a design.
-
Set design properties.
-
Populate a design.
-
Create design and deployment scripts.
-
Get design data.
-
Clean up.
Important
You can also use the meta-function DESIGNER_SINGLE_RUN, which encapsulates all of these steps in a single call. The meta-function iterates over all queries within a specified timespan and returns a design ready for deployment.
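For example, the following sketch asks DESIGNER_SINGLE_RUN to consider the queries executed over the preceding day; check the function's reference page for the exact argument format in your version:

```sql
=> SELECT DESIGNER_SINGLE_RUN('1 day');
```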
For detailed information, see Workflow for running Database Designer programmatically. For information on required privileges, see Privileges for running Database Designer functions.
Caution
Before running Database Designer functions on an existing schema, back up the current design by calling EXPORT_CATALOG.
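For example, the following sketch exports the current design DDL to a file; the output path is a placeholder:

```sql
=> SELECT EXPORT_CATALOG('/tmp/current_design_backup.sql', 'DESIGN');
```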
Create a design
DESIGNER_CREATE_DESIGN directs Database Designer to create a design.
Set design properties
The following functions let you specify design properties:
Populate a design
The following functions let you add tables and queries to your Database Designer design:
Create design and deployment scripts
The following functions populate the Database Designer workspace and create design and deployment scripts. You can also analyze statistics, deploy the design automatically, and drop the workspace after the deployment:
Reset a design
DESIGNER_RESET_DESIGN discards all the run-specific information of the previous Database Designer build or deployment of the specified design but retains its configuration.
Get design data
The following functions display information about projections and scripts that the Database Designer created:
Cleanup
The following functions cancel any running Database Designer operation or drop a Database Designer design and all its contents:
3.4.10.2 - Workflow for running Database Designer programmatically
The following example shows the steps you take to create a design by running Database Designer programmatically.
Important
Before running Database Designer functions on an existing schema, back up the current design by calling function EXPORT_CATALOG.
Before you run this example, you should have the DBDUSER role, and you should have enabled that role using the SET ROLE DBDUSER command:
-
Create a table in the public schema:
=> CREATE TABLE T(
x INT,
y INT,
z INT,
u INT,
v INT,
w INT PRIMARY KEY
);
-
Add data to the table:
\! perl -e 'for ($i=0; $i<100000; ++$i) {printf("%d, %d, %d, %d, %d, %d\n", $i/10000, $i/100, $i/10, $i/2, $i, $i);}'
| vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"
-
Create a second table in the public schema:
=> CREATE TABLE T2(
x INT,
y INT,
z INT,
u INT,
v INT,
w INT PRIMARY KEY
);
-
Copy the data from table T to table T2 and commit the changes:
=> INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
=> COMMIT;
-
Create a new design:
=> SELECT DESIGNER_CREATE_DESIGN('my_design');
This command adds information to the DESIGNS system table in the V_MONITOR schema.
-
Add tables from the public schema to the design:
=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');
These commands add information to the DESIGN_TABLES system table.
-
Create a file named queries.txt in /tmp/examples, or another directory where you have READ and WRITE privileges. Add the following two queries to that file and save it. Database Designer uses these queries to create the design:
SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
SELECT DISTINCT w FROM T;
-
Add the queries file to the design and display the results—the numbers of accepted queries, non-design queries, and unoptimizable queries:
=> SELECT DESIGNER_ADD_DESIGN_QUERIES
('my_design',
'/tmp/examples/queries.txt',
'true'
);
The results show that both queries were accepted:
Number of accepted queries =2
Number of queries referencing non-design tables =0
Number of unsupported queries =0
Number of illegal queries =0
The DESIGNER_ADD_DESIGN_QUERIES function populates the DESIGN_QUERIES system table.
-
Set the design type to comprehensive. (This is the default.) A comprehensive design creates an initial or replacement design for all the design tables:
=> SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
-
Set the optimization objective to query. This setting creates a design that focuses on faster query performance, which might recommend additional projections. These projections could result in a larger database storage footprint:
=> SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');
-
Create the design and save the design and deployment scripts in /tmp/examples, or another directory where you have READ and WRITE privileges. The following command:
-
Analyzes statistics
-
Doesn't deploy the design.
-
Doesn't drop the design after deployment.
-
Stops if it encounters an error.
=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
('my_design',
'/tmp/examples/my_design_projections.sql',
'/tmp/examples/my_design_deploy.sql',
'True',
'False',
'False',
'False'
);
This command adds information to the following system tables:
-
Examine the status of the Database Designer run to see what projections Database Designer recommends. In the deployment_projection_name column:
-
rep indicates a replicated projection
-
super indicates a superprojection
The deployment_status column is pending because the design has not yet been deployed.
For this example, Database Designer recommends four projections:
=> \x
Expanded display is on.
=> SELECT * FROM OUTPUT_DEPLOYMENT_STATUS;
-[ RECORD 1 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 1
deployment_projection_name | T_DBD_1_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 2 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 2
deployment_projection_name | T2_DBD_2_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 3 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 3
deployment_projection_name | T_super
deployment_status | pending
error_message | N/A
-[ RECORD 4 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 4
deployment_projection_name | T2_super
deployment_status | pending
error_message | N/A
-
View the script /tmp/examples/my_design_deploy.sql to see how these projections are created when you run the deployment script. In this example, the script also assigns the encoding schemes RLE and COMMONDELTA_COMP to columns where appropriate.
-
Deploy the design from the directory where you saved it:
=> \i /tmp/examples/my_design_deploy.sql
-
Now that the design is deployed, delete the design:
=> SELECT DESIGNER_DROP_DESIGN('my_design');
3.4.10.3 - Privileges for running Database Designer functions
Non-DBADMIN users with the DBDUSER role can run Database Designer functions. Two steps are required to enable users to run these functions:
-
A DBADMIN or superuser grants the user the DBDUSER role:
=> GRANT DBDUSER TO username;
This role persists until the DBADMIN revokes it.
-
Before the DBDUSER can run Database Designer functions, one of the following must occur:
-
The user enables the DBDUSER role:
=> SET ROLE DBDUSER;
-
The superuser sets the user's default role to DBDUSER:
=> ALTER USER username DEFAULT ROLE DBDUSER;
General DBDUSER limitations
As a DBDUSER, the following restrictions apply:
-
You can set a design's K-safety to a value less than or equal to system K-safety. You cannot change system K-safety.
-
You cannot explicitly change the ancient history mark (AHM), even during design deployment.
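Within those limits, a DBDUSER can still set a design's K-safety. For example, with system K-safety set to 1, the following sketch is allowed, while a value above system K-safety would be rejected; the design name is a placeholder:

```sql
=> SELECT DESIGNER_SET_DESIGN_KSAFETY('my_design', 1);
```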
Design dependencies and privileges
Individual design tasks are likely to have dependencies that require specific privileges:
| Task | Required privileges |
|---|---|
| Add tables to a design | |
| Add a single design query to the design | Privilege to execute the design query |
| Add a query file to the design | |
| Add queries from the result of a user query to the design | |
| Create design and deployment scripts | |
3.4.10.4 - Resource pool for Database Designer users
When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs.
When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs. This allows multiple users to run Database Designer concurrently without interfering with each other or using up all cluster resources.
Note
When a user runs Database Designer, execution is mostly contained in the user's resource pool. However, Vertica might also use other system resource pools to perform less-intensive tasks.
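One way to set this up is to create a dedicated pool and assign it to the DBDUSER, as in the following sketch; the pool name, memory limits, and user name are placeholders:

```sql
=> CREATE RESOURCE POOL design_pool MEMORYSIZE '1G' MAXMEMORYSIZE '4G';
=> ALTER USER dbd_user1 RESOURCE POOL design_pool;
```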
3.4.11 - Creating custom designs
Vertica strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space. If any queries run less efficiently than you expect, consider using the Database Designer incremental design process to optimize the database design for those queries.
If the projections created by Database Designer still do not meet your needs, you can write custom projections, from scratch or based on projection designs created by Database Designer.
If you are unfamiliar with writing custom projections, start by modifying an existing design generated by Database Designer.
3.4.11.1 - Custom design process
To create a custom design or customize an existing one:
-
Plan the new design or modifications to an existing one. See Planning your design.
-
Create or modify projections. See Design fundamentals and CREATE PROJECTION for more detail.
-
Deploy projections to a test environment. See Writing and deploying custom projections.
-
Test and modify projections as needed.
-
After you finalize the design, deploy projections to the production environment.
3.4.11.2 - Planning your design
The syntax for creating a design is easy for anyone who is familiar with SQL. As with any successful project, however, a successful design requires some initial planning. Before you create your first design:
-
Become familiar with standard design requirements and plan your design to include them. See Design requirements.
-
Determine how many projections you need to include in the design. See Determining the number of projections to use.
-
Determine the type of compression and encoding to use for columns. See Architecture.
-
Determine whether or not you want the database to be K-safe. Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and 2. See Designing for K-safety.
3.4.11.2.1 - Design requirements
A physical schema design is a script that contains CREATE PROJECTION statements. These statements determine which columns are included in projections and how they are optimized.
If you use Database Designer as a starting point, it automatically creates designs that meet all fundamental design requirements. If you intend to create or modify designs manually, be aware that all designs must meet the following requirements:
-
Every design must create at least one superprojection for every table in the database that is used by the client application. These projections provide complete coverage that enables users to perform ad-hoc queries as needed. They can contain joins and they are usually configured to maximize performance through sort order, compression, and encoding.
-
Query-specific projections are optional. If you are satisfied with the performance provided through superprojections, you do not need to create additional projections. However, you can maximize performance by tuning for specific query work loads.
-
Vertica recommends that all production databases have a minimum K-safety of one (K=1) to support high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High availability with projections and Designing for K-safety.
-
Vertica recommends that you do not create replicated projections if your cluster has more than 20 nodes but your tables are small. If you create replicated projections in this case, the catalog becomes very large and performance may degrade. Instead, consider segmenting those projections.
3.4.11.2.2 - Determining the number of projections to use
In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding. This is especially true if the sort orders for the projections have been used to maximize performance for one or more query predicates (WHERE clauses).
However, you might want to add additional query-specific projections to increase the performance of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The number of additional projections (and their buddies) that you create should be determined by:
-
Your organization's needs
-
The amount of disk space you have available on each node in the cluster
-
The amount of time available for loading data into the database
As the number of projections that are tuned for specific queries increases, the performance of these queries improves. However, the amount of disk space used and the amount of time required to load data increases as well. Therefore, you should create and test designs to determine the optimum number of projections for your database configuration. On average, organizations that choose to implement query-specific projections achieve optimal performance through the addition of a few query-specific projections.
3.4.11.2.3 - Designing for K-safety
Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 and 2. Non-production databases do not have to be K-safe and can be set to 0.
A K-safe database must have at least three nodes, as shown in the following table:
| K-safety level | Number of required nodes |
|---|---|
| 1 | 3+ |
| 2 | 5+ |
Note
Vertica only supports K-safety levels 1 and 2.
You can set K-safety to 1 or 2 only when the physical schema design meets certain redundancy requirements. See Requirements for a K-safe physical schema design.
Using Database Designer
To create designs that are K-safe, Vertica recommends that you use the Database Designer. When creating projections with Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with a K-safety level. Database Designer creates a script that uses the MARK_DESIGN_KSAFE function to set the K-safety of the physical schema to 1. For example:
=> \i VMart_Schema_design_opt_1.sql
CREATE PROJECTION
CREATE PROJECTION
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)
By default, Vertica creates K-safe superprojections when database K-safety is greater than 0.
Monitoring K-safety
Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by querying the SYSTEM table for the settings in columns DESIGNED_FAULT_TOLERANCE and CURRENT_FAULT_TOLERANCE.
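For example, a monitoring script might run a query like the following and raise an alert when the current fault tolerance drops below the designed level:

```sql
=> SELECT designed_fault_tolerance, current_fault_tolerance
   FROM v_monitor.system;
```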
Loss of K-safety
When K nodes in your cluster fail, your database continues to run, although performance is affected. Further node failures could potentially cause the database to shut down if the failed node's data is not available from another functioning node in the cluster.
See also
K-safety in an Enterprise Mode database
3.4.11.2.3.1 - Requirements for a K-safe physical schema design
Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0.) You can modify a design created for a three-node (or greater) cluster, and the K-safety requirements are already set.
If you create custom projections, your physical schema design must meet the following requirements to be able to successfully recover the database in the event of a failure:
You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets requirements for K-safety.
3.4.11.2.3.2 - Requirements for a physical schema design with no K-safety
If you use Database Designer to generate a comprehensive design that you can modify and you do not want the design to be K-safe, set the K-safety level to 0 (zero).
If you want to start from scratch, do the following to establish minimal projection requirements for a functioning database with no K-safety (K=0):
-
Define at least one superprojection for each table in the logical schema.
-
Replicate (define an exact copy of) each dimension table superprojection on each node.
3.4.11.2.3.3 - Designing segmented projections for K-safety
Projections must comply with database K-safety requirements. In general, you must create buddy projections for each segmented projection, where the number of buddy projections is K+1. Thus, if system K-safety is set to 1, each projection segment must be duplicated by one buddy; if K-safety is set to 2, each segment must be duplicated by two buddies.
Automatic creation of buddy projections
You can use CREATE PROJECTION so it automatically creates the number of buddy projections required to satisfy K-safety, by including SEGMENTED BY ... ALL NODES. If CREATE PROJECTION specifies K-safety (KSAFE=n), Vertica uses that setting; if the statement omits KSAFE, Vertica uses system K-safety.
In the following example, CREATE PROJECTION creates segmented projection ttt_p1 for table ttt. Because system K-safety is set to 1, Vertica requires a buddy projection for each segmented projection. The CREATE PROJECTION statement omits KSAFE, so Vertica uses system K-safety and creates two buddy projections: ttt_p1_b0 and ttt_p1_b1:
=> SELECT mark_design_ksafe(1);
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)
=> CREATE TABLE ttt (a int, b int);
WARNING 6978: Table "ttt" will include privileges from schema "public"
CREATE TABLE
=> CREATE PROJECTION ttt_p1 as SELECT * FROM ttt SEGMENTED BY HASH(a) ALL NODES;
CREATE PROJECTION
=> SELECT projection_name from projections WHERE anchor_table_name='ttt';
projection_name
-----------------
ttt_p1_b0
ttt_p1_b1
(2 rows)
Vertica automatically names buddy projections by appending the suffix _bn to the projection base name—for example, ttt_p1_b0.
Manual creation of buddy projections
If you create a projection on a single node, and system K-safety is greater than 0, you must manually create the number of buddies required for K-safety. For example, you can create projection xxx_p1 for table xxx on a single node, as follows:
=> CREATE TABLE xxx (a int, b int);
WARNING 6978: Table "xxx" will include privileges from schema "public"
CREATE TABLE
=> CREATE PROJECTION xxx_p1 AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0001;
CREATE PROJECTION
Because K-safety is set to 1, a single instance of this projection is not K-safe. Attempts to insert data into its anchor table xxx return with an error like this:
=> INSERT INTO xxx VALUES (1, 2);
ERROR 3586: Insufficient projections to answer query
DETAIL: No projections that satisfy K-safety found for table xxx
HINT: Define buddy projections for table xxx
In order to comply with K-safety, you must create a buddy projection for projection xxx_p1. For example:
=> CREATE PROJECTION xxx_p1_buddy AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0002;
CREATE PROJECTION
Table xxx now complies with K-safety and accepts DML statements such as INSERT:
VMart=> INSERT INTO xxx VALUES (1, 2);
OUTPUT
--------
1
(1 row)
See also
For general information about segmented projections and buddies, see Segmented projections. For information about designing for K-safety, see Designing for K-safety and Designing for segmentation.
3.4.11.2.3.4 - Designing unsegmented projections for K-Safety
In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so projections for its dimension tables are replicated without segmentation on all cluster nodes. You create these projections with a CREATE PROJECTION statement that includes the keywords UNSEGMENTED ALL NODES. These keywords specify to create identical instances of the projection on all cluster nodes.
The following example shows how to create an unsegmented projection for the table store.store_dimension:
=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
AS SELECT store_key, store_name, store_city, store_state
FROM store.store_dimension
UNSEGMENTED ALL NODES;
CREATE PROJECTION
Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj. The keyword ALL NODES specifies to replicate the projection on all nodes:
=> \dj store.store_dimension_proj
List of projections
Schema | Name | Owner | Node | Comment
--------+----------------------+---------+------------------+---------
store | store_dimension_proj | dbadmin | v_vmart_node0001 |
store | store_dimension_proj | dbadmin | v_vmart_node0002 |
store | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)
For more information about projection name conventions, see Projection naming.
3.4.11.2.4 - Designing for segmentation
You segment projections using hash segmentation. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria could be an excellent choice for hash segmentation.
Note
For detailed information about using hash segmentation in a projection, see
CREATE PROJECTION in the SQL Reference Manual.
When segmenting projections, determine which columns to use to segment the projection. Choose one or more columns that have a large number of unique data values and acceptable skew in their data distribution. Primary key columns are an excellent choice for hash segmentation. The columns must be unique across all the tables being used in a query.
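For example, a hash-segmented projection on a single high-cardinality column might be declared as follows. This is a sketch: it borrows the store.store_sales_fact table and columns from an example later in this section, and the KSAFE setting assumes a K-safe design.

```sql
=> CREATE PROJECTION store.store_sales_fact_p (
     store_key,
     pos_transaction_number,
     sales_dollar_amount )
   AS SELECT store_key, pos_transaction_number, sales_dollar_amount
   FROM store.store_sales_fact
   SEGMENTED BY HASH(pos_transaction_number) ALL NODES KSAFE 1;
```

Here pos_transaction_number is a reasonable segmentation column because it has many unique values and little skew, so rows distribute evenly across nodes.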
3.4.11.3 - Design fundamentals
Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point.
Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point. This ensures that you have projections that meet basic requirements.
3.4.11.3.1 - Writing and deploying custom projections
Before you write custom projections, review the topics in Planning your design carefully.
Before you write custom projections, review the topics in Planning your design carefully. Failure to follow these considerations can result in non-functional projections.
To manually modify or create a projection:
-
Write a script with
CREATE PROJECTION
statements to create the desired projections.
-
Run the script in vsql with the meta-command
\i
.
Note
You must have a database loaded with a logical schema.
-
For a K-safe database, call Vertica meta-function
GET_PROJECTIONS
on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.
-
If you create projections for tables that already contain data, call
REFRESH
or
START_REFRESH
to update new projections. Otherwise, these projections are not available for query processing.
-
Call
MAKE_AHM_NOW
to set the Ancient History Mark (AHM) to the most recent epoch.
-
Call
DROP PROJECTION
on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.
-
Call
ANALYZE_STATISTICS
on all database projections:
=> SELECT ANALYZE_STATISTICS ('');
This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.
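The verification, refresh, and AHM steps above might look like this in vsql. This is a sketch: store.store_dimension is a table from the examples earlier in this section, and the output of each function depends on your design.

```sql
=> SELECT GET_PROJECTIONS('store.store_dimension');
=> SELECT START_REFRESH();
=> SELECT MAKE_AHM_NOW();
```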
3.4.11.3.2 - Designing superprojections
Superprojections have the following requirements.
Superprojections have the following requirements:
-
They must contain every column within the table.
-
For a K-safe design, superprojections must either be replicated on all nodes within the database cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very large tables and medium large tables). See Projections and High availability with projections for an overview of projections and how they are stored. See Designing for K-safety for design specifics.
To provide maximum usability, superprojections need to minimize storage requirements while maximizing query performance. To achieve this, the sort order for columns in superprojections is based on storage requirements and commonly used queries.
3.4.11.3.3 - Sort order benefits
Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.
Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.
Minimize storage requirements
Minimizing storage saves on physical resources and increases performance by reducing disk I/O. You can minimize projection storage by prioritizing low-cardinality columns in its sort order. This reduces the number of rows Vertica stores and accesses to retrieve query results.
After identifying projection sort columns, analyze their data and choose the most effective encoding method. The Vertica optimizer gives preference to columns with run-length encoding (RLE), so be sure to use it whenever appropriate. Run-length encoding replaces sequences (runs) of identical values with a single pair that contains the value and number of occurrences. Therefore, it is especially appropriate to use it for low-cardinality columns whose run length is large.
You can facilitate query performance through column sort order as follows:
-
Where possible, sort order should prioritize columns with the lowest cardinality.
-
Do not sort projections on columns of type LONG VARBINARY and LONG VARCHAR.
See also
Choosing sort order: best practices
3.4.11.3.4 - Choosing sort order: best practices
When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.
When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.
Combine RLE and sort order
When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to minimize storage requirements and maximize query performance.
Suppose you have a students
table containing the following values and encoding types:
 Column    | # of Distinct Values                       | Encoded With
-----------+--------------------------------------------+--------------
 gender    | 2 (M or F)                                 | RLE
 pass_fail | 2 (P or F)                                 | RLE
 class     | 4 (freshman, sophomore, junior, or senior) | RLE
 name      | 10000 (too many to list)                   | Auto
You might have queries similar to this one:
SELECT name FROM students WHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';
The fastest way to access the data is to work through the low-cardinality columns with the smallest number of distinct values before the high-cardinality columns. The following sort order minimizes storage and maximizes query performance for queries that have equality restrictions on gender
, class
, pass_fail
, and name
. Specify the ORDER BY clause of the projection as follows:
ORDER BY students.gender, students.pass_fail, students.class, students.name
In this example, the gender
column is represented by two RLE entries, the pass_fail
column is represented by four entries, and the class
column is represented by 16 entries, regardless of the cardinality of the students
table. Vertica efficiently finds the set of rows that satisfy all the predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the previous example, put those columns early in the projection sort order, in increasing order of distinct cardinality (that is, in increasing order of the number of distinct values in each column).
If you sort this table with student.class
first, you improve the performance of queries that restrict only on the student.class
column, and you improve the compression of the student.class
column (which contains the largest number of distinct values), but the other columns do not compress as well. Determining which projection is better depends on the specific queries in your workload, and their relative importance.
Storage savings with compression decrease as the cardinality of the column increases; however, storage savings with compression increase as the number of bytes required to store values in that column increases.
Maximize the advantages of RLE
To maximize the advantages of RLE encoding, use it only when the average run length of a column is greater than 10 when sorted. For example, suppose you have a table with the following columns, sorted in order of cardinality from low to high:
address.country, address.region, address.state, address.city, address.zipcode
The zipcode
column might not have 10 sorted entries in a row with the same zip code, so there is probably no advantage to run-length encoding that column, and it could make compression worse. But there are likely to be more than 10 countries in a sorted run length, so applying RLE to the country column can improve performance.
Put lower cardinality column first for functional dependencies
In general, put columns that you use for local predicates (as in the previous example) earlier in the sort order to make predicate evaluation more efficient. In addition, if a lower cardinality column is uniquely determined by a higher cardinality column (like city_id uniquely determining a state_id), it is always better to put the lower cardinality, functionally determined column earlier in the sort order than the higher cardinality column.
For example, in the following sort order, the Area_Code column is sorted before the Number column in the customer_info table:
ORDER BY customer_info.Area_Code, customer_info.Number, customer_info.Address
Because the Area_Code
column sorts first, when Vertica processes the following query, it scans only the values in the Number
column that fall within area code 978:
=> SELECT Address FROM customer_info WHERE Area_Code='978' AND Number='9780123457';
Sort for merge joins
When processing a join, the Vertica optimizer chooses from two algorithms:
-
Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses a merge join, which is faster and uses less memory.
-
Hash join—Using the hash join algorithm, Vertica uses the smaller (inner) joined table to build an in-memory hash table on the join column. A hash join has no sort requirement, but it consumes more memory because Vertica builds a hash table with the values in the inner table. The optimizer chooses a hash join when projections are not sorted on the join columns.
If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join perform faster. Vertica uses the term sort-merge join to refer to the case when at least one of the inputs must be sorted prior to the merge join. Vertica sorts the inner input side but only if the outer input side is already sorted on the join columns.
To give the Vertica query optimizer the option to use an efficient merge join for a particular join, create projections on both sides of the join that put the join column first in their respective projections. This is primarily important to do if both tables are so large that neither table fits into memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously, the benefits of merge join over hash join are sufficiently small that it probably isn't worth creating a projection for any one join column.
Sort on columns in important queries
If you have an important query, one that you run on a regular basis, you can save time by putting the columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort order.
If that query uses a high-cardinality column such as Social Security number, you may sacrifice storage by placing this column early in the sort order of a projection, but your most important query will be optimized.
Sort columns of equal cardinality by size
If you have two columns of equal cardinality, put the column that is larger first in the sort order. For example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.
Sort foreign key columns first, from low to high distinct cardinality
Suppose you have a fact table where the first four columns in the sort order make up a foreign key to another table. For best compression, choose a sort order for the fact table such that the foreign keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the design of projections for fact tables, such as partitioning by a time dimension, if any.
In the following example, the table inventory
stores inventory data, and product_key
and warehouse_key
are foreign keys to the product_dimension
and warehouse_dimension
tables:
=> CREATE TABLE inventory (
date_key INTEGER NOT NULL,
product_key INTEGER NOT NULL,
warehouse_key INTEGER NOT NULL,
...
);
=> ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
REFERENCES warehouse_dimension(warehouse_key);
ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
REFERENCES product_dimension(product_key);
The inventory table should be sorted by warehouse_key and then product_key, because the cardinality of the warehouse_key
column is probably lower than the cardinality of the product_key
column.
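Based on this guidance, a superprojection for inventory might sort as follows. This is a sketch: the remaining columns and the segmentation column are illustrative choices, not requirements.

```sql
=> CREATE PROJECTION inventory_p AS
   SELECT * FROM inventory
   ORDER BY warehouse_key, product_key, date_key
   SEGMENTED BY HASH(product_key) ALL NODES;
```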
3.4.11.3.5 - Prioritizing column access speed
If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank.
If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank. For more information, see Setting storage performance.
How columns are ranked
Vertica stores columns included in the projection sort order on the fastest available storage locations. Columns not included in the projection sort order are stored on slower disks. Columns for each projection are ranked as follows:
-
Columns in the sort order are given the highest priority (numbers > 1000).
-
The last column in the sort order is given the rank number 1001.
-
The next-to-last column in the sort order is given the rank number 1002, and so on until the first column in the sort order is given 1000 + # of sort columns.
-
The remaining columns are given numbers from 1000 down to 1, starting with 1000 and decrementing by one per column.
Vertica then stores columns on disk from the highest ranking to the lowest ranking. It places highest-ranking columns on the fastest disks and the lowest-ranking columns on the slowest disks.
Overriding default column ranking
You can modify which columns are stored on fast disks by manually overriding the default ranks for these columns. To accomplish this, set the ACCESSRANK
keyword in the column list. Make sure to use an integer that is not already being used for another column. For example, if you want to give a column the fastest access rank, use a number that is significantly higher than 1000 + the number of sort columns. This allows you to enter more columns over time without bumping into the access rank you set.
The following example sets column store_key
's access rank to 1500:
CREATE PROJECTION retail_sales_fact_p (
store_key ENCODING RLE ACCESSRANK 1500,
pos_transaction_number ENCODING RLE,
sales_dollar_amount,
cost_dollar_amount )
AS SELECT
store_key,
pos_transaction_number,
sales_dollar_amount,
cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;
4 - Database users and privileges
Database users should only have access to the database resources that they need to perform their tasks.
Database users should only have access to the database resources that they need to perform their tasks. For example, most users should be able to read data but not modify or insert new data. A smaller number of users typically need permission to perform a wider range of database tasks—for example, create and modify schemas, tables, and views. A very small number of users can perform administrative tasks, such as rebalance nodes on a cluster, or start or stop a database. You can also let certain users extend their own privileges to other users.
Privileges control which database objects users can access and change in the database. You specify access for specific users or roles with GRANT statements.
4.1 - Database users
Every Vertica database has one or more users.
Every Vertica database has one or more users. When users connect to a database, they must log on with valid credentials (username and password) that a superuser defined in the database.
Database users own the objects they create in a database, such as tables, procedures, and storage locations.
4.1.1 - Types of database users
In a Vertica database, there are three types of users.
In a Vertica database, there are three types of users:
Note
External to a Vertica database, an MC administrator can create users through the Management Console and grant them database access. See
User administration in MC for details.
4.1.1.1 - Database administration user
On installation, a new Vertica database automatically contains a user with superuser privileges.
On installation, a new Vertica database automatically contains a user with superuser privileges. Unless explicitly named during installation, this user is identified as dbadmin
. This user cannot be dropped and has the following irrevocable roles:
With these roles, the dbadmin
user can perform all database operations. This user can also create other users with administrative privileges.
Important
Do not confuse the dbadmin
user with the DBADMIN role. The DBADMIN role is a set of privileges that can be assigned to one or more users.
The Vertica documentation often references the dbadmin
user as a superuser. This reference is unrelated to Linux superusers.
Creating additional database administrators
As the dbadmin
user, you can create other users with the same privileges:
-
Create a user:
=> CREATE USER DataBaseAdmin2;
CREATE USER
-
Grant the appropriate roles to new user DataBaseAdmin2
:
=> GRANT dbduser, dbadmin, pseudosuperuser to DataBaseAdmin2;
GRANT ROLE
User DataBaseAdmin2
now has the same privileges granted to the original dbadmin user.
-
As DataBaseAdmin2
, enable your assigned roles with SET ROLE:
=> \c - DataBaseAdmin2;
You are now connected to database "VMart" as user "DataBaseAdmin2".
=> SET ROLE dbadmin, dbduser, pseudosuperuser;
SET ROLE
-
Confirm the roles are enabled:
=> SHOW ENABLED ROLES;
     name      |              setting
---------------+-----------------------------------
 enabled roles | dbduser, dbadmin, pseudosuperuser
4.1.1.2 - Object owner
An object owner is the user who creates a particular database object and can perform any operation on that object.
An object owner is the user who creates a particular database object and can perform any operation on that object. By default, only an owner (or a superuser) can act on a database object. In order to allow other users to use an object, the owner or superuser must grant privileges to those users using one of the GRANT statements.
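For example, a table owner might let another user query, but not modify, the table. The table and user names here are hypothetical.

```sql
=> GRANT SELECT ON TABLE store.orders TO Fred;
```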
Note
Object owners are
PUBLIC users for objects that other users own.
See Database privileges for more information.
4.1.1.3 - PUBLIC user
All non-DBA (superuser) or object owners are PUBLIC users.
All non-DBA (superuser) or object owners are PUBLIC users.
Note
Object owners are PUBLIC users for objects that other users own.
Newly-created users do not have access to schema PUBLIC by default. Make sure to GRANT USAGE ON SCHEMA PUBLIC to all users you create.
See also
4.1.2 - Creating a database user
To create a database user.
To create a database user:
-
From vsql, connect to the database as a superuser.
-
Issue the
CREATE USER
statement with optional parameters.
-
Run a series of GRANT statements to grant the new user privileges.
To create a user on MC, see User administration in MC in Management Console.
New user privileges
By default, new database users have the right to create temporary tables in the database.
New users do not have access to schema PUBLIC
by default. Be sure to call GRANT USAGE ON SCHEMA PUBLIC
to all users you create.
Modifying users
You can change information about a user, such as his or her password, by using the
ALTER USER
statement. If you want to configure a user to not have any password authentication, you can set the empty password ‘’ in CREATE USER
or ALTER USER
statements, or omit the IDENTIFIED BY
parameter in CREATE USER
.
Example
The following series of commands adds user Fred to a database with the password 'password'. The second command grants USAGE privileges to Fred on the public schema:
=> CREATE USER Fred IDENTIFIED BY 'password';
=> GRANT USAGE ON SCHEMA PUBLIC to Fred;
User names created with double-quotes are case sensitive. For example:
=> CREATE USER "FrEd1";
In the above example, the logon name must be an exact match. If the user name was created without double-quotes (for example, FRED1
), then the user can log on as FRED1
, FrEd1
, fred1
, and so on.
See also
4.1.3 - User-level configuration parameters
ALTER USER lets you set user-level configuration parameters on individual users.
ALTER USER lets you set user-level configuration parameters on individual users. These settings override database- or session-level settings on the same parameters. For example, the following ALTER USER statement sets DepotOperationsForQuery for users Yvonne and Ahmed to FETCHES, thus overriding the default setting of ALL:
=> SELECT user_name, parameter_name, current_value, default_value FROM user_configuration_parameters
WHERE user_name IN('Ahmed', 'Yvonne') AND parameter_name = 'DepotOperationsForQuery';
user_name | parameter_name | current_value | default_value
-----------+-------------------------+---------------+---------------
Ahmed | DepotOperationsForQuery | ALL | ALL
Yvonne | DepotOperationsForQuery | ALL | ALL
(2 rows)
=> ALTER USER Ahmed SET DepotOperationsForQuery='FETCHES';
ALTER USER
=> ALTER USER Yvonne SET DepotOperationsForQuery='FETCHES';
ALTER USER
Identifying user-level parameters
To identify user-level configuration parameters, query the allowed_levels
column of system table CONFIGURATION_PARAMETERS. For example, the following query identifies user-level parameters that affect depot usage:
=> SELECT parameter_name, allowed_levels, default_value, current_level, current_value
FROM configuration_parameters WHERE allowed_levels ilike '%USER%' AND parameter_name ilike '%depot%';
parameter_name | allowed_levels | default_value | current_level | current_value
-------------------------+-------------------------+---------------+---------------+---------------
UseDepotForReads | SESSION, USER, DATABASE | 1 | DEFAULT | 1
DepotOperationsForQuery | SESSION, USER, DATABASE | ALL | DEFAULT | ALL
UseDepotForWrites | SESSION, USER, DATABASE | 1 | DEFAULT | 1
(3 rows)
Viewing user parameter settings
You can obtain user settings in two ways:
-
Query system table USER_CONFIGURATION_PARAMETERS:
=> SELECT * FROM user_configuration_parameters;
user_name | parameter_name | current_value | default_value
-----------+---------------------------+---------------+---------------
Ahmed | DepotOperationsForQuery | FETCHES | ALL
Yvonne | DepotOperationsForQuery | FETCHES | ALL
Yvonne | LoadSourceStatisticsLimit | 512 | 256
(3 rows)
-
Use SHOW USER:
=> SHOW USER Yvonne PARAMETER ALL;
user | parameter | setting
--------+---------------------------+---------
Yvonne | DepotOperationsForQuery | FETCHES
Yvonne | LoadSourceStatisticsLimit | 512
(2 rows)
=> SHOW USER ALL PARAMETER ALL;
user | parameter | setting
--------+---------------------------+---------
Yvonne | DepotOperationsForQuery | FETCHES
Yvonne | LoadSourceStatisticsLimit | 512
Ahmed | DepotOperationsForQuery | FETCHES
(3 rows)
4.1.4 - Locking user accounts
As a superuser, you can manually lock and unlock a database user account with ALTER USER...ACCOUNT LOCK and ALTER USER...ACCOUNT UNLOCK, respectively.
As a superuser, you can manually lock and unlock a database user account with ALTER USER...ACCOUNT LOCK and ALTER USER...ACCOUNT UNLOCK, respectively. For example, the following command prevents user Fred from logging in to the database:
=> ALTER USER Fred ACCOUNT LOCK;
=> \c - Fred
FATAL 4974: The user account "Fred" is locked
HINT: Please contact the database administrator
The following example unlocks access to Fred's user account:
=> ALTER USER Fred ACCOUNT UNLOCK;
=> \c - Fred
You are now connected as user "Fred".
Locking new accounts
CREATE USER can specify to lock a new account. Like any locked account, it can be unlocked with ALTER USER...ACCOUNT UNLOCK.
=> CREATE USER Bob ACCOUNT LOCK;
CREATE USER
Locking accounts for failed login attempts
A user's profile can specify to lock an account after a certain number of failed login attempts.
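For example, a profile might lock an account after three failed attempts. This is a sketch: the profile name is hypothetical, and the CREATE PROFILE parameters are assumed from the SQL Reference Manual.

```sql
=> CREATE PROFILE secure_profile LIMIT FAILED_LOGIN_ATTEMPTS 3 PASSWORD_LOCK_TIME 2;
=> ALTER USER Fred PROFILE secure_profile;
```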
4.1.5 - Setting and changing user passwords
As a superuser, you can set any user's password when you create that user with CREATE USER, or later with ALTER USER.
As a superuser, you can set any user's password when you create that user with CREATE USER, or later with ALTER USER. Non-superusers can also change their own passwords with ALTER USER. One exception applies: users who are added to the Vertica database with the LDAPLink service cannot change their passwords with ALTER USER.
You can also give a user a pre-hashed password if you provide its associated salt. The salt must be a hex string. This method bypasses password complexity requirements.
To view password hashes and salts of existing users, see the PASSWORDS system table.
Changing a user's password has no effect on their current session.
Setting user passwords in VSQL
In this example, the user 'Bob' is created with the password 'mypassword'.
=> CREATE USER Bob IDENTIFIED BY 'mypassword';
CREATE USER
The password is then changed to 'Orca'.
=> ALTER USER Bob IDENTIFIED BY 'Orca' REPLACE 'mypassword';
ALTER USER
In this example, the user 'Alice' is created with a pre-hashed password and salt.
=> CREATE USER Alice IDENTIFIED BY
'sha512e0299de83ecfaa0b6c9cbb1feabfbe0b3c82a1495875cd9ec1c4b09016f09b42c1'
SALT '465a4aec38a85d6ecea5a0ac8f2d36d8';
Setting user passwords in Management Console
On Management Console, users with ADMIN or IT privileges can reset a user's non-LDAP password:
-
Sign in to Management Console and navigate to MC Settings > User management.
-
Click to select the user to modify and click Edit.
-
Click Edit password and enter the new password twice.
-
Click OK and then Save.
4.2 - Database roles
A role is a collection of privileges that can be granted to one or more users or other roles.
A role is a collection of privileges that can be granted to one or more users or other roles. Roles help you grant and manage sets of privileges for various categories of users, rather than grant those privileges to each user individually.
For example, several users might require administrative privileges. You can grant these privileges to them as follows:
-
Create an administrator role with CREATE ROLE:
CREATE ROLE administrator;
-
Grant the role to the appropriate users.
-
Grant the appropriate privileges to this role with one or more GRANT statements. You can later add and remove privileges as needed. Changes in role privileges are automatically propagated to the users who have that role.
After users are assigned roles, they can either enable those roles themselves, or you can automatically enable their roles for them.
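Together, the steps above might look like the following. The privilege granted, the user name, and the use of the VMart database name are hypothetical.

```sql
=> CREATE ROLE administrator;
=> GRANT administrator TO Fred;
=> GRANT CREATE ON DATABASE VMart TO administrator;
=> ALTER USER Fred DEFAULT ROLE administrator;
```

Setting a default role, as in the last statement, automatically enables the role whenever Fred starts a session.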
4.2.1 - Predefined database roles
Vertica has the following predefined roles.
Vertica has the following predefined roles:
Automatic role grants
On installation, Vertica automatically grants and enables predefined roles as follows:
-
The DBADMIN, PSEUDOSUPERUSER, and DBDUSER roles are irrevocably granted to the dbadmin user. These roles are always enabled for dbadmin
, and can never be dropped.
-
PUBLIC is granted to dbadmin
, and to all other users as they are created. This role is always enabled and cannot be dropped or revoked.
Granting predefined roles
After installation, the dbadmin
user and users with the PSEUDOSUPERUSER role can grant one or more predefined roles to any user or non-predefined role. For example, the following set of statements creates the userdba
role and grants it the predefined role DBADMIN:
=> CREATE ROLE userdba;
CREATE ROLE
=> GRANT DBADMIN TO userdba WITH ADMIN OPTION;
GRANT ROLE
Users and roles that are granted a predefined role can extend that role to other users, if the original GRANT (Role) statement includes WITH ADMIN OPTION. One exception applies: if you grant a user the PSEUDOSUPERUSER role and omit WITH ADMIN OPTION, the grantee can grant any role, including all predefined roles, to other users.
For example, the userdba
role was previously granted the DBADMIN role. Because the GRANT statement includes WITH ADMIN OPTION, users who are assigned the userdba
role can grant the DBADMIN role to other users:
=> GRANT userdba TO fred;
GRANT ROLE
=> \c - fred
You are now connected as user "fred".
=> SET ROLE userdba;
SET
=> GRANT dbadmin TO alice;
GRANT ROLE
Modifying predefined roles
Excluding SYSMONITOR, you can grant predefined roles privileges on individual database objects, such as tables or schemas. For example:
=> CREATE SCHEMA s1;
CREATE SCHEMA
=> GRANT ALL ON SCHEMA s1 to PUBLIC;
GRANT PRIVILEGE
You can grant PUBLIC any role, including predefined roles. For example:
=> CREATE ROLE r1;
CREATE ROLE
=> GRANT r1 TO PUBLIC;
GRANT ROLE
You cannot modify any other predefined role by granting another role to it. Attempts to do so return a rollback error:
=> CREATE ROLE r2;
CREATE ROLE
=> GRANT r2 TO PSEUDOSUPERUSER;
ROLLBACK 2347: Cannot alter predefined role "pseudosuperuser"
4.2.1.1 - DBADMIN
The DBADMIN role is a predefined role that is assigned to the dbadmin user on database installation.
The DBADMIN
role is a predefined role that is assigned to the dbadmin
user on database installation. Thereafter, the dbadmin
user and users with the
PSEUDOSUPERUSER
role can grant any role to any user or non-predefined role.
For example, superuser dbadmin
creates role fred
and grants fred
the DBADMIN
role:
=> CREATE USER fred;
CREATE USER
=> GRANT DBADMIN TO fred WITH ADMIN OPTION;
GRANT ROLE
After user fred
enables his DBADMIN role
, he can exercise his DBADMIN
privileges by creating user alice
. Because the GRANT
statement includes WITH ADMIN OPTION
, fred
can also grant the DBADMIN
role to user alice
:
=> \c - fred
You are now connected as user "fred".
=> SET ROLE dbadmin;
SET
=> CREATE USER alice;
CREATE USER
=> GRANT DBADMIN TO alice;
GRANT ROLE
DBADMIN privileges
The DBADMIN role supports the following privileges:
-
Create users and roles, and grant them roles and privileges
-
Create and drop schemas
-
View all system tables
-
View and terminate user sessions
-
Access all data created by any user
4.2.1.2 - PSEUDOSUPERUSER
The PSEUDOSUPERUSER role is a predefined role that is automatically assigned to the dbadmin user on database installation.
The PSEUDOSUPERUSER
role is a predefined role that is automatically assigned to the dbadmin
user on database installation. The dbadmin
can grant this role to any user or non-predefined role. Thereafter, PSEUDOSUPERUSER
users can grant any role, including predefined roles, to other users.
PSEUDOSUPERUSER privileges
Users with the PSEUDOSUPERUSER
role are entitled to complete administrative privileges, which cannot be revoked. Role privileges include:
-
Bypass all GRANT/REVOKE authorization
-
Create schemas and tables
-
Create users and roles, and grant privileges to them
-
Modify user accounts—for example, set user account's passwords, and lock/unlock accounts.
-
Create or drop a UDF library and function, or any external procedure
4.2.1.3 - DBDUSER
The DBDUSER role is a predefined role that is assigned to the dbadmin user on database installation.
The DBDUSER
role is a predefined role that is assigned to the dbadmin
user on database installation. The dbadmin
and any PSEUDOSUPERUSER
can grant this role to any user or non-predefined role. Users who have this role and enable it can call Database Designer functions from the command line.
Note
Non-DBADMIN users with the DBDUSER role cannot run Database Designer through Administration Tools. Only
DBADMIN users can run Administration Tools.
Associating DBDUSER with resource pools
Be sure to associate a resource pool with the DBDUSER
role, to facilitate resource management when you run Database Designer. Multiple users can run Database Designer concurrently without interfering with each other or exhausting all the cluster resources. Whether you run Database Designer programmatically or with Administration Tools, design execution is generally contained by the user's resource pool, but might spill over into system resource pools for less-intensive tasks.
4.2.1.4 - SYSMONITOR
An organization's database administrator may have many responsibilities outside of maintaining Vertica as a DBADMIN user.
An organization's database administrator may have many responsibilities outside of maintaining Vertica as a DBADMIN user. In this case, as the DBADMIN you may want to delegate some Vertica administrative tasks to another Vertica user.
The DBADMIN can assign a delegate the SYSMONITOR role to grant access to system tables without granting full DBADMIN access.
The SYSMONITOR role provides the following privileges.
-
View all system tables that are marked as monitorable. You can see a list of all the monitorable tables by issuing the statement:
=> SELECT * FROM system_tables WHERE is_monitorable='t';
-
If WITH ADMIN OPTION
was included when granting SYSMONITOR to the user or role, that user or role can then grant SYSMONITOR privileges to other users and roles.
Grant a SYSMONITOR role
To grant a user or role the SYSMONITOR role, you must be a superuser (DBADMIN) or have been granted SYSMONITOR with ADMIN OPTION.
Use the GRANT (Role) SQL statement to assign the SYSMONITOR role to a user or role. The following example grants the SYSMONITOR role to user1 with administrative privileges by including the WITH ADMIN OPTION parameter:
=> GRANT SYSMONITOR TO user1 WITH ADMIN OPTION;
This example shows how to revoke the ADMIN OPTION from the SYSMONITOR role for user1:
=> REVOKE ADMIN OPTION FOR SYSMONITOR FROM user1;
Use CASCADE to revoke ADMIN OPTION privileges for all users assigned the SYSMONITOR role:
=> REVOKE ADMIN OPTION FOR SYSMONITOR FROM PUBLIC CASCADE;
Example
This example shows how to create a user and a role, grant the SYSMONITOR role to the role, and then grant that role to the user:
=> CREATE USER user1;
=> CREATE ROLE monitor;
=> GRANT SYSMONITOR TO monitor;
=> GRANT monitor TO user1;
Assign SYSMONITOR privileges
This example uses the user and role created in the Grant SYSMONITOR Role example and shows how to:
-
Create a table called personal_data
-
Log in as user1
-
Enable the monitor role. (You already granted monitor the SYSMONITOR privileges in the Grant a SYSMONITOR role example.)
-
Run a SELECT statement as user1
The results of the operations are based on the privilege already granted to user1.
=> CREATE TABLE personal_data (SSN varchar (256));
=> \c - user1
=> SET ROLE monitor;
=> SELECT COUNT(*) FROM TABLES;
COUNT
-------
1
(1 row)
Because you assigned the SYSMONITOR role, user1 can see the number of rows in the TABLES system table. In this simple example, there is only one table (personal_data) in the database, so SELECT COUNT returns one row. In an actual database, the SYSMONITOR role sees all the tables in the database.
Check if a table is accessible by SYSMONITOR
To check if a system table can be accessed by a user assigned the SYSMONITOR role:
=> SELECT table_name, is_monitorable FROM system_tables WHERE table_name='table_name';
For example, the following statement shows that the CURRENT_SESSION system table is accessible by the SYSMONITOR role:
=> SELECT table_name, is_monitorable FROM system_tables WHERE table_name='current_session';
table_name | is_monitorable
-----------------+----------------
current_session | t
(1 row)
4.2.1.5 - UDXDEVELOPER
The UDXDEVELOPER role is a predefined role that enables users to create and replace user-defined libraries.
The UDXDEVELOPER role is a predefined role that enables users to create and replace user-defined libraries. The dbadmin
can grant this role to any user or non-predefined role.
UDXDEVELOPER privileges
Users with the UDXDEVELOPER role can create and replace user-defined libraries, and drop libraries that they own.
To use the privileges of this role, you must explicitly enable it using SET ROLE.
Security considerations
A user with the UDXDEVELOPER role can create libraries and, therefore, can install any UDx function in the database. UDx functions run as the Linux user that owns the database, and therefore have access to resources that Vertica has access to.
A poorly-written function can degrade database performance. Give this role only to users you trust to use UDxs responsibly. You can limit the memory that a UDx can consume by running UDxs in fenced mode and by setting the FencedUDxMemoryLimitMB configuration parameter.
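For example, a trusted developer might enable the role and load a library while the DBA caps fenced-mode memory. The library name, file path, and memory limit below are hypothetical:

```sql
-- DBA: cap memory for fenced-mode UDxs (value is illustrative)
=> ALTER DATABASE database_name SET FencedUDxMemoryLimitMB = 300;
-- Developer: enable the role, then create or replace a library
=> SET ROLE UDXDEVELOPER;
=> CREATE OR REPLACE LIBRARY mylib AS '/home/dbadmin/mylib.so';
```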
4.2.1.6 - MLSUPERVISOR
The MLSUPERVISOR role is a predefined role to which all the ML-model management privileges of DBADMIN are delegated.
The MLSUPERVISOR
role is a predefined role to which all the ML-model management privileges of DBADMIN are delegated. An MLSUPERVISOR
can manage all models in the V_CATALOG.MODELS
table on behalf of dbadmin
.
In the following example, user alice
uses her MLSUPERVISOR
privileges to reassign ownership of the model my_model
from user bob
to user nina
:
=> \c - alice
You are now connected as user "alice".
=> SELECT model_name, schema_name, owner_name FROM models;
model_name | schema_name | owner_name
-------------+-------------+------------
my_model | public | bob
mylinearreg | myschema2 | alice
(2 rows)
=> SET ROLE MLSUPERVISOR;
=> ALTER MODEL my_model OWNER to nina;
=> SELECT model_name, schema_name, owner_name FROM models;
model_name | schema_name | owner_name
-------------+-------------+------------
my_model | public | nina
mylinearreg | myschema2 | alice
(2 rows)
=> DROP MODEL my_model;
MLSUPERVISOR privileges
The following privileges are supported for the MLSUPERVISOR role:
-
ML-model management privileges of DBADMIN
-
Management (USAGE, ALTER, DROP) of all models in V_CATALOG.MODELS
To use the privileges of this role, you must explicitly enable it using SET ROLE.
See also
4.2.1.7 - PUBLIC
The PUBLIC role is a predefined role that is automatically assigned to all new users.
The PUBLIC
role is a predefined role that is automatically assigned to all new users. It is always enabled and cannot be dropped or revoked. Use this role to grant all database users the same minimum set of privileges.
Like any role, the PUBLIC
role can be granted privileges to individual objects and other roles. The following example grants the PUBLIC
role INSERT and SELECT privileges on table publicdata
. This enables all users to read data in that table and insert new data:
=> CREATE TABLE publicdata (a INT, b VARCHAR);
CREATE TABLE
=> GRANT INSERT, SELECT ON publicdata TO PUBLIC;
GRANT PRIVILEGE
=> CREATE PROJECTION publicdataproj AS (SELECT * FROM publicdata);
CREATE PROJECTION
=> \c - bob
You are now connected as user "bob".
=> INSERT INTO publicdata VALUES (10, 'Hello World');
OUTPUT
--------
1
(1 row)
The following example grants PUBLIC
the employee
role, so all database users have employee
privileges:
=> GRANT employee TO public;
GRANT ROLE
Important
The clause WITH ADMIN OPTION
is invalid for any GRANT
statement that specifies PUBLIC
as grantee.
4.2.2 - Role hierarchy
By granting roles to other roles, you can build a hierarchy of roles, where roles lower in the hierarchy have a narrow range of privileges, while roles higher in the hierarchy are granted combinations of roles and their privileges.
By granting roles to other roles, you can build a hierarchy of roles, where roles lower in the hierarchy have a narrow range of privileges, while roles higher in the hierarchy are granted combinations of roles and their privileges. When you organize roles hierarchically, any privileges that you add to lower-level roles are automatically propagated to the roles above them.
Creating hierarchical roles
The following example creates two roles, assigns them privileges, then assigns both roles to another role.
-
Create table applog
:
=> CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(256));
-
Create the logreader
role and grant it read-only privileges on table applog
:
=> CREATE ROLE logreader;
CREATE ROLE
=> GRANT SELECT ON applog TO logreader;
GRANT PRIVILEGE
-
Create the logwriter
role and grant it write privileges on table applog
:
=> CREATE ROLE logwriter;
CREATE ROLE
=> GRANT INSERT, UPDATE ON applog to logwriter;
GRANT PRIVILEGE
-
Create the logadmin
role and grant it DELETE privilege on table applog
:
=> CREATE ROLE logadmin;
CREATE ROLE
=> GRANT DELETE ON applog to logadmin;
GRANT PRIVILEGE
-
Grant the logreader
and logwriter
roles to role logadmin
:
=> GRANT logreader, logwriter TO logadmin;
-
Create user bob
and grant him the logadmin
role:
=> CREATE USER bob;
CREATE USER
=> GRANT logadmin TO bob;
GRANT PRIVILEGE
-
Modify user bob
's account so his logadmin
role is automatically enabled on login:
=> ALTER USER bob DEFAULT ROLE logadmin;
ALTER USER
=> \c - bob
You are now connected as user "bob".
=> SHOW ENABLED_ROLES;
name | setting
---------------+----------
enabled roles | logadmin
(1 row)
Enabling hierarchical roles
Only roles that are explicitly granted to a user can be enabled for that user. In the previous example, roles logreader
and logwriter
cannot be enabled for bob
. They can only be enabled indirectly, by enabling logadmin
.
Hierarchical role grants and WITH ADMIN OPTION
If one or more roles are granted to another role using WITH ADMIN OPTION
, then users who are granted the 'higher' role inherit administrative access to the subordinate roles.
For example, you might modify the earlier grants of roles logreader
and logwriter
to logadmin
as follows:
=> GRANT logreader, logwriter TO logadmin WITH ADMIN OPTION;
NOTICE 4617: Role "logreader" was already granted to role "logadmin"
NOTICE 4617: Role "logwriter" was already granted to role "logadmin"
GRANT ROLE
User bob
, through his logadmin
role, is now authorized to grant its two subordinate roles to other users—in this case, role logreader
to user Alice
:
=> \c - bob;
You are now connected as user "bob".
=> GRANT logreader TO Alice;
GRANT ROLE
=> \c - alice;
You are now connected as user "alice".
=> show available_roles;
name | setting
-----------------+-----------
available roles | logreader
(1 row)
Note
Because the grant of the logadmin
role to bob
did not include WITH ADMIN OPTION
, he cannot grant that role to alice
:
=> \c - bob;
You are now connected as user "bob".
=> GRANT logadmin TO alice;
ROLLBACK 4925: The role "logadmin" cannot be granted to "alice"
4.2.3 - Creating and dropping roles
As a superuser with the DBADMIN or PSEUDOSUPERUSER role, you can create and drop roles with CREATE ROLE and DROP ROLE, respectively.
As a superuser with the
DBADMIN
or
PSEUDOSUPERUSER
role, you can create and drop roles with
CREATE ROLE
and
DROP ROLE
, respectively.
=> CREATE ROLE administrator;
CREATE ROLE
A new role has no privileges or roles granted to it. Only superusers can grant privileges and access to the role.
Dropping database roles with dependencies
If you try to drop a role that is granted to users or other roles, Vertica returns a rollback message:
=> DROP ROLE administrator;
NOTICE: User Bob depends on Role administrator
ROLLBACK: DROP ROLE failed due to dependencies
DETAIL: Cannot drop Role administrator because other objects depend on it
HINT: Use DROP ROLE ... CASCADE to remove granted roles from the dependent users/roles
To force the drop operation, qualify the DROP ROLE statement with CASCADE:
=> DROP ROLE administrator CASCADE;
DROP ROLE
4.2.4 - Granting privileges to roles
You can use GRANT statements to assign privileges to a role, just as you assign privileges to users.
You can use GRANT statements to assign privileges to a role, just as you assign privileges to users. See Database privileges for information about which privileges can be granted.
Granting a privilege to a role immediately affects active user sessions. When you grant a privilege to a role, it becomes immediately available to all users with that role enabled.
The following example creates two roles and assigns them different privileges on the same table.
-
Create table applog
:
=> CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(256));
-
Create roles logreader
and logwriter
:
=> CREATE ROLE logreader;
CREATE ROLE
=> CREATE ROLE logwriter;
CREATE ROLE
-
Grant read-only privileges on applog
to logreader
, and write privileges to logwriter
:
=> GRANT SELECT ON applog TO logreader;
GRANT PRIVILEGE
=> GRANT INSERT ON applog TO logwriter;
GRANT PRIVILEGE
Revoking privileges from roles
Use REVOKE statements to revoke a privilege from a role. Revoking a privilege from a role immediately affects active user sessions. When you revoke a privilege from a role, it is no longer available to users who have the privilege through that role.
For example:
=> REVOKE INSERT ON applog FROM logwriter;
REVOKE PRIVILEGE
4.2.5 - Granting database roles
You can assign one or more roles to a user or another role with GRANT (Role).
You can assign one or more roles to a user or another role with GRANT (Role):
GRANT role[,...] TO grantee[,...] [ WITH ADMIN OPTION ]
For example, you might create three roles—appdata
, applogs
, and appadmin
—and grant appadmin
to user bob
:
=> CREATE ROLE appdata;
CREATE ROLE
=> CREATE ROLE applogs;
CREATE ROLE
=> CREATE ROLE appadmin;
CREATE ROLE
=> GRANT appadmin TO bob;
GRANT ROLE
Granting roles to another role
GRANT
can assign one or more roles to another role. For example, the following GRANT
statement grants roles appdata
and applogs
to role appadmin
:
=> GRANT appdata, applogs TO appadmin;
-- grant to other roles
GRANT ROLE
Because user bob was previously assigned the role appadmin
, he now has all privileges that are granted to roles appdata
and applogs
.
When you grant one role to another role, Vertica checks for circular references. In the previous example, role appdata
is assigned to the appadmin
role. Thus, subsequent attempts to assign appadmin
to appdata
fail, returning with the following warning:
=> GRANT appadmin TO appdata;
WARNING: Circular assignation of roles is not allowed
HINT: Cannot grant appadmin to appdata
GRANT ROLE
Enabling roles
After granting a role to a user, the role must be enabled. You can enable a role for the current session:
=> SET ROLE appdata;
SET ROLE
You can also enable a role as part of the user's login, by modifying the user's profile with
ALTER USER...DEFAULT ROLE
:
=> ALTER USER bob DEFAULT ROLE appdata;
ALTER USER
For details, see Enabling roles and Enabling roles automatically.
Granting administrative privileges
You can delegate administrative access to a role to non-superusers by qualifying the GRANT (Role) statement with the option WITH ADMIN OPTION. Users with administrative access can manage access to the role for other users, including granting them administrative access. In the following example, a superuser grants the appadmin role with administrative privileges to users bob and alice:
=> GRANT appadmin TO bob, alice WITH ADMIN OPTION;
GRANT ROLE
Now, both users can exercise their administrative privileges to grant the appadmin
role to other users, or revoke it. For example, user bob
can now revoke the appadmin
role from user alice
:
=> \connect - bob
You are now connected as user "bob".
=> REVOKE appadmin FROM alice;
REVOKE ROLE
Caution
As with all user privilege models, database superusers should be cautious when granting any user a role with administrative privileges. For example, if the database superuser grants two users a role with administrative privileges, either user can revoke that role from the other user.
Example
The following example creates a role called commenter
and grants that role to user bob
:
-
Create the comments
table:
=> CREATE TABLE comments (id INT, comment VARCHAR);
-
Create the commenter
role:
=> CREATE ROLE commenter;
-
Grant to commenter
INSERT and SELECT privileges on the comments
table:
=> GRANT INSERT, SELECT ON comments TO commenter;
-
Grant the commenter
role to user bob
.
=> GRANT commenter TO bob;
-
In order to access the role and its associated privileges, bob
enables the newly-granted role for himself:
=> \c - bob
=> SET ROLE commenter;
-
Because bob
has INSERT and SELECT privileges on the comments
table, he can perform the following actions:
=> INSERT INTO comments VALUES (1, 'Hello World');
OUTPUT
--------
1
(1 row)
=> SELECT * FROM comments;
id | comment
----+-------------
1 | Hello World
(1 row)
=> COMMIT;
COMMIT
-
Because bob
's role lacks DELETE privileges, the following statement returns an error:
=> DELETE FROM comments WHERE id=1;
ERROR 4367: Permission denied for relation comments
See also
Database privileges
4.2.6 - Revoking database roles
REVOKE (Role) can revoke roles from one or more grantees—that is, from users or roles.
REVOKE (Role)
can revoke roles from one or more grantees—that is, from users or roles:
REVOKE [ ADMIN OPTION FOR ] role[,...] FROM grantee[,...] [ CASCADE ]
For example, the following statement revokes the commenter
role from user bob
:
=> \c
You are now connected as user "dbadmin".
=> REVOKE commenter FROM bob;
REVOKE ROLE
Revoking administrative access from a role
You can qualify
REVOKE (Role)
with the clause ADMIN OPTION FOR
. This clause revokes from the grantees the authority (granted by an earlier GRANT (Role)...WITH ADMIN OPTION
statement) to grant the specified roles to other users or roles. Current roles for the grantees are unaffected.
The following example revokes user Alice's authority to grant and revoke the commenter
role:
=> \c
You are now connected as user "dbadmin".
=> REVOKE ADMIN OPTION FOR commenter FROM alice;
REVOKE ROLE
4.2.7 - Enabling roles
When you enable a role in a session, you obtain all privileges assigned to that role.
When you enable a role in a session, you obtain all privileges assigned to that role. You can enable multiple roles simultaneously, thereby gaining all privileges of those roles, plus any privileges that are already granted to you directly.
By default, only predefined roles are enabled automatically for users. Otherwise, on starting a session, you must explicitly enable assigned roles with the Vertica function
SET ROLE
.
For example, the dbadmin creates the logreader
role and assigns it to user alice
:
=> \c
You are now connected as user "dbadmin".
=> CREATE ROLE logreader;
CREATE ROLE
=> GRANT SELECT ON TABLE applog to logreader;
GRANT PRIVILEGE
=> GRANT logreader TO alice;
GRANT ROLE
User alice
must enable the new role before she can view the applog
table:
=> \c - alice
You are now connected as user "alice".
=> SELECT * FROM applog;
ERROR: permission denied for relation applog
=> SET ROLE logreader;
SET
=> SELECT * FROM applog;
id | sourceID | data | event
----+----------+----------------------------+----------------------------------------------
1 | Loader | 2011-03-31 11:00:38.494226 | Error: Failed to open source file
2 | Reporter | 2011-03-31 11:00:38.494226 | Warning: Low disk space on volume /scratch-a
(2 rows)
Enabling all user roles
You can enable all roles available to your user account with SET ROLE ALL
:
=> SET ROLE ALL;
SET
=> SHOW ENABLED_ROLES;
name | setting
---------------+------------------------------
enabled roles | logreader, logwriter
(1 row)
Disabling roles
A user can disable all roles with
SET ROLE NONE
. This statement disables all roles for the current session, excluding predefined roles:
=> SET ROLE NONE;
=> SHOW ENABLED_ROLES;
name | setting
---------------+---------
enabled roles |
(1 row)
4.2.8 - Enabling roles automatically
By default, new users are assigned the PUBLIC role, which is automatically enabled when a new session starts.
By default, new users are assigned the PUBLIC role, which is automatically enabled when a new session starts. Typically, other roles are created and users are assigned to them, but these roles are not automatically enabled. Instead, users must explicitly enable their assigned roles in each new session, with
SET ROLE
.
You can automatically enable roles for users in two ways:
Enable roles for individual users
After assigning roles to users, you can set one or more default roles for each user by modifying their profiles, with
ALTER USER...DEFAULT ROLE
. User default roles are automatically enabled at the start of the user session. You should consider setting default roles for users if they typically rely on the privileges of those roles to carry out routine tasks.
Important
ALTER USER...DEFAULT ROLE
overwrites previous default role settings.
The following example shows how to set regional_manager
as the default role for user LilyCP
:
=> \c
You are now connected as user "dbadmin".
=> GRANT regional_manager TO LilyCP;
GRANT ROLE
=> ALTER USER LilyCP DEFAULT ROLE regional_manager;
ALTER USER
=> \c - LilyCP
You are now connected as user "LilyCP".
=> SHOW ENABLED_ROLES;
name | setting
---------------+------------------
enabled roles | regional_manager
(1 row)
Enable all roles for all users
Configuration parameter EnableAllRolesOnLogin
specifies whether to enable all roles for all database users on login. By default, this parameter is set to 0. If set to 1, Vertica enables the roles of all users when they log in to the database.
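For example, mirroring the ALTER DATABASE syntax used elsewhere in this guide, you can set the parameter as follows:

```sql
-- Enable all granted roles at login for every user
=> ALTER DATABASE database_name SET EnableAllRolesOnLogin = 1;
```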
Clearing default roles
You can clear all default role assignments for a user with
ALTER USER...DEFAULT ROLE NONE
. For example:
=> ALTER USER fred DEFAULT ROLE NONE;
ALTER USER
=> SELECT user_name, default_roles, all_roles FROM users WHERE user_name = 'fred';
user_name | default_roles | all_roles
-----------+---------------+-----------
fred | | logreader
(1 row)
4.2.9 - Viewing user roles
You can obtain information about roles in three ways.
You can obtain information about roles in three ways:
Verifying role assignments
The function
HAS_ROLE
checks whether a Vertica role is granted to the specified user or role. Non-superusers can use this function to check their own role membership. Superusers can use it to determine role assignments for other users and roles. You can also use Management Console to check role assignments.
In the following example, a dbadmin
user checks whether user MikeL
is assigned the administrator
role:
=> \c
You are now connected as user "dbadmin".
=> SELECT HAS_ROLE('MikeL', 'administrator');
HAS_ROLE
----------
t
(1 row)
User MikeL
checks whether he has the regional_manager
role:
=> \c - MikeL
You are now connected as user "MikeL".
=> SELECT HAS_ROLE('regional_manager');
HAS_ROLE
----------
f
(1 row)
The dbadmin grants the regional_manager
role to the administrator
role. On checking again, MikeL
verifies that he now has the regional_manager
role:
dbadmin=> \c
You are now connected as user "dbadmin".
dbadmin=> GRANT regional_manager to administrator;
GRANT ROLE
dbadmin=> \c - MikeL
You are now connected as user "MikeL".
dbadmin=> SELECT HAS_ROLE('regional_manager');
HAS_ROLE
----------
t
(1 row)
Viewing available and enabled roles
SHOW AVAILABLE ROLES
lists all roles granted to you:
=> SHOW AVAILABLE ROLES;
name | setting
-----------------+-----------------------------
available roles | logreader, logwriter
(1 row)
SHOW ENABLED ROLES
lists the roles enabled in your session:
=> SHOW ENABLED ROLES;
name | setting
---------------+----------
enabled roles | logreader
(1 row)
Querying system tables
You can query the system tables ROLES, USERS, and GRANTS, either separately or joined, to obtain detailed information about user roles, users assigned to those roles, and the privileges granted explicitly to users and implicitly through roles.
The following query on ROLES returns the names of all roles users can access, and the roles granted (assigned) to those roles. An asterisk (*) appended to a role indicates that the user can grant the role to other users:
=> SELECT * FROM roles;
name | assigned_roles
-----------------+----------------
public |
dbduser |
dbadmin | dbduser*
pseudosuperuser | dbadmin*
logreader |
logwriter |
logadmin | logreader, logwriter
(7 rows)
The following query on system table USERS returns all users with the DBADMIN role. An asterisk (*) appended to a role indicates that the user can grant the role to other users:
=> SELECT user_name, is_super_user, default_roles, all_roles FROM v_catalog.users WHERE all_roles ILIKE '%dbadmin%';
user_name | is_super_user | default_roles | all_roles
-----------+---------------+--------------------------------------+--------------------------------------
dbadmin | t | dbduser*, dbadmin*, pseudosuperuser* | dbduser*, dbadmin*, pseudosuperuser*
u1 | f | | dbadmin*
u2 | f | | dbadmin
(3 rows)
The following query on system table GRANTS returns the privileges granted to user Jane or role R1. An asterisk (*) appended to a privilege indicates that the user can grant the privilege to other users:
=> SELECT grantor,privileges_description,object_name,object_type,grantee FROM grants WHERE grantee='Jane' OR grantee='R1';
grantor | privileges_description | object_name | object_type | grantee
--------+------------------------+-------------+--------------+-----------
dbadmin | USAGE | general | RESOURCEPOOL | Jane
dbadmin | | R1 | ROLE | Jane
dbadmin | USAGE* | s1 | SCHEMA | Jane
dbadmin | USAGE, CREATE* | s1 | SCHEMA | R1
(4 rows)
4.3 - Database privileges
When a database object is created, such as a schema, table, or view, ownership of that object is assigned to the user who created it.
When a database object is created, such as a schema, table, or view, ownership of that object is assigned to the user who created it. By default, only the object's owner, and users with superuser privileges such as database administrators, have privileges on a new object. Only these users (and other users whom they explicitly authorize) can grant object privileges to other users.
Privileges are granted and revoked by GRANT and REVOKE statements, respectively. The privileges that can be granted on a given object are specific to its type. For example, table privileges include SELECT, INSERT, and UPDATE, while library and resource pool privileges have USAGE privileges only. For a summary of object privileges, see Database object privileges.
Because privileges on database objects can come from several different sources like explicit grants, roles, and inheritance, privileges can be difficult to monitor. Use the GET_PRIVILEGES_DESCRIPTION meta-function to check the current user's effective privileges across all sources on a specified database object.
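For example, to see the current user's effective privileges on a table (the schema and table names here are hypothetical):

```sql
=> SELECT GET_PRIVILEGES_DESCRIPTION('table', 'my_schema.my_table');
```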
4.3.1 - Ownership and implicit privileges
All users have implicit privileges on the objects that they own.
All users have implicit privileges on the objects that they own. On creating an object, its owner automatically is granted all privileges associated with the object's type (see Database object privileges). Regardless of object type, the following privileges are inseparable from ownership and cannot be revoked, not even by the owner:
-
Authority to grant all object privileges to other users, and revoke them
-
ALTER (where applicable) and DROP
-
Extension of privilege granting authority on their objects to other users, and revoking that authority
Object owners can revoke all non-implicit, or ordinary, privileges from themselves. For example, on creating a table, its owner is automatically granted all implicit and ordinary privileges:
Implicit table privileges: ALTER, DROP
Ordinary table privileges: DELETE, INSERT, REFERENCES, SELECT, TRUNCATE, UPDATE
If user Joan
creates table t1
, she can revoke ordinary privileges UPDATE and INSERT from herself, which effectively makes this table read-only:
=> \c - Joan
You are now connected as user "Joan".
=> CREATE TABLE t1 (a int);
CREATE TABLE
=> INSERT INTO t1 VALUES (1);
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> REVOKE UPDATE, INSERT ON TABLE t1 FROM Joan;
REVOKE PRIVILEGE
=> INSERT INTO t1 VALUES (3);
ERROR 4367: Permission denied for relation t1
=> SELECT * FROM t1;
a
---
1
(1 row)
Joan can subsequently restore UPDATE and INSERT privileges to herself:
=> GRANT UPDATE, INSERT on TABLE t1 TO Joan;
GRANT PRIVILEGE
=> INSERT INTO t1 VALUES (3);
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM t1;
a
---
1
3
(2 rows)
4.3.2 - Inherited privileges
You can manage inheritance of privileges at three levels.
You can manage inheritance of privileges at three levels:
By default, inherited privileges are enabled at the database level and disabled at the schema level. If privilege inheritance is enabled at both levels, newly created tables, views, and models automatically inherit their parent schema's privileges. You can also disable privilege inheritance on individual objects with the EXCLUDE SCHEMA PRIVILEGES clause of ALTER TABLE, ALTER VIEW, and ALTER MODEL.
4.3.2.1 - Enabling database inheritance
By default, inherited privileges are enabled at the database level.
By default, inherited privileges are enabled at the database level. You can toggle database-level inherited privileges with the DisableInheritedPrivileges configuration parameter.
To enable inherited privileges:
=> ALTER DATABASE database_name SET DisableInheritedPrivileges = 0;
To disable inherited privileges:
=> ALTER DATABASE database_name SET DisableInheritedPrivileges = 1;
4.3.2.2 - Enabling schema inheritance
By default, inherited privileges are disabled at the schema level.
Caution
Enabling inherited privileges with ALTER SCHEMA ... DEFAULT INCLUDE PRIVILEGES only affects newly created tables, views, and models.
This setting does not affect existing tables, views, and models.
By default, inherited privileges are disabled at the schema level. If inherited privileges are enabled at the database level, you can enable inheritance at the schema level with CREATE SCHEMA and ALTER SCHEMA. For example, the following statement creates the schema my_schema with schema inheritance enabled:
To create a schema with schema inheritance enabled:
=> CREATE SCHEMA my_schema DEFAULT INCLUDE PRIVILEGES;
To enable schema inheritance for an existing schema:
=> ALTER SCHEMA my_schema DEFAULT INCLUDE SCHEMA PRIVILEGES;
After schema-level privilege inheritance is enabled, privileges granted on the schema are automatically inherited by all newly created tables, views, and models in that schema. You can explicitly exclude a table, view, or model from privilege inheritance with the EXCLUDE SCHEMA PRIVILEGES clause of ALTER TABLE, ALTER VIEW, and ALTER MODEL.
For example, to prevent my_table from inheriting the privileges of my_schema:
=> ALTER TABLE my_schema.my_table EXCLUDE SCHEMA PRIVILEGES;
For information about which objects inherit privileges from which schemas, see INHERITING_OBJECTS.
For information about which privileges each object inherits, see INHERITED_PRIVILEGES.
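For example, you can query these system tables directly to audit inheritance; the exact output columns vary by version:

```sql
-- Objects currently inheriting privileges from their parent schema
=> SELECT * FROM inheriting_objects;
-- The specific privileges each object inherits
=> SELECT * FROM inherited_privileges;
```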
Note
If inherited privileges are disabled for the database, enabling inheritance on its schemas has no effect. Attempts to do so return the following message:
Inherited privileges are globally disabled; schema parameter is set but has no effect.
Schema inheritance for existing objects
Enabling schema inheritance on an existing schema only affects newly created tables, views, and models in that schema. To allow existing objects to inherit privileges from their parent schema, you must explicitly set schema inheritance on each object with ALTER TABLE, ALTER VIEW, or ALTER MODEL.
For example, my_schema contains my_table, my_view, and my_model. Enabling schema inheritance on my_schema does not affect the privileges of my_table and my_view. The following statements explicitly set schema inheritance on these objects:
=> ALTER VIEW my_schema.my_view INCLUDE SCHEMA PRIVILEGES;
=> ALTER TABLE my_schema.my_table INCLUDE SCHEMA PRIVILEGES;
=> ALTER MODEL my_schema.my_model INCLUDE SCHEMA PRIVILEGES;
After enabling inherited privileges on a schema, you can grant privileges on it to users and roles with GRANT (schema). The specified user or role then implicitly has these same privileges on the objects in the schema:
=> GRANT USAGE, CREATE, SELECT, INSERT ON SCHEMA my_schema TO PUBLIC;
GRANT PRIVILEGE
See also
4.3.2.3 - Setting privilege inheritance on tables and views
If inherited privileges are enabled for the database and a schema, privileges granted to the schema are automatically granted to all new tables and views in it.
Caution
Enabling inherited privileges with ALTER SCHEMA ... DEFAULT INCLUDE PRIVILEGES only affects newly created tables, views, and models.
This setting does not affect existing tables, views, and models.
If inherited privileges are enabled for the database and a schema, privileges granted to the schema are automatically granted to all new tables and views in it. You can also explicitly exclude tables and views from inheriting schema privileges.
For information about which tables and views inherit privileges from which schemas, see INHERITING_OBJECTS.
For information about which privileges each table or view inherits, see INHERITED_PRIVILEGES.
Set privileges inheritance on tables and views
CREATE TABLE/ALTER TABLE and CREATE VIEW/ALTER VIEW can allow tables and views to inherit privileges from their parent schemas. For example, the following statements enable inheritance on schema s1, so new table s1.t1 and view s1.myview automatically inherit the privileges set on that schema as applicable:
=> CREATE SCHEMA s1 DEFAULT INCLUDE PRIVILEGES;
CREATE SCHEMA
=> GRANT USAGE, CREATE, SELECT, INSERT ON SCHEMA S1 TO PUBLIC;
GRANT PRIVILEGE
=> CREATE TABLE s1.t1 ( ID int, f_name varchar(16), l_name varchar(24));
WARNING 6978: Table "t1" will include privileges from schema "s1"
CREATE TABLE
=> CREATE VIEW s1.myview AS SELECT ID, l_name FROM s1.t1;
WARNING 6978: View "myview" will include privileges from schema "s1"
CREATE VIEW
Note
Both CREATE statements omit the clause INCLUDE SCHEMA PRIVILEGES, so they return a warning message that the new objects will inherit schema privileges. CREATE statements that include this clause do not return a warning message.
If the schema already exists, you can use ALTER SCHEMA to have all newly created tables and views inherit the privileges of the schema. Tables and views created on the schema before this statement, however, are not affected:
=> CREATE SCHEMA s2;
CREATE SCHEMA
=> CREATE TABLE s2.t22 ( a int );
CREATE TABLE
...
=> ALTER SCHEMA S2 DEFAULT INCLUDE PRIVILEGES;
ALTER SCHEMA
In this case, inherited privileges were enabled on schema s2 after it already contained table s2.t22. To set inheritance on this table and other existing tables and views, you must explicitly set schema inheritance on them with ALTER TABLE and ALTER VIEW:
=> ALTER TABLE s2.t22 INCLUDE SCHEMA PRIVILEGES;
Exclude privileges inheritance from tables and views
You can use CREATE TABLE/ALTER TABLE and CREATE VIEW/ALTER VIEW to prevent tables and views from inheriting schema privileges.
The following example shows how to create a table that does not inherit schema privileges:
=> CREATE TABLE s1.t1 ( x int) EXCLUDE SCHEMA PRIVILEGES;
You can modify an existing table so it does not inherit schema privileges:
=> ALTER TABLE s1.t1 EXCLUDE SCHEMA PRIVILEGES;
4.3.2.4 - Example usage: implementing inherited privileges
The following steps show how user Joe enables inheritance of privileges on a given schema so other users can access tables in that schema.
The following steps show how user Joe enables inheritance of privileges on a given schema so other users can access tables in that schema.
- Joe creates schema schema1, and creates table table1 in it:
=>\c - Joe
You are now connected as user Joe
=> CREATE SCHEMA schema1;
CREATE SCHEMA
=> CREATE TABLE schema1.table1 (id int);
CREATE TABLE
- Joe grants USAGE and CREATE privileges on schema1 to Myra:
=> GRANT USAGE, CREATE ON SCHEMA schema1 to Myra;
GRANT PRIVILEGE
- Myra queries schema1.table1, but the query fails:
=>\c - Myra
You are now connected as user Myra
=> SELECT * FROM schema1.table1;
ERROR 4367: Permission denied for relation table1
- Joe grants Myra SELECT ON SCHEMA privileges on schema1:
=>\c - Joe
You are now connected as user Joe
=> GRANT SELECT ON SCHEMA schema1 to Myra;
GRANT PRIVILEGE
- Joe uses ALTER TABLE to include SCHEMA privileges for table1:
=> ALTER TABLE schema1.table1 INCLUDE SCHEMA PRIVILEGES;
ALTER TABLE
- Myra's query now succeeds:
=>\c - Myra
You are now connected as user Myra
=> SELECT * FROM schema1.table1;
id
---
(0 rows)
- Joe modifies schema1 to include privileges so all tables created in schema1 inherit schema privileges:
=>\c - Joe
You are now connected as user Joe
=> ALTER SCHEMA schema1 DEFAULT INCLUDE PRIVILEGES;
ALTER SCHEMA
=> CREATE TABLE schema1.table2 (id int);
CREATE TABLE
- With inherited privileges enabled, Myra can query table2 without Joe having to explicitly grant privileges on the table:
=>\c - Myra
You are now connected as user Myra
=> SELECT * FROM schema1.table2;
id
---
(0 rows)
4.3.3 - Default user privileges
To set the minimum level of privilege for all users, Vertica has the special PUBLIC role, which it grants to each user automatically.
To set the minimum level of privilege for all users, Vertica has the special PUBLIC role, which it grants to each user automatically. This role is automatically enabled, but the database administrator or a superuser can also grant higher privileges to users separately using GRANT statements.
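Because every user holds the PUBLIC role, granting a privilege to PUBLIC extends it to all current and future users. A minimal sketch, assuming a table public.regional_sales exists (the table name is illustrative):

```sql
=> GRANT SELECT ON TABLE public.regional_sales TO PUBLIC;
GRANT PRIVILEGE
```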
Default privileges for MC users
Privileges on Management Console (MC) are managed through roles, which determine a user's access to MC and to MC-managed Vertica databases through the MC interface. MC privileges do not alter or override Vertica privileges or roles. See Users, roles, and privileges in MC for details.
4.3.4 - Effective privileges
A user's effective privileges on an object encompass privileges of all types, including:
A user's effective privileges on an object encompass privileges of all types, including:
You can view your effective privileges on an object with the GET_PRIVILEGES_DESCRIPTION meta-function.
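For example, a user can check their own effective privileges on a table. The table name here (public.applog) is only an illustration, and the output depends on your grants and enabled roles:

```sql
=> SELECT GET_PRIVILEGES_DESCRIPTION('table', 'public.applog');
```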
4.3.5 - Privileges required for common database operations
This topic lists the required privileges for database objects in Vertica.
This topic lists the required privileges for database objects in Vertica.
Unless otherwise noted, superusers can perform all operations shown in the following tables. Object owners always can perform operations on their own objects.
Schemas
The PUBLIC schema is present in any newly-created Vertica database. Newly-created users must be granted access to this schema:
=> GRANT USAGE ON SCHEMA public TO user;
A database superuser must also explicitly grant new users CREATE privileges, as well as grant them individual object privileges so the new users can create or look up objects in the PUBLIC schema.
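For example, the following sketch (the user name new_user is illustrative) combines both grants so the user can look up and create objects in the PUBLIC schema:

```sql
=> GRANT USAGE, CREATE ON SCHEMA public TO new_user;
GRANT PRIVILEGE
```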
Tables
Operation | Required Privileges
CREATE TABLE | Schema: CREATE (Note: referencing sequences in the CREATE TABLE statement also requires USAGE on the sequence schema and SELECT on the sequence)
DROP TABLE | Schema: USAGE or schema owner
TRUNCATE TABLE | Schema: USAGE or schema owner
ALTER TABLE ADD/DROP/RENAME/ALTER-TYPE COLUMN | Schema: USAGE
ALTER TABLE ADD/DROP CONSTRAINT | Schema: USAGE
ALTER TABLE PARTITION (REORGANIZE) | Schema: USAGE
ALTER TABLE RENAME | USAGE and CREATE privilege on the schema that contains the table
ALTER TABLE...SET SCHEMA | New schema: CREATE; old schema: USAGE
SELECT |
INSERT | Table: INSERT; schema: USAGE
DELETE |
UPDATE |
REFERENCES |
ANALYZE_STATISTICS / ANALYZE_STATISTICS_PARTITION |
DROP_STATISTICS |
DROP_PARTITIONS | Schema: USAGE
Views
Projections
Operation | Required Privileges
CREATE PROJECTION | (Note: if a projection is implicitly created with the table, no additional privilege is needed other than privileges for table creation.)
AUTO/DELAYED PROJECTION | On projections created during INSERT...SELECT or COPY operations: schema: USAGE; anchor table: SELECT
ALTER PROJECTION | Schema: USAGE and CREATE
DROP PROJECTION | Schema: USAGE or owner
External procedures
Stored procedures
Triggers
Schedules
Libraries
User-defined functions
Note
The following table uses these abbreviations:
- UDF = Scalar
- UDT = Transform
- UDAnF = Analytic
- UDAF = Aggregate
Sequences
Resource pools
Users/profiles/roles
Object visibility
You can use vsql \d meta-commands, alone or in combination with SQL system tables, to view the objects that you have privileges to view:
- Use \dn to view schema names and owners.
- Use \dt to view all tables in the database, as well as the system table V_CATALOG.TABLES.
- Use \dj to view projections showing the schema, projection name, owner, and node, as well as the system table V_CATALOG.PROJECTIONS.
Operation | Required Privileges
Look up schema | Schema: at least one privilege
Look up object in schema or in system tables |
Look up projection | All anchor tables: at least one privilege; schema (all anchor tables): USAGE
Look up resource pool | Resource pool: SELECT
Existence of object | Schema: USAGE
I/O operations
Operation | Required Privileges
COMMENT ON { is one of }: | Object owner or superuser
Transactions
Sessions
Operation | Required Privileges
SET { is one of }: | None
SHOW { name | ALL } | None
Tuning operations
Operation | Required Privileges
PROFILE | Same privileges required to run the query being profiled
EXPLAIN | Same privileges required to run the query for which you use the EXPLAIN keyword
TLS configuration
Cryptographic key
Certificate
4.3.6 - Database object privileges
Privileges can be granted explicitly on most user-visible objects in a Vertica database, such as tables and models.
Privileges can be granted explicitly on most user-visible objects in a Vertica database, such as tables and models. For some objects such as projections, privileges are implicitly derived from other objects.
Explicitly granted privileges
The following table provides an overview of privileges that can be explicitly granted on Vertica database objects:
Implicitly granted privileges
Superusers have unrestricted access to all non-cryptographic database metadata. For non-superusers, access to the metadata of specific objects depends on their privileges on those objects:
Metadata | User access
Catalog objects (tables, columns, constraints, sequences, external procedures, projections, ROS containers) | Users must have USAGE privilege on the schema and any type of access (SELECT) or modify privilege on the object to see catalog metadata about the object. For internal objects such as projections and ROS containers, which have no access privileges directly associated with them, you must have the requisite privileges on the associated schema and tables to view their metadata. For example, to determine whether a table has any projection data, you must have USAGE on the table schema and SELECT on the table.
User sessions and functions, and system tables related to these sessions | Non-superusers can access information about their own (current) sessions only, using the following functions:
Projection privileges
Projections, which store table data, do not have an owner or privileges directly associated with them. Instead, the privileges to create, access, or alter a projection are derived from the privileges that are set on its anchor tables and respective schemas.
Cryptographic privileges
Unless they have ownership, superusers only have implicit DROP privileges on keys, certificates, and TLS Configurations. This allows superusers to see the existence of these objects in their respective system tables (CRYPTOGRAPHIC_KEYS, CERTIFICATES, and TLS_CONFIGURATIONS) and DROP them, but does not allow them to see the key or certificate texts.
For details on granting additional privileges, see GRANT (key) and GRANT (TLS configuration).
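For example, assuming the predefined server TLS Configuration, its owner could grant the superuser USAGE so the superuser can view the associated certificate text. This is a sketch, not output from a live system:

```sql
=> GRANT USAGE ON TLS CONFIGURATION server TO dbadmin;
GRANT PRIVILEGE
```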
4.3.7 - Granting and revoking privileges
Vertica supports GRANT and REVOKE statements to control user access to database objects—for example, GRANT (Schema) and REVOKE (Schema), GRANT (Table) and REVOKE (Table), and so on.
Vertica supports GRANT and REVOKE statements to control user access to database objects—for example, GRANT (schema) and REVOKE (schema), GRANT (table) and REVOKE (table), and so on. Typically, a superuser creates users and roles shortly after creating the database, and then uses GRANT statements to assign them privileges.
Where applicable, GRANT statements require USAGE privileges on the object schema. The following users can grant and revoke privileges:
- Superusers: all privileges on all database objects, including the database itself
- Non-superusers: all privileges on objects that they own
- Grantees of privileges that include WITH GRANT OPTION: the same privileges on that object
In the following example, a dbadmin (with superuser privileges) creates user Carol. Subsequent GRANT statements grant Carol schema and table privileges:
- CREATE and USAGE privileges on schema PUBLIC
- SELECT, INSERT, and UPDATE privileges on table public.applog. This GRANT statement also includes WITH GRANT OPTION, which enables Carol to grant the same privileges on this table to other users—in this case, SELECT privileges to user Tom:
=> CREATE USER Carol;
CREATE USER
=> GRANT CREATE, USAGE ON SCHEMA PUBLIC to Carol;
GRANT PRIVILEGE
=> GRANT SELECT, INSERT, UPDATE ON TABLE public.applog TO Carol WITH GRANT OPTION;
GRANT PRIVILEGE
=> GRANT SELECT ON TABLE public.applog TO Tom;
GRANT PRIVILEGE
4.3.7.1 - Superuser privileges
A Vertica superuser is a database user—by default, named dbadmin—that is automatically created on installation.
A Vertica superuser is a database user—by default, named dbadmin—that is automatically created on installation. Vertica superusers have complete and irrevocable authority over database users, privileges, and roles.
Important
Vertica superusers are not the same as Linux superusers with root privileges.
Superusers can change the privileges of any user and role, as well as override any privileges that are granted by users with the PSEUDOSUPERUSER role. They can also grant and revoke privileges on any user-owned object, and reassign object ownership.
Note
A superuser always changes a user's privileges on an object on behalf of the object owner. Thus, the grantor setting in system table V_CATALOG.GRANTS always shows the object owner rather than the superuser who issued the GRANT statement.
Cryptographic privileges
For most catalog objects, superusers have all possible privileges. However, for keys, certificates, and TLS Configurations superusers only get DROP privileges by default and must be granted the other privileges by their owners. For details, see GRANT (key) and GRANT (TLS configuration).
Superusers can see the existence of all keys, certificates, and TLS Configurations, but they cannot see the text of keys or certificates unless they are granted USAGE privileges.
See also
DBADMIN
4.3.7.2 - Schema owner privileges
The schema owner is typically the user who creates the schema.
The schema owner is typically the user who creates the schema. By default, the schema owner has privileges to create objects within a schema. The owner can also alter the schema: reassign ownership, rename it, and enable or disable inheritance of schema privileges.
Schema ownership does not necessarily grant the owner access to objects in that schema. Access to objects depends on the privileges that are granted on them.
All other users and roles must be explicitly granted access to a schema by its owner or a superuser.
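For example, a superuser or the current owner can reassign schema ownership with ALTER SCHEMA; the names s1 and Alice here are illustrative:

```sql
=> ALTER SCHEMA s1 OWNER TO Alice;
ALTER SCHEMA
```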
4.3.7.3 - Object owner privileges
The database, along with every object in it, has an owner.
The database, along with every object in it, has an owner. The object owner is usually the person who created the object, although a superuser can alter ownership of objects, such as table and sequence.
Object owners with the appropriate schema privileges can access, alter, rename, move, or drop any object they own without any additional privileges.
An object owner can also:
- Grant privileges on their own object to other users. The WITH GRANT OPTION clause specifies that a user can grant the permission to other users. For example, if user Bob creates a table, Bob can grant privileges on that table to users Ted, Alice, and so on.
- Grant privileges to roles. Users who are granted the role gain the privilege.
4.3.7.4 - Granting privileges
As described in Granting and Revoking Privileges, specific users grant privileges using the GRANT statement with or without the optional WITH GRANT OPTION, which allows the user to grant the same privileges to other users.
As described in Granting and revoking privileges, specific users grant privileges using the GRANT statement with or without the optional WITH GRANT OPTION, which allows the user to grant the same privileges to other users.
- A superuser can grant privileges on all object types to other users.
- A superuser or object owner can grant privileges to roles. Users who have been granted the role then gain the privilege.
- An object owner can grant privileges on the object to other users using the optional WITH GRANT OPTION clause.
- The user must have USAGE privilege on the schema and appropriate privileges on the object.
When a user grants an explicit list of privileges, such as GRANT INSERT, DELETE, REFERENCES ON applog TO Bob:
- The GRANT statement succeeds only if all privileges are granted successfully. If any grant operation fails, the entire statement rolls back.
- Vertica returns an error if the user does not have grant options for the privileges listed.
When a user grants ALL privileges, such as GRANT ALL ON applog TO Bob, the statement always succeeds. Vertica grants all the privileges on which the grantor has the WITH GRANT OPTION and skips those privileges without the optional WITH GRANT OPTION.
For example, if the grantor has only DELETE privileges with the grant option on the applog table, GRANT ALL grants only DELETE privileges to Bob, and the statement succeeds:
=> GRANT DELETE ON applog TO Bob WITH GRANT OPTION;
GRANT PRIVILEGE
For details, see the GRANT statements.
4.3.7.5 - Revoking privileges
The following non-superusers can revoke privileges on an object:
The following non-superusers can revoke privileges on an object:
The user also must have USAGE privilege on the object's schema.
For example, the following query on system table V_CATALOG.GRANTS shows that users u1, u2, and u3 have the following privileges on schema s1 and table s1.t1:
=> SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
WHERE object_name IN ('s1', 't1') AND grantee IN ('u1', 'u2', 'u3');
object_type | object_name | grantee | grantor | privileges_description
-------------+-------------+---------+---------+---------------------------
SCHEMA | s1 | u1 | dbadmin | USAGE, CREATE
SCHEMA | s1 | u2 | dbadmin | USAGE, CREATE
SCHEMA | s1 | u3 | dbadmin | USAGE
TABLE | t1 | u1 | dbadmin | INSERT*, SELECT*, UPDATE*
TABLE | t1 | u2 | u1 | INSERT*, SELECT*, UPDATE*
TABLE | t1 | u3 | u2 | SELECT*
(6 rows)
Note
The asterisks (*) on privileges under privileges_description indicate that the grantee can grant these privileges to other users.
In the following statements, u2 revokes the SELECT privileges that it granted on s1.t1 to u3. Subsequent attempts by u3 to query this table return an error:
=> \c - u2
You are now connected as user "u2".
=> REVOKE SELECT ON s1.t1 FROM u3;
REVOKE PRIVILEGE
=> \c - u3
You are now connected as user "u3".
=> SELECT * FROM s1.t1;
ERROR 4367: Permission denied for relation t1
Revoking grant option
If you revoke privileges on an object from a user, that user can no longer act as grantor of those same privileges to other users. If that user previously granted the revoked privileges to other users, the REVOKE statement must include the CASCADE option to revoke the privilege from those users too; otherwise, it returns with an error.
For example, user u2 can grant SELECT, INSERT, and UPDATE privileges, and grants those privileges to user u4:
=> \c - u2
You are now connected as user "u2".
=> GRANT SELECT, INSERT, UPDATE on TABLE s1.t1 to u4;
GRANT PRIVILEGE
If you query V_CATALOG.GRANTS for privileges on table s1.t1, it returns the following result set:
=> \c
You are now connected as user "dbadmin".
=> SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
WHERE object_name IN ('t1') ORDER BY grantee;
object_type | object_name | grantee | grantor | privileges_description
-------------+-------------+---------+---------+------------------------------------------------------------
TABLE | t1 | dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES*, TRUNCATE*
TABLE | t1 | u1 | dbadmin | INSERT*, SELECT*, UPDATE*
TABLE | t1 | u2 | u1 | INSERT*, SELECT*, UPDATE*
TABLE | t1 | u4 | u2 | INSERT, SELECT, UPDATE
(4 rows)
Now, if user u1 wants to revoke UPDATE privileges from user u2, the revoke operation must cascade to user u4, who also has UPDATE privileges that were granted by u2; otherwise, the REVOKE statement returns with an error:
=> \c - u1
=> REVOKE update ON TABLE s1.t1 FROM u2;
ROLLBACK 3052: Dependent privileges exist
HINT: Use CASCADE to revoke them too
=> REVOKE update ON TABLE s1.t1 FROM u2 CASCADE;
REVOKE PRIVILEGE
=> \c
You are now connected as user "dbadmin".
=> SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
WHERE object_name IN ('t1') ORDER BY grantee;
object_type | object_name | grantee | grantor | privileges_description
-------------+-------------+---------+---------+------------------------------------------------------------
TABLE | t1 | dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES*, TRUNCATE*
TABLE | t1 | u1 | dbadmin | INSERT*, SELECT*, UPDATE*
TABLE | t1 | u2 | u1 | INSERT*, SELECT*
TABLE | t1 | u4 | u2 | INSERT, SELECT
(4 rows)
You can also revoke a user's ability to act as grantor of a privilege without revoking the privilege itself. For example, user u1 can prevent user u2 from granting INSERT privileges to other users, but allow user u2 to retain that privilege:
=> \c - u1
You are now connected as user "u1".
=> REVOKE GRANT OPTION FOR INSERT ON TABLE s1.t1 FROM U2 CASCADE;
REVOKE PRIVILEGE
Note
The REVOKE statement must include CASCADE, because user u2 previously granted user u4 INSERT privileges on table s1.t1. When you revoke u2's ability to grant this privilege, that privilege must be removed from any of its grantees—in this case, user u4.
You can confirm the results of the revoke operation by querying V_CATALOG.GRANTS for privileges on table s1.t1:
=> \c
You are now connected as user "dbadmin".
=> SELECT object_type, object_name, grantee, grantor, privileges_description FROM v_catalog.grants
WHERE object_name IN ('t1') ORDER BY grantee;
object_type | object_name | grantee | grantor | privileges_description
-------------+-------------+---------+---------+------------------------------------------------------------
TABLE | t1 | dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES*, TRUNCATE*
TABLE | t1 | u1 | dbadmin | INSERT*, SELECT*, UPDATE*
TABLE | t1 | u2 | u1 | INSERT, SELECT*
TABLE | t1 | u4 | u2 | SELECT
(4 rows)
The query results show:
- User u2 retains INSERT privileges on the table but can no longer grant INSERT privileges to other users (as indicated by the absence of an asterisk).
- The revoke operation cascaded down to grantee u4, who now lacks INSERT privileges.
See also
REVOKE (table)
4.3.7.6 - Privilege ownership chains
The ability to revoke privileges on objects can cascade throughout an organization.
The ability to revoke privileges on objects can cascade throughout an organization. If the grant option was revoked from a user, the privilege that this user granted to other users will also be revoked.
If a privilege was granted to a user or role by multiple grantors, then to completely revoke this privilege from the grantee, each original grantor must revoke it. The only exception: a superuser can revoke privileges granted by an object owner, and vice versa.
In the following example, the SELECT privilege on table t1 is granted through a chain of users, from a superuser through User3.
- A superuser grants User1 CREATE privileges on the schema s1:
=> \c - dbadmin
You are now connected as user "dbadmin".
=> CREATE USER User1;
CREATE USER
=> CREATE USER User2;
CREATE USER
=> CREATE USER User3;
CREATE USER
=> CREATE SCHEMA s1;
CREATE SCHEMA
=> GRANT USAGE on SCHEMA s1 TO User1, User2, User3;
GRANT PRIVILEGE
=> CREATE ROLE reviewer;
CREATE ROLE
=> GRANT CREATE ON SCHEMA s1 TO User1;
GRANT PRIVILEGE
- User1 creates new table t1 within schema s1 and then grants SELECT WITH GRANT OPTION privilege on s1.t1 to User2:
=> \c - User1
You are now connected as user "User1".
=> CREATE TABLE s1.t1(id int, sourceID VARCHAR(8));
CREATE TABLE
=> GRANT SELECT on s1.t1 to User2 WITH GRANT OPTION;
GRANT PRIVILEGE
- User2 grants SELECT WITH GRANT OPTION privilege on s1.t1 to User3:
=> \c - User2
You are now connected as user "User2".
=> GRANT SELECT on s1.t1 to User3 WITH GRANT OPTION;
GRANT PRIVILEGE
- User3 grants SELECT privilege on s1.t1 to the reviewer role:
=> \c - User3
You are now connected as user "User3".
=> GRANT SELECT on s1.t1 to reviewer;
GRANT PRIVILEGE
Users cannot revoke privileges upstream in the chain. For example, User2 did not grant privileges to User1, so when User2 runs the following REVOKE command, Vertica rolls back the command:
=> \c - User2
You are now connected as user "User2".
=> REVOKE CREATE ON SCHEMA s1 FROM User1;
ROLLBACK 0: "CREATE" privilege(s) for schema "s1" could not be revoked from "User1"
Users can revoke privileges indirectly from users who received privileges through a cascading chain, like the one shown in the example above. The CASCADE option revokes privileges from all users downstream in the chain. For example, a superuser or User1 can execute the following statement to revoke the SELECT privilege on table s1.t1 from all users and roles within the chain:
=> \c - User1
You are now connected as user "User1".
=> REVOKE SELECT ON s1.t1 FROM User2 CASCADE;
REVOKE PRIVILEGE
When a superuser or User1 executes the above statement, the SELECT privilege on table s1.t1 is revoked from User2, User3, and the reviewer role. The GRANT privilege is also revoked from User2 and User3, which a superuser can verify by querying the V_CATALOG.GRANTS system table.
=> SELECT * FROM grants WHERE object_name = 's1' AND grantee ILIKE 'User%';
grantor | privileges_description | object_schema | object_name | grantee
---------+------------------------+---------------+-------------+---------
dbadmin | USAGE | | s1 | User1
dbadmin | USAGE | | s1 | User2
dbadmin | USAGE | | s1 | User3
(3 rows)
4.3.8 - Modifying privileges
A superuser or object owner can use one of the ALTER statements to modify a privilege, such as changing a sequence owner or table owner.
A superuser or object owner can use one of the ALTER statements to modify a privilege, such as changing a sequence owner or table owner. Reassignment to the new owner does not transfer grants from the original owner to the new owner; grants made by the original owner are dropped.
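For example, a superuser can use ALTER TABLE to reassign table ownership; grants made by the original owner on the table are then dropped. The names here are illustrative:

```sql
=> ALTER TABLE public.applog OWNER TO Alice;
ALTER TABLE
```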
4.3.9 - Viewing privileges granted on objects
You can view information about privileges, grantors, grantees, and objects by querying these system tables:
You can view information about privileges, grantors, grantees, and objects by querying these system tables:
An asterisk (*) appended to a privilege indicates that the user can grant the privilege to other users.
You can also view the effective privileges on a specified database object by using the GET_PRIVILEGES_DESCRIPTION meta-function.
Viewing explicitly granted privileges
To view explicitly granted privileges on objects, query the GRANTS table.
The following query returns the explicit privileges for the schema, myschema.
=> SELECT grantee, privileges_description FROM grants WHERE object_name='myschema';
grantee | privileges_description
---------+------------------------
Bob | USAGE, CREATE
Alice | CREATE
(2 rows)
Viewing inherited privileges
To view which tables and views inherit privileges from which schemas, query the INHERITING_OBJECTS table.
The following query returns the tables and views that inherit their privileges from their parent schema, customers.
=> SELECT * FROM inheriting_objects WHERE object_schema='customers';
object_id | schema_id | object_schema | object_name | object_type
-------------------+-------------------+---------------+---------------+-------------
45035996273980908 | 45035996273980902 | customers | cust_info | table
45035996273980984 | 45035996273980902 | customers | shipping_info | table
45035996273980980 | 45035996273980902 | customers | cust_set | view
(3 rows)
To view the specific privileges inherited by tables and views and information on their associated grant statements, query the INHERITED_PRIVILEGES table.
The following query returns the privileges that the tables and views inherit from their parent schema, customers.
=> SELECT object_schema,object_name,object_type,privileges_description,principal,grantor FROM inherited_privileges WHERE object_schema='customers';
object_schema | object_name | object_type | privileges_description | principal | grantor
---------------+---------------+-------------+---------------------------------------------------------------------------+-----------+---------
customers | cust_info | Table | INSERT*, SELECT*, UPDATE*, DELETE*, ALTER*, REFERENCES*, DROP*, TRUNCATE* | dbadmin | dbadmin
customers | shipping_info | Table | INSERT*, SELECT*, UPDATE*, DELETE*, ALTER*, REFERENCES*, DROP*, TRUNCATE* | dbadmin | dbadmin
customers | cust_set | View | SELECT*, ALTER*, DROP* | dbadmin | dbadmin
customers | cust_info | Table | SELECT | Val | dbadmin
customers | shipping_info | Table | SELECT | Val | dbadmin
customers | cust_set | View | SELECT | Val | dbadmin
customers | cust_info | Table | INSERT | Pooja | dbadmin
customers | shipping_info | Table | INSERT | Pooja | dbadmin
(8 rows)
Viewing effective privileges on an object
To view the current user's effective privileges on a specified database object, use the GET_PRIVILEGES_DESCRIPTION meta-function.
In the following example, user Glenn has set the REPORTER role and wants to check his effective privileges on schema s1 and table s1.articles.
- Table s1.articles inherits privileges from its schema (s1).
- The REPORTER role has the following privileges:
- User Glenn has the following privileges:
GET_PRIVILEGES_DESCRIPTION returns the following effective privileges for Glenn on schema s1:
=> SELECT GET_PRIVILEGES_DESCRIPTION('schema', 's1');
GET_PRIVILEGES_DESCRIPTION
--------------------------------
SELECT, UPDATE, USAGE
(1 row)
GET_PRIVILEGES_DESCRIPTION returns the following effective privileges for Glenn on table s1.articles:
=> SELECT GET_PRIVILEGES_DESCRIPTION('table', 's1.articles');
GET_PRIVILEGES_DESCRIPTION
--------------------------------
INSERT*, SELECT, UPDATE, DELETE
(1 row)
See also
4.4 - Access policies
CREATE ACCESS POLICY lets you create access policies on tables that specify how much data certain users and roles can query from those tables.
CREATE ACCESS POLICY lets you create access policies on tables that specify how much data certain users and roles can query from those tables. Access policies typically prevent these users from viewing the data of specific columns and rows of a table. You can apply access policies to table columns and rows. If a table has access policies on both, Vertica filters row access policies first, then filters the column access policies.
You can create most access policies for any table type—columnar, external, or flex. (You cannot create column access policies on flex tables.) You can also create access policies on any column type, including joins.
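A row access policy takes a WHERE expression that determines which rows each user can see. The following sketch (table, column, and role names are all illustrative) lets users with the manager role see every row, and restricts other users to rows whose account_rep matches their user name:

```sql
=> CREATE ACCESS POLICY ON public.store_sales FOR ROWS
   WHERE ENABLED_ROLE('manager')
      OR account_rep = CURRENT_USER()
   ENABLE;
CREATE ACCESS POLICY
```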
4.4.1 - Creating column access policies
CREATE ACCESS POLICY can create access policies on individual table columns, one policy per column.
CREATE ACCESS POLICY can create access policies on individual table columns, one policy per column. Each column access policy lets you specify, for different users and roles, various levels of access to the data of that column. The column access expression can also specify how to render column data for users and roles.
The following example creates an access policy on the customer_address column in the customer_dimension table. This access policy gives non-superusers with the administrator role full access to all data in that column, but masks customer address data from all other users:
=> CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address
-> CASE
-> WHEN ENABLED_ROLE('administrator') THEN customer_address
-> ELSE '**************'
-> END ENABLE;
CREATE ACCESS POLICY
Note
Vertica roles are compatible with LDAP users. You do not need separate LDAP roles to use column access policies with LDAP users.
Vertica uses this policy to determine the access it gives to users MaxineT and MikeL, who are assigned the employee and administrator roles, respectively. When these users query the customer_dimension table, Vertica applies the column access policy expression as follows:
=> \c - MaxineT;
You are now connected as user "MaxineT".
=> SET ROLE employee;
SET
=> SELECT customer_type, customer_name, customer_gender, customer_address, customer_city FROM customer_dimension;
customer_type | customer_name | customer_gender | customer_address | customer_city
---------------+-------------------------+-----------------+------------------+------------------
Individual | Craig S. Robinson | Male | ************** | Fayetteville
Individual | Mark M. Kramer | Male | ************** | Joliet
Individual | Barbara S. Farmer | Female | ************** | Alexandria
Individual | Julie S. McNulty | Female | ************** | Grand Prairie
...
=> \c - MikeL
You are now connected as user "MikeL".
=> SET ROLE administrator;
SET
=> SELECT customer_type, customer_name, customer_gender, customer_address, customer_city FROM customer_dimension;
customer_type | customer_name | customer_gender | customer_address | customer_city
---------------+-------------------------+-----------------+------------------+------------------
Individual | Craig S. Robinson | Male | 138 Alden Ave | Fayetteville
Individual | Mark M. Kramer | Male | 311 Green St | Joliet
Individual | Barbara S. Farmer | Female | 256 Cherry St | Alexandria
Individual | Julie S. McNulty | Female | 459 Essex St | Grand Prairie
...
Restrictions
The following limitations apply to access policies:
-
A column can have only one access policy.
-
Column access policies cannot be set on columns of complex types other than native arrays.
-
Column access policies cannot be set for materialized columns on flex tables. While it is possible to set an access policy for the __raw__ column, doing so restricts access to the whole table.
-
Row access policies are invalid on temporary tables and tables with aggregate projections.
-
Access policy expressions cannot contain:
-
If the query optimizer cannot replace a deterministic expression that involves only constants with their computed values, it blocks all DML operations such as INSERT.
4.4.2 - Creating row access policies
CREATE ACCESS POLICY can create a single row access policy for a given table. This policy lets you specify for different users and roles various levels of access to table row data. When a user launches a query, Vertica evaluates the access policy's WHERE expression against all table rows. The query returns with only those rows where the expression evaluates to true for the current user or role.
For example, you might want to specify different levels of access to table store.store_sales_fact for four roles:
-
employee: Users with this role should only access sales records that identify them as the employee, in column employee_key. The following query shows how many sales records (in store.store_sales_fact) are associated with each user (in public.emp_dimension):
=> SELECT COUNT(sf.employee_key) AS 'Total Sales', sf.employee_key, ed.user_name FROM store.store_sales_fact sf
JOIN emp_dimension ed ON sf.employee_key=ed.employee_key
   WHERE ed.job_title='Sales Associate' GROUP BY sf.employee_key, ed.user_name ORDER BY sf.employee_key;
Total Sales | employee_key | user_name
-------------+--------------+-------------
533 | 111 | LucasLC
442 | 124 | JohnSN
487 | 127 | SamNS
477 | 132 | MeghanMD
545 | 140 | HaroldON
...
563 | 1991 | MidoriMG
367 | 1993 | ThomZM
(318 rows)
-
regional_manager: Users with this role (public.emp_dimension) should only access sales records for the sales region that they manage (store.store_dimension):
=> SELECT distinct sd.store_region, ed.user_name, ed.employee_key, ed.job_title FROM store.store_dimension sd
JOIN emp_dimension ed ON sd.store_region=ed.employee_region WHERE ed.job_title = 'Regional Manager';
store_region | user_name | employee_key | job_title
--------------+-----------+--------------+------------------
West | JamesGD | 1070 | Regional Manager
South | SharonDM | 1710 | Regional Manager
East | BenOV | 593 | Regional Manager
MidWest | LilyCP | 611 | Regional Manager
NorthWest | CarlaTG | 1058 | Regional Manager
SouthWest | MarcusNK | 150 | Regional Manager
(6 rows)
-
dbadmin and administrator: Users with these roles have unlimited access to all table data.
Given these users and the data associated with them, you can create a row access policy on store.store_sales_fact that looks like this:
CREATE ACCESS POLICY ON store.store_sales_fact FOR ROWS WHERE
(ENABLED_ROLE('employee')) AND (store.store_sales_fact.employee_key IN
(SELECT employee_key FROM public.emp_dimension WHERE user_name=CURRENT_USER()))
OR
(ENABLED_ROLE('regional_manager')) AND (store.store_sales_fact.store_key IN
(SELECT sd.store_key FROM store.store_dimension sd
JOIN emp_dimension ed ON sd.store_region=ed.employee_region WHERE ed.user_name = CURRENT_USER()))
OR ENABLED_ROLE('dbadmin')
OR ENABLED_ROLE ('administrator')
ENABLE;
Important
In this example, the row policy limits access to a set of roles that are explicitly included in the policy's WHERE expression. All other roles and users are implicitly denied access to the table data.
The following examples indicate the different levels of access that are available to users with the specified roles:
-
dbadmin has access to all rows in store.store_sales_fact:
=> \c
You are now connected as user "dbadmin".
=> SELECT count(*) FROM store.store_sales_fact;
count
---------
5000000
(1 row)
-
User LilyCP has the role of regional_manager, so she can access all sales data of the MidWest region that she manages:
=> \c - LilyCP;
You are now connected as user "LilyCP".
=> SET ROLE regional_manager;
SET
=> SELECT count(*) FROM store.store_sales_fact;
count
--------
782272
(1 row)
-
User SamRJ has the role of employee, so he can access only the sales data that he is associated with:
=> \c - SamRJ;
You are now connected as user "SamRJ".
=> SET ROLE employee;
SET
=> SELECT count(*) FROM store.store_sales_fact;
count
-------
417
(1 row)
Restrictions
The following limitations apply to row access policies:
-
A table can have only one row access policy.
-
Row access policies are invalid on the following tables:
-
You cannot create directed queries on a table with a row access policy.
4.4.3 - Access policies and DML operations
By default, Vertica abides by a rule that a user can only edit what they can see. That is, you must be able to view all rows and columns in the table in their original values (as stored in the table) and in their originally defined data types to perform actions that modify data on a table. For example, if a column is defined as VARCHAR(9) and an access policy on that column specifies the same column as VARCHAR(10), users using the access policy will be unable to perform the following operations:
-
INSERT
-
UPDATE
-
DELETE
-
MERGE
-
COPY
You can override this behavior by specifying GRANT TRUSTED in a new or existing access policy. This option forces the access policy to defer entirely to explicit GRANT statements when assessing whether a user can perform the above operations.
You can view existing access policies with the ACCESS_POLICY system table.
Row access
On tables where a row access policy is enabled, you can only perform DML operations when the condition in the row access policy evaluates to TRUE. For example:
t1 appears as follows:
A | B
---+---
1 | 1
2 | 2
3 | 3
Create the following row access policy on t1:
=> CREATE ACCESS POLICY ON t1 for ROWS
WHERE enabled_role('manager')
OR
A<2
ENABLE;
With this policy enabled, the following behavior exists for users who want to perform DML operations:
-
A user with the manager role can perform DML on all rows in the table, because the WHERE clause in the policy evaluates to TRUE.
-
Users with non-manager roles can only perform a SELECT to return data in column A that has a value of less than two. If the access policy has to read the data in the table to confirm a condition, it does not allow DML operations.
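For example, suppose a hypothetical user u2 holds no manager role while the t1 policy above is enabled. An INSERT whose new row satisfies the policy condition (A<2) succeeds, while one that violates it is rejected. The user name and values here are illustrative, and the exact error text can vary:
=> \c - u2
You are now connected as user "u2".
=> INSERT INTO t1 VALUES (1, 1);
 OUTPUT
--------
      1
(1 row)
=> INSERT INTO t1 VALUES (3, 3);
ERROR: Unable to INSERT: access denied due to active access policy on table "t1"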
Column access
On tables where a column access policy is enabled, you can perform DML operations if you can view the entire column in its originally defined type.
Suppose table t1 is created with the following data types and values:
=> CREATE TABLE t1 (A int, B int);
=> INSERT INTO t1 VALUES (1,2);
=> SELECT * FROM t1;
A | B
---+---
1 | 2
(1 row)
Suppose the following access policy is created, which coerces the data type of column A from INT to VARCHAR(20) at execution time.
=> CREATE ACCESS POLICY ON t1 FOR COLUMN A A::VARCHAR(20) ENABLE;
Column "A" is of type int but expression in Access Policy is of type varchar(20). It will be coerced at execution time
In this case, user u1 can view column A in its entirety, but because the active access policy doesn't specify column A's original data type, u1 cannot perform DML operations on column A:
=> \c - u1
You are now connected as user "u1".
=> SELECT A FROM t1;
A
---
1
(1 row)
=> INSERT INTO t1 VALUES (3);
ERROR 6538: Unable to INSERT: "Access denied due to active access policy on table "t1" for column "A""
Overriding default behavior with GRANT TRUSTED
Specifying GRANT TRUSTED in an access policy overrides the default behavior ("users can only edit what they can see") and instructs the access policy to defer entirely to explicit GRANT statements when assessing whether a user can perform a DML operation.
Important
GRANT TRUSTED can allow users to make changes to tables even if the data is obscured by the access policy. In these cases, certain users who are constrained by the access policy but also have GRANTs on the table can make changes to the data that they cannot view or verify.
GRANT TRUSTED is useful in cases where the form the data is stored in doesn't match its semantically "true" form.
For example, when integrating with Voltage SecureData, a common use case is storing encrypted data with VoltageSecureProtect, where decryption is left to a case expression in an access policy that calls VoltageSecureAccess. In this case, while the decrypted form is intuitively understood to be the data's "true" form, it's still stored in the table in its encrypted form; users who can view the decrypted data wouldn't see the data as it was stored and therefore wouldn't be able to perform DML operations. You can use GRANT TRUSTED to override this behavior and allow users to perform these operations if they have the grants.
In this example, the customer_info table contains columns for the customer first name, last name, and SSN. SSNs are sensitive, so access to them must be controlled; each SSN is encrypted with VoltageSecureProtect as it is inserted into the table:
=> CREATE TABLE customer_info(first_name VARCHAR, last_name VARCHAR, ssn VARCHAR);
=> INSERT INTO customer_info SELECT 'Alice', 'Smith', VoltageSecureProtect('998-42-4910' USING PARAMETERS format='ssn');
=> INSERT INTO customer_info SELECT 'Robert', 'Eve', VoltageSecureProtect('899-28-1303' USING PARAMETERS format='ssn');
=> SELECT * FROM customer_info;
first_name | last_name | ssn
------------+-----------+-------------
Alice | Smith | 967-63-8030
Robert | Eve | 486-41-3371
(2 rows)
In this system, the role "trusted_ssn" identifies privileged users for which Vertica will decrypt the values of the "ssn" column with VoltageSecureAccess. To allow these privileged users to perform DML operations for which they have grants, you might use the following access policy:
=> CREATE ACCESS POLICY ON customer_info FOR COLUMN ssn
CASE WHEN enabled_role('trusted_ssn') THEN VoltageSecureAccess(ssn USING PARAMETERS format='ssn')
ELSE ssn END
GRANT TRUSTED
ENABLE;
Again, note that GRANT TRUSTED allows all users with GRANTs on the table to perform the specified operations, including users without the "trusted_ssn" role.
4.4.4 - Access policies and query optimization
Access policies affect the projection designs that the Vertica Database Designer produces, and the plans that the optimizer creates for query execution.
Projection designs
When Database Designer creates projections for a given table, it takes into account access policies that apply to the current user. The set of projections that Database Designer produces for the table are optimized for that user's access privileges, and other users with similar access privileges. However, these projections might be less than optimal for users with different access privileges. These differences might have some effect on how efficiently Vertica processes queries for the second group of users. When you evaluate projection designs for a table, choose a design that optimizes access for all authorized users.
Query rewrite
The Vertica optimizer enforces access policies by rewriting user queries in its query plan, which can affect query performance. For example, the clients table has row and column access policies, both enabled. When a user queries this table, the query optimizer produces a plan that rewrites the query so it includes both policies:
=> SELECT * FROM clients;
The query optimizer produces a query plan that rewrites the query as follows:
SELECT * FROM (
SELECT custID, password, CASE WHEN enabled_role('manager') THEN SSN ELSE substr(SSN, 8, 4) END AS SSN FROM clients
WHERE enabled_role('broker') AND
clients.clientID IN (SELECT brokers.clientID FROM brokers WHERE broker_name = CURRENT_USER())
) clients;
4.4.5 - Managing access policies
By default, you can only manage access policies on tables that you own. You can optionally restrict access policy management to superusers with the AccessPolicyManagementSuperuserOnly parameter (false by default):
=> ALTER DATABASE DEFAULT SET PARAMETER AccessPolicyManagementSuperuserOnly = 1;
ALTER DATABASE
You can view and manage access policies for tables in several ways:
Viewing access policies
You can view access policies in two ways:
-
Query system table ACCESS_POLICY. For example, the following query returns all access policies on table public.customer_dimension:
=> \x
=> SELECT policy_type, is_policy_enabled, table_name, column_name, expression FROM access_policy WHERE table_name = 'public.customer_dimension';
-[ RECORD 1 ]-----+----------------------------------------------------------------------------------------
policy_type | Column Policy
is_policy_enabled | Enabled
table_name | public.customer_dimension
column_name | customer_address
expression | CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END
-
Export table DDL from the database catalog with EXPORT_TABLES, EXPORT_OBJECTS, or EXPORT_CATALOG. For example:
=> SELECT export_tables('','customer_dimension');
export_tables
-----------------------------------------------------------------------------
CREATE TABLE public.customer_dimension
(
customer_key int NOT NULL,
customer_type varchar(16),
customer_name varchar(256),
customer_gender varchar(8),
...
CONSTRAINT C_PRIMARY PRIMARY KEY (customer_key) DISABLED
);
CREATE ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address CASE WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END ENABLE;
Modifying access policy expression
ALTER ACCESS POLICY can modify the expression of an existing access policy. For example, you can modify the access policy in the earlier example by extending access to the dbadmin role:
=> ALTER ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address
CASE WHEN enabled_role('dbadmin') THEN customer_address
WHEN enabled_role('administrator') THEN customer_address
ELSE '**************' END ENABLE;
ALTER ACCESS POLICY
Querying system table ACCESS_POLICY confirms this change:
=> SELECT policy_type, is_policy_enabled, table_name, column_name, expression FROM access_policy
WHERE table_name = 'public.customer_dimension' AND column_name='customer_address';
-[ RECORD 1 ]-----+-------------------------------------------------------------------------------------------------------------------------------------------
policy_type | Column Policy
is_policy_enabled | Enabled
table_name | public.customer_dimension
column_name | customer_address
expression | CASE WHEN enabled_role('dbadmin') THEN customer_address WHEN enabled_role('administrator') THEN customer_address ELSE '**************' END
Enabling and disabling access policies
Owners of a table can enable and disable its row and column access policies.
Row access policies
You enable and disable row access policies on a table:
ALTER ACCESS POLICY ON [schema.]table FOR ROWS { ENABLE | DISABLE }
The following examples disable and then re-enable the row access policy on table customer_dimension:
=> ALTER ACCESS POLICY ON customer_dimension FOR ROWS DISABLE;
ALTER ACCESS POLICY
=> ALTER ACCESS POLICY ON customer_dimension FOR ROWS ENABLE;
ALTER ACCESS POLICY
Column access policies
You enable and disable access policies on a table column as follows:
ALTER ACCESS POLICY ON [schema.]table FOR COLUMN column { ENABLE | DISABLE }
The following examples disable and then re-enable the same column access policy on customer_dimension.customer_address:
=> ALTER ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address DISABLE;
ALTER ACCESS POLICY
=> ALTER ACCESS POLICY ON public.customer_dimension FOR COLUMN customer_address ENABLE;
ALTER ACCESS POLICY
Copying access policies
You copy access policies from one table to another as follows. Non-superusers must have ownership of both the source and destination tables:
ALTER ACCESS POLICY ON [schema.]table { FOR COLUMN column | FOR ROWS } COPY TO TABLE table
When you create a copy of a table or move its contents with the following functions (but not CREATE TABLE AS SELECT or CREATE TABLE LIKE), the access policies of the original table are copied to the new/destination table:
To copy access policies to another table, use ALTER ACCESS POLICY.
Note
If you rename a table with ALTER TABLE...RENAME TO, the access policies that were stored under the previous name are stored under the table's new name.
For example, you can copy a row access policy as follows:
=> ALTER ACCESS POLICY ON public.emp_dimension FOR ROWS COPY TO TABLE public.regional_managers_dimension;
The following statement copies the access policy on column employee_key from table public.emp_dimension to store.store_sales_fact:
=> ALTER ACCESS POLICY ON public.emp_dimension FOR COLUMN employee_key COPY TO TABLE store.store_sales_fact;
Note
The copied policy retains the source policy's enabled/disabled settings.
5 - Using the administration tools
The Vertica Administration tools allow you to easily perform administrative tasks. You can perform most Vertica database administration tasks with Administration Tools.
Run Administration Tools using the Database Superuser account on the Administration host, if possible. Make sure that no other Administration Tools processes are running.
If the Administration host is unresponsive, run Administration Tools on a different node in the cluster. That node permanently takes over the role of Administration host.
5.1 - Running the administration tools
Administration tools, or "admintools," supports various commands to manage your database.
To run admintools, you must have SSH and local connections enabled for the dbadmin user.
Syntax
/opt/vertica/bin/admintools [ --debug ]
    [ { -h | --help }
    | { -a | --help_all }
    | { -t | --tool } name_of_tool [options]
    ]
--debug
    If you include this option, Vertica logs debug information.
    Note: You can specify the debug option with or without naming a specific tool. If you specify debug with a specific tool, Vertica logs debug information during tool execution. If you do not specify a tool, Vertica logs debug information when you run tools through the admintools user interface.
-h | --help
    Outputs abbreviated help.
-a | --help_all
    Outputs verbose help, which lists all command-line sub-commands and options.
{ -t | --tool } name_of_tool [options]
    Specifies the tool to run, where name_of_tool is one of the tools described in the help output, and options are one or more comma-delimited tool arguments.
    Note: Enter admintools -h to see the list of tools available. Enter admintools -t name_of_tool --help to review a specific tool's options.
An unqualified admintools command displays the Main Menu dialog box.
If you are unfamiliar with this type of interface, read Using the administration tools interface.
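For example, you can run a single tool non-interactively by naming it with the -t option. The tool and database names below are illustrative; run admintools -h to list the tools available on your system:
$ /opt/vertica/bin/admintools -t view_cluster --help
$ /opt/vertica/bin/admintools -t view_cluster -d VMart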
Privileges
dbadmin user
5.2 - First login as database administrator
The first time you log in as the Database Superuser and run the Administration Tools, the user interface displays.
-
In the end-user license agreement (EULA) window, type accept to proceed.
A window displays, requesting the location of the license key file you downloaded from the Vertica website. The default path is /tmp/vlicense.dat.
-
Type the absolute path to your license key (for example, /tmp/vlicense.dat) and click OK.
5.3 - Using the administration tools interface
The Vertica Administration Tools are implemented using Dialog, a graphical user interface that works in terminal (character-cell) windows. The interface responds to mouse clicks in some terminal windows, particularly local Linux windows, but you might find that it responds only to keystrokes. Thus, this section describes how to use the Administration Tools using only keystrokes.
Note
This section does not describe every possible combination of keystrokes you can use to accomplish a particular task. Feel free to experiment and to use whatever keystrokes you prefer.
Enter [return]
In all dialogs, when you are ready to run a command, select a file, or cancel the dialog, press the Enter key. The command descriptions in this section do not explicitly instruct you to press Enter.
OK - cancel - help
The OK, Cancel, and Help buttons are present on virtually all dialogs. Use the tab, space bar, or right and left arrow keys to select an option and then press Enter. The same keystrokes apply to dialogs that present a choice of Yes or No.
Menu dialogs
Some dialogs require that you choose one command from a menu. Type the alphanumeric character shown or use the up and down arrow keys to select a command and then press Enter.
List dialogs
In a list dialog, use the up and down arrow keys to highlight items, then use the space bar to select the items (which marks them with an X). Some list dialogs allow you to select multiple items. When you have finished selecting items, press Enter.
Form dialogs
In a form dialog (also referred to as a dialog box), use the tab key to cycle between OK, Cancel, Help, and the form field area. Once the cursor is in the form field area, use the up and down arrow keys to select an individual field (highlighted) and enter information. When you have finished entering information in all fields, press Enter.
Online help is provided in the form of text dialogs. If you have trouble viewing the help, see Notes for remote terminal users.
5.4 - Notes for remote terminal users
The appearance of the graphical interface depends on the color and font settings used by your terminal window.
The appearance of the graphical interface depends on the color and font settings used by your terminal window. The screen captures in this document were made using the default color and font settings in a PuTTy terminal application running on a Windows platform.
Note
If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, make sure your window is at least 81 characters wide and 23 characters high.
If you are using PuTTY, you can make the Administration Tools look like the screen captures in this document:
-
In a PuTTY window, right click the title area and select Change Settings.
-
Create or load a saved session.
-
In the Category dialog, click Window > Appearance.
-
In the Font settings, click the Change... button.
-
Select Font: Courier New, Regular, Size 10.
-
Click Apply.
Repeat these steps for each existing session that you use to run the Administration Tools.
You can also change the translation to support UTF-8:
-
In a PuTTY window, right click the title area and select Change Settings.
-
Create or load a saved session.
-
In the Category dialog, click Window > Translation.
-
In the "Received data assumed to be in which character set" drop-down menu, select UTF-8.
-
Click Apply.
5.5 - Using administration tools help
The Help on Using the Administration Tools command displays a help screen about using the Administration Tools.
Most of the online help in the Administration Tools is context-sensitive. For example, if you use up/down arrows to select a command, press tab to move to the Help button, and press return, you get help on the selected command.
In a menu
-
Use the up and down arrow keys to choose the command for which you want help.
-
Use the Tab key to move the cursor to the Help button.
-
Press Enter (Return).
In a dialog box
-
Use the up and down arrow keys to choose the field on which you want help.
-
Use the Tab key to move the cursor to the Help button.
-
Press Enter (Return).
Some help files are too long for a single screen. Use the up and down arrow keys to scroll through the text.
5.6 - Distributing changes made to the administration tools metadata
Administration Tools-specific metadata for a failed node will fall out of synchronization with other cluster nodes if you make the following changes:
When you restore the node to the database cluster, you can use the Administration Tools to update the node with the latest Administration Tools metadata:
-
Log on to a host that contains the metadata you want to transfer and start the Administration Tools. (See Using the administration tools.)
-
On the Main Menu in the Administration Tools, select Configuration Menu and click OK.
-
On the Configuration Menu, select Distribute Config Files and click OK.
-
Select AdminTools Meta-Data.
The Administration Tools metadata is distributed to every host in the cluster.
-
Restart the database.
5.7 - Administration tools and Management Console
You can perform most database administration tasks using the Administration Tools, but you have the additional option of using the more visual and dynamic Management Console.
The following table compares the functionality available in both interfaces. Continue to use Administration Tools and the command line to perform actions not yet supported by Management Console.
Vertica Functionality                                          | Management Console | Administration Tools
---------------------------------------------------------------+--------------------+---------------------
Use a Web interface for the administration of Vertica          | Yes                | No
Manage/monitor one or more databases and clusters through a UI | Yes                | No
Manage multiple databases on different clusters                | Yes                | Yes
View database cluster state                                    | Yes                | Yes
View multiple cluster states                                   | Yes                | No
Connect to the database                                        | Yes                | Yes
Start/stop an existing database                                | Yes                | Yes
Stop/restart Vertica on host                                   | Yes                | Yes
Kill a Vertica process on host                                 | No                 | Yes
Create one or more databases                                   | Yes                | Yes
View databases                                                 | Yes                | Yes
Remove a database from view                                    | Yes                | No
Drop a database                                                | Yes                | Yes
Create a physical schema design (Database Designer)            | Yes                | Yes
Modify a physical schema design (Database Designer)            | Yes                | Yes
Set the restart policy                                         | No                 | Yes
Roll back database to the Last Good Epoch                      | No                 | Yes
Manage clusters (add, replace, remove hosts)                   | Yes                | Yes
Rebalance data across nodes in the database                    | Yes                | Yes
Configure database parameters dynamically                      | Yes                | No
View database activity in relation to physical resource usage  | Yes                | No
View alerts and messages dynamically                           | Yes                | No
View current database size usage statistics                    | Yes                | No
View database size usage statistics over time                  | Yes                | No
Upload/upgrade a license file                                  | Yes                | Yes
Warn users about license violation on login                    | Yes                | Yes
Create, edit, manage, and delete users/user information        | Yes                | No
Use LDAP to authenticate users with company credentials        | Yes                | Yes
Manage user access to MC through roles                         | Yes                | No
Map Management Console users to a Vertica database             | Yes                | No
Enable and disable user access to MC and/or the database       | Yes                | No
Audit user activity on database                                | Yes                | No
Hide features unavailable to a user through roles              | Yes                | No
Generate new user (non-LDAP) passwords                         | Yes                | No
Management Console provides some, but not all, of the functionality provided by the Administration Tools. MC also provides functionality not available in the Administration Tools.
5.8 - Administration tools reference
Administration Tools, or "admintools," uses the open-source vertica-python client to perform operations on the database.
The following sections explain in detail the tasks you can perform with Vertica Administration Tools:
5.8.1 - Viewing database cluster state
This tool shows the current state of the nodes in the database.
-
On the Main Menu, select View Database Cluster State, and click OK.
The normal state of a running database is ALL UP. The normal state of a stopped database is ALL DOWN.
-
If some hosts are UP and some are DOWN, restart the hosts that are down using Restart Vertica on Host from the Administration Tools, or start the database as described in Starting and Stopping the Database (unless you have a known node failure and want to continue in that state).
Nodes shown as INITIALIZING or RECOVERING indicate that failure recovery is in progress.
Nodes in other states (such as NEEDS_CATCHUP) are transitional and can be ignored unless they persist.
5.8.2 - Connecting to the database
This tool connects to a running database with vsql. You can use the Administration Tools to connect to a database from any node within the database while logged in to any user account with access privileges. You cannot use the Administration Tools to connect from a host that is not a database node. To connect from other hosts, run vsql as described in Connecting from the command line.
-
On the Main Menu, click Connect to Database, and then click OK.
-
Supply the database password if asked:
Password:
When you create a new user with the CREATE USER command, you can configure the password or leave it empty. You cannot bypass the password if the user was created with a password configured. You can change a user's password using the ALTER USER command.
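For example, you might create a user with a password and later change it. The user name and passwords here are illustrative:
=> CREATE USER ExampleUser IDENTIFIED BY 'Old$Password1';
CREATE USER
=> ALTER USER ExampleUser IDENTIFIED BY 'New$Password2';
ALTER USER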
The Administration Tools connect to the database and transfer control to vsql.
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>
See Using vsql for more information.
Note
After entering your password, you may be prompted to change your password if it has expired. See
Configuring client authentication for details of password security.
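The menu steps above map to a single command-line call. A sketch, where the database name and password are placeholders:

```shell
# Connect to the database verticadb with vsql through admintools.
# -d names the database; -p supplies the password in single quotes.
$ admintools -t connect_db -d verticadb -p 'mypassword'
```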
See also
5.8.3 - Restarting Vertica on host
This tool restarts the Vertica process on one or more hosts in a running database. Use this tool if the Vertica process stopped or was killed on the host.
-
To view the current state of the nodes in the cluster, on the Main Menu, select View Database Cluster State.
-
Click OK to return to the Main Menu.
-
If one or more nodes are down, select Restart Vertica on Host, and click OK.
-
Select the database that contains the host that you want to restart, and click OK.
-
Select one or more hosts to restart, and click OK.
-
Enter the database password.
-
Select View Database Cluster State again to verify all nodes are up.
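The same operation is available from the command line through the `restart_node` tool listed later in this section. A sketch, assuming the common `-d`/`-s`/`-p` option pattern that the other node tools use; the database name, host addresses, and password are placeholders:

```shell
# Restart the Vertica process on two down nodes of database verticadb.
# -s takes a comma-separated list of hosts (option pattern assumed from
# the other admintools node commands, e.g. kill_host and db_add_node).
$ admintools -t restart_node -d verticadb -s 192.0.2.11,192.0.2.12 -p 'mypassword'
```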
5.8.4 - Configuration menu options
The Configuration Menu allows you to perform the following tasks:
5.8.4.1 - Creating a database
Use the procedures below to create either an Enterprise Mode or Eon Mode database with admintools. To create a database with an in-browser wizard in Management Console, see Creating a database using MC. For details about creating a database with admintools through the command line, see Writing administration tools scripts.
Create an Enterprise Mode database
-
On the Configuration Menu, click Create Database. Click OK.
-
Select Enterprise Mode as your database mode.
-
Enter the name of the database and an optional comment. Click OK.
-
Enter a password. See Creating a database name and password for rules.
If you do not enter a password, you are prompted to confirm: Yes to enter a superuser password, No to create a database without one.
Caution
If you do not enter a password at this point, superuser password is set to empty. Unless the database is for evaluation or academic purposes, Vertica strongly recommends that you enter a superuser password.
-
If you entered a password, enter the password again.
-
Select the hosts to include in the database. The hosts in this list are the ones that were specified at installation time (install_vertica -s).
-
Specify the directories in which to store the catalog and data files.
Note
Catalog and data paths must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions could result in database creation failure.
Note
Do not use a shared directory for more than one node. Data and catalog directories must be distinct for each node. Multiple nodes must not be allowed to write to the same data or catalog directory.
-
Check the current database definition for correctness. Click Yes to proceed.
-
A message indicates that you have successfully created a database. Click OK.
You can also create an Enterprise Mode database using admintools through the command line, for example:
$ admintools -t create_db --data_path=/home/dbadmin --catalog_path=/home/dbadmin --database=verticadb --password=password --hosts=localhost
For more information, see Writing administration tools scripts.
Create an Eon Mode database
Note
Currently, the admintools menu interface does not support creating an Eon Mode database on Google Cloud Platform. Use the MC or the admintools command line to create an Eon Mode database instead.
-
On the Configuration Menu, click Create Database. Click OK.
-
Select Eon Mode as your database mode.
-
Enter the name of the database and an optional comment. Click OK.
-
Enter a password. See Creating a database name and password for rules.
AWS only: If you do not enter a password, you are prompted to confirm: Yes to enter a superuser password, No to create a database without one.
Caution
If you do not enter a password at this point, superuser password is set to empty. Unless the database is for evaluation or academic purposes, Vertica strongly recommends that you enter a superuser password.
-
If you entered a password, enter the password again.
-
Select the hosts to include in the database. The hosts in this list are those specified at installation time (install_vertica -s).
-
Specify the directories in which to store the catalog and depot, depot size, communal storage location, and number of shards.
-
Depot Size: Use an integer followed by %, K, G, or T. The default is 60% of the total disk space of the filesystem storing the depot.
-
Communal Storage: Use an existing Amazon S3 bucket in the same region as your instances. Specify a new subfolder name, which Vertica dynamically creates within the existing S3 bucket. For example, s3://existingbucket/newstorage1. You can create a new subfolder within existing ones, but database creation rolls back if you do not specify a new subfolder name.
-
Number of Shards: Use a whole number. The default is equal to the number of nodes. For optimal performance, the number of shards should be no greater than 2x the number of nodes. When the number of nodes is greater than the number of shards (with ETS), the throughput of dashboard queries improves. When the number of shards exceeds the number of nodes, you can expand the cluster in the future to improve the performance of long analytic queries.
Note
Catalog and depot paths must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions could result in database creation failure.
-
Check the current database definition for correctness. Click Yes to proceed.
-
A message indicates that you successfully created a database. Click OK.
In on-premises, AWS, and Azure environments, you can create an Eon Mode database using admintools through the command line. For instructions specific to your environment, see Create a database in Eon Mode.
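As a sketch of such a command-line invocation, the following combines the Eon-only `create_db` options shown in the tool help later in this section. The host names, bucket path, and depot path are placeholders:

```shell
# Create a three-node, three-shard Eon Mode database.
# --communal-storage-location points at a new subfolder in an existing
# S3 bucket; --depot-path and --depot-size configure the local depot.
$ admintools -t create_db \
    --database=verticadb \
    --password='mypassword' \
    --hosts=vnode01,vnode02,vnode03 \
    --communal-storage-location=s3://existingbucket/newstorage1 \
    --depot-path=/home/dbadmin/depot \
    --depot-size=60% \
    --shard-count=3
```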
5.8.4.2 - Dropping a database
This tool drops an existing database. Only the Database Superuser is allowed to drop a database.
-
Stop the database.
-
On the Configuration Menu, click Drop Database and then click OK.
-
Select the database to drop and click OK.
-
Click Yes to confirm that you want to drop the database.
-
Type yes and click OK to reconfirm that you really want to drop the database.
-
A message indicates that you have successfully dropped the database. Click OK.
When Vertica drops the database, it also automatically drops the node definitions that refer to the database. The following exceptions apply:
-
Another database uses a node definition. If another database refers to any of these node definitions, none of the node definitions are dropped.
-
A node definition is the only node defined for the host. (Vertica uses node definitions to locate hosts that are available for database creation, so removing the only node defined for a host would make the host unavailable for new databases.)
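The same steps can be sketched from the command line using the `stop_db` and `drop_db` tools listed later in this section (the `stop_db` options follow the common `-d`/`-p` pattern; the database name and password are placeholders):

```shell
# Stop the database first; drop_db requires a stopped database.
$ admintools -t stop_db -d verticadb -p 'mypassword'

# Drop the stopped database (only -d is documented for drop_db).
$ admintools -t drop_db -d verticadb
```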
5.8.4.3 - Viewing a database
This tool displays the characteristics of an existing database.
-
On the Configuration Menu, select View Database and click OK.
-
Select the database to view.
-
Vertica displays the following information about the database:
-
The name of the database.
-
The name and location of the log file for the database.
-
The hosts within the database cluster.
-
The value of the restart policy setting.
Note: This setting determines whether nodes within a K-Safe database are restarted when they are rebooted. See Setting the restart policy.
-
The database port.
-
The name and location of the catalog directory.
5.8.4.4 - Setting the restart policy
The Restart Policy enables you to determine whether or not nodes in a K-Safe database are automatically restarted when they are rebooted. Since this feature does not automatically restart nodes if the entire database is DOWN, it is not useful for databases that are not K-Safe.
To set the Restart Policy for a database:
-
Open the Administration Tools.
-
On the Main Menu, select Configuration Menu, and click OK.
-
In the Configuration Menu, select Set Restart Policy, and click OK.
-
Select the database for which you want to set the Restart Policy, and click OK.
-
Select one of the following policies for the database:
-
Never — Nodes are never restarted automatically.
-
K-Safe — Nodes are automatically restarted if the database cluster is still UP. This is the default setting.
-
Always — The node of a single-node database is restarted automatically.
Note
Always does not work if a single-node database was not shut down cleanly or crashed.
-
Click OK.
Best practice for restoring failed hardware
Following this procedure prevents Vertica from misdiagnosing a missing disk or bad mounts as data corruption, which would result in a time-consuming, full-node recovery.
If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing the hardware:
-
Reboot the machine into runlevel 1, which is a root and console-only mode.
Runlevel 1 prevents network connectivity and keeps Vertica from attempting to reconnect to the cluster.
-
In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any RAID recovery is able to proceed.
Note
You do not need to initialize RAID recovery in runlevel 1; simply validate that it can recover.
-
Once the hardware is confirmed consistent, only then reboot to runlevel 3 or higher.
At this point, the network activates, and Vertica rejoins the cluster and automatically recovers any missing data. Note that, on a single-node database, if any files that were associated with a projection have been deleted or corrupted, Vertica will delete all files associated with that projection, which could result in data loss.
5.8.4.5 - Installing external procedure executable files
-
Run the Administration tools.
$ /opt/vertica/bin/adminTools
-
On the AdminTools Main Menu, click Configuration Menu, and then click OK.
-
On the Configuration Menu, click Install External Procedure and then click OK.
-
Select the database on which you want to install the external procedure.
-
Either select the file to install or manually type the complete file path, and then click OK.
-
If you are not the superuser, you are prompted to enter your password and click OK.
The Administration Tools automatically create the database-name/procedures directory on each node in the database and install the external procedure in these directories for you.
-
Click OK in the dialog that indicates that the installation was successful.
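The equivalent command-line call uses the `install_procedure` tool whose options appear in the help output later in this section. A sketch, with a placeholder procedure file and password:

```shell
# Install an external procedure executable on database verticadb.
# -f is the path of the procedure file; -p is the file owner's password.
$ admintools -t install_procedure -d verticadb -f /tmp/helloplanet.sh -p 'ownerpassword'
```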
5.8.5 - Advanced menu options
The Advanced Menu options allow you to perform the following tasks:
5.8.5.1 - Rolling back the database to the last good epoch
Vertica provides the ability to roll the entire database back to a specific epoch primarily to assist in the correction of human errors during data loads or other accidental corruptions. For example, suppose that you have been performing a bulk load and the cluster went down during a particular COPY command. You might want to discard all epochs back to the point at which the previous COPY command committed and run the one that did not finish again. You can determine that point by examining the log files (see Monitoring the Log Files).
-
On the Advanced Menu, select Roll Back Database to Last Good Epoch.
-
Select the database to roll back. The database must be stopped.
-
Accept the suggested restart epoch or specify a different one.
-
Confirm that you want to discard the changes after the specified epoch.
The database restarts successfully.
Important
The default value of HistoryRetentionTime is 0, which means that Vertica keeps historical data only when nodes are down. This setting prevents the use of the Administration Tools' Roll Back Database to Last Good Epoch option, because the AHM remains close to the current epoch. Vertica cannot roll back to an epoch that precedes the AHM.
If you rely on the Roll Back option to remove recently loaded data, consider setting a day-wide window for removing loaded data. For example:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = 86400;
5.8.5.2 - Stopping Vertica on host
This command attempts to gracefully shut down the Vertica process on a single node.
Caution
Do not use this command to shut down the entire cluster. Instead, stop the database to perform a clean shutdown that minimizes data loss.
-
On the Advanced Menu, select Stop Vertica on Host and click OK.
-
Select the hosts to stop.
-
Confirm that you want to stop the hosts.
If the command succeeds, View Database Cluster State shows that the selected hosts are DOWN.
If the command fails to stop any selected nodes, proceed to Killing Vertica Process on Host.
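From the command line, the `stop_host` tool listed later in this section performs the same operation. A sketch, assuming `-s` takes a comma-separated host list as it does for `kill_host`; the host address is a placeholder:

```shell
# Attempt a graceful shutdown of the Vertica process on one host.
# (-s option assumed to match the kill_host pattern shown in the help output.)
$ admintools -t stop_host -s 192.0.2.11
```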
5.8.5.3 - Killing the Vertica process on host
This command sends a kill signal to the Vertica process on a node.
-
On the Advanced menu, select Kill Vertica Process on Host and click OK.
-
Select the hosts on which to kill the Vertica process.
-
Confirm that you want to stop the processes.
-
If the command succeeds, View Database Cluster State shows that the selected hosts are DOWN.
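The command-line equivalent is the `kill_host` tool, whose options appear in the help output later in this section. A sketch with placeholder host addresses:

```shell
# Send SIGKILL to the Vertica process on the listed hosts.
$ admintools -t kill_host -s 192.0.2.11,192.0.2.12
```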
5.8.5.4 - Upgrading a Vertica license key
The following steps are for licensed Vertica users. Completing the steps copies a license key file into the database. See Managing licenses for more information.
-
On the Advanced menu, select Upgrade License Key. Click OK.
-
Select the database for which to upgrade the license key.
-
Enter the absolute pathname of your downloaded license key file (for example, /tmp/vlicense.dat). Click OK.
-
Click OK when you see a message indicating that the upgrade succeeded.
Note
If you are using Vertica Community Edition, follow the instructions in Vertica license changes to upgrade to a Vertica Premium Edition license key.
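The `upgrade_license_key` tool listed later in this section performs the same task from the command line. A sketch; the option names are assumptions based on the menu prompts above and the `-l LICENSEFILE` option of `create_db`:

```shell
# Upgrade the license key for database verticadb.
# -l (license file) and -p (password) are assumed option names, not
# confirmed by the help output shown in this guide.
$ admintools -t upgrade_license_key -d verticadb -l /tmp/vlicense.dat -p 'mypassword'
```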
5.8.5.5 - Managing clusters
Cluster Management lets you add, replace, or remove hosts from a database cluster. These processes are usually part of a larger process of adding, removing, or replacing a database node.
Using cluster management
To use Cluster Management:
-
From the Main Menu, select Advanced Menu, and then click OK.
-
In the Advanced Menu, select Cluster Management, and then click OK.
-
Select one of the following, and then click OK.
5.8.5.6 - Getting help on administration tools
The Help Using the Administration Tools command displays a help screen about using the Administration Tools.
Most of the online help in the Administration Tools is context-sensitive. For example, if you use the up/down arrow keys to select a command, press Tab to move to the Help button, and press Enter, you get help on the selected command.
5.8.5.7 - Administration tools metadata
The Administration Tools configuration data (metadata) contains information that databases need to start, such as the hostname/IP address of each participating host in the database cluster.
To facilitate hostname resolution within the Administration Tools, at the command line, and inside the installation utility, Vertica stores all hostnames you provide through the Administration Tools as IP addresses:
-
During installation
Vertica immediately converts any hostname you provide through the command-line options --hosts, --add-hosts, or --remove-hosts to its IP address equivalent.
-
If you provide a hostname during installation that resolves to multiple IP addresses (such as in multi-homed systems), the installer prompts you to choose one IP address.
-
Vertica retains the name you give for messages and prompts only; internally it stores these hostnames as IP addresses.
-
Within the Administration Tools
All hosts are in IP form to allow for direct comparisons (for example db = database = database.example.com).
-
At the command line
Vertica converts any hostname value to an IP address that it uses to look up the host in the configuration metadata. If a host has multiple IP addresses that are resolved, Vertica tests each IP address to see if it resides in the metadata, choosing the first match. No match indicates that the host is not part of the database cluster.
Metadata is more portable because Vertica does not require the names of the hosts in the cluster to be exactly the same when you install or upgrade your database.
5.8.6 - Administration tools connection behavior and requirements
The behavior of admintools when it connects to and performs operations on a database may vary based on your configuration. In particular, admintools considers its connection to other nodes, the status of those nodes, and the authentication method used by dbadmin.
Connection requirements and authentication
-
admintools uses passwordless SSH connections between cluster hosts for most operations, which is configured or confirmed during installation with the install_vertica script
-
For most situations, when issuing commands to the database, admintools prefers to use its SSH connection to a target host and uses a localhost client connection to the Vertica database
-
The incoming IP address determines the authentication method used. That is, a client connection may have different behavior from a local connection, which may be trusted by default
-
dbadmin should have a local trust or password-based authentication method
-
When deciding which host to use for multi-step operations, admintools prefers localhost, and then reconnects to known-to-be-good nodes
K-safety support
The Administration Tools allow certain operations on a K-Safe database, even if some nodes are unresponsive.
The database must have been marked as K-Safe using the MARK_DESIGN_KSAFE function.
The following management functions within the Administration Tools are operational when some nodes are unresponsive:
-
View database cluster state
-
Connect to database
-
Start database (including manual recovery)
-
Stop database
-
Replace node (assuming node that is down is the one being replaced)
-
View database parameters
-
Upgrade license key
The following management functions within the Administration Tools require that all nodes be UP in order to be operational:
5.8.7 - Writing administration tools scripts
You can invoke most Administration Tools from the command line or a shell script.
Syntax
/opt/vertica/bin/admintools {
{ -h | --help }
| { -a | --help_all}
| { [--debug ] { -t | --tool } toolname [ tool-args ] }
}
Note
For convenience, add
/opt/vertica/bin
to your search path.
Parameters
-h | --help
Outputs abbreviated help.
-a | --help_all
Outputs verbose help, which lists all command-line sub-commands and options.
[ --debug ] { -t | --tool } toolname [ tool-args ]
Specifies the tool to run, where toolname is one of the tools listed in the help output described below, and tool-args is one or more comma-delimited toolname arguments. If you include the --debug option, Vertica logs debug information during tool execution.
To return a list of all available tools, enter admintools -h at a command prompt.
To display help for a specific tool and its options or commands, qualify the specified tool name with --help or -h, as shown in the example below:
$ admintools -t connect_db --help
Usage: connect_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to connect
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
To list all available tools and their commands and options in individual help text, enter admintools -a.
Usage:
adminTools [-t | --tool] toolName [options]
Valid tools are:
command_host
connect_db
create_db
db_add_node
db_add_subcluster
db_remove_node
db_remove_subcluster
db_replace_node
db_status
distribute_config_files
drop_db
host_to_node
install_package
install_procedure
kill_host
kill_node
license_audit
list_allnodes
list_db
list_host
list_node
list_packages
logrotate
node_map
re_ip
rebalance_data
restart_db
restart_node
restart_subcluster
return_epoch
revive_db
set_restart_policy
set_ssl_params
show_active_db
start_db
stop_db
stop_host
stop_node
stop_subcluster
uninstall_package
upgrade_license_key
view_cluster
-------------------------------------------------------------------------
Usage: command_host [options]
Options:
-h, --help show this help message and exit
-c CMD, --command=CMD
Command to run
-F, --force Provide the force cleanup flag. Only applies to start,
restart, condrestart. For other options it is ignored.
-------------------------------------------------------------------------
Usage: connect_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to connect
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-------------------------------------------------------------------------
Usage: create_db [options]
Options:
-h, --help show this help message and exit
-D DATA, --data_path=DATA
Path of data directory[optional] if not using compat21
-c CATALOG, --catalog_path=CATALOG
Path of catalog directory[optional] if not using
compat21
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
-d DB, --database=DB Name of database to be created
-l LICENSEFILE, --license=LICENSEFILE
Database license [optional]
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes [optional]
-P POLICY, --policy=POLICY
Database restart policy [optional]
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts to participate in
database
--shard-count=SHARD_COUNT
[Eon only] Number of shards in the database
--communal-storage-location=COMMUNAL_STORAGE_LOCATION
[Eon only] Location of communal storage
-x COMMUNAL_STORAGE_PARAMS, --communal-storage-params=COMMUNAL_STORAGE_PARAMS
[Eon only] Location of communal storage parameter file
--depot-path=DEPOT_PATH
[Eon only] Path to depot directory
--depot-size=DEPOT_SIZE
[Eon only] Size of depot
--force-cleanup-on-failure
Force removal of existing directories on failure of
command
--force-removal-at-creation
Force removal of existing directories before creating
the database
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: db_add_node [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of the database
-s HOSTS, --hosts=HOSTS
Comma separated list of hosts to add to database
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-a AHOSTS, --add=AHOSTS
Comma separated list of hosts to add to database
-c SCNAME, --subcluster=SCNAME
Name of subcluster for the new node
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
-------------------------------------------------------------------------
Usage: db_add_subcluster [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be modified
-s HOSTS, --hosts=HOSTS
Comma separated list of hosts to add to the subcluster
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-c SCNAME, --subcluster=SCNAME
Name of the new subcluster for the new node
--is-primary Create primary subcluster
--is-secondary Create secondary subcluster
--control-set-size=CONTROLSETSIZE
Set the number of nodes that will run spread within
the subcluster
--like=CLONESUBCLUSTER
Name of an existing subcluster from which to clone
properties for the new subcluster
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: db_remove_node [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be modified
-s HOSTS, --hosts=HOSTS
Name of the host to remove from the db
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
--skip-directory-cleanup
Caution: this option will force you to do a manual
cleanup. This option skips directory deletion during
remove node. This is best used in a cloud environment
where the hosts being removed will be subsequently
discarded.
-------------------------------------------------------------------------
Usage: db_remove_subcluster [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be modified
-c SCNAME, --subcluster=SCNAME
Name of subcluster to be removed
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
--skip-directory-cleanup
Caution: this option will force you to do a manual
cleanup. This option skips directory deletion during
remove subcluster. This is best used in a cloud
environment where the hosts being removed will be
subsequently discarded.
-------------------------------------------------------------------------
Usage: db_replace_node [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of the database
-o ORIGINAL, --original=ORIGINAL
Name of host you wish to replace
-n NEWHOST, --new=NEWHOST
Name of the replacement host
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: db_status [options]
Options:
-h, --help show this help message and exit
-s STATUS, --status=STATUS
Database status UP,DOWN or ALL(list running dbs -
UP,list down dbs - DOWN list all dbs - ALL
-------------------------------------------------------------------------
Usage: distribute_config_files
Sends admintools.conf from local host to all other hosts in the cluster
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: drop_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Database to be dropped
-------------------------------------------------------------------------
Usage: host_to_node [options]
Options:
-h, --help show this help message and exit
-s HOST, --host=HOST comma separated list of hostnames which is to be
converted into its corresponding nodenames
-d DB, --database=DB show only node/host mapping for this database.
-------------------------------------------------------------------------
Usage: admintools -t install_package --package PACKAGE -d DB -p PASSWORD
Examples:
admintools -t install_package -d mydb -p 'mypasswd' --package default
# (above) install all default packages that aren't currently installed
admintools -t install_package -d mydb -p 'mypasswd' --package default --force-reinstall
# (above) upgrade (re-install) all default packages to the current version
admintools -t install_package -d mydb -p 'mypasswd' --package hcat
# (above) install package hcat
See also: admintools -t list_packages
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-p PASSWORD, --password=PASSWORD
database admin password
-P PACKAGE, --package=PACKAGE
specify package or 'all' or 'default'
--force-reinstall Force a package to be re-installed even if it is
already installed.
-------------------------------------------------------------------------
Usage: install_procedure [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --database=DBNAME
Name of database for installed procedure
-f PROCPATH, --file=PROCPATH
Path of procedure file to install
-p OWNERPASSWORD, --password=OWNERPASSWORD
Password of procedure file owner
-------------------------------------------------------------------------
Usage: kill_host [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGKILL signal
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
-------------------------------------------------------------------------
Usage: kill_node [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGKILL signal
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
-------------------------------------------------------------------------
Usage: license_audit --dbname DB_NAME [OPTIONS]
Runs audit and collects audit results.
Options:
-h, --help show this help message and exit
-d DATABASE, --database=DATABASE
Name of the database to retrieve audit results
-p PASSWORD, --password=PASSWORD
Password for database admin
-q, --quiet Do not print status messages.
-f FILE, --file=FILE Output results to FILE.
-------------------------------------------------------------------------
Usage: list_allnodes [options]
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: list_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be listed
-------------------------------------------------------------------------
Usage: list_host [options]
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: list_node [options]
Options:
-h, --help show this help message and exit
-n NODENAME, --node=NODENAME
Name of the node to be listed
-------------------------------------------------------------------------
Usage: admintools -t list_packages [OPTIONS]
Examples:
admintools -t list_packages # lists all available packages
admintools -t list_packages --package all # lists all available packages
admintools -t list_packages --package default # list all packages installed by default
admintools -t list_packages -d mydb --password 'mypasswd' # list the status of all packages in mydb
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-p PASSWORD, --password=PASSWORD
database admin password
-P PACKAGE, --package=PACKAGE
specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: logrotateconfig [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-r ROTATION, --rotation=ROTATION
set how often the log is rotated.[
daily|weekly|monthly ]
-s MAXLOGSZ, --maxsize=MAXLOGSZ
set maximum log size before rotation is forced.
-k KEEP, --keep=KEEP set # of old logs to keep
-------------------------------------------------------------------------
Usage: node_map [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB List only data for this database.
-------------------------------------------------------------------------
Usage: re_ip [options]
Replaces the IP addresses of hosts and databases in a cluster, or changes the
control messaging mode/addresses of a database.
Options:
-h, --help show this help message and exit
-f MAPFILE, --file=MAPFILE
A text file with IP mapping information. If the -O
option is not used, the command replaces the IP
addresses of the hosts in the cluster and all
databases for those hosts. In this case, the format of
each line in MAPFILE is: [oldIPaddress newIPaddress]
or [oldIPaddress newIPaddress, newControlAddress,
newControlBroadcast]. If the former,
'newControlAddress' and 'newControlBroadcast' would
set to default values. Usage: $ admintools -t re_ip -f
<mapfile>
-O, --db-only Updates the control messaging addresses of a database.
Also used for error recovery (when re_ip encounters
some certain errors, a mapfile is auto-generated).
Format of each line in MAPFILE: [NodeName
AssociatedNodeIPaddress, newControlAddress,
newControlBrodcast]. 'NodeName' and
'AssociatedNodeIPaddress' must be consistent with
admintools.conf. Usage: $ admintools -t re_ip -f
<mapfile> -O -d <db_name>
-i, --noprompts System does not prompt for the validation of the new
settings before performing the re_ip operation. Prompting is on
by default.
-T, --point-to-point Sets the control messaging mode of a database to
point-to-point. Usage: $ admintools -t re_ip -d
<db_name> -T
-U, --broadcast Sets the control messaging mode of a database to
broadcast. Usage: $ admintools -t re_ip -d <db_name>
-U
-d DB, --database=DB Name of a database. Required with the following
options: -O, -T, -U.
-------------------------------------------------------------------------
Usage: rebalance_data [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-k KSAFETY, --ksafety=KSAFETY
specify the new k value to use
-p PASSWORD, --password=PASSWORD
--script Don't re-balance the data, just provide a script for
later use.
-------------------------------------------------------------------------
Usage: restart_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be restarted
-e EPOCH, --epoch=EPOCH
Epoch at which the database is to be restarted. If
'last' is given as argument the db is restarted from
the last good epoch.
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-k, --allow-fallback-keygen
Generate spread encryption key from Vertica. Use under
support guidance only.
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: restart_node [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts to be restarted
-d DB, --database=DB Name of database whose node is to be restarted
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
--new-host-ips=NEWHOSTS
comma-separated list of new IPs for the hosts to be
restarted
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-F, --force force the node to start and auto recover if necessary
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
--waitfordown-timeout=WAITTIME
Seconds to wait until nodes to be restarted are down
-------------------------------------------------------------------------
Usage: restart_subcluster [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database whose subcluster is to be restarted
-c SCNAME, --subcluster=SCNAME
Name of subcluster to be restarted
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-s NEWHOSTS, --hosts=NEWHOSTS
Comma separated list of new hosts to rebind to the
nodes
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-F, --force Force the nodes in the subcluster to start and auto
recover if necessary
-------------------------------------------------------------------------
Usage: return_epoch [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database
-p PASSWORD, --password=PASSWORD
Database password in single quotes
-------------------------------------------------------------------------
Usage: revive_db [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts to participate in
database
-n NODEHOST, --node-and-host=NODEHOST
pair of nodename-hostname values delimited by "|" eg:
"v_testdb_node0001|10.0.0.1"Note: Each node-host pair
has to be specified as a new argument
--communal-storage-location=COMMUNAL_STORAGE_LOCATION
Location of communal storage
-x COMMUNAL_STORAGE_PARAMS, --communal-storage-params=COMMUNAL_STORAGE_PARAMS
Location of communal storage parameter file
-d DBNAME, --database=DBNAME
Name of database to be revived
--force Force cleanup of existing catalog directory
--display-only Describe the database on communal storage, and exit
--strict-validation Print warnings instead of raising errors while
validating cluster_config.json
-------------------------------------------------------------------------
Usage: sandbox_subcluster [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be modified
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-c SCNAME, --subcluster=SCNAME
Name of subcluster to be sandboxed
-b SBNAME, --sandbox=SBNAME
Name of the sandbox
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: set_restart_policy [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database for which to set policy
-p POLICY, --policy=POLICY
Restart policy: ('never', 'ksafe', 'always')
-------------------------------------------------------------------------
Usage: set_ssl_params [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database whose parameters will be set
-k KEYFILE, --ssl-key-file=KEYFILE
Path to SSL private key file
-c CERTFILE, --ssl-cert-file=CERTFILE
Path to SSL certificate file
-a CAFILE, --ssl-ca-file=CAFILE
Path to SSL CA file
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-------------------------------------------------------------------------
Usage: show_active_db [options]
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: start_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be started
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-F, --force force the database to start at an epoch before data
consistency problems were detected.
-U, --unsafe Start database unsafely, skipping recovery. Use under
support guidance only.
-k, --allow-fallback-keygen
Generate spread encryption key from Vertica. Use under
support guidance only.
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts to be started
--fast Attempt fast startup on un-encrypted eon db. Fast
startup will use startup information from
cluster_config.json
-------------------------------------------------------------------------
Usage: stop_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be stopped
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-F, --force Force the databases to shutdown, even if users are
connected.
-z, --if-no-users Only shutdown if no users are connected.
If any users are connected, exit with an error.
-n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
Eon db only: seconds to wait for user connections to close.
Default value is 60 seconds.
When the time expires, connections will be forcibly closed
and the db will shut down.
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: stop_host [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGTERM signal
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
-------------------------------------------------------------------------
Usage: stop_node [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGTERM signal
--compat21 (deprecated) Use Vertica 2.1 method using node names
instead of hostnames
-------------------------------------------------------------------------
Usage: stop_subcluster [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database whose subcluster is to be stopped
-c SCNAME, --subcluster=SCNAME
Name of subcluster to be stopped
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
Seconds to wait for user connections to close.
Default value is 60 seconds.
When the time expires, connections will be forcibly closed
and the db will shut down.
-F, --force Force the subcluster to shutdown immediately,
even if users are connected.
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: uninstall_package [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-p PASSWORD, --password=PASSWORD
database admin password
-P PACKAGE, --package=PACKAGE
specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: unsandbox_subcluster [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be modified
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-c SCNAME, --subcluster=SCNAME
Name of subcluster to be un-sandboxed
--timeout=NONINTERACTIVE_TIMEOUT
set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i)
-i, --noprompts do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
-------------------------------------------------------------------------
Usage: upgrade_license_key --database mydb --license my_license.key
upgrade_license_key --install --license my_license.key
Updates the vertica license.
Without '--install', updates the license used by the database and
the admintools license cache.
With '--install', updates the license cache in admintools that
is used for creating new databases.
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database. Cannot be used with --install.
-l LICENSE, --license=LICENSE
Required - path to the license.
-i, --install When option is included, command will only update the
admintools license cache. Cannot be used with
--database.
-p PASSWORD, --password=PASSWORD
Database password.
-------------------------------------------------------------------------
Usage: view_cluster [options]
Options:
-h, --help show this help message and exit
-x, --xpand show the full cluster state, node by node
-d DB, --database=DB filter the output for a single database
6 - Operating the database
This topic explains how to start and stop your Vertica database, and how to use the database index tool.
6.1 - Starting the database
You can start a database through one of the following:
You can start a database with the Vertica Administration Tools:
-
Open the Administration Tools and select View Database Cluster State to make sure all nodes are down and no other database is running.
-
Open the Administration Tools. See Using the administration tools for information about accessing the Administration Tools.
-
On the Main Menu, select Start Database, and then select OK.
-
Select the database to start, and then click OK.
Caution
Start only one database at a time. If you start more than one database at a given time, results can be unpredictable: users might encounter resource conflicts or perform operations on the wrong database.
-
Enter the database password and click OK.
-
When prompted that the database started successfully, click OK.
-
Check the log files to make sure that no startup problems occurred.
Command line
You can start a database with the command line tool start_db
:
$ /opt/vertica/bin/admintools -t start_db -d db-name
[-p password]
[-s host1[,...] | --hosts=host1[,...]]
[--timeout seconds]
[-i | --noprompts]
[--fast]
[-F | --force]
Option            Description

-d, --database    Name of database to start.

-p, --password    Required only during database creation, when you install a
                  new license. If the license is valid, the option -p (or
                  --password) is not required to start the database and is
                  silently ignored. This is by design: the database can be
                  started only by the user who initially created it (as part
                  of the verticadba UNIX user group) or who has root or su
                  privileges. If the license is invalid, Vertica uses the -p
                  password argument to attempt to upgrade the license with
                  the license file stored in
                  /opt/vertica/config/share/license.key.

-s, --hosts       (Eon Mode only) Comma-delimited list of primary node host
                  names or IP addresses. If you use this option, start_db
                  attempts to start the database using just the nodes in the
                  list. If omitted, start_db starts all database nodes. For
                  details, see Eon Mode database node startup below.

--timeout         Timeout, in seconds, to await startup completion. If set
                  to never, start_db never times out (implicitly sets -i).

-i, --noprompts   Startup does not pause to await user input. Setting -i
                  implies a timeout of 1200 seconds.

--fast            (Eon Mode only) Attempts fast startup on a database using
                  startup information from cluster_config.json. This option
                  can be used only with databases that do not use Spread
                  encryption.

-F, --force       Forces the database to start at an epoch before data
                  consistency problems were detected.
The following example uses start_db
to start a single-node database:
$ /opt/vertica/bin/admintools -t start_db -d VMart
Info:
no password specified, using none
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (UP)
Database VMart started successfully
Eon Mode database node startup
On starting an Eon Mode database, you can start all primary nodes, or a subset of them. In both cases, pass start_db
the list of the primary nodes to start with the -s
option.
The following requirements apply:
- Primary node hosts must already be up to start the database.
- The
start_db
tool cannot start stopped hosts such as cloud-based VMs. You must either manually start the hosts or use the MC to start the cluster.
The following example starts the three primary nodes in a six-node Eon Mode database:
$ admintools -t start_db -d verticadb -p 'password' \
-s 10.11.12.10,10.11.12.20,10.11.12.30
Starting nodes:
v_verticadb_node0001 (10.11.12.10)
v_verticadb_node0002 (10.11.12.20)
v_verticadb_node0003 (10.11.12.30)
Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
Node Status: v_verticadb_node0001: (DOWN) v_verticadb_node0002: (DOWN) v_verticadb_node0003: (DOWN)
Node Status: v_verticadb_node0001: (UP) v_verticadb_node0002: (UP) v_verticadb_node0003: (UP)
Syncing catalog on verticadb with 2000 attempts.
Database verticadb: Startup Succeeded. All Nodes are UP
After the database starts, the secondary subclusters are down. You can choose to start them as needed. See Starting a Subcluster.
Starting the database with a subset of primary nodes
As a best practice, Vertica recommends that you always start an Eon Mode database with all primary nodes. Occasionally, you might be unable to start the hosts for all primary nodes. In that case, you might need to start the database with a subset of its primary nodes.
If start_db
specifies a subset of database primary nodes, the following requirements apply:
- The nodes must comprise a quorum: at least 50% + 1 of all primary nodes in the cluster.
- Collectively, the nodes must provide coverage for all shards in communal storage. The primary nodes you use to start the database do not attempt to rebalance shard subscriptions while starting up.
If either or both of these conditions are not met, start_db
returns an error. In the following example, start_db
specifies three primary nodes in a database with nine primary nodes. The command returns an error that it cannot start the database with fewer than five primary nodes:
$ admintools -t start_db -d verticadb -p 'password' \
-s 10.11.12.10,10.11.12.20,10.11.12.30
Starting nodes:
v_verticadb_node0001 (10.11.12.10)
v_verticadb_node0002 (10.11.12.20)
v_verticadb_node0003 (10.11.12.30)
Error: Quorum not satisfied for verticadb.
3 < minimum 5 of 9 primary nodes.
Attempted to start the following nodes:
Primary
v_verticadb_node0001 (10.11.12.10)
v_verticadb_node0003 (10.11.12.30)
v_verticadb_node0002 (10.11.12.20)
Secondary
hint: you may want to start all primary nodes in the database
Database start up failed. Cluster partitioned.
If you try to start the database with fewer than the full set of primary nodes and the cluster fails to start, Vertica processes might continue to run on some of the hosts. If so, subsequent attempts to start the database will return with an error like this:
Error: the vertica process for the database is running on the following hosts:
10.11.12.10
10.11.12.20
10.11.12.30
This may be because the process has not completed previous shutdown activities. Please wait and retry again.
Database start up failed. Processes still running.
Database verticadb did not start successfully: Processes still running.. Hint: you may need to start all primary nodes.
Before you can start the database, you must stop the Vertica server process on the hosts listed in the error message, either with the admintools menus or the admintools command line's stop_host
tool:
$ admintools -t stop_host -s 10.11.12.10,10.11.12.20,10.11.12.30
6.2 - Stopping the database
There are many occasions when you must stop a database, for example, before an upgrade or before performing various maintenance tasks. You can stop a running database through one of the following:
You cannot stop a running database if any users are connected or Database Designer is building or deploying a database design.
To stop a running database with admintools:
-
Verify that all cluster nodes are up. If any nodes are down, identify and restart them.
-
Close all user sessions:
-
Identify all users with active sessions by querying the
SESSIONS
system table. Notify users of the impending shutdown and request them to shut down their sessions.
-
Prevent users from starting new sessions by temporarily resetting configuration parameter MaxClientSessions to 0:
=> ALTER DATABASE DEFAULT SET MaxClientSessions = 0;
-
Close all remaining user sessions with Vertica functions CLOSE_SESSION and CLOSE_ALL_SESSIONS.
Note
You can also force a database shutdown and block new sessions with the function
SHUTDOWN.
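The session-shutdown steps above can be sketched as a single vsql sequence (a minimal sketch; column selection and the order of steps follow this section, adjust to your environment):

```sql
=> SELECT session_id, user_name FROM sessions;        -- identify active sessions
=> ALTER DATABASE DEFAULT SET MaxClientSessions = 0;  -- block new sessions
=> SELECT CLOSE_ALL_SESSIONS();                       -- close remaining sessions
```

After the database restarts, remember to restore MaxClientSessions to its previous value (the default is 50) so that clients can reconnect.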
-
Open Vertica Administration Tools.
-
From the Main Menu:
-
Select Stop Database
-
Click OK
-
Select the database to stop and click OK.
-
Enter the password (if asked) and click OK.
-
When prompted that database shutdown is complete, click OK.
Vertica functions
You can stop a database with the SHUTDOWN function. By default, the shutdown fails if any users are connected. To force a shutdown regardless of active user connections, call SHUTDOWN with an argument of true
:
=> SELECT SHUTDOWN('true');
SHUTDOWN
-------------------------
Shutdown: sync complete
(1 row)
In Eon Mode databases, you can stop subclusters with the SHUTDOWN_SUBCLUSTER and SHUTDOWN_WITH_DRAIN functions. SHUTDOWN_SUBCLUSTER shuts down subclusters immediately, whereas SHUTDOWN_WITH_DRAIN performs a graceful shutdown that drains client connections from subclusters before shutting them down. For more information, see Starting and stopping subclusters.
The following example demonstrates how you can shut down all subclusters in an Eon Mode database using SHUTDOWN_WITH_DRAIN:
=> SELECT SHUTDOWN_WITH_DRAIN('', 0);
NOTICE 0: Begin shutdown of subcluster (default_subcluster, analytics)
SHUTDOWN_WITH_DRAIN
-----------------------------------------------------------------------
Shutdown message sent to subcluster (default_subcluster, analytics)
(1 row)
Command line
You can stop a database with the admintools command stop_db
:
$ admintools -t stop_db --help
Usage: stop_db [options]
Options:
-h, --help Show this help message and exit.
-d DB, --database=DB Name of database to be stopped.
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes.
-F, --force Force the database to shutdown, even if users are
connected.
-z, --if-no-users Only shutdown if no users are connected. If any users
are connected, exit with an error.
-n DRAIN_SECONDS, --drain-seconds=DRAIN_SECONDS
Eon db only: seconds to wait for user connections to
close. Default value is 60 seconds. When the time
expires, connections will be forcibly closed and the
db will shut down.
--timeout=NONINTERACTIVE_TIMEOUT
Set a timeout (in seconds) to wait for actions to
complete ('never') will wait forever (implicitly sets
-i).
-i, --noprompts Do not stop and wait for user input(default false).
Setting this implies a timeout of 20 min.
Note
You cannot use both the -z
(--if-no-users
) and -F
(or --force
) options in the same stop_db
call.
stop_db
behavior depends on whether it stops an Eon Mode or Enterprise Mode database.
Stopping an Eon Mode database
In Eon Mode databases, the default behavior of stop_db
is to call SHUTDOWN_WITH_DRAIN to gracefully shut down all subclusters in the database. This graceful shutdown process drains client connections from subclusters before shutting them down.
The stop_db
option -n
(--drain-seconds
) lets you specify the number of seconds to wait—by default, 60—before forcefully closing client connections and shutting down all subclusters. If you set a negative -n
value, the subclusters are marked as draining but do not shut down until all active user sessions disconnect.
In the following example, the database initially has an active client session, but the session closes before the timeout limit is reached and the database shuts down:
$ admintools -t stop_db -d verticadb --password password --drain-seconds 200
Shutdown will use connection draining.
Shutdown will wait for all client sessions to complete, up to 200 seconds
Then it will force a shutdown.
Poller has been running for 0:00:00.000025 seconds since 2022-07-27 17:10:08.292919
------------------------------------------------------------
client_sessions |node_count |node_names
--------------------------------------------------------------
0 |5 |v_verticadb_node0005,v_verticadb_node0006,v_verticadb_node0003,v_verticadb_node0...
1 |1 |v_verticadb_node0001
STATUS: vertica.engine.api.db_client.module is still running on 1 host: nodeIP as of 2022-07-27 17:10:18. See /opt/vertica/log/adminTools.log for full details.
Poller has been running for 0:00:11.371296 seconds since 2022-07-27 17:10:08.292919
...
------------------------------------------------------------
client_sessions |node_count |node_names
--------------------------------------------------------------
0 |5 |v_verticadb_node0002,v_verticadb_node0004,v_verticadb_node0003,v_verticadb_node0...
1 |1 |v_verticadb_node0001
Stopping poller drain_status because it was canceled
Shutdown metafunction complete. Polling until database processes have stopped.
Database verticadb stopped successfully
If you use the -z
(--if-no-users
) option, the database shuts down immediately if there are no active user sessions. Otherwise, the stop_db
command returns an error:
$ admintools -t stop_db -d verticadb --password password --if-no-users
Running shutdown metafunction. Not using connection draining
Active session details
| Session id | Host Ip | Connected User |
| ------- -- | ---- -- | --------- ---- |
| v_verticadb_node0001-107720:0x257 | 192.168.111.31 | analyst |
Database verticadb not stopped successfully for the following reason:
Shutdown did not complete. Message: Shutdown: aborting shutdown
Active sessions prevented shutdown.
Omit the option --if-no-users to close sessions. See stop_db --help.
You can use the -F
(or --force
) option to shut down all subclusters immediately, without checking for active user sessions or draining the subclusters.
Stopping an Enterprise Mode database
In Enterprise Mode databases, the default behavior of stop_db
is to shut down the database only if there are no active sessions. If users are connected to the database, the command aborts with an error message and lists all active sessions. For example:
$ /opt/vertica/bin/admintools -t stop_db -d VMart
Info: no password specified, using none
Active session details
| Session id | Host Ip | Connected User |
| ------- -- | ---- -- | --------- ---- |
| v_vmart_node0001-91901:0x162 | 10.20.100.247 | analyst |
Database VMart not stopped successfully for the following reason:
Unexpected output from shutdown: Shutdown: aborting shutdown
NOTICE: Cannot shut down while users are connected
You can use the -F
(or --force
) option to override user connections and force a shutdown.
6.3 - CRC and sort order check
As a superuser, you can run the Index tool on a Vertica database to perform two tasks:
If the database is down, invoke the Index tool from the Linux command line. If the database is up, invoke it from VSQL with the Vertica meta-function
RUN_INDEX_TOOL
:
Operation         Database down                                 Database up
Run CRC           /opt/vertica/bin/vertica -D catalog-path -v   SELECT RUN_INDEX_TOOL ('checkcrc',... );
Check sort order  /opt/vertica/bin/vertica -D catalog-path -I   SELECT RUN_INDEX_TOOL ('checksort',... );
If invoked from the command line, the Index tool runs only on the current node. However, you can run the Index tool on multiple nodes simultaneously.
Result output
The Index tool writes summary information about its operation to standard output; detailed information on results is logged in one of two locations, depending on the environment where you invoke the tool:
Invoked from:       Results written to:
Linux command line  indextool.log in the database catalog directory
VSQL                vertica.log on the current node
For information about evaluating output for possible errors, see:
You can optimize meta-function performance by narrowing the scope of the operation to one or more projections, and specifying the number of threads used to execute the function. For details, see
RUN_INDEX_TOOL
.
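For example, a sketch of narrowing a CRC check to a single projection and running it with multiple threads (the projection name is illustrative; see RUN_INDEX_TOOL for the exact signature and defaults):

```sql
=> SELECT RUN_INDEX_TOOL('checkcrc', false, 'public.sales_b0', 4);
```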
6.3.1 - Evaluating CRC errors
Vertica evaluates the CRC values in each ROS data block each time it fetches data from disk to process a query. If CRC errors occur while fetching data, the following information is written to the vertica.log
file:
CRC Check Failure Details:
File Name:
File Offset:
Compressed size in file:
Memory Address of Read Buffer:
Pointer to Compressed Data:
Memory Contents:
The Event Manager is also notified of CRC errors, so you can use an SNMP trap to capture CRC errors:
"CRC mismatch detected on file <file_path>. File may be corrupted. Please check hardware and drivers."
If you run a query from vsql, ODBC, or JDBC, the query returns a FileColumnReader ERROR
. This message indicates that a specific block's CRC does not match a given record as follows:
hint: Data file may be corrupt. Ensure that all hardware (disk and memory) is working properly.
Possible solutions are to delete the file <pathname> while the node is down, and then allow the node
to recover, or truncate the table data.
code: ERRCODE_DATA_CORRUPTED
6.3.2 - Evaluating sort order errors
If ROS data is not sorted correctly in the projection's order, query results that rely on sorted data will be incorrect. You can use the Index tool to check the ROS sort order if you suspect or detect incorrect query results. The Index tool evaluates each ROS row to determine whether it is sorted correctly. If the check locates a row that is not in order, it writes an error message to the log file with the row number and contents of the unsorted row.
Reviewing errors
-
Open the indextool.log
file. For example:
$ cd VMart/v_check_node0001_catalog
-
Look for error messages that include an OID number and the string Sort Order Violation
. For example:
<INFO> ...on oid 45035996273723545: Sort Order Violation:
-
Find detailed information about the sort order violation string by running grep
on indextool.log
. For example, the following command returns the line before each string (-B1
), and the four lines that follow (-A4
):
[15:07:55][vertica-s1]: grep -B1 -A4 'Sort Order Violation:' /my_host/databases/check/v_check_node0001_catalog/indextool.log
2012-06-14 14:07:13.686 unknown:0x7fe1da7a1950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723537:
Sort Order Violation:
Row Position: 624
Column Index: 0
Last Row: 2576000
This Row: 2575000
--
2012-06-14 14:07:13.687 unknown:0x7fe1dafa2950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723545:
Sort Order Violation:
Row Position: 3
Column Index: 0
Last Row: 4
This Row: 2
--
-
Find the projection where a sort order violation occurred by querying system table
STORAGE_CONTAINERS
. Use a storage_oid
equal to the OID value listed in indextool.log
. For example:
=> SELECT * FROM storage_containers WHERE storage_oid = 45035996273723545;
7 - Working with native tables
You can create two types of native tables in Vertica (ROS format), columnar and flexible.
You can create two types of native tables in Vertica (ROS format), columnar and flexible. You can create both types as persistent or temporary. You can also create views that query a specific set of table columns.
The tables described in this section store their data in and are managed by the Vertica database. Vertica also supports external tables, which are defined in the database and store their data externally. For more information about external tables, see Working with external data.
7.1 - Creating tables
Use the CREATE TABLE statement to create a native table in the Vertica logical schema.
Use the CREATE TABLE statement to create a native table in the Vertica logical schema. You can specify the columns directly, as in the following example, or you can derive a table definition from another table using a LIKE or AS clause. You can specify constraints, partitioning, segmentation, and other factors. For details and restrictions, see the reference page.
The following example shows a basic table definition:
=> CREATE TABLE orders(
orderkey INT,
custkey INT,
prodkey ARRAY[VARCHAR(10)],
orderprices ARRAY[DECIMAL(12,2)],
orderdate DATE
);
Table data storage
Unlike traditional databases that store data in tables, Vertica physically stores table data in projections, which are collections of table columns. Projections store data in a format that optimizes query execution. Similar to materialized views, they store result sets on disk rather than compute them each time they are used in a query.
In order to query or perform any operation on a Vertica table, the table must have one or more projections associated with it. For more information, see Projections.
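For example, a minimal projection for the orders table shown earlier might look like the following sketch (the projection name, sort order, and segmentation are illustrative):

```sql
-- Illustrative projection; Vertica also creates a superprojection
-- automatically when you first load data into a table.
=> CREATE PROJECTION orders_p AS
     SELECT orderkey, custkey, prodkey, orderprices, orderdate
     FROM orders
     ORDER BY orderdate
     SEGMENTED BY HASH(orderkey) ALL NODES;
```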
Deriving a table definition from the data
You can use the INFER_TABLE_DDL function to inspect Parquet, ORC, JSON, or Avro data and produce a starting point for a table definition. This function returns a CREATE TABLE statement, which might require further editing. For columns where the function could not infer the data type, the function labels the type as unknown and emits a warning. For VARCHAR and VARBINARY columns, you might need to adjust the length. Always review the statement the function returns, but especially for tables with many columns, using this function can save time and effort.
Parquet, ORC, and Avro files include schema information, but JSON files do not. For JSON, the function inspects the raw data to produce one or more candidate table definitions. See the function reference page for JSON examples.
In the following example, the function infers a complete table definition from Parquet input, but the VARCHAR columns use the default size and might need to be adjusted:
=> SELECT INFER_TABLE_DDL('/data/people/*.parquet'
USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_TABLE_DDL
-------------------------------------------------------------------------
create table "employees"(
"employeeID" int,
"personal" Row(
"name" varchar,
"address" Row(
"street" varchar,
"city" varchar,
"zipcode" int
),
"taxID" int
),
"department" varchar
);
(1 row)
For Parquet files, you can use the GET_METADATA function to inspect a file and report metadata including information about columns.
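For example, a GET_METADATA call on a single file might look like this (the file path is hypothetical):

```sql
-- Hypothetical path; reports file-level and column-level Parquet metadata.
=> SELECT GET_METADATA('/data/people/employees.parquet');
```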
See also
7.2 - Creating temporary tables
CREATE TEMPORARY TABLE creates a table whose data persists only during the current session.
CREATE TEMPORARY TABLE creates a table whose data persists only during the current session. Temporary table data is never visible to other sessions.
By default, all temporary table data is transaction-scoped—that is, the data is discarded when a COMMIT statement ends the current transaction. If CREATE TEMPORARY TABLE includes the parameter ON COMMIT PRESERVE ROWS, table data is retained until the current session ends.
Temporary tables can be used to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated—for example, the tool first gets a result set, then queries the result set, and so on.
When you create a temporary table, Vertica automatically generates a default projection for it. For more information, see Auto-projections.
Global versus local tables
CREATE TEMPORARY TABLE can create tables at two scopes, global and local, through the keywords GLOBAL and LOCAL, respectively:
Global temporary tables |
Vertica creates global temporary tables in the public schema. Definitions of these tables are visible to all sessions, and persist across sessions until they are explicitly dropped. Multiple users can access the table concurrently. Table data is session-scoped, so it is visible only to the session user, and is discarded when the session ends. |
Local temporary tables |
Vertica creates local temporary tables in the V_TEMP_SCHEMA namespace and inserts them transparently into the user's search path. These tables are visible only to the session where they are created. When the session ends, Vertica automatically drops the table and its data. |
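The two scopes described above can be created as follows (table names are illustrative):

```sql
-- Global: the definition persists across sessions; data remains session-scoped.
=> CREATE GLOBAL TEMPORARY TABLE gtemp (a INT, b INT) ON COMMIT PRESERVE ROWS;

-- Local: both the definition and the data are dropped when the session ends.
=> CREATE LOCAL TEMPORARY TABLE ltemp (a INT, b INT) ON COMMIT PRESERVE ROWS;
```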
Data retention
You can specify whether temporary table data is transaction- or session-scoped:
-
ON COMMIT DELETE ROWS (default): Vertica automatically removes all table data when each transaction ends.
-
ON COMMIT PRESERVE ROWS: Vertica preserves table data across transactions in the current session. Vertica automatically truncates the table when the session ends.
Note
If you create a temporary table with ON COMMIT PRESERVE ROWS, you cannot add projections for that table if it contains data. You must first remove all data from that table with TRUNCATE TABLE.
You can create projections for temporary tables created with ON COMMIT DELETE ROWS, whether populated with data or not. However, CREATE PROJECTION ends any transaction where you might have added data, so projections are always empty.
ON COMMIT DELETE ROWS
By default, Vertica removes all data from a temporary table, whether global or local, when the current transaction ends.
For example:
=> CREATE TEMPORARY TABLE tempDelete (a int, b int);
CREATE TABLE
=> INSERT INTO tempDelete VALUES(1,2);
OUTPUT
--------
1
(1 row)
=> SELECT * FROM tempDelete;
a | b
---+---
1 | 2
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM tempDelete;
a | b
---+---
(0 rows)
If desired, you can use DELETE multiple times within the same transaction to refresh table data repeatedly.
ON COMMIT PRESERVE ROWS
You can specify that a temporary table retain data across transactions in the current session, by defining the table with the keywords ON COMMIT PRESERVE ROWS. Vertica automatically removes all data from the table only when the current session ends.
For example:
=> CREATE TEMPORARY TABLE tempPreserve (a int, b int) ON COMMIT PRESERVE ROWS;
CREATE TABLE
=> INSERT INTO tempPreserve VALUES (1,2);
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM tempPreserve;
a | b
---+---
1 | 2
(1 row)
=> INSERT INTO tempPreserve VALUES (3,4);
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM tempPreserve;
a | b
---+---
1 | 2
3 | 4
(2 rows)
Eon restrictions
The following Eon Mode restrictions apply to temporary tables:
7.3 - Creating a table from other tables
You can create a table from other tables in two ways.
You can create a table from other tables in two ways:
Important
You can also copy one table to another with the Vertica function
COPY_TABLE
.
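For example, a minimal COPY_TABLE call might look like this (table names are illustrative):

```sql
-- Copies the source table's DDL and data to the target table.
=> SELECT COPY_TABLE('public.source_table', 'public.target_table');
```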
7.3.1 - Replicating a table
You can create a table from an existing one using CREATE TABLE with the LIKE clause.
You can create a table from an existing one using CREATE TABLE with the LIKE
clause:
CREATE TABLE [ IF NOT EXISTS ] [[ { namespace. | database. } ]schema.]table
LIKE [[ { namespace. | database. } ]schema.]existing-table
[ { INCLUDING | EXCLUDING } PROJECTIONS ]
[ { INCLUDE | EXCLUDE } [SCHEMA] PRIVILEGES ]
Creating a table with LIKE
replicates the source table definition and any storage policy associated with it. Table data and expressions on columns are not copied to the new table.
The user performing the operation owns the new table.
The source table cannot have out-of-date projections and cannot be a temporary table.
Copying constraints
CREATE TABLE LIKE
copies all table constraints except for:
- Foreign key constraints.
- Sequence column constraints.
For any column that obtains its values from a sequence, including IDENTITY columns, Vertica copies the column values into the new table, but removes the original constraint. For example, the following table definition sets an IDENTITY constraint on the ID column:
=> CREATE TABLE public.Premium_Customer
(
ID IDENTITY,
lname varchar(25),
fname varchar(25),
store_membership_card int
);
The following CREATE TABLE LIKE
statement uses the source table Premium_Customer
to create the replica All_Customers
. Vertica removes the IDENTITY
constraint, changing the column to an integer column with a NOT NULL
constraint:
=> CREATE TABLE All_Customers LIKE Premium_Customer;
CREATE TABLE
=> SELECT export_tables('','All_Customers');
export_tables
---------------------------------------------------
CREATE TABLE public.All_Customers
(
ID int NOT NULL,
lname varchar(25),
fname varchar(25),
store_membership_card int
);
(1 row)
Including projections
You can qualify the LIKE
clause with INCLUDING PROJECTIONS
or EXCLUDING PROJECTIONS
, which specify whether to copy projections from the source table:
-
EXCLUDING PROJECTIONS
(default): Do not copy projections from the source table.
-
INCLUDING PROJECTIONS
: Copy current projections from the source table. Vertica names the new projections according to Vertica naming conventions, to avoid name conflicts with existing objects.
Including schema privileges
You can specify default inheritance of schema privileges for the new table:
For more information, see Setting privilege inheritance on tables and views.
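For example, using the syntax shown above, the new table can explicitly inherit privileges from its schema (table names are illustrative):

```sql
-- INCLUDE SCHEMA PRIVILEGES: the replica inherits its schema's privileges.
=> CREATE TABLE public.orders_copy LIKE public.orders INCLUDE SCHEMA PRIVILEGES;
```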
Examples
-
Create the table states
:
=> CREATE TABLE states (
state char(2) NOT NULL, bird varchar(20), tree varchar (20), tax float, stateDate char (20))
PARTITION BY state;
-
Populate the table with data:
INSERT INTO states VALUES ('MA', 'chickadee', 'american_elm', 5.675, '07-04-1620');
INSERT INTO states VALUES ('VT', 'Hermit_Thrasher', 'Sugar_Maple', 6.0, '07-04-1610');
INSERT INTO states VALUES ('NH', 'Purple_Finch', 'White_Birch', 0, '07-04-1615');
INSERT INTO states VALUES ('ME', 'Black_Cap_Chickadee', 'Pine_Tree', 5, '07-04-1615');
INSERT INTO states VALUES ('CT', 'American_Robin', 'White_Oak', 6.35, '07-04-1618');
INSERT INTO states VALUES ('RI', 'Rhode_Island_Red', 'Red_Maple', 5, '07-04-1619');
-
View the table contents:
=> SELECT * FROM states;
state | bird | tree | tax | stateDate
-------+---------------------+--------------+-------+----------------------
VT | Hermit_Thrasher | Sugar_Maple | 6 | 07-04-1610
CT | American_Robin | White_Oak | 6.35 | 07-04-1618
RI | Rhode_Island_Red | Red_Maple | 5 | 07-04-1619
MA | chickadee | american_elm | 5.675 | 07-04-1620
NH | Purple_Finch | White_Birch | 0 | 07-04-1615
ME | Black_Cap_Chickadee | Pine_Tree | 5 | 07-04-1615
(6 rows)
-
Create a sample projection and refresh:
=> CREATE PROJECTION states_p AS SELECT state FROM states;
=> SELECT START_REFRESH();
-
Create a table like the states
table and include its projections:
=> CREATE TABLE newstates LIKE states INCLUDING PROJECTIONS;
-
View projections for the two tables. Vertica has copied projections from states
to newstates
:
=> \dj
List of projections
Schema | Name | Owner | Node | Comment
-------------------------------+-------------------------------------------+---------+------------------+---------
public | newstates_b0 | dbadmin | |
public | newstates_b1 | dbadmin | |
public | newstates_p_b0 | dbadmin | |
public | newstates_p_b1 | dbadmin | |
public | states_b0 | dbadmin | |
public | states_b1 | dbadmin | |
public | states_p_b0 | dbadmin | |
public | states_p_b1 | dbadmin | |
-
Query the new table:
=> SELECT * FROM newstates;
state | bird | tree | tax | stateDate
-------+------+------+-----+-----------
(0 rows)
When you use the CREATE TABLE LIKE
statement, storage policy objects associated with the table are also copied. Data added to the new table uses the same labeled storage location as the source table, unless you change the storage policy. For more information, see Working With Storage Locations.
See also
7.3.2 - Creating a table from a query
CREATE TABLE can specify an AS clause to create a table from a query.
CREATE TABLE can specify an AS
clause to create a table from query results, as in the following example:
=> CREATE TABLE cust_basic_profile AS SELECT
customer_key, customer_gender, customer_age, marital_status, annual_income, occupation
FROM customer_dimension WHERE customer_age>18 AND customer_gender !='';
CREATE TABLE
=> SELECT customer_age, annual_income, occupation
FROM cust_basic_profile
WHERE customer_age > 23 ORDER BY customer_age;
customer_age | annual_income | occupation
--------------+---------------+--------------------
24 | 469210 | Hairdresser
24 | 140833 | Butler
24 | 558867 | Lumberjack
24 | 529117 | Mechanic
24 | 322062 | Acrobat
24 | 213734 | Writer
...
Labeling the AS clause
You can embed a LABEL hint in an AS
clause in two places:
If the AS clause contains labels in both places, the first label has precedence.
Labels are invalid for external tables.
Loading historical data
You can specify that the query return historical data by adding AT
followed by one of:
-
EPOCH LATEST
: Return data up to but not including the current epoch. The result set includes data from the latest committed DML transaction.
-
EPOCH
integer
: Return data up to and including the specified epoch.
-
TIME '
timestamp
'
: Return data from the epoch at the specified timestamp.
These options are ignored if used to query temporary or external tables.
See Epochs for additional information about how Vertica uses epochs.
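For example, a table holding only committed data might be created as follows, assuming the AT clause precedes the query as described above (names are illustrative):

```sql
-- AT EPOCH LATEST excludes uncommitted data from the result set.
=> CREATE TABLE cust_snapshot AS AT EPOCH LATEST
     SELECT customer_key, annual_income FROM customer_dimension;
```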
Zero-width column handling
If the query returns a column with zero width, Vertica automatically converts it to a VARCHAR(80)
column. For example:
=> CREATE TABLE example AS SELECT '' AS X;
CREATE TABLE
=> SELECT EXPORT_TABLES ('', 'example');
EXPORT_TABLES
----------------------------------------------------------
CREATE TABLE public.example
(
X varchar(80)
);
Requirements and restrictions
-
If you create a temporary table from a query, you must specify ON COMMIT PRESERVE ROWS
in order to load the result set into the table. Otherwise, Vertica creates an empty table.
-
If the query output has expressions other than simple columns, such as constants or functions, you must specify an alias for each expression, or list all columns in the column name list.
-
You cannot use CREATE TABLE AS SELECT
with a SELECT
that returns values of complex types. You can, however, use CREATE TABLE LIKE
.
See also
7.4 - Immutable tables
Many secure systems contain records that must be provably immune to change.
Many secure systems contain records that must be provably immune to change. Protective strategies such as row and block checksums incur high overhead. Moreover, these approaches are not foolproof against unauthorized changes, whether deliberate or inadvertent, by database administrators or other users with sufficient privileges.
Immutable tables are insert-only tables in which existing data cannot be modified, regardless of user privileges. Updating row values and deleting rows are prohibited. Certain changes to table metadata—for example, renaming table columns—are also prohibited, in order to prevent attempts to circumvent these restrictions. Flattened or external tables, which obtain their data from outside sources, cannot be set to be immutable.
You define an existing table as immutable with ALTER TABLE:
ALTER TABLE table SET IMMUTABLE ROWS;
Once set, table immutability cannot be reverted, and is immediately applied to all existing table data, and all data that is loaded thereafter. In order to modify the data of an immutable table, you must copy the data to a new table—for example, with COPY, CREATE TABLE...AS, or COPY_TABLE.
When you execute ALTER TABLE...SET IMMUTABLE ROWS on a table, Vertica sets two columns for that table in the system table TABLES. Together, these columns show when the table was made immutable:
- immutable_rows_since_timestamp: Server system time when immutability was applied. This is valuable for long-term timestamp retrieval and efficient comparison.
- immutable_rows_since_epoch: The epoch that was current when immutability was applied. This setting can help protect the table from attempts to pre-insert records with a future timestamp, so that row's epoch is less than the table's immutability epoch.
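You can check both columns with a query like the following (the table name audit_log is hypothetical):

```sql
-- Shows when immutability was applied to the hypothetical table audit_log.
=> SELECT table_name, immutable_rows_since_timestamp, immutable_rows_since_epoch
     FROM v_catalog.tables
     WHERE table_name = 'audit_log';
```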
Enforcement
The following operations are prohibited on immutable tables:
The following partition management functions are disallowed when the target table is immutable:
Allowed operations
In general, you can execute any DML operation on an immutable table that does not affect existing row data—for example, add rows with COPY or INSERT. After you add data to an immutable table, it cannot be changed.
Tip
A table's immutability can render meaningless certain operations that are otherwise permitted on an immutable table. For example, you can add a column to an immutable table with
ALTER TABLE...ADD COLUMN. However, all values in the new column are set to NULL (unless the column is defined with a
DEFAULT value), and they cannot be updated.
Other allowed operations fall generally into two categories:
-
Changes to a table's DDL that have no effect on its data:
-
Block operations on multiple table rows, or the entire table:
7.5 - Disk quotas
By default, schemas and tables are limited only by available disk space and license capacity.
By default, schemas and tables are limited only by available disk space and license capacity. You can set disk quotas for schemas or individual tables, for example, to support multi-tenancy. Setting, modifying, or removing a disk quota requires superuser privileges.
Most user operations that increase storage size enforce disk quotas. A table can temporarily exceed its quota during some operations, such as recovery. If you lower a quota below the current usage, no data is lost, but you cannot add more data until usage drops below the quota. Treat quotas as advisory, not as hard limits.
A schema quota, if set, must be larger than the largest table quota within it.
A disk quota is a string composed of an integer and a unit of measure (K, M, G, or T), such as '15G' or '1T'. Do not use a space between the number and the unit. No other units of measure are supported.
To set a quota at creation time, use the DISK_QUOTA option for CREATE SCHEMA or CREATE TABLE:
=> CREATE SCHEMA internal DISK_QUOTA '10T';
CREATE SCHEMA
=> CREATE TABLE internal.sales (...) DISK_QUOTA '5T';
CREATE TABLE
=> CREATE TABLE internal.leads (...) DISK_QUOTA '12T';
ROLLBACK 0: Table can not have a greater disk quota than its Schema
To modify, add, or remove a quota on an existing schema or table, use ALTER SCHEMA or ALTER TABLE:
=> ALTER SCHEMA internal DISK_QUOTA '20T';
ALTER SCHEMA
=> ALTER TABLE internal.sales DISK_QUOTA SET NULL;
ALTER TABLE
You can set a quota that is lower than the current usage. The ALTER operation succeeds, the schema or table is temporarily over quota, and you cannot perform operations that increase data usage.
Data that is counted
In Eon Mode, disk usage is an aggregate of all space used by all shards for the schema or table. This value is computed for primary subscriptions only.
In Enterprise Mode, disk usage is the sum of the space used by all storage containers on all nodes for the schema or table. This sum excludes buddy projections but includes all other projections.
Disk usage is calculated based on compressed size.
When quotas are applied
Quotas, if present, affect most DML and ILM operations, including:
The following example shows a failure caused by exceeding a table's quota:
=> CREATE TABLE stats(score int) DISK_QUOTA '1k';
CREATE TABLE
=> COPY stats FROM STDIN;
1
2
3
4
5
\.
ERROR 0: Disk Quota Exceeded for the Table object public.stats
HINT: Delete data and PURGE or increase disk quota at the table level
DELETE does not free space, because deleted data is still preserved in the storage containers. The delete vector that is added by a delete operation does not count against a quota, so deleting is a quota-neutral operation. Disk space for deleted data is reclaimed when you purge it; see Removing table data.
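For example, to actually reclaim space against the quota in the stats table above, you might delete and then purge. This is a sketch, not a definitive sequence; PURGE_TABLE removes only deleted data that precedes the AHM:

```sql
-- DELETE alone is quota-neutral; purging reclaims the disk space.
=> DELETE FROM stats WHERE score <= 3;
=> COMMIT;
=> SELECT PURGE_TABLE('public.stats');
```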
Some uncommon operations, such as ADD COLUMN, RESTORE, and SWAP PARTITION, can create new storage containers during the transaction. These operations clean up the extra locations upon completion, but while the operation is in progress, a table or schema could exceed its quota. If you get disk-quota errors during these operations, you can temporarily increase the quota, perform the operation, and then reset it.
Quotas do not affect recovery, rebalancing, or Tuple Mover operations.
Monitoring
The DISK_QUOTA_USAGES system table shows current disk usage for tables and schemas that have quotas. This table does not report on objects that do not have quotas.
You can use this table to monitor usage and make decisions about adjusting quotas:
=> SELECT * FROM DISK_QUOTA_USAGES;
object_oid | object_name | is_schema | total_disk_usage_in_bytes | disk_quota_in_bytes
-------------------+-------------+-----------+---------------------+---------------------
45035996273705100 | s | t | 307 | 10240
45035996273705104 | public.t | f | 614 | 1024
45035996273705108 | s.t | f | 307 | 2048
(3 rows)
7.6 - Managing table columns
After you define a table, you can use ALTER TABLE to modify existing table columns.
After you define a table, you can use
ALTER TABLE
to modify existing table columns. You can perform the following operations on a column:
7.6.1 - Renaming columns
You rename a column with ALTER TABLE as follows.
You rename a column with ALTER TABLE
as follows:
ALTER TABLE [schema.]table-name RENAME [ COLUMN ] column-name TO new-column-name
The following example renames a column in the Retail.Product_Dimension
table from Product_description
to Item_description
:
=> ALTER TABLE Retail.Product_Dimension
RENAME COLUMN Product_description TO Item_description;
If you rename a column that is referenced by a view, the column does not appear in the result set of the view even if the view uses the wild card (*) to represent all columns in the table. Recreate the view to incorporate the column's new name.
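For example, after the rename above, you might recreate a dependent view as follows (the view name is hypothetical):

```sql
-- Recreating the view picks up the renamed Item_description column.
=> CREATE OR REPLACE VIEW Retail.Product_View AS
     SELECT * FROM Retail.Product_Dimension;
```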
7.6.2 - Changing scalar column data type
In general, you can change a column's data type with ALTER TABLE if doing so does not require storage reorganization.
In general, you can change a column's data type with ALTER TABLE if doing so does not require storage reorganization. After you modify a column's data type, data that you load conforms to the new definition.
The sections that follow describe requirements and restrictions associated with changing a column with a scalar (primitive) data type. For information on modifying complex type columns, see Adding a new field to a complex type column.
Supported data type conversions
Vertica supports conversion for the following data types:
Data Types |
Supported Conversions |
Binary |
Expansion and contraction. |
Character |
All conversions between CHAR, VARCHAR, and LONG VARCHAR. |
Exact numeric |
All conversions between the following numeric data types: integer data types—INTEGER, INT, BIGINT, TINYINT, INT8, SMALLINT—and NUMERIC values of precision <=18 and scale 0.
You cannot modify the scale of NUMERIC data types; however, you can change precision in the ranges (0-18), (19-37), and so on.
|
Collection |
The following conversions are supported:
- Collection of one element type to collection of another element type, if the source element type can be coerced to the target element type.
- Between arrays and sets.
- Collection type to the same type (array to array or set to set), to change bounds or binary size.
For details, see Changing Collection Columns.
|
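For example, an integer column can be converted to a NUMERIC type of scale 0 with precision in the 0-18 range (table and column names are illustrative):

```sql
=> CREATE TABLE n1 (c INT);
-- Allowed: INT to NUMERIC(18,0); the scale must remain 0.
=> ALTER TABLE n1 ALTER COLUMN c SET DATA TYPE NUMERIC(18,0);
```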
Unsupported data type conversions
Vertica does not allow data type conversion on types that require storage reorganization:
You also cannot change a column's data type if the column is one of the following:
You can work around some of these restrictions. For details, see Working with column data conversions.
7.6.2.1 - Changing column width
You can expand columns within the same class of data type.
You can expand columns within the same class of data type. Doing so is useful for storing larger items in a column. Vertica validates the data before it performs the conversion.
In general, you can also reduce column widths within the data type class. This is useful to reclaim storage if the original declaration was longer than you need, particularly with strings. You can reduce column width only if the following conditions are true:
Otherwise, Vertica returns an error and the conversion fails. For example, if you try to convert a column from varchar(25)
to varchar(10)
Vertica allows the conversion as long as all column data is no more than 10 characters.
In the following example, columns y
and z
are initially defined as VARCHAR data types, and loaded with values 12345
and 654321
, respectively. The attempt to reduce column z
's width to 5 fails because it contains six-character data. The attempt to reduce column y
's width to 5 succeeds because its content conforms with the new width:
=> CREATE TABLE t (x int, y VARCHAR, z VARCHAR);
CREATE TABLE
=> CREATE PROJECTION t_p1 AS SELECT * FROM t SEGMENTED BY hash(x) ALL NODES;
CREATE PROJECTION
=> INSERT INTO t values(1,'12345','654321');
OUTPUT
--------
1
(1 row)
=> SELECT * FROM t;
x | y | z
---+-------+--------
1 | 12345 | 654321
(1 row)
=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ROLLBACK 2378: Cannot convert column "z" to type "char(5)"
HINT: Verify that the data in the column conforms to the new type
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE char(5);
ALTER TABLE
Changing collection columns
If a column is a collection data type, you can use ALTER TABLE to change either its bounds or its maximum binary size. These properties are set at table creation time and can then be altered.
You can make a collection bounded, setting its maximum number of elements, as in the following example:
=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int,10];
ALTER TABLE
=> \d test.t1
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
test | t1 | arr | array[int8, 10] | 80 | | f | f |
(1 row)
Alternatively, you can set the binary size for the entire collection instead of setting bounds. Binary size is set either explicitly or from the DefaultArrayBinarySize configuration parameter. The following example creates an array column from the default, changes the default, and then uses ALTER TABLE to change it to the new default.
=> SELECT get_config_parameter('DefaultArrayBinarySize');
get_config_parameter
----------------------
100
(1 row)
=> CREATE TABLE test.t1 (arr array[int]);
CREATE TABLE
=> \d test.t1
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
test | t1 | arr | array[int8](96) | 96 | | f | f |
(1 row)
=> ALTER DATABASE DEFAULT SET DefaultArrayBinarySize=200;
ALTER DATABASE
=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int];
ALTER TABLE
=> \d test.t1
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------------+------+---------+----------+-------------+-------------
test | t1 | arr | array[int8](200)| 200 | | f | f |
(1 row)
Alternatively, you can set the binary size explicitly instead of using the default value.
=> ALTER TABLE test.t1 ALTER COLUMN arr SET DATA TYPE array[int](300);
Purging historical data
You cannot reduce a column's width if Vertica retains any historical data that exceeds the new width. To reduce the column width, first remove that data from the table:
-
Advance the AHM to an epoch more recent than the historical data that needs to be removed from the table.
-
Purge the table of all historical data that precedes the AHM with the function
PURGE_TABLE
.
For example, given the previous example, you can update the data in column t.z
as follows:
=> UPDATE t SET z = '54321';
OUTPUT
--------
1
(1 row)
=> SELECT * FROM t;
x | y | z
---+-------+-------
1 | 12345 | 54321
(1 row)
Although no data in column z now exceeds 5 characters, Vertica retains the history of its earlier data, so attempts to reduce the column width to 5 return an error:
=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ROLLBACK 2378: Cannot convert column "z" to type "char(5)"
HINT: Verify that the data in the column conforms to the new type
You can reduce the column width by purging the table's historical data as follows:
=> SELECT MAKE_AHM_NOW();
MAKE_AHM_NOW
-------------------------------
AHM set (New AHM Epoch: 6350)
(1 row)
=> SELECT PURGE_TABLE('t');
PURGE_TABLE
----------------------------------------------------------------------------------------------------------------------
Task: purge operation
(Table: public.t) (Projection: public.t_p1_b0)
(Table: public.t) (Projection: public.t_p1_b1)
(1 row)
=> ALTER TABLE t ALTER COLUMN z SET DATA TYPE char(5);
ALTER TABLE
7.6.2.2 - Working with column data conversions
Vertica conforms to the SQL standard by disallowing certain data conversions for table columns.
Vertica conforms to the SQL standard by disallowing certain data conversions for table columns. However, you sometimes need to work around this restriction when you convert data from a non-SQL database. The following examples describe one such workaround, using the following table:
=> CREATE TABLE sales(id INT, price VARCHAR) UNSEGMENTED ALL NODES;
CREATE TABLE
=> INSERT INTO sales VALUES (1, '$50.00');
OUTPUT
--------
1
(1 row)
=> INSERT INTO sales VALUES (2, '$100.00');
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM SALES;
id | price
----+---------
1 | $50.00
2 | $100.00
(2 rows)
To convert the price
column's existing data type from VARCHAR to NUMERIC, complete these steps:
-
Add a new column for temporary use. Assign the column a NUMERIC data type, and derive its default value from the existing price column.
-
Drop the original price column.
-
Rename the new column to the original column.
Add a new column for temporary use
-
Add a column temp_price
to table sales
. You can use the new column temporarily, setting its data type to what you want (NUMERIC), and deriving its default value from the price
column. Cast the default value for the new column to a NUMERIC data type and query the table:
=> ALTER TABLE sales ADD COLUMN temp_price NUMERIC(10,2) DEFAULT
SUBSTR(sales.price, 2)::NUMERIC;
ALTER TABLE
=> SELECT * FROM SALES;
id | price | temp_price
----+---------+------------
1 | $50.00 | 50.00
2 | $100.00 | 100.00
(2 rows)
-
Use ALTER TABLE to drop the default expression from the new column temp_price
. Vertica retains the values stored in this column:
=> ALTER TABLE sales ALTER COLUMN temp_price DROP DEFAULT;
ALTER TABLE
Drop the original price column
Drop the extraneous price
column. Before doing so, you must first advance the AHM to purge historical data that would otherwise prevent the drop operation:
-
Advance the AHM:
=> SELECT MAKE_AHM_NOW();
MAKE_AHM_NOW
-------------------------------
AHM set (New AHM Epoch: 6354)
(1 row)
-
Drop the original price column:
=> ALTER TABLE sales DROP COLUMN price CASCADE;
ALTER TABLE
Rename the new column to the original column
You can now rename the temp_price
column to price
:
-
Use ALTER TABLE
to rename the column:
=> ALTER TABLE sales RENAME COLUMN temp_price to price;
-
Query the sales table again:
=> SELECT * FROM sales;
id | price
----+--------
1 | 50.00
2 | 100.00
(2 rows)
7.6.3 - Adding a new field to a complex type column
You can add new fields to columns of complex types (any combination or nesting of arrays and structs) in native tables. To add a field to an existing table's column, use a single ALTER TABLE statement.
Requirements and restrictions
The following are requirements and restrictions associated with adding a new field to a complex type column:
- New fields can only be added to rows/structs.
- The new type definition must contain all of the existing fields in the complex type column. Dropping existing fields from the complex type is not allowed. All of the existing fields in the new type must exactly match their definitions in the old type. This requirement also means that existing fields cannot be renamed.
- New fields can only be added to columns of native (non-external) tables.
- New fields can be added at any level within a nested complex type. For example, if you have a column defined as ROW(id INT, name ROW(given_name VARCHAR(20), family_name VARCHAR(20))), you can add a middle_name field to the nested ROW.
- New fields can be of any type, either complex or primitive.
- Blank field names are not allowed when adding new fields. Note that blank field names in complex type columns are allowed when creating the table. Vertica automatically assigns a name to each unnamed field.
- If you change the order of existing fields with ALTER TABLE, the change applies to existing data as well as new data. That is, you can reorder a column's existing fields.
- When you call ALTER COLUMN ... SET DATA TYPE to add a field to a complex type column, Vertica places an O lock on the table. The lock prevents DELETE, UPDATE, INSERT, and COPY statements from accessing the table, and blocks SELECT statements issued at SERIALIZABLE isolation level, until the operation completes.
- Performance is slower when adding a field to an array element than when adding a field to an element not nested in an array.
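The nested case described in the list above can be sketched as follows. This is a minimal illustration; the employees table and its field names are assumptions, not objects from the examples below:
=> CREATE TABLE employees(id INT, name ROW(given_name VARCHAR(20), family_name VARCHAR(20)));
CREATE TABLE
-- The new type must repeat every existing field exactly as originally defined:
=> ALTER TABLE employees ALTER COLUMN name
   SET DATA TYPE ROW(given_name VARCHAR(20), middle_name VARCHAR(20), family_name VARCHAR(20));
ALTER TABLE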
Examples
Adding a field
Consider a company storing customer data:
=> CREATE TABLE customers(id INT, name VARCHAR, address ROW(street VARCHAR, city VARCHAR, zip INT));
CREATE TABLE
The company has just decided to expand internationally, so now needs to add a country field:
=> ALTER TABLE customers ALTER COLUMN address
SET DATA TYPE ROW(street VARCHAR, city VARCHAR, zip INT, country VARCHAR);
ALTER TABLE
You can view the table definition to confirm the change:
=> \d customers
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+----------------------------------------------------------------------+------+---------+----------+-------------+-------------
public | customers | id | int | 8 | | f | f |
public | customers | name | varchar(80) | 80 | | f | f |
public | customers | address | ROW(street varchar(80),city varchar(80),zip int,country varchar(80)) | -1 | | f | f |
(3 rows)
You can also see that the country field remains null for existing customers:
=> SELECT * FROM customers;
id | name | address
----+------+--------------------------------------------------------------------------------
1 | mina | {"street":"1 allegheny square east","city":"hamden","zip":6518,"country":null}
(1 row)
Common error messages
While you can add one or more fields with a single ALTER TABLE statement, existing fields cannot be removed. The following example throws an error because the city field is missing:
=> ALTER TABLE customers ALTER COLUMN address SET DATA TYPE ROW(street VARCHAR, state VARCHAR, zip INT, country VARCHAR);
ROLLBACK 2377: Cannot convert column "address" from "ROW(varchar(80),varchar(80),int,varchar(80))" to type "ROW(varchar(80),varchar(80),int,varchar(80))"
Similarly, you cannot alter the type of an existing field. The following example will throw an error because the zip field's type cannot be altered:
=> ALTER TABLE customers ALTER COLUMN address SET DATA TYPE ROW(street VARCHAR, city VARCHAR, zip VARCHAR, country VARCHAR);
ROLLBACK 2377: Cannot convert column "address" from "ROW(varchar(80),varchar(80),int,varchar(80))" to type "ROW(varchar(80),varchar(80),varchar(80),varchar(80))"
Additional properties
A complex type column's field order follows the order specified in the ALTER command, allowing you to reorder a column's existing fields. The following example reorders the fields of the address column:
=> ALTER TABLE customers ALTER COLUMN address
SET DATA TYPE ROW(street VARCHAR, country VARCHAR, city VARCHAR, zip INT);
ALTER TABLE
The table definition shows the address column's fields have been reordered:
=> \d customers
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-----------+---------+----------------------------------------------------------------------+------+---------+----------+-------------+-------------
public | customers | id | int | 8 | | f | f |
public | customers | name | varchar(80) | 80 | | f | f |
public | customers | address | ROW(street varchar(80),country varchar(80),city varchar(80),zip int) | -1 | | f | f |
(3 rows)
Note that you cannot add new fields with empty names. When creating a complex table, however, you can omit field names, and Vertica automatically assigns a name to each unnamed field:
=> CREATE TABLE products(name VARCHAR, description ROW(VARCHAR));
CREATE TABLE
Because the field created in the description
column has not been named, Vertica assigns it a default name. This default name can be checked in the table definition:
=> \d products
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+----------+-------------+---------------------+------+---------+----------+-------------+-------------
public | products | name | varchar(80) | 80 | | f | f |
public | products | description | ROW(f0 varchar(80)) | -1 | | f | f |
(2 rows)
Above, we see that the VARCHAR field in the description
column was automatically assigned the name f0
. When adding new fields, you must specify the existing Vertica-assigned field name:
=> ALTER TABLE products ALTER COLUMN description
SET DATA TYPE ROW(f0 VARCHAR(80), expanded_description VARCHAR(200));
ALTER TABLE
7.6.4 - Defining column values
You can define a column so Vertica automatically sets its value from an expression through one of the following clauses:
-
DEFAULT
-
SET USING
-
DEFAULT USING
DEFAULT
The DEFAULT option sets column values to a specified value. It has the following syntax:
DEFAULT default-expression
Default values are set when you:
-
Load new rows into a table, for example, with INSERT or COPY. Vertica populates DEFAULT columns in new rows with their default values. Values in existing rows, including columns with DEFAULT expressions, remain unchanged.
-
Execute UPDATE on a table and set the value of a DEFAULT column to DEFAULT
:
=> UPDATE table-name SET column-name=DEFAULT;
-
Add a column with a DEFAULT expression to an existing table. Vertica populates the new column with its default values when it is added to the table.
Note
Altering an existing table column to specify a DEFAULT expression has no effect on existing values in that column. Vertica applies the DEFAULT expression only on new rows when they are added to the table, through load operations such as INSERT and COPY. To refresh all values in a column with the column's DEFAULT expression, update the column as shown above.
Restrictions
DEFAULT expressions cannot specify volatile functions with ALTER TABLE...ADD COLUMN. To specify volatile functions, use CREATE TABLE or ALTER TABLE...ALTER COLUMN statements.
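For example, the following sketch shows the two-statement workaround, using the volatile function RANDOM and a hypothetical table t_rand:
=> CREATE TABLE t_rand (id INT);
CREATE TABLE
-- Specifying a volatile function with ADD COLUMN returns an error:
-- ALTER TABLE t_rand ADD COLUMN seed FLOAT DEFAULT RANDOM();
-- Instead, add the column first, then set its default:
=> ALTER TABLE t_rand ADD COLUMN seed FLOAT;
ALTER TABLE
=> ALTER TABLE t_rand ALTER COLUMN seed SET DEFAULT RANDOM();
ALTER TABLE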
SET USING
The SET USING option sets the column value to an expression when the function REFRESH_COLUMNS is invoked on that column. This option has the following syntax:
SET USING using-expression
This approach is useful for large denormalized (flattened) tables, where multiple columns get their values by querying other tables.
Restrictions
SET USING has the following restrictions:
DEFAULT USING
The DEFAULT USING option sets DEFAULT and SET USING constraints on a column, equivalent to using DEFAULT and SET USING separately with the same expression on the same column. It has the following syntax:
DEFAULT USING expression
For example, the following column definitions are effectively identical:
=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
DEFAULT USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));
=> ALTER TABLE public.orderFact ADD COLUMN cust_name varchar(20)
DEFAULT (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid))
SET USING (SELECT name FROM public.custDim WHERE (custDim.cid = orderFact.cid));
DEFAULT USING supports the same expressions as SET USING and is subject to the same restrictions.
Supported expressions
DEFAULT and SET USING generally support the same expressions. These include:
-
Queries
-
Other columns in the same table
-
Literals (constants)
-
All operators supported by Vertica
-
The following categories of functions:
Expression restrictions
The following restrictions apply to DEFAULT and SET USING expressions:
-
The return value data type must match or be cast to the column data type.
-
The expression must return a value that conforms to the column bounds. For example, a column that is defined as a VARCHAR(1)
cannot be set to a default string of abc
.
-
In a temporary table, DEFAULT and SET USING do not support subqueries. If you try to create a temporary table where DEFAULT or SET USING use subquery expressions, Vertica returns an error.
-
A column's SET USING expression cannot specify another column in the same table that also sets its value with SET USING. Similarly, a column's DEFAULT expression cannot specify another column in the same table that also sets its value with DEFAULT, or whose value is automatically set to a sequence. However, a column's SET USING expression can specify another column that sets its value with DEFAULT.
Note
You can set a column's DEFAULT expression from another column in the same table that sets its value with SET USING. However, the DEFAULT column is typically set to NULL
, as it is only set on load operations that initially set the SET USING column to NULL
.
-
DEFAULT and SET USING expressions only support one SELECT statement; attempts to include multiple SELECT statements in the expression return an error. For example, given table t1
:
=> SELECT * FROM t1;
a | b
---+---------
1 | hello
2 | world
(2 rows)
Attempting to create table t2
with the following DEFAULT expression returns with an error:
=> CREATE TABLE t2 (aa int, bb varchar(30) DEFAULT (SELECT 'I said ')||(SELECT b FROM t1 where t1.a = t2.aa));
ERROR 9745: Expressions with multiple SELECT statements cannot be used in 'set using' query definitions
Disambiguating predicate columns
If a SET USING or DEFAULT query expression joins two columns with the same name, the column names must include their table names. Otherwise, Vertica assumes that both columns reference the dimension table, and the predicate always evaluates to true.
For example, tables orderFact and custDim both include column cid. Flattened table orderFact defines column cust_name with a SET USING query expression. Because the query predicate references columns cid from both tables, the column names are fully qualified:
=> CREATE TABLE public.orderFact
(
...
cid int REFERENCES public.custDim(cid),
cust_name varchar(20) SET USING (
SELECT name FROM public.custDim WHERE (custDIM.cid = orderFact.cid)),
...
)
Examples
Derive a column's default value from another column
-
Create table t
with two columns, date
and state
, and insert a row of data:
=> CREATE TABLE t (date DATE, state VARCHAR(2));
CREATE TABLE
=> INSERT INTO t VALUES (CURRENT_DATE, 'MA');
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM t;
date | state
------------+-------
2017-12-28 | MA
(1 row)
-
Use ALTER TABLE to add a third column that extracts the integer month value from column date
:
=> ALTER TABLE t ADD COLUMN month INTEGER DEFAULT date_part('month', date);
ALTER TABLE
-
When you query table t
, Vertica returns the number of the month in column date
:
=> SELECT * FROM t;
date | state | month
------------+-------+-------
2017-12-28 | MA | 12
(1 row)
Update default column values
-
Update table t
by subtracting 30 days from date
:
=> UPDATE t SET date = date-30;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM t;
date | state | month
------------+-------+-------
2017-11-28 | MA | 12
(1 row)
The value in month
remains unchanged.
-
Refresh the default value in month
from column date
:
=> UPDATE t SET month=DEFAULT;
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> SELECT * FROM t;
date | state | month
------------+-------+-------
2017-11-28 | MA | 11
(1 row)
Derive a default column value from user-defined scalar function
This example shows a user-defined scalar function that adds two integer values. The function is called add2ints
and takes two arguments.
-
Develop and deploy the function, as described in Scalar functions (UDSFs).
-
Create a sample table, t1
, with two integer columns:
=> CREATE TABLE t1 ( x int, y int );
CREATE TABLE
-
Insert some values into t1:
=> insert into t1 values (1,2);
OUTPUT
--------
1
(1 row)
=> insert into t1 values (3,4);
OUTPUT
--------
1
(1 row)
-
Use ALTER TABLE to add a column to t1
, with the default column value derived from the UDSF add2ints
:
=> ALTER TABLE t1 ADD COLUMN z INT DEFAULT add2ints(x,y);
ALTER TABLE
-
List the new column:
=> SELECT z FROM t1;
z
----
3
7
(2 rows)
Table with a SET USING column that queries another table for its values
-
Define tables t1
and t2
. Column t2.b
is defined to get its data from column t1.b
, through the query in its SET USING clause:
=> CREATE TABLE t1 (a INT PRIMARY KEY ENABLED, b INT);
CREATE TABLE
=> CREATE TABLE t2 (a INT, alpha VARCHAR(10),
b INT SET USING (SELECT t1.b FROM t1 WHERE t1.a=t2.a))
ORDER BY a SEGMENTED BY HASH(a) ALL NODES;
CREATE TABLE
Important
The definition for table t2
includes SEGMENTED BY and ORDER BY clauses that exclude SET USING column b
. If these clauses are omitted, Vertica creates an auto-projection for this table that specifies column b
in its SEGMENTED BY and ORDER BY clauses. Inclusion of a SET USING column in any projection's segmentation or sort order prevents function REFRESH_COLUMNS from populating this column. Instead, it returns with an error.
For details on this and other restrictions, see REFRESH_COLUMNS.
-
Populate the tables with data:
=> INSERT INTO t1 VALUES(1,11),(2,22),(3,33),(4,44);
=> INSERT INTO t2 VALUES (1,'aa'),(2,'bb');
=> COMMIT;
COMMIT
-
View the data in table t2
: SET USING column b
is empty, pending invocation of Vertica function REFRESH_COLUMNS:
=> SELECT * FROM t2;
a | alpha | b
---+-------+---
1 | aa |
2 | bb |
(2 rows)
-
Refresh the column data in table t2
by calling function REFRESH_COLUMNS:
=> SELECT REFRESH_COLUMNS ('t2','b', 'REBUILD');
REFRESH_COLUMNS
---------------------------
refresh_columns completed
(1 row)
In this example, REFRESH_COLUMNS is called with the optional argument REBUILD. This argument specifies to replace all data in SET USING column b
. It is generally good practice to call REFRESH_COLUMNS with REBUILD on any new SET USING column. For details, see REFRESH_COLUMNS.
-
View data in refreshed column b
, whose data is obtained from table t1
as specified in the column's SET USING query:
=> SELECT * FROM t2 ORDER BY a;
a | alpha | b
---+-------+----
1 | aa | 11
2 | bb | 22
(2 rows)
DEFAULT and SET USING expressions support subqueries that can obtain values from other tables, and use those with values in the current table to compute column values. The following example adds a column gmt_delivery_time
to fact table customer_orders
. The column specifies a DEFAULT expression to set values in the new column as follows:
-
Calls the function NEW_TIME, passing it each row's local_delivery_time
value, the customer's time zone (obtained by a subquery on the customers
table), and the target zone 'GMT'.
-
Populates the gmt_delivery_time
column with the converted values.
=> CREATE TABLE public.customers(
customer_key int,
customer_name varchar(64),
customer_address varchar(64),
customer_tz varchar(5),
...);
=> CREATE TABLE public.customer_orders(
customer_key int,
order_number int,
product_key int,
product_version int,
quantity_ordered int,
store_key int,
date_ordered date,
date_shipped date,
expected_delivery_date date,
local_delivery_time timestamptz,
...);
=> ALTER TABLE customer_orders ADD COLUMN gmt_delivery_time timestamp
DEFAULT NEW_TIME(customer_orders.local_delivery_time,
(SELECT c.customer_tz FROM customers c WHERE (c.customer_key = customer_orders.customer_key)),
'GMT');
7.7 - Altering table definitions
You can modify a table's definition with
ALTER TABLE
, in response to evolving database schema requirements. Changing a table definition is often more efficient than staging data in a temporary table, consuming fewer resources and less storage.
For information on making column-level changes, see Managing table columns. For details about changing and reorganizing table partitions, see Partitioning existing table data.
7.7.1 - Adding table columns
You add a column to a persistent table with ALTER TABLE...ADD COLUMN:
ALTER TABLE ...
ADD COLUMN [IF NOT EXISTS] column datatype
[column-constraint]
[ENCODING encoding-type]
[PROJECTIONS (projections-list) | ALL PROJECTIONS ]
An ALTER TABLE statement can include more than one ADD COLUMN clause, separated by commas:
ALTER TABLE...
ADD COLUMN pid INT NOT NULL,
ADD COLUMN desc VARCHAR(200),
ADD COLUMN region INT DEFAULT 1
Columns that use DEFAULT with static values, as shown in the previous example, can be added in a single ALTER TABLE statement. Columns that use non-static DEFAULT values must be added in separate ALTER TABLE statements.
Before you add columns to a table, verify that all its superprojections are up to date.
Table locking
When you use ADD COLUMN to alter a table, Vertica takes an O lock on the table until the operation completes. The lock prevents DELETE, UPDATE, INSERT, and COPY statements from accessing the table. The lock also blocks SELECT statements issued at SERIALIZABLE isolation level, until the operation completes.
Adding a column to a table does not affect K-safety of the physical schema design.
You can add columns when nodes are down.
Adding new columns to projections
When you add a column to a table, Vertica automatically adds the column to superprojections of that table. The ADD COLUMN clause can also specify to add the column to one or more non-superprojections, with one of these options:
-
PROJECTIONS (
projections-list
): Adds the new column to one or more projections of this table, specified as a comma-delimited list of projection base names. Vertica adds the column to all buddies of each projection. The projection list cannot include projections with pre-aggregated data such as live aggregate projections; otherwise, Vertica rolls back the ALTER TABLE statement.
-
ALL PROJECTIONS
adds the column to all projections of this table, excluding projections with pre-aggregated data.
For example, the store_orders
table has two projections, a superprojection (store_orders_super
) and a user-created projection (store_orders_p
). The following ALTER TABLE...ADD COLUMN statement adds a column to the store_orders
table. Because the statement omits the PROJECTIONS option, Vertica adds the column only to the table's superprojection:
=> ALTER TABLE public.store_orders ADD COLUMN expected_ship_date date;
ALTER TABLE
=> SELECT projection_column_name, projection_name FROM projection_columns WHERE table_name ILIKE 'store_orders'
ORDER BY projection_name , projection_column_name;
projection_column_name | projection_name
------------------------+--------------------
order_date | store_orders_p_b0
order_no | store_orders_p_b0
ship_date | store_orders_p_b0
order_date | store_orders_p_b1
order_no | store_orders_p_b1
ship_date | store_orders_p_b1
expected_ship_date | store_orders_super
order_date | store_orders_super
order_no | store_orders_super
ship_date | store_orders_super
shipper | store_orders_super
(11 rows)
The following ALTER TABLE...ADD COLUMN statement includes the PROJECTIONS option. This specifies to include projection store_orders_p
in the add operation. Vertica adds the new column to this projection and the table's superprojection:
=> ALTER TABLE public.store_orders ADD COLUMN delivery_date date PROJECTIONS (store_orders_p);
=> SELECT projection_column_name, projection_name FROM projection_columns WHERE table_name ILIKE 'store_orders'
ORDER BY projection_name, projection_column_name;
projection_column_name | projection_name
------------------------+--------------------
delivery_date | store_orders_p_b0
order_date | store_orders_p_b0
order_no | store_orders_p_b0
ship_date | store_orders_p_b0
delivery_date | store_orders_p_b1
order_date | store_orders_p_b1
order_no | store_orders_p_b1
ship_date | store_orders_p_b1
delivery_date | store_orders_super
expected_ship_date | store_orders_super
order_date | store_orders_super
order_no | store_orders_super
ship_date | store_orders_super
shipper | store_orders_super
(14 rows)
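To add a column to every projection of the table in a single step, you can instead use the ALL PROJECTIONS option. The following is a sketch, assuming the same store_orders table and a hypothetical tracking_no column; any projections with pre-aggregated data are automatically excluded:
=> ALTER TABLE public.store_orders ADD COLUMN tracking_no VARCHAR(20) ALL PROJECTIONS;
ALTER TABLE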
Updating associated table views
Adding new columns to a table that has an associated view does not update the view's result set, even if the view uses a wildcard (*) to represent all table columns. To incorporate new columns, you must recreate the view.
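For example, the following sketch recreates a hypothetical view store_orders_v so that its wildcard expands to include a newly added column. The view and column names are assumptions for illustration:
=> CREATE VIEW store_orders_v AS SELECT * FROM public.store_orders;
CREATE VIEW
=> ALTER TABLE public.store_orders ADD COLUMN carrier VARCHAR(20);
ALTER TABLE
-- The view's result set still omits carrier; recreate the view to include it:
=> CREATE OR REPLACE VIEW store_orders_v AS SELECT * FROM public.store_orders;
CREATE VIEW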
7.7.2 - Dropping table columns
ALTER TABLE...DROP COLUMN drops the specified table column and the ROS containers that correspond to the dropped column:
ALTER TABLE [schema.]table DROP [ COLUMN ] [IF EXISTS] column [CASCADE | RESTRICT]
After the drop operation completes, data backed up from the current epoch onward recovers without the column. Data recovered from a backup that precedes the current epoch re-adds the table column. Because drop operations physically purge object storage and catalog definitions (table history) from the table, AT EPOCH (historical) queries return nothing for the dropped column.
The altered table retains its object ID.
Note
Drop column operations can be fast because these catalog-level changes do not require data reorganization, so Vertica can quickly reclaim disk storage.
Restrictions
-
You cannot drop or alter a primary key column or a column that participates in the table partitioning clause.
-
You cannot drop the first column of any projection sort order, or columns that participate in a projection segmentation expression.
-
In Enterprise Mode, all nodes must be up. This restriction does not apply to Eon Mode.
-
You cannot drop a column associated with an access policy. Attempts to do so produce the following error:
ERROR 6482: Failed to parse Access Policies for table "t1"
Using CASCADE to force a drop
If the table column to drop has dependencies, you must qualify the DROP COLUMN clause with the CASCADE option. For example, the target column might be specified in a projection sort order. In this and other cases, DROP COLUMN...CASCADE handles the dependency by reorganizing catalog definitions or dropping a projection. In all cases, CASCADE performs the minimal reorganization required to drop the column.
Use CASCADE to drop a column with the following dependencies:
-
Any constraint: Vertica drops the column when a FOREIGN KEY constraint depends on a UNIQUE or PRIMARY KEY constraint on the referenced columns.
-
Specified in projection sort order: Vertica truncates the projection sort order up to and including the dropped column, without impact on physical storage for other columns, and then drops the specified column. For example, if a projection's columns are in sort order (a,b,c), dropping column b causes the projection's sort order to be just (a), omitting column (c).
-
Specified in a projection segmentation expression: The column to drop is integral to the projection definition. If possible, Vertica drops the projection, as long as doing so does not compromise K-safety; otherwise, the transaction rolls back.
-
Referenced as default value of another column: See Dropping a Column Referenced as Default, below.
Dropping a column referenced as default
You might want to drop a table column that is referenced by another column as its default value. For example, the following table is defined with two columns, a
and b
, where b
gets its default value from column a
:
=> CREATE TABLE x (a int) UNSEGMENTED ALL NODES;
CREATE TABLE
=> ALTER TABLE x ADD COLUMN b int DEFAULT a;
ALTER TABLE
In this case, dropping column a
requires the following procedure:
-
Remove the default dependency through ALTER COLUMN..DROP DEFAULT:
=> ALTER TABLE x ALTER COLUMN b DROP DEFAULT;
-
Create a replacement superprojection for the target table if one or both of the following conditions is true:
-
The target column is the table's first sort order column. If the table has no explicit sort order, the default table sort order specifies the first table column as the first sort order column. In this case, the new superprojection must specify a sort order that excludes the target column.
-
If the table is segmented, the target column is specified in the segmentation expression. In this case, the new superprojection must specify a segmentation expression that excludes the target column.
Given the previous example, table x
has a default sort order of (a,b). Because column a
is the table's first sort order column, you must create a replacement superprojection that is sorted on column b
:
=> CREATE PROJECTION x_p1 as select * FROM x ORDER BY b UNSEGMENTED ALL NODES;
-
Run
START_REFRESH
:
=> SELECT START_REFRESH();
START_REFRESH
----------------------------------------
Starting refresh background process.
(1 row)
-
Run MAKE_AHM_NOW:
=> SELECT MAKE_AHM_NOW();
MAKE_AHM_NOW
-------------------------------
AHM set (New AHM Epoch: 1231)
(1 row)
-
Drop the column:
=> ALTER TABLE x DROP COLUMN a CASCADE;
Vertica implements the CASCADE directive as follows:
Examples
The following series of commands successfully drops a BYTEA data type column:
=> CREATE TABLE t (x BYTEA(65000), y BYTEA, z BYTEA(1));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624: Column "y" does not exist
=> ALTER TABLE t DROP COLUMN x RESTRICT;
ALTER TABLE
=> SELECT x FROM t;
ERROR 2624: Column "x" does not exist
=> SELECT * FROM t;
z
---
(0 rows)
=> DROP TABLE t CASCADE;
DROP TABLE
The following series of commands tries to drop a FLOAT(8) column and fails because there are not enough projections to maintain K-safety.
=> CREATE TABLE t (x FLOAT(8),y FLOAT(08));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y RESTRICT;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624: Column "y" does not exist
=> ALTER TABLE t DROP x CASCADE;
ROLLBACK 2409: Cannot drop any more columns in t
=> DROP TABLE t CASCADE;
7.7.3 - Altering constraint enforcement
ALTER TABLE...ALTER CONSTRAINT
can enable or disable enforcement of primary key, unique, and check constraints. You must qualify this clause with the keyword ENABLED
or DISABLED
:
For example:
ALTER TABLE public.new_sales ALTER CONSTRAINT C_PRIMARY ENABLED;
For details, see Constraint enforcement.
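For example, you might disable enforcement before a large bulk load and re-enable it afterward. The following sketch assumes the same new_sales table and C_PRIMARY constraint shown above:
=> ALTER TABLE public.new_sales ALTER CONSTRAINT C_PRIMARY DISABLED;
ALTER TABLE
-- ...bulk-load data with COPY...
=> ALTER TABLE public.new_sales ALTER CONSTRAINT C_PRIMARY ENABLED;
ALTER TABLE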
7.7.4 - Renaming tables
ALTER TABLE...RENAME TO
renames one or more tables. Renamed tables retain their original OIDs.
You rename multiple tables by supplying two comma-delimited lists. Vertica maps the names according to their order in the two lists. Only the first list can qualify table names with a schema. For example:
=> ALTER TABLE S1.T1, S1.T2 RENAME TO U1, U2;
The RENAME TO
parameter is applied atomically: all tables are renamed, or none of them. For example, if the number of tables to rename does not match the number of new names, none of the tables is renamed.
Caution
If a table is referenced by a view, renaming it causes the view to fail, unless you create another table with the previous name to replace the renamed table.
Using rename to swap tables within a schema
You can use ALTER TABLE...RENAME TO
to swap tables within the same schema, without actually moving data. You cannot swap tables across schemas.
The following example swaps the data in tables T1
and T2
through intermediary table temp
:
-
t1
to temp
-
t2
to t1
-
temp
to t2
=> DROP TABLE IF EXISTS temp, t1, t2;
DROP TABLE
=> CREATE TABLE t1 (original_name varchar(24));
CREATE TABLE
=> CREATE TABLE t2 (original_name varchar(24));
CREATE TABLE
=> INSERT INTO t1 VALUES ('original name t1');
OUTPUT
--------
1
(1 row)
=> INSERT INTO t2 VALUES ('original name t2');
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
=> ALTER TABLE t1, t2, temp RENAME TO temp, t1, t2;
ALTER TABLE
=> SELECT * FROM t1, t2;
original_name | original_name
------------------+------------------
original name t2 | original name t1
(1 row)
7.7.5 - Moving tables to another schema
ALTER TABLE...SET SCHEMA
moves a table from one schema to another. Vertica automatically moves all projections that are anchored to the source table to the destination schema. It also moves all IDENTITY columns to the destination schema.
Moving a table across schemas requires that you have USAGE
privileges on the current schema and CREATE
privileges on destination schema. You can move only one table between schemas at a time. You cannot move temporary tables across schemas.
Name conflicts
If a table of the same name or any of the projections that you want to move already exist in the new schema, the statement rolls back and does not move either the table or any projections. To work around name conflicts:
-
Rename any conflicting table or projections that you want to move.
-
Run
ALTER TABLE...SET SCHEMA
again.
Note
Vertica lets you move system tables to system schemas. Moving system tables could be necessary to support designs created through the
Database Designer.
Example
The following example moves table T1
from schema S1
to schema S2
. All projections that are anchored on table T1
automatically move to schema S2
:
=> ALTER TABLE S1.T1 SET SCHEMA S2;
7.7.6 - Changing table ownership
As a superuser or table owner, you can reassign table ownership with ALTER TABLE...OWNER TO.
As a superuser or table owner, you can reassign table ownership with ALTER TABLE...OWNER TO, as follows:
ALTER TABLE [schema.]table-name OWNER TO owner-name
Changing table ownership is useful when moving a table from one schema to another. Ownership reassignment is also useful when a table owner leaves the company or changes job responsibilities. Because you can reassign ownership, the tables do not have to be rewritten, which helps avoid loss of productivity.
Changing table ownership automatically causes the following changes:
-
Grants on the table that were made by the original owner are dropped, and all existing privileges on the table are revoked from the previous owner. Changes in table ownership have no effect on schema privileges.
-
Ownership of dependent IDENTITY sequences is transferred with the table. However, ownership does not change for named sequences created with CREATE SEQUENCE. To transfer ownership of these sequences, use ALTER SEQUENCE.
-
New table ownership is propagated to its projections.
Example
In this example, user Bob connects to the database, looks up the tables, and transfers ownership of table t33 from himself to user Alice.
=> \c - Bob
You are now connected as user "Bob".
=> \d
Schema | Name | Kind | Owner | Comment
--------+--------+-------+---------+---------
public | applog | table | dbadmin |
public | t33 | table | Bob |
(2 rows)
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
When Bob looks up database tables again, he no longer sees table t33:
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+--------+-------+---------+---------
public | applog | table | dbadmin |
(1 row)
When user Alice connects to the database and looks up tables, she sees that she is the owner of table t33.
=> \c - Alice
You are now connected as user "Alice".
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+------+-------+-------+---------
public | t33 | table | Alice |
(1 row)
Alice or a superuser can transfer table ownership back to Bob. In the following case, a superuser performs the transfer.
=> \c - dbadmin
You are now connected as user "dbadmin".
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+----------+-------+---------+---------
public | applog | table | dbadmin |
public | comments | table | dbadmin |
public | t33 | table | Bob |
s1 | t1 | table | User1 |
(4 rows)
You can also query the TABLES system table to view table and owner information. Note that a change in ownership does not change the table ID. In the following series of commands, the superuser changes table ownership back to Alice and queries the TABLES system table.
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
table_schema_id | table_schema | table_id | table_name | owner_id | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
45035996273704968 | public | 45035996273713634 | applog | 45035996273704962 | dbadmin
45035996273704968 | public | 45035996273724496 | comments | 45035996273704962 | dbadmin
45035996273730528 | s1 | 45035996273730548 | t1 | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273724576 | Alice
(5 rows)
Now the superuser changes table ownership back to Bob and queries the TABLES table again. Nothing changes except the owner_name value for t33, which changes from Alice to Bob.
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
table_schema_id | table_schema | table_id | table_name | owner_id | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
45035996273704968 | public | 45035996273713634 | applog | 45035996273704962 | dbadmin
45035996273704968 | public | 45035996273724496 | comments | 45035996273704962 | dbadmin
45035996273730528 | s1 | 45035996273730548 | t1 | 45035996273730516 | User1
45035996273704968 | public | 45035996273793876 | foo | 45035996273724576 | Alice
45035996273704968 | public | 45035996273795846 | t33 | 45035996273714428 | Bob
(5 rows)
7.8 - Sequences
Sequences can be used to set the default values of columns to sequential integer values.
Sequences can be used to set the default values of columns to sequential integer values. Sequences guarantee uniqueness, and help avoid constraint enforcement problems and overhead. Sequences are especially useful for primary key columns.
While sequence object values are guaranteed to be unique, they are not guaranteed to be contiguous. For example, two nodes can increment a sequence at different rates. The node with a heavier processing load increments the sequence, but the values are not contiguous with those being incremented on a node with less processing. For details, see Distributing sequences.
Vertica supports the following sequence types:
- Named sequences are database objects that generate unique numbers in sequential ascending or descending order. Named sequences are defined independently through CREATE SEQUENCE statements, and are managed independently of the tables that reference them. A table can set the default values of one or more columns to named sequences.
- IDENTITY column sequences increment or decrement a column's value as new rows are added. Unlike named sequences, IDENTITY sequences are defined in a table's DDL, so they do not persist independently of that table. A table can contain only one IDENTITY column.
7.8.1 - Sequence types compared
The following table lists the differences between the two sequence types.
The following table lists the differences between the two sequence types:
| Supported Behavior | Named Sequence | IDENTITY |
|---|---|---|
| Default cache value 250K | • | • |
| Set initial cache | • | • |
| Define start value | • | • |
| Specify increment unit | • | • |
| Exists as an independent object | • | |
| Exists only as part of table | | • |
| Create as column constraint | | • |
| Requires name | • | |
| Use in expressions | • | |
| Unique across tables | • | |
| Change parameters | • | |
| Move to different schema | • | |
| Set to increment or decrement | • | |
| Grant privileges to object | • | |
| Specify minimum value | • | |
| Specify maximum value | • | |
7.8.2 - Named sequences
Named sequences are sequences that are defined by CREATE SEQUENCE.
Named sequences are sequences that are defined by CREATE SEQUENCE. Unlike IDENTITY sequences, which are defined in a table's DDL, you create a named sequence as an independent object, and then set it as the default value of a table column.
Named sequences are used most often when an application requires a unique identifier in a table or an expression. After a named sequence returns a value, it never returns the same value again in the same session.
7.8.2.1 - Creating and using named sequences
You create a named sequence with CREATE SEQUENCE.
You create a named sequence with CREATE SEQUENCE. The statement requires only a sequence name; all other parameters are optional. To create a sequence, a user must have CREATE privileges on the schema that contains the sequence.
The following example creates an ascending named sequence, my_seq, starting at the value 100:
=> CREATE SEQUENCE my_seq START 100;
CREATE SEQUENCE
Incrementing and decrementing a sequence
When you create a named sequence object, you can also specify its increment or decrement value by setting its INCREMENT parameter. If you omit this parameter, as in the previous example, the default is 1.
You increment or decrement a sequence by calling the NEXTVAL function on it, either directly on the sequence itself or indirectly by adding new rows to a table that references the sequence. When called for the first time on a new sequence, NEXTVAL initializes the sequence to its start value. Vertica also creates a cache for the sequence. Subsequent NEXTVAL calls on the sequence increment its value.
The following call to NEXTVAL initializes the new my_seq sequence to 100:
=> SELECT NEXTVAL('my_seq');
nextval
---------
100
(1 row)
Getting a sequence's current value
You can obtain the current value of a sequence by calling CURRVAL on it. For example:
=> SELECT CURRVAL('my_seq');
CURRVAL
---------
100
(1 row)
Note
CURRVAL returns an error if you call it on a new sequence that has not yet been initialized by NEXTVAL, or on an existing sequence that has not yet been accessed in a new session. For example:
=> CREATE SEQUENCE seq2;
CREATE SEQUENCE
=> SELECT currval('seq2');
ERROR 4700: Sequence seq2 has not been accessed in the session
Referencing sequences in tables
A table can set the default values of any column to a named sequence. The table creator must have the following privileges: SELECT on the sequence, and USAGE on its schema.
In the following example, column id gets its default values from named sequence my_seq:
=> CREATE TABLE customer(id INTEGER DEFAULT my_seq.NEXTVAL,
lname VARCHAR(25),
fname VARCHAR(25),
membership_card INTEGER
);
For each row that you insert into table customer, Vertica invokes the NEXTVAL function on the sequence to set the value of the id column. For example:
=> INSERT INTO customer VALUES (default, 'Carr', 'Mary', 87432);
=> INSERT INTO customer VALUES (default, 'Diem', 'Nga', 87433);
=> COMMIT;
For each row, the insert operation invokes NEXTVAL on the sequence my_seq, which increments the sequence to 101 and 102, and sets the id column to those values:
=> SELECT * FROM customer;
id | lname | fname | membership_card
-----+-------+-------+-----------------
101 | Carr | Mary | 87432
102 | Diem | Nga | 87433
(2 rows)
7.8.2.2 - Distributing sequences
When you create a sequence, its CACHE parameter determines the number of sequence values each node maintains during a session.
When you create a sequence, its CACHE parameter determines the number of sequence values each node maintains during a session. The default cache value is 250K, so each node reserves 250,000 values per session for each sequence. The default cache size provides an efficient means for large insert or copy operations.
If sequence caching is set to a low number, nodes are liable to request a new set of cache values more frequently. While it supplies a new cache, Vertica must lock the catalog. Until Vertica releases the lock, other database activities such as table inserts are blocked, which can adversely affect overall performance.
When a new session starts, node caches are initially empty. By default, the initiator node requests and reserves cache for all nodes in a cluster. You can change this default, so that each node requests its own cache, by setting the configuration parameter ClusterSequenceCacheMode to 0.
For information on how Vertica requests and distributes cache among all nodes in a cluster, refer to Sequence caching.
Effects of distributed sessions
Vertica distributes a session across all nodes. The first time a cluster node calls the function NEXTVAL on a sequence to increment (or decrement) its value, the node requests its own cache of sequence values. The node then maintains that cache for the current session. As other nodes call NEXTVAL, they too create and maintain their own cache of sequence values.
During a session, nodes call NEXTVAL independently and at different frequencies. Each node uses its own cache to populate the sequence. All sequence values are guaranteed to be unique, but can be out of order with a NEXTVAL statement executed on another node. As a result, sequence values are often non-contiguous.
In all cases, Vertica increments a sequence only once per row. Thus, if the same sequence is referenced by multiple columns, NEXTVAL sets all columns in that row to the same value. This also applies to rows of joined tables.
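For example, the following sketch (table and sequence names are hypothetical) sets the defaults of two columns to the same sequence; because NEXTVAL is invoked only once per row, both columns receive the same value in each inserted row:

```sql
=> CREATE SEQUENCE pair_seq;
=> CREATE TABLE pair_demo (
     id1 INTEGER DEFAULT pair_seq.NEXTVAL,
     id2 INTEGER DEFAULT pair_seq.NEXTVAL,
     label VARCHAR(10));
=> INSERT INTO pair_demo VALUES (DEFAULT, DEFAULT, 'first');
```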
Calculating sequences
Vertica calculates the current value of a sequence as follows:
- At the end of every statement, the state of all sequences used in the session is returned to the initiator node.
- The initiator node calculates the maximum CURRVAL of each sequence across all states on all nodes.
- This maximum value is used as CURRVAL in subsequent statements until another NEXTVAL is invoked.
Losing sequence values
Sequence values in cache can be lost in the following situations:
- If a statement fails after NEXTVAL is called (thereby consuming a sequence value from the cache), the value is lost.
- If a disconnect occurs (for example, a dropped session), any remaining values in the cache that have not been returned through NEXTVAL are lost.
- When the initiator node distributes a new block of cache to each node while one or more nodes have not used up their current cache allotment. For information on this scenario, refer to Sequence caching.
You can recover lost sequence values by using ALTER SEQUENCE...RESTART, which resets the sequence to the specified value in the next session.
Caution
Using ALTER SEQUENCE to set a sequence start value below its current value can result in duplicate keys.
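For example, to make the earlier my_seq sequence resume at a known value in the next session, you might run a statement like the following (a sketch; choose a restart value above the sequence's current value to avoid duplicate keys):

```sql
=> ALTER SEQUENCE my_seq RESTART WITH 500;
```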
7.8.2.3 - Altering sequences
ALTER SEQUENCE can change sequences in two ways.
ALTER SEQUENCE can change sequences in two ways:
- Changes values that control sequence behavior—for example, its start value and range of minimum and maximum values. These changes take effect only when you start a new database session.
- Changes sequence name, schema, or ownership. These changes take effect immediately.
Note
The same ALTER SEQUENCE statement cannot make both types of changes.
Changing sequence behavior
ALTER SEQUENCE can change one or more sequence attributes through the following parameters:
| These parameters... | Control... |
|---|---|
| INCREMENT | How much to increment or decrement the sequence on each call to NEXTVAL. |
| MINVALUE / MAXVALUE | Range of valid integers. |
| RESTART | Sequence value on its next call to NEXTVAL. |
| CACHE / NO CACHE | How many sequence numbers are pre-allocated and stored in memory for faster access. |
| CYCLE / NO CYCLE | Whether the sequence wraps when its minimum or maximum value is reached. |
These changes take effect only when you start a new database session. For example, if you create a named sequence my_sequence that starts at 10 and increments by 1 (the default), each call to NEXTVAL increments its value by 1:
=> CREATE SEQUENCE my_sequence START 10;
=> SELECT NEXTVAL('my_sequence');
nextval
---------
10
(1 row)
=> SELECT NEXTVAL('my_sequence');
nextval
---------
11
(1 row)
The following ALTER SEQUENCE statement specifies to restart the sequence at 50:
=> ALTER SEQUENCE my_sequence RESTART WITH 50;
However, this change has no effect in the current session. The next call to NEXTVAL increments the sequence to 12:
=> SELECT NEXTVAL('my_sequence');
NEXTVAL
---------
12
(1 row)
The sequence restarts at 50 only after you start a new database session:
=> \q
$ vsql
Welcome to vsql, the Vertica Analytic Database interactive terminal.
=> SELECT NEXTVAL('my_sequence');
NEXTVAL
---------
50
(1 row)
Changing sequence name, schema, and ownership
You can use ALTER SEQUENCE to make the following changes to a sequence:

- Rename it.
- Move it to a different schema.
- Reassign its ownership.

Each of these changes requires a separate ALTER SEQUENCE statement. These changes take effect immediately.
For example, the following statement renames a sequence from my_seq to serial:
=> ALTER SEQUENCE s1.my_seq RENAME TO s1.serial;
This statement moves sequence s1.serial to schema s2:
=> ALTER SEQUENCE s1.serial SET SCHEMA s2;
The following statement reassigns ownership of s2.serial to another user:
=> ALTER SEQUENCE s2.serial OWNER TO bertie;
Note
Only a superuser or the sequence owner can change its ownership. Reassignment does not transfer grants from the original owner to the new owner. Grants made by the original owner are dropped.
7.8.2.4 - Dropping sequences
Use DROP SEQUENCE to remove a named sequence.
Use DROP SEQUENCE to remove a named sequence. For example:
=> DROP SEQUENCE my_sequence;
You cannot drop a sequence if one of the following conditions is true:
- Other objects depend on the sequence. DROP SEQUENCE does not support cascade operations.
- A column's DEFAULT expression references the sequence. Before dropping the sequence, you must remove all column references to it.
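For example, to drop the my_seq sequence that the earlier customer table references, you might first remove the column's DEFAULT expression (a sketch based on the earlier example):

```sql
=> ALTER TABLE customer ALTER COLUMN id DROP DEFAULT;
=> DROP SEQUENCE my_seq;
```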
7.8.3 - IDENTITY sequences
IDENTITY (synonymous with AUTO_INCREMENT) columns are defined with a sequence that automatically increments column values as new rows are added.
IDENTITY (synonymous with AUTO_INCREMENT) columns are defined with a sequence that automatically increments column values as new rows are added. You define an IDENTITY column in a table as follows:
CREATE TABLE table-name...
  (column-name {IDENTITY | AUTO_INCREMENT}
  ( [ cache-size | start, increment [, cache-size ] ] ) )
Settings
| Setting | Description |
|---|---|
| start | First value to set for this column. Default: 1 |
| increment | Positive or negative integer that specifies how much to increment or decrement the sequence on each new row insertion. To decrement sequence values, specify a negative value. Note: The actual amount by which column values are incremented or decremented might be larger than the increment setting, unless sequence caching is disabled. Default: 1 |
| cache-size | How many unique numbers each node caches per session. A value of 0 or 1 disables sequence caching. For details, see Sequence caching. Default: 250,000 |
Managing settings
As with named sequences, you can manage an IDENTITY column's sequence with ALTER SEQUENCE, for example, to reset its start integer. Two exceptions apply: because the sequence is defined as part of a table column, you cannot change the sequence name or schema. You can query the SEQUENCES system table for the name of an IDENTITY column's sequence. This name is automatically created when you define the table, and conforms to the following convention:
table-name_col-name_seq
For example, you can change the maximum value of an IDENTITY column that is defined in the testAutoId table:
=> SELECT * FROM sequences WHERE identity_table_name = 'testAutoId';
-[ RECORD 1 ]-------+-------------------------
sequence_schema | public
sequence_name | testAutoId_autoIdCol_seq
owner_name | dbadmin
identity_table_name | testAutoId
session_cache_count | 250000
allow_cycle | f
output_ordered | f
increment_by | 1
minimum | 1
maximum | 1000
current_value | 1
sequence_schema_id | 45035996273704980
sequence_id | 45035996274278950
owner_id | 45035996273704962
identity_table_id | 45035996274278948
=> ALTER SEQUENCE testAutoId_autoIdCol_seq maxvalue 10000;
ALTER SEQUENCE
This change, like other changes to a sequence, takes effect only when you start a new database session. One exception applies: changes to the sequence owner take effect immediately.
You can obtain the last value generated for an IDENTITY column by calling LAST_INSERT_ID.
Restrictions
The following restrictions apply to IDENTITY columns:
- A table can contain only one IDENTITY column.
- IDENTITY column values automatically increment before the current transaction is committed; rolling back the transaction does not revert the change.
- You cannot change the value of an IDENTITY column.
Examples
The following example shows how to use the IDENTITY column-constraint to create a table with an ID column. The ID column has an initial value of 1. It is incremented by 1 every time a row is inserted.
- Create table Premium_Customer:
=> CREATE TABLE Premium_Customer(
ID IDENTITY(1,1),
lname VARCHAR(25),
fname VARCHAR(25),
store_membership_card INTEGER
);
=> INSERT INTO Premium_Customer (lname, fname, store_membership_card )
VALUES ('Gupta', 'Saleem', 475987);
The IDENTITY column has a seed of 1, which specifies the value for the first row loaded into the table, and an increment of 1, which specifies the value that is added to the IDENTITY value of the previous row.
- Confirm the row you added and see the ID value:
=> SELECT * FROM Premium_Customer;
ID | lname | fname | store_membership_card
----+-------+--------+-----------------------
1 | Gupta | Saleem | 475987
(1 row)
- Add another row:
=> INSERT INTO Premium_Customer (lname, fname, store_membership_card)
VALUES ('Lee', 'Chen', 598742);
- Call the Vertica function LAST_INSERT_ID. The function returns value 2 because you previously inserted a new customer (Chen Lee), and this value is incremented each time a row is inserted:
=> SELECT LAST_INSERT_ID();
last_insert_id
----------------
2
(1 row)
- View all the ID values in the Premium_Customer table:
=> SELECT * FROM Premium_Customer;
ID | lname | fname | store_membership_card
----+-------+--------+-----------------------
1 | Gupta | Saleem | 475987
2 | Lee | Chen | 598742
(2 rows)
The next three examples illustrate the three valid ways to use IDENTITY arguments.
The first example uses a cache of 100, and the defaults for start value (1) and increment value (1):
=> CREATE TABLE t1(x IDENTITY(100), y INT);
The next example specifies the start and increment values as 1, and defaults to a cache value of 250,000:
=> CREATE TABLE t2(y IDENTITY(1,1), x INT);
The third example specifies start and increment values of 1, and a cache value of 100:
=> CREATE TABLE t3(z IDENTITY(1,1,100), zx INT);
7.8.4 - Sequence caching
Caching is similar for all sequence types: named sequences and IDENTITY column sequences.
Caching is similar for all sequence types: named sequences and IDENTITY column sequences. To allocate cache among the nodes in a cluster for a given sequence, Vertica uses the following process.
- By default, when a session begins, the cluster initiator node requests cache for itself and other nodes in the cluster.
- The initiator node distributes cache to other nodes when it distributes the execution plan.
- Because the initiator node requests caching for all nodes, only the initiator locks the global catalog for the cache request.
This approach is optimal for handling large INSERT-SELECT and COPY operations. The following figure shows how the initiator node requests and distributes cache for a named sequence in a three-node cluster, where caching for that sequence is set to 250K:
Nodes run out of cache at different times. While executing the same query, nodes individually request additional cache as needed.
For new queries in the same session, the initiator might have an empty cache if it used all of its cache to execute the previous query. In this case, the initiator requests cache for all nodes.
Configuring sequence caching
You can change how nodes obtain sequence caches by setting the configuration parameter ClusterSequenceCacheMode to 0 (disabled). When this parameter is set to 0, all nodes in the cluster request their own cache and catalog lock. However, for initial large INSERT-SELECT and COPY operations, when the cache is empty for all nodes, each node requests cache at the same time. These multiple requests result in simultaneous locks on the global catalog, which can adversely affect performance. For this reason, ClusterSequenceCacheMode should remain set to its default value of 1 (enabled).
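If you do need to change this setting, one way is the general-purpose SET_CONFIG_PARAMETER function (a sketch; depending on your Vertica version, equivalent ALTER DATABASE syntax may also apply):

```sql
=> SELECT SET_CONFIG_PARAMETER('ClusterSequenceCacheMode', 0);
```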
The following example compares how different settings of ClusterSequenceCacheMode affect how Vertica manages sequence caching. The example assumes a three-node cluster, 250 K caches for each node (the default), and sequence ID values that increment by 1.
| Workflow step | ClusterSequenceCacheMode = 1 | ClusterSequenceCacheMode = 0 |
|---|---|---|
| 1 | Cache is empty for all nodes. The initiator node requests a 250K cache for each node. | Cache is empty for all nodes. Each node, including the initiator, requests its own 250K cache. |
| 2 | Blocks of cache are distributed to each node. Each node begins to use its cache as it processes sequence updates. | |
| 3 | The initiator node and node 3 run out of cache. Node 2 has used only 250K+1 through 400K; 100K of cache remains, from 400K+1 through 500K. | |
| 4 | Executing the same statement: as each node uses up its cache, it requests a new cache allocation; if node 2 never uses its cache, the 100K of unused cache becomes a gap in sequence IDs. Executing a new statement in the same session, if the initiator node's cache is empty: it requests and distributes new cache blocks for all nodes; nodes receive a new cache before the old cache is used up, creating a gap in ID sequencing. | Executing the same or a new statement: as each node uses up its cache, it requests a new cache allocation; if node 2 never uses its cache, the 100K of unused cache becomes a gap in sequence IDs. |
7.9 - Merging table data
MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set.
MERGE statements can perform update and insert operations on a target table based on the results of a join with a source data set. The join can match a source row with only one target row; otherwise, Vertica returns an error.
MERGE has the following syntax:
MERGE INTO target-table USING source-dataset ON join-condition
matching-clause[ matching-clause ]
Merge operations have at least three components:

- The target table to update.
- The source data set to join with the target table.
- One or both matching clauses: WHEN MATCHED and WHEN NOT MATCHED.
7.9.1 - Basic MERGE example
In this example, a merge operation involves two tables.
In this example, a merge operation involves two tables:
- visits_daily logs daily restaurant traffic, and is updated with each customer visit. Data in this table is refreshed every 24 hours.
- visits_history stores the history of customer visits to various restaurants, accumulated over an indefinite time span.
Each night, you merge the daily visit count from visits_daily into visits_history. The merge operation modifies the target table in two ways:

- It updates the visit count of existing customers.
- It inserts new rows for first-time customers.

One MERGE statement executes both operations as a single (upsert) transaction.
Source and target tables
The source and target tables, visits_daily and visits_history, are defined as follows:
CREATE TABLE public.visits_daily
(
customer_id int,
location_name varchar(20),
visit_time time(0) DEFAULT (now())::timetz(6)
);
CREATE TABLE public.visits_history
(
customer_id int,
location_name varchar(20),
visit_count int
);
Table visits_history contains rows of three customers who, between them, visited two restaurants, Etoile and La Rosa:
=> SELECT * FROM visits_history ORDER BY customer_id, location_name;
customer_id | location_name | visit_count
-------------+---------------+-------------
1001 | Etoile | 2
1002 | La Rosa | 4
1004 | Etoile | 1
(3 rows)
By close of business, table visits_daily contains three rows of restaurant visits:
=> SELECT * FROM visits_daily ORDER BY customer_id, location_name;
customer_id | location_name | visit_time
-------------+---------------+------------
1001 | Etoile | 18:19:29
1003 | Lux Cafe | 08:07:00
1004 | La Rosa | 11:49:20
(3 rows)
Table data merge
The following MERGE statement merges visits_daily data into visits_history:

- For matching customers, MERGE updates the occurrence count.
- For non-matching customers, MERGE inserts new rows.
=> MERGE INTO visits_history h USING visits_daily d
ON (h.customer_id=d.customer_id AND h.location_name=d.location_name)
WHEN MATCHED THEN UPDATE SET visit_count = h.visit_count + 1
WHEN NOT MATCHED THEN INSERT (customer_id, location_name, visit_count)
VALUES (d.customer_id, d.location_name, 1);
OUTPUT
--------
3
(1 row)
MERGE returns the number of rows updated and inserted. In this case, the returned value specifies three updates and inserts:

- Customer 1001's third visit to Etoile
- New customer 1003's first visit to new restaurant Lux Cafe
- Customer 1004's first visit to La Rosa
If you now query table visits_history, the result set shows the merged (updated and inserted) data.
7.9.2 - MERGE source options
A MERGE operation joins the target table to one of several data sources.
A MERGE operation joins the target table to one of the following data sources:

- Another table
- A view
- A subquery result set
Merging from table and view data
You merge data from one table into another as follows:
MERGE INTO target-table USING { source-table | source-view } join-condition
matching-clause[ matching-clause ]
If you specify a view, Vertica expands the view name to the query that it encapsulates, and uses the result set as the merge source data.
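For instance, a view can encapsulate the source filter, and the MERGE statement can then name the view directly. A minimal sketch reusing the VMart tables from the example that follows, where the view name is hypothetical:

```sql
=> CREATE VIEW discontinued_products AS
   SELECT * FROM product_dimension WHERE discontinued_flag = '1';
=> MERGE INTO product_dimension_discontinued tgt
   USING discontinued_products src
   ON tgt.product_key = src.product_key
      AND tgt.product_version = src.product_version
   WHEN NOT MATCHED THEN INSERT VALUES
     (src.product_key, src.product_version, src.sku_number,
      src.category_description, src.product_description);
```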
For example, the VMart table public.product_dimension contains current and discontinued products. You can move all discontinued products into a separate table, public.product_dimension_discontinued, as follows:
=> CREATE TABLE public.product_dimension_discontinued (
product_key int,
product_version int,
sku_number char(32),
category_description char(32),
product_description varchar(128));
=> MERGE INTO product_dimension_discontinued tgt
USING product_dimension src ON tgt.product_key = src.product_key
AND tgt.product_version = src.product_version
WHEN NOT MATCHED AND src.discontinued_flag='1' THEN INSERT VALUES
(src.product_key,
src.product_version,
src.sku_number,
src.category_description,
src.product_description);
OUTPUT
--------
1186
(1 row)
Source table product_dimension uses two columns, product_key and product_version, to identify unique products. The MERGE statement joins the source and target tables on these columns in order to return single instances of non-matching rows. The WHEN NOT MATCHED clause includes a filter (src.discontinued_flag='1'), which reduces the result set to include only discontinued products. The remaining rows are inserted into target table product_dimension_discontinued.
Merging from a subquery result set
You can merge into a table the result set that is returned by a subquery, as follows:
MERGE INTO target-table USING (subquery) sq-alias join-condition
matching-clause[ matching-clause ]
For example, the VMart table public.product_dimension is defined as follows (DDL truncated):
CREATE TABLE public.product_dimension
(
product_key int NOT NULL,
product_version int NOT NULL,
product_description varchar(128),
sku_number char(32),
...
);
ALTER TABLE public.product_dimension
ADD CONSTRAINT C_PRIMARY PRIMARY KEY (product_key, product_version) DISABLED;
Columns product_key and product_version comprise the table's primary key. You can modify this table so it contains a single column that concatenates the values of these two columns. This column can be used to uniquely identify each product, while also maintaining the original values from product_key and product_version.
You populate the new column with a MERGE statement that queries the other two columns:
=> ALTER TABLE public.product_dimension ADD COLUMN product_ID numeric(8,2);
ALTER TABLE
=> MERGE INTO product_dimension tgt
USING (SELECT (product_key||'.0'||product_version)::numeric(8,2) AS pid, sku_number
FROM product_dimension) src
   ON (tgt.product_key||'.0'||tgt.product_version)::numeric(8,2)=src.pid
WHEN MATCHED THEN UPDATE SET product_ID = src.pid;
OUTPUT
--------
60000
(1 row)
The following query verifies that the new column values correspond to the values in product_key and product_version:
=> SELECT product_ID, product_key, product_version, product_description
FROM product_dimension
WHERE category_description = 'Medical'
AND product_description ILIKE '%diabetes%'
AND discontinued_flag = 1 ORDER BY product_ID;
product_ID | product_key | product_version | product_description
------------+-------------+-----------------+-----------------------------------------
5836.02 | 5836 | 2 | Brand #17487 diabetes blood testing kit
14320.02 | 14320 | 2 | Brand #43046 diabetes blood testing kit
18881.01 | 18881 | 1 | Brand #56743 diabetes blood testing kit
(3 rows)
7.9.3 - MERGE matching clauses
MERGE supports one instance of each of the following matching clauses.
MERGE supports one instance of each of the following matching clauses:

- WHEN MATCHED THEN UPDATE SET
- WHEN NOT MATCHED THEN INSERT

Each matching clause can specify an additional filter, as described in Update and insert filters.
WHEN MATCHED THEN UPDATE SET
Updates all target table rows that are joined to the source table, typically with data from the source table:
WHEN MATCHED [ AND update-filter ] THEN UPDATE
SET { target-column = expression }[,...]
Vertica can execute the join only on unique values in the source table's join column. If the source table's join column contains more than one matching value, the MERGE statement returns with a run-time error.
WHEN NOT MATCHED THEN INSERT
WHEN NOT MATCHED THEN INSERT inserts into the target table a new row for each source table row that is excluded from the join:
WHEN NOT MATCHED [ AND insert-filter ] THEN INSERT
[ ( column-list ) ] VALUES ( values-list )
column-list is a comma-delimited list of one or more target columns in the target table, listed in any order. MERGE maps column-list columns to values-list values in the same order, and each column-value pair must be compatible. If you omit column-list, Vertica maps values-list values to columns according to column order in the table definition.
For example, given the following source and target table definitions:
CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (x int, y int, z int);
The following WHEN NOT MATCHED clause implicitly sets the values of the target table columns a, b, and c in the newly inserted rows:
MERGE INTO t1 USING t2 ON t1.a=t2.x
WHEN NOT MATCHED THEN INSERT VALUES (t2.x, t2.y, t2.z);
In contrast, the following WHEN NOT MATCHED clause excludes columns t1.b and t2.y from the merge operation. The WHEN NOT MATCHED clause explicitly pairs two sets of columns from the target and source tables: t1.a to t2.x, and t1.c to t2.z. Vertica sets excluded column t1.b to null:
MERGE INTO t1 USING t2 ON t1.a=t2.x
WHEN NOT MATCHED THEN INSERT (a, c) VALUES (t2.x, t2.z);
7.9.4 - Update and insert filters
Each WHEN MATCHED and WHEN NOT MATCHED clause in a MERGE statement can optionally specify an update filter and insert filter, respectively:
WHEN MATCHED AND update-filter THEN UPDATE ...
WHEN NOT MATCHED AND insert-filter THEN INSERT ...
Vertica also supports Oracle syntax for specifying update and insert filters:
WHEN MATCHED THEN UPDATE SET column-updates WHERE update-filter
WHEN NOT MATCHED THEN INSERT column-values WHERE insert-filter
Each filter can specify multiple conditions. Vertica handles the filters as follows:
- An update filter is applied to the set of matching rows in the target table that are returned by the MERGE join. For each row where the update filter evaluates to true, Vertica updates the specified columns.
- An insert filter is applied to the set of source table rows that are excluded from the MERGE join. For each row where the insert filter evaluates to true, Vertica adds a new row to the target table with the specified values.
For example, given the following data in tables t11 and t22:
=> SELECT * from t11 ORDER BY pk;
pk | col1 | col2 | SKIP_ME_FLAG
----+------+------+--------------
1 | 2 | 3 | t
2 | 3 | 4 | t
3 | 4 | 5 | f
4 | | 6 | f
5 | 6 | 7 | t
6 | | 8 | f
7 | 8 | | t
(7 rows)
=> SELECT * FROM t22 ORDER BY pk;
pk | col1 | col2
----+------+------
1 | 2 | 4
2 | 4 | 8
3 | 6 |
4 | 8 | 16
(4 rows)
You can merge data from table t11 into table t22 with the following MERGE statement, which includes update and insert filters:
=> MERGE INTO t22 USING t11 ON ( t11.pk=t22.pk )
WHEN MATCHED
AND t11.SKIP_ME_FLAG=FALSE AND (
COALESCE (t22.col1<>t11.col1, (t22.col1 is null)<>(t11.col1 is null))
)
THEN UPDATE SET col1=t11.col1, col2=t11.col2
WHEN NOT MATCHED
AND t11.SKIP_ME_FLAG=FALSE
THEN INSERT (pk, col1, col2) VALUES (t11.pk, t11.col1, t11.col2);
OUTPUT
--------
3
(1 row)
=> SELECT * FROM t22 ORDER BY pk;
pk | col1 | col2
----+------+------
1 | 2 | 4
2 | 4 | 8
3 | 4 | 5
4 | | 6
6 | | 8
(5 rows)
Vertica uses the update and insert filters as follows:
- Evaluates all matching rows against the update filter conditions. Vertica updates each row where both conditions evaluate to true: t11.SKIP_ME_FLAG is set to false, and the COALESCE expression detects that the col1 values of the matched rows differ.
- Evaluates all non-matching rows in the source table against the insert filter. For each row where column t11.SKIP_ME_FLAG is set to false, Vertica inserts a new row in the target table.
7.9.5 - MERGE optimization
You can improve MERGE performance in several ways:
Projections for MERGE operations
The Vertica query optimizer automatically chooses the best projections to implement a merge operation. A good projection design strategy provides projections that help the query optimizer avoid extra sort and data transfer operations, and facilitate MERGE performance.
For example, the following MERGE statement fragment joins target and source tables tgt and src, respectively, on columns tgt.a and src.b:
=> MERGE INTO tgt USING src ON tgt.a = src.b ...
Vertica can use a local merge join if projections for tables tgt and src use one of the following projection designs, where inputs are presorted by projection ORDER BY clauses:
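For example, the following projections are both presorted on the join columns, which makes a local merge join possible (a sketch that assumes the tgt and src tables above have integer join columns a and b; the projection names are hypothetical):
=> CREATE PROJECTION tgt_p AS SELECT * FROM tgt ORDER BY a;
=> CREATE PROJECTION src_p AS SELECT * FROM src ORDER BY b;
Because both inputs arrive already sorted on their join keys, the optimizer can merge them without an extra sort step.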
Optimizing MERGE query plans
Vertica prepares an optimized query plan if the following conditions are all true:
- The MERGE statement contains both matching clauses WHEN MATCHED THEN UPDATE SET and WHEN NOT MATCHED THEN INSERT. If the MERGE statement contains only one matching clause, it uses a non-optimized query plan.
- The MERGE statement excludes update and insert filters.
- The target table join column has a unique or primary key constraint. This requirement does not apply to the source table join column.
- Both matching clauses specify all columns in the target table.
- Both matching clauses specify identical source values.
For details on evaluating an EXPLAIN-generated query plan, see MERGE path.
The examples that follow use a simple schema to illustrate some of the conditions under which Vertica prepares or does not prepare an optimized query plan for MERGE:
CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a;
CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a;
INSERT INTO target VALUES(1,2,3);
INSERT INTO target VALUES(2,4,7);
INSERT INTO source VALUES(3,4,5);
INSERT INTO source VALUES(4,6,9);
COMMIT;
Optimized MERGE statement
Vertica can prepare an optimized query plan for the following MERGE
statement because:
- The target table's join column t.a has a primary key constraint.
- All columns in the target table (a,b,c) are included in the UPDATE and INSERT clauses.
- The UPDATE and INSERT clauses specify identical source values: s.a, s.b, and s.c.
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);
OUTPUT
--------
2
(1 row)
The output value of 2 denotes the total number of rows updated and inserted from the source into the target.
Non-optimized MERGE statement
In the next example, the MERGE statement runs without optimization because the source values in the UPDATE/INSERT clauses are not identical. Specifically, the UPDATE clause applies expressions to columns s.a and s.c and the INSERT clause does not:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a, s.b, s.c);
To make the previous MERGE statement eligible for optimization, rewrite the statement so that the source values in the UPDATE and INSERT clauses are identical:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c -1
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a + 1, s.b, s.c - 1);
7.9.6 - MERGE restrictions
The following restrictions apply to updating and inserting table data with MERGE.
Constraint enforcement
If primary key, unique key, or check constraints are enabled for automatic enforcement in the target table, Vertica enforces those constraints when you load new data. If a violation occurs, Vertica rolls back the operation and returns an error.
Caution
If you run MERGE multiple times using the same target and source table, each iteration is liable to introduce duplicate values into the target columns and return with an error.
Columns prohibited from merge
The following columns cannot be specified in a merge operation; attempts to do so return with an error:
- IDENTITY columns, or columns whose default value is set to a named sequence.
- Vmap columns such as __raw__ in flex tables.
- Columns of complex types ARRAY, SET, or ROW.
7.10 - Removing table data
Vertica provides several ways to remove data from a table:
Delete operation | Description
Drop a table | Permanently remove a table and its definition, optionally remove associated views and projections.
Delete table rows | Mark rows with delete vectors and store them so data can be rolled back to a previous epoch. The data must be purged to reclaim disk space.
Truncate table data | Remove all storage and history associated with a table. The table structure is preserved for future use.
Purge data | Permanently remove historical data from physical storage and free disk space for reuse.
Drop partitions | Remove one or more partitions from a table. Each partition contains a related subset of data in the table. Dropping partitioned data is efficient, and provides query performance benefits.
7.10.1 - Data removal operations compared
The following table summarizes differences between various data removal operations.
Operations and options | Performance | Auto commits | Saves history
DELETE FROM table | Normal | No | Yes
DELETE FROM temp-table | High | No | No
DELETE FROM table where-clause | Normal | No | Yes
DELETE FROM temp-table where-clause | Normal | No | Yes
DELETE FROM temp-table where-clause ON COMMIT PRESERVE ROWS | Normal | No | Yes
DELETE FROM temp-table where-clause ON COMMIT DELETE ROWS | High | Yes | No
DROP table | High | Yes | No
TRUNCATE table | High | Yes | No
TRUNCATE temp-table | High | Yes | No
SELECT DROP_PARTITIONS (...) | High | Yes | No
Choosing the best operation
The following table can help you decide which operation is best for removing table data:
If you want to... | Use...
Delete both table data and definitions and start from scratch. | DROP TABLE
Quickly drop data while preserving table definitions, and reload data. | TRUNCATE TABLE
Regularly perform bulk delete operations on logical sets of data. | DROP_PARTITIONS
Occasionally perform small deletes with the option to roll back or review history. | DELETE
7.10.2 - Optimizing DELETE and UPDATE
Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries. DELETE and UPDATE operations must update all projections, so an operation can be only as fast as the slowest projection.
To improve the performance of DELETE and UPDATE queries, consider the following issues:
- Query performance after large DELETE operations: Vertica's implementation of DELETE differs from traditional databases: it does not delete data from disk storage; rather, it marks rows as deleted so they are available for historical queries. Deletion of 10% or more of the total rows in a table can adversely affect queries on that table. In that case, consider purging those rows to improve performance.
- Recovery performance: Recovery is the action required for a cluster to restore K-safety after a crash. Large numbers of deleted records can degrade the performance of a recovery. To improve recovery performance, purge the deleted rows.
- Concurrency: DELETE and UPDATE take exclusive locks on the table. Only one DELETE or UPDATE transaction on a table can be in progress at a time and only when no load operations are in progress. Delete and update operations on different tables can run concurrently.
Projection column requirements for optimized delete
A projection is optimized for delete and update operations if it contains all columns required by the query predicate. In general, DML operations are significantly faster when performed on optimized projections than on non-optimized projections.
For example, consider the following table and projections:
=> CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
=> CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
=> CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;
In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because column a is available:
=> DELETE from t WHERE a = 1;
In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because column b is not available in p2:
=> DELETE from t WHERE b = 1;
Optimized DELETE in subqueries
To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE statement's WHERE clause must be in the projection definition.
For example, the following simple schema has two tables and three projections:
=> CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
=> CREATE TABLE tb2 (g INT, h INT, i INT, j INT);
The first projection references all columns in tb1 and sorts on column a:
=> CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;
The buddy projection references and sorts on column a in tb1:
=> CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;
This projection references all columns in tb2 and sorts on column i:
=> CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;
Consider the following DML statement, which references tb1.a in its WHERE clause. Since both projections on tb1 contain column a, both are eligible for the optimized DELETE:
=> DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);
Restrictions
Optimized DELETE operations are not supported under the following conditions:
- With replicated projections if subqueries reference the target table. For example, the following syntax is not supported:
=> DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
- With subqueries that do not return multiple rows. For example, the following syntax is not supported:
=> DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
Projection sort order for optimizing DELETE
Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the sort order of all projections for large DELETE and UPDATE operations.
For example, suppose most of the DELETE queries you perform on a projection look like the following:
=> DELETE FROM t WHERE time_key < '1-1-2007';
To optimize the delete operations, make time_key
appear in the ORDER BY clause of all projections. This schema design results in better performance of the delete operation.
In addition, add sort columns to the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing sort order: best practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function.
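Following this guidance for the earlier example, you might sort projections on time_key first and then verify the design (a sketch; the projection name and the trailing sort column customer_id are hypothetical):
=> CREATE PROJECTION t_p AS SELECT * FROM t ORDER BY time_key, customer_id;
=> SELECT EVALUATE_DELETE_PERFORMANCE('t_p');
EVALUATE_DELETE_PERFORMANCE reports projections whose sort order could cause slow delete operations.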
7.10.3 - Purging deleted data
In Vertica, delete operations do not remove rows from physical storage. DELETE marks rows as deleted, as does UPDATE, which combines delete and insert operations. In both cases, Vertica retains discarded rows as historical data, which remains accessible to historical queries until it is purged.
The cost of retaining historical data is twofold:
- Disk space is allocated to deleted rows and delete markers.
- Typical (non-historical) queries must read and skip over deleted data, which can impact performance.
A purge operation permanently removes historical data from physical storage and frees disk space for reuse. Only historical data that precedes the Ancient History Mark (AHM) is eligible to be purged.
You can purge data in two ways: by setting a purge policy that is applied automatically, or by calling purge functions manually. In both cases, Vertica purges all historical data up to and including the AHM epoch and resets the AHM. See Epochs for additional information about how Vertica uses epochs.
Caution
Large delete and purge operations can take a long time to complete, so use them sparingly. If your application requires deleting data on a regular basis, such as by month or year, consider designing tables that take advantage of table partitioning. If partitioning is not suitable, consider rebuilding the table.
7.10.3.1 - Setting a purge policy
The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs mergeout operations.
Vertica provides two methods for determining when deleted data is eligible to be purged:
Specifying the time for which delete data is saved
Specifying the time for which delete data is saved is the preferred method for determining which deleted data can be purged. By default, Vertica saves historical data only when nodes are down.
To change the specified time for saving deleted data, use the HistoryRetentionTime configuration parameter:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = {seconds | -1};
In the above syntax:
- seconds is the amount of time (in seconds) for which to save deleted data.
- -1 indicates that you do not want to use the HistoryRetentionTime configuration parameter to determine which deleted data is eligible to be purged. Use this setting if you prefer to use the other method (HistoryRetentionEpochs) for determining which deleted data can be purged.
The following example sets the history epoch retention level to 240 seconds:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = 240;
Specifying the number of epochs that are saved
Unless you have a reason to limit the number of epochs, Vertica recommends that you specify the time over which delete data is saved.
To specify the number of historical epochs to save through the HistoryRetentionEpochs configuration parameter:
- Turn off the HistoryRetentionTime configuration parameter:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
- Set the history epoch retention level through the HistoryRetentionEpochs configuration parameter:
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = {num_epochs | -1};
where:
- num_epochs is the number of historical epochs to save.
- -1 indicates that you do not want to use the HistoryRetentionEpochs configuration parameter to trim historical epochs from the epoch map. By default, HistoryRetentionEpochs is set to -1.
The following example sets the number of historical epochs to save to 40:
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = 40;
Modifications are immediately implemented across all nodes within the database cluster. You do not need to restart the database.
Note
If both HistoryRetentionTime and HistoryRetentionEpochs are specified, HistoryRetentionTime takes precedence.
See Epoch management parameters for additional details. See Epochs for information about how Vertica uses epochs.
Disabling purge
If you want to preserve all historical data, set the value of both historical epoch retention parameters to -1, as follows:
=> ALTER DATABASE DEFAULT SET HistoryRetentionTime = -1;
=> ALTER DATABASE DEFAULT SET HistoryRetentionEpochs = -1;
7.10.3.2 - Manually purging data
You manually purge deleted data as follows:
- Set the cut-off date for purging deleted data. First, call one of the following functions to verify the current ancient history mark (AHM):
- Set the AHM to the desired cut-off date with one of the following functions:
If you call SET_AHM_TIME, keep in mind that the timestamp you specify is mapped to an epoch, which by default has a three-minute granularity. Thus, if you specify an AHM time of 2008-01-01 00:00:00.00, Vertica might purge data from the first three minutes of 2008, or retain data from the last three minutes of 2007.
Note
You cannot advance the AHM beyond a point where Vertica is unable to recover data for a down node.
- Purge deleted data from the desired projections with one of the following functions:
The tuple mover performs a mergeout operation to purge the data. Vertica periodically invokes the tuple mover to perform mergeout operations, as configured by tuple mover parameters. You can manually invoke the tuple mover by calling the function DO_TM_TASK.
Caution
Manual purge operations can take a long time.
See Epochs for additional information about how Vertica uses epochs.
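A minimal sketch of these steps might look like the following, assuming target table public.product_dimension and using MAKE_AHM_NOW to advance the AHM to the greatest allowable epoch:
=> SELECT GET_AHM_EPOCH();
=> SELECT MAKE_AHM_NOW();
=> SELECT PURGE_TABLE('public.product_dimension');
PURGE_TABLE removes deleted rows up to and including the AHM epoch from all projections of the specified table.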
7.10.4 - Truncating tables
TRUNCATE TABLE removes all storage associated with the target table and its projections. Vertica preserves the table and the projection definitions. If the truncated table has out-of-date projections, those projections are cleared and marked up-to-date when TRUNCATE TABLE returns.
TRUNCATE TABLE commits the entire transaction after statement execution, even if truncating the table fails. You cannot roll back a TRUNCATE TABLE statement.
Use TRUNCATE TABLE for testing purposes. You can use it to remove all data from a table and load it with fresh data, without recreating the table and its projections.
Table locking
TRUNCATE TABLE takes an O (owner) lock on the table until the truncation process completes. The savepoint is then released.
If the operation cannot obtain an O lock on the target table, Vertica tries to close any internal Tuple Mover sessions that are running on that table. If successful, the operation can proceed. Explicit Tuple Mover operations that are running in user sessions do not close. If an explicit Tuple Mover operation is running on the table, TRUNCATE TABLE proceeds only after that operation completes.
Restrictions
You cannot truncate an external table.
Examples
=> CREATE TABLE sample_table (a INT);
CREATE TABLE
=> INSERT INTO sample_table (a) VALUES (3);
=> SELECT * FROM sample_table;
a
---
3
(1 row)
=> TRUNCATE TABLE sample_table;
TRUNCATE TABLE
=> SELECT * FROM sample_table;
a
---
(0 rows)
7.11 - Rebuilding tables
You can reclaim disk space on a large scale by rebuilding tables, as follows:
- Create a table with the same (or similar) definition as the table to rebuild.
- Create projections for the new table.
- Copy data from the target table into the new one with INSERT...SELECT.
- Drop the old table and its projections.
Note
Rather than dropping the old table, you can rename it and use it as a backup copy. Before doing so, verify that you have sufficient disk space for both the new and old tables.
- Rename the new table with ALTER TABLE...RENAME, using the name of the old table.
Caution
When you rebuild a table, Vertica purges the table of all delete vectors that precede the AHM. This prevents historical queries on any older epoch.
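The steps above can be sketched as follows, assuming the old table is named mytable with columns (a, b) as in the earlier example; the new-object names are hypothetical:
=> CREATE TABLE mytable_new (a INT, b VARCHAR(256));
=> CREATE PROJECTION mytable_new_p AS SELECT * FROM mytable_new ORDER BY a;
=> INSERT INTO mytable_new SELECT * FROM mytable;
=> COMMIT;
=> DROP TABLE mytable CASCADE;
=> ALTER TABLE mytable_new RENAME TO mytable;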
Projection considerations
- You must have enough disk space to contain the old and new projections at the same time. If necessary, you can drop some of the old projections before loading the new table. You must, however, retain at least one superprojection of the old table (or two buddy superprojections to maintain K-safety) until the new table is loaded. (See Prepare disk storage locations for disk space requirements.)
- You can specify different names for the new projections or use ALTER TABLE...RENAME to change the names of the old projections.
- The relationship between tables and projections does not depend on object names. Instead, it depends on object identifiers that are not affected by rename operations. Thus, if you rename a table, its projections continue to work normally.
7.12 - Dropping tables
DROP TABLE drops a table from the database catalog. If any projections are associated with the table, DROP TABLE returns an error message unless it also includes the CASCADE option. One exception applies: the table has only an auto-generated superprojection (auto-projection) associated with it.
Using CASCADE
In the following example, DROP TABLE tries to remove a table that has several projections associated with it. Because it omits the CASCADE option, Vertica returns an error:
=> DROP TABLE d1;
NOTICE: Constraint - depends on Table d1
NOTICE: Projection d1p1 depends on Table d1
NOTICE: Projection d1p2 depends on Table d1
NOTICE: Projection d1p3 depends on Table d1
NOTICE: Projection f1d1p1 depends on Table d1
NOTICE: Projection f1d1p2 depends on Table d1
NOTICE: Projection f1d1p3 depends on Table d1
ERROR: DROP failed due to dependencies: Cannot drop Table d1 because other objects depend on it
HINT: Use DROP ... CASCADE to drop the dependent objects too.
The next attempt includes the CASCADE option and succeeds:
=> DROP TABLE d1 CASCADE;
DROP TABLE
Using IF EXISTS
In the following example, DROP TABLE includes the option IF EXISTS. This option specifies not to report an error if one or more of the tables to drop does not exist. This clause is useful in SQL scripts, for example, to ensure that a table is dropped before you try to recreate it:
=> DROP TABLE IF EXISTS mytable;
DROP TABLE
=> DROP TABLE IF EXISTS mytable; -- Table doesn't exist
NOTICE: Nothing was dropped
DROP TABLE
Dropping and restoring view tables
Views that reference a table that is dropped and then replaced by another table with the same name continue to function and use the contents of the new table. The new table must have the same column definitions.
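For example (a sketch; the table and view names are hypothetical):
=> CREATE TABLE t1 (a INT);
=> CREATE VIEW v1 AS SELECT a FROM t1;
=> DROP TABLE t1;
=> CREATE TABLE t1 (a INT);
=> SELECT * FROM v1;
The view v1 now returns the contents of the new t1, because the replacement table has the same name and column definitions.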
8 - Managing client connections
Vertica provides several settings to control client connections:
- Limit the number of client connections a user can have open at the same time.
- Limit the time a client connection can be idle before being automatically disconnected.
- Use connection load balancing to spread the overhead of servicing client connections among nodes.
- Detect unresponsive clients with TCP keepalive.
- Drain a subcluster to reject any new client connections to that subcluster. For details, see Drain client connections.
- Route client connections to subclusters based on their workloads. For details, see Workload routing.
When Vertica closes a client connection, the client's ongoing operations, if any, are canceled.
8.1 - Limiting the number and length of client connections
You can manage how many active sessions a user can open to the server, and the duration of those sessions. Doing so helps prevent overuse of available resources, and can improve overall throughput.
You can define connection limits at two levels:
- Set the MAXCONNECTIONS property on individual users. This property specifies how many sessions a user can open concurrently on individual nodes, or across the database cluster. For example, the following ALTER USER statement allows user Joe up to 10 concurrent sessions:
=> ALTER USER Joe MAXCONNECTIONS 10 ON DATABASE;
- Set the configuration parameter MaxClientSessions on the database or individual nodes. This parameter specifies the maximum number of client sessions that can run on nodes in the database cluster, by default set to 50. An extra five sessions are always reserved for dbadmin users. This enables them to log in when the total number of client sessions equals MaxClientSessions.
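For example, to raise the per-node session limit across the database (a sketch; the value 100 is arbitrary):
=> ALTER DATABASE DEFAULT SET MaxClientSessions = 100;
Conversely, setting MaxClientSessions to 0 on a node prevents new non-dbadmin sessions on that node.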
Total client connections to a given node cannot exceed the limits set in MaxClientSessions.
Changes to a client's MAXCONNECTIONS property have no effect on current sessions; these changes apply only to new sessions. For example, if you change a user's connection mode from DATABASE to NODE, current node connections are unaffected. This change applies only to new sessions, which are reserved on the invoking node.
Managing TCP keepalive settings
Vertica uses kernel TCP keepalive parameters to detect unresponsive clients and determine when the connection should be closed. Vertica also supports a set of equivalent KeepAlive parameters that can override TCP keepalive parameter settings. By default, all Vertica KeepAlive parameters are set to 0, which signifies to use TCP keepalive settings. To override TCP keepalive settings, set the equivalent parameters at the database level with ALTER DATABASE, or for the current session with ALTER SESSION.
TCP keepalive parameter | Vertica parameter | Description
tcp_keepalive_time | KeepAliveIdleTime | Length (in seconds) of the idle period before the first TCP keepalive probe is sent to ensure that the client is still connected.
tcp_keepalive_probes | KeepAliveProbeCount | Number of consecutive keepalive probes that must go unacknowledged by the client before the client connection is considered lost and closed.
tcp_keepalive_intvl | KeepAliveProbeInterval | Time interval (in seconds) between keepalive probes.
Examples
The following examples show how to use Vertica KeepAlive parameters to override TCP keepalive parameters as follows:
- After 600 seconds (ten minutes), the first keepalive probe is sent to the client.
- Consecutive keepalive probes are sent every 30 seconds.
- If the client fails to respond to 10 keepalive probes, the connection is considered lost and is closed.
To make this the default policy for client connections, use ALTER DATABASE:
=> ALTER DATABASE DEFAULT SET KeepAliveIdleTime = 600;
=> ALTER DATABASE DEFAULT SET KeepAliveProbeInterval = 30;
=> ALTER DATABASE DEFAULT SET KeepAliveProbeCount = 10;
To override database-level policies for the current session, use ALTER SESSION:
=> ALTER SESSION SET KeepAliveIdleTime = 400;
=> ALTER SESSION SET KeepAliveProbeInterval = 72;
=> ALTER SESSION SET KeepAliveProbeCount = 60;
Query system table CONFIGURATION_PARAMETERS to verify database and session settings of the three Vertica KeepAlive parameters:
=> SELECT parameter_name, database_value, current_value FROM configuration_parameters WHERE parameter_name ILIKE 'KeepAlive%';
parameter_name | database_value | current_value
------------------------+----------------+---------------
KeepAliveProbeCount | 10 | 60
KeepAliveIdleTime | 600 | 400
KeepAliveProbeInterval | 30 | 72
(3 rows)
Limiting idle session length
If a client continues to respond to TCP keepalive probes, but is not running any queries, the client's session is considered idle. Idle sessions eventually time out. The maximum time that sessions are allowed to idle can be set at three levels, in descending order of precedence:
- As dbadmin, set the IDLESESSIONTIMEOUT property for individual users. This property overrides all other session timeout settings.
- Users can limit the idle time of the current session with SET SESSION IDLESESSIONTIMEOUT. Non-superusers can only set their session idle time to a value equal to or lower than their own IDLESESSIONTIMEOUT setting. If no session idle time is explicitly set for a user, the session idle time for that user is inherited from the node or database settings.
- As dbadmin, set the configuration parameter DEFAULTIDLESESSIONTIMEOUT on the database or on individual nodes. This parameter sets the default timeout value for all non-superusers.
All settings apply to sessions that are continuously idle—that is, sessions where no queries are running. If a client is slow or unresponsive during query execution, that time does not apply to timeouts. For example, the time that is required for a streaming batch insert is not counted towards timeout. The server identifies a session as idle starting from the moment it starts to wait for any type of message from that session.
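For example, a dbadmin might set timeouts at all three levels as follows. This is a sketch: the user name analyst and the interval values are illustrative, not taken from this guide.
=> ALTER DATABASE DEFAULT SET DefaultIdleSessionTimeout = '2 days'; -- database default for all non-superusers
=> ALTER USER analyst IDLESESSIONTIMEOUT '1 hour';                  -- per-user override
=> SET SESSION IDLESESSIONTIMEOUT '10 minutes';                     -- current session only
The session-level setting takes effect immediately, but only for the current session; a non-superuser cannot set it higher than their own IDLESESSIONTIMEOUT property.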
Viewing session settings
The following sections demonstrate how you can query the database for details about the session and connection limits.
Session length limits
Use SHOW DATABASE to view the session length limit for the database:
=> SHOW DATABASE DEFAULT DEFAULTIDLESESSIONTIMEOUT;
name | setting
---------------------------+---------
DefaultIdleSessionTimeout | 2 day
(1 row)
Use SHOW to view the length limit for the current session:
=> SHOW IDLESESSIONTIMEOUT;
name | setting
--------------------+---------
idlesessiontimeout | 1
(1 row)
Connection limits
Use SHOW DATABASE to view the connection limits for the database:
=> SHOW DATABASE DEFAULT MaxClientSessions;
name | setting
-------------------+---------
MaxClientSessions | 50
(1 row)
Query USERS to view the connection limits for users:
=> SELECT user_name, max_connections, connection_limit_mode FROM users
WHERE user_name != 'dbadmin';
user_name | max_connections | connection_limit_mode
-----------+-----------------+-----------------------
SuzyX | 3 | database
Joe | 10 | database
(2 rows)
Closing user sessions
To manually close a user session, use CLOSE_USER_SESSIONS:
=> SELECT CLOSE_USER_SESSIONS ('Joe');
close_user_sessions
------------------------------------------------------------------------------
Close all sessions for user Joe sent. Check v_monitor.sessions for progress.
(1 row)
Example
A user executes a query, and for some reason the query takes an unusually long time to finish (for example, because of server traffic or query complexity). In this case, the user might think the query failed, and opens another session to run the same query. Now, two sessions run the same query, using an extra connection.
To prevent this situation, you can limit how many sessions individual users can run by modifying their MAXCONNECTIONS user property. This can help minimize the chances of running redundant queries. It also helps prevent users from consuming all available connections, as set by the database configuration parameter MaxClientSessions. For example, the following setting on user SuzyQ limits her to two database sessions at any time:
=> CREATE USER SuzyQ MAXCONNECTIONS 2 ON DATABASE;
Limiting client connections also prevents a single user from connecting to the server too many times. Too many user connections exhaust the number of allowable connections set by database configuration parameter MaxClientSessions.
Note
No user can have a MAXCONNECTIONS limit greater than the MaxClientSessions setting.
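For example, you can raise the cluster-wide limit and then adjust an individual user's limit within it. The values here are illustrative; the user name Joe matches the earlier examples.
=> ALTER DATABASE DEFAULT SET MaxClientSessions = 100; -- cluster-wide ceiling
=> ALTER USER Joe MAXCONNECTIONS 5 ON DATABASE;        -- per-user limit, must not exceed MaxClientSessions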
Cluster changes and connections
Changes in node availability between connection requests have little impact on connection limits. Connection limits are still honored when nodes go down or come up between connection requests, and no special actions are needed to handle these events. However, if a node goes down, its active sessions exit, and other nodes in the cluster also drop their sessions. This frees up connections. Queries that were running in those sessions may appear to hang until the sessions are dropped; this behavior is expected.
8.2 - Drain client connections
Draining client connections in a subcluster prepares the subcluster for shutdown by marking all nodes in the subcluster as draining.
Eon Mode only
Draining client connections in a subcluster prepares the subcluster for shutdown by marking all nodes in the subcluster as draining. Work from existing user sessions continues on draining nodes, but the nodes refuse new client connections and are excluded from load-balancing operations. If clients attempt to connect to a draining node, they receive an error that informs them of the draining status. Because load balancing operations exclude draining nodes, clients that opt in to connection load balancing receive a connection error only if all nodes in the load balancing policy are draining. You do not need to change any connection load balancing configurations to use this feature. dbadmin can still connect to draining nodes.
To drain client connections before shutting down a subcluster, you can use the SHUTDOWN_WITH_DRAIN function. This function performs a Graceful Shutdown that marks a subcluster as draining until either the existing connections complete their work and close or a user-specified timeout is reached. When one of these conditions is met, the function proceeds to shutdown the subcluster. Vertica provides several meta-functions that allow you to independently perform each step of the SHUTDOWN_WITH_DRAIN process. You can use the START_DRAIN_SUBCLUSTER function to mark a subcluster as draining and then the SHUTDOWN_SUBCLUSTER function to shut down a subcluster once its connections have closed.
You can use the CANCEL_DRAIN_SUBCLUSTER function to mark all nodes in a subcluster as not draining. As soon as a node is both UP and not draining, the node accepts new client connections. If all nodes in a draining subcluster are down, the draining status of its nodes is automatically reset to not draining.
You can query the DRAINING_STATUS system table to monitor the draining status of each node as well as client connection information, such as the number of active user sessions on each node.
The following example drains a subcluster named analytics, then cancels the draining of the subcluster.
To mark the analytics subcluster as draining, call SHUTDOWN_WITH_DRAIN with a negative timeout value:
=> SELECT SHUTDOWN_WITH_DRAIN('analytics', -1);
NOTICE 0: Draining has started on subcluster (analytics)
You can confirm that the subcluster is draining by querying the DRAINING_STATUS system table:
=> SELECT node_name, subcluster_name, is_draining FROM draining_status ORDER BY 1;
node_name | subcluster_name | is_draining
-------------------+--------------------+--------------
verticadb_node0001 | default_subcluster | f
verticadb_node0002 | default_subcluster | f
verticadb_node0003 | default_subcluster | f
verticadb_node0004 | analytics | t
verticadb_node0005 | analytics | t
verticadb_node0006 | analytics | t
If a client attempts to connect directly to a node in the draining subcluster, they receive the following error message:
$ /opt/vertica/bin/vsql -h nodeIP --password password verticadb analyst
vsql: FATAL 10611: New session rejected because subcluster to which this node belongs is draining connections
To cancel the graceful shutdown of the analytics subcluster, type Ctrl+C:
=> SELECT SHUTDOWN_WITH_DRAIN('analytics', -1);
NOTICE 0: Draining has started on subcluster (analytics)
^CCancel request sent
ERROR 0: Cancel received after draining started and before shutdown issued. Nodes will not be shut down. The subclusters are still in the draining state.
HINT: Run cancel_drain_subcluster('') to restore all nodes to the 'not_draining' state
As mentioned in the above hint, you can run CANCEL_DRAIN_SUBCLUSTER to reset the status of the draining nodes in the subcluster to not draining:
=> SELECT CANCEL_DRAIN_SUBCLUSTER('analytics');
CANCEL_DRAIN_SUBCLUSTER
--------------------------------------------------------
Targeted subcluster: 'analytics'
Action: CANCEL DRAIN
(1 row)
To confirm that the subcluster is no longer draining, you can again query the DRAINING_STATUS system table:
=> SELECT node_name, subcluster_name, is_draining FROM draining_status ORDER BY 1;
node_name | subcluster_name | is_draining
-------------------+--------------------+-------------
verticadb_node0001 | default_subcluster | f
verticadb_node0002 | default_subcluster | f
verticadb_node0003 | default_subcluster | f
verticadb_node0004 | analytics | f
verticadb_node0005 | analytics | f
verticadb_node0006 | analytics | f
(6 rows)
8.3 - Connection load balancing
Each client connection to a host in the Vertica cluster requires a small overhead in memory and processor time.
Each client connection to a host in the Vertica cluster requires a small overhead in memory and processor time. If many clients connect to a single host, this overhead can begin to affect the performance of the database. You can spread the overhead of client connections by dictating that certain clients connect to specific hosts in the cluster. However, this manual balancing becomes difficult as new clients and hosts are added to your environment.
Connection load balancing helps automatically spread the overhead of client connections across the cluster by having hosts redirect client connections to other hosts. By redirecting connections, the overhead from client connections is spread across the cluster without having to manually assign particular hosts to individual clients. Clients can connect to a small handful of hosts, and they are naturally redirected to other hosts in the cluster. Load balancing does not redirect connections to draining hosts. For more information, see Drain client connections.
Native connection load balancing
Native connection load balancing is a feature built into the Vertica Analytic Database server and client libraries as well as vsql. Both the server and the client need to enable load balancing for it to function. If connection load balancing is enabled, a host in the database cluster can redirect a client's attempt to connect to it to another currently-active host in the cluster. This redirection is based on a load balancing policy. This redirection only takes place once, so a client is not bounced from one host to another.
Because native connection load balancing is incorporated into the Vertica client libraries, any client application that connects to Vertica transparently takes advantage of it simply by setting a connection parameter.
How you choose to implement connection load balancing depends on your network environment. Because native connection load balancing is easier to implement, use it unless your network configuration requires that clients be separated from the hosts in the Vertica database by a firewall.
For more about native connection load balancing, see About Native Connection Load Balancing.
Workload routing
Workload routing lets you create rules for routing client connections to particular subclusters based on their workloads.
The primary advantages of this type of load balancing are as follows:
- Database administrators can associate certain subclusters with certain workloads (as opposed to client IP addresses).
- Clients do not need to know anything about the subcluster they will be routed to, only the type of workload they have.
- Database administrators can change workload routing policies at any time, and these changes are transparent to all clients.
For details, see Workload routing.
8.3.1 - About native connection load balancing
Native connection load balancing is a feature built into the Vertica server and client libraries that helps spread the CPU and memory overhead caused by client connections across the hosts in the database.
Native connection load balancing is a feature built into the Vertica server and client libraries that helps spread the CPU and memory overhead caused by client connections across the hosts in the database. It can prevent unequal distribution of client connections among hosts in the cluster.
There are two types of native connection load balancing:
-
Cluster-wide balancing—This method is the legacy method of connection load balancing. It was the only type of load balancing prior to Vertica version 9.2. Using this method, you apply a single load balancing policy across the entire cluster. All connections to the cluster are handled the same way.
-
Load balancing policies—This method lets you set different load balancing policies depending on the source of the client connection. For example, you can have a policy that redirects connections from outside of your local network to one set of nodes in your cluster, and connections from within your local network to another set of nodes.
8.3.2 - Classic connection load balancing
The classic connection load balancing feature applies a single policy for all client connections to your database.
The classic connection load balancing feature applies a single policy for all client connections to your database. Both your database and the client must enable the load balancing option in order for connections to be load balanced. When both client and server enable load balancing, the following process takes place when the client attempts to open a connection to Vertica:
-
The client connects to a host in the database cluster, with a connection parameter indicating that it is requesting a load-balanced connection.
-
The host chooses a host from the list of currently up hosts in the cluster, according to the current load balancing scheme. Under all schemes, it is possible for a host to select itself.
-
The host tells the client which host it selected to handle the client's connection.
-
If the host chose another host in the database to handle the client connection, the client disconnects from the initial host. Otherwise, the client jumps to step 6.
-
The client establishes a connection to the host that will handle its connection. The client sets this second connection request so that the second host does not interpret the connection as a request for load balancing.
-
The client connection proceeds as usual (negotiating encryption if the connection has SSL enabled, and then authenticating the user).
This process is transparent to the client application. The client driver automatically disconnects from the initial host and reconnects to the host selected for load balancing.
Requirements
-
In mixed IPv4 and IPv6 environments, balancing only works for the address family for which you have configured native load balancing. For example, if you have configured load balancing using an IPv4 address, then IPv6 clients cannot use load balancing. The IPv6 clients can still connect, but load balancing does not occur.
-
The native load balancer returns an IP address for the client to use. This address must be one that the client can reach. If your nodes are on a private network, native load-balancing requires you to publish a public address in one of two ways:
-
Set the public address on each node. Vertica saves that address in the export_address
field in the NODES system table.
-
Set the subnet on the database. Vertica saves that address in the export_subnet
field in the DATABASES system table.
Load balancing schemes
The load balancing scheme controls how a host selects which host to handle a client connection. There are three available schemes:
-
NONE
(default): Disables native connection load balancing.
-
ROUNDROBIN
: Chooses the next host from a circular list of hosts in the cluster that are up—for example, in a three-node cluster, iterates over node1, node2, and node3, then wraps back to node1. Each host in the cluster maintains its own pointer to the next host in the circular list, rather than there being a single cluster-wide state.
-
RANDOM
: Randomly chooses a host from among all hosts in the cluster that are up.
You set the native connection load balancing scheme using the SET_LOAD_BALANCE_POLICY function. See Enabling and Disabling Native Connection Load Balancing for instructions.
Driver notes
-
Native connection load balancing works with the ADO.NET driver's connection pooling. The connection the client makes to the initial host, and the final connection to the load-balanced host, use pooled connections if they are available.
-
If a client application uses the JDBC or ODBC driver with third-party connection pooling solutions, the initial connection is not pooled because it is not a full client connection. The final connection is pooled because it is a standard client connection.
Connection failover
The client libraries include a failover feature that allows them to connect to backup hosts if the host specified in the connection properties is unreachable. When using native connection load balancing, this failover feature is used only for the initial connection to the database. If the host to which the client was redirected does not respond to the client's connection request, the client does not attempt to connect to a backup host and instead returns a connection error to the user.
Clients are redirected only to hosts that are known to be up. Thus, this sort of connection failure should only occur if the targeted host goes down at the same moment the client is redirected to it. For more information, see ADO.NET connection failover, JDBC connection failover, and Connection failover.
8.3.2.1 - Enabling and disabling classic connection load balancing
Only a database superuser can enable or disable classic cluster-wide connection load balancing.
Only a database superuser can enable or disable classic cluster-wide connection load balancing. To enable or disable load balancing, use the SET_LOAD_BALANCE_POLICY function to set the load balance policy. Setting the load balance policy to anything other than 'NONE' enables load balancing on the server. The following example enables native connection load balancing by setting the load balancing policy to ROUNDROBIN.
=> SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');
SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: roundrobin
(1 row)
To disable native connection load balancing, use SET_LOAD_BALANCE_POLICY to set the policy to 'NONE':
=> SELECT SET_LOAD_BALANCE_POLICY('NONE');
SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: none
(1 row)
Note
When a client makes a connection, the native load-balancer chooses a node and returns the value from the export_address column in the NODES table. The client then uses the export_address to connect. The node_address specifies the address to use for inter-node and spread communications. When a database is installed, the export_address and node_address are set to the same value. If you installed Vertica on a private address, then you must set the export_address to a public address for each node.
By default, client connections are not load balanced, even when connection load balancing is enabled on the server. Clients must set a connection parameter that indicates they are willing to have their connection request load balanced. See Load balancing in ADO.NET, Load balancing in JDBC, and Load balancing for information on enabling load balancing on the client. For vsql, use the -C command-line option to enable load balancing.
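For example, a vsql invocation that opts in to load balancing might look like the following. The host name, user name, and password here are placeholders, not values from this guide.
$ vsql -h node01.example.com -U analyst -w mypassword -C
If the contacted node redirects the connection, vsql transparently reconnects to the node chosen by the load balancing policy.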
Important
In mixed IPv4 and IPv6 environments, balancing only works for the address family for which you have configured load balancing. For example, if you have configured load balancing using an IPv4 address, then IPv6 clients cannot use load balancing. The IPv6 clients can still connect, but load balancing does not occur.
Resetting the load balancing state
When the load balancing policy is ROUNDROBIN, each host in the Vertica cluster maintains its own state of which host it will select to handle the next client connection. You can reset this state to its initial value (usually, the host with the lowest node ID) using the RESET_LOAD_BALANCE_POLICY function:
=> SELECT RESET_LOAD_BALANCE_POLICY();
RESET_LOAD_BALANCE_POLICY
-------------------------------------------------------------------------
Successfully reset stateful client load balance policies: "roundrobin".
(1 row)
See also
8.3.2.2 - Monitoring legacy connection load balancing
Query the LOAD_BALANCE_POLICY column of V_CATALOG.DATABASES to determine the state of native connection load balancing on your server.
Query the LOAD_BALANCE_POLICY column of the V_CATALOG.DATABASES to determine the state of native connection load balancing on your server:
=> SELECT LOAD_BALANCE_POLICY FROM V_CATALOG.DATABASES;
LOAD_BALANCE_POLICY
---------------------
roundrobin
(1 row)
Determining to which node a client has connected
A client can determine the node to which it has connected by querying the NODE_NAME column of the V_MONITOR.CURRENT_SESSION table:
=> SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION;
NODE_NAME
------------------
v_vmart_node0002
(1 row)
8.3.3 - Connection load balancing policies
Connection load balancing policies help spread the load of servicing client connections by redirecting connections based on the connection's origin.
Connection load balancing policies help spread the load of servicing client connections by redirecting connections based on the connection's origin. These policies can also help prevent nodes from reaching their client connection limits and rejecting new connections by spreading connections among nodes. See Limiting the number and length of client connections for more information about client connection limits.
A load balancing policy consists of:
-
Network addresses that identify particular IP address and port number combinations on a node.
-
One or more connection load balancing groups that consist of the network addresses that you want to handle client connections. You define load balancing groups using fault groups, subclusters, or a list of network addresses.
-
One or more routing rules that map a range of client IP addresses to a connection load balancing group.
When a client connects to a node in the database with load balancing enabled, the node evaluates all of the routing rules based on the client's IP address to determine if any match. If more than one rule matches the IP address, the node applies the most specific rule (the one that affects the fewest IP addresses).
If the node finds a matching rule, it uses the rule to determine the pool of potential nodes to handle the client connection. When evaluating potential target nodes, it always ensures that the nodes are currently up. The initially-contacted node then chooses one of the nodes in the group based on the group's distribution scheme. This scheme can be either choosing a node at random, or choosing a node in a rotating "round-robin" order. For example, in a three-node cluster, the round robin order would be node 1, then node 2, then node 3, and then back to node 1 again.
After it processes the rules, if the node determines that another node should handle the client's connection, it tells the client which node it has chosen. The client disconnects from the initial node and connects to the chosen node to continue with the connection process (either negotiating encryption if the connection has TLS/SSL enabled, or authentication).
If the initial node chooses itself based on the routing rules, it tells the client to proceed to the next step of the connection process.
If no routing rule matches the incoming IP address, the node checks to see if classic connection load balancing is enabled by both Vertica and the client. If so, it handles the connection according to the classic load balancing policy. See Classic connection load balancing for more information.
Finally, if the database is running in Eon Mode, the node tries to apply a default interior load balancing rule. See Default Subcluster Interior Load Balancing Policy below.
If no routing rule matches the incoming IP address, and neither classic load balancing nor the default subcluster interior load balancing rule applies, the node handles the connection itself. It also handles the connection itself if it cannot follow the load balancing rule. For example, if all nodes in the load balancing group targeted by the rule are down, then the initially-contacted node handles the client connection itself. In this case, the node does not attempt to apply any other less-restrictive load balancing rules that would apply to the incoming connection. It attempts to apply only a single load balancing rule.
Use cases
Using load balancing policies you can:
-
Ensure connections originating from inside or outside of your internal network are directed to a valid IP address for the client. For example, suppose your Vertica nodes have two IP addresses: one for the external network and another for the internal network. These networks are mutually exclusive. You cannot reach the private network from the public, and you cannot reach the public network from the private. Your load balancing rules need to provide the client with an IP address they can actually reach.
-
Enable access to multiple nodes of a Vertica cluster that are behind a NAT router. A NAT router is accessible from the outside network via a single IP address. Systems within the NAT router's private network can be accessed on this single IP address using different port numbers. You can create a load balancing policy that redirects a client connection to the NAT's IP address but with a different port number.
-
Designate sets of nodes to service client connections from an IP address range. For example, if your ETL systems have a set range of IP addresses, you could limit their client connections to an arbitrary set of Vertica nodes, a subcluster, or a fault group. This technique lets you isolate the overhead of servicing client connections to a few nodes. It is useful when you are using subclusters in an Eon Mode database to isolate workloads (see Subclusters for more information).
Using connection load balancing policies with IPv4 and IPv6
Connection load balancing policies work with both IPv4 and IPv6. As far as the load balancing policies are concerned, the two address families represent separate networks. If you want your load balancing policy to handle both IPv4 and IPv6 addresses, you must create separate sets of network addresses, load balancing groups, and rules for each protocol. When a client opens a connection to a node in the cluster, the addressing protocol it uses determines which set of rules Vertica consults when deciding whether and how to balance the connection.
Default subcluster interior load balancing policy
Databases running in Eon Mode have a default connection load balancing policy that helps spread the load of handling client connections among the nodes in a subcluster. When a client connects to a node while opting into connection load balancing, the node checks for load balancing policies that apply to the client's IP address. If it does not find any applicable load balancing rule, and classic load balancing is not enabled, the node falls back to the default interior load balancing rule. This rule distributes connections among the nodes in the same subcluster as the initially-contacted node.
As with other connection load balancing policies, the nodes in the subcluster must have a network address defined for them to be eligible to handle the client connection. If no nodes in the subcluster have a network address, the node does not apply the default subcluster interior load balancing rule, and the connection is not load balanced.
This default rule is convenient when you are primarily interested in load balancing connections within each subcluster. You just create network addresses for the nodes in your subcluster. You do not need to create load balancing groups or rules. Clients that opt-in to load balancing are then automatically balanced among the nodes in the subcluster.
Interior load balancing policy with multiple network addresses
If your nodes have multiple network addresses, the default subcluster interior load balancing rule chooses the address that was created first as the target of the load balancing rule. For example, suppose you create a network address on a node for the private IP address 192.168.1.10. Then you create another network address for the node for the public IP address 233.252.0.1. The default subcluster interior connection load balancing rule always selects 192.168.1.10 as the target of the rule.
If you want the default interior load balancing rule to choose a different network address as its target, drop the other network addresses on the node and then recreate them. Deleting and recreating other addresses makes the address you want the rule to select the oldest address. For example, suppose you want the rule to use a public address (233.252.0.1) that was created after a private address (192.168.1.10). In this case, you can drop the address for 192.168.1.10 and then recreate it. The rule then defaults to the older public 233.252.0.1 address.
If you intend to create multiple network addresses for the nodes in your subcluster, create the network addresses you want to use with the default subcluster interior load balancing first. For example, suppose you want to use the default interior load balancing subcluster rule to load balance most client connections. However, you also want to create a connection load balancing policy to manage connections coming in from a group of ETL systems. In this case, create the network addresses you want to use for the default interior load balancing rule first, then create the network addresses for the ETL systems.
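For example, assuming a hypothetical address named node01_private on node v_vmart_node0001, you could make the public address the oldest one by dropping and recreating the private address. This is a sketch; the address name and IP are illustrative.
=> DROP NETWORK ADDRESS node01_private;
=> CREATE NETWORK ADDRESS node01_private ON v_vmart_node0001 WITH '192.168.1.10';
After recreating the private address, the public address is the oldest on the node, so the default interior load balancing rule selects it.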
Load balancing policies vs. classic load balancing
There are several differences between the classic load balancing feature and the load balancing policy feature:
-
In classic connection load balancing, you just enable the load balancing option on both client and server, and load balancing is enabled. There are more steps to implement load balancing policies: you have to create addresses, groups, and rules and then enable load balancing on the client.
-
Classic connection load balancing only supports a single, cluster-wide policy for redirecting connections. With connection load balancing policies, you get to choose which nodes handle client connections based on the connection's origin. This gives you more flexibility to handle complex situations. Examples include routing connections through a NAT-based router or having nodes that are accessible via multiple IP addresses on different networks.
-
In classic connection load balancing, each node in the cluster can only be reached via a single IP address. This address is set in the EXPORT_ADDRESS column of the NODES system table. With connection load balancing policies, you can create a network address for each IP address associated with a node. Then you create rules that redirect to those addresses.
Steps to create a load balancing policy
There are three steps you must follow to create a load balancing policy:
-
Create one or more network addresses for each node that you want to participate in the connection load balancing policies.
-
Create one or more load balancing groups to be the target of the routing rules. Load balancing groups can target a collection of specific network addresses. Alternatively, you can create a group from a fault group or subcluster. You can limit the members of the load balance group to a subset of the fault group or subcluster using an IP address filter.
-
Create one or more routing rules.
While not absolutely necessary, it is always a good idea to test your load balancing policy to ensure it works the way you expect it to.
After following these steps, Vertica applies the load balancing policies to client connections that opt in to connection load balancing. See Load balancing in ADO.NET, Load balancing in JDBC, and Load balancing for information on enabling load balancing on the client. For vsql, use the -C command-line option to enable load balancing.
These steps are explained in the other topics in this section.
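Putting the three steps together, a minimal policy might look like the following sketch. The group name, rule name, and CIDR range are illustrative; the node name and IP address match the examples in Creating network addresses.
=> CREATE NETWORK ADDRESS node01 ON v_vmart_node0001 WITH '10.20.110.21';   -- step 1: network address
=> CREATE LOAD BALANCE GROUP internal_group WITH ADDRESS node01;            -- step 2: load balancing group
=> CREATE ROUTING RULE internal_rule ROUTE '10.20.0.0/16' TO internal_group; -- step 3: routing rule
With this policy in place, opted-in clients connecting from the 10.20.0.0/16 range are redirected to the addresses in internal_group.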
See also
8.3.3.1 - Creating network addresses
Network addresses assign a name to an IP address and port number on a node.
Network addresses assign a name to an IP address and port number on a node. You use these addresses when you define load balancing groups. A node can have multiple network addresses associated with it. For example, suppose a node has one IP address that is only accessible from outside of the local network, and another that is accessible only from inside the network. In this case, you can define one network address using the external IP address, and another using the internal address. You can then create two different load balancing policies, one for external clients, and another for internal clients.
Note
You must create network addresses for your nodes, even if you intend to base your connection load balance groups on fault groups or subclusters. Load balancing rules can only select nodes that have a network address defined for them.
You create a network address using the CREATE NETWORK ADDRESS statement. This statement takes:
-
The name to assign to the network address
-
The name of the node
-
The IP address of the node to associate with the network address
-
The port number the node uses to accept client connections (optional)
Note
You can use hostnames instead of IP addresses when creating network addresses. However, doing so may lead to confusion if you are not sure which IP address a hostname resolves to. Using hostnames can also cause problems if your DNS server maps the hostname to multiple IP addresses.
The following example demonstrates creating three network addresses, one for each node in a three-node database.
=> SELECT node_name,node_address,node_address_family FROM v_catalog.nodes;
node_name | node_address | node_address_family
------------------+--------------+----------------------
v_vmart_node0001 | 10.20.110.21 | ipv4
v_vmart_node0002 | 10.20.110.22 | ipv4
v_vmart_node0003 | 10.20.110.23 | ipv4
(3 rows)
=> CREATE NETWORK ADDRESS node01 ON v_vmart_node0001 WITH '10.20.110.21';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_vmart_node0002 WITH '10.20.110.22';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 on v_vmart_node0003 WITH '10.20.110.23';
CREATE NETWORK ADDRESS
Creating network addresses for IPv6 addresses works the same way:
=> CREATE NETWORK ADDRESS node1_ipv6 ON v_vmart_node0001 WITH '2001:0DB8:7D5F:7433::';
CREATE NETWORK ADDRESS
Vertica does not perform any tests on the IP address you supply in the CREATE NETWORK ADDRESS statement. You must test the IP addresses you supply to this statement to confirm they correspond to the right node.
Vertica does not restrict the address you supply because it is often not aware of all the network addresses through which the node is accessible. For example, your node may be accessible from an external network via an IP address that Vertica is not configured to use. Or, your node can have both an IPv4 and an IPv6 address, only one of which Vertica is aware of.
For example, suppose v_vmart_node0003 from the previous example is not accessible via the IP address 192.168.1.5. You can still create a network address for it using that address:
=> CREATE NETWORK ADDRESS node04 ON v_vmart_node0003 WITH '192.168.1.5';
CREATE NETWORK ADDRESS
If you create a load balancing group and routing rule that targets this address, client connections either connect to the wrong node or fail because they are directed to a host that is not part of the Vertica cluster.
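One way to test an address before registering it is a simple TCP connectivity check. The following Python sketch is illustrative only (not part of Vertica); it verifies that something is accepting connections at each address and port you plan to use:

```python
import socket

def can_connect(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check each address you plan to register with CREATE NETWORK ADDRESS.
# The addresses below come from the example above; substitute your own.
for addr in ['10.20.110.21', '10.20.110.22', '10.20.110.23']:
    print(addr, 'reachable' if can_connect(addr, 5433) else 'unreachable')
```

A successful connection only proves that a listener exists at that address; you must still confirm the address reaches the intended Vertica node.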
Specifying a port number in a network address
By default, the CREATE NETWORK ADDRESS statement assumes the port number for the node's client connection is the default 5433. Sometimes, you may have a node listening for client connections on a different port. You can supply an alternate port number for the network address using the PORT keyword.
For example, suppose your nodes are behind a NAT router. In this case, you can have your nodes listen on different port numbers so the NAT router can route connections to them. When creating network addresses for these nodes, you supply the IP address of the NAT router and the port number the node is listening on. For example:
=> CREATE NETWORK ADDRESS node1_nat ON v_vmart_node0001 WITH '192.168.10.10' PORT 5433;
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node2_nat ON v_vmart_node0002 with '192.168.10.10' PORT 5434;
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node3_nat ON v_vmart_node0003 with '192.168.10.10' PORT 5435;
CREATE NETWORK ADDRESS
8.3.3.2 - Creating connection load balance groups
After you have created network addresses for nodes, you create collections of them so you can target them with routing rules.
After you have created network addresses for nodes, you create collections of them so you can target them with routing rules. These collections of network addresses are called load balancing groups. You have two ways to select the addresses to include in a load balancing group:
-
A list of network addresses
-
The name of one or more fault groups or subclusters, plus an IP address range in CIDR format. Vertica adds to the load balance group only those network addresses in the fault groups or subclusters that fall within the range you supply. This filter lets you base your load balance group on a subset of the nodes that make up the fault group or subcluster.
Note
Load balance groups can only be based on fault groups or subclusters, or contain an arbitrary list of network addresses. You cannot mix these sources. For example, if you create a load balance group based on one or more fault groups, then you can only add additional fault groups to it. Vertica will return an error if you try to add a network address or subcluster to the load balance group.
You create a load balancing group using the CREATE LOAD BALANCE GROUP statement. When basing your group on a list of addresses, this statement takes the name for the group and the list of addresses. The following example demonstrates creating addresses for four nodes, and then creating two groups based on those nodes.
=> CREATE NETWORK ADDRESS addr01 ON v_vmart_node0001 WITH '10.20.110.21';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr02 ON v_vmart_node0002 WITH '10.20.110.22';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr03 on v_vmart_node0003 WITH '10.20.110.23';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS addr04 on v_vmart_node0004 WITH '10.20.110.24';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP group_1 WITH ADDRESS addr01, addr02;
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP group_2 WITH ADDRESS addr03, addr04;
CREATE LOAD BALANCE GROUP
=> SELECT * FROM LOAD_BALANCE_GROUPS;
name | policy | filter | type | object_name
------------+------------+-----------------+-----------------------+-------------
group_1 | ROUNDROBIN | | Network Address Group | addr01
group_1 | ROUNDROBIN | | Network Address Group | addr02
group_2 | ROUNDROBIN | | Network Address Group | addr03
group_2 | ROUNDROBIN | | Network Address Group | addr04
(4 rows)
A network address can be a part of as many load balancing groups as you like. However, each group can only have a single network address per node. You cannot add two network addresses belonging to the same node to the same load balancing group.
Creating load balancing groups from fault groups
To create a load balancing group from one or more fault groups, you supply:
-
The name for the load balancing group
-
The name of one or more fault groups
-
An IP address filter in CIDR format that filters the fault groups based on their IP addresses. Vertica excludes any network addresses in the fault group that do not fall within this range. If you want all of the nodes in the fault groups to be added to the load balance group, specify the filter 0.0.0.0/0.
This example creates two load balancing groups from a fault group. The first includes all network addresses in the group by using the CIDR notation for all IP addresses. The second limits the fault group to three of the four nodes in the fault group by using the IP address filter.
=> CREATE FAULT GROUP fault_1;
CREATE FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE v_vmart_node0001;
ALTER FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE v_vmart_node0002;
ALTER FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE v_vmart_node0003;
ALTER FAULT GROUP
=> ALTER FAULT GROUP fault_1 ADD NODE v_vmart_node0004;
ALTER FAULT GROUP
=> SELECT node_name,node_address,node_address_family,export_address
FROM v_catalog.nodes;
node_name | node_address | node_address_family | export_address
------------------+--------------+---------------------+----------------
v_vmart_node0001 | 10.20.110.21 | ipv4 | 10.20.110.21
v_vmart_node0002 | 10.20.110.22 | ipv4 | 10.20.110.22
v_vmart_node0003 | 10.20.110.23 | ipv4 | 10.20.110.23
v_vmart_node0004 | 10.20.110.24 | ipv4 | 10.20.110.24
(4 rows)
=> CREATE LOAD BALANCE GROUP group_all WITH FAULT GROUP fault_1 FILTER
'0.0.0.0/0';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP group_some WITH FAULT GROUP fault_1 FILTER
'10.20.110.21/30';
CREATE LOAD BALANCE GROUP
=> SELECT * FROM LOAD_BALANCE_GROUPS;
name | policy | filter | type | object_name
----------------+------------+-----------------+-----------------------+-------------
group_all | ROUNDROBIN | 0.0.0.0/0 | Fault Group | fault_1
group_some | ROUNDROBIN | 10.20.110.21/30 | Fault Group | fault_1
(2 rows)
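To see why the '10.20.110.21/30' filter selects only three of the four nodes, note that the /30 network containing 10.20.110.21 spans 10.20.110.20 through 10.20.110.23. You can verify which addresses a CIDR filter covers with Python's standard ipaddress module:

```python
from ipaddress import ip_address, ip_network

# strict=False lets us pass a host address instead of the network address;
# the resulting network is 10.20.110.20/30 (addresses .20 through .23).
cidr = ip_network('10.20.110.21/30', strict=False)

nodes = ['10.20.110.21', '10.20.110.22', '10.20.110.23', '10.20.110.24']
selected = [a for a in nodes if ip_address(a) in cidr]
print(selected)  # 10.20.110.24 falls outside the /30 range
```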
You can also supply multiple fault groups to the CREATE LOAD BALANCE GROUP statement:
=> CREATE LOAD BALANCE GROUP group_2_faults WITH FAULT GROUP
fault_2, fault_3 FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP
Note
If you supply a filter range that does not match any network addresses of the nodes in the fault groups, Vertica creates an empty load balancing group. Any routing rules that direct connections to the empty load balance group will fail, because no nodes are set to handle connections for the group. In this case, the node that the client connected to initially handles the client connection itself.
Creating load balance groups from subclusters
Creating a load balance group from a subcluster is similar to creating a load balance group from a fault group. You just use WITH SUBCLUSTER instead of WITH FAULT GROUP in the CREATE LOAD BALANCE GROUP statement.
=> SELECT node_name,node_address,node_address_family,subcluster_name
FROM v_catalog.nodes;
node_name | node_address | node_address_family | subcluster_name
----------------------+--------------+---------------------+--------------------
v_verticadb_node0001 | 10.11.12.10 | ipv4 | load_subcluster
v_verticadb_node0002 | 10.11.12.20 | ipv4 | load_subcluster
v_verticadb_node0003 | 10.11.12.30 | ipv4 | load_subcluster
v_verticadb_node0004 | 10.11.12.40 | ipv4 | analytics_subcluster
v_verticadb_node0005 | 10.11.12.50 | ipv4 | analytics_subcluster
v_verticadb_node0006 | 10.11.12.60 | ipv4 | analytics_subcluster
(6 rows)
=> CREATE NETWORK ADDRESS node01 ON v_verticadb_node0001 WITH '10.11.12.10';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_verticadb_node0002 WITH '10.11.12.20';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 ON v_verticadb_node0003 WITH '10.11.12.30';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node04 ON v_verticadb_node0004 WITH '10.11.12.40';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node05 ON v_verticadb_node0005 WITH '10.11.12.50';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node06 ON v_verticadb_node0006 WITH '10.11.12.60';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP load_subcluster WITH SUBCLUSTER load_subcluster
FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP analytics_subcluster WITH SUBCLUSTER
analytics_subcluster FILTER '0.0.0.0/0';
CREATE LOAD BALANCE GROUP
Setting the group's distribution policy
A load balancing group has a policy setting that determines how the initially-contacted node chooses a target from the group. CREATE LOAD BALANCE GROUP supports three policies:
-
ROUNDROBIN (default) rotates among the available members of the load balancing group. The initially-contacted node keeps track of which node it chose last time, and chooses the next one in the group.
Note
Each node in the cluster maintains its own round-robin pointer that indicates which node it should pick next for each load-balancing group. Therefore, if clients connect to different initial nodes, they may be redirected to the same node.
-
RANDOM chooses an available node from the group randomly.
-
NONE disables load balancing.
The following example demonstrates creating a load balancing group with a RANDOM distribution policy.
=> CREATE LOAD BALANCE GROUP group_random WITH ADDRESS node01, node02,
node03, node04 POLICY 'RANDOM';
CREATE LOAD BALANCE GROUP
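The effect of per-node round-robin pointers can be illustrated with a toy model (a hypothetical Python sketch, not Vertica's implementation): because each initially-contacted node keeps its own position in the rotation, clients that reach different initial nodes can be redirected to the same target.

```python
class InitialNode:
    """Toy model of a node's per-group round-robin pointer (illustrative only)."""
    def __init__(self):
        self.next_index = {}   # one pointer per load balancing group

    def pick(self, group_name, members):
        i = self.next_index.get(group_name, 0)
        self.next_index[group_name] = (i + 1) % len(members)
        return members[i]

group = ['node01', 'node02', 'node03']
node_a, node_b = InitialNode(), InitialNode()

# Each initial node starts its own rotation at the first member, so two
# clients contacting different initial nodes get the same redirect target:
print(node_a.pick('group_1', group))  # node01
print(node_b.pick('group_1', group))  # node01
print(node_a.pick('group_1', group))  # node02
```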
The next step
After creating the load balancing group, you must add a load balancing routing rule that tells Vertica how incoming connections should be redirected to the groups. See Creating load balancing routing rules.
8.3.3.3 - Creating load balancing routing rules
Once you have created one or more connection load balancing groups, you are ready to create load balancing routing rules.
Once you have created one or more connection load balancing groups, you are ready to create load balancing routing rules. These rules tell Vertica how to redirect client connections based on their IP addresses.
You create routing rules using the CREATE ROUTING RULE statement. You pass this statement:
-
The name to assign to the routing rule
-
The source IP address range, in CIDR format, of the clients the rule applies to
-
The name of the load balancing group that handles the matching connections
The following example creates two rules. The first redirects connections coming from the IP address range 192.168.1.0 through 192.168.1.255 to a load balancing group named group_1. The second routes connections from the IP range 10.20.1.0 through 10.20.1.255 to the load balancing group named group_2.
=> CREATE ROUTING RULE internal_clients ROUTE '192.168.1.0/24' TO group_1;
CREATE ROUTING RULE
=> CREATE ROUTING RULE external_clients ROUTE '10.20.1.0/24' TO group_2;
CREATE ROUTING RULE
Creating a catch-all routing rule
Vertica applies routing rules in most specific to least specific order. This behavior lets you create a "catch-all" rule that handles all incoming connections. Then you can create rules to handle smaller IP address ranges for specific purposes. For example, suppose you wanted to create a catch-all rule that worked with the rules created in the previous example. Then you can create a new rule that routes 0.0.0.0/0 (the CIDR notation for all IP addresses) to a group that should handle connections that aren't handled by either of the previously-created rules. For example:
=> CREATE LOAD BALANCE GROUP group_all WITH ADDRESS node01, node02, node03, node04;
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE catch_all ROUTE '0.0.0.0/0' TO group_all;
CREATE ROUTING RULE
After running the above statements, any connection that does not originate from the IP address ranges 192.168.1.* or 10.20.1.* are routed to the group_all group.
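The most-specific-first ordering behaves like a longest-prefix match. The following Python sketch models it using the rules from this section (an illustration only, not Vertica's implementation):

```python
from ipaddress import ip_address, ip_network

# Rules from the examples above; the catch-all only wins
# when no narrower rule matches the client address.
rules = {
    'internal_clients': ip_network('192.168.1.0/24'),
    'external_clients': ip_network('10.20.1.0/24'),
    'catch_all':        ip_network('0.0.0.0/0'),
}

def matching_rule(client_ip):
    client = ip_address(client_ip)
    matches = [(net.prefixlen, name) for name, net in rules.items()
               if client in net]
    return max(matches)[1]   # longest prefix (most specific) wins

print(matching_rule('192.168.1.17'))  # internal_clients
print(matching_rule('203.0.113.9'))   # catch_all
```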
8.3.3.4 - Testing connection load balancing policies
After creating your routing rules, you should test them to verify that they perform the way you expect.
After creating your routing rules, you should test them to verify that they perform the way you expect. The best way to test your rules is to call the DESCRIBE_LOAD_BALANCE_DECISION function with an IP address. This function evaluates the routing rules and reports back how Vertica would route a client connection from that IP address. It uses the same logic that Vertica uses when handling client connections, so the results reflect the actual connection load balancing behavior you will see. It also reflects the current state of your Vertica cluster, so it will not redirect connections to down nodes.
The following example demonstrates testing a set of rules. One rule handles all connections from the range 192.168.1.0 to 192.168.1.255, while the other handles all connections originating from the 192 subnet. The third call demonstrates what happens when no rules apply to the IP address you supply.
=> SELECT describe_load_balance_decision('192.168.1.25');
describe_load_balance_decision
--------------------------------------------------------------------------------
Describing load balance decision for address [192.168.1.25]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address matches this rule
Matched to load balance group [group_1] the group has policy [ROUNDROBIN]
number of addresses [2]
(0) LB Address: [10.20.100.247]:5433
(1) LB Address: [10.20.100.248]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [10.20.100.248] port [5433]
(1 row)
=> SELECT describe_load_balance_decision('192.168.2.25');
describe_load_balance_decision
--------------------------------------------------------------------------------
Describing load balance decision for address [192.168.2.25]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address does not match source ip filter for this rule.
Considered rule [subnet_192] source ip filter [192.0.0.0/8]... input address
matches this rule
Matched to load balance group [group_all] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [10.20.100.247]:5433
(1) LB Address: [10.20.100.248]:5433
(2) LB Address: [10.20.100.249]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [10.20.100.248] port [5433]
(1 row)
=> SELECT describe_load_balance_decision('1.2.3.4');
describe_load_balance_decision
--------------------------------------------------------------------------------
Describing load balance decision for address [1.2.3.4]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address does not match source ip filter for this rule.
Considered rule [subnet_192] source ip filter [192.0.0.0/8]... input address
does not match source ip filter for this rule.
Routing table decision: No matching routing rules: input address does not match
any routing rule source filters. Details: [Tried some rules but no matching]
No rules matched. Falling back to classic load balancing.
Classic load balance decision: Classic load balancing considered, but either
the policy was NONE or no target was available. Details: [NONE or invalid]
(1 row)
The DESCRIBE_LOAD_BALANCE_DECISION function also takes into account the classic cluster-wide load balancing settings:
=> SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');
SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: roundrobin
(1 row)
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('1.2.3.4');
describe_load_balance_decision
--------------------------------------------------------------------------------
Describing load balance decision for address [1.2.3.4]
Load balance cache internal version id (node-local): [2]
Considered rule [etl_rule] source ip filter [10.20.100.0/24]... input address
does not match source ip filter for this rule.
Considered rule [internal_clients] source ip filter [192.168.1.0/24]... input
address does not match source ip filter for this rule.
Considered rule [subnet_192] source ip filter [192.0.0.0/8]... input address
does not match source ip filter for this rule.
Routing table decision: No matching routing rules: input address does not
match any routing rule source filters. Details: [Tried some rules but no matching]
No rules matched. Falling back to classic load balancing.
Classic load balance decision: Success. Load balance redirect to: [10.20.100.247]
port [5433]
(1 row)
Note
The DESCRIBE_LOAD_BALANCE_DECISION function assumes that the client connection has opted into load balancing. If a client does not enable load balancing, Vertica does not redirect its connection.
The function can also help you debug connection issues you notice after going live with your load balancing policy. For example, if you notice that one node is handling a large number of client connections, you can test the client IP addresses against your policies to see why the connections are not being balanced.
8.3.3.5 - Load balancing policy examples
The following examples demonstrate some common use cases for connection load balancing policies.
The following examples demonstrate some common use cases for connection load balancing policies.
Enabling client connections from multiple networks
Suppose you have a Vertica cluster that is accessible from two (or more) different networks. Some examples of this situation are:
-
You have an internal and an external network. In this configuration, your database nodes usually have two or more IP addresses, with each address accessible from only one of the networks. This configuration is common when running Vertica in a cloud environment. In many cases, you can create a catch-all rule that applies to all IP addresses, and then add additional routing rules for the internal subnets.
-
You want clients to be load balanced whether they use IPv4 or IPv6 protocols. From the database's perspective, IPv4 and IPv6 connections are separate networks because each node has a separate IPv4 and IPv6 IP address.
When creating a load balancing policy for a database that is accessible from multiple networks, client connections must be directed to IP addresses on the network they can access. The best solution is to create load balancing groups for each set of IP addresses assigned to a node. Then create routing rules that redirect client connections to the IP addresses that are accessible from their network.
The following example:
-
Creates two sets of network addresses: one for the internal network and another for the external network.
-
Creates two load balance groups: one for the internal network and one for the external.
-
Creates three routing rules: one for the internal network, and two for the external. The internal routing rule covers a subset of the network covered by one of the external rules.
-
Tests the routing rules using internal and external IP addresses.
=> CREATE NETWORK ADDRESS node01_int ON v_vmart_node0001 WITH '192.168.0.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node01_ext ON v_vmart_node0001 WITH '203.0.113.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02_int ON v_vmart_node0002 WITH '192.168.0.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02_ext ON v_vmart_node0002 WITH '203.0.113.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03_int ON v_vmart_node0003 WITH '192.168.0.3';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03_ext ON v_vmart_node0003 WITH '203.0.113.3';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP internal_group WITH ADDRESS node01_int, node02_int, node03_int;
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP external_group WITH ADDRESS node01_ext, node02_ext, node03_ext;
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE internal_rule ROUTE '192.168.0.0/24' TO internal_group;
CREATE ROUTING RULE
=> CREATE ROUTING RULE external_rule ROUTE '0.0.0.0/0' TO external_group;
CREATE ROUTING RULE
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('198.51.100.10');
DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
Describing load balance decision for address [198.51.100.10]
Load balance cache internal version id (node-local): [3]
Considered rule [internal_rule] source ip filter [192.168.0.0/24]... input
address does not match source ip filter for this rule.
Considered rule [external_rule] source ip filter [0.0.0.0/0]... input
address matches this rule
Matched to load balance group [external_group] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [203.0.113.1]:5433
(1) LB Address: [203.0.113.2]:5433
(2) LB Address: [203.0.113.3]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [203.0.113.3] port [5433]
(1 row)
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('192.168.0.79');
DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
Describing load balance decision for address [192.168.0.79]
Load balance cache internal version id (node-local): [3]
Considered rule [internal_rule] source ip filter [192.168.0.0/24]... input
address matches this rule
Matched to load balance group [internal_group] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [192.168.0.1]:5433
(1) LB Address: [192.168.0.3]:5433
(2) LB Address: [192.168.0.2]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [192.168.0.2] port
[5433]
(1 row)
Isolating workloads
You may want to control which nodes in your cluster are used by specific types of clients. For example, you may want to limit clients that perform data-loading tasks to one set of nodes, and reserve the rest of the nodes for running queries. This separation of workloads is especially common for Eon Mode databases. See Controlling Where a Query Runs for an example of using load balancing policies in an Eon Mode database to control which subcluster a client connects to.
You can create client load balancing policies that support workload isolation if clients performing certain types of tasks always originate from a limited IP address range. For example, if the clients that load data into your system always fall into a specific subnet, you can create a policy that limits which nodes those clients can access.
In the following example:
-
There are two fault groups (group_a and group_b) that separate workloads in an Eon Mode database. These groups are used as the basis of the load balancing groups.
-
The ETL client connections all originate from the 203.0.113.0/24 subnet.
-
User connections originate in the range of 192.0.0.0 to 199.255.255.255.
=> CREATE NETWORK ADDRESS node01 ON v_vmart_node0001 WITH '192.0.2.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_vmart_node0002 WITH '192.0.2.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 ON v_vmart_node0003 WITH '192.0.2.3';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node04 ON v_vmart_node0004 WITH '192.0.2.4';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node05 ON v_vmart_node0005 WITH '192.0.2.5';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP lb_users WITH FAULT GROUP group_a FILTER '192.0.2.0/24';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP lb_etl WITH FAULT GROUP group_b FILTER '192.0.2.0/24';
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE users_rule ROUTE '192.0.0.0/5' TO lb_users;
CREATE ROUTING RULE
=> CREATE ROUTING RULE etl_rule ROUTE '203.0.113.0/24' TO lb_etl;
CREATE ROUTING RULE
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('198.51.200.129');
DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
Describing load balance decision for address [198.51.200.129]
Load balance cache internal version id (node-local): [6]
Considered rule [etl_rule] source ip filter [203.0.113.0/24]... input address
does not match source ip filter for this rule.
Considered rule [users_rule] source ip filter [192.0.0.0/5]... input address
matches this rule
Matched to load balance group [lb_users] the group has policy [ROUNDROBIN]
number of addresses [3]
(0) LB Address: [192.0.2.1]:5433
(1) LB Address: [192.0.2.2]:5433
(2) LB Address: [192.0.2.3]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [192.0.2.2] port
[5433]
(1 row)
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('203.0.113.24');
DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
Describing load balance decision for address [203.0.113.24]
Load balance cache internal version id (node-local): [6]
Considered rule [etl_rule] source ip filter [203.0.113.0/24]... input address
matches this rule
Matched to load balance group [lb_etl] the group has policy [ROUNDROBIN] number
of addresses [2]
(0) LB Address: [192.0.2.4]:5433
(1) LB Address: [192.0.2.5]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [192.0.2.5] port
[5433]
(1 row)
=> SELECT DESCRIBE_LOAD_BALANCE_DECISION('10.20.100.25');
DESCRIBE_LOAD_BALANCE_DECISION
-------------------------------------------------------------------------------
Describing load balance decision for address [10.20.100.25]
Load balance cache internal version id (node-local): [6]
Considered rule [etl_rule] source ip filter [203.0.113.0/24]... input address
does not match source ip filter for this rule.
Considered rule [users_rule] source ip filter [192.0.0.0/5]... input address
does not match source ip filter for this rule.
Routing table decision: No matching routing rules: input address does not match
any routing rule source filters. Details: [Tried some rules but no matching]
No rules matched. Falling back to classic load balancing.
Classic load balance decision: Classic load balancing considered, but either the
policy was NONE or no target was available. Details: [NONE or invalid]
(1 row)
Enabling the default subcluster interior load balancing policy
Vertica attempts to apply the default subcluster interior load balancing policy if no other load balancing policy applies to an incoming connection and classic load balancing is not enabled. See Default Subcluster Interior Load Balancing Policy for a description of this rule.
To enable default subcluster interior load balancing, you must create network addresses for the nodes in a subcluster. Once you create the addresses, Vertica attempts to apply this rule to load balance connections within a subcluster when no other rules apply.
The following example confirms the database has no load balancing groups or rules. Then it adds publicly-accessible network addresses to the nodes in the primary subcluster. When these addresses are added, Vertica applies the default subcluster interior load balancing policy.
=> SELECT * FROM LOAD_BALANCE_GROUPS;
name | policy | filter | type | object_name
------+--------+--------+------+-------------
(0 rows)
=> SELECT * FROM ROUTING_RULES;
name | source_address | destination_name
------+----------------+------------------
(0 rows)
=> CREATE NETWORK ADDRESS node01_ext ON v_verticadb_node0001 WITH '203.0.113.1';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02_ext ON v_verticadb_node0002 WITH '203.0.113.2';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03_ext ON v_verticadb_node0003 WITH '203.0.113.3';
CREATE NETWORK ADDRESS
=> SELECT describe_load_balance_decision('11.0.0.100');
describe_load_balance_decision
-----------------------------------------------------------------------------------------------
Describing load balance decision for address [11.0.0.100] on subcluster [default_subcluster]
Load balance cache internal version id (node-local): [2]
Considered rule [auto_rr_default_subcluster] subcluster interior filter [default_subcluster]...
current subcluster matches this rule
Matched to load balance group [auto_lbg_sc_default_subcluster] the group has policy
[ROUNDROBIN] number of addresses [3]
(0) LB Address: [203.0.113.1]:5433
(1) LB Address: [203.0.113.2]:5433
(2) LB Address: [203.0.113.3]:5433
Chose address at position [1]
Routing table decision: Success. Load balance redirect to: [203.0.113.2] port [5433]
(1 row)
Load balance both IPv4 and IPv6 connections
Connection load balancing policies regard IPv4 and IPv6 connections as separate networks. To load balance both types of incoming client connections, create two sets of network addresses, at least two load balancing groups, and two routing rules, one for each network address family.
This example creates two load balancing policies for the default subcluster: one for the IPv4 network addresses (192.168.111.31 to 192.168.111.33) and one for the IPv6 network addresses (fd9b:1fcc:1dc4:78d3::31 to fd9b:1fcc:1dc4:78d3::33).
=> SELECT node_name,node_address,subcluster_name FROM NODES;
node_name | node_address | subcluster_name
----------------------+----------------+--------------------
v_verticadb_node0001 | 192.168.111.31 | default_subcluster
v_verticadb_node0002 | 192.168.111.32 | default_subcluster
v_verticadb_node0003 | 192.168.111.33 | default_subcluster
=> CREATE NETWORK ADDRESS node01 ON v_verticadb_node0001 WITH
'192.168.111.31';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node01_ipv6 ON v_verticadb_node0001 WITH
'fd9b:1fcc:1dc4:78d3::31';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02 ON v_verticadb_node0002 WITH
'192.168.111.32';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node02_ipv6 ON v_verticadb_node0002 WITH
'fd9b:1fcc:1dc4:78d3::32';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03 ON v_verticadb_node0003 WITH
'192.168.111.33';
CREATE NETWORK ADDRESS
=> CREATE NETWORK ADDRESS node03_ipv6 ON v_verticadb_node0003 WITH
'fd9b:1fcc:1dc4:78d3::33';
CREATE NETWORK ADDRESS
=> CREATE LOAD BALANCE GROUP group_ipv4 WITH SUBCLUSTER default_subcluster
FILTER '192.168.111.0/24';
CREATE LOAD BALANCE GROUP
=> CREATE LOAD BALANCE GROUP group_ipv6 WITH SUBCLUSTER default_subcluster
FILTER 'fd9b:1fcc:1dc4:78d3::0/64';
CREATE LOAD BALANCE GROUP
=> CREATE ROUTING RULE all_ipv4 route '0.0.0.0/0' TO group_ipv4;
CREATE ROUTING RULE
=> CREATE ROUTING RULE all_ipv6 route '0::0/0' TO group_ipv6;
CREATE ROUTING RULE
=> SELECT describe_load_balance_decision('203.0.113.50');
describe_load_balance_decision
-----------------------------------------------------------------------------------------------
Describing load balance decision for address [203.0.113.50] on subcluster [default_subcluster]
Load balance cache internal version id (node-local): [3]
Considered rule [all_ipv4] source ip filter [0.0.0.0/0]... input address matches this rule
Matched to load balance group [ group_ipv4] the group has policy [ROUNDROBIN] number of addresses [3]
(0) LB Address: [192.168.111.31]:5433
(1) LB Address: [192.168.111.32]:5433
(2) LB Address: [192.168.111.33]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [192.168.111.33] port [5433]
(1 row)
=> SELECT describe_load_balance_decision('2001:0DB8:EA04:8F2C::1');
describe_load_balance_decision
---------------------------------------------------------------------------------------------------------
Describing load balance decision for address [2001:0DB8:EA04:8F2C::1] on subcluster [default_subcluster]
Load balance cache internal version id (node-local): [3]
Considered rule [all_ipv4] source ip filter [0.0.0.0/0]... input address does not match source ip filter for this rule.
Considered rule [all_ipv6] source ip filter [0::0/0]... input address matches this rule
Matched to load balance group [ group_ipv6] the group has policy [ROUNDROBIN] number of addresses [3]
(0) LB Address: [fd9b:1fcc:1dc4:78d3::31]:5433
(1) LB Address: [fd9b:1fcc:1dc4:78d3::32]:5433
(2) LB Address: [fd9b:1fcc:1dc4:78d3::33]:5433
Chose address at position [2]
Routing table decision: Success. Load balance redirect to: [fd9b:1fcc:1dc4:78d3::33] port [5433]
(1 row)
Other examples
For other examples of using connection load balancing, see the following topics:
8.3.3.6 - Viewing load balancing policy configurations
Query the following system tables in the V_CATALOG schema to see the load balance policies defined in your database:
-
NETWORK_ADDRESSES lists all of the network addresses defined in your database.
-
LOAD_BALANCE_GROUPS lists the contents of your load balance groups.
Note
This table does not directly list all of your database's load balance groups. Instead, it lists the contents of the load balance groups. Your database can have load balance groups that do not appear in this table because they do not contain any network addresses or fault groups.
-
ROUTING_RULES lists all of the routing rules defined in your database.
This example demonstrates querying each of the load balancing policy system tables.
=> \x
Expanded display is on.
=> SELECT * FROM V_CATALOG.NETWORK_ADDRESSES;
-[ RECORD 1 ]----+-----------------
name | node01
node | v_vmart_node0001
address | 10.20.100.247
port | 5433
address_family | ipv4
is_enabled | t
is_auto_detected | f
-[ RECORD 2 ]----+-----------------
name | node02
node | v_vmart_node0002
address | 10.20.100.248
port | 5433
address_family | ipv4
is_enabled | t
is_auto_detected | f
-[ RECORD 3 ]----+-----------------
name | node03
node | v_vmart_node0003
address | 10.20.100.249
port | 5433
address_family | ipv4
is_enabled | t
is_auto_detected | f
-[ RECORD 4 ]----+-----------------
name | alt_node1
node | v_vmart_node0001
address | 192.168.1.200
port | 8080
address_family | ipv4
is_enabled | t
is_auto_detected | f
-[ RECORD 5 ]----+-----------------
name | test_addr
node | v_vmart_node0001
address | 192.168.1.100
port | 4000
address_family | ipv4
is_enabled | t
is_auto_detected | f
=> SELECT * FROM LOAD_BALANCE_GROUPS;
-[ RECORD 1 ]----------------------
name | group_all
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node01
-[ RECORD 2 ]----------------------
name | group_all
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node02
-[ RECORD 3 ]----------------------
name | group_all
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node03
-[ RECORD 4 ]----------------------
name | group_1
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node01
-[ RECORD 5 ]----------------------
name | group_1
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node02
-[ RECORD 6 ]----------------------
name | group_2
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node01
-[ RECORD 7 ]----------------------
name | group_2
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node02
-[ RECORD 8 ]----------------------
name | group_2
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node03
-[ RECORD 9 ]----------------------
name | etl_group
policy | ROUNDROBIN
filter |
type | Network Address Group
object_name | node01
=> SELECT * FROM ROUTING_RULES;
-[ RECORD 1 ]----+-----------------
name | internal_clients
source_address | 192.168.1.0/24
destination_name | group_1
-[ RECORD 2 ]----+-----------------
name | etl_rule
source_address | 10.20.100.0/24
destination_name | etl_group
-[ RECORD 3 ]----+-----------------
name | subnet_192
source_address | 192.0.0.0/8
destination_name | group_all
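To see each routing rule together with the members of its target load balance group, you can join ROUTING_RULES to LOAD_BALANCE_GROUPS on the rule's destination name. This query is a sketch based on the columns shown above; adjust it if the columns differ in your version:

```sql
-- Join each routing rule to the member objects of its target group.
=> SELECT r.name AS rule_name, r.source_address,
          g.policy, g.object_name AS member
   FROM V_CATALOG.ROUTING_RULES r
   JOIN V_CATALOG.LOAD_BALANCE_GROUPS g ON g.name = r.destination_name
   ORDER BY r.name, g.object_name;
```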
8.3.3.7 - Maintaining load balancing policies
Once you have created load balancing policies, you maintain them using the following statements:
-
ALTER NETWORK ADDRESS lets you rename a network address, change its IP address, and enable or disable it.
-
ALTER LOAD BALANCE GROUP lets you rename a load balance group, add or remove network addresses or fault groups, change its IP address filter, or change its policy.
-
ALTER ROUTING RULE lets you rename a rule, change its source IP address, and change its target load balance group.
See the reference pages for these statements for examples.
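For instance, here are a few illustrative maintenance statements, using object names from the examples in the previous topic (verify the exact syntax on each statement's reference page):

```sql
=> ALTER NETWORK ADDRESS node01 DISABLE;   -- take the address out of rotation
=> ALTER NETWORK ADDRESS node01 ENABLE;    -- re-enable it
=> ALTER LOAD BALANCE GROUP group_1 ADD ADDRESS node03;               -- add an address to a group
=> ALTER ROUTING RULE internal_clients SET ROUTE TO '192.168.2.0/24'; -- change the rule's source CIDR
```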
Deleting load balancing policy objects
You can also delete existing load balance policy objects using the following statements:
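For example, to remove some of the objects shown in the previous topic (a sketch; if an object is still referenced by another object, you may need to drop the dependent object first):

```sql
=> DROP ROUTING RULE subnet_192;
=> DROP LOAD BALANCE GROUP group_all;
=> DROP NETWORK ADDRESS test_addr;
```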
8.3.4 - Workload routing
Workload routing routes client connections to subclusters based on their workloads.
Workload routing routes client connections to subclusters. This lets you reserve subclusters for certain types of tasks.
When a client connects to Vertica, it connects to a connection node, which routes the client to the appropriate subcluster based on the client's specified workload, user, or role and the database's routing rules. If multiple subclusters are associated with the same workload, the client is randomly routed to one of those subclusters.
In this context, "routing" refers to the connection node acting as a proxy for the client and the Execution node in the target subcluster. All queries and query results are first sent to the connection node and then passed on to the execution node and client, respectively.
The primary advantages of this type of load balancing are as follows:
- Database administrators can associate certain subclusters with certain workloads and roles (as opposed to client IP addresses).
- Clients do not need to know anything about the subcluster they will be routed to, only the type of workload they have or the role they should use.
Workload routing depends on actions from both the database administrator and the client:
- The database administrator must create rules for handling various workloads.
- The client must either specify the type of workload they have or have an enabled role (either from a manual SET ROLE or default role) associated with a routing rule.
View the current workload
To view the workload associated with the current session, use SHOW WORKLOAD:
=> SHOW WORKLOAD;
name | setting
----------+------------
 workload | my_workload
(1 row)
Create workload routing rules
Routing rules apply to a client's specified workload and either user or role. If you specify more than one subcluster in a routing rule, the client is randomly routed to one of those subclusters.
If multiple routing rules could apply to a client's session, the rule with the highest priority is used. For details, see Priorities.
To view existing routing rules, see WORKLOAD_ROUTING_RULES.
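For example, you can query that system table directly (its exact columns vary by version):

```sql
=> SELECT * FROM V_CATALOG.WORKLOAD_ROUTING_RULES;
```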
To view workloads available to you and your enabled roles, use SHOW AVAILABLE WORKLOADS:
=> SHOW AVAILABLE WORKLOADS;
name | setting
---------------------+------------------------
available workloads | reporting, analytics
(1 row)
Workload-based routing
Workload-based routing rules apply to clients that specify a particular workload, routing them to one of the subclusters listed in the rule. In this example, when a client connects to the database and specifies the analytics workload, their connection is randomly routed to either sc_analytics or sc_analytics_2:
=> CREATE ROUTING RULE ROUTE WORKLOAD analytics TO SUBCLUSTER sc_analytics, sc_analytics_2;
To alter a routing rule, use ALTER ROUTING RULE. For example, to route analytics workloads to sc_analytics:
=> ALTER ROUTING RULE FOR WORKLOAD analytics SET SUBCLUSTER TO sc_analytics;
To add or remove a subcluster, use ALTER ROUTING RULE. For example:
- To add a subcluster sc_01:
=> ALTER ROUTING RULE FOR WORKLOAD analytics ADD SUBCLUSTER sc_01;
- To remove a subcluster sc_01:
=> ALTER ROUTING RULE FOR WORKLOAD analytics REMOVE SUBCLUSTER sc_01;
To drop a routing rule, use DROP ROUTING RULE and specify the workload. For example, to drop the routing rule for the analytics workload:
=> DROP ROUTING RULE FOR WORKLOAD analytics;
User- and role-based routing
You can grant USAGE privileges to a user or role to let them route their queries to one of the subclusters listed in the routing rule. In this example, when a client connects to the database and enables the analytics_role role, their connection is randomly routed to either sc_analytics or sc_analytics_2:
=> CREATE ROUTING RULE ROUTE WORKLOAD analytics TO SUBCLUSTER sc_analytics, sc_analytics_2;
=> GRANT USAGE ON ROUTING RULE analytics TO analytics_role;
Users can then enable the role and set their workload to analytics for the session:
=> SET ROLE analytics_role;
=> SET SESSION WORKLOAD analytics;
Users can also enable the role automatically by setting it as a default role and then specify the workload when they connect. For details on default roles, see Enabling roles automatically.
Similarly, in this example, when a client connects to the database as user analytics_user, they are randomly routed to either sc_analytics or sc_analytics_2:
=> CREATE ROUTING RULE ROUTE WORKLOAD analytics TO SUBCLUSTER sc_analytics, sc_analytics_2;
=> GRANT USAGE ON ROUTING RULE