This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Configuring the database

Before reading the topics in this section, you should be familiar with the material in [%=Vertica.GETTING_STARTED_GUIDE%] and are familiar with creating and configuring a fully-functioning example database.

Before reading the topics in this section, you should be familiar with the material in Getting started and are familiar with creating and configuring a fully-functioning example database.

See also

1 - Configuration procedure

This section describes the tasks required to set up a Vertica database.

This section describes the tasks required to set up a Vertica database. It assumes that you have a valid license key file, installed the Vertica rpm package, and ran the installation script as described.

You complete the configuration procedure using:

Continuing configuring

Follow the configuration procedure sequentially as this section describes.

Vertica strongly recommends that you first experiment with creating and configuring a database.

You can use this generic configuration procedure several times during the development process, modifying it to fit your changing goals. You can omit steps such as preparing actual data files and sample queries, and run the Database Designer without optimizing for queries. For example, you can create, load, and query a database several times for development and testing purposes, then one final time to create and load the production database.

1.1 - Prepare disk storage locations

You must create and specify directories in which to store your catalog and data files ().

You must create and specify directories in which to store your catalog and data files (physical schema). You can specify these locations when you install or configure the database, or later during database operations. Both the catalog and data directories must be owned by the database superuser.

The directory you specify for database catalog files (the catalog path) is used across all nodes in the cluster. For example, if you specify /home/catalog as the catalog directory, Vertica uses that catalog path on all nodes. The catalog directory should always be separate from any data file directories.

The data path you designate is also used across all nodes in the cluster. Specifying that data should be stored in /home/data, Vertica uses this path on all database nodes.

Do not use a single directory to contain both catalog and data files. You can store the catalog and data directories on different drives, which can be either on drives local to the host (recommended for the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.

Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your database. Creating a database in admintools also creates the catalog and data directories, but the parent directory must exist on each node.

You do not need to specify a disk storage location during installation. However, you can do so by using the --data-dir parameter to the install_vertica script. See Specifying disk storage location during installation.

1.1.1 - Specifying disk storage location during database creation

When you invoke the Create Database command in the , a dialog box allows you to specify the catalog and data locations.

When you invoke the Create Database command in the Administration tools, a dialog box allows you to specify the catalog and data locations. These locations must exist on each host in the cluster and must be owned by the database administrator.

Database data directories

When you click OK, Vertica automatically creates the following subdirectories:

catalog-pathname/database-name/node-name_catalog/data-pathname/database-name/node-name_data/

For example, if you use the default value (the database administrator's home directory) of /home/dbadmin for the Stock Exchange example database, the catalog and data directories are created on each node in the cluster as follows:

/home/dbadmin/Stock_Schema/stock_schema_node1_host01_catalog/home/dbadmin/Stock_Schema/stock_schema_node1_host01_data

Notes

  • Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions will result in database creation failure.

  • Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Managing storage locations for details.

1.1.2 - Specifying disk storage location on MC

You can use the MC interface to specify where you want to store database metadata on the cluster in the following ways:.

You can use the MC interface to specify where you want to store database metadata on the cluster in the following ways:

  • When you configure MC the first time

  • When you create new databases using on MC

See also

Configuring Management Console.

1.1.3 - Configuring disk usage to optimize performance

Once you have created your initial storage location, you can add additional storage locations to the database later.

Once you have created your initial storage location, you can add additional storage locations to the database later. Not only does this provide additional space, it lets you control disk usage and increase I/O performance by isolating files that have different I/O or access patterns. For example, consider:

  • Isolating execution engine temporary files from data files by creating a separate storage location for temp space.

  • Creating labeled storage locations and storage policies, in which selected database objects are stored on different storage locations based on measured performance statistics or predicted access patterns.

See also

Managing storage locations

1.1.4 - Using shared storage with Vertica

If using shared SAN storage, ensure there is no contention among the nodes for disk space or bandwidth.

If using shared SAN storage, ensure there is no contention among the nodes for disk space or bandwidth.

  • Each host must have its own catalog and data locations. Hosts cannot share catalog or data locations.

  • Configure the storage so that there is enough I/O bandwidth for each node to access the storage independently.

1.1.5 - Viewing database storage information

You can view node-specific information on your Vertica cluster through the.

You can view node-specific information on your Vertica cluster through the Management Console. See Monitoring Vertica Using Management Console for details.

1.1.6 - Anti-virus scanning exclusions

You should exclude the Vertica catalog and data directories from anti-virus scanning.

You should exclude the Vertica catalog and data directories from anti-virus scanning. Certain anti-virus products have been identified as targeting Vertica directories, and sometimes lock or delete files in them. This can adversely affect Vertica performance and data integrity.

Identified anti-virus products include the following:

  • ClamAV

  • SentinelOne

  • Sophos

  • Symantec

  • Twistlock

1.2 - Disk space requirements for Vertica

In addition to actual data stored in the database, Vertica requires disk space for several data reorganization operations, such as and managing nodes in the cluster.

In addition to actual data stored in the database, Vertica requires disk space for several data reorganization operations, such as mergeout and managing nodes in the cluster. For best results, Vertica recommends that disk utilization per node be no more than sixty percent (60%) for a K-Safe=1 database to allow such operations to proceed.

In addition, disk space is temporarily required by certain query execution operators, such as hash joins and sorts, in the case when they cannot be completed in memory (RAM). Such operators might be encountered during queries, recovery, refreshing projections, and so on. The amount of disk space needed (known as temp space) depends on the nature of the queries, amount of data on the node and number of concurrent users on the system. By default, any unused disk space on the data disk can be used as temp space. However, Vertica recommends provisioning temp space separate from data disk space.

See also

Configuring disk usage to optimize performance.

1.3 - Disk space requirements for Management Console

You can install Management Console on any node in the cluster, so it has no special disk requirements other than disk space you allocate for your database cluster.

You can install Management Console on any node in the cluster, so it has no special disk requirements other than disk space you allocate for your database cluster.

1.4 - Prepare the logical schema script

Designing a logical schema for a Vertica database is no different from designing one for any other SQL database.

Designing a logical schema for a Vertica database is no different from designing one for any other SQL database. Details are described more fully in Designing a logical schema.

To create your logical schema, prepare a SQL script (plain text file, typically with an extension of .sql) that:

  1. Creates additional schemas (as necessary). See Using multiple schemas.

  2. Creates the tables and column constraints in your database using the CREATE TABLE command.

  3. Defines the necessary table constraints using the ALTER TABLE command.

  4. Defines any views on the table using the CREATE VIEW command.

You can generate a script file using:

  • A schema designer application.

  • A schema extracted from an existing database.

  • A text editor.

  • One of the example database example-name_define_schema.sql scripts as a template. (See the example database directories in /opt/vertica/examples.)

In your script file, make sure that:

  • Each statement ends with a semicolon.

  • You use data types supported by Vertica, as described in the SQL Reference Manual.

Once you have created a database, you can test your schema script by executing it as described in Create the logical schema. If you encounter errors, drop all tables, correct the errors, and run the script again.

1.5 - Prepare data files

Prepare two sets of data files:.

Prepare two sets of data files:

  • Test data files. Use test files to test the database after the partial data load. If possible, use part of the actual data files to prepare the test data files.

  • Actual data files. Once the database has been tested and optimized, use your data files for your initial Data load.

How to name data files

Name each data file to match the corresponding table in the logical schema. Case does not matter.

Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension, name the corresponding data file stock_dimension.tbl. When using multiple data files, append _nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example, stock_dimension.tbl_001, stock_dimension.tbl_002, and so on.

1.6 - Prepare load scripts

You can postpone this step if your goal is to test a logical schema design for validity.

Prepare SQL scripts to load data directly into physical storage using COPY on vsql, or through ODBC.

You need scripts that load:

  • Large tables

  • Small tables

Vertica recommends that you load large tables using multiple files. To test the load process, use files of 10GB to 50GB in size. This size provides several advantages:

  • You can use one of the data files as a sample data file for the Database Designer.

  • You can load just enough data to Perform a partial data load before you load the remainder.

  • If a single load fails and rolls back, you do not lose an excessive amount of time.

  • Once the load process is tested, for multi-terabyte tables, break up the full load in file sizes of 250–500GB.

See also

1.7 - Create an optional sample query script

The purpose of a sample query script is to test your schema and load scripts for errors.

The purpose of a sample query script is to test your schema and load scripts for errors.

Include a sample of queries your users are likely to run against the database. If you don't have any real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you can skip this step.

1.8 - Create an empty database

Two options are available for creating an empty database:.

Two options are available for creating an empty database:

Although you can create more than one database (for example, one for production and one for testing), there can be only one active database for each installation of Vertica Analytic Database.

1.8.1 - Creating a database name and password

Database names must conform to the following rules:.

Database names

Database names must conform to the following rules:

  • Be between 1-30 characters

  • Begin with a letter

  • Follow with any combination of letters (upper and lowercase), numbers, and/or underscores.

Database names are case sensitive; however, Vertica strongly recommends that you do not create databases with names that differ only in case. For example, do not create a database called mydatabase and another called MyDataBase.

Database passwords

Database passwords can contain letters, digits, and special characters listed in the next table. Passwords cannot include non-ASCII Unicode characters.

The allowed password length is between 0-100 characters. The database superuser can change a Vertica user's maximum password length using ALTER PROFILE.

You use Profiles to specify and control password definitions. For instance, a profile can define the maximum length, reuse time, and the minimum number or required digits for a password, as well as other details.

The following special (ASCII) characters are valid in passwords. Special characters can appear anywhere in a password string. For example, mypas$word or $mypassword are both valid, while ±mypassword is not. Using special characters other than the ones listed below can cause database instability.

  • #

  • ?

  • =

  • _

  • '

  • )

  • (

  • @

  • \

  • /

  • !

  • ,

  • ~

  • :

  • %

  • ;

  • `

  • ^

  • +

  • .

  • -

  • space

  • &

  • <

  • >

  • [

  • ]

  • {

  • }

  • |

  • *

  • $

  • "

See also

1.8.2 - Create a database using administration tools

Run the from your as follows:.
  1. Run the Administration tools from your Administration host as follows:

    $ /opt/vertica/bin/admintools
    

    If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, see Notes for remote terminal users.

  2. Accept the license agreement and specify the location of your license file. For more information see Managing licenses for more information.

    This step is necessary only if it is the first time you have run the Administration Tools

  3. On the Main Menu, click Configuration Menu, and click OK.

  4. On the Configuration Menu, click Create Database, and click OK.

  5. Enter the name of the database and an optional comment, and click OK. See Creating a database name and password for naming guidelines and restrictions.

  6. Establish the superuser password for your database.

    • To provide a password enter the password and click OK. Confirm the password by entering it again, and then click OK.

    • If you don't want to provide the password, leave it blank and click OK. If you don't set a password, Vertica prompts you to verify that you truly do not want to establish a superuser password for this database. Click Yes to create the database without a password or No to establish the password.

  7. Select the hosts to include in the database from the list of hosts specified when Vertica was installed ( install_vertica -s), and click OK.

  8. Specify the directories in which to store the data and catalog files, and click OK.

  9. Catalog and data path names must contain only alphanumeric characters and cannot have leading spaces. Failure to comply with these restrictions results in database creation failure.

    For example:

    Catalog pathname: /home/dbadmin

    Data Pathname: /home/dbadmin

  10. Review the Current Database Definition screen to verify that it represents the database you want to create, and then click Yes to proceed or No to modify the database definition.

  11. If you click Yes, Vertica creates the database you defined and then displays a message to indicate that the database was successfully created.

  12. Click OK to acknowledge the message.

1.9 - Create the logical schema

Connect to the database.
  1. Connect to the database.

    In the Administration Tools Main Menu, click Connect to Database and click OK.

    See Connecting to the Database for details.

    The vsql welcome script appears:

    Welcome to vsql, the Vertica Analytic Database interactive terminal.
    Type:  \h or \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    =>
    
  2. Run the logical schema script

    Using the \i meta-command in vsql to run the SQL logical schema script that you prepared earlier.

  3. Disconnect from the database

    Use the \q meta-command in vsql to return to the Administration Tools.

1.10 - Perform a partial data load

Vertica recommends that for large tables, you perform a partial data load and then test your database before completing a full data load.

Vertica recommends that for large tables, you perform a partial data load and then test your database before completing a full data load. This load should load a representative amount of data.

  1. Load the small tables.

    Load the small table data files using the SQL load scripts and data files you prepared earlier.

  2. Partially load the large tables.

    Load 10GB to 50GB of table data for each table using the SQL load scripts and data files that you prepared earlier.

For more information about projections, see Projections.

1.11 - Test the database

Test the database to verify that it is running as expected.

Test the database to verify that it is running as expected.

Check queries for syntax errors and execution times.

  1. Use the vsql \timing meta-command to enable the display of query execution time in milliseconds.

  2. Execute the SQL sample query script that you prepared earlier.

  3. Execute several ad hoc queries.

1.12 - Optimize query performance

Optimizing the database consists of optimizing for compression and tuning for queries.

Optimizing the database consists of optimizing for compression and tuning for queries. (See Creating a database design.)

To optimize the database, use the Database Designer to create and deploy a design for optimizing the database. See Using Database Designer to create a comprehensive design.

After you run the Database Designer, use the techniques described in Query optimization to improve the performance of certain types of queries.

1.13 - Complete the data load

To complete the load:.

To complete the load:

  1. Monitor system resource usage.

    Continue to run the top, free, and df utilities and watch them while your load scripts are running (as described in Monitoring Linux resource usage). You can do this on any or all nodes in the cluster. Make sure that the system is not swapping excessively (watch kswapd in top) or running out of swap space (watch for a large amount of used swap space in free).

  2. Complete the large table loads.

    Run the remainder of the large table load scripts.

1.14 - Test the optimized database

Check query execution times to test your optimized design:.

Check query execution times to test your optimized design:

  1. Use the vsql \timing meta-command to enable the display of query execution time in milliseconds.

    Execute a SQL sample query script to test your schema and load scripts for errors.

  2. Execute several ad hoc queries

    1. Run Administration tools and select Connect to Database.

    2. Use the \i meta-command to execute the query script; for example:

      vmartdb=> \i vmart_query_03.sql  customer_name   | annual_income
      ------------------+---------------
       James M. McNulty |        999979
       Emily G. Vogel   |        999998
      (2 rows)
      Time: First fetch (2 rows): 58.411 ms. All rows formatted: 58.448 ms
      vmartdb=> \i vmart_query_06.sql
       store_key | order_number | date_ordered
      -----------+--------------+--------------
              45 |       202416 | 2004-01-04
             113 |        66017 | 2004-01-04
             121 |       251417 | 2004-01-04
              24 |       250295 | 2004-01-04
               9 |       188567 | 2004-01-04
             166 |        36008 | 2004-01-04
              27 |       150241 | 2004-01-04
             148 |       182207 | 2004-01-04
             198 |        75716 | 2004-01-04
      (9 rows)
      Time: First fetch (9 rows): 25.342 ms. All rows formatted: 25.383 ms
      

Once the database is optimized, it should run queries efficiently. If you discover queries that you want to optimize, you can modify and update the design. See Incremental design.

1.15 - Implement locales for international data sets

Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax.

Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY and ORDER BY clauses, joins, and the analytic ORDER BY clause.

The default locale for a Vertica database is en_US@collation=binary (English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is applied at query time.

If you set the locale to null, Vertica sets the locale to en_US_POSIX. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale. For example:

You can set locale through ODBC, JDBC, and ADO.net.

ICU locale support

Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax. The locale used by the database session is not derived from the operating system (through the LANG variable), so Vertica recommends that you set the LANG for each node running vsql, as described in the next section.

While ICU library services can specify collation, currency, and calendar preferences, Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation regardless of the session collation. Any locale-specific collation is applied at query time.

The SET DATESTYLE TO ... command provides some aspects of the calendar, but Vertica supports only dollars as currency.

Changing DB locale for a session

This examples sets the session locale to Thai.

  1. At the operating-system level for each node running vsql, set the LANG variable to the locale language as follows:

    export LANG=th_TH.UTF-8
    
  2. For each Vertica session (from ODBC/JDBC or vsql) set the language locale.

    From vsql:

    \locale th_TH
    
  3. From ODBC/JDBC:

    "SET LOCALE TO th_TH;"
    
  4. In PUTTY (or ssh terminal), change the settings as follows:

    settings > window > translation > UTF-8
    
  5. Click Apply and then click Save.

All data loaded must be in UTF-8 format, not an ISO format, as described in Delimited data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not supported, so functions like SUBSTRING do not work correctly for multibyte characters. Thus, settings for locale should not work correctly. If the translation setting ISO-8859-11:2001 (Latin/Thai) works, the data is loaded incorrectly. To convert data correctly, use a utility program such as Linux iconv.

See also

1.15.1 - Specify the default locale for the database

After you start the database, the default locale configuration parameter, DefaultSessionLocale, sets the initial locale.

After you start the database, the default locale configuration parameter, DefaultSessionLocale, sets the initial locale. You can override this value for individual sessions.

To set the locale for the database, use the configuration parameter as follows:

=> ALTER DATABASE DEFAULT SET DefaultSessionLocale = 'ICU-locale-identifier';

For example:

=> ALTER DATABASE DEFAULT SET DefaultSessionLocale = 'en_GB';

1.15.2 - Override the default locale for a session

You can override the default locale for the current session in two ways:.

You can override the default locale for the current session in two ways:

  • VSQL command \locale. For example:

    => \locale en_GBINFO:
    INFO 2567:  Canonical locale: 'en_GB'
    Standard collation: 'LEN'
    English (United Kingdom)
    
  • SQL statement SET LOCALE. For example:

    
    => SET LOCALE TO en_GB;
    INFO 2567:  Canonical locale: 'en_GB'
    Standard collation: 'LEN'
    English (United Kingdom)
    

Both methods accept locale short and long forms. For example:

=> SET LOCALE TO LEN;
INFO 2567:  Canonical locale: 'en'
Standard collation: 'LEN'
English

=> \locale LEN
INFO 2567:  Canonical locale: 'en'
Standard collation: 'LEN'
English

See also

1.15.3 - Server versus client locale settings

Vertica differentiates database server locale settings from client application locale settings:.

Vertica differentiates database server locale settings from client application locale settings:

  • Server locale settings only impact collation behavior for server-side query processing.

  • Client applications verify that locale is set appropriately in order to display characters correctly.

The following sections describe best practices to ensure predictable results.

Server locale

The server session locale should be set as described in Specify the default locale for the database. If locales vary across different sessions, set the server locale at the start of each session from your client.

vsql client

  • If the database does not have a default session locale, set the server locale for the session to the desired locale.

  • The locale setting in the terminal emulator where the vsql client runs should be set to be equivalent to session locale setting on the server side (ICU locale). By doing so, the data is collated correctly on the server and displayed correctly on the client.

  • All input data for vsql should be in UTF-8, and all output data is encoded in UTF-8

  • Vertica does not support non UTF-8 encodings and associated locale values; .

  • For instructions on setting locale and encoding, refer to your terminal emulator documentation.

ODBC clients

  • ODBC applications can be either in ANSI or Unicode mode. If the user application is Unicode, the encoding used by ODBC is UCS-2. If the user application is ANSI, the data must be in single-byte ASCII, which is compatible with UTF-8 used on the database server. The ODBC driver converts UCS-2 to UTF-8 when passing to the Vertica server and converts data sent by the Vertica server from UTF-8 to UCS-2.

  • If the user application is not already in UCS-2, the application must convert the input data to UCS-2, or unexpected results could occur. For example:

    • For non-UCS-2 data passed to ODBC APIs, when it is interpreted as UCS-2, it could result in an invalid UCS-2 symbol being passed to the APIs, resulting in errors.

    • The symbol provided in the alternate encoding could be a valid UCS-2 symbol. If this occurs, incorrect data is inserted into the database.

  • If the database does not have a default session locale, ODBC applications should set the desired server session locale using SQLSetConnectAttr (if different from database wide setting). By doing so, you get the expected collation and string functions behavior on the server.

JDBC and ADO.NET clients

  • JDBC and ADO.NET applications use a UTF-16 character set encoding and are responsible for converting any non-UTF-16 encoded data to UTF-16. The same cautions apply as for ODBC if this encoding is violated.

  • The JDBC and ADO.NET drivers convert UTF-16 data to UTF-8 when passing to the Vertica server and convert data sent by Vertica server from UTF-8 to UTF-16.

  • If there is no default session locale at the database level, JDBC and ADO.NET applications should set the correct server session locale by executing the SET LOCALE TO command in order to get the expected collation and string functions behavior on the server. For more information, see SET LOCALE.

1.16 - Change transaction isolation levels

By default, Vertica uses the READ COMMITTED isolation level for all sessions.

By default, Vertica uses the READ COMMITTED isolation level for all sessions. You can change the default isolation level for the database or for a given session.

A transaction retains its isolation level until it completes, even if the session's isolation level changes during the transaction. Vertica internal processes (such as the Tuple Mover and refresh operations) and DDL operations always run at the SERIALIZABLE isolation level to ensure consistency.

Database isolation level

The configuration parameter TransactionIsolationLevel specifies the database isolation level, and is used as the default for all sessions. Use ALTER DATABASE to change the default isolation level.For example:

=> ALTER DATABASE DEFAULT SET TransactionIsolationLevel = 'SERIALIZABLE';
ALTER DATABASE
=> ALTER DATABASE DEFAULT SET TransactionIsolationLevel = 'READ COMMITTED';
ALTER DATABASE

Changes to the database isolation level only apply to future sessions. Existing sessions and their transactions continue to use their original isolation level.

Use SHOW CURRENT to view the database isolation level:

=> SHOW CURRENT TransactionIsolationLevel;
  level   |           name            |    setting
----------+---------------------------+----------------
 DATABASE | TransactionIsolationLevel | READ COMMITTED
(1 row)

Session isolation level

SET SESSION CHARACTERISTICS AS TRANSACTION changes the isolation level for a specific session. For example:

=> SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET

Use SHOW to view the current session's isolation level:

=> SHOW TRANSACTION_ISOLATION;

See also

Transactions

2 - Configuration parameters

Configuration parameters are settings that affect database behavior.

Configuration parameters are settings that affect database behavior. You can use configuration parameters to enable, disable, or tune features related to different database aspects like Tuple Mover, security, Database Designer, or projections. Configuration parameters have default values, stored in the Vertica database.

You can modify certain parameters to configure your Vertica database in two ways:

Before you modify a database parameter, review all documentation about the parameter to determine the context under which you can change it. Some parameter changes require a database restart to take effect. The CHANGE_REQUIRES_RESTART column in the system table CONFIGURATION_PARAMETERS indicates whether a parameter requires a restart. You can also query CONFIGURATION_PARAMETERS to determine what levels (node, session, database) are valid for a given parameter.

2.1 - Managing configuration parameters: VSQL

You can configure Vertica configuration parameters at the following levels, in ascending order precedence:.

You can configure Vertica configuration parameters at the following levels, in ascending order precedence:

  • Database

  • Node

  • Session

  • User

Most configuration parameters can be set at the database level. You can query system table CONFIGURATION_PARAMETERS to determine at which other levels specific configuration parameters can be set. For example, the following query obtains all partitioning parameters and their allowed levels:

=> SELECT parameter_name, current_value, default_value, allowed_levels FROM configuration_parameters
     WHERE parameter_name ILIKE '%partition%';
           parameter_name           | current_value | default_value | allowed_levels
------------------------------------+---------------+---------------+----------------
 MaxPartitionCount                  | 1024          | 1024          | NODE, DATABASE
 PartitionSortBufferSize            | 67108864      | 67108864      | DATABASE
 DisableAutopartition               | 0             | 0             | SESSION, USER
 PatternMatchingMaxPartitionMatches | 3932160       | 3932160       | NODE, DATABASE
 PatternMatchingMaxPartition        | 20971520      | 20971520      | NODE, DATABASE
 ActivePartitionCount               | 1             | 1             | NODE, DATABASE
(6 rows)

Setting and clearing configuration parameters

You change specific configuration parameters with the appropriate ALTER statements; the same statements also let you reset configuration parameters to their default values:

For example, the following ALTER statements change ActivePartitionCount at the database level from 1 to 2 , and DisablePartitionCount at the session level from 0 to 1:

=> ALTER DATABASE DEFAULT SET ActivePartitionCount = 2;
ALTER DATABASE
=> ALTER SESSION SET DisableAutopartition = 1;
ALTER SESSION
=> SELECT parameter_name, current_value, default_value FROM configuration_parameters
      WHERE parameter_name IN ('ActivePartitionCount', 'DisableAutopartition');
    parameter_name    | current_value | default_value
----------------------+---------------+---------------
 ActivePartitionCount | 2             | 1
 DisableAutopartition | 1             | 0
(2 rows)

You can later reset the same configuration parameters to their default values:

=> ALTER DATABASE DEFAULT CLEAR ActivePartitionCount;
ALTER DATABASE
=> ALTER SESSION CLEAR DisableAutopartition;
ALTER DATABASE
=> SELECT parameter_name, current_value, default_value FROM configuration_parameters
      WHERE parameter_name IN ('ActivePartitionCount', 'DisableAutopartition');
    parameter_name    | current_value | default_value
----------------------+---------------+---------------
 DisableAutopartition | 0             | 0
 ActivePartitionCount | 1             | 1
(2 rows)

2.2 - Viewing configuration parameter values

You can view active configuration parameter values in two ways:.

You can view active configuration parameter values in two ways:

SHOW statements

Use the following SHOW statements to view active configuration parameters:

  • SHOW CURRENT: Returns settings of active configuration parameter values. Vertica checks settings at all levels, in the following ascending order of precedence:

    • session

    • node

    • database

    If no values are set at any scope, SHOW CURRENT returns the parameter's default value.

  • SHOW DATABASE: Displays configuration parameter values set for the database.

  • SHOW USER: Displays configuration parameters set for the specified user, and for all users.

  • SHOW SESSION: Displays configuration parameter values set for the current session.

  • SHOW NODE: Displays configuration parameter values set for a node.

If a configuration parameter requires a restart to take effect, the values in a SHOW CURRENT statement might differ from values in other SHOW statements. To see which parameters require restart, query the CONFIGURATION_PARAMETERS system table.

System tables

You can query several system tables for configuration parameters:

2.3 - Configuration parameter categories

Use the configuration parameters in the following categories to configure Vertica.

Use the configuration parameters in the following categories to configure Vertica. Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

2.3.1 - General parameters

The following parameters configure basic database operations.

The following parameters configure basic database operations. Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
ApplyEventsDuringSALCheck

Boolean, specifies whether Vertica uses catalog events to filter out dropped corrupt partitions during node startup. Dropping corrupt partitions can speed node recovery.

When disabled (0), Vertica reports corrupt partitions, but takes no action. Leaving corrupt partitions in place can reset the current projection checkpoint epoch to the epoch before the corruption occurred.

This parameter has no effect on unpartitioned tables.

Default: 0

ApportionedFileMinimumPortionSizeKB

Specifies the minimum portion size (in kilobytes) for use with apportioned file loads. Vertica apportions a file load across multiple nodes only if:

  • The load can be divided into portions at least equaling this value.

  • EnableApportionedFileLoad and EnableApportionLoad are set to 1 (enabled).

See also EnableApportionLoad and EnableApportionedFileLoad.

Default: 1024

BlockedSocketGracePeriod

Sets how long a session socket remains blocked while awaiting client input or output for a given query. See Handling session socket blocking.

Default: None (Socket blocking can continue indefinitely.)

CatalogCheckpointPercent

Specifies the threshold at which a checkpoint is created for the database catalog.

By default, this parameter is set to 50 (percent), so when transaction logs reach 50% of the size of the last checkpoint, Vertica adds a checkpoint. Each checkpoint demarcates all changes to the catalog since the last checkpoint.

Default: 50 (percent)

ClusterSequenceCacheMode

Boolean, specifies whether the initiator node requests cache for other nodes in a cluster, and then sends cache to other nodes along with the execution plan, one of the following.

  • 1 (enabled): Initiator node requests cache.

  • 0: (disabled): All nodes request their own cache.

See Distributing named sequences.

Default: 1 (enabled)

CompressCatalogOnDisk

Specifies whether to compress the size of the catalog on disk, one of the following:

  • 0: Do not compress.

  • 1: Compress checkpoints, but not logs.

  • 2: Compress checkpoints and logs.

Consider enabling this parameter if the catalog disk partition is small (<50 GB) and the metadata is large (hundreds of tables, partitions, or nodes).

Default: 0

CompressNetworkData

Boolean, specifies whether to compress all data sent over the internal network when enabled (set to 1). This compression speeds up network traffic at the expense of added CPU load. If the network is throttling database performance, enable compression to correct the issue.

Default: 0

CopyFaultTolerantExpressions

Boolean, indicates whether to report record rejections during transformations and proceed (true) or abort COPY operations if a transformation fails.

Default: 0 (false)

CopyFromVerticaWithIdentity

Allows COPY FROM VERTICA and EXPORT TO VERTICA to load values into Identity (or Auto-increment) columns. The destination Identity column is not incremented automatically. To disable the default behavior, set this parameter to 0 (zero).

Default: 1

DatabaseHeartbeatInterval

Determines the interval (in seconds) at which each node performs a health check and communicates a heartbeat. If a node does not receive a message within five times of the specified interval, the node is evicted from the cluster. Setting the interval to 0 disables the feature.

See Automatic eviction of unhealthy nodes.

Default: 120

DefaultArrayBinarySize

The maximum binary size, in bytes, for an unbounded collection, if a maximum size is not specified at creation time.

Default: 65000

DefaultTempTableLocal

Boolean, specifies whether CREATE TEMPORARY TABLE creates a local or global temporary table, one of the following:

  • 0: Create global temporary table.

  • 1: Create local temporary table.

For details, see Creating temporary tables.

Default: 0

DivideZeroByZeroThrowsError

Boolean, specifies whether to return an error if a division by zero operation is requested:

  • 0: Return 0.

  • 1: Returns an error.

Default: 1

EnableApportionedChunkingInDefaultLoadParser

Boolean, specifies whether to enable the built-in parser for delimited files to take advantage of both apportioned load and cooperative parse for potentially better performance.

Default: 1 (enable)

EnableApportionedFileLoad

Boolean, specifies whether to enable automatic apportioning across nodes of file loads using COPY FROM VERTICA. Vertica attempts to apportion the load if:

  • This parameter and EnableApportionLoad are both enabled.

  • The parser supports apportioning.

  • The load is divisible into portion sizes of at least the value of ApportionedFileMinimumPortionSizeKB.

Setting this parameter does not guarantee that loads will be apportioned, but disabling it guarantees that they will not be.

See Distributing a load.

Default: 1 (enable)

EnableApportionLoad

Boolean, specifies whether to enable automatic apportioning across nodes of data loads using COPY...WITH SOURCE. Vertica attempts to apportion the load if:

  • This parameter is enabled.

  • The source and parser both support apportioning.

Setting this parameter does not guarantee that loads will be apportioned, but disabling it guarantees that they will not be.

For details, see Distributing a load.

Default: 1 (enable)

EnableBetterFlexTypeGuessing

Boolean, specifies whether to enable more accurate type guessing when assigning data types to non-string keys in a flex table __raw__ column with COMPUTE_FLEXTABLE_KEYS or COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW. If this parameter is disabled (0), Vertica uses a limited set of Vertica data type assignments. For examples, see Computing flex table keys.

Default: 1 (enable)

EnableCooperativeParse

Boolean, specifies whether to implement multi-threaded parsing capabilities on a node. You can use this parameter for both delimited and fixed-width loads.

Default: 1 (enable)

EnableForceOuter

Boolean, specifies whether Vertica uses a table's force_outer value to implement a join. For more information, see Controlling join inputs.

Default: 0 (forced join inputs disabled)

EnableMetadataMemoryTracking

Boolean, specifies whether to enable Vertica to track memory used by database metadata in the METADATA resource pool.

Default: 1 (enable)

EnableResourcePoolCPUAffinity

Boolean, specifies whether Vertica aligns queries to the resource pool of the processing CPU. When disabled (0), queries run on any CPU, regardless of the CPU_AFFINITY_SET of the resource pool.

Default: 1

EnableStrictTimeCasts

Specifies whether all cast failures result in an error.

Default: 0 (disable)

EnableUniquenessOptimization

Boolean, specifies whether to enable query optimization that is based on guaranteed uniqueness of column values. Columns that can be guaranteed to include unique values include:

Default: 1 (enable)

EnableWithClauseMaterialization Superseded by WithClauseMaterialization.
EnableWITHTempRelReuseLimit

Sets EE5 temp relation support for WITH materialization. Accepted values are:

  • 0: Disables this feature.

  • 1: Force-saves all WITH clause queries into EE5 Temp Relations, whether or not they are reused.

  • 2: Enables this feature only when queries are reused.

Default: 2

ExternalTablesExceptionsLimit

Determines the maximum number of COPY exceptions and rejections allowed when a SELECT statement references an external table. Set to -1 to remove any exceptions limit. See Querying external tables.

Default: 100

FailoverToStandbyAfter

Specifies the length of time that an active standby node waits before taking the place of a failed node.

This parameter is set to an interval literal.

Default: None

FencedUDxMemoryLimitMB

Sets the maximum amount of memory, in megabytes (MB), that a fenced-mode UDF can use. If a UDF attempts to allocate more memory than this limit, that attempt triggers an exception. For more information, see Fenced and unfenced modes.

Default: -1 (no limit)

FlexTableDataTypeGuessMultiplier

Specifies a multiplier that the COMPUTE_FLEXTABLE_KEYS and COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW functions use when assigning a data type and column width for the flex keys table. Both functions assign each key a data type, and multiply the longest key value by this factor to estimate column width. This value is not used to calculate the width of any real columns in a flex table.

Default: 2.0

FlexTableRawSize

Specifies the default column width for the __raw__ column of new flex tables, a value between 1 and 32000000, inclusive.

Default: 130000

ForceUDxFencedMode

When enabled (1), forces all UDxs that support fenced mode to run in fenced mode even if their definition specified NOT FENCED.

Default: 0

JavaBinaryForUDx Sets the full path to the Java executable that Vertica uses to run Java UDxs. See Installing Java on Vertica hosts.
JoinDefaultTupleFormat

Specifies how to size VARCHAR column data when joining tables on those columns, and buffers accordingly, one of the following:

  • fixed: Use join column metadata to size column data to a fixed length, and buffer accordingly.

  • variable: Use the actual length of join column data, so buffer size varies for each join.

Default: fixed

KeepAliveIdleTime

Length (in seconds) of the idle period before the first TCP keepalive probe is sent to ensure that the client is still connected. If set to 0, Vertica uses the kernel's tcp_keepalive_time parameter setting.

Default: 0

KeepAliveProbeCount

Number of consecutive keepalive probes that must go unacknowledged by the client before the client connection is considered lost and closed. If set to 0, Vertica uses the kernel's tcp_keepalive_probes parameter setting.

Default: 0

KeepAliveProbeInterval

Time interval (in seconds) between keepalive probes. If set to 0, Vertica uses the kernel's tcp_keepalive_intvl parameter setting.

Default: 0

LockTimeout

Specifies in seconds how long a table can be locked. You can set this parameter at all levels: session, node, and database.

Default: 300

LoadSourceStatisticsLimit

Specifies the maximum number of sources per load operation that are profiled in the LOAD_SOURCES system table. Set it to 0 to disable profiling.

Default: 256

MaxBundleableROSSizeKB

Specifies the minimum size, in kilobytes, of an independent ROS file. Vertica bundles storage container ROS files below this size into a single file. Bundling improves the performance of any file-intensive operations, including backups, restores, and mergeouts.

If you set this parameter to a value of 0, Vertica bundles .fdb and .pidx files without bundling other storage container files.

Default: 1024

MaxClientSessions

Determines the maximum number of client sessions that can run on a single node of the database. The default value allows for five additional administrative logins. These logins prevent DBAs from being locked out of the system if non-dbadmin users reach the login limit.

Default: 50 user logins and 5 additional administrative logins

ParquetMetadataCacheSizeMB

Size of the cache used for metadata when reading Parquet data. The cache uses local TEMP storage.

Default: 4096

PatternMatchingUseJit

Boolean, specifies whether to enables just-in-time compilation (to machine code) of regular expression pattern matching functions used in queries. Enabling this parameter can usually improve pattern matching performance on large tables. The Perl Compatible Regular Expressions (PCRE) pattern-match library evaluates regular expressions. Restart the database for this parameter to take effect.

See also Regular expression functions.

Default: 1 (enable)

PatternMatchStackAllocator

Boolean, specifies whether to override the stack memory allocator for the pattern-match library. The Perl Compatible Regular Expressions (PCRE) pattern-match library evaluates regular expressions. Restart the database for this parameter to take effect.

See also Regular expression functions.

Default: 1 (enable override)

TerraceRoutingFactor

Specifies whether to enable or disable terrace routing on any Enterprise Mode large cluster that implements rack-based fault groups.

  • To enable, set as follows:
    where: * numRackNodes: Number of nodes in a rack * numRacks: Number of racks in the cluster
  • To disable, set to a value so large that terrace routing cannot be enabled for the largest clusters—for example, 1000.

For details, see Terrace routing.

Default: 2

TransactionIsolationLevel

Changes the isolation level for the database. After modification, Vertica uses the new transaction level for every new session. Existing sessions and their transactions continue to use the original isolation level.

See also Change transaction isolation levels.

Default: READ COMMITTED

TransactionMode

Specifies whether transactions are in read/write or read-only modes. Read/write is the default. Existing sessions and their transactions continue to use the original isolation level.

Default: READ WRITE

UDxFencedBlockTimeout

Specifies the number of seconds to wait for output before aborting a UDx running in Fenced and unfenced modes. If the server aborts a UDx for this reason, it produces an error message similar to "ERROR 3399: Failure in UDx RPC call: timed out in receiving a UDx message". If you see this error frequently, you can increase this limit. UDxs running in fenced mode do not run in the server process, so increasing this value does not impede server performance.

Default: 60

UseLocalTzForParquetTimestampConversion

Boolean, specifies whether to do timezone conversion when reading Parquet files. Hive version 1.2.1 introduced an option to localize timezones when writing Parquet files. Previously it wrote them in UTC and Vertica adjusted the value when reading the files.

Set to 0 if Hive already adjusted the timezones.

Default: 1 (enable conversion)

UseServerIdentityOverUserIdentity

Boolean, specifies whether to ignore user-supplied credentials for non-Linux file systems and always use a USER storage location to govern access to data. See Creating a Storage Location for USER Access.

Default: 0 (disable)

WithClauseMaterialization

Boolean, specifies whether to enable materialization of WITH clause results. When materialization is enabled (1), Vertica evaluates each WITH clause once and stores results in a temporary table.

Default: 0 (disable)

WithClauseRecursionLimit

Specifies the maximum number of times a WITH RECURSIVE clause iterates over the content of its own result set before it exits. For details, see WITH clause recursion.

Default: 8

2.3.2 - Eon Mode parameters

The following parameters configure how the database operates when running in Eon Mode.

The following parameters configure how the database operates when running in Eon Mode. Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
BackgroundDepotWarming

Specifies background depot warming behavior:

  • 1: The depot loads objects while it is warming, and continues to do so in the background after the node becomes active and starts executing queries.

  • 0: Node activation is deferred until the depot fetches and loads all queued objects

For details, see Depot Warming.

Default: 1

CatalogSyncInterval

Specifies in minutes how often the transaction log sync service syncs metadata to communal storage. If you change this setting, Vertica restarts the interval count.

Default: 5

DelayForDeletes

Specifies in hours how long to wait before deleting a file from communal storage. Vertica first deletes a file from the depot. After the specified time interval, the delete also occurs in communal storage.

Default: 0. Deletes the file from communal storage as soon as it is not in use by shard subscribers.

DepotOperationsForQuery

Specifies behavior when the depot does not contain queried file data, one of the following:

  • ALL (default): Fetch file data from communal storage, if necessary displace existing files by evicting them from the depot.

  • FETCHES: Fetch file data from communal storage only if space is available; otherwise, read the queried data directly from communal storage.

  • NONE: Do not fetch file data to the depot, read the queried data directly from communal storage.

You can also specify query-level behavior with the hint DEPOT_FETCH.

ECSMode

String parameter that sets the strategy Vertica uses when dividing the data in a shard among subscribing nodes during an ECS-enabled query, one of the following:

  • AUTO: Optimizer automatically determines the strategy to use.

  • COMPUTE_OPTIMIZED: Force use of the compute-optimized strategy.

  • IO_OPTIMIZED: Force use of the I/O-optimized strategy.

For details, see Manually choosing an ECS strategy.

Default: AUTO

ElasticKSafety

Boolean parameter that controls whether Vertica adjusts shard subscriptions due to the loss of a primary node:

  • 1: When a primary node is lost, Vertica subscribes other primary nodes to the down node's shard subscriptions. This action helps reduce the chances of a database into going read-only mode due to loss of shard coverage.
  • 0 : Vertica does not change shard subscriptions in reaction to the loss of a primary node.

Default: 1

For details, see Maintaining Shard Coverage.

EnableDepotWarmingFromPeers

Boolean parameter, specifies whether Vertica warms a node depot while the node is starting up and not ready to process queries:

  • 1: Warm the depot while a node comes up.

  • 0: Warm the depot only after the node is up.

For details, see Depot Warming.

Default: 0

FileDeletionServiceInterval

Specifies in seconds the interval between each execution of the reaper cleaner service task.

Default: 60 seconds

PreFetchPinnedObjectsToDepotAtStartup

If enabled (set to 1), a warming depot fetches objects that are pinned on its subcluster. For details, see Depot Warming.

Default: 0

ReaperCleanUpTimeoutAtShutdown

Specifies in seconds how long Vertica waits for the reaper to delete files from communal storage before shutting down. If set to a negative value, Vertica shuts down without waiting for the reaper.

Default: 300

StorageMergeMaxTempCacheMB

The size of temp space allocated per query to the StorageMerge operator for caching the data of S3 storage containers.

For details, see Local caching of storage containers.

UseCommunalStorageForBatchDepotWarming

Boolean parameter, specifies whether where a node retrieves data when warming its depot:

  • 1: Retrieve data from communal storage.

  • 0: Retrieve data from a peer.

Default: 1

UseDepotForReads

Boolean parameter, specifies whether Vertica accesses the depot to answer queries, or accesses only communal storage:

  • 1: Vertica first searches the depot for the queried data; if not there, Vertica fetches the data from communal storage for this and future queries.

  • 0: Vertica bypasses the depot and always obtains queried data from communal storage.

Default: 1

UseDepotForWrites

Boolean parameter, specifies whether Vertica writes loaded data to the depot and then uploads files to communal storage:

  • 1: Write loaded data to the depot, upload files to communal storage.

  • 0: Bypass the depot and always write directly to communal storage.

Default: 1

UsePeerToPeerDataTransfer

Boolean parameter, specifies whether Vertica pushes loaded data to other shard subscribers:

  • 1: Send loaded data to all shard subscribers.

  • 0: Do not push data to other shard subscribers.

Default: 0

2.3.3 - S3 parameters

Use the following parameters to configure reading from S3 file systems and on-premises storage with S3-compatible APIs, such as Pure Storage, using COPY FROM.

Use the following parameters to configure reading from S3 file systems and on-premises storage with S3-compatible APIs, such as Pure Storage, using COPY FROM. For more information about reading data from S3, see S3 Object Store.

For the parameters to control the AWS Library (UDSource), see Configure the Vertica library for Amazon Web Services.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
AWSAuth

ID and secret key for authentication. For extra security, do not store credentials in the database; use ALTER SESSION...SET PARAMETER to set this value for the current session only. If you use a shared credential, you can set it in the database with ALTER DATABASE...SET PARAMETER. For example:

=> ALTER SESSION SET AWSAuth='ID:secret';

AWS calls these AccessKeyID and SecretAccessKey.

To use admintools create_db or revive_db for Eon Mode on-premises, create a configuration file called auth_params.conf with these settings:

AWSAuth = key:secret
AWSEndpoint = IP:port
AWSCAFile

File name of the TLS server certificate bundle to use. Setting this parameter overrides the Vertica default CA bundle path specified in the SystemCABundlePath parameter.

If set, this parameter overrides the Vertica default CA bundle path specified in the SystemCABundlePath parameter.

=> ALTER DATABASE DEFAULT SET AWSCAFile = '/etc/ssl/ca-bundle.pem';

Default: system-dependent

AWSCAPath

Path Vertica uses to look up TLS server certificates. The file name of the TLS server certificate bundle to use.

If set, this parameter overrides the Vertica default CA bundle path specified in the SystemCABundlePath parameter.

=> ALTER DATABASE DEFAULT SET AWSCAPath = '/etc/ssl/';

Default: system-dependent

AWSEnableHttps

Boolean, specifies whether to use the HTTPS protocol when connecting to S3, can be set only at the database level with ALTER DATABASE...SET PARAMETER. If you choose not to use TLS, this parameter must be set to 0.

Default: 1 (enabled)

AWSEndpoint

Endpoint to use when interpreting S3 URLs, set as follows.

```
AWSEndpoint = s3-fips.dualstack.us-east-1.amazonaws.com
S3EnableVirtualAddressing = 1
```
  • On-premises/Pure: IP address of the Pure Storage server. If using admintools create_db or revive_db, create configuration file auth_params.conf and include these settings:
awsauth = key:secret
    awsendpoint = IP:port
  • When AWSEndpoint is not set, the default behavior is to use virtual-hosted request URLs.

Default: s3.amazonaws.com

AWSLogLevel

Log level, one of the following:

  • OFF

  • FATAL

  • ERROR

  • WARN

  • INFO

  • DEBUG

  • TRACE

**Default:**ERROR

AWSRegion

AWS region containing the S3 bucket from which to read files. This parameter can only be configured with one region at a time. If you need to access buckets in multiple regions, change the parameter each time you change regions.

If you do not set the correct region, you might experience a delay before queries fail because Vertica retries several times before giving up.

Default: us-east-1

AWSSessionToken

Temporary security token generated by running the get-session-token command, which generates temporary credentials you can use to configure multi-factor authentication.

S3BucketConfig

Contains S3 bucket configuration information as a JSON object with the following properties. Each property has an equivalent parameter (shown in parentheses). If both the property in S3BucketConfig and the equivalent S3 parameter are set, the S3BucketConfig property takes precedence.

Properties:

  • bucket: The name of the bucket

  • region: The name of the region (AWSRegion)

  • protocol: Specifies whether to secure the connection, one of the following:

    • http: Unencrypted connection

    • https: Encrypted connection

  • endpoint: The endpoint URL or IP address (AWSEndpoint)

  • enableVirtualAddressing: Whether to rewrite the S3 URL to use a virtual hosted path (S3BucketCredentials)

  • requesterPays: Boolean, specifies whether requester (instead of bucket owner) pays the cost of accessing data on the bucket; must be set in order to access S3 buckets configured as Requester Pays buckets. By setting this property to true, you are accepting the charges for accessing data. If not specified, the default value is false.

The configuration properties for a given bucket may differ based on its type. For example, the following S3BucketConfig is for an AWS bucket AWSBucket and a Pure Storage bucket PureStorageBucket. AWSBucket doesn't specify an endpoint, so Vertica uses the AWSEndpoint, which defaults to s3.amazonaws.com:

ALTER DATABASE DEFAULT SET S3BucketConfig=
'[
    {
        "bucket": "AWSBucket",
        "region": "us-east-2",
        "protocol": "https",
        "requesterPays": true
    },
    {
        "bucket": "PureStorageBucket",
        "endpoint": "pure.mycorp.net:1234",
        "protocol": "http",
        "enableVirtualAddressing": false
    }
]';
S3BucketCredentials

Contains credentials for accessing an S3 bucket. Each property in S3BucketCredentials has an equivalent parameter (shown in parentheses). When set, S3BucketCredentials takes precedence over both AWSAuth and AWSSessionToken.

Providing credentials for more than one bucket authenticates to them simultaneously, allowing you to perform cross-endpoint joins, export from one bucket to another, etc.

Properties:

  • bucket: The name of the bucket

  • accessKey: The access key for the bucket (the ID in AWSAuth)

  • secretAccessKey: The secret access key for the bucket (the secret in AWSAuth)

  • sessionToken: The session token, only used when S3BucketCredentials is set at the session level (AWSSessionToken)

For example, the following S3BucketCredentials is for an AWS bucket AWSBucket and a Pure Storage bucket PureStorageBucket and sets all possible properties:

ALTER SESSION SET S3BucketCredentials='
[
    {
        "bucket": "AWSBucket",
        "accessKey": "<AK0>",
        "secretAccessKey": "<SAK0>",
        "sessionToken": "1234567890"
    },
    {
        "bucket": "PureStorageBucket",
        "accessKey": "<AK1>",
        "secretAccessKey": "<SAK1>"
    }
]';

This parameter is only visible to the superuser. Users can set this parameter at the session level with ALTER SESSION.

S3EnableVirtualAddressing

Boolean, specifies whether to rewrite S3 URLs to use virtual-hosted paths. For example, if you use AWS, the S3 URLs change to bucketname.s3.amazonaws.com instead of s3.amazonaws.com/bucketname. This configuration setting takes effect only when you have specified a value for AWSEndpoint.

If you set AWSEndpoint to a FIPS-compliant S3 Endpoint, you must enable S3EnableVirtualAddressing in auth_params.conf:

AWSEndpoint = s3-fips.dualstack.us-east-1.amazonaws.com
S3EnableVirtualAddressing = 1

The value of this parameter does not affect how you specify S3 paths.

Default: 0 (disabled)

S3RequesterPays Boolean, specifies whether requester (instead of bucket owner) pays the cost of accessing data on the bucket. When true, the bucket owner is only responsible for paying the cost of storing the data, rather than all costs associated with the bucket; must be set in order to access S3 buckets configured as Requester Pays buckets. By setting this property to true, you are accepting the charges for accessing data. If not specified, the default value is false.
AWSStreamingConnectionPercentage

Controls the number of connections to the communal storage that Vertica uses for streaming reads. In a cloud environment, this setting helps prevent streaming data from communal storage using up all available file handles. It leaves some file handles available for other communal storage operations.

Due to the low latency of on-premises object stores, this option is unnecessary for an Eon Mode database that uses on-premises communal storage, such as Pure Storage and MinIO. In this case, disable the parameter by setting it to 0.

2.3.4 - AWS library S3-Compatible user-defined session parameters

The AWS library is deprecated.

Use these parameters to configure the Vertica library for all S3-compatible file systems. You use this library to export data from Vertica to S3. All parameters listed are case-sensitive.

Using ALTER SESSION to change the S3 configuration parameters described in S3 parameters also changes the corresponding parameters for this library. The reverse is not true: setting these parameters sets the values only for the library.

Parameter Description
aws_ca_bundle

The path which Vertica will use when looking for a SSL server certificate bundle. For SUSE Linux you must set a value. Setting the AWSCAFile configuration parameter for a session also sets this UDParameter.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_ca_bundle='/home/user/ssl_folder/ca_bundle';

Default: system dependent

aws_ca_path

The path which Vertica will use when looking for SSL server certificates. For SUSE Linux you must set a value. Setting the AWSCAPath configuration parameter for a session also sets this UDParameter.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_ca_path='/home/user/ssl_folder';

Default: system dependent

aws_endpoint

The endpoint to use when interpreting S3 URLs. For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_endpoint='localhost:8080';

Default: s3.amazonaws.com

aws_id

Your access key ID. Setting the AWSAuth configuration parameter for a session also sets this UDParameter.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_id='aws-id';

aws_max_recv_speed

The maximum transfer speed when receiving data to S3 in bytes per second. For example, to set a maximum receive speed of 100KB/S:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_max_recv_speed=102400;

aws_max_send_speed

The maximum transfer speed when sending data to S3 in bytes per second. For example, to set a maximum send speed of 1KB/S:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_max_send_speed=1024;

aws_proxy

A string value which allows you to set an HTTP/HTTPS proxy for the library.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_proxy='192.168.1.1:8080';

aws_region

The AWS region containing your S3 bucket. aws_region can only be configured with one region at a time. If you need to access buckets in multiple regions, you must re-set the parameter each time you change regions.

Setting the AWSRegion configuration parameter for a session also sets this UDParameter.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_region='region-id';

Default: us-east-1

aws_secret

Your secret access key. Setting the AWSAuth configuration parameter for a session also sets this UDParameter.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='aws-key';

aws_session_token

The temporary security token generated by running the AWS STS command get-session-token. This AWS STS command generates temporary credentials you can use to implement multi-factor authentication for security purposes.

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_session_token='session-token';

aws_verbose

When enabled, logs libcurl debug messages to /opt/vertica/packages/AWS/logs.

For example:

=> ALTER SESSION SET UDPARAMETER FOR awslib aws_verbose=1;

Default: false

See also

2.3.5 - Google Cloud Storage parameters

Use the following parameters to configure reading from Google Cloud Storage (GCS) using COPY FROM.

Use the following parameters to configure reading from Google Cloud Storage (GCS) using COPY FROM. For more information about reading data from S3, see Specifying where to load data from.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
GCSAuth

An ID and secret key to authenticate to GCS. You can set parameters globally and for the current session with ALTER DATABASE...SET PARAMETER and ALTER SESSION...SET PARAMETER, respectively. For extra security, do not store credentials in the database; instead, set it for the current session with ALTER SESSION. For example:

=> ALTER SESSION SET GCSAuth='ID:secret';

If you use a shared credential, set it in the database with ALTER DATABASE.

GCSEnableHttps

Specifies whether to use the HTTPS protocol when connecting to GCS, can be set only at the database level with ALTER DATABASE...SET PARAMETER.

Default: 1 (enabled)

GCSEndpoint

The connection endpoint address.

Default: storage.googleapis.com

2.3.6 - Azure parameters

Use the following parameters to configure reading from Azure blob storage.

Use the following parameters to configure reading from Azure blob storage. For more information about reading data from Azure, see Azure Blob Storage object store.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.
AzureStorageCredentials
Collection of JSON objects, each of which specifies connection credentials for one endpoint. This parameter takes precedence over Azure managed identities.

The collection must contain at least one object and may contain more. Each object must specify at least one of accountName or blobEndpoint, and at least one of accountKey or sharedAccessSignature.

  • accountName: If not specified, uses the label of blobEndpoint.
  • blobEndpoint: Host name with optional port (host:port). If not specified, uses account.blob.core.windows.net.
  • accountKey: Access key for the account or endpoint.
  • sharedAccessSignature: Access token for finer-grained access control, if being used by the Azure endpoint.
AzureStorageEndpointConfig
Collection of JSON objects, each of which specifies configuration elements for one endpoint. Each object must specify at least one of accountName or blobEndpoint.
  • accountName: If not specified, uses the label of blobEndpoint.
  • blobEndpoint: Host name with optional port (host:port). If not specified, uses account.blob.core.windows.net.
  • protocol: HTTPS (default) or HTTP.
  • isMultiAccountEndpoint: true if the endpoint supports multiple accounts, false otherwise (default is false). To use multiple-account access, you must include the account name in the URI. If a URI path contains an account, this value is assumed to be true unless explicitly set to false.

2.3.7 - Apache Hadoop parameters

The following table describes general parameters for configuring integration with Apache Hadoop.

The following table describes general parameters for configuring integration with Apache Hadoop. See Apache Hadoop integration for more information.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
EnableHDFSBlockInfoCache

Boolean, whether to distribute block location metadata collected during planning on the initiator to all database nodes for execution. Distributing this metadata reduces name node accesses, and thus load, but can degrade database performance somewhat in deployments where the name node isn't contended. This performance effect is because the data must be serialized and distributed. Enable distribution if protecting the name node is more important than query performance; usually this applies to large HDFS clusters where name node contention is already an issue.

Default: 0 (disabled)

HadoopConfDir

Directory path containing the XML configuration files copied from Hadoop. The same path must be valid on every Vertica node. You can use the VERIFY_HADOOP_CONF_DIR meta-function to test that the value is set correctly. Setting this parameter is required to read data from HDFS.

For all Vertica users, the files are accessed by the Linux user under which the Vertica server process runs.

When you set this parameter, previously-cached configuration information is flushed.

You can set this parameter at the session level. Doing so overrides the database value; it does not append to it. For example:

=> ALTER SESSION SET HadoopConfDir='/test/conf:/hadoop/hcat/conf';

To append, get the current value and include it on the new path after your additions. Setting this parameter at the session level does not change how the files are accessed.

Default: obtained from environment if possible

HadoopFSAuthentication

How (or whether) to use Kerberos authentication with HDFS. By default, if KerberosKeytabFile is set, Vertica uses that credential for both Vertica and HDFS. Usually this is the desired behavior. However, if you are using a Kerberized Vertica cluster with a non-Kerberized HDFS cluster, set this parameter to "none" to indicate that Vertica should not use the Vertica Kerberos credential to access HDFS.

Default: "keytab" if KerberosKeytabFile is set, otherwise "none"

HadoopFSBlockSizeBytes

Block size to write to HDFS. Larger files are divided into blocks of this size.

Default: 64MB

HadoopFSNNOperationRetryTimeout

Number of seconds a metadata operation (such as list directory) waits for a response before failing. Accepts float values for millisecond precision.

Default: 6 seconds

HadoopFSReadRetryTimeout

Number of seconds a read operation waits before failing. Accepts float values for millisecond precision. If you are confident that your file system will fail more quickly, you can improve performance by lowering this value.

Default: 180 seconds

HadoopFSReplication

Number of replicas HDFS makes. This is independent of the replication that Vertica does to provide K-safety. Do not change this setting unless directed otherwise by Vertica support.

Default: 3

HadoopFSRetryWaitInterval

Initial number of seconds to wait before retrying read, write, and metadata operations. Accepts float values for millisecond precision. The retry interval increases exponentially with every retry.

Default: 3 seconds

HadoopFSTokenRefreshFrequency

How often, in seconds, to refresh the Hadoop tokens used to hold Kerberos tickets (see Token expiration).

Default: 0 (refresh when token expires)

HadoopFSWriteRetryTimeout

Number of seconds a write operation waits before failing. Accepts float values for millisecond precision. If you are confident that your file system will fail more quickly, you can improve performance by lowering this value.

Default: 180 seconds

HadoopImpersonationConfig Session parameter specifying the delegation token or Hadoop user for HDFS access. See HadoopImpersonationConfig format for information about the value of this parameter and Proxy users and delegation tokens for more general context.
HDFSUseWebHDFS

Boolean, whether URLs in the hdfs scheme use WebHDFS instead of LibHDFS++ to access data.

Default: 0 (disabled)

WebhdfsClientCertConf

mTLS configurations for accessing one or more WebHDFS servers. The value is a JSON string; each member has the following properties:

  • nameservice: WebHDFS name service

  • authority: host:port

  • certName: name of a certificate defined by CREATE CERTIFICATE

nameservice and authority are mutually exclusive.

For example:

=> ALTER SESSION SET WebhdfsClientCertConf =
    '[{"authority" : "my.authority.com:50070", "certName" : "myCert"},
      {"nameservice" : "prod", "certName" : "prodCert"}]';

HCatalog Connector parameters

The following table describes the parameters for configuring the HCatalog Connector. See Using the HCatalog Connector for more information.

Parameter Description
EnableHCatImpersonation

Boolean, whether the HCatalog Connector uses (impersonates) the current Vertica user when accessing Hive. If impersonation is enabled, the HCatalog Connector uses the Kerberos credentials of the logged-in Vertica user to access Hive data. Disable impersonation if you are using an authorization service to manage access without also granting users access to the underlying files. For more information, see Configuring security.

Default: 1 (enabled)

HCatalogConnectorUseHiveServer2

Boolean, whether Vertica internally uses HiveServer2 instead of WebHCat to get metadata from Hive.

Default: 1 (enabled)

HCatalogConnectorUseLibHDFSPP

Boolean, whether the HCatalog Connector should use the hdfs scheme instead of webhdfs with the HCatalog Connector.

Default: 1 (enabled)

HCatConnectionTimeout

The number of seconds the HCatalog Connector waits for a successful connection to the HiveServer2 (or WebHCat) server before returning a timeout error.

Default: 0 (Wait indefinitely)

HCatSlowTransferLimit

Lowest transfer speed (in bytes per second) that the HCatalog Connector allows when retrieving data from the HiveServer2 (or WebHCat) server. In some cases, the data transfer rate from the server to Vertica is below this threshold. In such cases, after the number of seconds specified in the HCatSlowTransferTime parameter pass, the HCatalog Connector cancels the query and closes the connection.

Default: 65536

HCatSlowTransferTime

Number of seconds the HCatalog Connector waits before testing whether the data transfer from the server is too slow. See the HCatSlowTransferLimit parameter.

Default: 60

2.3.8 - Security parameters

Use these client authentication configuration parameters and general security parameters to configure TLS.

Use these client authentication configuration parameters and general security parameters to configure TLS.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Database parameters

Parameter Description
DataSSLParams

This parameter has been deprecated. Use the data_channel TLS CONFIGURATION instead.

Enables encryption using SSL on the data channel. The value of this parameter is a comma-separated list of the following:

  • An SSL certificate (chainable)

  • The corresponding SSL private key

  • The SSL CA (Certificate Authority) certificate.

You should set EncryptSpreadComm before setting this parameter.

In the following example, the SSL Certificate contains two certificates, where the certificate for the non-root CA verifies the certificate for the cluster. This is called an SSL Certificate Chain.

=> ALTER DATABASE DEFAULT SET PARAMETER DataSSLParams =
'----BEGIN CERTIFICATE-----<certificate for Cluster>-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----<certificate for non-root CA>-----END CERTIFICATE-----,
-----BEGIN RSA PRIVATE KEY-----<private key for Cluster>-----END RSA PRIVATE KEY-----,
-----BEGIN CERTIFICATE-----<certificate for public CA>-----END CERTIFICATE-----';
DefaultIdleSessionTimeout

Indicates a default session timeout value for all users where IDLESESSIONTIMEOUT is not set. For example:

=> ALTER DATABASE DEFAULT SET defaultidlesessiontimeout = '300 secs';

DHParams

String, a Diffie-Hellman group of at least 2048 bits in the form:

-----BEGIN DH PARAMETERS-----...-----END DH PARAMETERS-----

You can generate your own or use the pre-calculated Modular Exponential (MODP) Diffie-Hellman groups specified in RFC 3526.

Changes to this parameter do not take effect until you restart the database.

Default: RFC 3526 2048-bit MODP Group 14:


-----BEGIN DH PARAMETERS-----MIIBCAKCAQEA///////////
JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxObIlFKCHmONAT
d75UZs806QxswKwpt8l8UN0/hNW1tUcJF5IW1dmJefsb0TELppjf
tawv/XLb0Brft7jhr+1qJn6WunyQRfEsf5kkoZlHs5Fs9wgB8uKF
jvwWY2kg2HFXTmmkWP6j9JM9fg2VdI9yjrZYcYvNWIIVSu57VKQd
wlpZtZww1Tkq8mATxdGwIyhghfDKQXkYuNs474553LBgOhgObJ4O
i7Aeij7XFXfBvTFLJ3ivL9pVYFxg5lUl86pVq5RXSJhiY+gUQFXK
OWoqsqmj//////////wIBAg==-----END DH PARAMETERS-----
DoUserSpecificFilteringInSysTables

Boolean, specifies whether a non-superuser can view details of another user:

  • 0: Users can view details of other users.

  • 1: Users can only view details about themselves.

Default: 0

EnableAllRolesOnLogin

Boolean, specifies whether to automatically enable all roles granted to a user on login:

  • 0: Do not automatically enable roles

  • 1: Automatically enable roles. With this setting, users do not need to run SET ROLE.

Default: 0 (disable)

EnabledCipherSuites

Specifies which SSL cipher suites to use for secure client-server communication. Changes to this parameter apply only to new connections.

Default: Vertica uses the Microsoft Schannel default cipher suites. For more information, see the Schannel documentation.

EncryptSpreadComm

Enables Spread encryption on the control channel, set to one of the following strings:

  • vertica: Specifies that Vertica generates the Spread encryption key for the database cluster.

  • aws-kms|key-name, where key-name is a named key in the iAWS Key Management Service (KMS). On database restart, Vertica fetches the named key from the KMS instead of generating its own.

If the parameter is empty, Spread communication is unencrypted. In general, you should enable this parameter before modifying other security parameters.

Enabling this parameter requires database restart.

GlobalHeirUsername

A string that specifies which user inherits objects after their owners are dropped. This setting ensures preservation of data that would otherwise be lost.

Set this parameter to one of the following string values:

  • Empty string: Objects of dropped users are removed from the database.

  • username: Reassigns objects of dropped users to username. If username does not exist, Vertica creates that user and sets GlobalHeirUsername to it.

  • <auto>: Reassigns objects of dropped LDAP users to user dbadmin.

For more information about usage, see Examples.

Default: <auto>

ImportExportTLSMode

When using CONNECT TO VERTICA to connect to another Vertica cluster for import or export, specifies the degree of stringency for using TLS. Possible values are:

  • PREFER: Try TLS but fall back to plaintext if TLS fails.

  • REQUIRE: Use TLS and fail if the server does not support TLS.

  • VERIFY_CA: Require TLS (as with REQUIRE), and also validate the other server's certificate using the CA specified by the "server" TLS CONFIGURATION's CA certificates (in this case, "ca_cert" and "ica_cert"):

    => SELECT name, certificate, ca_certificate, mode FROM tls_configurations WHERE name = 'server';
      name  |   certificate    |   ca_certificate   |   mode
    --------+------------------+--------------------+-----------
     server | server_cert      | ca_cert,ica_cert   | VERIFY_CA
    (1 row)
    
  • VERIFY_FULL: Require TLS and validate the certificate (as with VERIFY_CA), and also validate the server certificate's hostname.

  • REQUIRE_FORCE, VERIFY_CA_FORCE, and VERIFY_FULL_FORCE: Same behavior as REQUIRE, VERIFY_CA, and VERIFY_FULL, respectively, and cannot be overridden by CONNECT TO VERTICA.

Default: PREFER

PasswordLockTimeUnit

The time units for which an account is locked by PASSWORD_LOCK_TIME after FAILED_LOGIN_ATTEMPTS, one of the following:

  • 'd': days (default)

  • 'h': hours

  • 'm': minutes

  • 's': seconds

For example, to configure the default profile to lock user accounts for 30 minutes after three unsuccessful login attempts:


=> ALTER DATABASE DEFAULT SET PasswordLockTimeUnit = 'm'
=> ALTER PROFILE DEFAULT LIMIT PASSWORD_LOCK_TIME 30;
RequireFIPS

Boolean, specifies whether the FIPS mode is enabled:

  • 0 (disable)

  • 1: (enable)

On startup, Vertica automatically sets this parameter from the contents of the file crypto.fips_enabled. You cannot modify this parameter.

For details, see FIPS compliance for the Vertica server.

Default: 0

SecurityAlgorithm

Sets the algorithm for the function that hash authentication uses, one of the following:

  • SHA512

  • MD5

For example:

=> ALTER DATABASE DEFAULT SET SecurityAlgorithm = 'SHA512';

Default: SHA512

SystemCABundlePath

The absolute path to a certificate bundle of trusted CAs. This CA bundle is used when establishing TLS connections to external services such as AWS or Azure through their respective SDKs and libcurl. The CA bundle file must be in the same location on all nodes.

If this parameter is empty, Vertica searches the "standard" paths for the CA bundles, which differs between distributions:

  • Red Hat-based: /etc/pki/tls/certs/ca-bundle.crt
  • Debian-based: /etc/ssl/certs/ca-certificates.crt
  • SUSE: /var/lib/ca-certificates/ca-bundle.pem

Example:

=> ALTER DATABASE DEFAULT SET SystemCABundlePath = 'path/to/ca_bundle.pem';

Default: Empty

TLS CONFIGURATION parameters

To set your Vertica database's TLSMode, private key, server certificate, and CA certificate(s), see ALTER TLS CONFIGURATION. In versions prior to 11.0.0, these parameters were known as EnableSSL, SSLPrivateKey, SSLCertificate, and SSLCA, respectively.

Examples

Set the database parameter GlobalHeirUsername:

=> \du
      List of users
 User name | Is Superuser
-----------+--------------
 Joe       | f
 SuzyQ     | f
 dbadmin   | t
(3 rows)

=> ALTER DATABASE DEFAULT SET PARAMETER GlobalHeirUsername='SuzyQ';
ALTER DATABASE
=>  \c - Joe
You are now connected as user "Joe".
=> CREATE TABLE t1 (a int);
CREATE TABLE

=> \c
You are now connected as user "dbadmin".
=> \dt t1
             List of tables
 Schema | Name | Kind  | Owner | Comment
--------+------+-------+-------+---------
 public | t1   | table | Joe   |
(1 row)

=> DROP USER Joe;
NOTICE 4927:  The Table t1 depends on User Joe
ROLLBACK 3128:  DROP failed due to dependencies
DETAIL:  Cannot drop User Joe because other objects depend on it
HINT:  Use DROP ... CASCADE to drop the dependent objects too
=> DROP USER Joe CASCADE;
DROP USER
=> \dt t1
             List of tables
 Schema | Name | Kind  | Owner | Comment
--------+------+-------+-------+---------
 public | t1   | table | SuzyQ |
(1 row)

2.3.9 - Kerberos configuration parameters

The following parameters let you configure the Vertica principal for Kerberos authentication and specify the location of the Kerberos keytab file.

The following parameters let you configure the Vertica principal for Kerberos authentication and specify the location of the Kerberos keytab file.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
KerberosServiceName

Provides the service name portion of the Vertica Kerberos principal. By default, this parameter is vertica. For example:

vertica/host@EXAMPLE.COM

Default: vertica

KerberosHostname

Provides the instance or host name portion of the Vertica Kerberos principal. For example:

vertica/host@EXAMPLE.COM

If you omit the optional KerberosHostname parameter, Vertica uses the return value from the function gethostname(). Assuming each cluster node has a different host name, those nodes will each have a different principal, which you must manage in that node's keytab file.

KerberosRealm

Provides the realm portion of the Vertica Kerberos principal. A realm is the authentication administrative domain and is usually formed in uppercase letters. For example:

vertica/hostEXAMPLE.COM

KerberosKeytabFile

Provides the location of the keytab file that contains credentials for the Vertica Kerberos principal. By default, this file is located in /etc. For example:

KerberosKeytabFile=/etc/krb5.keytab

KerberosTicketDuration

Determines the lifetime of the ticket retrieved from performing a kinit. The default is 0 (zero) which disables this parameter.

If you omit setting this parameter, the lifetime is determined by the default Kerberos configuration.

2.3.10 - Machine learning parameters

You use machine learning parameters to configure various aspects of machine learning functionality in Vertica.

You use machine learning parameters to configure various aspects of machine learning functionality in Vertica.

Parameter Description
MaxModelSizeKB

Sets the maximum size of models that can be imported. The sum of the size of files specified in the metadata.json file determines the model size. The unit of this parameter is KBytes. The native Vertica model (category=VERTICA_MODELS) is exempted from this limit. If you can export the model from Vertica, and the model is not altered while outside Vertica, you can import it into Vertica again.

The MaxModelSizeKB parameter can be set only by a superuser and only at the database level. It is visible only to a superuser. Its default value is 4GB, and its valid range is between 1KB and 64GB (inclusive).

Examples:

To set this parameter to 3KB:

=> ALTER DATABASE DEFAULT SET MaxModelSizeKB = 3;

To set this parameter to 64GB (the maximum allowed):

=> ALTER DATABASE DEFAULT SET MaxModelSizeKB = 67108864;

To reset this parameter to the default value:

=> ALTER DATABASE DEFAULT CLEAR MaxModelSizeKB;

Default: 4GB

2.3.11 - Tuple mover parameters

These parameters control how the operates.

These parameters control how the Tuple Mover operates.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameters Description
ActivePartitionCount

Sets the number of active partitions. The active partitions are those most recently created. For example:

=> ALTER DATABASE DEFAULT SET ActivePartitionCount = 2;

For information about how the Tuple Mover treats active and inactive partitions during a mergeout operation, see Partition mergeout.

Default: 1

CancelTMTimeout

When partition, copy table, and rebalance operations encounter a conflict with an internal Tuple Mover job, those operations attempt to cancel the conflicting Tuple Mover job. This parameter specifies the amount of time, in seconds, that the blocked operation waits for the Tuple Mover cancellation to take effect. If the operation is unable to cancel the Tuple Mover job within limit specified by this parameter, the operation displays an error and rolls back.

Default: 300

EnableTMOnRecoveringNode

Boolean, specifies whether Tuple Mover performs mergeout activities on nodes with a node state of RECOVERING. Enabling Tuple Mover reduces the number of ROS containers generated during recovery. Having fewer than 1024 ROS containers per projection allows Vertica to maintain optimal recovery performance.

Default: 1 (enabled)

MaxMrgOutROSSizeMB

Specifies in MB the maximum size of ROS containers that are candidates for mergeout operations. The Tuple Mover avoids merging ROS containers that are larger than this setting.

Default: -1 (no maximum limit)

MergeOutInterval

Specifies in seconds how frequently the Tuple Mover checks the mergeout request queue for pending requests:

  1. If the queue contains mergeout requests, the Tuple Mover does nothing and goes back to sleep.

  2. If the queue is empty, the Tuple Mover:

    • Processes pending storage location move requests.

    • Checks for new unqueued purge requests and adds them to the queue.

    It then goes back to sleep.

Default: 600

PurgeMergeoutPercent

Specifies as a percentage the threshold of deleted records in a ROS container that invokes an automatic mergeout operation, to purge those records. Vertica only counts the number of 'aged-out' delete vectors—that is, delete vectors that are as 'old' or older than the ancient history mark (AHM) epoch.

This threshold applies to all ROS containers for non-partitioned tables. It also applies to ROS containers of all inactive partitions. In both cases, aged-out delete vectors are permanently purged from the ROS container.

Default: 20 (percent)

2.3.12 - Projection parameters

The following configuration parameters help you manage projections.

The following configuration parameters help you manage projections.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameters Description
AnalyzeRowCountInterval

Specifies how often Vertica checks the number of projection rows and whether the threshold set by ARCCommitPercentage has been crossed.

For more information, see Collecting database statistics.

Default: 86400 seconds (24 hours)

ARCCommitPercentage

Sets the threshold percentage of difference between the last-recorded aggregate projection row count and current row count for a given table. When the difference exceeds this threshold, Vertica updates the catalog with the current row count.

Default: 3 (percent)

ContainersPerProjectionLimit

Specifies how many ROS containers Vertica creates per projection before ROS pushback occurs.

Default: 1024

MaxAutoSegColumns

Specifies the number of columns (0 –1024) to use in an auto-projection's hash segmentation clause. Set to 0 to use all columns.

Default: 8

MaxAutoSortColumns

Specifies the number of columns (0 –1024) to use in an auto-projection's sort expression. Set to 0 to use all columns.

Default: 8

RebalanceQueryStorageContainers

By default, prior to performing a rebalance, Vertica performs a system table query to compute the size of all projections involved in the rebalance task. This query enables Vertica to optimize the rebalance to most efficiently utilize available disk space. This query can, however, significantly increase the time required to perform the rebalance.

By disabling the system table query, you can reduce the time required to perform the rebalance. If your nodes are low on disk space, disabling the query increases the chance that a node runs out of disk space. In that situation, the rebalance fails.

Default: 1 (enable)

RewriteQueryForLargeDim

If enabled (1), Vertica rewrites a SET USING or DEFAULT USING query during a REFRESH_COLUMNS operation by reversing the inner and outer join between the target and source tables. Doing so can optimize refresh performance in cases where the source data is in a table that is larger than the target table.

Default: 0

SegmentAutoProjection

Determines whether auto-projections are segmented if the table definition omits a segmentation clause. You can set this parameter at database and session scopes.

Default: 1 (create segmented auto projections)

2.3.13 - Epoch management parameters

The following table describes the epoch management parameters for configuring Vertica.

The following table describes the epoch management parameters for configuring Vertica.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.
Parameters Description
AdvanceAHMInterval

Determines how frequently (in seconds) Vertica checks the history retention status.

AdvanceAHMInterval cannot be set to a value that is less than the EpochMapInterval.

Default: 180 (seconds)

AHMBackupManagement

Blocks the advancement of the Ancient History Mark (AHM). When this parameter is enabled, the AHM epoch cannot be later than the epoch of your latest full backup. If you advance the AHM to purge and delete data, do not enable this parameter.

Default: 0

EpochMapInterval

Determines the granularity of mapping between epochs and time available to historical queries. When a historical queries AT TIME T request is issued, Vertica maps it to an epoch within a granularity of EpochMapInterval seconds. It similarly affects the time reported for Last Good Epoch during Failure recovery. Note that it does not affect internal precision of epochs themselves.

Default: 180 (seconds)

HistoryRetentionTime

Determines how long deleted data is saved (in seconds) as an historical reference. When the specified time since the deletion has passed, you can purge the data. Use the -1 setting if you prefer to use HistoryRetentionEpochs to determine which deleted data can be purged.

Default: 0 (Data saved when nodes are down.)

HistoryRetentionEpochs

Specifies the number of historical epochs to save, and therefore, the amount of deleted data.

Unless you have a reason to limit the number of epochs, Vertica recommends that you specify the time over which deleted data is saved.

If you specify both History parameters, HistoryRetentionTime takes precedence. Setting both parameters to -1, preserves all historical data.

See Setting a purge policy.

Default: -1 (disabled)

2.3.14 - Monitoring parameters

The following table describes parameters that control options for monitoring the Vertica database.

The following table describes parameters that control options for monitoring the Vertica database.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.
Parameters Description
EnableDataCollector

Enables and disables the Data Collector, which is the Workload Analyzer's internal diagnostics utility. Affects all sessions on all nodes. Use 0 to turn off data collection.

Default: 1 (enabled)

SnmpTrapDestinationsList

Defines where Vertica sends traps for SNMP. See Configuring reporting for SNMP. For example:

=> ALTER DATABASE DEFAULT SET SnmpTrapDestinationsList = 'localhost 162 public';

Default: none

SnmpTrapsEnabled

Enables event trapping for SNMP. See Configuring reporting for SNMP.

Default: 0

SnmpTrapEvents

Define which events Vertica traps through SNMP. See Configuring reporting for SNMP. For example:

ALTER DATABASE DEFAULT SET SnmpTrapEvents = 'Low Disk Space, Recovery Failure';

Default: Low Disk Space, Read Only File System, Loss of K Safety, Current Fault Tolerance at Critical Level, Too Many ROS Containers, Node State Change, Recovery Failure, Stale Checkpoint, and CRC Mismatch.

SyslogEnabled

Enables event trapping for syslog. See Configuring reporting for syslog.

Default: 0

SyslogEvents

Defines events that generate a syslog entry. See Configuring reporting for syslog. For example:

ALTER DATABASE DEFAULT SET SyslogEvents = 'Low Disk Space, Recovery Failure';

Default: none

SyslogFacility

Defines which SyslogFacility Vertica uses. See Configuring reporting for syslog.

Default: user

2.3.15 - Profiling parameters

The following table describes the profiling parameters for configuring Vertica.

The following table describes the profiling parameters for configuring Vertica. See Profiling database performance for more information on profiling queries.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameters Description
GlobalEEProfiling

Enables profiling for query execution runs in all sessions on all nodes.

Default: 0

GlobalQueryProfiling

Enables query profiling for all sessions on all nodes.

Default: 0

GlobalSessionProfiling

Enables session profiling for all sessions on all nodes.

Default: 0

SaveDCEEProfileThresholdUS

Sets in microseconds the query duration threshold for saving profiling information to system tables QUERY_CONSUMPTION and EXECUTION_ENGINE_PROFILES. You can set this parameter to a maximum value of 2147483647 (231-1, or ~35.79 minutes).

Default: 60000000 (60 seconds)

2.3.16 - Database Designer parameters

The following table describes the parameters for configuring the Vertica Database Designer.

The following table describes the parameters for configuring the Vertica Database Designer.

Parameter Description
DBDCorrelationSampleRowCount

Minimum number of table rows at which Database Designer discovers and records correlated columns.

Default: 4000

DBDLogInternalDesignProcess

Enables or disables Database Designer logging.

Default: 0 (False)

DBDUseOnlyDesignerResourcePool

Enables use of the DBD pool by the Vertica Database Designer.

When set to false, design processing is mostly contained by the user's resource pool, but might spill over into some system resource pools for less-intensive tasks

Default: 0 (False)

2.3.17 - Internationalization parameters

The following table describes internationalization parameters for configuring Vertica.

The following table describes internationalization parameters for configuring Vertica.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameters Description
DefaultIntervalStyle

Sets the default interval style to use. If set to 0 (default), the interval is in PLAIN style (the SQL standard), no interval units on output. If set to 1, the interval is in UNITS on output. This parameter does not take effect until the database is restarted.

Default: 0

DefaultSessionLocale

Sets the default session startup locale for the database. This parameter does not take effect until the database is restarted.

Default: en_US@collation=binary

EscapeStringWarning

Issues a warning when backslashes are used in a string literal. This can help locate backslashes that are being treated as escape characters so they can be fixed to follow the SQL standard-conforming string syntax instead.

Default: 1

StandardConformingStrings

Determines whether character string literals treat backslashes () as string literals or escape characters. When set to -1, backslashes are treated as string literals; when set to 0, backslashes are treated as escape characters.

Default: -1

2.3.18 - Text search parameters

You can configure Vertica for text search using the following parameter.

You can configure Vertica for text search using the following parameter.

Parameters Description
TextIndexMaxTokenLength

Controls the maximum size of a token in a text index.

For example:

ALTER DATABASE database_name SET PARAMETER TextIndexMaxTokenLength=760;

If the parameter is set to a value greater than 65000 characters, then the tokenizer truncates the token at 65000 characters.

Default: 128 (characters)

2.3.19 - Kafka user-defined session parameters

Use the following Vertica use-defined session parameters to configure Kafka SSL when not using a scheduler.

Use the following Vertica use-defined session parameters to configure Kafka SSL when not using a scheduler. The kafka_ parameters configure SSL authentication for Kafka. Refer to TLS/SSL encryption with Kafka for more information.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameter Description
kafka_SSL_CA

The contents of the certificate authority certificate. For example:

=> ALTER SESSION SET UDPARAMETER kafka_SSL_CA='MIIBOQIBAAJBAIOL';

Default: none

kafka_SSL_Certificate

The contents of the SSL certificate. For example:

=> ALTER SESSION SET UDPARAMETER kafka_SSL_Certificate='XrM07O4dV/nJ5g';

This parameter is optional when the Kafka server's parameter ssl.client.auth is set to none or requested.

Default: none

kafka_SSL_PrivateKey_secret

The private key used to encrypt the session. Vertica does not log this information. For example:

=> ALTER SESSION SET UDPARAMETER kafka_SSL_PrivateKey_secret='A60iThKtezaCk7F';

This parameter is optional when the Kafka server's parameter ssl.client.auth is set to none or requested.

Default: none

kafka_SSL_PrivateKeyPassword_secret

The password used to create the private key. Vertica does not log this information.

For example:

ALTER SESSION SET UDPARAMETER kafka_SSL_PrivateKeyPassword_secret='secret';

This parameter is optional when the Kafka server's parameter ssl.client.auth is set to none or requested.

Default: none

kafka_Enable_SSL

Enables SSL authentication for Vertica-Kafka integration. For example:

=> ALTER SESSION SET UDPARAMETER kafka_Enable_SSL=1;

Default: 0

MaxSessionUDParameterSize

Sets the maximum length of a value in a user-defined session parameter. For example:

=> ALTER SESSION SET MaxSessionUDParameterSize = 2000

Default: 1000

User-defined session parameters

2.3.20 - Constraint parameters

The following configuration parameters control how Vertica evaluates and enforces constraints.

The following configuration parameters control how Vertica evaluates and enforces constraints. All parameters are set at the database level through ALTER DATABASE.

Three of these parameters—EnableNewCheckConstraintsByDefault, EnableNewPrimaryKeysByDefault, and EnableNewUniqueKeysByDefault—can be used to enforce CHECK, PRIMARY KEY, and UNIQUE constraints, respectively. For details, see Constraint enforcement.

Parameters Description
EnableNewCheckConstraintsByDefault

Boolean parameter, set to 0 or 1:

  • 0: Disable enforcement of new CHECK constraints except where the table DDL explicitly enables them.

  • 1 (default): Enforce new CHECK constraints except where the table DDL explicitly disables them.

EnableNewPrimaryKeysByDefault

Boolean parameter, set to 0 or 1:

  • 0 (default): Disable enforcement of new PRIMARY KEY constraints except where the table DDL explicitly enables them.

  • 1: Enforce new PRIMARY KEY constraints except where the table DDL explicitly disables them.

EnableNewUniqueKeysByDefault

Boolean parameter, set to 0 or 1:

  • 0 (default): Disable enforcement of new UNIQUE constraints except where the table DDL explicitly enables them.

  • 1: Enforce new UNIQUE constraints except where the table DDL explicitly disables them.

MaxConstraintChecksPerQuery

Sets the maximum number of constraints that ANALYZE_CONSTRAINTS can handle with a single query:

  • -1 (default): No maximum set, ANALYZE_CONSTRAINTS uses a single query to evaluate all constraints within the specified scope.

  • Integer > 0: The maximum number of constraints per query. If the number of constraints to evaluate exceeds this value, ANALYZE_CONSTRAINTS handles it with multiple queries.

For details, see Distributing Constraint Analysis.

2.3.21 - Numeric precision parameters

The following configuration parameters let you configure numeric precision for numeric data types.

The following configuration parameters let you configure numeric precision for numeric data types. For more about using these parameters, seeNumeric data type overflow with SUM, SUM_FLOAT, and AVG.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameters Description
AllowNumericOverflow

Boolean, set to one of the following:

  • 1 (true): Allows silent numeric overflow. Vertica does not implicitly extend precision of numeric data types. Vertica ignores the value of NumericSumExtraPrecisionDigits.

  • 0 (false): Vertica produces an overflow error, if a result exceeds the precision set by NumericSumExtraPrecisionDigits.

Default: 1 (true)

NumericSumExtraPrecisionDigits

An integer between 0 and 20, inclusive. Vertica produces an overflow error if a result exceeds the specified precision. This parameter setting only applies if AllowNumericOverflow is set to 0 (false).

Default: 6 (places beyond the DDL-specified precision)

2.3.22 - Memory management parameters

The following table describes parameters for managing Vertica memory usage.

The following table describes parameters for managing Vertica memory usage.

Query the CONFIGURATION_PARAMETERS system table to determine what levels (node, session, user, database) are valid for a given parameter.

Parameters Description
MemoryPollerIntervalSec

Specifies in seconds how often the Vertica memory poller checks whether Vertica memory usage is below the thresholds of several configuration parameters (see below):

  • MemoryPollerMallocBloatThreshold

  • MemoryPollerReportThreshold

  • MemoryPollerTrimThreshold

Default: 2

MemoryPollerMallocBloatThreshold

Threshold of glibc memory bloat.

The memory poller calls glibc function malloc_info() to obtain the amount of free memory in malloc. It then compares MemoryPollerMallocBloatThreshold—by default, set to 0.3—with the following expression:

free-memory-in-malloc / RSS

If this expression evaluates to a value higher than MemoryPollerMallocBloatThreshold, the memory poller calls glibc function malloc_trim(). This function reclaims free memory from malloc and returns it to the operating system. Details on calls to malloc_trim() are written to system table MEMORY_EVENTS.

To disable polling of this threshold, set the parameter to 0.

Default: 0.3

MemoryPollerReportThreshold

Threshold of memory usage that determines whether the Vertica memory poller writes a report.

The memory poller compares MemoryPollerReportThreshold with the following expression:

RSS / available-memory

When this expression evaluates to a value higher than MemoryPollerReportThreshold—by default, set to 0.93, then the memory poller writes a report to MemoryReport.log, in the Vertica working directory. This report includes information about Vertica memory pools, how much memory is consumed by individual queries and session, and so on. The memory poller also logs the report as an event in system table MEMORY_EVENTS, where it sets EVENT_TYPE to MEMORY_REPORT.

To disable polling of this threshold, set the parameter to 0.

Default: 0.93

MemoryPollerTrimThreshold

Threshold for the memory poller to start checking whether to trim glibc-allocated memory.

The memory poller compares MemoryPollerTrimThreshold—by default, set to 0.83— with the following expression:

RSS / available-memory

If this expression evaluates to a value higher than MemoryPollerTrimThreshold, then the memory poller starts checking the next threshold—set in MemoryPollerMallocBloatThreshold—for glibc memory bloat.

To disable polling of this threshold, set the parameter to 0. Doing so also disables polling of MemoryPollerMallocBloatThreshold.

Default: 0.83

3 - Designing a logical schema

Designing a logical schema for a Vertica database is the same as designing for any other SQL database.

Designing a logical schema for a Vertica database is the same as designing for any other SQL database. A logical schema consists of objects such as schemas, tables, views and referential Integrity constraints that are visible to SQL users. Vertica supports any relational schema design that you choose.

3.1 - Using multiple schemas

Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database.

Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database. In many cases, however, it makes sense to use additional schemas to allow users and their applications to create and access tables in separate namespaces. For example, using additional schemas allows:

  • Many users to access the database without interfering with one another.

    Individual schemas can be configured to grant specific users access to the schema and its tables while restricting others.

  • Third-party applications to create tables that have the same name in different schemas, preventing table collisions.

Unlike other RDBMS, a schema in a Vertica database is not a collection of objects bound to one user.

3.1.1 - Multiple schema examples

This section provides examples of when and how you might want to use multiple schemas to separate database users.

This section provides examples of when and how you might want to use multiple schemas to separate database users. These examples fall into two categories: using multiple private schemas and using a combination of private schemas (i.e. schemas limited to a single user) and shared schemas (i.e. schemas shared across multiple users).

Using multiple private schemas

Using multiple private schemas is an effective way of separating database users from one another when sensitive information is involved. Typically a user is granted access to only one schema and its contents, thus providing database security at the schema level. Database users can be running different applications, multiple copies of the same application, or even multiple instances of the same application. This enables you to consolidate applications on one database to reduce management overhead and use resources more effectively. The following examples highlight using multiple private schemas.

Using multiple schemas to separate users and their unique applications

In this example, both database users work for the same company. One user (HRUser) uses a Human Resource (HR) application with access to sensitive personal data, such as salaries, while another user (MedUser) accesses information regarding company healthcare costs through a healthcare management application. HRUser should not be able to access company healthcare cost information and MedUser should not be able to view personal employee data.

To grant these users access to data they need while restricting them from data they should not see, two schemas are created with appropriate user access, as follows:

  • HRSchema—A schema owned by HRUser that is accessed by the HR application.

  • HealthSchema—A schema owned by MedUser that is accessed by the healthcare management application.

Using multiple schemas to support multitenancy

This example is similar to the last example in that access to sensitive data is limited by separating users into different schemas. In this case, however, each user is using a virtual instance of the same application.

An example of this is a retail marketing analytics company that provides data and software as a service (SaaS) to large retailers to help them determine which promotional methods they use are most effective at driving customer sales.

In this example, each database user equates to a retailer, and each user only has access to its own schema. The retail marketing analytics company provides a virtual instance of the same application to each retail customer, and each instance points to the user’s specific schema in which to create and update tables. The tables in these schemas use the same names because they are created by instances of the same application, but they do not conflict because they are in separate schemas.

Example of schemas in this database could be:

  • MartSchema—A schema owned by MartUser, a large department store chain.

  • PharmSchema—A schema owned by PharmUser, a large drug store chain.

Using multiple schemas to migrate to a newer version of an application

Using multiple schemas is an effective way of migrating to a new version of a software application. In this case, a new schema is created to support the new version of the software, and the old schema is kept as long as necessary to support the original version of the software. This is called a “rolling application upgrade.”

For example, a company might use a HR application to store employee data. The following schemas could be used for the original and updated versions of the software:

  • HRSchema—A schema owned by HRUser, the schema user for the original HR application.

  • V2HRSchema—A schema owned by V2HRUser, the schema user for the new version of the HR application.

Combining private and shared schemas

The previous examples illustrate cases in which all schemas in the database are private and no information is shared between users. However, users might want to share common data. In the retail case, for example, MartUser and PharmUser might want to compare their per store sales of a particular product against the industry per store sales average. Since this information is an industry average and is not specific to any retail chain, it can be placed in a schema on which both users are granted USAGE privileges.

Example of schemas in this database might be:

  • MartSchema—A schema owned by MartUser, a large department store chain.

  • PharmSchema—A schema owned by PharmUser, a large drug store chain.

  • IndustrySchema—A schema owned by DBUser (from the retail marketing analytics company) on which both MartUser and PharmUser have USAGE privileges. It is unlikely that retailers would be given any privileges beyond USAGE on the schema and SELECT on one or more of its tables.

3.1.2 - Creating schemas

You can create as many schemas as necessary for your database.

You can create as many schemas as necessary for your database. For example, you could create a schema for each database user. However, schemas and users are not synonymous as they are in Oracle.

By default, only a superuser can create a schema or give a user the right to create a schema. (See GRANT (database) in the SQL Reference Manual.)

To create a schema use the CREATE SCHEMA statement, as described in the SQL Reference Manual.

3.1.3 - Specifying objects in multiple schemas

Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing.

Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing. You can specify an object within multiple schemas by:

  • Qualifying the object name by using the schema name and object name separated by a dot. For example, to specify MyTable, located in Schema1, qualify the name as Schema1.MyTable.

  • Using a search path that includes the desired schemas when a referenced object is unqualified. By Setting search paths, Vertica will automatically search the specified schemas to find the object.

3.1.4 - Setting search paths

Each user session has a search path of schemas.

Each user session has a search path of schemas. Vertica uses this search path to find tables and user-defined functions (UDFs) that are unqualified by their schema name. A session search path is initially set from the user's profile. You can change the session's search path at any time by calling SET SEARCH_PATH. This search path remains in effect until the next SET SEARCH_PATH statement, or the session ends.

Viewing the current search path

SHOW SEARCH_PATH returns the session's current search path. For example:


=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal

Schemas are listed in descending order of precedence. The first schema has the highest precedence in the search order. If this schema exists, it is also defined as the current schema, which is used for tables that are created with unqualified names. You can identify the current schema by calling the function CURRENT_SCHEMA:

=> SELECT CURRENT_SCHEMA;
 current_schema
----------------
 public
(1 row)

Setting the user search path

A session search path is initially set from the user's profile. If the search path in a user profile is not set by CREATE USER or ALTER USER, it is set to the database default:

=> CREATE USER agent007;
CREATE USER
=> \c - agent007
You are now connected as user "agent007".
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal

$user resolves to the session user name—in this case, agent007—and has the highest precedence. If a schema agent007, exists, Vertica begins searches for unqualified tables in that schema. Also, calls to CURRENT_SCHEMA return this schema. Otherwise, Vertica uses public as the current schema and begins searches in it.

Use ALTER USER to modify an existing user's search path. These changes overwrite all non-system schemas in the search path, including $USER. System schemas are untouched. Changes to a user's search path take effect only when the user starts a new session; current sessions are unaffected.

For example, the following statements modify agent007's search path, and grant access privileges to schemas and tables that are on the new search path:

=> ALTER USER agent007 SEARCH_PATH store, public;
ALTER USER
=> GRANT ALL ON SCHEMA store, public TO agent007;
GRANT PRIVILEGE
=> GRANT SELECT ON ALL TABLES IN SCHEMA store, public TO agent007;
GRANT PRIVILEGE
=> \c - agent007
You are now connected as user "agent007".
=> SHOW SEARCH_PATH;
    name     |                     setting
-------------+-------------------------------------------------
 search_path | store, public, v_catalog, v_monitor, v_internal
(1 row)

To verify a user's search path, query the system table USERS:

=> SELECT search_path FROM USERS WHERE user_name='agent007';
                   search_path
-------------------------------------------------
 store, public, v_catalog, v_monitor, v_internal
(1 row)

To revert a user's search path to the database default settings, call ALTER USER and set the search path to DEFAULT. For example:


=> ALTER USER agent007 SEARCH_PATH DEFAULT;
ALTER USER
=> SELECT search_path FROM USERS WHERE user_name='agent007';
                    search_path
---------------------------------------------------
 "$user", public, v_catalog, v_monitor, v_internal
(1 row)

Ignored search path schemas

Vertica only searches among existing schemas to which the current user has access privileges. If a schema in the search path does not exist or the user lacks access privileges to it, Vertica silently excludes it from the search. For example, if agent007 lacks SELECT privileges to schema public, Vertica silently skips this schema. Vertica returns with an error only if it cannot find the table anywhere on the search path.

Setting session search path

Vertica initially sets a session's search path from the user's profile. You can change the current session's search path with SET SEARCH_PATH. You can use SET SEARCH_PATH in two ways:

  • Explicitly set the session search path to one or more schemas. For example:

    
    => \c - agent007
    You are now connected as user "agent007".
    dbadmin=> SHOW SEARCH_PATH;
        name     |                      setting
    -------------+---------------------------------------------------
     search_path | "$user", public, v_catalog, v_monitor, v_internal
    (1 row)
    
    => SET SEARCH_PATH TO store, public;
    SET
    => SHOW SEARCH_PATH;
        name     |                     setting
    -------------+-------------------------------------------------
     search_path | store, public, v_catalog, v_monitor, v_internal
    (1 row)
    
  • Set the session search path to the database default:

    
    => SET SEARCH_PATH TO DEFAULT;
    SET
    => SHOW SEARCH_PATH;
        name     |                      setting
    -------------+---------------------------------------------------
     search_path | "$user", public, v_catalog, v_monitor, v_internal
    (1 row)
    

SET SEARCH_PATH overwrites all non-system schemas in the search path, including $USER. System schemas are untouched.

3.1.5 - Creating objects that span multiple schemas

Vertica supports that reference tables across multiple schemas.

Vertica supports views that reference tables across multiple schemas. For example, a user might need to compare employee salaries to industry averages. In this case, the application queries two schemas:

  • Shared schema IndustrySchema for salary averages

  • Private schema HRSchema for company-specific salary information

Best Practice: When creating objects that span schemas, use qualified table names. This naming convention avoids confusion if the query path or table structure within the schemas changes at a later date.

3.2 - Tables in schemas

In Vertica you can create persistent and temporary tables, through CREATE TABLE and CREATE TEMPORARY TABLE, respectively.

In Vertica you can create persistent and temporary tables, through CREATE TABLE and CREATE TEMPORARY TABLE, respectively.

For detailed information on both types, see Creating Tables and Creating temporary tables.

Persistent tables

CREATE TABLE creates a table in the Vertica logical schema. For example:

CREATE TABLE vendor_dimension (
   vendor_key        INTEGER      NOT NULL PRIMARY KEY,
   vendor_name       VARCHAR(64),
   vendor_address    VARCHAR(64),
   vendor_city       VARCHAR(64),
   vendor_state      CHAR(2),
   vendor_region     VARCHAR(32),
   deal_size         INTEGER,
   last_deal_update  DATE

);

For detailed information, see Creating Tables.

Temporary tables

CREATE TEMPORARY TABLE creates a table whose data persists only during the current session. Temporary table data is never visible to other sessions.

Temporary tables can be used to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated—for example, the tool first gets a result set, then queries the result set, and so on.

CREATE TEMPORARY TABLE can create tables at two scopes, global and local, through the keywords GLOBAL and LOCAL, respectively:

  • GLOBAL (default): The table definition is visible to all sessions. However, table data is session-scoped.

  • LOCAL: The table definition is visible only to the session in which it is created. When the session ends, Vertica automatically drops the table.

For detailed information, see Creating temporary tables.

4 - Creating a database design

A is a physical storage plan that optimizes query performance.

A design is a physical storage plan that optimizes query performance. Data in Vertica is physically stored in projections. When you initially load data into a table using INSERT, COPY (or COPY LOCAL), Vertica creates a default superprojection for the table. This superprojection ensures that all of the data is available for queries. However, these superprojections might not optimize database performance, resulting in slow query performance and low data compression.

To improve performance, create a design for your Vertica database that optimizes query performance and data compression. You can create a design in several ways:

Database Designer can help you minimize how much time you spend on manual database tuning. You can also use Database Designer to redesign the database incrementally as requirements such as workloads change over time.

Database Designer runs as a background process. This is useful if you have a large design that you want to run overnight. An active SSH session is not required, so design and deploy operations continue to run uninterrupted if the session ends.

4.1 - About Database Designer

Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.

Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.

During the design process, Database Designer analyzes the logical schema definition, sample data, and sample queries, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. This script creates a minimal set of superprojections to ensure K-safety.

In most cases, the projections that Database Designer creates provide excellent query performance within physical constraints while using disk space efficiently.

General design options

When you run Database Designer, several general options are available:

  • Create a comprehensive or incremental design.

  • Optimize for query execution, load, or a balance of both.

  • Require K-safety.

  • Recommend unsegmented projections when feasible.

  • Analyze statistics before creating the design.

Design input

Database Designer bases its design on the following information that you provide:

  • Design queries that you typically run during normal database operations.

  • Design tables that contain sample data.

Output

Database Designer yields the following output:

  • A design script that creates the projections for the design in a way that meets the optimization objectives and distributes data uniformly across the cluster.

  • A deployment script that creates and refreshes the projections for your design. For comprehensive designs, the deployment script contains commands that remove non-optimized projections. The deployment script includes the full design script.

  • A backup script that contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design.

Design restrictions

Database Designer-generated designs:

  • Exclude live aggregate or Top-K projections. You must create these manually. See CREATE PROJECTION.

  • Do not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR columns.

Post-design options

While running Database Designer, you can choose to deploy your design automatically after the deployment script is created, or to deploy it manually, after you have reviewed and tested the design. Vertica recommends that you test the design on a non-production server before deploying the design to your production server.

4.2 - How Database Designer creates a design

Database Designer-generated designs can include the following recommendations:.

Design recommendations

Database Designer-generated designs can include the following recommendations:

  • Sort buddy projections in the same order, which can significantly improve load, recovery, and site node performance. All buddy projections have the same base name so that they can be identified as a group.

  • Accepts unlimited queries for a comprehensive design.

  • Identifies similar design queries and assigns them a signature.

    For queries with the same signature, Database Designer weights the queries, depending on how many queries have that signature. It then considers the weighted query when creating a design.

  • Recommends and creates projections in a way that minimizes data skew by distributing data uniformly across the cluster.

  • Produces higher quality designs by considering UPDATE, DELETE, and SELECT statements.

4.3 - Who can run Database Designer

Two types of users can use Database Designer to create an optimal database design.

Two types of users can use Database Designer to create an optimal database design. DBADMIN users, and users with the DBDUSER role. The topics in this section describe how DBDUSERs can use Database Designer.

4.3.1 - Granting and enabling the DBDUSER role

For a non-DBADMIN user to be able to run Database Designer using Management Console, follow the steps described in Allowing the DBDUSER to Run Database Designer Using Management Console.

For a non-DBADMIN user to be able to run Database Designer using Management Console, follow the steps described in Allowing the DBDUSER to run Database Designer using Management Console.

For a non-DBADMIN user to be able to run Database Designer programmatically, following the steps described in Allowing the DBDUSER to run Database Designer programmatically.

4.3.1.1 - Allowing the DBDUSER to run Database Designer using Management Console

To allow a user with the DBDUSER role to run Database Designer using Management Console, you must create the user on the Vertica server.

To allow a user with the DBDUSER role to run Database Designer using Management Console, you must create the user on the Vertica server.

As DBADMIN, take these steps on the server:

  1. Add a temporary folder to all cluster nodes.

    => CREATE LOCATION '/tmp/dbd' ALL NODES;
    
  2. Create the user who needs access to Database Designer.

    => CREATE USER new_user;
    
  3. Grant the user the privilege to create schemas on the database for which they want to create a design.

    => GRANT CREATE ON DATABASE new_database TO new_user;
    
  4. Grant the DBDUSER role to the new user.

    => GRANT DBDUSER TO new_user;
    
  5. On all nodes in the cluster, grant the user access to the temporary folder.

    => GRANT ALL ON LOCATION '/tmp/dbd' TO new_user;
    
  6. Grant the new user access to the database schema and its tables.

    => GRANT ALL ON SCHEMA user_schema TO new_user;
    => GRANT ALL ON ALL TABLES IN SCHEMA user_schema TO new_user;
    

After you have completed this task, map the MC user to new_user:

  1. Log in to Management Console as an MC Super user.

  2. Click MC Settings.

  3. Click User Management.

  4. To create a new MC user, click Add.To use an existing MC user, select the user and click Edit.

  5. Next to the DB access level window, click Add.

  6. In the Add Permissions window, do the following:

    1. From the Choose a database drop-down list, select the database for which you want the user to be able to create a design.

    2. In the Database username field, enter the user name you created on the Vertica server, new_user in this example.

    3. In the Database password field, enter the database password.

    4. In the Restrict access drop-down list, select the level of MC user you want for this user.

  7. Click OK to save your changes.

  8. Log out of the MC Super user account.

The MC user is now mapped to the user that you created on the Vertica server. Log in as the MC user and use Database Designer to create an optimized design for your database.

For more information about MC users, see Users in Management Console.

4.3.1.2 - Allowing the DBDUSER to run Database Designer programmatically

To allow a user with the DBDUSER role to run Database Designer programmatically, take these steps:.

To allow a user with the DBDUSER role to run Database Designer programmatically, take these steps:

  1. The DBADMIN user must grant the DBDUSER role:

    => GRANT DBDUSER TO <username>;
    

    This role persists until the DBADMIN user revokes it.

  2. For a non-DBADMIN user to run the Database Designer programmatically or using Management Console, one of the following two steps must happen first:

    • If the user's default role is already DBDUSER, skip this step. Otherwise, The user must enable the DBDUSER role:

      => SET ROLE DBDUSER;
      
    • The DBADMIN must add DBDUSER as the default role for that user:

      => ALTER USER  DEFAULT ROLE DBDUSER;
      

4.3.2 - DBDUSER capabilities and limitations

The DBDUSER role has the following capabilities and limitations:.

The DBDUSER role has the following capabilities and limitations:

  • A DBDUSER cannot create a design with a K-safety less than the system K-safety. If the designs violate the current K-safety by not having enough buddy projections for the tables, the design does not complete.

  • A DBDUSER cannot explicitly change the ancient history mark (AHM), even during deployment of their design.

When you create a design, you automatically have privileges to manipulate that design. Other tasks may require that the DBDUSER have additional privileges:

To... DBDUSER must have...
Submit design tables
  • USAGE privilege on the design table schema

  • OWNER privilege on the design table

Submit a single design query
  • Privilege to execute the design query
Submit a file of design queries
  • Read privilege on the storage location that contains the query file

  • Privilege to execute all the queries in the file

Submit design queries from the result of a user query
  • Privilege to execute the user query

  • Privilege to execute each design query retrieved from the results of the user query

Create the design and deployment scripts
  • WRITE privilege on the storage location of the design script

  • WRITE privilege on the storage location of the deployment script

4.4 - Workflow for running Database Designer

Vertica provides three ways to run Database Designer:.

Vertica provides three ways to run Database Designer:

The following workflow is common to all these ways to run Database Designer:

4.5 - Logging projection data for Database Designer

When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify.

When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify. When you deploy the design, Database Designer creates the design based on these projections. However, space or budget constraints may prevent Database Designer from creating all the proposed projections. In addition, Database Designer may not be able to implement the projections using ideal criteria.

To get information about the projections, first enable the Database Designer logging capability. When enabled, Database Designer stores information about the proposed projections in two Data Collector tables. After Database Designer deploys the design, these logs contain information about which proposed projections were actually created. After deployment, the logs contain information about:

  • Projections that the Optimizer proposed

  • Projections that Database Designer actually created when the design was deployed

  • Projections that Database Designer created, but not with the ideal criteria that the Optimizer identified.

  • The DDL used to create all the projections

  • Column optimizations

If you do not deploy the design immediately, review the log to determine if you want to make any changes. If the design has been deployed, you can still manually create some of the projections that Database Designer did not create.

To enable the Database Designer logging capability, see Enabling logging for Database Designer.

To view the logged information, see Viewing Database Designer logs.

4.5.1 - Enabling logging for Database Designer

By default, Database Designer does not log information about the projections that the Optimizer proposed and the Database Designer deploys.

By default, Database Designer does not log information about the projections that the Optimizer proposed and the Database Designer deploys.

To enable Database Designer logging, enter the following command:

=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 1;

To disable Database Designer logging, enter the following command:

=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 0;

See also

4.5.2 - Viewing Database Designer logs

You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:.

You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:

  • DC_DESIGN_PROJECTION_CANDIDATES

  • DC_DESIGN_QUERY_PROJECTION_CANDIDATES

DC_DESIGN_PROJECTION_CANDIDATES

The DC_DESIGN_PROJECTION_CANDIDATES table contains information about all the projections that the Optimizer proposed. This table also includes the DDL that creates them. The is_a_winner field indicates if that projection was part of the actual deployed design. To view the DC_DESIGN_PROJECTION_CANDIDATES table, enter:

=> SELECT *  FROM DC_DESIGN_PROJECTION_CANDIDATES;

DC_DESIGN_QUERY_PROJECTION_CANDIDATES

The DC_DESIGN_QUERY_PROJECTION_CANDIDATES table lists plan features for all design queries.

Possible features are:

  • FULLY DISTRIBUTED JOIN

  • MERGE JOIN

  • GROUPBY PIPE

  • FULLY DISTRIBUTED GROUPBY

  • RLE PREDICATE

  • VALUE INDEX PREDICATE

  • LATE MATERIALIZATION

For all design queries, the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table includes the following plan feature information:

  • Optimizer path cost.

  • Database Designer benefits.

  • Ideal plan feature and its description, which identifies how the referenced projection should be optimized.

  • If the design was deployed, the actual plan feature and its description is included in the table. This information identifies how the referenced projection was actually optimized.

Because most projections have multiple optimizations, each projection usually has multiple rows.To view the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table, enter:

=> SELECT *  FROM DC_DESIGN_QUERY_PROJECTION_CANDIDATES;

To see example data from these tables, see Database Designer logs: example data.

4.5.3 - Database Designer logs: example data

In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database.

In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database. The output shows two records from the DC_DESIGN_PROJECTION_CANDIDATES table.

The first record contains information about the customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$ projection. The record includes the CREATE PROJECTION statement that Database Designer used to create the projection. The is_a_winner column is t, indicating that Database Designer created this projection when it deployed the design.

The second record contains information about the product_dimension_dbd_2_sort_$product_version$__$product_key$ projection. For this projection, the is_a_winner column is f. The Optimizer recommended that Database Designer create this projection as part of the design. However, Database Designer did not create the projection when it deployed the design. The log includes the DDL for the CREATE PROJECTION statement. If you want to add the projection manually, you can use that DDL. For more information, see Creating a design manually.

=> SELECT * FROM dc_design_projection_candidates;
-[ RECORD 1 ]--------+---------------------------------------------------------------
time                 | 2014-04-11 06:30:17.918764-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720620
projection_id        | 45035996273726626
iteration_number     | 1
projection_name      | customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."customer_dimension_dbd_1_
            sort_$customer_gender$__$annual_income$"
(
customer_key ENCODING AUTO,
customer_type ENCODING AUTO,
customer_name ENCODING AUTO,
customer_gender ENCODING RLE,
title ENCODING AUTO,
household_id ENCODING AUTO,
customer_address ENCODING AUTO,
customer_city ENCODING AUTO,
customer_state ENCODING AUTO,
customer_region ENCODING AUTO,
marital_status ENCODING AUTO,
customer_age ENCODING AUTO,
number_of_children ENCODING AUTO,
annual_income ENCODING AUTO,
occupation ENCODING AUTO,
largest_bill_amount ENCODING AUTO,
store_membership_card ENCODING AUTO,
customer_since ENCODING AUTO,
deal_stage ENCODING AUTO,
deal_size ENCODING AUTO,
last_deal_update ENCODING AUTO
)
AS
SELECT customer_key,
customer_type,
customer_name,
customer_gender,
title,
household_id,
customer_address,
customer_city,
customer_state,
customer_region,
marital_status,
customer_age,
number_of_children,
annual_income,
occupation,
largest_bill_amount,
store_membership_card,
customer_since,
deal_stage,
deal_size,
last_deal_update
FROM public.customer_dimension
ORDER BY customer_gender,
annual_income
UNSEGMENTED ALL NODES;
is_a_winner          | t
-[ RECORD 2 ]--------+-------------------------------------------------------------
time                 | 2014-04-11 06:30:17.961324-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720624
projection_id        | 45035996273726714
iteration_number     | 1
projection_name      | product_dimension_dbd_2_sort_$product_version$__$product_key$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."product_dimension_dbd_2_
        sort_$product_version$__$product_key$"
(
product_key ENCODING AUTO,
product_version ENCODING RLE,
product_description ENCODING AUTO,
sku_number ENCODING AUTO,
category_description ENCODING AUTO,
department_description ENCODING AUTO,
package_type_description ENCODING AUTO,
package_size ENCODING AUTO,
fat_content ENCODING AUTO,
diet_type ENCODING AUTO,
weight ENCODING AUTO,
weight_units_of_measure ENCODING AUTO,
shelf_width ENCODING AUTO,
shelf_height ENCODING AUTO,
shelf_depth ENCODING AUTO,
product_price ENCODING AUTO,
product_cost ENCODING AUTO,
lowest_competitor_price ENCODING AUTO,
highest_competitor_price ENCODING AUTO,
average_competitor_price ENCODING AUTO,
discontinued_flag ENCODING AUTO
)
AS
SELECT product_key,
product_version,
product_description,
sku_number,
category_description,
department_description,
package_type_description,
package_size,
fat_content,
diet_type,
weight,
weight_units_of_measure,
shelf_width,
shelf_height,
shelf_depth,
product_price,
product_cost,
lowest_competitor_price,
highest_competitor_price,
average_competitor_price,
discontinued_flag
FROM public.product_dimension
ORDER BY product_version,
product_key
UNSEGMENTED ALL NODES;
is_a_winner          | f
.
.
.

The next example shows the contents of two records in the DC_DESIGN_QUERY_PROJECTION_CANDIDATES. Both of these rows apply to projection id 45035996273726626.

In the first record, the Optimizer recommends that Database Designer optimize the customer_gender column for the GROUPBY PIPE algorithm.

In the second record, the Optimizer recommends that Database Designer optimize the public.customer_dimension table for late materialization. Late materialization can improve the performance of joins that might spill to disk.

=> SELECT * FROM dc_design_query_projection_candidates;
-[ RECORD 1 ]-----------------+------------------------------------------------------------
time                           | 2014-04-11 06:30:17.482377-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | GROUP BY PIPE
ideal_plan_feature_description | Group-by pipelined on column(s) customer_gender
dbd_benefits                   | 5
opt_path_cost                  | 211
-[ RECORD 2 ]-----------------+------------------------------------------------------------
time                           | 2014-04-11 06:30:17.48276-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | LATE MATERIALIZATION
ideal_plan_feature_description | Late materialization on table public.customer_dimension
dbd_benefits                   | 4
opt_path_cost                  | 669
.
.
.

You can view the actual plan features that Database Designer implemented for the projections it created. To do so, query the V_INTERNAL.DC_DESIGN_QUERY_PROJECTIONS table:

=> select * from v_internal.dc_design_query_projections;
-[ RECORD 1 ]-------------------+-------------------------------------------------------------
time                            | 2014-04-11 06:31:41.19199-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | RLE PREDICATE
actual_plan_feature_description | RLE on predicate column(s) department_description
dbd_benefits                    | 2
opt_path_cost                   | 141
-[ RECORD 2 ]-------------------+-------------------------------------------------------------
time                            | 2014-04-11 06:31:41.192292-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | GROUP BY PIPE
actual_plan_feature_description | Group-by pipelined on column(s) fat_content
dbd_benefits                    | 5
opt_path_cost                   | 155

4.6 - Specifying parameters for Database Designer

Before you run Database Designer to create a design, provide information that allows Database Designer to create the optimal physical schema:.

Before you run Database Designer to create a design, provide information that allows Database Designer to create the optimal physical schema:

4.6.1 - Design name

All designs that Database Designer creates must have a name that you specify.

All designs that Database Designer creates must have a name that you specify. The design name must be alphanumeric or underscore (_) characters, and can be no more than 32 characters long. (Administrative Tools and Management Console limit the design name to 16 characters.)

The design name becomes part of the files that Database Designer generates, including the deployment script, allowing the files to be easily associated with a particular Database Designer run.

4.6.2 - Design types

The Database Designer can create two distinct design types.

The Database Designer can create two distinct design types. The design you choose depends on what you are trying to accomplish:

4.6.2.1 - Comprehensive design

A comprehensive design creates an initial or replacement design for all the tables in the specified schemas.

A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database.

To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, Vertica creates an unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it.

Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections.

During a comprehensive design, Database Designer creates deployment scripts that:

  • Create new projections to optimize query performance, only when they do not already exist.

  • Create replacement buddy projections when Database Designer changes the encoding of pre-existing projections that it has decided to keep.

4.6.2.2 - Incremental design

After you create and deploy a comprehensive database design, it's likely that your database will change over time in various ways.

After you create and deploy a comprehensive database design, it's likely that your database will change over time in various ways. You should periodically consider using Database Designer to create incremental designs that address these changes. Changes that warrant an incremental design can include:

  • Significant data additions or updates

  • New or modified queries that you run regularly

  • Performance issues with one or more queries

  • Schema changes

4.6.3 - Optimization objectives

When creating a design, Database Designer can optimize the design for one of three objectives:.

When creating a design, Database Designer can optimize the design for one of three objectives:

  • Load: Database Designer creates a design that is optimized for loads, minimizing database size, potentially at the expense of query performance.

  • Performance: Database Designer creates a design that is optimized for fast query performance. Because it recommends a design for optimized query performance, this design might recommend more than the Load or Balanced objectives, potentially resulting in a larger database storage size.

  • Balanced: Database Designer creates a design whose objectives are balanced between database size and query performance.

A fully optimized query has an optimization ratio of 0.99. Optimization ratio is the ratio of a query's benefits achieved in the design produced by the Database Designer to that achieved in the ideal plan. Check the optimization ratio with the OptRatio parameter in designer.log.

4.6.4 - Design tables with sample data

You must specify one or more design tables for Database Designer to deploy a design.

You must specify one or more design tables for Database Designer to deploy a design. If your schema is empty, it does not appear as a design table option.

When you specify design tables, consider the following:

  • To create the most efficient projections for your database, load a moderate amount of representative data into tables before running Database Designer. Database Designer considers the data in this table when creating the design.

  • If your design tables have a large amount if data, the Database Designer run takes a long time; if your tables have too little data, the design is not optimized. Vertica recommends that 10 GB of sample data is sufficient for creating an optimal design.

  • If you submit a design table with no data, Database Designer ignores it.

  • If one of your design tables has been dropped, you will not be able to build or deploy your design.

4.6.5 - Design queries

If you supply representative queries that you run on your database to Database Designer, it optimizes the performance of those queries.

If you supply representative queries that you run on your database to Database Designer, it optimizes the performance of those queries.

Database Designer checks the validity of all queries when you add them to your design and again when it builds the design. If a query is invalid, Database Designer ignores it.

The query file can contain up to 100 queries. Each query can be assigned a weight that indicates its relative importance so that Database Designer can prioritize it when creating the design. Database Designer groups queries that affect the design that Database Designer creates in the same way and considers one weighted query when creating a design.

The following options apply, depending on whether you create an incremental or comprehensive design:

  • Design queries are required for incremental designs.

  • Design queries are optional for comprehensive designs. If you do not provide design queries, Database Designer recommends a generic design that does not consider specific queries.

Query repository

Using Management Console, you can submit design queries from the QUERY_REQUESTS system table. This is called the query repository.

The QUERY_REQUESTS table contains queries that users have run recently. For a comprehensive design, you can submit up to 200 queries from the QUERY_REQUESTS table to Database Designer to be considered when creating the design. For an incremental design, you can submit up to 100 queries from the QUERY_REQUESTS table.

4.6.6 - Replicated and segmented projections

When creating a comprehensive design, Database Designer creates projections based on data statistics and queries.

When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or replicated (duplicated on all cluster nodes).

For detailed information, see the following sections:

4.6.6.1 - Replicated projections

occurs when Vertica stores identical copies of data across all the nodes in your cluster.

Replication occurs when Vertica stores identical copies of data across all the nodes in your cluster.

Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends that a projection be replicated if any of the following conditions is true:

  • largest-row-count < 1,000,000 and number of rows in the table <= 10% of largest-row-count

  • largest-row-count >= 10,000,000 and number of rows in the table <= 1% of largest-row-count

  • The number of rows in the table <= 100,000

For more information about replication, see High availability with projections.

4.6.6.2 - Segmented projections

occurs when Vertica distributes data evenly across multiple database nodes so that all nodes participate in query execution.

Segmentation occurs when Vertica distributes data evenly across multiple database nodes so that all nodes participate in query execution. Projection segmentation provides high availability and recovery, and optimizes query execution.

When running Database Designer programmatically or using Management Console, you can specify to allow Database Designer to recommend unsegmented projections in the design. If you do not specify this, Database Designer recommends only segmented projections.

Database Designer recommends segmented superprojections for large tables when deploying to multiple node clusters, and recommends replicated superprojections for smaller tables.

Database Designer does not segment projections on:

  • Single-node clusters

  • LONG VARCHAR and LONG VARBINARY columns

For more information about segmentation, see High availability with projections.

4.6.7 - Statistics analysis

By default, Database Designer analyzes statistics for the design tables when adding them to the design.

By default, Database Designer analyzes statistics for the design tables when adding them to the design. This option is optional, but Vertica recommends that you analyze statistics because accurate statistics help Database Designer optimize compression and query performance.

Analyzing statistics takes time and resources. If the current statistics for the design tables are up to date, do not bother analyzing the statistics. When in doubt, analyze the statistics to make sure they are current.

For more information, see Collecting Statistics.

4.7 - Building a design

After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.

After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.

When you build a database design, Vertica generates two scripts:

  • Deployment script: design-name_deploy.sql—Contains the SQL statements that create projections for the design you are deploying, deploy the design, and drop unused projections. When the deployment script runs, it creates the optimized design. For details about how to run this script and deploy the design, see Deploying a Design.

  • Design script: design-name_design.sql—Contains the CREATE PROJECTION statements that Database Designeruses to create the design. Review this script to make sure you are happy with the design.

    The design script is a subset of the deployment script. It serves as a backup of the DDL for the projections that the deployment script creates.

When you create a design using Management Console:

  • If you submit a large number of queries to your design and build it right immediately, a timing issue could cause the queries not to load before deployment starts. If this occurs, you might see one of the following errors:

    • No queries to optimize for

    • No tables to design projections for

    To accommodate this timing issue, you may need to reset the design, check the Queries tab to make sure the queries have been loaded, and then rebuild the design. Detailed instructions are in:

  • The scripts are deleted when deployment completes. To save a copy of the deployment script after the design is built but before the deployment completes, go to the Output window and copy and paste the SQL statements to a file.

4.8 - Resetting a design

You must reset a design when:.

You must reset a design when:

  • You build a design and the output scripts described in Building a Design are not created.

  • You build a design but Database Designer cannot complete the design because the queries it expects are not loaded.

Resetting a design discards all the run-specific information of the previous Database Designer build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables and queries.

After you reset a design, review the design to see what changes you need to make. For example, you can fix errors, change parameters, or check for and add additional tables or queries. Then you can rebuild the design.

You can only reset a design in Management Console or by using the DESIGNER_RESET_DESIGN function.

4.9 - Deploying a design

After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.

After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.

Both the design and deployment processes run in the background. This is useful if you have a large design that you want to run overnight. Because an active SSH session is not required, the design/deploy operations continue to run uninterrupted, even if the session is terminated.

Database Designer runs as a background process. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. However, if multiple users are deploying a design on the same tables at the same time, Database Designer may not be able to complete the deployment. To avoid problems, consider the following:

  • Schedule potentially conflicting Database Designer processes to run sequentially overnight so that there are no concurrency problems.

  • Avoid scheduling Database Designer runs on the same set of tables at the same time.

There are two ways to deploy your design:

4.9.1 - Deploying designs using Database Designer

OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.

OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.

If you choose to allow Database Designer to automatically deploy your script during a comprehensive design and are running Administrative Tools, Database Designer creates a backup script of your database's current design. This script helps you re-create the design of projections that may have been dropped by the new design. The backup script is located in the output directory you specified during the design process.

If you choose not to have Database Designer automatically run the deployment script (for example, if you want to maintain projections from a pre-existing deployment), you can manually run the deployment script later. See Deploying designs manually.

To deploy a design while running Database Designer, do one of the following:

  • In Management Console, select the design and click Deploy Design.

  • In the Administration Tools, select Deploy design in the Design Options window.

If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy parameter to 'true'.

Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps that the deployment took:

vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;

4.9.2 - Deploying designs manually

If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:.

If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:

  1. Make sure that the target database contains the same tables and projections as the database where you ran Database Designer. The database should also contain sample data.

  2. To deploy the projections to a test or production environment, execute the deployment script in vsql with the meta-command \i as follows, where design-name is the name of the database design:

    => \i design-name_deploy.sql
    
  3. For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.

  4. If you create projections for tables that already contains data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.

  5. Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.

  6. Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.

  7. Call ANALYZE_STATISTICS on all database projections:

    => SELECT ANALYZE_STATISTICS ('');
    

    This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.

4.10 - How to create a design

There are three ways to create a design using Database Designer:.

There are three ways to create a design using Database Designer:

The following table shows what Database Designer capabilities are available in each tool:

Database Designer Capability Management Console Running Database Designer Programmatically Administrative Tools
Create design Yes Yes Yes
Design name length (# of characters) 16 32 16
Build design (create design and deployment scripts) Yes Yes Yes
Create backup script Yes
Set design type (comprehensive or incremental) Yes Yes Yes
Set optimization objective Yes Yes Yes
Add design tables Yes Yes Yes
Add design queries file Yes Yes Yes
Add single design query Yes
Use query repository Yes Yes
Set K-safety Yes Yes Yes
Analyze statistics Yes Yes Yes
Require all unsegmented projections Yes Yes
View event history Yes Yes
Set correlation analysis mode (Default = 0) Yes

4.10.1 - Using administration tools to create a design

To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user.

To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user. Follow these steps:

  1. Log in as the dbadmin user and start Administration Tools.

  2. From the main menu, start the database for which you want to create a design. The database must be running before you can create a design for it.

  3. On the main menu, select Configuration Menu and click OK.

  4. On the Configuration Menu, select Run Database Designer and click OK.

  5. On the Select a database to design window, enter the name of the database for which you are creating a design and click OK.

  6. On the Enter the directory for Database Designer output window, enter the full path to the directory to contain the design script, deployment script, backup script, and log files, and click OK.

    For information about the scripts, see Building a design.

  7. On the Database Designer window, enter a name for the design and click OK.

    For more information about design names, see Design name.

  8. On the Design Type window, choose which type of design to create and click OK.

    For a description of the design types, see Design types

  9. The Select schema(s) to add to query search path window lists all the schemas in the database that you selected. Select the schemas that contain representative data that you want Database Designer to consider when creating the design and click OK.

    For more information about choosing schema and tables to submit to Database Designer, see Design tables with sample data.

  10. On the Optimization Objectives window, select the objective you want for the database optimization:

    For details about these objectives, see Optimization objectives.

  11. The final window summarizes the choices you have made and offers you two choices:

    • Proceed with building the design, and deploying it if you specified to deploy it immediately. If you did not specify to deploy, you can review the design and deployment scripts and deploy them manually, as described in Deploying designs manually.

    • Cancel the design and go back to change some of the parameters as needed.

  12. Creating a design can take a long time.To cancel a running design from the Administration Tools window, enter Ctrl+C.

To create a design for the VMart example database, see Using Database Designer to create a comprehensive design in Getting Started.

4.11 - Running Database Designer programmatically

Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality.

Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality. Run Database Designer programmatically to perform the following tasks:

  • Optimize performance on tables that you own.

  • Create or update a design without requiring superuser or DBADMIN intervention.

  • Add individual queries and tables, or add data to your design, and then rerun Database Designer to update the design based on this new information.

  • Customize the design.

  • Use recently executed queries to set up your database to run Database Designer automatically on a regular basis.

  • Assign each design query a query weight that indicates the importance of that query in creating the design. Assign a higher weight to queries that you run frequently so that Database Designer prioritizes those queries in creating the design.

For more details about Database Designer functions, see Database Designer function categories.

4.11.1 - Database Designer function categories

Database Designer functions perform the following operations, generally performed in the following order:

  1. Create a design.

  2. Set design properties.

  3. Populate a design.

  4. Create design and deployment scripts.

  5. Get design data.

  6. Clean up.

For detailed information, see Workflow for running Database Designer programmatically. For information on required privileges, see Privileges for running Database Designer functions

Create a design

DESIGNER_CREATE_DESIGN directs Database Designer to create a design.

Set design properties

The following functions let you specify design properties:

DESIGNER_SET_DESIGN_TYPE Specifies whether the design is comprehensive or incremental.
DESIGNER_DESIGN_PROJECTION_ENCODINGS Analyzes encoding in the specified projections and creates a script that implements encoding recommendations.
DESIGNER_SET_DESIGN_KSAFETY Sets the K-safety value for a comprehensive design.
DESIGNER_SET_OPTIMIZATION_OBJECTIVE Specifies whether the design optimizes for query or load performance.
DESIGNER_SET_PROPOSE_UNSEGMENTED_PROJECTIONS Enables inclusion of unsegmented projections in the design.

Populate a design

The following functions let you add tables and queries to your Database Designer design:

DESIGNER_ADD_DESIGN_TABLES Adds the specified tables to a design.
DESIGNER_ADD_DESIGN_QUERY Adds queries to the design and weights them.
DESIGNER_ADD_DESIGN_QUERIES
DESIGNER_ADD_DESIGN_QUERIES_FROM_RESULTS

Create design and deployment scripts

The following functions populate the Database Designer workspace and create design and deployment scripts. You can also analyze statistics, deploy the design automatically, and drop the workspace after the deployment:

DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY Populates the design and creates design and deployment scripts.
DESIGNER_WAIT_FOR_DESIGN Waits for a currently running design to complete.

Reset a design

DESIGNER_RESET_DESIGN discards all the run-specific information of the previous Database Designer build or deployment of the specified design but retains its configuration.

Get design data

The following functions display information about projections and scripts that the Database Designer created:

DESIGNER_OUTPUT_ALL_DESIGN_PROJECTIONS Sends to standard output DDL statements that define design projections.
DESIGNER_OUTPUT_DEPLOYMENT_SCRIPT Sends to standard output a design's deployment script.

Clean up

The following functions cancel any running Database Designer operation or drop a Database Designer design and all its contents:

DESIGNER_CANCEL_POPULATE_DESIGN Cancels population or deployment operation for the specified design if it is currently running.
DESIGNER_DROP_DESIGN Removes the schema associated with the specified design and all its contents.
DESIGNER_DROP_ALL_DESIGNS Removes all Database Designer-related schemas associated with the current user.

4.11.2 - Workflow for running Database Designer programmatically

The following example shows the steps you take to create a design by running Database Designer programmatically.

The following example shows the steps you take to create a design by running Database Designer programmatically.

Before you run this example, you should have the DBDUSER role, and you should have enabled that role using the SET ROLE DBDUSER command:

  1. Create a table in the public schema:

    => CREATE TABLE T(
       x INT,
       y INT,
       z INT,
       u INT,
       v INT,
       w INT PRIMARY KEY
       );
    
  2. Add data to the table:

    \! perl -e 'for ($i=0; $i<100000; ++$i)   {printf("%d, %d, %d, %d, %d, %d\n", $i/10000, $i/100, $i/10, $i/2, $i, $i);}'
       | vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"
    
  3. Create a second table in the public schema:

    => CREATE TABLE T2(
       x INT,
       y INT,
       z INT,
       u INT,
       v INT,
       w INT PRIMARY KEY
       );
    
  4. Copy the data from table T1 to table T2 and commit the changes:

    => INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
    => COMMIT;
    
  5. Create a new design:

    => SELECT DESIGNER_CREATE_DESIGN('my_design');
    

    This command adds information to the DESIGNS system table in the V_MONITOR schema.

  6. Add tables from the public schema to the design :

    => SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
    => SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');
    

    These commands add information to the DESIGN_TABLES system table.

  7. Create a file named queries.txt in /tmp/examples, or another directory where you have READ and WRITE privileges. Add the following two queries in that file and save it. Database Designer uses these queries to create the design:

    SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
    SELECT DISTINCT w FROM T;
    
  8. Add the queries file to the design and display the results—the numbers of accepted queries, non-design queries, and unoptimizable queries:

    => SELECT DESIGNER_ADD_DESIGN_QUERIES
         ('my_design',
         '/tmp/examples/queries.txt',
         'true'
         );
    

    The results show that both queries were accepted:

    Number of accepted queries                      =2
    Number of queries referencing non-design tables =0
    Number of unsupported queries                   =0
    Number of illegal queries                       =0
    

    The DESIGNER_ADD_DESIGN_QUERIES function populates the DESIGN_QUERIES system table.

  9. Set the design type to comprehensive. (This is the default.) A comprehensive design creates an initial or replacement design for all the design tables:

    => SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
    
  10. Set the optimization objective to query. This setting creates a design that focuses on faster query performance, which might recommend additional projections. These projections could result in a larger database storage footprint:

    => SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');
    
  11. Create the design and save the design and deployment scripts in /tmp/examples, or another directory where you have READ and WRITE privileges. The following command:

    • Analyzes statistics

    • Doesn't deploy the design.

    • Doesn't drop the design after deployment.

    • Stops if it encounters an error.

    => SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
       ('my_design',
        '/tmp/examples/my_design_projections.sql',
        '/tmp/examples/my_design_deploy.sql',
        'True',
        'False',
        'False',
        'False'
        );
    

    This command adds information to the following system tables:

  12. Examine the status of the Database Designer run to see what projections Database Designer recommends. In the deployment_projection_name column:

    • rep indicates a replicated projection

    • super indicates a superprojection

      The deployment_status column is pending because the design has not yet been deployed.

      For this example, Database Designer recommends four projections:

      => \x
      Expanded display is on.
      => SELECT * FROM OUTPUT_DEPLOYMENT_STATUS;
      -[ RECORD 1 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 1
      deployment_projection_name | T_DBD_1_rep_my_design
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 2 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 2
      deployment_projection_name | T2_DBD_2_rep_my_design
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 3 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 3
      deployment_projection_name | T_super
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 4 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 4
      deployment_projection_name | T2_super
      deployment_status          | pending
      error_message              | N/A
      
  13. View the script /tmp/examples/my_design_deploy.sql to see how these projections are created when you run the deployment script. In this example, the script also assigns the encoding schemes RLE and COMMONDELTA_COMP to columns where appropriate.

  14. Deploy the design from the directory where you saved it:

    => \i /tmp/examples/my_design_deploy.sql
    
  15. Now that the design is deployed, delete the design:

    => SELECT DESIGNER_DROP_DESIGN('my_design');
    

4.11.3 - Privileges for running Database Designer functions

Non-DBADMIN users with the DBDUSER role can run Database Designer functions.

Non-DBADMIN users with the DBDUSER role can run Database Designer functions. Two steps are required to enable users to run these functions:

  1. A DBADMIN or superuser grants the user the DBDUSER role:

    => GRANT DBDUSER TO username;
    

    This role persists until the DBADMIN revokes it.

  2. Before the DBDUSER can run Database Designer functions, one of the following must occur:

    • The user enables the DBDUSER role:

      => SET ROLE DBDUSER;
      
    • The superuser sets the user's default role to DBDUSER:

      => ALTER USER username DEFAULT ROLE DBDUSER;
      

General DBDUSER limitations

As a DBDUSER, the following restrictions apply:

  • You can set a design's K-safety to a value less than or equal to system K-safety. You cannot change system K-safety.

  • You cannot explicitly change the ancient history mark (AHM), even during design deployment.

Design dependencies and privileges

Individual design tasks are likely to have dependencies that require specific privileges:

Task Required privileges
Add tables to a design
  • USAGE privilege on the design table schema

  • OWNER privilege on the design table

Add a single design query to the design
  • Privilege to execute the design query
Add a query file to the design
  • Read privilege on the storage location that contains the query file

  • Privilege to execute all the queries in the file

Add queries from the result of a user query to the design
  • Privilege to execute the user query

  • Privilege to execute each design query retrieved from the results of the user query

Create design and deployment scripts
  • WRITE privilege on the storage location of the design script

  • WRITE privilege on the storage location of the deployment script

4.11.4 - Resource pool for Database Designer users

When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs.

When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs. This allows multiple users to run Database Designer concurrently without interfering with each other or using up all cluster resources.

4.12 - Creating custom designs

Vertica strongly recommends that you use the physical schema design produced by , which provides , excellent query performance, and efficient use of storage space.

Vertica strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space. If any queries run less as efficiently than you expect, consider using the Database Designer incremental design process to optimize the database design for the query.

If the projections created by Database Designer still do not meet your needs, you can write custom projections, from scratch or based on projection designs created by Database Designer.

If you are unfamiliar with writing custom projections, start by modifying an existing design generated by Database Designer.

4.12.1 - Custom design process

To create a custom design or customize an existing one:.

To create a custom design or customize an existing one:

  1. Plan the new design or modifications to an existing one. See Planning your design.

  2. Create or modify projections. See Design fundamentals and CREATE PROJECTION for more detail.

  3. Deploy projections to a test environment. See Writing and deploying custom projections.

  4. Test and modify projections as needed.

  5. After you finalize the design, deploy projections to the production environment.

4.12.2 - Planning your design

The syntax for creating a design is easy for anyone who is familiar with SQL.

The syntax for creating a design is easy for anyone who is familiar with SQL. As with any successful project, however, a successful design requires some initial planning. Before you create your first design:

  • Become familiar with standard design requirements and plan your design to include them. See Design requirements.

  • Determine how many projections you need to include in the design. See Determining the number of projections to use.

  • Determine the type of compression and encoding to use for columns. See Architecture.

  • Determine whether or not you want the database to be K-safe. Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and 2. See Designing for K-safety.

4.12.2.1 - Design requirements

A physical schema design is a script that contains CREATE PROJECTION statements.

A physical schema design is a script that contains CREATE PROJECTION statements. These statements determine which columns are included in projections and how they are optimized.

If you use Database Designer as a starting point, it automatically creates designs that meet all fundamental design requirements. If you intend to create or modify designs manually, be aware that all designs must meet the following requirements:

  • Every design must create at least one superprojection for every table in the database that is used by the client application. These projections provide complete coverage that enables users to perform ad-hoc queries as needed. They can contain joins and they are usually configured to maximize performance through sort order, compression, and encoding.

  • Query-specific projections are optional. If you are satisfied with the performance provided through superprojections, you do not need to create additional projections. However, you can maximize performance by tuning for specific query work loads.

  • Vertica recommends that all production databases have a minimum K-safety of one (K=1) to support high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High availability with projections and Designing for K-safety.

  • Vertica recommends that if you have more than 20 nodes, but small tables, do not create replicated projections. If you create replicated projections, the catalog becomes very large and performance may degrade. Instead, consider segmenting those projections.

4.12.2.2 - Determining the number of projections to use

In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding.

In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding. This is especially true if the sort orders for the projections have been used to maximize performance for one or more query predicates (WHERE clauses).

However, you might want to add additional query-specific projections to increase the performance of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The number of additional projections (and their buddies) that you create should be determined by:

  • Your organization's needs

  • The amount of disk space you have available on each node in the cluster

  • The amount of time available for loading data into the database

As the number of projections that are tuned for specific queries increases, the performance of these queries improves. However, the amount of disk space used and the amount of time required to load data increases as well. Therefore, you should create and test designs to determine the optimum number of projections for your database configuration. On average, organizations that choose to implement query-specific projections achieve optimal performance through the addition of a few query-specific projections.

4.12.2.3 - Designing for K-safety

Vertica recommends that all production databases have a minimum K-safety of one (K=1).

Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 and 2. Non-production databases do not have to be K-safe and can be set to 0.

A K-safe database must have at least three nodes, as shown in the following table:

K-safety level Number of required nodes
1 3+
2 5+

You can set K-safety to 1 or 2 only when the physical schema design meets certain redundancy requirements. See Requirements for a K-safe physical schema design.

Using Database Designer

To create designs that are K-safe, Vertica recommends that you use the Database Designer. When creating projections with Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with a K-safety level. Database Designer creates a script that uses the MARK_DESIGN_KSAFE function to set the K-safety of the physical schema to 1. For example:

=> \i VMart_Schema_design_opt_1.sql
CREATE PROJECTION
CREATE PROJECTION
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)

By default, Vertica creates K-safe superprojections when database K-safety is greater than 0.

Monitoring K-safety

Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by querying the SYSTEM table for settings in columns DESIGNED_FAULT_TOLERANCE and CURRENT_FAULT_TOLERANCE.

Loss of K-safety

When K nodes in your cluster fail, your database continues to run, although performance is affected. Further node failures could potentially cause the database to shut down if the failed node's data is not available from another functioning node in the cluster.

See also

K-safety in an Enterprise Mode database

4.12.2.3.1 - Requirements for a K-safe physical schema design

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes.

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0. You can modify a design created for a three-node (or greater) cluster, and the K-safe requirements are already set.

If you create custom projections, your physical schema design must meet the following requirements to be able to successfully recover the database in the event of a failure:

You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets requirements for K-safety.

4.12.2.3.2 - Requirements for a physical schema design with no K-safety

If you use Database Designer to generate an comprehensive design that you can modify and you do not want the design to be K-safe, set K-safety level to 0 (zero).

If you use Database Designer to generate an comprehensive design that you can modify and you do not want the design to be K-safe, set K-safety level to 0 (zero).

If you want to start from scratch, do the following to establish minimal projection requirements for a functioning database with no K-safety (K=0):

  1. Define at least one superprojection for each table in the logical schema.

  2. Replicate (define an exact copy of) each dimension table superprojection on each node.

4.12.2.3.3 - Designing segmented projections for K-safety

Projections must comply with database K-safety requirements.

Projections must comply with database K-safety requirements. In general, you must create buddy projections for each segmented projection, where the number of buddy projections is K+1. Thus, if system K-safety is set to 1, each projection segment must be duplicated by one buddy; if K-safety is set to 2, each segment must be duplicated by two buddies.

Automatic creation of buddy projections

You can use CREATE PROJECTION so it automatically creates the number of buddy projections required to satisfy K-safety, by including SEGMENTED BY ... ALL NODES. If CREATE PROJECTION specifies K-safety (KSAFE=n), Vertica uses that setting; if the statement omits KSAFE, Vertica uses system K-safety.

In the following example, CREATE PROJECTION creates segmented projection ttt_p1 for table ttt. Because system K-safety is set to 1, Vertica requires a buddy projection for each segmented projection. The CREATE PROJECTION statement omits KSAFE, so Vertica uses system K-safety and creates two buddy projections: ttt_p1_b0 and ttt_p1_b1:

=> SELECT mark_design_ksafe(1);

  mark_design_ksafe
----------------------
 Marked design 1-safe
(1 row)

=> CREATE TABLE ttt (a int, b int);
WARNING 6978:  Table "ttt" will include privileges from schema "public"
CREATE TABLE

=> CREATE PROJECTION ttt_p1 as SELECT * FROM ttt SEGMENTED BY HASH(a) ALL NODES;
CREATE PROJECTION

=> SELECT projection_name from projections WHERE anchor_table_name='ttt';
 projection_name
-----------------
 ttt_p1_b0
 ttt_p1_b1
(2 rows)

Vertica automatically names buddy projections by appending the suffix _bn to the projection base name—for example ttt_p1_b0.

Manual creation of buddy projections

If you create a projection on a single node, and system K-safety is greater than 0, you must manually create the number of buddies required for K-safety. For example, you can create projection xxx_p1 for table xxx on a single node, as follows:

=> CREATE TABLE xxx (a int, b int);
WARNING 6978:  Table "xxx" will include privileges from schema "public"
CREATE TABLE

=> CREATE PROJECTION xxx_p1 AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0001;
CREATE PROJECTION

Because K-safety is set to 1, a single instance of this projection is not K-safe. Attempts to insert data into its anchor table xxx return with an error like this:

=> INSERT INTO xxx VALUES (1, 2);
ERROR 3586:  Insufficient projections to answer query
DETAIL:  No projections that satisfy K-safety found for table xxx
HINT:  Define buddy projections for table xxx

In order to comply with K-safety, you must create a buddy projection for projection xxx_p1. For example:

=> CREATE PROJECTION xxx_p1_buddy AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0002;
CREATE PROJECTION

Table xxx now complies with K-safety and accepts DML statements such as INSERT:

VMart=> INSERT INTO xxx VALUES (1, 2);
 OUTPUT
--------
      1
(1 row)

See also

For general information about segmented projections and buddies, see Segmented projections. For information about designing for K-safety, see Designing for K-safety and Designing for segmentation.

4.12.2.3.4 - Designing unsegmented projections for K‑Safety

In many cases, dimension tables are relatively small, so you do not need to segment them.

In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so projections for its dimension tables are replicated without segmentation on all cluster nodes. You create these projections with a CREATE PROJECTION statement that includes the keywords UNSEGMENTED ALL NODES. These keywords specify to create identical instances of the projection on all cluster nodes.

The following example shows how to create an unsegmented projection for the table store.store_dimension:


=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
             AS SELECT store_key, store_name, store_city, store_state
             FROM store.store_dimension
             UNSEGMENTED ALL NODES;
CREATE PROJECTION

Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj. The keyword ALL NODES specifies to replicate the projection on all nodes:


=> \dj store.store_dimension_proj
                         List of projections
 Schema |         Name         |  Owner  |       Node       | Comment
--------+----------------------+---------+------------------+---------
 store  | store_dimension_proj | dbadmin | v_vmart_node0001 |
 store  | store_dimension_proj | dbadmin | v_vmart_node0002 |
 store  | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)

For more information about projection name conventions, see Projection naming.

4.12.2.4 - Designing for segmentation

You segment projections using hash segmentation.

You segment projections using hash segmentation. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria could be an excellent choice for hash segmentation.

When segmenting projections, determine which columns to use to segment the projection. Choose one or more columns that have a large number of unique data values and acceptable skew in their data distribution. Primary key columns are an excellent choice for hash segmentation. The columns must be unique across all the tables being used in a query.

4.12.3 - Design fundamentals

Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point.

Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point. This ensures that you have projections that meet basic requirements.

4.12.3.1 - Writing and deploying custom projections

Before you write custom projections, review the topics in Planning Your Design carefully.

Before you write custom projections, review the topics in Planning your design carefully. Failure to follow these considerations can result in non-functional projections.

To manually modify or create a projection:

  1. Write a script with CREATE PROJECTION statements to create the desired projections.

  2. Run the script in vsql with the meta-command \i.

  3. For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.

  4. If you create projections for tables that already contains data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.

  5. Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.

  6. Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.

  7. Call ANALYZE_STATISTICS on all database projections:

    => SELECT ANALYZE_STATISTICS ('');
    

    This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.

4.12.3.2 - Designing superprojections

Superprojections have the following requirements:.

Superprojections have the following requirements:

  • They must contain every column within the table.

  • For a K-safe design, superprojections must either be replicated on all nodes within the database cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very large tables and medium large tables). See Projections and High availability with projections for an overview of projections and how they are stored. See Designing for K-safety for design specifics.

To provide maximum usability, superprojections need to minimize storage requirements while maximizing query performance. To achieve this, the sort order for columns in superprojections is based on storage requirements and commonly used queries.

4.12.3.3 - Sort order benefits

Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.

Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.

Minimize storage requirements

Minimizing storage saves on physical resources and increases performance by reducing disk I/O. You can minimize projection storage by prioritizing low-cardinality columns in its sort order. This reduces the number of rows Vertica stores and accesses to retrieve query results.

After identifying projection sort columns, analyze their data and choose the most effective encoding method. The Vertica optimizer gives preference to columns with run-length encoding (RLE), so be sure to use it whenever appropriate. Run-length encoding replaces sequences (runs) of identical values with a single pair that contains the value and number of occurrences. Therefore, it is especially appropriate to use it for low-cardinality columns whose run length is large.

Maximize query performance

You can facilitate query performance through column sort order as follows:

  • Where possible, sort order should prioritize columns with the lowest cardinality.

  • Do not sort projections on columns of type LONG VARBINARY and LONG VARCHAR.

See also

Choosing sort order: best practices

4.12.3.4 - Choosing sort order: best practices

When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

Combine RLE and sort order

When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to minimize storage requirements and maximize query performance.

Suppose you have a students table contain the following values and encoding types:

Column # of Distinct Values Encoded With
gender 2 (M or F) RLE
pass_fail 2 (P or F) RLE
class 4 (freshman, sophomore, junior, or senior) RLE
name 10000 (too many to list) Auto

You might have queries similar to this one:

SELECT name FROM studentsWHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';

The fastest way to access the data is to work through the low-cardinality columns with the smallest number of distinct values before the high-cardinality columns. The following sort order minimizes storage and maximizes query performance for queries that have equality restrictions on gender, class, pass_fail, and name. Specify the ORDER BY clause of the projection as follows:

ORDER BY students.gender, students.pass_fail, students.class, students.name

In this example, the gender column is represented by two RLE entries, the pass_fail column is represented by four entries, and the class column is represented by 16 entries, regardless of the cardinality of the students table. Vertica efficiently finds the set of rows that satisfy all the predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the previous example, put those columns early in the projection sort order, in increasing order of distinct cardinality (that is, in increasing order of the number of distinct values in each column).

If you sort this table with student.class first, you improve the performance of queries that restrict only on the student.class column, and you improve the compression of the student.class column (which contains the largest number of distinct values), but the other columns do not compress as well. Determining which projection is better depends on the specific queries in your workload, and their relative importance.

Storage savings with compression decrease as the cardinality of the column increases; however, storage savings with compression increase as the number of bytes required to store values in that column increases.

Maximize the advantages of RLE

To maximize the advantages of RLE encoding, use it only when the average run length of a column is greater than 10 when sorted. For example, suppose you have a table with the following columns, sorted in order of cardinality from low to high:

address.country, address.region, address.state, address.city, address.zipcode

The zipcode column might not have 10 sorted entries in a row with the same zip code, so there is probably no advantage to run-length encoding that column, and it could make compression worse. But there are likely to be more than 10 countries in a sorted run length, so applying RLE to the country column can improve performance.

Put lower cardinality column first for functional dependencies

In general, put columns that you use for local predicates (as in the previous example) earlier in the join order to make predicate evaluation more efficient. In addition, if a lower cardinality column is uniquely determined by a higher cardinality column (like city_id uniquely determining a state_id), it is always better to put the lower cardinality, functionally determined column earlier in the sort order than the higher cardinality column.

For example, in the following sort order, the Area_Code column is sorted before the Number column in the customer_info table:

ORDER BY = customer_info.Area_Code, customer_info.Number, customer_info.Address

In the query, put the Area_Code column first, so that only the values in the Number column that start with 978 are scanned.

=> SELECT AddressFROM customer_info WHERE Area_Code='978' AND Number='9780123457';

Sort for merge joins

When processing a join, the Vertica optimizer chooses from two algorithms:

  • Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses a merge join, which is faster and uses less memory.

  • Hash join—Using the hash join algorithm, Vertica uses the smaller (inner) joined table to build an in-memory hash table on the join column. A hash join has no sort requirement, but it consumes more memory because Vertica builds a hash table with the values in the inner table. The optimizer chooses a hash join when projections are not sorted on the join columns.

If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join perform faster. Vertica uses the term sort-merge join to refer to the case when at least one of the inputs must be sorted prior to the merge join. Vertica sorts the inner input side but only if the outer input side is already sorted on the join columns.

To give the Vertica query optimizer the option to use an efficient merge join for a particular join, create projections on both sides of the join that put the join column first in their respective projections. This is primarily important to do if both tables are so large that neither table fits into memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously, the benefits of merge join over hash join are sufficiently small that it probably isn't worth creating a projection for any one join column.

Sort on columns in important queries

If you have an important query, one that you run on a regular basis, you can save time by putting the columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort order.

If that query uses a high-cardinality column such as Social Security number, you may sacrifice storage by placing this column early in the sort order of a projection, but your most important query will be optimized.

Sort columns of equal cardinality by size

If you have two columns of equal cardinality, put the column that is larger first in the sort order. For example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.

Sort foreign key columns first, from low to high distinct cardinality

Suppose you have a fact table where the first four columns in the sort order make up a foreign key to another table. For best compression, choose a sort order for the fact table such that the foreign keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the design of projections for fact tables, such as partitioning by a time dimension, if any.

In the following example, the table inventory stores inventory data, and product_key and warehouse_key are foreign keys to the product_dimension and warehouse_dimension tables:

=> CREATE TABLE inventory (
 date_key INTEGER NOT NULL,
 product_key INTEGER NOT NULL,
 warehouse_key INTEGER NOT NULL,
 ...
);
=> ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
   REFERENCES warehouse_dimension(warehouse_key);
ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
   REFERENCES product_dimension(product_key);

The inventory table should be sorted by warehouse_key and then product, since the cardinality of the warehouse_key column is probably lower that the cardinality of the product_key.

4.12.3.5 - Prioritizing column access speed

If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank.

If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank. For more information, see Setting storage performance.

How columns are ranked

Vertica stores columns included in the projection sort order on the fastest available storage locations. Columns not included in the projection sort order are stored on slower disks. Columns for each projection are ranked as follows:

  • Columns in the sort order are given the highest priority (numbers > 1000).

  • The last column in the sort order is given the rank number 1001.

  • The next-to-last column in the sort order is given the rank number 1002, and so on until the first column in the sort order is given 1000 + # of sort columns.

  • The remaining columns are given numbers from 1000–1, starting with 1000 and decrementing by one per column.

Vertica then stores columns on disk from the highest ranking to the lowest ranking. It places highest-ranking columns on the fastest disks and the lowest-ranking columns on the slowest disks.

Overriding default column ranking

You can modify which columns are stored on fast disks by manually overriding the default ranks for these columns. To accomplish this, set the ACCESSRANK keyword in the column list. Make sure to use an integer that is not already being used for another column. For example, if you want to give a column the fastest access rank, use a number that is significantly higher than 1000 + the number of sort columns. This allows you to enter more columns over time without bumping into the access rank you set.

The following example sets column store_key's access rank to 1500:

CREATE PROJECTION retail_sales_fact_p (
     store_key ENCODING RLE ACCESSRANK 1500,
     pos_transaction_number ENCODING RLE,
     sales_dollar_amount,
     cost_dollar_amount )
AS SELECT
     store_key,
     pos_transaction_number,
     sales_dollar_amount,
     cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;