This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Creating a database design

A is a physical storage plan that optimizes query performance.

A design is a physical storage plan that optimizes query performance. Data in Vertica is physically stored in projections. When you initially load data into a table using INSERT, COPY (or COPY LOCAL), Vertica creates a default superprojection for the table. This superprojection ensures that all of the data is available for queries. However, these superprojections might not optimize database performance, resulting in slow query performance and low data compression.

To improve performance, create a design for your Vertica database that optimizes query performance and data compression. You can create a design in several ways:

Database Designer can help you minimize how much time you spend on manual database tuning. You can also use Database Designer to redesign the database incrementally as requirements such as workloads change over time.

Database Designer runs as a background process. This is useful if you have a large design that you want to run overnight. An active SSH session is not required, so design and deploy operations continue to run uninterrupted if the session ends.

1 - About Database Designer

Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.

Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.

During the design process, Database Designer analyzes the logical schema definition, sample data, and sample queries, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. This script creates a minimal set of superprojections to ensure K-safety.

In most cases, the projections that Database Designer creates provide excellent query performance within physical constraints while using disk space efficiently.

General design options

When you run Database Designer, several general options are available:

  • Create a comprehensive or incremental design.

  • Optimize for query execution, load, or a balance of both.

  • Require K-safety.

  • Recommend unsegmented projections when feasible.

  • Analyze statistics before creating the design.

Design input

Database Designer bases its design on the following information that you provide:

  • Design queries that you typically run during normal database operations.

  • Design tables that contain sample data.

Output

Database Designer yields the following output:

  • Design script that creates the projections for the design in a way that meets the optimization objectives and distributes data uniformly across the cluster.

  • Deployment script that creates and refreshes the projections for your design. For comprehensive designs, the deployment script contains commands that remove non-optimized projections. The deployment script includes the full design script.

  • Backup script that contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design.

Design restrictions

Database Designer-generated designs:

  • Exclude live aggregate or Top-K projections. You must create these manually. See CREATE PROJECTION.

  • Do not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR columns.

  • Do not support operations on complex types.

Post-design options

While running Database Designer, you can choose to deploy your design automatically after the deployment script is created, or to deploy it manually, after you have reviewed and tested the design. Vertica recommends that you test the design on a non-production server before deploying the design to your production server.

2 - How Database Designer creates a design

Database Designer-generated designs can include the following recommendations:.

Design recommendations

Database Designer-generated designs can include the following recommendations:

  • Sort buddy projections in the same order, which can significantly improve load, recovery, and site node performance. All buddy projections have the same base name so that they can be identified as a group.

  • Accepts unlimited queries for a comprehensive design.

  • Identifies similar design queries and assigns them a signature.

    For queries with the same signature, Database Designer weights the queries, depending on how many queries have that signature. It then considers the weighted query when creating a design.

  • Recommends and creates projections in a way that minimizes data skew by distributing data uniformly across the cluster.

  • Produces higher quality designs by considering UPDATE, DELETE, and SELECT statements.

3 - Database Designer access requirements

By default, only users with the DBADMIN role can run Database Designer.

By default, only users with the DBADMIN role can run Database Designer. Non-DBADMIN users can run Database Designer only if they are granted the necessary privileges and DBDUSER role, as described below. You can also enable users to run Database Designer on the Management Console (see Enabling Users to run Database Designer on Management Console).

  1. Add a temporary folder to all cluster nodes with CREATE LOCATION:

    => CREATE LOCATION '/tmp/dbd' ALL NODES;
    
  2. Grant the desired user CREATE privileges to create schemas on the current (DEFAULT) database, with GRANT DATABASE:

    => GRANT CREATE ON DATABASE DEFAULT TO dbd-user;
    
  3. Grant the DBDUSER role to dbd-user with GRANT ROLE:

    => GRANT DBDUSER TO dbd-user;
    
  4. On all nodes in the cluster, grant dbd-user access to the temporary folder with GRANT LOCATION:

    => GRANT ALL ON LOCATION '/tmp/dbd' TO dbd-user;
    
  5. Grant dbd-user privileges on one or more database schemas and their tables, with GRANT SCHEMA and GRANT TABLE, respectively:

    => GRANT ALL ON SCHEMA this-schema[,...] TO dbd-user;
    => GRANT ALL ON ALL TABLES IN SCHEMA this-schema[,...] TO dbd-user;
    
  6. Enable the DBDUSER role on dbd-user in one of the following ways:

    • As dbd-user, enable the DBDUSER role with SET ROLE:

      => SET ROLE DBDUSER;
      
    • As DBADMIN, automatically enable the DBDUSER role for dbd-user on each login, with ALTER USER:

      => ALTER USER dbd-user DEFAULT ROLE DBDUSER;
      

Enabling users to run Database Designer on Management Console

Users who are already granted the DBDUSER role and required privileges, as described above, can also be enabled to run Database Designer on Management Console:

  1. Log in as a superuser to Management Console.

  2. Click MC Settings.

  3. Click User Management.

  4. Specify an MC user:

    • To create an MC user, click Add.

    • To use an existing MC user, select the user and click Edit.

  5. Next to the DB access level window, click Add.

  6. In the Add Permissions window:

    1. From the Choose a database drop-down list, select the database on which to create a design.

    2. In the Database username field, enter the dbd-user user name that you created earlier.

    3. In the Database password field, enter the database password.

    4. In the Restrict access drop-down list, select the level of MC user for this user.

  7. Click OK to save your changes.

  8. Log out of the MC Super user account.

The MC user is now mapped to dbd-user. Log in as the MC user and use Database Designer to create an optimized design for your database.

DBDUSER capabilities and limitations

As a DBDUSER, the following constraints apply:

  • Designs must set K-safety to be equal to system K-safety. If a design violates K-safety by lacking enough buddy projections for tables, the design does not complete.

  • You cannot explicitly advance the ancient history mark (AHM)—for example, call MAKE_AHM_NOW—until after deploying the design.

When you create a design, you automatically have privileges to manipulate that design. Other tasks might require additional privileges:

Task Required privileges
Submit design tables
  • USAGE on the design table schema

  • OWNER on the design table

Submit a single design query
  • EXECUTE on the design query
Submit a file of design queries
  • READ privilege on the storage location that contains the query file

  • EXECUTE privilege on all queries in the file

Submit design queries from results of a user query
  • EXECUTE privilege on the user queries

  • EXECUTE privilege on each design query retrieved from the results of the user query

Create design and deployment scripts
  • WRITE privilege on the storage location of the design script

  • WRITE privilege on the storage location of the deployment script

4 - Logging projection data for Database Designer

When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify.

When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify. When you deploy the design, Database Designer creates the design based on these projections. However, space or budget constraints may prevent Database Designer from creating all the proposed projections. In addition, Database Designer may not be able to implement the projections using ideal criteria.

To get information about the projections, first enable the Database Designer logging capability. When enabled, Database Designer stores information about the proposed projections in two Data Collector tables. After Database Designer deploys the design, these logs contain information about which proposed projections were actually created. After deployment, the logs contain information about:

  • Projections that the Optimizer proposed

  • Projections that Database Designer actually created when the design was deployed

  • Projections that Database Designer created, but not with the ideal criteria that the Optimizer identified.

  • The DDL used to create all the projections

  • Column optimizations

If you do not deploy the design immediately, review the log to determine if you want to make any changes. If the design has been deployed, you can still manually create some of the projections that Database Designer did not create.

To enable the Database Designer logging capability, see Enabling logging for Database Designer.

To view the logged information, see Viewing Database Designer logs.

4.1 - Enabling logging for Database Designer

By default, Database Designer does not log information about the projections that the Optimizer proposed and the Database Designer deploys.

By default, Database Designer does not log information about the projections that the Optimizer proposed and the Database Designer deploys.

To enable Database Designer logging, enter the following command:

=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 1;

To disable Database Designer logging, enter the following command:

=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 0;

See also

4.2 - Viewing Database Designer logs

You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:.

You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:

  • DC_DESIGN_PROJECTION_CANDIDATES

  • DC_DESIGN_QUERY_PROJECTION_CANDIDATES

DC_DESIGN_PROJECTION_CANDIDATES

The DC_DESIGN_PROJECTION_CANDIDATES table contains information about all the projections that the Optimizer proposed. This table also includes the DDL that creates them. The is_a_winner field indicates if that projection was part of the actual deployed design. To view the DC_DESIGN_PROJECTION_CANDIDATES table, enter:

=> SELECT *  FROM DC_DESIGN_PROJECTION_CANDIDATES;

DC_DESIGN_QUERY_PROJECTION_CANDIDATES

The DC_DESIGN_QUERY_PROJECTION_CANDIDATES table lists plan features for all design queries.

Possible features are:

  • FULLY DISTRIBUTED JOIN

  • MERGE JOIN

  • GROUPBY PIPE

  • FULLY DISTRIBUTED GROUPBY

  • RLE PREDICATE

  • VALUE INDEX PREDICATE

  • LATE MATERIALIZATION

For all design queries, the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table includes the following plan feature information:

  • Optimizer path cost.

  • Database Designer benefits.

  • Ideal plan feature and its description, which identifies how the referenced projection should be optimized.

  • If the design was deployed, the actual plan feature and its description is included in the table. This information identifies how the referenced projection was actually optimized.

Because most projections have multiple optimizations, each projection usually has multiple rows.To view the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table, enter:

=> SELECT *  FROM DC_DESIGN_QUERY_PROJECTION_CANDIDATES;

To see example data from these tables, see Database Designer logs: example data.

4.3 - Database Designer logs: example data

In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database.

In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database. The output shows two records from the DC_DESIGN_PROJECTION_CANDIDATES table.

The first record contains information about the customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$ projection. The record includes the CREATE PROJECTION statement that Database Designer used to create the projection. The is_a_winner column is t, indicating that Database Designer created this projection when it deployed the design.

The second record contains information about the product_dimension_dbd_2_sort_$product_version$__$product_key$ projection. For this projection, the is_a_winner column is f. The Optimizer recommended that Database Designer create this projection as part of the design. However, Database Designer did not create the projection when it deployed the design. The log includes the DDL for the CREATE PROJECTION statement. If you want to add the projection manually, you can use that DDL. For more information, see Creating a design manually.

=> SELECT * FROM dc_design_projection_candidates;
-[ RECORD 1 ]--------+---------------------------------------------------------------
time                 | 2014-04-11 06:30:17.918764-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720620
projection_id        | 45035996273726626
iteration_number     | 1
projection_name      | customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."customer_dimension_dbd_1_
            sort_$customer_gender$__$annual_income$"
(
customer_key ENCODING AUTO,
customer_type ENCODING AUTO,
customer_name ENCODING AUTO,
customer_gender ENCODING RLE,
title ENCODING AUTO,
household_id ENCODING AUTO,
customer_address ENCODING AUTO,
customer_city ENCODING AUTO,
customer_state ENCODING AUTO,
customer_region ENCODING AUTO,
marital_status ENCODING AUTO,
customer_age ENCODING AUTO,
number_of_children ENCODING AUTO,
annual_income ENCODING AUTO,
occupation ENCODING AUTO,
largest_bill_amount ENCODING AUTO,
store_membership_card ENCODING AUTO,
customer_since ENCODING AUTO,
deal_stage ENCODING AUTO,
deal_size ENCODING AUTO,
last_deal_update ENCODING AUTO
)
AS
SELECT customer_key,
customer_type,
customer_name,
customer_gender,
title,
household_id,
customer_address,
customer_city,
customer_state,
customer_region,
marital_status,
customer_age,
number_of_children,
annual_income,
occupation,
largest_bill_amount,
store_membership_card,
customer_since,
deal_stage,
deal_size,
last_deal_update
FROM public.customer_dimension
ORDER BY customer_gender,
annual_income
UNSEGMENTED ALL NODES;
is_a_winner          | t
-[ RECORD 2 ]--------+-------------------------------------------------------------
time                 | 2014-04-11 06:30:17.961324-07
node_name            | v_vmart_node0001
session_id           | localhost.localdoma-931:0x1b7
user_id              | 45035996273704962
user_name            | dbadmin
design_id            | 45035996273705182
design_table_id      | 45035996273720624
projection_id        | 45035996273726714
iteration_number     | 1
projection_name      | product_dimension_dbd_2_sort_$product_version$__$product_key$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."product_dimension_dbd_2_
        sort_$product_version$__$product_key$"
(
product_key ENCODING AUTO,
product_version ENCODING RLE,
product_description ENCODING AUTO,
sku_number ENCODING AUTO,
category_description ENCODING AUTO,
department_description ENCODING AUTO,
package_type_description ENCODING AUTO,
package_size ENCODING AUTO,
fat_content ENCODING AUTO,
diet_type ENCODING AUTO,
weight ENCODING AUTO,
weight_units_of_measure ENCODING AUTO,
shelf_width ENCODING AUTO,
shelf_height ENCODING AUTO,
shelf_depth ENCODING AUTO,
product_price ENCODING AUTO,
product_cost ENCODING AUTO,
lowest_competitor_price ENCODING AUTO,
highest_competitor_price ENCODING AUTO,
average_competitor_price ENCODING AUTO,
discontinued_flag ENCODING AUTO
)
AS
SELECT product_key,
product_version,
product_description,
sku_number,
category_description,
department_description,
package_type_description,
package_size,
fat_content,
diet_type,
weight,
weight_units_of_measure,
shelf_width,
shelf_height,
shelf_depth,
product_price,
product_cost,
lowest_competitor_price,
highest_competitor_price,
average_competitor_price,
discontinued_flag
FROM public.product_dimension
ORDER BY product_version,
product_key
UNSEGMENTED ALL NODES;
is_a_winner          | f
.
.
.

The next example shows the contents of two records in the DC_DESIGN_QUERY_PROJECTION_CANDIDATES. Both of these rows apply to projection id 45035996273726626.

In the first record, the Optimizer recommends that Database Designer optimize the customer_gender column for the GROUPBY PIPE algorithm.

In the second record, the Optimizer recommends that Database Designer optimize the public.customer_dimension table for late materialization. Late materialization can improve the performance of joins that might spill to disk.

=> SELECT * FROM dc_design_query_projection_candidates;
-[ RECORD 1 ]-----------------+------------------------------------------------------------
time                           | 2014-04-11 06:30:17.482377-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | GROUP BY PIPE
ideal_plan_feature_description | Group-by pipelined on column(s) customer_gender
dbd_benefits                   | 5
opt_path_cost                  | 211
-[ RECORD 2 ]-----------------+------------------------------------------------------------
time                           | 2014-04-11 06:30:17.48276-07
node_name                      | v_vmart_node0001
session_id                     | localhost.localdoma-931:0x1b7
user_id                        | 45035996273704962
user_name                      | dbadmin
design_id                      | 45035996273705182
design_query_id                | 3
iteration_number               | 1
design_table_id                | 45035996273720620
projection_id                  | 45035996273726626
ideal_plan_feature             | LATE MATERIALIZATION
ideal_plan_feature_description | Late materialization on table public.customer_dimension
dbd_benefits                   | 4
opt_path_cost                  | 669
.
.
.

You can view the actual plan features that Database Designer implemented for the projections it created. To do so, query the V_INTERNAL.DC_DESIGN_QUERY_PROJECTIONS table:

=> select * from v_internal.dc_design_query_projections;
-[ RECORD 1 ]-------------------+-------------------------------------------------------------
time                            | 2014-04-11 06:31:41.19199-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | RLE PREDICATE
actual_plan_feature_description | RLE on predicate column(s) department_description
dbd_benefits                    | 2
opt_path_cost                   | 141
-[ RECORD 2 ]-------------------+-------------------------------------------------------------
time                            | 2014-04-11 06:31:41.192292-07
node_name                       | v_vmart_node0001
session_id                      | localhost.localdoma-931:0x1b7
user_id                         | 45035996273704962
user_name                       | dbadmin
design_id                       | 45035996273705182
design_query_id                 | 1
projection_id                   | 2
design_table_id                 | 45035996273720624
actual_plan_feature             | GROUP BY PIPE
actual_plan_feature_description | Group-by pipelined on column(s) fat_content
dbd_benefits                    | 5
opt_path_cost                   | 155

5 - General design settings

Before you run Database Designer, you must provide specific information on the design to create.

Before you run Database Designer, you must provide specific information on the design to create.

Design name

All designs that you create with Database Designer must have unique names that conform to the conventions described in Identifiers, and are no more than 32 characters long (16 characters if you use Database Designer in Administration Tools or Management Console).

The design name is incorporated into the names of files that Database Designer generates, such as its deployment script. This can help you differentiate files that are associated with different designs.

Design type

Database Designer can create two distinct design types: comprehensive or incremental.

Comprehensive design

A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database.

To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, Vertica creates an unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it.

Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections.

During a comprehensive design, Database Designer creates deployment scripts that:

  • Create projections to optimize query performance.

  • Create replacement buddy projections when Database Designer changes the encoding of existing projections that it decides to keep.

Incremental design

After you create and deploy a comprehensive database design, your database is likely to change over time in various ways. Consider using Database Designer periodically to create incremental designs that address these changes. Changes that warrant an incremental design can include:

  • Significant data additions or updates

  • New or modified queries that you run regularly

  • Performance issues with one or more queries

  • Schema changes

Optimization objective

Database Designer can optimize the design for one of three objectives:

  • Load: Designs that are optimized for loads minimize database size, potentially at the expense of query performance.
  • Query: Designs that are optimized for query performance. These designs typically favor fast query execution over load optimization, and thus result in a larger storage footprint.
  • Balanced: Designs that are balanced between database size and query performance.

A fully optimized query has an optimization ratio of 0.99. Optimization ratio is the ratio of a query's benefits achieved in the design produced by the Database Designer to that achieved in the ideal plan. The optimization ratio is set in the OptRatio parameter in designer.log.

Design tables

Database Designer needs one or more tables with a moderate amount of sample data—approximately 10 GB—to create optimal designs. Design tables with large amounts of data adversely affect Database Designer performance. Design tables with too little data prevent Database Designer from creating an optimized design. If a design table has no data, Database Designer ignores it.

Design queries

A database design that is optimized for query performance requires a set of representative queries, or design queries. Design queries are required for incremental designs, and optional for comprehensive designs. You list design queries in a SQL file that you supply as input to Database Designer. Database Designer checks the validity of the queries when you add them to your design, and again when it builds the design. If a query is invalid, Database Designer ignores it.

If you use Management Console to create a database design, you can submit queries either from an input file or from the system table QUERY_REQUESTS. For details, see Creating a design manually.

The maximum number of design queries depends on the design type: ≤200 queries for a comprehensive design, ≤100 queries for an incremental design. Optionally, you can assign weights to the design queries that signify their relative importance. Database Designer uses those weights to prioritize the queries in its design.

Segmented and unsegmented projections

When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or unsegmented (replicated on all cluster nodes).

By default, Database Designer recommends only segmented projections. You can enable Database Designer to recommend unsegmented projections . In this case, Database Designer recommends segmented superprojections for large tables when deploying to multi-node clusters, and unsegmented superprojections for smaller tables.

Database Designer uses the following algorithm to determine whether to recommend unsegmented projections. Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends unsegmented projections if any of the following conditions is true:

  • largest-row-count< 1,000,000 ANDnumber-table-rows10%-largest-row-count

  • largest-row-count≥ 10,000,000 ANDnumber-table-rows1%-largest-row-count

  • 1,000,000 ≤ largest-row-count< 10,000,000 ANDnumber-table-rows ≤ 100,000

Database Designer does not segment projections on:

  • Single-node clusters

  • LONG VARCHAR and LONG VARBINARY columns

For more information, see High availability with projections.

Statistics analysis

By default, Database Designer analyzes statistics for design tables when they are added to the design. Accurate statistics help Database Designer optimize compression and query performance.

Analyzing statistics takes time and resources. If you are certain that design table statistics are up to date, you can specify to skip this step and avoid the overhead otherwise incurred.

For more information, see Collecting Statistics.

6 - Building a design

After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.

After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.

When you build a database design, Vertica generates two scripts:

  • Deployment script: design-name_deploy.sql—Contains the SQL statements that create projections for the design you are deploying, deploy the design, and drop unused projections. When the deployment script runs, it creates the optimized design. For details about how to run this script and deploy the design, see Deploying a Design.

  • Design script: design-name_design.sql—Contains the CREATE PROJECTION statements that Database Designeruses to create the design. Review this script to make sure you are happy with the design.

    The design script is a subset of the deployment script. It serves as a backup of the DDL for the projections that the deployment script creates.

When you create a design using Management Console:

  • If you submit a large number of queries to your design and build it right immediately, a timing issue could cause the queries not to load before deployment starts. If this occurs, you might see one of the following errors:

    • No queries to optimize for

    • No tables to design projections for

    To accommodate this timing issue, you may need to reset the design, check the Queries tab to make sure the queries have been loaded, and then rebuild the design. Detailed instructions are in:

  • The scripts are deleted when deployment completes. To save a copy of the deployment script after the design is built but before the deployment completes, go to the Output window and copy and paste the SQL statements to a file.

7 - Resetting a design

You must reset a design when:.

You must reset a design when:

  • You build a design and the output scripts described in Building a Design are not created.

  • You build a design but Database Designer cannot complete the design because the queries it expects are not loaded.

Resetting a design discards all the run-specific information of the previous Database Designer build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables and queries.

After you reset a design, review the design to see what changes you need to make. For example, you can fix errors, change parameters, or check for and add additional tables or queries. Then you can rebuild the design.

You can only reset a design in Management Console or by using the DESIGNER_RESET_DESIGN function.

8 - Deploying a design

After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.

After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.

Both the design and deployment processes run in the background. This is useful if you have a large design that you want to run overnight. Because an active SSH session is not required, the design/deploy operations continue to run uninterrupted, even if the session is terminated.

Database Designer runs as a background process. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. However, if multiple users are deploying a design on the same tables at the same time, Database Designer may not be able to complete the deployment. To avoid problems, consider the following:

  • Schedule potentially conflicting Database Designer processes to run sequentially overnight so that there are no concurrency problems.

  • Avoid scheduling Database Designer runs on the same set of tables at the same time.

There are two ways to deploy your design:

8.1 - Deploying designs using Database Designer

OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.

OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.

If you choose to allow Database Designer to automatically deploy your script during a comprehensive design and are running Administrative Tools, Database Designer creates a backup script of your database's current design. This script helps you re-create the design of projections that may have been dropped by the new design. The backup script is located in the output directory you specified during the design process.

If you choose not to have Database Designer automatically run the deployment script (for example, if you want to maintain projections from a pre-existing deployment), you can manually run the deployment script later. See Deploying designs manually.

To deploy a design while running Database Designer, do one of the following:

  • In Management Console, select the design and click Deploy Design.

  • In the Administration Tools, select Deploy design in the Design Options window.

If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy parameter to 'true'.

Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps that the deployment took:

vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;

8.2 - Deploying designs manually

If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:.

If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:

  1. Make sure that the target database contains the same tables and projections as the database where you ran Database Designer. The database should also contain sample data.

  2. To deploy the projections to a test or production environment, execute the deployment script in vsql with the meta-command \i as follows, where design-name is the name of the database design:

    => \i design-name_deploy.sql
    
  3. For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.

  4. If you create projections for tables that already contains data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.

  5. Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.

  6. Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.

  7. Call ANALYZE_STATISTICS on all database projections:

    => SELECT ANALYZE_STATISTICS ('');
    

    This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.

9 - How to create a design

There are three ways to create a design using Database Designer:.

There are three ways to create a design using Database Designer:

The following table shows what Database Designer capabilities are available in each tool:

Database Designer Capability Management Console Running Database Designer Programmatically Administrative Tools
Create design Yes Yes Yes
Design name length (# of characters) 16 32 16
Build design (create design and deployment scripts) Yes Yes Yes
Create backup script Yes
Set design type (comprehensive or incremental) Yes Yes Yes
Set optimization objective Yes Yes Yes
Add design tables Yes Yes Yes
Add design queries file Yes Yes Yes
Add single design query Yes
Use query repository Yes Yes
Set K-safety Yes Yes Yes
Analyze statistics Yes Yes Yes
Require all unsegmented projections Yes Yes
View event history Yes Yes
Set correlation analysis mode (Default = 0) Yes

9.1 - Using administration tools to create a design

To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user.

To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user. Follow these steps:

  1. Log in as the dbadmin user and start Administration Tools.

  2. From the main menu, start the database for which you want to create a design. The database must be running before you can create a design for it.

  3. On the main menu, select Configuration Menu and click OK.

  4. On the Configuration Menu, select Run Database Designer and click OK.

  5. On the Select a database to design window, enter the name of the database for which you are creating a design and click OK.

  6. On the Enter the directory for Database Designer output window, enter the full path to the directory to contain the design script, deployment script, backup script, and log files, and click OK.

    For information about the scripts, see Building a design.

  7. On the Database Designer window, enter a name for the design and click OK.

  8. On the Design Type window, choose which type of design to create and click OK.

    For details, see Design Types.

  9. The Select schema(s) to add to query search path window lists all the schemas in the database that you selected. Select the schemas that contain representative data that you want Database Designer to consider when creating the design and click OK.

    For details about choosing schema and tables to submit to Database Designer, see Design Tables with Sample Data.

  10. On the Optimization Objectives window, select the objective you want for the database optimization:

  11. The final window summarizes the choices you have made and offers you two choices:

    • Proceed with building the design, and deploying it if you specified to deploy it immediately. If you did not specify to deploy, you can review the design and deployment scripts and deploy them manually, as described in Deploying designs manually.

    • Cancel the design and go back to change some of the parameters as needed.

  12. Creating a design can take a long time.To cancel a running design from the Administration Tools window, enter Ctrl+C.

To create a design for the VMart example database, see Using Database Designer to create a comprehensive design in Getting Started.

10 - Running Database Designer programmatically

Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality.

Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality. Run Database Designer programmatically to perform the following tasks:

  • Optimize performance on tables that you own.

  • Create or update a design without requiring superuser or DBADMIN intervention.

  • Add individual queries and tables, or add data to your design, and then rerun Database Designer to update the design based on this new information.

  • Customize the design.

  • Use recently executed queries to set up your database to run Database Designer automatically on a regular basis.

  • Assign each design query a query weight that indicates the importance of that query in creating the design. Assign a higher weight to queries that you run frequently so that Database Designer prioritizes those queries in creating the design.

For more details about Database Designer functions, see Database Designer function categories.

10.1 - Database Designer function categories

Database Designer functions perform the following operations, generally performed in the following order:

  1. Create a design.

  2. Set design properties.

  3. Populate a design.

  4. Create design and deployment scripts.

  5. Get design data.

  6. Clean up.

For detailed information, see Workflow for running Database Designer programmatically. For information on required privileges, see Privileges for running Database Designer functions

Create a design

DESIGNER_CREATE_DESIGN directs Database Designer to create a design.

Set design properties

The following functions let you specify design properties:

Populate a design

The following functions let you add tables and queries to your Database Designer design:

Create design and deployment scripts

The following functions populate the Database Designer workspace and create design and deployment scripts. You can also analyze statistics, deploy the design automatically, and drop the workspace after the deployment:

Reset a design

DESIGNER_RESET_DESIGN discards all the run-specific information of the previous Database Designer build or deployment of the specified design but retains its configuration.

Get design data

The following functions display information about projections and scripts that the Database Designer created:

Cleanup

The following functions cancel any running Database Designer operation or drop a Database Designer design and all its contents:

10.2 - Workflow for running Database Designer programmatically

The following example shows the steps you take to create a design by running Database Designer programmatically.

The following example shows the steps you take to create a design by running Database Designer programmatically.

Before you run this example, you should have the DBDUSER role, and you should have enabled that role using the SET ROLE DBDUSER command:

  1. Create a table in the public schema:

    => CREATE TABLE T(
       x INT,
       y INT,
       z INT,
       u INT,
       v INT,
       w INT PRIMARY KEY
       );
    
  2. Add data to the table:

    \! perl -e 'for ($i=0; $i<100000; ++$i)   {printf("%d, %d, %d, %d, %d, %d\n", $i/10000, $i/100, $i/10, $i/2, $i, $i);}'
       | vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"
    
  3. Create a second table in the public schema:

    => CREATE TABLE T2(
       x INT,
       y INT,
       z INT,
       u INT,
       v INT,
       w INT PRIMARY KEY
       );
    
  4. Copy the data from table T1 to table T2 and commit the changes:

    => INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
    => COMMIT;
    
  5. Create a new design:

    => SELECT DESIGNER_CREATE_DESIGN('my_design');
    

    This command adds information to the DESIGNS system table in the V_MONITOR schema.

  6. Add tables from the public schema to the design :

    => SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
    => SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');
    

    These commands add information to the DESIGN_TABLES system table.

  7. Create a file named queries.txt in /tmp/examples, or another directory where you have READ and WRITE privileges. Add the following two queries in that file and save it. Database Designer uses these queries to create the design:

    SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
    SELECT DISTINCT w FROM T;
    
  8. Add the queries file to the design and display the results—the numbers of accepted queries, non-design queries, and unoptimizable queries:

    => SELECT DESIGNER_ADD_DESIGN_QUERIES
         ('my_design',
         '/tmp/examples/queries.txt',
         'true'
         );
    

    The results show that both queries were accepted:

    Number of accepted queries                      =2
    Number of queries referencing non-design tables =0
    Number of unsupported queries                   =0
    Number of illegal queries                       =0
    

    The DESIGNER_ADD_DESIGN_QUERIES function populates the DESIGN_QUERIES system table.

  9. Set the design type to comprehensive. (This is the default.) A comprehensive design creates an initial or replacement design for all the design tables:

    => SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
    
  10. Set the optimization objective to query. This setting creates a design that focuses on faster query performance, which might recommend additional projections. These projections could result in a larger database storage footprint:

    => SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');
    
  11. Create the design and save the design and deployment scripts in /tmp/examples, or another directory where you have READ and WRITE privileges. The following command:

    • Analyzes statistics

    • Doesn't deploy the design.

    • Doesn't drop the design after deployment.

    • Stops if it encounters an error.

    => SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
       ('my_design',
        '/tmp/examples/my_design_projections.sql',
        '/tmp/examples/my_design_deploy.sql',
        'True',
        'False',
        'False',
        'False'
        );
    

    This command adds information to the following system tables:

  12. Examine the status of the Database Designer run to see what projections Database Designer recommends. In the deployment_projection_name column:

    • rep indicates a replicated projection

    • super indicates a superprojection

      The deployment_status column is pending because the design has not yet been deployed.

      For this example, Database Designer recommends four projections:

      => \x
      Expanded display is on.
      => SELECT * FROM OUTPUT_DEPLOYMENT_STATUS;
      -[ RECORD 1 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 1
      deployment_projection_name | T_DBD_1_rep_my_design
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 2 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 2
      deployment_projection_name | T2_DBD_2_rep_my_design
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 3 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 3
      deployment_projection_name | T_super
      deployment_status          | pending
      error_message              | N/A
      -[ RECORD 4 ]--------------+-----------------------------
      deployment_id              | 45035996273795970
      deployment_projection_id   | 4
      deployment_projection_name | T2_super
      deployment_status          | pending
      error_message              | N/A
      
  13. View the script /tmp/examples/my_design_deploy.sql to see how these projections are created when you run the deployment script. In this example, the script also assigns the encoding schemes RLE and COMMONDELTA_COMP to columns where appropriate.

  14. Deploy the design from the directory where you saved it:

    => \i /tmp/examples/my_design_deploy.sql
    
  15. Now that the design is deployed, delete the design:

    => SELECT DESIGNER_DROP_DESIGN('my_design');
    

10.3 - Privileges for running Database Designer functions

Non-DBADMIN users with the DBDUSER role can run Database Designer functions.

Non-DBADMIN users with the DBDUSER role can run Database Designer functions. Two steps are required to enable users to run these functions:

  1. A DBADMIN or superuser grants the user the DBDUSER role:

    => GRANT DBDUSER TO username;
    

    This role persists until the DBADMIN revokes it.

  2. Before the DBDUSER can run Database Designer functions, one of the following must occur:

    • The user enables the DBDUSER role:

      => SET ROLE DBDUSER;
      
    • The superuser sets the user's default role to DBDUSER:

      => ALTER USER username DEFAULT ROLE DBDUSER;
      

General DBDUSER limitations

As a DBDUSER, the following restrictions apply:

  • You can set a design's K-safety to a value less than or equal to system K-safety. You cannot change system K-safety.

  • You cannot explicitly change the ancient history mark (AHM), even during design deployment.

Design dependencies and privileges

Individual design tasks are likely to have dependencies that require specific privileges:

Task Required privileges
Add tables to a design
  • USAGE privilege on the design table schema

  • OWNER privilege on the design table

Add a single design query to the design
  • Privilege to execute the design query
Add a query file to the design
  • Read privilege on the storage location that contains the query file

  • Privilege to execute all the queries in the file

Add queries from the result of a user query to the design
  • Privilege to execute the user query

  • Privilege to execute each design query retrieved from the results of the user query

Create design and deployment scripts
  • WRITE privilege on the storage location of the design script

  • WRITE privilege on the storage location of the deployment script

10.4 - Resource pool for Database Designer users

When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs.

When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs. This allows multiple users to run Database Designer concurrently without interfering with each other or using up all cluster resources.

11 - Creating custom designs

Vertica strongly recommends that you use the physical schema design produced by , which provides , excellent query performance, and efficient use of storage space.

Vertica strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space. If any queries run less as efficiently than you expect, consider using the Database Designer incremental design process to optimize the database design for the query.

If the projections created by Database Designer still do not meet your needs, you can write custom projections, from scratch or based on projection designs created by Database Designer.

If you are unfamiliar with writing custom projections, start by modifying an existing design generated by Database Designer.

11.1 - Custom design process

To create a custom design or customize an existing one:.

To create a custom design or customize an existing one:

  1. Plan the new design or modifications to an existing one. See Planning your design.

  2. Create or modify projections. See Design fundamentals and CREATE PROJECTION for more detail.

  3. Deploy projections to a test environment. See Writing and deploying custom projections.

  4. Test and modify projections as needed.

  5. After you finalize the design, deploy projections to the production environment.

11.2 - Planning your design

The syntax for creating a design is easy for anyone who is familiar with SQL.

The syntax for creating a design is easy for anyone who is familiar with SQL. As with any successful project, however, a successful design requires some initial planning. Before you create your first design:

  • Become familiar with standard design requirements and plan your design to include them. See Design requirements.

  • Determine how many projections you need to include in the design. See Determining the number of projections to use.

  • Determine the type of compression and encoding to use for columns. See Architecture.

  • Determine whether or not you want the database to be K-safe. Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and 2. See Designing for K-safety.

11.2.1 - Design requirements

A physical schema design is a script that contains CREATE PROJECTION statements.

A physical schema design is a script that contains CREATE PROJECTION statements. These statements determine which columns are included in projections and how they are optimized.

If you use Database Designer as a starting point, it automatically creates designs that meet all fundamental design requirements. If you intend to create or modify designs manually, be aware that all designs must meet the following requirements:

  • Every design must create at least one superprojection for every table in the database that is used by the client application. These projections provide complete coverage that enables users to perform ad-hoc queries as needed. They can contain joins and they are usually configured to maximize performance through sort order, compression, and encoding.

  • Query-specific projections are optional. If you are satisfied with the performance provided through superprojections, you do not need to create additional projections. However, you can maximize performance by tuning for specific query work loads.

  • Vertica recommends that all production databases have a minimum K-safety of one (K=1) to support high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High availability with projections and Designing for K-safety.

  • Vertica recommends that if you have more than 20 nodes, but small tables, do not create replicated projections. If you create replicated projections, the catalog becomes very large and performance may degrade. Instead, consider segmenting those projections.

11.2.2 - Determining the number of projections to use

In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding.

In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding. This is especially true if the sort orders for the projections have been used to maximize performance for one or more query predicates (WHERE clauses).

However, you might want to add additional query-specific projections to increase the performance of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The number of additional projections (and their buddies) that you create should be determined by:

  • Your organization's needs

  • The amount of disk space you have available on each node in the cluster

  • The amount of time available for loading data into the database

As the number of projections that are tuned for specific queries increases, the performance of these queries improves. However, the amount of disk space used and the amount of time required to load data increases as well. Therefore, you should create and test designs to determine the optimum number of projections for your database configuration. On average, organizations that choose to implement query-specific projections achieve optimal performance through the addition of a few query-specific projections.

11.2.3 - Designing for K-safety

Vertica recommends that all production databases have a minimum K-safety of one (K=1).

Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 and 2. Non-production databases do not have to be K-safe and can be set to 0.

A K-safe database must have at least three nodes, as shown in the following table:

K-safety level Number of required nodes
1 3+
2 5+

You can set K-safety to 1 or 2 only when the physical schema design meets certain redundancy requirements. See Requirements for a K-safe physical schema design.

Using Database Designer

To create designs that are K-safe, Vertica recommends that you use the Database Designer. When creating projections with Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with a K-safety level. Database Designer creates a script that uses the MARK_DESIGN_KSAFE function to set the K-safety of the physical schema to 1. For example:

=> \i VMart_Schema_design_opt_1.sql
CREATE PROJECTION
CREATE PROJECTION
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)

By default, Vertica creates K-safe superprojections when database K-safety is greater than 0.

Monitoring K-safety

Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by querying the SYSTEM table for settings in columns DESIGNED_FAULT_TOLERANCE and CURRENT_FAULT_TOLERANCE.

Loss of K-safety

When K nodes in your cluster fail, your database continues to run, although performance is affected. Further node failures could potentially cause the database to shut down if the failed node's data is not available from another functioning node in the cluster.

See also

K-safety in an Enterprise Mode database

11.2.3.1 - Requirements for a K-safe physical schema design

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes.

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0. You can modify a design created for a three-node (or greater) cluster, and the K-safe requirements are already set.

If you create custom projections, your physical schema design must meet the following requirements to be able to successfully recover the database in the event of a failure:

You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets requirements for K-safety.

11.2.3.2 - Requirements for a physical schema design with no K-safety

If you use Database Designer to generate an comprehensive design that you can modify and you do not want the design to be K-safe, set K-safety level to 0 (zero).

If you use Database Designer to generate an comprehensive design that you can modify and you do not want the design to be K-safe, set K-safety level to 0 (zero).

If you want to start from scratch, do the following to establish minimal projection requirements for a functioning database with no K-safety (K=0):

  1. Define at least one superprojection for each table in the logical schema.

  2. Replicate (define an exact copy of) each dimension table superprojection on each node.

11.2.3.3 - Designing segmented projections for K-safety

Projections must comply with database K-safety requirements.

Projections must comply with database K-safety requirements. In general, you must create buddy projections for each segmented projection, where the number of buddy projections is K+1. Thus, if system K-safety is set to 1, each projection segment must be duplicated by one buddy; if K-safety is set to 2, each segment must be duplicated by two buddies.

Automatic creation of buddy projections

You can use CREATE PROJECTION so it automatically creates the number of buddy projections required to satisfy K-safety, by including SEGMENTED BY ... ALL NODES. If CREATE PROJECTION specifies K-safety (KSAFE=n), Vertica uses that setting; if the statement omits KSAFE, Vertica uses system K-safety.

In the following example, CREATE PROJECTION creates segmented projection ttt_p1 for table ttt. Because system K-safety is set to 1, Vertica requires a buddy projection for each segmented projection. The CREATE PROJECTION statement omits KSAFE, so Vertica uses system K-safety and creates two buddy projections: ttt_p1_b0 and ttt_p1_b1:

=> SELECT mark_design_ksafe(1);

  mark_design_ksafe
----------------------
 Marked design 1-safe
(1 row)

=> CREATE TABLE ttt (a int, b int);
WARNING 6978:  Table "ttt" will include privileges from schema "public"
CREATE TABLE

=> CREATE PROJECTION ttt_p1 as SELECT * FROM ttt SEGMENTED BY HASH(a) ALL NODES;
CREATE PROJECTION

=> SELECT projection_name from projections WHERE anchor_table_name='ttt';
 projection_name
-----------------
 ttt_p1_b0
 ttt_p1_b1
(2 rows)

Vertica automatically names buddy projections by appending the suffix _bn to the projection base name—for example ttt_p1_b0.

Manual creation of buddy projections

If you create a projection on a single node, and system K-safety is greater than 0, you must manually create the number of buddies required for K-safety. For example, you can create projection xxx_p1 for table xxx on a single node, as follows:

=> CREATE TABLE xxx (a int, b int);
WARNING 6978:  Table "xxx" will include privileges from schema "public"
CREATE TABLE

=> CREATE PROJECTION xxx_p1 AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0001;
CREATE PROJECTION

Because K-safety is set to 1, a single instance of this projection is not K-safe. Attempts to insert data into its anchor table xxx return with an error like this:

=> INSERT INTO xxx VALUES (1, 2);
ERROR 3586:  Insufficient projections to answer query
DETAIL:  No projections that satisfy K-safety found for table xxx
HINT:  Define buddy projections for table xxx

In order to comply with K-safety, you must create a buddy projection for projection xxx_p1. For example:

=> CREATE PROJECTION xxx_p1_buddy AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0002;
CREATE PROJECTION

Table xxx now complies with K-safety and accepts DML statements such as INSERT:

VMart=> INSERT INTO xxx VALUES (1, 2);
 OUTPUT
--------
      1
(1 row)

See also

For general information about segmented projections and buddies, see Segmented projections. For information about designing for K-safety, see Designing for K-safety and Designing for segmentation.

11.2.3.4 - Designing unsegmented projections for K-Safety

In many cases, dimension tables are relatively small, so you do not need to segment them.

In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so projections for its dimension tables are replicated without segmentation on all cluster nodes. You create these projections with a CREATE PROJECTION statement that includes the keywords UNSEGMENTED ALL NODES. These keywords specify to create identical instances of the projection on all cluster nodes.

The following example shows how to create an unsegmented projection for the table store.store_dimension:


=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
             AS SELECT store_key, store_name, store_city, store_state
             FROM store.store_dimension
             UNSEGMENTED ALL NODES;
CREATE PROJECTION

Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj. The keyword ALL NODES specifies to replicate the projection on all nodes:


=> \dj store.store_dimension_proj
                         List of projections
 Schema |         Name         |  Owner  |       Node       | Comment
--------+----------------------+---------+------------------+---------
 store  | store_dimension_proj | dbadmin | v_vmart_node0001 |
 store  | store_dimension_proj | dbadmin | v_vmart_node0002 |
 store  | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)

For more information about projection name conventions, see Projection naming.

11.2.4 - Designing for segmentation

You segment projections using hash segmentation.

You segment projections using hash segmentation. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria could be an excellent choice for hash segmentation.

When segmenting projections, determine which columns to use to segment the projection. Choose one or more columns that have a large number of unique data values and acceptable skew in their data distribution. Primary key columns are an excellent choice for hash segmentation. The columns must be unique across all the tables being used in a query.

11.3 - Design fundamentals

Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point.

Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point. This ensures that you have projections that meet basic requirements.

11.3.1 - Writing and deploying custom projections

Before you write custom projections, review the topics in Planning Your Design carefully.

Before you write custom projections, review the topics in Planning your design carefully. Failure to follow these considerations can result in non-functional projections.

To manually modify or create a projection:

  1. Write a script with CREATE PROJECTION statements to create the desired projections.

  2. Run the script in vsql with the meta-command \i.

  3. For a K-safe database, call Vertica meta-function GET_PROJECTIONS on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.

  4. If you create projections for tables that already contains data, call REFRESH or START_REFRESH to update new projections. Otherwise, these projections are not available for query processing.

  5. Call MAKE_AHM_NOW to set the Ancient History Mark (AHM) to the most recent epoch.

  6. Call DROP PROJECTION on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.

  7. Call ANALYZE_STATISTICS on all database projections:

    => SELECT ANALYZE_STATISTICS ('');
    

    This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.

11.3.2 - Designing superprojections

Superprojections have the following requirements:.

Superprojections have the following requirements:

  • They must contain every column within the table.

  • For a K-safe design, superprojections must either be replicated on all nodes within the database cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very large tables and medium large tables). See Projections and High availability with projections for an overview of projections and how they are stored. See Designing for K-safety for design specifics.

To provide maximum usability, superprojections need to minimize storage requirements while maximizing query performance. To achieve this, the sort order for columns in superprojections is based on storage requirements and commonly used queries.

11.3.3 - Sort order benefits

Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.

Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.

Minimize storage requirements

Minimizing storage saves on physical resources and increases performance by reducing disk I/O. You can minimize projection storage by prioritizing low-cardinality columns in its sort order. This reduces the number of rows Vertica stores and accesses to retrieve query results.

After identifying projection sort columns, analyze their data and choose the most effective encoding method. The Vertica optimizer gives preference to columns with run-length encoding (RLE), so be sure to use it whenever appropriate. Run-length encoding replaces sequences (runs) of identical values with a single pair that contains the value and number of occurrences. Therefore, it is especially appropriate to use it for low-cardinality columns whose run length is large.

Maximize query performance

You can facilitate query performance through column sort order as follows:

  • Where possible, sort order should prioritize columns with the lowest cardinality.

  • Do not sort projections on columns of type LONG VARBINARY and LONG VARCHAR.

See also

Choosing sort order: best practices

11.3.4 - Choosing sort order: best practices

When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

Combine RLE and sort order

When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to minimize storage requirements and maximize query performance.

Suppose you have a students table contain the following values and encoding types:

Column # of Distinct Values Encoded With
gender 2 (M or F) RLE
pass_fail 2 (P or F) RLE
class 4 (freshman, sophomore, junior, or senior) RLE
name 10000 (too many to list) Auto

You might have queries similar to this one:

SELECT name FROM studentsWHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';

The fastest way to access the data is to work through the low-cardinality columns with the smallest number of distinct values before the high-cardinality columns. The following sort order minimizes storage and maximizes query performance for queries that have equality restrictions on gender, class, pass_fail, and name. Specify the ORDER BY clause of the projection as follows:

ORDER BY students.gender, students.pass_fail, students.class, students.name

In this example, the gender column is represented by two RLE entries, the pass_fail column is represented by four entries, and the class column is represented by 16 entries, regardless of the cardinality of the students table. Vertica efficiently finds the set of rows that satisfy all the predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the previous example, put those columns early in the projection sort order, in increasing order of distinct cardinality (that is, in increasing order of the number of distinct values in each column).

If you sort this table with student.class first, you improve the performance of queries that restrict only on the student.class column, and you improve the compression of the student.class column (which contains the largest number of distinct values), but the other columns do not compress as well. Determining which projection is better depends on the specific queries in your workload, and their relative importance.

Storage savings with compression decrease as the cardinality of the column increases; however, storage savings with compression increase as the number of bytes required to store values in that column increases.

Maximize the advantages of RLE

To maximize the advantages of RLE encoding, use it only when the average run length of a column is greater than 10 when sorted. For example, suppose you have a table with the following columns, sorted in order of cardinality from low to high:

address.country, address.region, address.state, address.city, address.zipcode

The zipcode column might not have 10 sorted entries in a row with the same zip code, so there is probably no advantage to run-length encoding that column, and it could make compression worse. But there are likely to be more than 10 countries in a sorted run length, so applying RLE to the country column can improve performance.

Put lower cardinality column first for functional dependencies

In general, put columns that you use for local predicates (as in the previous example) earlier in the join order to make predicate evaluation more efficient. In addition, if a lower cardinality column is uniquely determined by a higher cardinality column (like city_id uniquely determining a state_id), it is always better to put the lower cardinality, functionally determined column earlier in the sort order than the higher cardinality column.

For example, in the following sort order, the Area_Code column is sorted before the Number column in the customer_info table:

ORDER BY = customer_info.Area_Code, customer_info.Number, customer_info.Address

In the query, put the Area_Code column first, so that only the values in the Number column that start with 978 are scanned.

=> SELECT AddressFROM customer_info WHERE Area_Code='978' AND Number='9780123457';

Sort for merge joins

When processing a join, the Vertica optimizer chooses from two algorithms:

  • Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses a merge join, which is faster and uses less memory.

  • Hash join—Using the hash join algorithm, Vertica uses the smaller (inner) joined table to build an in-memory hash table on the join column. A hash join has no sort requirement, but it consumes more memory because Vertica builds a hash table with the values in the inner table. The optimizer chooses a hash join when projections are not sorted on the join columns.

If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join perform faster. Vertica uses the term sort-merge join to refer to the case when at least one of the inputs must be sorted prior to the merge join. Vertica sorts the inner input side but only if the outer input side is already sorted on the join columns.

To give the Vertica query optimizer the option to use an efficient merge join for a particular join, create projections on both sides of the join that put the join column first in their respective projections. This is primarily important to do if both tables are so large that neither table fits into memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously, the benefits of merge join over hash join are sufficiently small that it probably isn't worth creating a projection for any one join column.

Sort on columns in important queries

If you have an important query, one that you run on a regular basis, you can save time by putting the columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort order.

If that query uses a high-cardinality column such as Social Security number, you may sacrifice storage by placing this column early in the sort order of a projection, but your most important query will be optimized.

Sort columns of equal cardinality by size

If you have two columns of equal cardinality, put the column that is larger first in the sort order. For example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.

Sort foreign key columns first, from low to high distinct cardinality

Suppose you have a fact table where the first four columns in the sort order make up a foreign key to another table. For best compression, choose a sort order for the fact table such that the foreign keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the design of projections for fact tables, such as partitioning by a time dimension, if any.

In the following example, the table inventory stores inventory data, and product_key and warehouse_key are foreign keys to the product_dimension and warehouse_dimension tables:

=> CREATE TABLE inventory (
 date_key INTEGER NOT NULL,
 product_key INTEGER NOT NULL,
 warehouse_key INTEGER NOT NULL,
 ...
);
=> ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
   REFERENCES warehouse_dimension(warehouse_key);
ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
   REFERENCES product_dimension(product_key);

The inventory table should be sorted by warehouse_key and then product, since the cardinality of the warehouse_key column is probably lower that the cardinality of the product_key.

11.3.5 - Prioritizing column access speed

If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank.

If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank. For more information, see Setting storage performance.

How columns are ranked

Vertica stores columns included in the projection sort order on the fastest available storage locations. Columns not included in the projection sort order are stored on slower disks. Columns for each projection are ranked as follows:

  • Columns in the sort order are given the highest priority (numbers > 1000).

  • The last column in the sort order is given the rank number 1001.

  • The next-to-last column in the sort order is given the rank number 1002, and so on until the first column in the sort order is given 1000 + # of sort columns.

  • The remaining columns are given numbers from 1000–1, starting with 1000 and decrementing by one per column.

Vertica then stores columns on disk from the highest ranking to the lowest ranking. It places highest-ranking columns on the fastest disks and the lowest-ranking columns on the slowest disks.

Overriding default column ranking

You can modify which columns are stored on fast disks by manually overriding the default ranks for these columns. To accomplish this, set the ACCESSRANK keyword in the column list. Make sure to use an integer that is not already being used for another column. For example, if you want to give a column the fastest access rank, use a number that is significantly higher than 1000 + the number of sort columns. This allows you to enter more columns over time without bumping into the access rank you set.

The following example sets column store_key's access rank to 1500:

CREATE PROJECTION retail_sales_fact_p (
     store_key ENCODING RLE ACCESSRANK 1500,
     pos_transaction_number ENCODING RLE,
     sales_dollar_amount,
     cost_dollar_amount )
AS SELECT
     store_key,
     pos_transaction_number,
     sales_dollar_amount,
     cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;