This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Creating a database design
A is a physical storage plan that optimizes query performance.
A design is a physical storage plan that optimizes query performance. Data in Vertica is physically stored in projections. When you initially load data into a table using INSERT
, COPY
(or COPY LOCAL
), Vertica creates a default superprojection for the table. This superprojection ensures that all of the data is available for queries. However, these superprojections might not optimize database performance, resulting in slow query performance and low data compression.
To improve performance, create a design for your Vertica database that optimizes query performance and data compression. You can create a design in several ways:
Database Designer can help you minimize how much time you spend on manual database tuning. You can also use Database Designer to redesign the database incrementally as requirements such as workloads change over time.
Database Designer runs as a background process. This is useful if you have a large design that you want to run overnight. An active SSH session is not required, so design and deploy operations continue to run uninterrupted if the session ends.
Tip
Vertica recommends that you first globally optimize your database using the Comprehensive setting in Database Designer. If the performance of the comprehensive design is not adequate, you can design custom projections using an incremental design and manually, as described in
Creating custom designs.
1 - About Database Designer
Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.
Vertica Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently.
During the design process, Database Designer analyzes the logical schema definition, sample data, and sample queries, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. This script creates a minimal set of superprojections to ensure K-safety.
In most cases, the projections that Database Designer creates provide excellent query performance within physical constraints while using disk space efficiently.
General design options
When you run Database Designer, several general options are available:
-
Create a comprehensive or incremental design.
-
Optimize for query execution, load, or a balance of both.
-
Require K-safety.
-
Recommend unsegmented projections when feasible.
-
Analyze statistics before creating the design.
Database Designer bases its design on the following information that you provide:
Output
Database Designer yields the following output:
-
A design script that creates the projections for the design in a way that meets the optimization objectives and distributes data uniformly across the cluster.
-
A deployment script that creates and refreshes the projections for your design. For comprehensive designs, the deployment script contains commands that remove non-optimized projections. The deployment script includes the full design script.
-
A backup script that contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design.
Design restrictions
Database Designer-generated designs:
-
Exclude live aggregate or Top-K projections. You must create these manually. See CREATE PROJECTION.
-
Do not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR columns.
Post-design options
While running Database Designer, you can choose to deploy your design automatically after the deployment script is created, or to deploy it manually, after you have reviewed and tested the design. Vertica recommends that you test the design on a non-production server before deploying the design to your production server.
2 - How Database Designer creates a design
Database Designer-generated designs can include the following recommendations:.
Design recommendations
Database Designer-generated designs can include the following recommendations:
-
Sort buddy projections in the same order, which can significantly improve load, recovery, and site node performance. All buddy projections have the same base name so that they can be identified as a group.
Note
If you manually create projections, Database Designer recommends a buddy with the same sort order, if one does not already exist. By default, Database Designer recommends both super and non-super segmented projections with a buddy of the same sort order and segmentation.
-
Accepts unlimited queries for a comprehensive design.
-
Identifies similar design queries and assigns them a signature.
For queries with the same signature, Database Designer weights the queries, depending on how many queries have that signature. It then considers the weighted query when creating a design.
-
Recommends and creates projections in a way that minimizes data skew by distributing data uniformly across the cluster.
-
Produces higher quality designs by considering UPDATE, DELETE, and SELECT statements.
3 - Who can run Database Designer
Two types of users can use Database Designer to create an optimal database design.
Two types of users can use Database Designer to create an optimal database design. DBADMIN users, and users with the DBDUSER role. The topics in this section describe how DBDUSERs can use Database Designer.
3.1 - Granting and enabling the DBDUSER role
For a non-DBADMIN user to be able to run Database Designer using Management Console, follow the steps described in Allowing the DBDUSER to Run Database Designer Using Management Console.
For a non-DBADMIN user to be able to run Database Designer using Management Console, follow the steps described in Allowing the DBDUSER to run Database Designer using Management Console.
For a non-DBADMIN user to be able to run Database Designer programmatically, following the steps described in Allowing the DBDUSER to run Database Designer programmatically.
Important
When you grant the DBDUSER role, make sure to associate a resource pool with that user to manage resources during Database Designer runs. (For instructions about how to associate a resource pool with a user, see User profiles.)
Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. When a user runs Database Designer, either using the Management Console or programmatically, its execution is mostly contained by the user's resource pool, but may spill over into system resource pools for less-intensive tasks.
3.1.1 - Allowing the DBDUSER to run Database Designer using Management Console
To allow a user with the DBDUSER role to run Database Designer using Management Console, you must create the user on the Vertica server.
To allow a user with the DBDUSER role to run Database Designer using Management Console, you must create the user on the Vertica server.
As DBADMIN, take these steps on the server:
-
Add a temporary folder to all cluster nodes.
=> CREATE LOCATION '/tmp/dbd' ALL NODES;
-
Create the user who needs access to Database Designer.
=> CREATE USER new_user;
-
Grant the user the privilege to create schemas on the database for which they want to create a design.
=> GRANT CREATE ON DATABASE new_database TO new_user;
-
Grant the DBDUSER role to the new user.
=> GRANT DBDUSER TO new_user;
-
On all nodes in the cluster, grant the user access to the temporary folder.
=> GRANT ALL ON LOCATION '/tmp/dbd' TO new_user;
-
Grant the new user access to the database schema and its tables.
=> GRANT ALL ON SCHEMA user_schema TO new_user;
=> GRANT ALL ON ALL TABLES IN SCHEMA user_schema TO new_user;
After you have completed this task, map the MC user to new_user
:
-
Log in to Management Console as an MC Super user.
-
Click MC Settings.
-
Click User Management.
-
To create a new MC user, click Add.To use an existing MC user, select the user and click Edit.
-
Next to the DB access level window, click Add.
-
In the Add Permissions window, do the following:
-
From the Choose a database drop-down list, select the database for which you want the user to be able to create a design.
-
In the Database username field, enter the user name you created on the Vertica server, new_user
in this example.
-
In the Database password field, enter the database password.
-
In the Restrict access drop-down list, select the level of MC user you want for this user.
-
Click OK to save your changes.
-
Log out of the MC Super user account.
The MC user is now mapped to the user that you created on the Vertica server. Log in as the MC user and use Database Designer to create an optimized design for your database.
For more information about MC users, see Users in Management Console.
3.1.2 - Allowing the DBDUSER to run Database Designer programmatically
To allow a user with the DBDUSER role to run Database Designer programmatically, take these steps:.
To allow a user with the DBDUSER role to run Database Designer programmatically, take these steps:
-
The DBADMIN user must grant the DBDUSER role:
=> GRANT DBDUSER TO <username>;
This role persists until the DBADMIN user revokes it.
-
For a non-DBADMIN user to run the Database Designer programmatically or using Management Console, one of the following two steps must happen first:
-
If the user's default role is already DBDUSER, skip this step. Otherwise, The user must enable the DBDUSER role:
=> SET ROLE DBDUSER;
-
The DBADMIN must add DBDUSER as the default role for that user:
=> ALTER USER DEFAULT ROLE DBDUSER;
3.2 - DBDUSER capabilities and limitations
The DBDUSER role has the following capabilities and limitations:.
The DBDUSER role has the following capabilities and limitations:
-
A DBDUSER cannot create a design with a K-safety less than the system K-safety. If the designs violate the current K-safety by not having enough buddy projections for the tables, the design does not complete.
-
A DBDUSER cannot explicitly change the ancient history mark (AHM), even during deployment of their design.
When you create a design, you automatically have privileges to manipulate that design. Other tasks may require that the DBDUSER have additional privileges:
To... |
DBDUSER must have... |
Submit design tables |
|
Submit a single design query |
- Privilege to execute the design query
|
Submit a file of design queries |
|
Submit design queries from the result of a user query |
|
Create the design and deployment scripts |
|
4 - Workflow for running Database Designer
Vertica provides three ways to run Database Designer:.
Vertica provides three ways to run Database Designer:
The following workflow is common to all these ways to run Database Designer:
5 - Logging projection data for Database Designer
When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify.
When you run Database Designer, the Optimizer proposes a set of ideal projections based on the options that you specify. When you deploy the design, Database Designer creates the design based on these projections. However, space or budget constraints may prevent Database Designer from creating all the proposed projections. In addition, Database Designer may not be able to implement the projections using ideal criteria.
To get information about the projections, first enable the Database Designer logging capability. When enabled, Database Designer stores information about the proposed projections in two Data Collector tables. After Database Designer deploys the design, these logs contain information about which proposed projections were actually created. After deployment, the logs contain information about:
-
Projections that the Optimizer proposed
-
Projections that Database Designer actually created when the design was deployed
-
Projections that Database Designer created, but not with the ideal criteria that the Optimizer identified.
-
The DDL used to create all the projections
-
Column optimizations
If you do not deploy the design immediately, review the log to determine if you want to make any changes. If the design has been deployed, you can still manually create some of the projections that Database Designer did not create.
To enable the Database Designer logging capability, see Enabling logging for Database Designer.
To view the logged information, see Viewing Database Designer logs.
5.1 - Enabling logging for Database Designer
By default, Database Designer does not log information about the projections that the Optimizer proposed and the Database Designer deploys.
By default, Database Designer does not log information about the projections that the Optimizer proposed and the Database Designer deploys.
To enable Database Designer logging, enter the following command:
=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 1;
To disable Database Designer logging, enter the following command:
=> ALTER DATABASE DEFAULT SET DBDLogInternalDesignProcess = 0;
See also
5.2 - Viewing Database Designer logs
You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:.
You can find data about the projections that Database Designer considered and deployed in two Data Collector tables:
DC_DESIGN_PROJECTION_CANDIDATES
The DC_DESIGN_PROJECTION_CANDIDATES table contains information about all the projections that the Optimizer proposed. This table also includes the DDL that creates them. The is_a_winner
field indicates if that projection was part of the actual deployed design. To view the DC_DESIGN_PROJECTION_CANDIDATES table, enter:
=> SELECT * FROM DC_DESIGN_PROJECTION_CANDIDATES;
DC_DESIGN_QUERY_PROJECTION_CANDIDATES
The DC_DESIGN_QUERY_PROJECTION_CANDIDATES table lists plan features for all design queries.
Possible features are:
For all design queries, the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table includes the following plan feature information:
-
Optimizer path cost.
-
Database Designer benefits.
-
Ideal plan feature and its description, which identifies how the referenced projection should be optimized.
-
If the design was deployed, the actual plan feature and its description is included in the table. This information identifies how the referenced projection was actually optimized.
Because most projections have multiple optimizations, each projection usually has multiple rows.To view the DC_DESIGN_QUERY_PROJECTION_CANDIDATES table, enter:
=> SELECT * FROM DC_DESIGN_QUERY_PROJECTION_CANDIDATES;
To see example data from these tables, see Database Designer logs: example data.
5.3 - Database Designer logs: example data
In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database.
In the following example, Database Designer created the logs after creating a comprehensive design for the VMart sample database. The output shows two records from the DC_DESIGN_PROJECTION_CANDIDATES table.
The first record contains information about the customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$ projection. The record includes the CREATE PROJECTION statement that Database Designer used to create the projection. The is_a_winner
column is t
, indicating that Database Designer created this projection when it deployed the design.
The second record contains information about the product_dimension_dbd_2_sort_$product_version$__$product_key$ projection. For this projection, the is_a_winner
column is f
. The Optimizer recommended that Database Designer create this projection as part of the design. However, Database Designer did not create the projection when it deployed the design. The log includes the DDL for the CREATE PROJECTION statement. If you want to add the projection manually, you can use that DDL. For more information, see Creating a design manually.
=> SELECT * FROM dc_design_projection_candidates;
-[ RECORD 1 ]--------+---------------------------------------------------------------
time | 2014-04-11 06:30:17.918764-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_table_id | 45035996273720620
projection_id | 45035996273726626
iteration_number | 1
projection_name | customer_dimension_dbd_1_sort_$customer_gender$__$annual_income$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."customer_dimension_dbd_1_
sort_$customer_gender$__$annual_income$"
(
customer_key ENCODING AUTO,
customer_type ENCODING AUTO,
customer_name ENCODING AUTO,
customer_gender ENCODING RLE,
title ENCODING AUTO,
household_id ENCODING AUTO,
customer_address ENCODING AUTO,
customer_city ENCODING AUTO,
customer_state ENCODING AUTO,
customer_region ENCODING AUTO,
marital_status ENCODING AUTO,
customer_age ENCODING AUTO,
number_of_children ENCODING AUTO,
annual_income ENCODING AUTO,
occupation ENCODING AUTO,
largest_bill_amount ENCODING AUTO,
store_membership_card ENCODING AUTO,
customer_since ENCODING AUTO,
deal_stage ENCODING AUTO,
deal_size ENCODING AUTO,
last_deal_update ENCODING AUTO
)
AS
SELECT customer_key,
customer_type,
customer_name,
customer_gender,
title,
household_id,
customer_address,
customer_city,
customer_state,
customer_region,
marital_status,
customer_age,
number_of_children,
annual_income,
occupation,
largest_bill_amount,
store_membership_card,
customer_since,
deal_stage,
deal_size,
last_deal_update
FROM public.customer_dimension
ORDER BY customer_gender,
annual_income
UNSEGMENTED ALL NODES;
is_a_winner | t
-[ RECORD 2 ]--------+-------------------------------------------------------------
time | 2014-04-11 06:30:17.961324-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_table_id | 45035996273720624
projection_id | 45035996273726714
iteration_number | 1
projection_name | product_dimension_dbd_2_sort_$product_version$__$product_key$
projection_statement | CREATE PROJECTION v_dbd_sarahtest_sarahtest."product_dimension_dbd_2_
sort_$product_version$__$product_key$"
(
product_key ENCODING AUTO,
product_version ENCODING RLE,
product_description ENCODING AUTO,
sku_number ENCODING AUTO,
category_description ENCODING AUTO,
department_description ENCODING AUTO,
package_type_description ENCODING AUTO,
package_size ENCODING AUTO,
fat_content ENCODING AUTO,
diet_type ENCODING AUTO,
weight ENCODING AUTO,
weight_units_of_measure ENCODING AUTO,
shelf_width ENCODING AUTO,
shelf_height ENCODING AUTO,
shelf_depth ENCODING AUTO,
product_price ENCODING AUTO,
product_cost ENCODING AUTO,
lowest_competitor_price ENCODING AUTO,
highest_competitor_price ENCODING AUTO,
average_competitor_price ENCODING AUTO,
discontinued_flag ENCODING AUTO
)
AS
SELECT product_key,
product_version,
product_description,
sku_number,
category_description,
department_description,
package_type_description,
package_size,
fat_content,
diet_type,
weight,
weight_units_of_measure,
shelf_width,
shelf_height,
shelf_depth,
product_price,
product_cost,
lowest_competitor_price,
highest_competitor_price,
average_competitor_price,
discontinued_flag
FROM public.product_dimension
ORDER BY product_version,
product_key
UNSEGMENTED ALL NODES;
is_a_winner | f
.
.
.
The next example shows the contents of two records in the DC_DESIGN_QUERY_PROJECTION_CANDIDATES. Both of these rows apply to projection id 45035996273726626.
In the first record, the Optimizer recommends that Database Designer optimize the customer_gender
column for the GROUPBY PIPE algorithm.
In the second record, the Optimizer recommends that Database Designer optimize the public.customer_dimension table for late materialization. Late materialization can improve the performance of joins that might spill to disk.
=> SELECT * FROM dc_design_query_projection_candidates;
-[ RECORD 1 ]-----------------+------------------------------------------------------------
time | 2014-04-11 06:30:17.482377-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 3
iteration_number | 1
design_table_id | 45035996273720620
projection_id | 45035996273726626
ideal_plan_feature | GROUP BY PIPE
ideal_plan_feature_description | Group-by pipelined on column(s) customer_gender
dbd_benefits | 5
opt_path_cost | 211
-[ RECORD 2 ]-----------------+------------------------------------------------------------
time | 2014-04-11 06:30:17.48276-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 3
iteration_number | 1
design_table_id | 45035996273720620
projection_id | 45035996273726626
ideal_plan_feature | LATE MATERIALIZATION
ideal_plan_feature_description | Late materialization on table public.customer_dimension
dbd_benefits | 4
opt_path_cost | 669
.
.
.
You can view the actual plan features that Database Designer implemented for the projections it created. To do so, query the V_INTERNAL.DC_DESIGN_QUERY_PROJECTIONS table:
=> select * from v_internal.dc_design_query_projections;
-[ RECORD 1 ]-------------------+-------------------------------------------------------------
time | 2014-04-11 06:31:41.19199-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 1
projection_id | 2
design_table_id | 45035996273720624
actual_plan_feature | RLE PREDICATE
actual_plan_feature_description | RLE on predicate column(s) department_description
dbd_benefits | 2
opt_path_cost | 141
-[ RECORD 2 ]-------------------+-------------------------------------------------------------
time | 2014-04-11 06:31:41.192292-07
node_name | v_vmart_node0001
session_id | localhost.localdoma-931:0x1b7
user_id | 45035996273704962
user_name | dbadmin
design_id | 45035996273705182
design_query_id | 1
projection_id | 2
design_table_id | 45035996273720624
actual_plan_feature | GROUP BY PIPE
actual_plan_feature_description | Group-by pipelined on column(s) fat_content
dbd_benefits | 5
opt_path_cost | 155
6 - Specifying parameters for Database Designer
Before you run Database Designer to create a design, provide information that allows Database Designer to create the optimal physical schema:.
Before you run Database Designer to create a design, provide information that allows Database Designer to create the optimal physical schema:
6.1 - Design name
All designs that Database Designer creates must have a name that you specify.
All designs that Database Designer creates must have a name that you specify. The design name must be alphanumeric or underscore (_) characters, and can be no more than 32 characters long. (Administrative Tools and Management Console limit the design name to 16 characters.)
The design name becomes part of the files that Database Designer generates, including the deployment script, allowing the files to be easily associated with a particular Database Designer run.
6.2 - Design types
The Database Designer can create two distinct design types.
The Database Designer can create two distinct design types. The design you choose depends on what you are trying to accomplish:
6.2.1 - Comprehensive design
A comprehensive design creates an initial or replacement design for all the tables in the specified schemas.
A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database.
To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, Vertica creates an unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it.
Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections.
During a comprehensive design, Database Designer creates deployment scripts that:
-
Create new projections to optimize query performance, only when they do not already exist.
-
Create replacement buddy projections when Database Designer changes the encoding of pre-existing projections that it has decided to keep.
6.2.2 - Incremental design
After you create and deploy a comprehensive database design, it's likely that your database will change over time in various ways.
After you create and deploy a comprehensive database design, it's likely that your database will change over time in various ways. You should periodically consider using Database Designer to create incremental designs that address these changes. Changes that warrant an incremental design can include:
-
Significant data additions or updates
-
New or modified queries that you run regularly
-
Performance issues with one or more queries
-
Schema changes
6.3 - Optimization objectives
When creating a design, Database Designer can optimize the design for one of three objectives:.
When creating a design, Database Designer can optimize the design for one of three objectives:
-
Load: Database Designer creates a design that is optimized for loads, minimizing database size, potentially at the expense of query performance.
-
Performance: Database Designer creates a design that is optimized for fast query performance. Because it recommends a design for optimized query performance, this design might recommend more than the Load or Balanced objectives, potentially resulting in a larger database storage size.
-
Balanced: Database Designer creates a design whose objectives are balanced between database size and query performance.
A fully optimized query has an optimization ratio of 0.99. Optimization ratio is the ratio of a query's benefits achieved in the design produced by the Database Designer to that achieved in the ideal plan. Check the optimization ratio with the OptRatio parameter in designer.log.
6.4 - Design tables with sample data
You must specify one or more design tables for Database Designer to deploy a design.
You must specify one or more design tables for Database Designer to deploy a design. If your schema is empty, it does not appear as a design table option.
When you specify design tables, consider the following:
-
To create the most efficient projections for your database, load a moderate amount of representative data into tables before running Database Designer. Database Designer considers the data in this table when creating the design.
-
If your design tables have a large amount if data, the Database Designer run takes a long time; if your tables have too little data, the design is not optimized. Vertica recommends that 10 GB of sample data is sufficient for creating an optimal design.
-
If you submit a design table with no data, Database Designer ignores it.
-
If one of your design tables has been dropped, you will not be able to build or deploy your design.
6.5 - Design queries
If you supply representative queries that you run on your database to Database Designer, it optimizes the performance of those queries.
If you supply representative queries that you run on your database to Database Designer, it optimizes the performance of those queries.
Database Designer checks the validity of all queries when you add them to your design and again when it builds the design. If a query is invalid, Database Designer ignores it.
The query file can contain up to 100 queries. Each query can be assigned a weight that indicates its relative importance so that Database Designer can prioritize it when creating the design. Database Designer groups queries that affect the design that Database Designer creates in the same way and considers one weighted query when creating a design.
The following options apply, depending on whether you create an incremental or comprehensive design:
-
Design queries are required for incremental designs.
-
Design queries are optional for comprehensive designs. If you do not provide design queries, Database Designer recommends a generic design that does not consider specific queries.
Query repository
Using Management Console, you can submit design queries from the QUERY_REQUESTS system table. This is called the query repository.
The QUERY_REQUESTS table contains queries that users have run recently. For a comprehensive design, you can submit up to 200 queries from the QUERY_REQUESTS table to Database Designer to be considered when creating the design. For an incremental design, you can submit up to 100 queries from the QUERY_REQUESTS table.
6.6 - Replicated and segmented projections
When creating a comprehensive design, Database Designer creates projections based on data statistics and queries.
When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or replicated (duplicated on all cluster nodes).
For detailed information, see the following sections:
6.6.1 - Replicated projections
occurs when Vertica stores identical copies of data across all the nodes in your cluster.
Replication occurs when Vertica stores identical copies of data across all the nodes in your cluster.
Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends that a projection be replicated if any of the following conditions is true:
-
largest-row-count < 1,000,000 and number of rows in the table <= 10% of largest-row-count
-
largest-row-count >= 10,000,000 and number of rows in the table <= 1% of largest-row-count
-
The number of rows in the table <= 100,000
For more information about replication, see High availability with projections.
6.6.2 - Segmented projections
occurs when Vertica distributes data evenly across multiple database nodes so that all nodes participate in query execution.
Segmentation occurs when Vertica distributes data evenly across multiple database nodes so that all nodes participate in query execution. Projection segmentation provides high availability and recovery, and optimizes query execution.
When running Database Designer programmatically or using Management Console, you can specify to allow Database Designer to recommend unsegmented projections in the design. If you do not specify this, Database Designer recommends only segmented projections.
Database Designer recommends segmented superprojections for large tables when deploying to multiple node clusters, and recommends replicated superprojections for smaller tables.
Database Designer does not segment projections on:
For more information about segmentation, see High availability with projections.
6.7 - Statistics analysis
By default, Database Designer analyzes statistics for the design tables when adding them to the design.
By default, Database Designer analyzes statistics for the design tables when adding them to the design. This option is optional, but Vertica recommends that you analyze statistics because accurate statistics help Database Designer optimize compression and query performance.
Analyzing statistics takes time and resources. If the current statistics for the design tables are up to date, do not bother analyzing the statistics. When in doubt, analyze the statistics to make sure they are current.
For more information, see Collecting Statistics.
7 - Building a design
After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.
After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design.
Note
You cannot stop a running database if Database Designer is building a database design.
When you build a database design, Vertica generates two scripts:
-
Deployment script: design-name
_deploy.sql—Contains the SQL statements that create projections for the design you are deploying, deploy the design, and drop unused projections. When the deployment script runs, it creates the optimized design. For details about how to run this script and deploy the design, see Deploying a Design.
-
Design script: design-name
_design.sql—Contains the CREATE PROJECTION
statements that Database Designeruses to create the design. Review this script to make sure you are happy with the design.
The design script is a subset of the deployment script. It serves as a backup of the DDL for the projections that the deployment script creates.
When you create a design using Management Console:
-
If you submit a large number of queries to your design and build it right immediately, a timing issue could cause the queries not to load before deployment starts. If this occurs, you might see one of the following errors:
To accommodate this timing issue, you may need to reset the design, check the Queries tab to make sure the queries have been loaded, and then rebuild the design. Detailed instructions are in:
-
The scripts are deleted when deployment completes. To save a copy of the deployment script after the design is built but before the deployment completes, go to the Output window and copy and paste the SQL statements to a file.
8 - Resetting a design
You must reset a design when:.
You must reset a design when:
-
You build a design and the output scripts described in Building a Design are not created.
-
You build a design but Database Designer cannot complete the design because the queries it expects are not loaded.
Resetting a design discards all the run-specific information of the previous Database Designer build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables and queries.
After you reset a design, review the design to see what changes you need to make. For example, you can fix errors, change parameters, or check for and add additional tables or queries. Then you can rebuild the design.
You can only reset a design in Management Console or by using the DESIGNER_RESET_DESIGN function.
9 - Deploying a design
After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.
After running Database Designer to generate a deployment script, Vertica recommends that you test your design on a non-production server before you deploy it to your production server.
Both the design and deployment processes run in the background. This is useful if you have a large design that you want to run overnight. Because an active SSH session is not required, the design/deploy operations continue to run uninterrupted, even if the session is terminated.
Note
You cannot stop a running database if Database Designer is building or deploying a database design.
Database Designer runs as a background process. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. However, if multiple users are deploying a design on the same tables at the same time, Database Designer may not be able to complete the deployment. To avoid problems, consider the following:
-
Schedule potentially conflicting Database Designer processes to run sequentially overnight so that there are no concurrency problems.
-
Avoid scheduling Database Designer runs on the same set of tables at the same time.
There are two ways to deploy your design:
9.1 - Deploying designs using Database Designer
OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.
OpenText recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database.
If you choose to allow Database Designer to automatically deploy your script during a comprehensive design and are running Administrative Tools, Database Designer creates a backup script of your database's current design. This script helps you re-create the design of projections that may have been dropped by the new design. The backup script is located in the output directory you specified during the design process.
If you choose not to have Database Designer automatically run the deployment script (for example, if you want to maintain projections from a pre-existing deployment), you can manually run the deployment script later. See Deploying designs manually.
To deploy a design while running Database Designer, do one of the following:
-
In Management Console, select the design and click Deploy Design.
-
In the Administration Tools, select Deploy design in the Design Options window.
If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy
parameter to 'true'.
Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps that the deployment took:
vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;
9.2 - Deploying designs manually
If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:.
If you choose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script:
-
Make sure that the target database contains the same tables and projections as the database where you ran Database Designer. The database should also contain sample data.
-
To deploy the projections to a test or production environment, execute the deployment script in vsql with the meta-command
\i
as follows, where design-name
is the name of the database design:
=> \i design-name_deploy.sql
-
For a K-safe database, call Vertica meta-function
GET_PROJECTIONS
on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.
-
If you create projections for tables that already contains data, call
REFRESH
or
START_REFRESH
to update new projections. Otherwise, these projections are not available for query processing.
-
Call
MAKE_AHM_NOW
to set the Ancient History Mark (AHM) to the most recent epoch.
-
Call
DROP PROJECTION
on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.
-
Call
ANALYZE_STATISTICS
on all database projections:
=> SELECT ANALYZE_STATISTICS ('');
This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.
10 - How to create a design
There are three ways to create a design using Database Designer:.
There are three ways to create a design using Database Designer:
The following table shows what Database Designer capabilities are available in each tool:
Database Designer Capability |
Management Console |
Running Database Designer Programmatically |
Administrative Tools |
Create design |
Yes |
Yes |
Yes |
Design name length (# of characters) |
16 |
32 |
16 |
Build design (create design and deployment scripts) |
Yes |
Yes |
Yes |
Create backup script |
|
|
Yes |
Set design type (comprehensive or incremental) |
Yes |
Yes |
Yes |
Set optimization objective |
Yes |
Yes |
Yes |
Add design tables |
Yes |
Yes |
Yes |
Add design queries file |
Yes |
Yes |
Yes |
Add single design query |
|
Yes |
|
Use query repository |
Yes |
Yes |
|
Set K-safety |
Yes |
Yes |
Yes |
Analyze statistics |
Yes |
Yes |
Yes |
Require all unsegmented projections |
Yes |
Yes |
|
View event history |
Yes |
Yes |
|
Set correlation analysis mode (Default = 0) |
|
Yes |
|
10.1 - Using administration tools to create a design
To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user.
To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user. Follow these steps:
-
Log in as the dbadmin user and start Administration Tools.
-
From the main menu, start the database for which you want to create a design. The database must be running before you can create a design for it.
-
On the main menu, select Configuration Menu and click OK.
-
On the Configuration Menu, select Run Database Designer and click OK.
-
On the Select a database to design window, enter the name of the database for which you are creating a design and click OK.
-
On the Enter the directory for Database Designer output window, enter the full path to the directory to contain the design script, deployment script, backup script, and log files, and click OK.
For information about the scripts, see Building a design.
-
On the Database Designer window, enter a name for the design and click OK.
For more information about design names, see Design name.
-
On the Design Type window, choose which type of design to create and click OK.
For a description of the design types, see Design types
-
The Select schema(s) to add to query search path window lists all the schemas in the database that you selected. Select the schemas that contain representative data that you want Database Designer to consider when creating the design and click OK.
For more information about choosing schema and tables to submit to Database Designer, see Design tables with sample data.
-
On the Optimization Objectives window, select the objective you want for the database optimization:
For details about these objectives, see Optimization objectives.
-
The final window summarizes the choices you have made and offers you two choices:
-
Proceed with building the design, and deploying it if you specified to deploy it immediately. If you did not specify to deploy, you can review the design and deployment scripts and deploy them manually, as described in Deploying designs manually.
-
Cancel the design and go back to change some of the parameters as needed.
-
Creating a design can take a long time.To cancel a running design from the Administration Tools window, enter Ctrl+C.
To create a design for the VMart example database, see Using Database Designer to create a comprehensive design in Getting Started.
11 - Running Database Designer programmatically
Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality.
Vertica provides a set of meta-functions that enable programmatic access to Database Designer functionality. Run Database Designer programmatically to perform the following tasks:
-
Optimize performance on tables that you own.
-
Create or update a design without requiring superuser or DBADMIN intervention.
-
Add individual queries and tables, or add data to your design, and then rerun Database Designer to update the design based on this new information.
-
Customize the design.
-
Use recently executed queries to set up your database to run Database Designer automatically on a regular basis.
-
Assign each design query a query weight that indicates the importance of that query in creating the design. Assign a higher weight to queries that you run frequently so that Database Designer prioritizes those queries in creating the design.
For more details about Database Designer functions, see Database Designer function categories.
11.1 - Database Designer function categories
Database Designer functions perform the following operations, generally performed in the following order:
-
Create a design.
-
Set design properties.
-
Populate a design.
-
Create design and deployment scripts.
-
Get design data.
-
Clean up.
Important
You can also use meta-function
DESIGNER_SINGLE_RUN, which encapsulates all of these steps with a single call. The meta-function iterates over all queries within a specified timespan, and returns with a design ready for deployment.
For detailed information, see Workflow for running Database Designer programmatically. For information on required privileges, see Privileges for running Database Designer functions
Caution
Before running Database Designer functions on an existing schema, back up the current design by calling
EXPORT_CATALOG.
Create a design
DESIGNER_CREATE_DESIGN directs Database Designer to create a design.
Set design properties
The following functions let you specify design properties:
Populate a design
The following functions let you add tables and queries to your Database Designer design:
Create design and deployment scripts
The following functions populate the Database Designer workspace and create design and deployment scripts. You can also analyze statistics, deploy the design automatically, and drop the workspace after the deployment:
Reset a design
DESIGNER_RESET_DESIGN discards all the run-specific information of the previous Database Designer build or deployment of the specified design but retains its configuration.
Get design data
The following functions display information about projections and scripts that the Database Designer created:
Clean up
The following functions cancel any running Database Designer operation or drop a Database Designer design and all its contents:
11.2 - Workflow for running Database Designer programmatically
The following example shows the steps you take to create a design by running Database Designer programmatically.
The following example shows the steps you take to create a design by running Database Designer programmatically.
Important
Before running Database Designer functions on an existing schema, back up the current design by calling function
EXPORT_CATALOG.
Before you run this example, you should have the DBDUSER role, and you should have enabled that role using the SET ROLE DBDUSER command:
-
Create a table in the public schema:
=> CREATE TABLE T(
x INT,
y INT,
z INT,
u INT,
v INT,
w INT PRIMARY KEY
);
-
Add data to the table:
\! perl -e 'for ($i=0; $i<100000; ++$i) {printf("%d, %d, %d, %d, %d, %d\n", $i/10000, $i/100, $i/10, $i/2, $i, $i);}'
| vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"
-
Create a second table in the public schema:
=> CREATE TABLE T2(
x INT,
y INT,
z INT,
u INT,
v INT,
w INT PRIMARY KEY
);
-
Copy the data from table T1
to table T2
and commit the changes:
=> INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
=> COMMIT;
-
Create a new design:
=> SELECT DESIGNER_CREATE_DESIGN('my_design');
This command adds information to the DESIGNS system table in the V_MONITOR schema.
-
Add tables from the public schema to the design :
=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
=> SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');
These commands add information to the DESIGN_TABLES system table.
-
Create a file named queries.txt
in /tmp/examples
, or another directory where you have READ and WRITE privileges. Add the following two queries in that file and save it. Database Designer uses these queries to create the design:
SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
SELECT DISTINCT w FROM T;
-
Add the queries file to the design and display the results—the numbers of accepted queries, non-design queries, and unoptimizable queries:
=> SELECT DESIGNER_ADD_DESIGN_QUERIES
('my_design',
'/tmp/examples/queries.txt',
'true'
);
The results show that both queries were accepted:
Number of accepted queries =2
Number of queries referencing non-design tables =0
Number of unsupported queries =0
Number of illegal queries =0
The DESIGNER_ADD_DESIGN_QUERIES function populates the DESIGN_QUERIES system table.
-
Set the design type to comprehensive. (This is the default.) A comprehensive design creates an initial or replacement design for all the design tables:
=> SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
-
Set the optimization objective to query. This setting creates a design that focuses on faster query performance, which might recommend additional projections. These projections could result in a larger database storage footprint:
=> SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');
-
Create the design and save the design and deployment scripts in /tmp/examples
, or another directory where you have READ and WRITE privileges. The following command:
-
Analyzes statistics
-
Doesn't deploy the design.
-
Doesn't drop the design after deployment.
-
Stops if it encounters an error.
=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
('my_design',
'/tmp/examples/my_design_projections.sql',
'/tmp/examples/my_design_deploy.sql',
'True',
'False',
'False',
'False'
);
This command adds information to the following system tables:
-
Examine the status of the Database Designer run to see what projections Database Designer recommends. In the deployment_projection_name
column:
-
rep
indicates a replicated projection
-
super
indicates a superprojection
The deployment_status
column is pending
because the design has not yet been deployed.
For this example, Database Designer recommends four projections:
=> \x
Expanded display is on.
=> SELECT * FROM OUTPUT_DEPLOYMENT_STATUS;
-[ RECORD 1 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 1
deployment_projection_name | T_DBD_1_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 2 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 2
deployment_projection_name | T2_DBD_2_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 3 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 3
deployment_projection_name | T_super
deployment_status | pending
error_message | N/A
-[ RECORD 4 ]--------------+-----------------------------
deployment_id | 45035996273795970
deployment_projection_id | 4
deployment_projection_name | T2_super
deployment_status | pending
error_message | N/A
-
View the script /tmp/examples/my_design_deploy.sql
to see how these projections are created when you run the deployment script. In this example, the script also assigns the encoding schemes RLE and COMMONDELTA_COMP to columns where appropriate.
-
Deploy the design from the directory where you saved it:
=> \i /tmp/examples/my_design_deploy.sql
-
Now that the design is deployed, delete the design:
=> SELECT DESIGNER_DROP_DESIGN('my_design');
11.3 - Privileges for running Database Designer functions
Non-DBADMIN users with the DBDUSER role can run Database Designer functions.
Non-DBADMIN users with the DBDUSER role can run Database Designer functions. Two steps are required to enable users to run these functions:
-
A DBADMIN or superuser grants the user the DBDUSER role:
=> GRANT DBDUSER TO username;
This role persists until the DBADMIN revokes it.
-
Before the DBDUSER can run Database Designer functions, one of the following must occur:
-
The user enables the DBDUSER role:
=> SET ROLE DBDUSER;
-
The superuser sets the user's default role to DBDUSER:
=> ALTER USER username DEFAULT ROLE DBDUSER;
General DBDUSER limitations
As a DBDUSER, the following restrictions apply:
-
You can set a design's K-safety to a value less than or equal to system K-safety. You cannot change system K-safety.
-
You cannot explicitly change the ancient history mark (AHM), even during design deployment.
Design dependencies and privileges
Individual design tasks are likely to have dependencies that require specific privileges:
Task |
Required privileges |
Add tables to a design |
|
Add a single design query to the design |
- Privilege to execute the design query
|
Add a query file to the design |
|
Add queries from the result of a user query to the design |
|
Create design and deployment scripts |
|
11.4 - Resource pool for Database Designer users
When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs.
When you grant a user the DBDUSER role, be sure to associate a resource pool with that user to manage resources during Database Designer runs. This allows multiple users to run Database Designer concurrently without interfering with each other or using up all cluster resources.
Note
When a user runs Database Designer, execution is mostly contained in the user's resource pool. However, Vertica might also use other system resource pools to perform less-intensive tasks.
12 - Creating custom designs
Vertica strongly recommends that you use the physical schema design produced by , which provides , excellent query performance, and efficient use of storage space.
Vertica strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space. If any queries run less as efficiently than you expect, consider using the Database Designer incremental design process to optimize the database design for the query.
If the projections created by Database Designer still do not meet your needs, you can write custom projections, from scratch or based on projection designs created by Database Designer.
If you are unfamiliar with writing custom projections, start by modifying an existing design generated by Database Designer.
12.1 - Custom design process
To create a custom design or customize an existing one:.
To create a custom design or customize an existing one:
-
Plan the new design or modifications to an existing one. See Planning your design.
-
Create or modify projections. See Design fundamentals and CREATE PROJECTION for more detail.
-
Deploy projections to a test environment. See Writing and deploying custom projections.
-
Test and modify projections as needed.
-
After you finalize the design, deploy projections to the production environment.
12.2 - Planning your design
The syntax for creating a design is easy for anyone who is familiar with SQL.
The syntax for creating a design is easy for anyone who is familiar with SQL. As with any successful project, however, a successful design requires some initial planning. Before you create your first design:
-
Become familiar with standard design requirements and plan your design to include them. See Design requirements.
-
Determine how many projections you need to include in the design. See Determining the number of projections to use.
-
Determine the type of compression and encoding to use for columns. See Architecture.
-
Determine whether or not you want the database to be K-safe. Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and 2. See Designing for K-safety.
12.2.1 - Design requirements
A physical schema design is a script that contains CREATE PROJECTION statements.
A physical schema design is a script that contains CREATE PROJECTION statements. These statements determine which columns are included in projections and how they are optimized.
If you use Database Designer as a starting point, it automatically creates designs that meet all fundamental design requirements. If you intend to create or modify designs manually, be aware that all designs must meet the following requirements:
-
Every design must create at least one superprojection for every table in the database that is used by the client application. These projections provide complete coverage that enables users to perform ad-hoc queries as needed. They can contain joins and they are usually configured to maximize performance through sort order, compression, and encoding.
-
Query-specific projections are optional. If you are satisfied with the performance provided through superprojections, you do not need to create additional projections. However, you can maximize performance by tuning for specific query work loads.
-
Vertica recommends that all production databases have a minimum K-safety of one (K=1) to support high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High availability with projections and Designing for K-safety.
-
Vertica recommends that if you have more than 20 nodes, but small tables, do not create replicated projections. If you create replicated projections, the catalog becomes very large and performance may degrade. Instead, consider segmenting those projections.
12.2.2 - Determining the number of projections to use
In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding.
In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding. This is especially true if the sort orders for the projections have been used to maximize performance for one or more query predicates (WHERE clauses).
However, you might want to add additional query-specific projections to increase the performance of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The number of additional projections (and their buddies) that you create should be determined by:
-
Your organization's needs
-
The amount of disk space you have available on each node in the cluster
-
The amount of time available for loading data into the database
As the number of projections that are tuned for specific queries increases, the performance of these queries improves. However, the amount of disk space used and the amount of time required to load data increases as well. Therefore, you should create and test designs to determine the optimum number of projections for your database configuration. On average, organizations that choose to implement query-specific projections achieve optimal performance through the addition of a few query-specific projections.
12.2.3 - Designing for K-safety
Vertica recommends that all production databases have a minimum K-safety of one (K=1).
Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 and 2. Non-production databases do not have to be K-safe and can be set to 0.
A K-safe database must have at least three nodes, as shown in the following table:
K-safety level |
Number of required nodes |
1 |
3+ |
2 |
5+ |
Note
Vertica only supports K-safety levels 1 and 2.
You can set K-safety to 1 or 2 only when the physical schema design meets certain redundancy requirements. See Requirements for a K-safe physical schema design.
Using Database Designer
To create designs that are K-safe, Vertica recommends that you use the Database Designer. When creating projections with Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with a K-safety level. Database Designer creates a script that uses the
MARK_DESIGN_KSAFE
function to set the K-safety of the physical schema to 1. For example:
=> \i VMart_Schema_design_opt_1.sql
CREATE PROJECTION
CREATE PROJECTION
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)
By default, Vertica creates K-safe superprojections when database K-safety is greater than 0.
Monitoring K-safety
Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by querying the
SYSTEM
table for settings in columns DESIGNED_FAULT_TOLERANCE
and CURRENT_FAULT_TOLERANCE
.
Loss of K-safety
When K nodes in your cluster fail, your database continues to run, although performance is affected. Further node failures could potentially cause the database to shut down if the failed node's data is not available from another functioning node in the cluster.
See also
K-safety in an Enterprise Mode database
12.2.3.1 - Requirements for a K-safe physical schema design
Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes.
Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0. You can modify a design created for a three-node (or greater) cluster, and the K-safe requirements are already set.
If you create custom projections, your physical schema design must meet the following requirements to be able to successfully recover the database in the event of a failure:
You can use the
MARK_DESIGN_KSAFE
function to find out whether your schema design meets requirements for K-safety.
12.2.3.2 - Requirements for a physical schema design with no K-safety
If you use Database Designer to generate an comprehensive design that you can modify and you do not want the design to be K-safe, set K-safety level to 0 (zero).
If you use Database Designer to generate an comprehensive design that you can modify and you do not want the design to be K-safe, set K-safety level to 0 (zero).
If you want to start from scratch, do the following to establish minimal projection requirements for a functioning database with no K-safety (K=0):
-
Define at least one superprojection for each table in the logical schema.
-
Replicate (define an exact copy of) each dimension table superprojection on each node.
12.2.3.3 - Designing segmented projections for K-safety
Projections must comply with database K-safety requirements.
Projections must comply with database K-safety requirements. In general, you must create buddy projections for each segmented projection, where the number of buddy projections is K+1. Thus, if system K-safety is set to 1, each projection segment must be duplicated by one buddy; if K-safety is set to 2, each segment must be duplicated by two buddies.
Automatic creation of buddy projections
You can use
CREATE PROJECTION
so it automatically creates the number of buddy projections required to satisfy K-safety, by including SEGMENTED BY ... ALL NODES
. If CREATE PROJECTION
specifies K-safety (KSAFE=
n
)
, Vertica uses that setting; if the statement omits KSAFE
, Vertica uses system K-safety.
In the following example, CREATE PROJECTION
creates segmented projection ttt_p1
for table ttt
. Because system K-safety is set to 1, Vertica requires a buddy projection for each segmented projection. The CREATE PROJECTION
statement omits KSAFE
, so Vertica uses system K-safety and creates two buddy projections: ttt_p1_b0
and ttt_p1_b1
:
=> SELECT mark_design_ksafe(1);
mark_design_ksafe
----------------------
Marked design 1-safe
(1 row)
=> CREATE TABLE ttt (a int, b int);
WARNING 6978: Table "ttt" will include privileges from schema "public"
CREATE TABLE
=> CREATE PROJECTION ttt_p1 as SELECT * FROM ttt SEGMENTED BY HASH(a) ALL NODES;
CREATE PROJECTION
=> SELECT projection_name from projections WHERE anchor_table_name='ttt';
projection_name
-----------------
ttt_p1_b0
ttt_p1_b1
(2 rows)
Vertica automatically names buddy projections by appending the suffix _b
n
to the projection base name—for example ttt_p1_b0
.
Manual creation of buddy projections
If you create a projection on a single node, and system K-safety is greater than 0, you must manually create the number of buddies required for K-safety. For example, you can create projection xxx_p1
for table xxx
on a single node, as follows:
=> CREATE TABLE xxx (a int, b int);
WARNING 6978: Table "xxx" will include privileges from schema "public"
CREATE TABLE
=> CREATE PROJECTION xxx_p1 AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0001;
CREATE PROJECTION
Because K-safety is set to 1, a single instance of this projection is not K-safe. Attempts to insert data into its anchor table xxx
return with an error like this:
=> INSERT INTO xxx VALUES (1, 2);
ERROR 3586: Insufficient projections to answer query
DETAIL: No projections that satisfy K-safety found for table xxx
HINT: Define buddy projections for table xxx
In order to comply with K-safety, you must create a buddy projection for projection xxx_p1
. For example:
=> CREATE PROJECTION xxx_p1_buddy AS SELECT * FROM xxx SEGMENTED BY HASH(a) NODES v_vmart_node0002;
CREATE PROJECTION
Table xxx
now complies with K-safety and accepts DML statements such as INSERT
:
VMart=> INSERT INTO xxx VALUES (1, 2);
OUTPUT
--------
1
(1 row)
See also
For general information about segmented projections and buddies, see Segmented projections. For information about designing for K-safety, see Designing for K-safety and Designing for segmentation.
12.2.3.4 - Designing unsegmented projections for K‑Safety
In many cases, dimension tables are relatively small, so you do not need to segment them.
In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so projections for its dimension tables are replicated without segmentation on all cluster nodes. You create these projections with a
CREATE PROJECTION
statement that includes the keywords UNSEGMENTED ALL NODES
. These keywords specify to create identical instances of the projection on all cluster nodes.
The following example shows how to create an unsegmented projection for the table store.store_dimension
:
=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
AS SELECT store_key, store_name, store_city, store_state
FROM store.store_dimension
UNSEGMENTED ALL NODES;
CREATE PROJECTION
Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj
. The keyword ALL NODES
specifies to replicate the projection on all nodes:
=> \dj store.store_dimension_proj
List of projections
Schema | Name | Owner | Node | Comment
--------+----------------------+---------+------------------+---------
store | store_dimension_proj | dbadmin | v_vmart_node0001 |
store | store_dimension_proj | dbadmin | v_vmart_node0002 |
store | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)
For more information about projection name conventions, see Projection naming.
12.2.4 - Designing for segmentation
You segment projections using hash segmentation.
You segment projections using hash segmentation. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria could be an excellent choice for hash segmentation.
Note
For detailed information about using hash segmentation in a projection, see
CREATE PROJECTION in the SQL Reference Manual.
When segmenting projections, determine which columns to use to segment the projection. Choose one or more columns that have a large number of unique data values and acceptable skew in their data distribution. Primary key columns are an excellent choice for hash segmentation. The columns must be unique across all the tables being used in a query.
12.3 - Design fundamentals
Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point.
Although you can write custom projections from scratch, Vertica recommends that you use Database Designer to create a design to use as a starting point. This ensures that you have projections that meet basic requirements.
12.3.1 - Writing and deploying custom projections
Before you write custom projections, review the topics in Planning Your Design carefully.
Before you write custom projections, review the topics in Planning your design carefully. Failure to follow these considerations can result in non-functional projections.
To manually modify or create a projection:
-
Write a script with
CREATE PROJECTION
statements to create the desired projections.
-
Run the script in vsql with the meta-command
\i
.
Note
You must have a database loaded with a logical schema.
-
For a K-safe database, call Vertica meta-function
GET_PROJECTIONS
on tables of the new projections. Check the output to verify that all projections have enough buddies to be identified as safe.
-
If you create projections for tables that already contains data, call
REFRESH
or
START_REFRESH
to update new projections. Otherwise, these projections are not available for query processing.
-
Call
MAKE_AHM_NOW
to set the Ancient History Mark (AHM) to the most recent epoch.
-
Call
DROP PROJECTION
on projections that are no longer needed, and would otherwise waste disk space and reduce load speed.
-
Call
ANALYZE_STATISTICS
on all database projections:
=> SELECT ANALYZE_STATISTICS ('');
This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog.
12.3.2 - Designing superprojections
Superprojections have the following requirements:.
Superprojections have the following requirements:
-
They must contain every column within the table.
-
For a K-safe design, superprojections must either be replicated on all nodes within the database cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very large tables and medium large tables). See Projections and High availability with projections for an overview of projections and how they are stored. See Designing for K-safety for design specifics.
To provide maximum usability, superprojections need to minimize storage requirements while maximizing query performance. To achieve this, the sort order for columns in superprojections is based on storage requirements and commonly used queries.
12.3.3 - Sort order benefits
Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.
Column sort order is an important factor in minimizing storage requirements, and maximizing query performance.
Minimize storage requirements
Minimizing storage saves on physical resources and increases performance by reducing disk I/O. You can minimize projection storage by prioritizing low-cardinality columns in its sort order. This reduces the number of rows Vertica stores and accesses to retrieve query results.
After identifying projection sort columns, analyze their data and choose the most effective encoding method. The Vertica optimizer gives preference to columns with run-length encoding (RLE), so be sure to use it whenever appropriate. Run-length encoding replaces sequences (runs) of identical values with a single pair that contains the value and number of occurrences. Therefore, it is especially appropriate to use it for low-cardinality columns whose run length is large.
You can facilitate query performance through column sort order as follows:
-
Where possible, sort order should prioritize columns with the lowest cardinality.
-
Do not sort projections on columns of type LONG VARBINARY and LONG VARCHAR.
See also
Choosing sort order: best practices
12.3.4 - Choosing sort order: best practices
When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.
When choosing sort orders for your projections, Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.
Combine RLE and sort order
When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to minimize storage requirements and maximize query performance.
Suppose you have a students
table contain the following values and encoding types:
Column |
# of Distinct Values |
Encoded With |
gender |
2 (M or F) |
RLE |
pass_fail |
2 (P or F) |
RLE |
class |
4 (freshman, sophomore, junior, or senior) |
RLE |
name |
10000 (too many to list) |
Auto |
You might have queries similar to this one:
SELECT name FROM studentsWHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';
The fastest way to access the data is to work through the low-cardinality columns with the smallest number of distinct values before the high-cardinality columns. The following sort order minimizes storage and maximizes query performance for queries that have equality restrictions on gender
, class
, pass_fail
, and name
. Specify the ORDER BY clause of the projection as follows:
ORDER BY students.gender, students.pass_fail, students.class, students.name
In this example, the gender
column is represented by two RLE entries, the pass_fail
column is represented by four entries, and the class
column is represented by 16 entries, regardless of the cardinality of the students
table. Vertica efficiently finds the set of rows that satisfy all the predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the previous example, put those columns early in the projection sort order, in increasing order of distinct cardinality (that is, in increasing order of the number of distinct values in each column).
If you sort this table with student.class
first, you improve the performance of queries that restrict only on the student.class
column, and you improve the compression of the student.class
column (which contains the largest number of distinct values), but the other columns do not compress as well. Determining which projection is better depends on the specific queries in your workload, and their relative importance.
Storage savings with compression decrease as the cardinality of the column increases; however, storage savings with compression increase as the number of bytes required to store values in that column increases.
Maximize the advantages of RLE
To maximize the advantages of RLE encoding, use it only when the average run length of a column is greater than 10 when sorted. For example, suppose you have a table with the following columns, sorted in order of cardinality from low to high:
address.country, address.region, address.state, address.city, address.zipcode
The zipcode
column might not have 10 sorted entries in a row with the same zip code, so there is probably no advantage to run-length encoding that column, and it could make compression worse. But there are likely to be more than 10 countries in a sorted run length, so applying RLE to the country column can improve performance.
Put lower cardinality column first for functional dependencies
In general, put columns that you use for local predicates (as in the previous example) earlier in the join order to make predicate evaluation more efficient. In addition, if a lower cardinality column is uniquely determined by a higher cardinality column (like city_id uniquely determining a state_id), it is always better to put the lower cardinality, functionally determined column earlier in the sort order than the higher cardinality column.
For example, in the following sort order, the Area_Code column is sorted before the Number column in the customer_info table:
ORDER BY = customer_info.Area_Code, customer_info.Number, customer_info.Address
In the query, put the Area_Code
column first, so that only the values in the Number
column that start with 978 are scanned.
=> SELECT AddressFROM customer_info WHERE Area_Code='978' AND Number='9780123457';
Sort for merge joins
When processing a join, the Vertica optimizer chooses from two algorithms:
-
Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses a merge join, which is faster and uses less memory.
-
Hash join—Using the hash join algorithm, Vertica uses the smaller (inner) joined table to build an in-memory hash table on the join column. A hash join has no sort requirement, but it consumes more memory because Vertica builds a hash table with the values in the inner table. The optimizer chooses a hash join when projections are not sorted on the join columns.
If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join perform faster. Vertica uses the term sort-merge join to refer to the case when at least one of the inputs must be sorted prior to the merge join. Vertica sorts the inner input side but only if the outer input side is already sorted on the join columns.
To give the Vertica query optimizer the option to use an efficient merge join for a particular join, create projections on both sides of the join that put the join column first in their respective projections. This is primarily important to do if both tables are so large that neither table fits into memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously, the benefits of merge join over hash join are sufficiently small that it probably isn't worth creating a projection for any one join column.
Sort on columns in important queries
If you have an important query, one that you run on a regular basis, you can save time by putting the columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort order.
If that query uses a high-cardinality column such as Social Security number, you may sacrifice storage by placing this column early in the sort order of a projection, but your most important query will be optimized.
Sort columns of equal cardinality by size
If you have two columns of equal cardinality, put the column that is larger first in the sort order. For example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.
Sort foreign key columns first, from low to high distinct cardinality
Suppose you have a fact table where the first four columns in the sort order make up a foreign key to another table. For best compression, choose a sort order for the fact table such that the foreign keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the design of projections for fact tables, such as partitioning by a time dimension, if any.
In the following example, the table inventory
stores inventory data, and product_key
and warehouse_key
are foreign keys to the product_dimension
and warehouse_dimension
tables:
=> CREATE TABLE inventory (
date_key INTEGER NOT NULL,
product_key INTEGER NOT NULL,
warehouse_key INTEGER NOT NULL,
...
);
=> ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
REFERENCES warehouse_dimension(warehouse_key);
ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
REFERENCES product_dimension(product_key);
The inventory table should be sorted by warehouse_key and then product, since the cardinality of the warehouse_key
column is probably lower that the cardinality of the product_key
.
12.3.5 - Prioritizing column access speed
If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank.
If you measure and set the performance of storage locations within your cluster, Vertica uses this information to determine where to store columns based on their rank. For more information, see Setting storage performance.
How columns are ranked
Vertica stores columns included in the projection sort order on the fastest available storage locations. Columns not included in the projection sort order are stored on slower disks. Columns for each projection are ranked as follows:
-
Columns in the sort order are given the highest priority (numbers > 1000).
-
The last column in the sort order is given the rank number 1001.
-
The next-to-last column in the sort order is given the rank number 1002, and so on until the first column in the sort order is given 1000 + # of sort columns.
-
The remaining columns are given numbers from 1000–1, starting with 1000 and decrementing by one per column.
Vertica then stores columns on disk from the highest ranking to the lowest ranking. It places highest-ranking columns on the fastest disks and the lowest-ranking columns on the slowest disks.
Overriding default column ranking
You can modify which columns are stored on fast disks by manually overriding the default ranks for these columns. To accomplish this, set the ACCESSRANK
keyword in the column list. Make sure to use an integer that is not already being used for another column. For example, if you want to give a column the fastest access rank, use a number that is significantly higher than 1000 + the number of sort columns. This allows you to enter more columns over time without bumping into the access rank you set.
The following example sets column store_key
's access rank to 1500:
CREATE PROJECTION retail_sales_fact_p (
store_key ENCODING RLE ACCESSRANK 1500,
pos_transaction_number ENCODING RLE,
sales_dollar_amount,
cost_dollar_amount )
AS SELECT
store_key,
pos_transaction_number,
sales_dollar_amount,
cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;