Before you run Database Designer to create a design, provide information that allows Database Designer to create the optimal physical schema:
This is the multi-page printable view of this section. Click here to print.
Specifying parameters for Database Designer
- 1: Design name
- 2: Design types
- 2.1: Comprehensive design
- 2.2: Incremental design
- 3: Optimization objectives
- 4: Design tables with sample data
- 5: Design queries
- 6: Replicated and segmented projections
- 6.1: Replicated projections
- 6.2: Segmented projections
- 7: Statistics analysis
1 - Design name
All designs that Database Designer creates must have a name that you specify. The design name must be alphanumeric or underscore (_) characters, and can be no more than 32 characters long. (Administrative Tools and Management Console limit the design name to 16 characters.)
The design name becomes part of the files that Database Designer generates, including the deployment script, allowing the files to be easily associated with a particular Database Designer run.
2 - Design types
The Database Designer can create two distinct design types. The design you choose depends on what you are trying to accomplish:
2.1 - Comprehensive design
A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database.
To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, Vertica creates an unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it.
Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections.
During a comprehensive design, Database Designer creates deployment scripts that:
-
Create new projections to optimize query performance, only when they do not already exist.
-
Create replacement buddy projections when Database Designer changes the encoding of pre-existing projections that it has decided to keep.
2.2 - Incremental design
After you create and deploy a comprehensive database design, it's likely that your database will change over time in various ways. You should periodically consider using Database Designer to create incremental designs that address these changes. Changes that warrant an incremental design can include:
-
Significant data additions or updates
-
New or modified queries that you run regularly
-
Performance issues with one or more queries
-
Schema changes
3 - Optimization objectives
When creating a design, Database Designer can optimize the design for one of three objectives:
-
Load: Database Designer creates a design that is optimized for loads, minimizing database size, potentially at the expense of query performance.
-
Performance: Database Designer creates a design that is optimized for fast query performance. Because it recommends a design for optimized query performance, this design might recommend more than the Load or Balanced objectives, potentially resulting in a larger database storage size.
-
Balanced: Database Designer creates a design whose objectives are balanced between database size and query performance.
A fully optimized query has an optimization ratio of 0.99. Optimization ratio is the ratio of a query's benefits achieved in the design produced by the Database Designer to that achieved in the ideal plan. Check the optimization ratio with the OptRatio parameter in designer.log.
4 - Design tables with sample data
You must specify one or more design tables for Database Designer to deploy a design. If your schema is empty, it does not appear as a design table option.
When you specify design tables, consider the following:
-
To create the most efficient projections for your database, load a moderate amount of representative data into tables before running Database Designer. Database Designer considers the data in this table when creating the design.
-
If your design tables have a large amount if data, the Database Designer run takes a long time; if your tables have too little data, the design is not optimized. Vertica recommends that 10 GB of sample data is sufficient for creating an optimal design.
-
If you submit a design table with no data, Database Designer ignores it.
-
If one of your design tables has been dropped, you will not be able to build or deploy your design.
5 - Design queries
If you supply representative queries that you run on your database to Database Designer, it optimizes the performance of those queries.
Database Designer checks the validity of all queries when you add them to your design and again when it builds the design. If a query is invalid, Database Designer ignores it.
The query file can contain up to 100 queries. Each query can be assigned a weight that indicates its relative importance so that Database Designer can prioritize it when creating the design. Database Designer groups queries that affect the design that Database Designer creates in the same way and considers one weighted query when creating a design.
The following options apply, depending on whether you create an incremental or comprehensive design:
-
Design queries are required for incremental designs.
-
Design queries are optional for comprehensive designs. If you do not provide design queries, Database Designer recommends a generic design that does not consider specific queries.
Query repository
Using Management Console, you can submit design queries from the QUERY_REQUESTS system table. This is called the query repository.
The QUERY_REQUESTS table contains queries that users have run recently. For a comprehensive design, you can submit up to 200 queries from the QUERY_REQUESTS table to Database Designer to be considered when creating the design. For an incremental design, you can submit up to 100 queries from the QUERY_REQUESTS table.
6 - Replicated and segmented projections
When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or replicated (duplicated on all cluster nodes).
For detailed information, see the following sections:
6.1 - Replicated projections
Replication occurs when Vertica stores identical copies of data across all the nodes in your cluster.
Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends that a projection be replicated if any of the following conditions is true:
-
largest-row-count < 1,000,000 and number of rows in the table <= 10% of largest-row-count
-
largest-row-count >= 10,000,000 and number of rows in the table <= 1% of largest-row-count
-
The number of rows in the table <= 100,000
For more information about replication, see High availability with projections.
6.2 - Segmented projections
Segmentation occurs when Vertica distributes data evenly across multiple database nodes so that all nodes participate in query execution. Projection segmentation provides high availability and recovery, and optimizes query execution.
When running Database Designer programmatically or using Management Console, you can specify to allow Database Designer to recommend unsegmented projections in the design. If you do not specify this, Database Designer recommends only segmented projections.
Database Designer recommends segmented superprojections for large tables when deploying to multiple node clusters, and recommends replicated superprojections for smaller tables.
Database Designer does not segment projections on:
-
Single-node clusters
-
LONG VARCHAR and LONG VARBINARY columns
For more information about segmentation, see High availability with projections.
7 - Statistics analysis
By default, Database Designer analyzes statistics for the design tables when adding them to the design. This option is optional, but Vertica recommends that you analyze statistics because accurate statistics help Database Designer optimize compression and query performance.
Analyzing statistics takes time and resources. If the current statistics for the design tables are up to date, do not bother analyzing the statistics. When in doubt, analyze the statistics to make sure they are current.
For more information, see Collecting Statistics.