Creating projections

Vertica supports two methods for creating projections: Database Designer and the CREATE PROJECTION statement.

Vertica supports two methods for creating projections: Database Designer and the CREATE PROJECTION statement.

Creating projections with Database Designer

Vertica recommends that you use Database Designer to design your physical schema, by running it on a representative sample of your data. Database Designer generates SQL for creating projections as follows:

  1. Analyzes your logical schema, sample data, and sample queries (optional).

  2. Designs a physical schema in the form of a SQL script that you can deploy automatically or manually.

For more information, see Creating a database design.

Manually creating projections

CREATE PROJECTION defines a projection, as in the following example:

=> CREATE PROJECTION retail_sales_fact_p (
    store_key ENCODING RLE,
       pos_transaction_number ENCODING RLE,
    sales_dollar_amount,
    cost_dollar_amount )
AS SELECT
    store_key,
    pos_transaction_number,
    sales_dollar_amount,
    cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;

A projection definition includes the following components:

Column list and encoding

This portion of the SQL statement lists every column in the projection and defines the encoding for each column. Vertica supports encoded data, which helps query execution to incur less disk I/O.

CREATE PROJECTION retail_sales_fact_P (
    store_key ENCODING RLE,
    pos_transaction_number ENCODING RLE,
    sales_dollar_amount,
       cost_dollar_amount )

Base query

A projection's base query clause identifies which columns to include in the projection.


AS SELECT
    store_key,
    pos_transaction_number,
    sales_dollar_amount,
    cost_dollar_amount

Sort order

A projection's ORDER BY clause determines how to sort projection data. The sort order localizes logically grouped values so a disk read can identify many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns. For more information see ORDER BY clause.

ORDER BY store_key

Segmentation

A projection's segmentation clause specifies how to distribute projection data across all nodes in the database. Even load distribution helps maximize access to projection data. For large tables, distribute projection data in segments with SEGMENTED BY HASH. For example:

SEGMENTED BY HASH(pos_transaction_number) ALL NODES;

For small tables, use the UNSEGMENTED keyword to replicate table data. Vertica creates identical copies of an unsegmented projection on all cluster nodes. Replication ensures high availability and recovery.

For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.

For more design considerations, see Creating custom designs.