Segmented projections

Projection segmentation achieves the following goals:.

You typically create segmented projections for large fact tables. Vertica splits segmented projections into chunks (segments) of similar size and distributes these segments evenly across the cluster. System K-safety determines how many duplicates (buddies) of each segment are created and maintained on different nodes.

Projection segmentation achieves the following goals:

  • Ensures high availability and recovery.

  • Spreads the query execution workload across multiple nodes.

  • Allows each node to be optimized for different query workloads.

Hash Segmentation

Vertica uses hash segmentation to segment large projections. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns typically meet these criteria, so they are often used as hash function arguments.

You create segmented projections with a CREATE PROJECTION statement that includes a SEGMENTED BY clause.

The following CREATE PROJECTION statement creates projection public.employee_dimension_super. It specifies to include all columns in table public.employee_dimension. The hash segmentation clause invokes the Vertica HASH function to segment projection data on the column employee_key; it also includes the ALL NODES clause, which specifies to distribute projection data evenly across all nodes in the cluster:

=> CREATE PROJECTION public.employee_dimension_super
    AS SELECT * FROM public.employee_dimension
    ORDER BY employee_key
    SEGMENTED BY hash(employee_key) ALL NODES;

If the database is K-safe, Vertica creates multiple buddies for this projection and distributes them on different nodes across the cluster. In this case, database K-safety is set to 1, so Vertica creates two buddies for this projection. It uses the projection name employee_dimension_super as the basename for the two buddy identifiers it creates—in this example, employee_dimension_super_b0 and employee_dimension_super_b1:

=> SELECT projection_name FROM projections WHERE projection_basename='employee_dimension_super';
       projection_name
-----------------------------
 employee_dimension_super_b0
 employee_dimension_super_b1
(2 rows)