Segmented projections
You typically create segmented projections for large fact tables. Vertica splits segmented projections into chunks (segments) of similar size and distributes these segments evenly across the cluster. System K-safety determines how many duplicates (buddies) of each segment are created and maintained on different nodes.
Projection segmentation achieves the following goals:
-
Ensures high availability and recovery.
-
Spreads the query execution workload across multiple nodes.
-
Allows each node to be optimized for different query workloads.
Hash Segmentation
Vertica uses hash segmentation to segment large projections. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns typically meet these criteria, so they are often used as hash function arguments.
You create segmented projections with a
CREATE PROJECTION
statement that includes a SEGMENTED BY
clause.
The following CREATE PROJECTION
statement creates projection public.employee_dimension_super
. It specifies to include all columns in table public.employee_dimension
. The hash segmentation clause invokes the Vertica HASH
function to segment projection data on the column employee_key
; it also includes the ALL NODES
clause, which specifies to distribute projection data evenly across all nodes in the cluster:
=> CREATE PROJECTION public.employee_dimension_super
AS SELECT * FROM public.employee_dimension
ORDER BY employee_key
SEGMENTED BY hash(employee_key) ALL NODES;
If the database is K-safe, Vertica creates multiple buddies for this projection and distributes them on different nodes across the cluster. In this case, database K-safety is set to 1, so Vertica creates two buddies for this projection. It uses the projection name employee_dimension_super
as the basename for the two buddy identifiers it creates—in this example, employee_dimension_super_b0
and employee_dimension_super_b1
:
=> SELECT projection_name FROM projections WHERE projection_basename='employee_dimension_super';
projection_name
-----------------------------
employee_dimension_super_b0
employee_dimension_super_b1
(2 rows)