This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Projections
Projections provide the following benefits:.
Unlike traditional databases that store data in tables, Vertica physically stores table data in projections, which are collections of table columns.
Projections store data in a format that optimizes query execution. Similar to materialized views, they store result sets on disk rather than compute them each time they are used in a query. Vertica automatically refreshes these result sets with updated or new data.
Projections provide the following benefits:
-
Compress and encode data to reduce storage space. Vertica also operates on the encoded data representation whenever possible to avoid the cost of decoding. This combination of compression and encoding optimizes disk space while maximizing query performance.
-
Facilitate distribution across the database cluster. Depending on their size, projections can be segmented or replicated across cluster nodes. For instance, projections for large tables can be segmented and distributed across all nodes. Unsegmented projections for small tables can be replicated across all nodes.
-
Transparent to end-users. The Vertica query optimizer automatically picks the best projection to execute a given query.
-
Provide high availability and recovery. Vertica duplicates table columns on at least K+1 nodes in the cluster. If one machine fails in a K-Safe environment, the database continues to operate using replicated data on the remaining nodes. When the node resumes normal operation, it automatically queries other nodes to recover data and lost objects. For more information, see High availability with fault groups and High availability with projections.
1 - Projection types
A Vertica table typically has multiple projections, each defined to contain different content.
A Vertica table typically has multiple projections, each defined to contain different content. Content for the projections of a given table can differ in scope and organization. These differences can generally be divided into the following projection types:
Superprojections
For each table in the database, Vertica requires at least one superprojection that contains all columns in the table. In the absence of a query-specific projection, Vertica uses the table's superprojection, which can support any query and DML operation.
Under certain conditions, Vertica automatically creates a table's superprojection immediately on table creation. Vertica also creates a superprojection when you first load data into that table, if none already exists.
CREATE PROJECTION
can create a superprojection if it specifies to include all table columns. A table can have multiple superprojections.
While superprojections can support all queries on a table, they do not facilitate optimal execution of specific queries.
Query-specific projections
A query-specific projection is a projection that contains only the subset of table columns needed to process a given query. Query-specific projections significantly improve performance of queries for which they are optimized.
Aggregate projections
Queries that include expressions or aggregate functions, such as SUM and COUNT, can perform more efficiently when using projections that already contain the aggregated data. This is especially true for queries on large quantities of data.
Vertica provides several types of projections for storing data that is returned from aggregate functions or expressions:
-
Live aggregate projection: Projection that contains columns with values that are aggregated from columns in its anchor table. You can also define live aggregate projections that include user-defined transform functions.
-
Top-K projection: Type of live aggregate projection that returns the top k
rows from a partition of selected rows. Create a Top-K projection that satisfies the criteria for a Top-K query.
-
Projection that pre-aggregates UDTF results: Live aggregate projection that invokes user-defined transform functions (UDTFs). To minimize overhead when you query those projections of this type, Vertica processes the UDTF functions in the background and stores their results on disk.
-
Projection that contains expressions: Projection with columns whose values are calculated from anchor table columns.
For more information, see Pre-aggregating data in projections.
2 - Creating projections
Vertica supports two methods for creating projections: Database Designer and the CREATE PROJECTION statement.
Vertica supports two methods for creating projections: Database Designer and the CREATE PROJECTION statement.
Creating projections with Database Designer
Vertica recommends that you use Database Designer to design your physical schema, by running it on a representative sample of your data. Database Designer generates SQL for creating projections as follows:
-
Analyzes your logical schema, sample data, and sample queries (optional).
-
Designs a physical schema in the form of a SQL script that you can deploy automatically or manually.
For more information, see Creating a database design.
Manually creating projections
CREATE PROJECTION
defines a projection, as in the following example:
=> CREATE PROJECTION retail_sales_fact_p (
store_key ENCODING RLE,
pos_transaction_number ENCODING RLE,
sales_dollar_amount,
cost_dollar_amount )
AS SELECT
store_key,
pos_transaction_number,
sales_dollar_amount,
cost_dollar_amount
FROM store.store_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;
A projection definition includes the following components:
Column list and encoding
This portion of the SQL statement lists every column in the projection and defines the encoding for each column. Vertica supports encoded data, which helps query execution to incur less disk I/O.
CREATE PROJECTION retail_sales_fact_P (
store_key ENCODING RLE,
pos_transaction_number ENCODING RLE,
sales_dollar_amount,
cost_dollar_amount )
Base query
A projection's base query clause identifies which columns to include in the projection.
AS SELECT
store_key,
pos_transaction_number,
sales_dollar_amount,
cost_dollar_amount
Sort order
A projection's ORDER BY
clause determines how to sort projection data. The sort order localizes logically grouped values so a disk read can identify many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns. For more information see ORDER BY clause.
ORDER BY store_key
Segmentation
A projection's segmentation clause specifies how to distribute projection data across all nodes in the database. Even load distribution helps maximize access to projection data. For large tables, distribute projection data in segments with SEGMENTED BY HASH
. For example:
SEGMENTED BY HASH(pos_transaction_number) ALL NODES;
For small tables, use the UNSEGMENTED
keyword to replicate table data. Vertica creates identical copies of an unsegmented projection on all cluster nodes. Replication ensures high availability and recovery.
For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.
For more design considerations, see Creating custom designs.
3 - Projection naming
Vertica identifies projections according to the following conventions, where proj-basename is the name assigned to this projection by CREATE PROJECTION.
Vertica identifies projections according to the following conventions, where proj-basename
is the name assigned to this projection by CREATE PROJECTION.
Unsegmented projections
Unsegmented projections conform to the following naming conventions:
proj-basename _super |
The auto projection that Vertica creates when data is loaded for the first time into a new unsegmented table. Vertica uses the anchor table name to create the projection base name proj-basename and appends the string _super . The auto projection is always a superprojection. |
proj-basename _unseg |
An unsegmented projection, where proj-basename and the anchor table name are identical. If no other projection was previously created with this base name (including an auto projection), Vertica appends the string _unseg to the projection name. If the projection is copied on all nodes, this projection name maps to all instances. |
Segmented projections
Enterprise Mode
In Enterprise Mode, segmented projections use the following naming convention:
proj-basename_boffset
This name identifies buddy projections for a segmented projection, where offset
is the projection's node location relative to all other buddy projections. All buddy projections share the same project base name. For example:
=> SELECT projection_basename, projection_name FROM projections WHERE anchor_table_name = 'store_orders';
projection_basename | projection_name
---------------------+-----------------
store_orders | store_orders_b0
store_orders | store_orders_b1
(2 rows)
One exception applies: Vertica uses the following convention to name live aggregate projections: proj-basename
, proj-basename
_b1
, and so on.
Eon Mode
In Eon Mode, segmented projections use the following naming convention:
proj-basename
Note
Eon Mode uses shards in communal storage to segment table data, which are functionally equivalent to Enterprise Mode buddy projections. For details, see
Shards and subscriptions.
Projections of renamed and copied tables
Vertica uses the same logic to rename existing projections in two cases:
In both cases, Vertica uses the following algorithm to rename projections:
-
Iterate over all projections anchored on the renamed or new table, and check whether their names are prefixed by the original table name:
-
If yes, compare the original table name and projection base name:
-
If the new base name is the same as the original table name, then replace the base name with the new table name in the table and projection names.
-
If the new base name is prefixed by the original table name, then replace the prefix with the new table name, remove any version strings that were appended to the old base name (such as old-basename
_v1
), and generate projection names with the new base name.
-
Check whether the new projection names already exist. If not, save them. Otherwise, resolve name conflicts by appending version numbers as needed to the new base name—new-basename
_v1
, new-basename
_v2
, and so on.
Examples
An auto projection is always a superprojection:
=> CREATE TABLE store.store_dimension
store_key int NOT NULL,
store_name varchar(64),
...
) UNSEGMENTED ALL NODES;
CREATE TABLE
=> COPY store.store_dim FROM '/home/dbadmin/store_dimension_data.txt';
50
=> SELECT anchor_table_name, projection_basename, projection_name FROM projections WHERE anchor_table_name = 'store_dimension';
anchor_table_name | projection_basename | projection_name
-------------------+---------------------+-----------------------
store_dimension | store_dimension | store_dimension_super
store_dimension | store_dimension | store_dimension_super
store_dimension | store_dimension | store_dimension_super
(3 rows)
An unsegmented projection name has the _unseg
suffix on all nodes:
=> CREATE TABLE store.store_dimension(
store_key int NOT NULL,
store_name varchar(64),
...
);
CREATE TABLE
=> CREATE PROJECTION store_dimension AS SELECT * FROM store.store_dimension UNSEGMENTED ALL NODES;
WARNING 6922: Projection name was changed to store_dimension_unseg because it conflicts with the basename of the table store_dimension
CREATE PROJECTION
=> SELECT anchor_table_name, projection_basename, projection_name FROM projections WHERE anchor_table_name = 'store_dimension';
anchor_table_name | projection_basename | projection_name
-------------------+---------------------+-----------------------
store_dimension | store_dimension | store_dimension_unseg
store_dimension | store_dimension | store_dimension_unseg
store_dimension | store_dimension | store_dimension_unseg
(3 rows)
The following example creates the segmented table testRenameSeg
and populates it with data:
=> CREATE TABLE testRenameSeg (a int, b int);
CREATE TABLE
dbadmin=> INSERT INTO testRenameSeg VALUES (1,2);
OUTPUT
--------
1
(1 row)
dbadmin=> COMMIT;
COMMIT
Vertica automatically creates two buddy superprojections for this table:
=> \dj testRename*
List of projections
Schema | Name | Owner | Node | Comment
--------+-----------------------+---------+------------------+---------
public | testRenameSeg_b0 | dbadmin | |
public | testRenameSeg_b1 | dbadmin | |
The following CREATE PROJECTION
statements explicitly create additional projections for the table:
=> CREATE PROJECTION nameTestRenameSeg_p AS SELECT * FROM testRenameSeg;
=> CREATE PROJECTION testRenameSeg_p AS SELECT * FROM testRenameSeg;
=> CREATE PROJECTION testRenameSeg_pLap AS SELECT b, MAX(a) a FROM testRenameSeg GROUP BY b;
=> CREATE PROJECTION newTestRenameSeg AS SELECT * FROM testRenameSeg;
=> \dj *testRenameSeg*
List of projections
Schema | Name | Owner | Node | Comment
--------+------------------------+---------+------+---------
public | nameTestRenameSeg_p_b0 | dbadmin | |
public | nameTestRenameSeg_p_b1 | dbadmin | |
public | newTestRenameSeg_b0 | dbadmin | |
public | newTestRenameSeg_b1 | dbadmin | |
public | testRenameSeg_b0 | dbadmin | |
public | testRenameSeg_b1 | dbadmin | |
public | testRenameSeg_pLap | dbadmin | |
public | testRenameSeg_pLap_b1 | dbadmin | |
public | testRenameSeg_p_b0 | dbadmin | |
public | testRenameSeg_p_b1 | dbadmin | |
(10 rows)
If you rename the anchor table, Vertica also renames its projections:
=> ALTER TABLE testRenameSeg RENAME TO newTestRenameSeg;
ALTER TABLEn=> \dj *testRenameSeg*
List of projections
Schema | Name | Owner | Node | Comment
--------+--------------------------+---------+------+---------
public | nameTestRenameSeg_p_b0 | dbadmin | |
public | nameTestRenameSeg_p_b1 | dbadmin | |
public | newTestRenameSeg_b0 | dbadmin | |
public | newTestRenameSeg_b1 | dbadmin | |
public | newTestRenameSeg_pLap_b0 | dbadmin | |
public | newTestRenameSeg_pLap_b1 | dbadmin | |
public | newTestRenameSeg_p_b0 | dbadmin | |
public | newTestRenameSeg_p_b1 | dbadmin | |
public | newTestRenameSeg_v1_b0 | dbadmin | |
public | newTestRenameSeg_v1_b1 | dbadmin | |
(10 rows)
Two sets of buddy projections are not renamed, as their names are not prefixed by the original table name:
-
nameTestRenameSeg_p_b0
-
nameTestRenameSeg_p_b1
-
newTestRenameSeg_b0
-
newTestRenameSeg_b1
When renaming the other projections, Vertica identified a potential conflict between the table's superprojection—originally testRenameSeg
—and existing projection newTestRenameSeg
. It resolved this conflict by appending version numbers _v1
and _v2
to the superprojection's new name:
-
newTestRenameSeg_v1_b0
-
newTestRenameSeg_v1_b1
4 - Auto-projections
Auto-projections are that Vertica automatically generates for tables, both temporary and persistent.
Auto-projections are superprojections that Vertica automatically generates for tables, both temporary and persistent. In general, if no projections have been defined for a table, Vertica automatically creates projections for that table when you first load data into it. The following rules apply to all auto-projections:
-
Vertica creates the auto-projection in the same schema as the table.
-
Auto-projections conform to encoding, sort order, segmentation, and K-safety as specified in the table's creation statement.
-
If the table creation statement contains an AS SELECT clause, Vertica uses some properties of the projection definition's underlying query.
Auto-projection triggers
The conditions for creating auto-projections differ, depending on whether the table is temporary or persistent:
Table type |
Auto-projection trigger |
Temporary |
CREATE TEMPORARY TABLE statement, unless it includes NO PROJECTION . |
Persistent |
CREATE TABLE statement contains one of these clauses:
If none of these conditions are true, Vertica automatically creates a superprojection (if one does not already exist) when you first load data into the table with INSERT or COPY.
|
Default segmentation and sort order
If CREATE TABLE
or CREATE TEMPORARY TABLE
omits a segmentation (SEGMENTED BY
or UNSEGMENTED
) or ORDER BY
clause, Vertica segments and sorts auto-projections as follows:
-
If the table creation statement omits a segmentation (SEGMENTED BY or UNSEGMENTED
) clause, Vertica checks configuration parameter SegmentAutoProjection
to determine whether to create an auto projection that is segmented or unsegmented. By default, this parameter is set to 1 (enable).
-
If SegmentAutoProjection
is enabled and a table's creation statement also omits an ORDER BY
clause, Vertica segments and sorts the table's auto-projection according to the table's manner of creation:
-
If CREATE [TEMPORARY] TABLE
contains an AS SELECT clause and the query output is segmented, the auto-projection uses the same segmentation. If the result set is already sorted, the projection uses the same sort order.
-
In all other cases, Vertica evaluates table column constraints to determine how to sort and segment the projection, as shown below:
Constraints |
Sorted by: |
Segmented by: |
Primary key |
Primary key |
Primary key |
Primary and foreign keys |
- Foreign keys
2. Primary key
|
Primary key |
Foreign keys only |
- Foreign keys
2. Remaining columns excluding LONG data types, up to the limit set in configuration parameter MaxAutoSortColumns (by default 8).
|
All columns excluding LONG data types, up to the limit set in configuration parameter MaxAutoSegColumns (by default 8).
Vertica orders segmentation as follows:
1. Small (≤ 8 byte) data type columns: Columns are specified in the same order as they are defined in the table CREATE statement.
1. Large (>8 byte) data type columns: Columns are ordered by ascending size.
|
None |
All columns excluding LONG data types, in the order specified by CREATE TABLE . |
For example, the following table is defined with no primary or foreign keys:
=> CREATE TABLE testAutoProj(c10 char (10), v1 varchar(140) DEFAULT v2||v3, i int, c5 char(5), v3 varchar (80), d timestamp, v2 varchar(60), c1 char(1));
CREATE TABLE
=> INSERT INTO testAutoProj VALUES
('1234567890',
DEFAULT,
1,
'abcde',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor ',
current_timestamp,
'incididunt ut labore et dolore magna aliqua. Eu scelerisque',
'a');
OUTPUT
--------
1
(1 row)
=> COMMIT;
COMMIT
Before the INSERT statement loads data into this table for the first time, Vertica automatically creates a superprojection for the table:
=> SELECT export_objects('', 'testAutoProj_b0');
--------------------------------------------------------
CREATE PROJECTION public.testAutoProj_b0 /*+basename(testAutoProj),createtype(L)*/
( c10, v1, i, c5, v3, d, v2, c1 )
AS
SELECT testAutoProj.c10,
testAutoProj.v1,
testAutoProj.i,
testAutoProj.c5,
testAutoProj.v3,
testAutoProj.d,
testAutoProj.v2,
testAutoProj.c1
FROM public.testAutoProj
ORDER BY testAutoProj.c10,
testAutoProj.v1,
testAutoProj.i,
testAutoProj.c5,
testAutoProj.v3,
testAutoProj.d,
testAutoProj.v2,
testAutoProj.c1
SEGMENTED BY hash(testAutoProj.i, testAutoProj.c5, testAutoProj.d, testAutoProj.c1, testAutoProj.c10, testAutoProj.v2, testAutoProj.v3, testAutoProj.v1) ALL NODES OFFSET 0;
SELECT MARK_DESIGN_KSAFE(1);
(1 row)
5 - Unsegmented projections
In many cases, dimension tables are relatively small, so you do not need to segment them. Accordingly, you should design a K-safe database so projections for its dimension tables are replicated without segmentation on all cluster nodes. You create unsegmented projections with a
CREATE PROJECTION
statement that includes the clause UNSEGMENTED ALL NODES
. This clause specifies to create identical instances of the projection on all cluster nodes.
The following example shows how to create an unsegmented projection for the table store.store_dimension
:
=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
AS SELECT store_key, store_name, store_city, store_state
FROM store.store_dimension
UNSEGMENTED ALL NODES;
CREATE PROJECTION
Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj
. The keyword ALL NODES
specifies to replicate the projection on all nodes:
=> \dj store.store_dimension_proj
List of projections
Schema | Name | Owner | Node | Comment
--------+----------------------+---------+------------------+---------
store | store_dimension_proj | dbadmin | v_vmart_node0001 |
store | store_dimension_proj | dbadmin | v_vmart_node0002 |
store | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)
For more information about projection name conventions, see Projection naming.
6 - Segmented projections
Projection segmentation achieves the following goals:.
You typically create segmented projections for large fact tables. Vertica splits segmented projections into chunks (segments) of similar size and distributes these segments evenly across the cluster. System K-safety determines how many duplicates (buddies) of each segment are created and maintained on different nodes.
Projection segmentation achieves the following goals:
-
Ensures high availability and recovery.
-
Spreads the query execution workload across multiple nodes.
-
Allows each node to be optimized for different query workloads.
Hash Segmentation
Vertica uses hash segmentation to segment large projections. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns typically meet these criteria, so they are often used as hash function arguments.
You create segmented projections with a
CREATE PROJECTION
statement that includes a SEGMENTED BY
clause.
The following CREATE PROJECTION
statement creates projection public.employee_dimension_super
. It specifies to include all columns in table public.employee_dimension
. The hash segmentation clause invokes the Vertica HASH
function to segment projection data on the column employee_key
; it also includes the ALL NODES
clause, which specifies to distribute projection data evenly across all nodes in the cluster:
=> CREATE PROJECTION public.employee_dimension_super
AS SELECT * FROM public.employee_dimension
ORDER BY employee_key
SEGMENTED BY hash(employee_key) ALL NODES;
If the database is K-safe, Vertica creates multiple buddies for this projection and distributes them on different nodes across the cluster. In this case, database K-safety is set to 1, so Vertica creates two buddies for this projection. It uses the projection name employee_dimension_super
as the basename for the two buddy identifiers it creates—in this example, employee_dimension_super_b0
and employee_dimension_super_b1
:
=> SELECT projection_name FROM projections WHERE projection_basename='employee_dimension_super';
projection_name
-----------------------------
employee_dimension_super_b0
employee_dimension_super_b1
(2 rows)
7 - K-safe database projections
K-safety is implemented differently for segmented and unsegmented projections, as described below.
K-safety is implemented differently for segmented and unsegmented projections, as described below. Examples assume database K-safety is set to 1 in a 3-node database, and uses projections for two tables:
-
store.store_orders_fact
is a large fact table. The projection for this table should be segmented. Vertica distributes projection segments uniformly across the cluster.
-
store.store_dimension
is a smaller dimension table. The projection for this table should be unsegmented. Vertica copies a complete instance of this projection on each cluster node.
Segmented projections
In a K-safe database, the database requires K+1 instances, or buddies, of each projection segment. For example, if database K-safety is set to 1, the database requires two instances, or buddies, of each projection segment.
You can set K-safety on individual segmented projections through the
CREATE PROJECTION
option KSAFE
. Projection K-safety must be equal to or greater than database K-safety. If you omit setting KSAFE
, the projection obtains K-safety from the database.
The following
CREATE PROJECTION
defines a segmented projection for the fact table store.store_orders_fact
:
=> CREATE PROJECTION store.store_orders_fact
(prodkey, ordernum, storekey, total)
AS SELECT product_key, order_number, store_key, quantity_ordered*unit_price
FROM store.store_orders_fact
SEGMENTED BY HASH(product_key, order_number) ALL NODES KSAFE 1;
CREATE PROJECTION
The following keywords in the CREATE PROJECTION
statement pertain to setting projection K-safety:
SEGMENTED BY |
Specifies how to segment projection data for distribution across the cluster. In this example, the segmentation expression specifies Vertica's built-in
HASH function. |
ALL NODES |
Specifies to distribute projection segments across all cluster nodes. |
K-SAFE 1 |
Sets K-safety to 1. Vertica creates two projection buddies with these identifiers:
|
Unsegmented projections
In a K-safe database, unsegmented projections must be replicated on all nodes. Thus, the CREATE PROJECTION
statement for an unsegmented projection must include the segmentation clause UNSEGMENTED ALL NODES
. This instructs Vertica to create identical instances (buddies) of the projection on all cluster nodes. If you create an unsegmented projection on a single node, Vertica regards it unsafe and does not use it.
The following example shows how to create an unsegmented projection for the table store.store_dimension
:
=> CREATE PROJECTION store.store_dimension_proj (storekey, name, city, state)
AS SELECT store_key, store_name, store_city, store_state
FROM store.store_dimension
UNSEGMENTED ALL NODES;
CREATE PROJECTION
Vertica uses the same name to identify all instances of the unsegmented projection—in this example, store.store_dimension_proj
. The keyword ALL NODES
specifies to replicate the projection on all nodes:
=> \dj store.store_dimension_proj
List of projections
Schema | Name | Owner | Node | Comment
--------+----------------------+---------+------------------+---------
store | store_dimension_proj | dbadmin | v_vmart_node0001 |
store | store_dimension_proj | dbadmin | v_vmart_node0002 |
store | store_dimension_proj | dbadmin | v_vmart_node0003 |
(3 rows)
For more information about projection name conventions, see Projection naming.
8 - Partition range projections
Vertica supports projections that specify a range of partition keys.
Vertica supports projections that specify a range of partition keys. By default, projections store all rows of partitioned table data. Over time, this requirement can incur increasing overhead:
- As data accumulates, increasing amounts of storage are required for large amounts of data that are queried infrequently, if at all.
- Large projections can deter optimizations such as better encodings, or changes to the projection sort order or segmentation. Changes to the projection's DDL like these require you to refresh the entire projection. Depending on the projection size, this refresh operation might span hours or even days.
You can minimize these problems by creating projections for partitioned tables that specify a relatively narrow range of partition keys. For example, the table store_orders
is partitioned on order_date
, as follows:
=> CREATE TABLE public.store_orders(order_no int, order_date timestamp NOT NULL, shipper varchar(20), ship_date date);
CREATE TABLE
=> ALTER TABLE store_orders PARTITION BY order_date::DATE GROUP BY date_trunc('month', (order_date)::DATE);
ALTER TABLE
If desired, you can create a projection of store_orders
that specifies a contiguous range of the table's partition keys. In the following example, the projection ytd_orders
specifies to include only orders that were placed since the first day of the year:
=> CREATE PROJECTION ytd_orders AS SELECT * FROM store_orders ORDER BY order_date
ON PARTITION RANGE BETWEEN date_trunc('year',now())::date AND NULL;
WARNING 4468: Projection <public.ytd_orders_b0> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
WARNING 4468: Projection <public.ytd_orders_b1> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT refresh();
refresh
---------------------------------------------------------------------------------------
Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"public"."ytd_orders_b1": [store_orders] [refreshed] [scratch] [0] [0]
"public"."ytd_orders_b0": [store_orders] [refreshed] [scratch] [0] [0]
(1 row)
Each ytd_orders
buddy projection requires only 7 ROS containers per node, versus the 77 containers required by the anchor table's superprojection:
=> SELECT COUNT (DISTINCT ros_id) NumROS, projection_name, node_name FROM PARTITIONS WHERE projection_name ilike 'store_orders_b%' GROUP BY node_name, projection_name ORDER BY node_name;
NumROS | projection_name | node_name
--------+-----------------+------------------
77 | store_orders_b0 | v_vmart_node0001
77 | store_orders_b1 | v_vmart_node0001
77 | store_orders_b0 | v_vmart_node0002
77 | store_orders_b1 | v_vmart_node0002
77 | store_orders_b0 | v_vmart_node0003
77 | store_orders_b1 | v_vmart_node0003
(6 rows)
=> SELECT COUNT (DISTINCT ros_id) NumROS, projection_name, node_name FROM PARTITIONS WHERE projection_name ilike 'ytd_orders%' GROUP BY node_name, projection_name ORDER BY node_name;
NumROS | projection_name | node_name
--------+-----------------+------------------
7 | ytd_orders_b0 | v_vmart_node0001
7 | ytd_orders_b1 | v_vmart_node0001
7 | ytd_orders_b0 | v_vmart_node0002
7 | ytd_orders_b1 | v_vmart_node0002
7 | ytd_orders_b0 | v_vmart_node0003
7 | ytd_orders_b1 | v_vmart_node0003
(6 rows)
Partition range requirements
Partition range expressions must conform with requirements that apply to table-level partitioning—for example, partition key format and data type validation.
The following requirements and constraints specifically apply to partition range projections:
-
The anchor table must already be partitioned.
-
Partition range expressions must be compatible with the table's partition expression.
-
The first range expression must resolve to a partition key that is smaller or equal to the second expression.
-
If the projection is unsegmented, at least one superprojection of the anchor table must also be unsegmented. If not, Vertica adds the projection to the database catalog, but throws a warning that this projection cannot be used to process queries until you create an unsegmented superprojection.
-
Partition range expressions do not support subqueries.
Anchor table dependencies
As noted earlier, a partition range projection depends on the anchor table being partitioned on the same expression. If you remove table partitioning from the projection's anchor table, Vertica drops the dependent projection. Similarly, if you modify the anchor table's partition clause, Vertica drops the projection.
The following exception applies: if the anchor table's new partition clause leaves the partition expression unchanged, the dependent projection is not dropped and remains available for queries. For example, the table store_orders
and its projection ytd_orders
were originally partitioned as follows:
=> ALTER TABLE store_orders PARTITION BY order_date::DATE GROUP BY DATE_TRUNC('month', (order_date)::DATE);
...
=> CREATE PROJECTION ytd_orders AS SELECT * FROM store_orders ORDER BY order_date
ON PARTITION RANGE BETWEEN date_trunc('year',now())::date AND NULL;
If you now modify store_orders
to use hierarchical partitioning, Vertica repartitions the table data as well as its partition range projection:
=> ALTER TABLE store_orders PARTITION BY order_date::DATE GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 2, 2) REORGANIZE;
NOTICE 4785: Started background repartition table task
ALTER TABLE
Because both store_orders
and the ytd_orders
projection remain partitioned on the order_date
column, the ytd_orders
projection remains valid. Also, the scope of projection data remains unchanged, so the projection requires no refresh. However, in the background, the Tuple Mover silently reorganizes the projection ROS containers as per the new hierarchical partitioning of its anchor table:
=> SELECT COUNT (DISTINCT ros_id) NumROS, projection_name, node_name FROM PARTITIONS WHERE projection_name ilike 'ytd_orders%' GROUP BY node_name, projection_name ORDER BY node_name;
NumROS | projection_name | node_name
--------+-----------------+------------------
38 | ytd_orders_b0 | v_vmart_node0001
38 | ytd_orders_b1 | v_vmart_node0001
38 | ytd_orders_b0 | v_vmart_node0002
38 | ytd_orders_b1 | v_vmart_node0002
38 | ytd_orders_b0 | v_vmart_node0003
38 | ytd_orders_b1 | v_vmart_node0003
(6 rows)
Modifying existing projections
You can modify the partition range of a projection with ALTER PROJECTION. No refresh is required if the new range is within the previous range. Otherwise, a refresh is required before the projection reflects the modified partition range. Prior to refreshing, the projection continues to return data within the unmodified range.
For example, projection ytd_orders
previously specified a partition range that starts on the first day of the current year. The following ALTER PROJECTION statement changes the range to start on October 1 of last year. The new range precedes the previous one, so Vertica issues a warning to refresh the specified projection ytd_orders_b0
and its buddy projection ytd_orders_b1
:
=> ALTER PROJECTION ytd_orders_b0 ON PARTITION RANGE BETWEEN
add_months(date_trunc('year',now())::date, -3) AND NULL;
WARNING 10001: Projection "public.ytd_orders_b0" changed to out-of-date state as new partition range is not covered by existing partition range
HINT: Call refresh() or start_refresh() to refresh the projections
WARNING 10001: Projection "public.ytd_orders_b1" changed to out-of-date state as new partition range is not covered by existing partition range
HINT: Call refresh() or start_refresh() to refresh the projections
ALTER PROJECTION
You can change a regular projection to a partition range projection provided no history or data loss will occur, such as by doing the following:
=> ALTER PROJECTION foo ON PARTITION RANGE BETWEEN '22' AND NULL;
Dynamic partition ranges
A projection's partition range can be static, set by expressions that always resolve to the same value. For example, the following projection specifies a static range between 06/01/21 and 06/30/21:
=> CREATE PROJECTION last_month_orders AS SELECT * FROM store_orders ORDER BY order_date ON PARTITION RANGE BETWEEN
'2021-06-01' AND '2021-06-30';
...
CREATE PROJECTION
More typically, partition range expressions use stable date functions such as ADD_MONTHS, DATE_TRUNC and NOW to specify a dynamic range. In the following example, the partition range is set from the first day of the previous month. As the calendar date advances to the next month, the partition range advances with it:
=> ALTER PROJECTION last_month_orders_b0 ON PARTITION RANGE BETWEEN
add_months(date_trunc('month', now())::date, -1) AND NULL;
ALTER PROJECTION
As a best practice, always leave the maximum range open-ended by setting it to NULL, and rely on queries to determine the maximum amount of data to fetch. For example, a query that fetches all store orders placed last month might look like this:
=> SELECT * from store_orders WHERE order_date BETWEEN
add_months(date_trunc('month', now())::date, -1) AND
add_months(date_trunc('month', now())::date + dayofmonth(now()), -1);
The query plan generated to execute this query shows that it uses the partition range projection last_month_orders
:
=> EXPLAIN SELECT * from store_orders WHERE order_date BETWEEN
add_months(date_trunc('month', now())::date, -1) AND
add_months(date_trunc('month', now())::date + dayofmonth(now()), -1);
Access Path:
+-STORAGE ACCESS for store_orders [Cost: 34, Rows: 763 (NO STATISTICS)] (PATH ID: 1)
| Projection: public.last_month_orders_b0
| Materialize: store_orders.order_date, store_orders.order_no, store_orders.shipper, store_orders.ship_date
| Filter: ((store_orders.order_date >= '2021-06-01 00:00:00'::timestamp(0)) AND (store_orders.order_date <= '2021-06-3
0 00:00:00'::timestamp(0)))
| Execute on: All Nodes
Dynamic partition range maintenance
The Projection Maintainer is a background service that checks projections with projection range expressions hourly. If the value of either expression in a projection changes, the Projection Maintainer compares the new and old values in PARTITION_RANGE_MIN
and PARTITION_RANGE_MAX
to determine whether the partition range contracted or expanded:
-
If the partition range contracted in either direction—that is, PARTITION_RANGE_MIN is greater, or PARTITION_RANGE_MAX is smaller than its previous value—then the Projection Maintainer acts as follows:
-
Updates the system table PROJECTIONS with new values in columns PARTITION_RANGE_MIN
and PARTITION_RANGE_MAX
.
-
Queues a MERGEOUT request to purge unused data from this range. The projection remains available to execute queries within the updated range.
-
If the partition range expanded in either direction—that is, PARTITION_RANGE_MIN is smaller, or PARTITION_RANGE_MAX is greater than its previous value—then the Projection Maintainer leaves the projection and the PROJECTIONS table unchanged. Because the partition range remains unchanged, Vertica regards the existing projection data as up to date, so it also can never be refreshed.
For example, the following projection creates a partition range that includes all orders in the current month:
=> CREATE PROJECTION mtd_orders AS SELECT * FROM store_orders ON PARTITION RANGE BETWEEN
date_trunc('month', now())::date AND NULL;
If you create this partition in July of 2021, the minimum partition range expression—date_trunc('month', now())::date
—initially resolves to the first day of the month: 2021-07-01. At the start of the following month, sometime between 2021-08-01 00:00
and 2021-08-01 01:00
, the Projection Maintainer compares the minimum range expression against system time. It then acts as follows:
-
Updates the PROJECTIONS table and sets PARTITION_RANGE_MIN
for projection mtd_orders
to 2021-08-01
.
-
Queues a MERGEOUT request to purge from this projection's partition range all rows with keys that predate 2021-08-01
.
Important
Given the example shown above, you might consider setting the projection's maximum partition range expression as follows:
add_months(date_trunc('month', now()), 1) - 1
This expression would always resolve to the last day of the current month. With each new month, the maximum partition range would be one month greater than its earlier value. As noted earlier, the Projection Maintainer ignores any expansion of a partition range, so it would leave the minimum and maximum partition range values for mtd_orders
unchanged. To avoid issues like this, always set the maximum partition expression to NULL.
9 - Refreshing projections
When you create a projection for a table that already contains data, Vertica does not automatically load that data into the new projection.
When you create a projection for a table that already contains data, Vertica does not automatically load that data into the new projection. Instead, you must explicitly refresh that projection. Until you do so, the projection cannot participate in executing queries on its anchor table.
You can refresh a projection with one of the following functions:
-
START_REFRESH refreshes projections in the current schema with the latest data of their respective anchor tables. START_REFRESH runs asynchronously in the background.
-
REFRESH synchronously refreshes one or more table projections in the foreground.
Both functions update system tables that maintain information about a projection's refresh status: PROJECTION_REFRESHES, PROJECTIONS, and PROJECTION_CHECKPOINT_EPOCHS.
If a refresh would violate a table or schema disk quota, the operation fails. For more information, see Disk quotas.
You can query the PROJECTION_REFRESHES and PROJECTIONS system tables to view the progress of the refresh operation. You can also call the GET_PROJECTIONS function to view the final status of projection refreshes for a given table:
=> SELECT GET_PROJECTIONS('customer_dimension');
GET_PROJECTIONS
----------------------------------------------------------------------------------------------------------
Current system K is 1.
# of Nodes: 3.
Table public.customer_dimension has 2 projections.
Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
public.customer_dimension_b1 [Segmented: Yes] [Seg Cols: "public.customer_dimension.customer_key"] [K: 1]
[public.customer_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
public.customer_dimension_b0 [Segmented: Yes] [Seg Cols: "public.customer_dimension.customer_key"] [K: 1]
[public.customer_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
(1 row)
Refresh methods
Vertica can refresh a projection from one of its buddies, if one is available. In this case, the target projection gets the source buddy's historical data. Otherwise, the projection is refreshed from scratch with data of the latest epoch at the time of the refresh operation. In this case, the projection cannot participate in historical queries on any epoch that precedes the refresh operation.
To determine the method used to refresh a given projection, query the REFRESH_METHOD column from the PROJECTION_REFRESHES system table.
10 - Dropping projections
Projections can be dropped explicitly through the DROP PROJECTION statement.
Projections can be dropped explicitly through the DROP PROJECTION statement. They are also implicitly dropped when you drop their anchor table.