Partition range projections
Vertica supports projections that specify a range of partition keys. By default, projections store all rows of partitioned table data. Over time, this requirement can incur increasing overhead:
- As data accumulates, increasing amounts of storage are required for large amounts of data that are queried infrequently, if at all.
- Large projections can deter optimizations such as better encodings, or changes to the projection sort order or segmentation. Changes to the projection's DDL like these require you to refresh the entire projection. Depending on the projection size, this refresh operation might span hours or even days.
You can minimize these problems by creating projections for partitioned tables that specify a relatively narrow range of partition keys. For example, the table store_orders
is partitioned on order_date
, as follows:
=> CREATE TABLE public.store_orders(order_no int, order_date timestamp NOT NULL, shipper varchar(20), ship_date date);
CREATE TABLE
=> ALTER TABLE store_orders PARTITION BY order_date::DATE GROUP BY date_trunc('month', (order_date)::DATE);
ALTER TABLE
If desired, you can create a projection of store_orders
that specifies a contiguous range of the table's partition keys. In the following example, the projection ytd_orders
specifies to include only orders that were placed since the first day of the year:
=> CREATE PROJECTION ytd_orders AS SELECT * FROM store_orders ORDER BY order_date
ON PARTITION RANGE BETWEEN date_trunc('year',now())::date AND NULL;
WARNING 4468: Projection <public.ytd_orders_b0> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
WARNING 4468: Projection <public.ytd_orders_b1> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT refresh();
refresh
---------------------------------------------------------------------------------------
Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"public"."ytd_orders_b1": [store_orders] [refreshed] [scratch] [0] [0]
"public"."ytd_orders_b0": [store_orders] [refreshed] [scratch] [0] [0]
(1 row)
Each ytd_orders
buddy projection requires only 7 ROS containers per node, versus the 77 containers required by the anchor table's superprojection:
=> SELECT COUNT (DISTINCT ros_id) NumROS, projection_name, node_name FROM PARTITIONS WHERE projection_name ilike 'store_orders_b%' GROUP BY node_name, projection_name ORDER BY node_name;
NumROS | projection_name | node_name
--------+-----------------+------------------
77 | store_orders_b0 | v_vmart_node0001
77 | store_orders_b1 | v_vmart_node0001
77 | store_orders_b0 | v_vmart_node0002
77 | store_orders_b1 | v_vmart_node0002
77 | store_orders_b0 | v_vmart_node0003
77 | store_orders_b1 | v_vmart_node0003
(6 rows)
=> SELECT COUNT (DISTINCT ros_id) NumROS, projection_name, node_name FROM PARTITIONS WHERE projection_name ilike 'ytd_orders%' GROUP BY node_name, projection_name ORDER BY node_name;
NumROS | projection_name | node_name
--------+-----------------+------------------
7 | ytd_orders_b0 | v_vmart_node0001
7 | ytd_orders_b1 | v_vmart_node0001
7 | ytd_orders_b0 | v_vmart_node0002
7 | ytd_orders_b1 | v_vmart_node0002
7 | ytd_orders_b0 | v_vmart_node0003
7 | ytd_orders_b1 | v_vmart_node0003
(6 rows)
Partition range requirements
Partition range expressions must conform with requirements that apply to table-level partitioning—for example, partition key format and data type validation.
The following requirements and constraints specifically apply to partition range projections:
-
The anchor table must already be partitioned.
-
Partition range expressions must be compatible with the table's partition expression.
-
The first range expression must resolve to a partition key that is smaller or equal to the second expression.
-
If the projection is unsegmented, at least one superprojection of the anchor table must also be unsegmented. If not, Vertica adds the projection to the database catalog, but throws a warning that this projection cannot be used to process queries until you create an unsegmented superprojection.
-
Partition range expressions do not support subqueries.
Anchor table dependencies
As noted earlier, a partition range projection depends on the anchor table being partitioned on the same expression. If you remove table partitioning from the projection's anchor table, Vertica drops the dependent projection. Similarly, if you modify the anchor table's partition clause, Vertica drops the projection.
The following exception applies: if the anchor table's new partition clause leaves the partition expression unchanged, the dependent projection is not dropped and remains available for queries. For example, the table store_orders
and its projection ytd_orders
were originally partitioned as follows:
=> ALTER TABLE store_orders PARTITION BY order_date::DATE GROUP BY DATE_TRUNC('month', (order_date)::DATE);
...
=> CREATE PROJECTION ytd_orders AS SELECT * FROM store_orders ORDER BY order_date
ON PARTITION RANGE BETWEEN date_trunc('year',now())::date AND NULL;
If you now modify store_orders
to use hierarchical partitioning, Vertica repartitions the table data as well as its partition range projection:
=> ALTER TABLE store_orders PARTITION BY order_date::DATE GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 2, 2) REORGANIZE;
NOTICE 4785: Started background repartition table task
ALTER TABLE
Because both store_orders
and the ytd_orders
projection remain partitioned on the order_date
column, the ytd_orders
projection remains valid. Also, the scope of projection data remains unchanged, so the projection requires no refresh. However, in the background, the Tuple Mover silently reorganizes the projection ROS containers as per the new hierarchical partitioning of its anchor table:
=> SELECT COUNT (DISTINCT ros_id) NumROS, projection_name, node_name FROM PARTITIONS WHERE projection_name ilike 'ytd_orders%' GROUP BY node_name, projection_name ORDER BY node_name;
NumROS | projection_name | node_name
--------+-----------------+------------------
38 | ytd_orders_b0 | v_vmart_node0001
38 | ytd_orders_b1 | v_vmart_node0001
38 | ytd_orders_b0 | v_vmart_node0002
38 | ytd_orders_b1 | v_vmart_node0002
38 | ytd_orders_b0 | v_vmart_node0003
38 | ytd_orders_b1 | v_vmart_node0003
(6 rows)
Modifying existing projections
You can modify the partition range of a projection with ALTER PROJECTION. No refresh is required if the new range is within the previous range. Otherwise, a refresh is required before the projection reflects the modified partition range. Prior to refreshing, the projection continues to return data within the unmodified range.
For example, projection ytd_orders
previously specified a partition range that starts on the first day of the current year. The following ALTER PROJECTION statement changes the range to start on October 1 of last year. The new range precedes the previous one, so Vertica issues a warning to refresh the specified projection ytd_orders_b0
and its buddy projection ytd_orders_b1
:
=> ALTER PROJECTION ytd_orders_b0 ON PARTITION RANGE BETWEEN
add_months(date_trunc('year',now())::date, -3) AND NULL;
WARNING 10001: Projection "public.ytd_orders_b0" changed to out-of-date state as new partition range is not covered by existing partition range
HINT: Call refresh() or start_refresh() to refresh the projections
WARNING 10001: Projection "public.ytd_orders_b1" changed to out-of-date state as new partition range is not covered by existing partition range
HINT: Call refresh() or start_refresh() to refresh the projections
ALTER PROJECTION
You can change a regular projection to a partition range projection provided no history or data loss will occur, such as by doing the following:
=> ALTER PROJECTION foo ON PARTITION RANGE BETWEEN '22' AND NULL;
Dynamic partition ranges
A projection's partition range can be static, set by expressions that always resolve to the same value. For example, the following projection specifies a static range between 06/01/21 and 06/30/21:
=> CREATE PROJECTION last_month_orders AS SELECT * FROM store_orders ORDER BY order_date ON PARTITION RANGE BETWEEN
'2021-06-01' AND '2021-06-30';
...
CREATE PROJECTION
More typically, partition range expressions use stable date functions such as ADD_MONTHS, DATE_TRUNC and NOW to specify a dynamic range. In the following example, the partition range is set from the first day of the previous month. As the calendar date advances to the next month, the partition range advances with it:
=> ALTER PROJECTION last_month_orders_b0 ON PARTITION RANGE BETWEEN
add_months(date_trunc('month', now())::date, -1) AND NULL;
ALTER PROJECTION
As a best practice, always leave the maximum range open-ended by setting it to NULL, and rely on queries to determine the maximum amount of data to fetch. For example, a query that fetches all store orders placed last month might look like this:
=> SELECT * from store_orders WHERE order_date BETWEEN
add_months(date_trunc('month', now())::date, -1) AND
add_months(date_trunc('month', now())::date + dayofmonth(now()), -1);
The query plan generated to execute this query shows that it uses the partition range projection last_month_orders
:
=> EXPLAIN SELECT * from store_orders WHERE order_date BETWEEN
add_months(date_trunc('month', now())::date, -1) AND
add_months(date_trunc('month', now())::date + dayofmonth(now()), -1);
Access Path:
+-STORAGE ACCESS for store_orders [Cost: 34, Rows: 763 (NO STATISTICS)] (PATH ID: 1)
| Projection: public.last_month_orders_b0
| Materialize: store_orders.order_date, store_orders.order_no, store_orders.shipper, store_orders.ship_date
| Filter: ((store_orders.order_date >= '2021-06-01 00:00:00'::timestamp(0)) AND (store_orders.order_date <= '2021-06-3
0 00:00:00'::timestamp(0)))
| Execute on: All Nodes
Dynamic partition range maintenance
The Projection Maintainer is a background service that checks projections with projection range expressions hourly. If the value of either expression in a projection changes, the Projection Maintainer compares the new and old values in PARTITION_RANGE_MIN
and PARTITION_RANGE_MAX
to determine whether the partition range contracted or expanded:
-
If the partition range contracted in either direction—that is, PARTITION_RANGE_MIN is greater, or PARTITION_RANGE_MAX is smaller than its previous value—then the Projection Maintainer acts as follows:
-
Updates the system table PROJECTIONS with new values in columns
PARTITION_RANGE_MIN
andPARTITION_RANGE_MAX
. -
Queues a MERGEOUT request to purge unused data from this range. The projection remains available to execute queries within the updated range.
-
-
If the partition range expanded in either direction—that is, PARTITION_RANGE_MIN is smaller, or PARTITION_RANGE_MAX is greater than its previous value—then the Projection Maintainer leaves the projection and the PROJECTIONS table unchanged. Because the partition range remains unchanged, Vertica regards the existing projection data as up to date, so it also can never be refreshed.
For example, the following projection creates a partition range that includes all orders in the current month:
=> CREATE PROJECTION mtd_orders AS SELECT * FROM store_orders ON PARTITION RANGE BETWEEN
date_trunc('month', now())::date AND NULL;
If you create this partition in July of 2021, the minimum partition range expression—date_trunc('month', now())::date
—initially resolves to the first day of the month: 2021-07-01. At the start of the following month, sometime between 2021-08-01 00:00
and 2021-08-01 01:00
, the Projection Maintainer compares the minimum range expression against system time. It then acts as follows:
-
Updates the PROJECTIONS table and sets
PARTITION_RANGE_MIN
for projectionmtd_orders
to2021-08-01
. -
Queues a MERGEOUT request to purge from this projection's partition range all rows with keys that predate
2021-08-01
.
Important
Given the example shown above, you might consider setting the projection's maximum partition range expression as follows:
add_months(date_trunc('month', now()), 1) - 1
This expression would always resolve to the last day of the current month. With each new month, the maximum partition range would be one month greater than its earlier value. As noted earlier, the Projection Maintainer ignores any expansion of a partition range, so it would leave the minimum and maximum partition range values for mtd_orders
unchanged. To avoid issues like this, always set the maximum partition expression to NULL.