Hierarchical partitioning
Hierarchical partitions organize data into partition groups for more efficient storage. When partitioning by date, the oldest partitions are grouped by year, more recent partitions are grouped by month, and the most recent partitions remain ungrouped. Grouping is dynamic: as recent data ages, the Tuple Mover merges those partitions into month groups, and eventually into year groups. Hierarchical partitioning reduces the number of ROS files Vertica uses and thus reduces ROS pushback.
Use the CALENDAR_HIERARCHY_DAY function as the partitioning GROUP BY
expression in a table's PARTITION BY
clause, as in the following example:
=> CREATE TABLE public.store_orders(
order_no int,
order_date timestamp NOT NULL,
shipper varchar(20),
ship_date date )
PARTITION BY order_date::DATE
GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 3, 2);
Managing timestamped data
Partition consolidation strategies are especially important for managing timestamped data, where the number of partitions can quickly escalate and risk ROS pushback. For example, the following statements create the store_orders
table and load data into it. The CREATE TABLE statement includes a simple partition clause that partitions data by date:
=> DROP TABLE IF EXISTS public.store_orders CASCADE;
=> CREATE TABLE public.store_orders(
order_no int,
order_date timestamp NOT NULL,
shipper varchar(20),
ship_date date )
UNSEGMENTED ALL NODES PARTITION BY order_date::DATE;
=> COPY store_orders FROM '/home/dbadmin/export_store_orders_data.txt';
As COPY loads the new table data into ROS storage, it executes this table's partition clause by dividing daily orders into separate partitions, where each partition requires its own ROS container:
=> SELECT COUNT (DISTINCT ros_id) NumROS, node_name FROM PARTITIONS
WHERE projection_name ilike '%store_orders_super%'
GROUP BY node_name ORDER BY node_name;
NumROS | node_name
--------+------------------
809 | v_vmart_node0001
809 | v_vmart_node0002
809 | v_vmart_node0003
(3 rows)
This is far above the recommended maximum of 50 partitions per projection. This number is also close to the default system limit of 1024 ROS containers per projection, risking ROS pushback in the near future.
You can approach this problem in several ways:
-
Consolidate table data into larger partitions—for example, partition by month instead of day. However, partitioning data at this level might limit effective use of partition management functions.
-
Regularly archive older partitions, and thereby minimize the number of accumulated partitions. However, this requires an extra layer of data management, and also inhibits access to historical data.
Alternatively, you can use CALENDAR_HIERARCHY_DAY to automatically merge partitions into a date-based hierarchy of partition groups. Each partition group is stored in its own set of ROS containers, apart from other groups. You specify this function in the table partition clause as follows:
PARTITION BY partition-expression
GROUP BY CALENDAR_HIERARCHY_DAY( partition-expression [, active-months[, active-years] ] )
For example, given the previous table, you can repartition it as follows:
=> ALTER TABLE public.store_orders
PARTITION BY order_date::DATE
GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 2, 2) REORGANIZE;
The partition expression must match the expression used in the GROUP BY
expression. When using CALENDAR_HIERARCHY_DAY, the expression must be a DATE.
Important
The CALENDAR_HIERARCHY_DAY algorithm assumes that most table activity is focused on recent dates. Setting active-years
and active-months
to a low number ≥ 2 serves to isolate most merge activity to date-specific containers, and incurs minimal overhead. Vertica recommends that you use the default setting of 2 for active-years
and active-months
. For most users, these settings achieve an optimal balance between ROS storage and performance.
As a best practice, never set active-years
and active-months
to 0.
Grouping DATE data hierarchically
CALENDAR_HIERARCHY_DAY creates hierarchies of partition groups and merges partitions into the appropriate groups. It does so by evaluating the partition expression of each table row with the following algorithm, to determine its partition group key:
GROUP BY (
CASE WHEN DATEDIFF('YEAR', , NOW()::TIMESTAMPTZ(6)) >=
THEN DATE_TRUNC('YEAR', ::DATE)
WHEN DATEDIFF('MONTH', , NOW()::TIMESTAMPTZ(6)) >=
THEN DATE_TRUNC('MONTH', ::DATE)
ELSE DATE_TRUNC('DAY', ::DATE) END);
In this example, the algorithm compares order_date
in each row to the current date as follows:
-
Determines if
order_date
is in an inactive year. If it is, the row is merged into a ROS container for that year. -
Otherwise, for an active year, CALENDAR_HIERARCHY_DAY evaluates
order_date
to determine if it is in an inactive month. If it is, the row is merged into a ROS container for that month. -
Otherwise, for an active month, the row is merged into a ROS container for that day. Any rows where
order_date
is a future date is treated in the same way.
For example, if the current date is 2017-09-26, CALENDAR_HIERARCHY_DAY resolves active-years
and active-months
to the following date spans:
-
active-years
: 2016-01-01 to 2017-12-31. Partitions in active years are grouped into monthly ROS containers or are merged into daily ROS containers. Partitions from earlier years are regarded as inactive and merged into yearly ROS containers. -
active-months
: 2017-08-01 to 2017-09-30. Partitions in active months are merged into daily ROS containers.
Now, the total number of ROS containers is reduced to 40 per projection:
=> SELECT COUNT (DISTINCT ros_id) NumROS, node_name FROM PARTITIONS
WHERE projection_name ilike '%store_orders_super%'
GROUP BY node_name ORDER BY node_name;
NumROS | node_name
--------+------------------
40 | v_vmart_node0001
40 | v_vmart_node0002
40 | v_vmart_node0003
(3 rows)
Regardless of how the Tuple Mover groups and merges partitions, it always identifies one or more partitions or partition groups as active. For details, see Active and inactive partitions.
Dynamic regrouping
As shown earlier, CALENDAR_HIERARCHY_DAY references the current date when it creates partition group keys and merges partitions. As the calendar advances, the Tuple Mover reevaluates the partition group keys of tables that are partitioned with this function, and moves partitions as needed to different ROS containers.
Thus, given the previous example, on 2017-10-01 the Tuple Mover creates a monthly ROS container for August partitions. All partition keys between 2017-08-01 and 2017-08-31 are merged into the new ROS container 2017-08:
Likewise, on 2018-01-01, the Tuple Mover creates a ROS container for 2016 partitions. All partition keys between 2016-01-01 and 2016-12-31 that were previously grouped by month are merged into the new yearly ROS container:
Caution
After older partitions are grouped into months and years, any partition operation that acts on a subset of older partition groups is likely to split ROS containers into smaller ROS containers for each partition—for example, MOVE_PARTITIONS_TO_TABLE, whereforce-split
is set to true. These operations can lead to ROS pushback. If you anticipate frequent partition operations on hierarchically grouped partitions, consider modifying the partition expression so partitions are grouped no higher than months.
Customizing partition group hierarchies
Vertica provides a single function, CALENDAR_HIERARCHY_DAY, to facilitate hierarchical partitioning. Vertica stores the GROUP BY
clause as a CASE statement that you can edit to suit your own requirements.
For example, Vertica stores the store_orders
partition clause as follows:
=> SELECT EXPORT_TABLES('','store_orders');
...
CREATE TABLE public.store_orders ( ... )
...
PARTITION BY ((store_orders.order_date)::date)
GROUP BY (
CASE WHEN ("datediff"('year', (store_orders.order_date)::date, ((now())::timestamptz(6))::date) >= 2)
THEN (date_trunc('year', (store_orders.order_date)::date))::date
WHEN ("datediff"('month', (store_orders.order_date)::date, ((now())::timestamptz(6))::date) >= 2)
THEN (date_trunc('month', (store_orders.order_date)::date))::date
ELSE (store_orders.order_date)::date END);
You can modify the CASE statement to customize the hierarchy of partition groups. For example, the following CASE statement creates a hierarchy of months, days, and hours:
=> ALTER TABLE store_orders
PARTITION BY (store_orders.order_date)
GROUP BY (
CASE
WHEN DATEDIFF('MONTH', store_orders.order_date, NOW()::TIMESTAMPTZ(6)) >= 2
THEN DATE_TRUNC('MONTH', store_orders.order_date::DATE)
WHEN DATEDIFF('DAY', store_orders.order_date, NOW()::TIMESTAMPTZ(6)) >= 2
THEN DATE_TRUNC('DAY', store_orders.order_date::DATE)
ELSE DATE_TRUNC('hour', store_orders.order_date::DATE) END);
Alternatively, you can write a user-defined SQL function to partition values in different ways. The following example defines a function that partitions timestamps by hour:
=> CREATE OR REPLACE FUNCTION public.My_Calendar_By_Hour
( tsz timestamp(0)
, p1 int
, p2 int
)
return timestamp(0)
as
begin
return (case
when datediff('day', trunc(tsz, 'hh'), now() at time zone 'utc') > p1
then date_trunc('month', trunc(tsz, 'hh'))
when datediff('day', trunc(tsz, 'hh'), now() at time zone 'utc') > p2
then date_trunc('day' , trunc(tsz, 'hh'))
else date_trunc('hour' , trunc(tsz, 'hh'))
end);
end;
You can then use this function in the partition clause:
=> ALTER TABLE store_orders
PARTITION BY ((col2 at time zone 'utc')::timestamp(0))
GROUP BY My_Calendar_By_Hour((col2 at time zone 'utc')::timestamp(0), 31, 1)
REORGANIZE;