Creating storage policies for low-priority data
If some of your data is in a partitioned table, you can move less-queried partitions to less-expensive storage such as HDFS. The data is still accessible in queries, just at a slower speed. In this scenario, the faster storage is often referred to as "hot storage," and the slower storage is referred to as "cold storage."
Suppose you have a table named messages (containing social-media messages) that is partitioned by the year and month of the message's timestamp. You can list the partitions in the table by querying the PARTITIONS system table.
=> SELECT partition_key, projection_name, node_name, location_label FROM partitions
ORDER BY partition_key;
partition_key | projection_name |    node_name     | location_label
--------------+-----------------+------------------+----------------
201309        | messages_b1     | v_vmart_node0001 |
201309        | messages_b0     | v_vmart_node0003 |
201309        | messages_b1     | v_vmart_node0002 |
201309        | messages_b1     | v_vmart_node0003 |
201309        | messages_b0     | v_vmart_node0001 |
201309        | messages_b0     | v_vmart_node0002 |
201310        | messages_b0     | v_vmart_node0002 |
201310        | messages_b1     | v_vmart_node0003 |
201310        | messages_b0     | v_vmart_node0001 |
. . .
201405        | messages_b0     | v_vmart_node0002 |
201405        | messages_b1     | v_vmart_node0003 |
201405        | messages_b1     | v_vmart_node0001 |
201405        | messages_b0     | v_vmart_node0001 |
(54 rows)
Next, suppose you find that most queries on this table access only the latest month or two of data. You might decide to move the older data to cold storage in an HDFS-based storage location. After you move the data, it is still available for queries, but with lower query performance.
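Before choosing a range, it can help to see how much space each partition occupies. The following query is a sketch, not a definitive recipe: it assumes that the PARTITIONS system table in your version exposes the ros_size_bytes and ros_row_count columns, and that the table's projections are named messages_b0 and messages_b1 as in the listing above.

```sql
-- Sketch: total size and row count per partition key, summed over all
-- nodes and both buddy projections (so each row is counted once per projection).
-- Assumes ros_size_bytes and ros_row_count exist in PARTITIONS.
=> SELECT partition_key,
          SUM(ros_size_bytes) AS total_bytes,
          SUM(ros_row_count)  AS total_rows
   FROM partitions
   WHERE projection_name IN ('messages_b0', 'messages_b1')
   GROUP BY partition_key
   ORDER BY partition_key;
```

Large partitions that are rarely queried are the most likely candidates for cold storage.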
To move partitions to the HDFS storage location, supply the lowest and highest partition key values to be moved in the SET_OBJECT_STORAGE_POLICY function call. The following example shows how to move data between two dates. In this example:
- The partition key value 201309 represents September 2013.
- The partition key value 201403 represents March 2014.
- The name, coldstorage, is the label of the HDFS-based storage location.
- The final argument, which is optional, is true, meaning that the function does not return until the move is complete. By default the function returns immediately and the data is moved when the Tuple Mover next runs. When data is old, however, the Tuple Mover runs less frequently, which would delay recovering the original storage space.
=> SELECT SET_OBJECT_STORAGE_POLICY('messages','coldstorage', '201309', '201403', 'true');
The partitions within the specified range are moved to the HDFS storage location labeled coldstorage. Because the final argument is true, the move completes before the function returns instead of waiting for the Tuple Mover to run. The location name now appears in the location_label column of the PARTITIONS system table.
=> SELECT partition_key, projection_name, node_name, location_label
FROM partitions ORDER BY partition_key;
partition_key | projection_name |    node_name     | location_label
--------------+-----------------+------------------+----------------
201309        | messages_b0     | v_vmart_node0003 | coldstorage
201309        | messages_b1     | v_vmart_node0001 | coldstorage
201309        | messages_b1     | v_vmart_node0002 | coldstorage
201309        | messages_b0     | v_vmart_node0001 | coldstorage
. . .
201403        | messages_b0     | v_vmart_node0002 | coldstorage
201404        | messages_b0     | v_vmart_node0001 |
201404        | messages_b0     | v_vmart_node0002 |
201404        | messages_b1     | v_vmart_node0001 |
201404        | messages_b1     | v_vmart_node0002 |
201404        | messages_b0     | v_vmart_node0003 |
201404        | messages_b1     | v_vmart_node0003 |
201405        | messages_b0     | v_vmart_node0001 |
201405        | messages_b1     | v_vmart_node0002 |
201405        | messages_b0     | v_vmart_node0002 |
201405        | messages_b0     | v_vmart_node0003 |
201405        | messages_b1     | v_vmart_node0001 |
201405        | messages_b1     | v_vmart_node0003 |
(54 rows)
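The policy itself can also be inspected, independently of where the data currently resides. The following is a minimal sketch that assumes your version provides a STORAGE_POLICIES system table with the columns shown; if the column names differ, SELECT * on that table shows what is available.

```sql
-- Sketch: list the storage policy recorded for the messages table.
-- The column names are an assumption; verify them against your version.
=> SELECT schema_name, object_name, policy_details, location_label
   FROM storage_policies
   WHERE object_name = 'messages';
```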
After your initial data move, you can move additional data to the HDFS storage location periodically. You can move individual partitions or a range of partitions from the "hot" storage to the "cold" storage location using the same method:
=> SELECT SET_OBJECT_STORAGE_POLICY('messages', 'coldstorage', '201404', '201404', 'true');
=> SELECT projection_name, node_name, location_label
FROM PARTITIONS WHERE PARTITION_KEY = '201404';
 projection_name |    node_name     | location_label
-----------------+------------------+----------------
 messages_b0     | v_vmart_node0002 | coldstorage
 messages_b0     | v_vmart_node0003 | coldstorage
 messages_b1     | v_vmart_node0003 | coldstorage
 messages_b0     | v_vmart_node0001 | coldstorage
 messages_b1     | v_vmart_node0002 | coldstorage
 messages_b1     | v_vmart_node0001 | coldstorage
(6 rows)
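When you move partitions periodically, a summary by storage location is a quick way to confirm how much of the table is still in hot storage. This sketch uses only the PARTITIONS columns shown above and assumes the same projection names (messages_b0 and messages_b1); partitions in the default location appear with an empty label.

```sql
-- Sketch: count distinct partition keys per storage location label.
=> SELECT location_label,
          COUNT(DISTINCT partition_key) AS partition_keys
   FROM partitions
   WHERE projection_name IN ('messages_b0', 'messages_b1')
   GROUP BY location_label
   ORDER BY location_label;
```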
Moving partitions to a table stored on HDFS
Another method of moving partitions from hot storage to cold storage is to move the partitions' data to a separate table in the other storage location. This method breaks the data into two tables, one containing hot data and the other containing cold data. Use this method if you want to prevent queries from inadvertently accessing data stored in cold storage. To query the older data, you must explicitly query the cold table.
To move partitions:
- Create a new table whose schema matches that of the existing partitioned table.
- Set the storage policy of the new table to use the HDFS-based storage location.
- Use the MOVE_PARTITIONS_TO_TABLE function to move a range of partitions from the hot table to the cold table. The partitions migrate when the Tuple Mover next runs.
The following example demonstrates these steps. You first create a table named cold_messages. You then assign it the HDFS-based storage location named coldstorage, and, finally, move a range of partitions.
=> CREATE TABLE cold_messages LIKE messages INCLUDING PROJECTIONS;
=> SELECT SET_OBJECT_STORAGE_POLICY('cold_messages', 'coldstorage');
=> SELECT MOVE_PARTITIONS_TO_TABLE('messages','201309','201403','cold_messages');
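After the move, you might want to confirm which partitions the cold table now owns, and occasionally query both tables together. The statements below are a sketch under two assumptions: that the projections created by INCLUDING PROJECTIONS have names beginning with cold_messages, and that both tables keep identical column definitions so that SELECT * can be combined with UNION ALL.

```sql
-- Sketch: partition keys now held by the cold table and where they are stored.
-- The projection name prefix is an assumption; check PROJECTION_NAME in PARTITIONS.
=> SELECT DISTINCT partition_key, location_label
   FROM partitions
   WHERE projection_name LIKE 'cold_messages%'
   ORDER BY partition_key;

-- Sketch: an explicit query over hot and cold data together.
=> SELECT COUNT(*)
   FROM (SELECT * FROM messages
         UNION ALL
         SELECT * FROM cold_messages) AS all_messages;
```

Ordinary queries against messages no longer touch the cold data; only queries that name cold_messages, like the second statement, read from the HDFS storage location.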