Reshard the default namespace

The initial number of shards is set when you create a namespace.

The initial number of shards for the default_namespace is set when you create a database. You might choose to change the number of shards in this namespace for the following reasons:

  • Improve large subcluster performance. For example, if you have a 24-node subcluster and a namespace with 6 shards, the subcluster uses Elastic Crunch Scaling (ECS) to split the responsibility for processing the data in each shard among the nodes. Re-sharding the namespace to 24 shards avoids using ECS, which improves performance because ECS is not as efficient as having a 1:1 shard:node ratio. For more information, see Using elastic crunch scaling to improve query performance.

  • Reducing the number of shards in default_namespace reduces the catalog size.

  • Improve performance after migrating from Enterprise Mode to Eon Mode. When you migrate your database from Enterprise Mode to Eon Mode, the number of shards in your the default_namespace is initially set to the number of nodes that you had in your Enterprise database. This default number of shards might not be ideal. For details, see Choosing the initial node and shard counts.

  • Scale your database effectively. To evenly distribute work among nodes, the number of nodes in the database should be a multiple of the number of shards. You might re-shard the namespace if you plan to scale the subclusters to a size that is incompatible with this guidance. For example, a namespace with 7 shards works best with subclusters that have a multiple of 7 nodes. Choosing a shard count with more divisors, such as 8, gives you greater flexibility in choosing the number of nodes in a subcluster.

You should not re-shard your default_namespace every time you scale subclusters. While in progress, re-sharding might affect the database's performance. After re-sharding, the storage containers on the subcluster are not immediately aligned with the new shard subscription bounds. This misalignment adds overhead to query execution.

Re-sharding the default_namespace

To re-shard the default_namespace, call the RESHARD_DATABASE function with the new shard count as the argument. This function takes a global catalog lock, so avoid running it during busy periods or when performing heavy ETL loads. The runtime depends on the size of your catalog.

After RESHARD_DATABASE completes, the nodes in the cluster use the new catalog shard definitions. However, the re-sharding process does not immediately alter the storage containers in communal storage. The shards continue to point to the existing storage containers. For example, if you double the number of shards in the namespace, each storage container now has two associated shards. During queries, each node filters out the data in the storage containers that does not apply to its subscribed shard. This adds a small overheard to the query. Eventually, the Tuple Mover's background reflexive mergeout processes automatically update the storage containers so they align with the new shard definitions. You can call DO_TM_TASK to run a 'RESHARDMERGEOUT' task that has the Tuple Mover immediately realign the storage containers.

The following query returns the details of any storage containers that Tuple Mover has not yet realigned:

=> SELECT * FROM storage_containers WHERE original_segment_lower_bound IS NOT NULL AND original_segment_upper_bound IS NOT NULL;

Example

This example demonstrates the re-sharding process and how it affects shard assignments and storage containers. To illustrate the impact of re-sharding, the shard assignment and storage container details are compared before and after re-sharding. The following four queries return information about the namespace's shard count, shard bounds, node subscriptions, and storage container catalog objects:

=> SELECT * FROM NAMESPACES;
   namespace_oid   |  namespace_name   | is_default | default_shard_count
-------------------+-------------------+------------+---------------------
 45035996273704988 | default_namespace | t          |                   4

=> SELECT shard_name, lower_hash_bound, upper_hash_bound FROM shards ORDER BY shard_name;

shard_name  | lower_hash_bound | upper_hash_bound
------------+------------------+------------------
replica     |                  |
segment0001 |                0 |       1073741825
segment0002 |       1073741826 |       2147483649
segment0003 |       2147483650 |       3221225473
segment0004 |       3221225474 |       4294967295
(5 rows)

=> SELECT node_name, shard_name, is_primary, is_resubscribing, is_participating_primary FROM node_subscriptions;

node_name | shard_name  | is_primary | is_resubscribing | is_participating_primary
----------+-------------+------------+------------------+--------------------------
initiator | replica     | t          | f                | t
e0        | replica     | f          | f                | t
e1        | replica     | f          | f                | t
e2        | replica     | f          | f                | t
e0        | segment0002 | t          | f                | t
e1        | segment0003 | t          | f                | t
e2        | segment0004 | t          | f                | t
initiator | segment0001 | t          | f                | t
(8 rows)

=> SELECT node_name, projection_name, storage_oid, sal_storage_id, total_row_count, deleted_row_count, segment_lower_bound, segment_upper_bound, shard_name FROM storage_containers WHERE projection_name = 't_super';

node_name | projection_name |    storage_oid    |                  sal_storage_id                  | total_row_count | deleted_row_count | segment_lower_bound | segment_upper_bound | shard_name
----------+-----------------+-------------------+--------------------------------------------------+-----------------+-------------------+---------------------+---------------------+-------------
initiator | t_super         | 45035996273842990 | 022e836bff54b0aed318df2fe73b5afe00a0000000021b2d |               4 |                 0 |                   0 |          1073741825 | segment0001
e0        | t_super         | 49539595901213486 | 024bbf043c1ca3f5c7a86a423fc7e1e300b0000000021b2d |               3 |                 0 |          1073741826 |          2147483649 | segment0002
e1        | t_super         | 54043195528583990 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b35 |               8 |                 0 |          2147483650 |          3221225473 | segment0003
e2        | t_super         | 54043195528583992 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b31 |               6 |                 0 |          3221225474 |          4294967295 | segment0004
(4 rows)

The following call to RESHARD_DATABASE changes the shard count of default_namespace to eight:

=> SELECT RESHARD_DATABASE(8);

                RESHARD_DATABASE
----------------------------------------------------------
The database has been re-sharded from 4 shards to 8 shards
(1 row)

To confirm that the reshard operation was successful, query the NAMESPACES system table:

=> SELECT * FROM NAMESPACES;
   namespace_oid   |  namespace_name   | is_default | default_shard_count
-------------------+-------------------+------------+---------------------
 45035996273704988 | default_namespace | t          |                   8

You can use the following query to view the namespace's new shard definitions:

=> SELECT shard_name, lower_hash_bound, upper_hash_bound FROM shards ORDER BY shard_name;

shard_name  | lower_hash_bound | upper_hash_bound
-------------+------------------+------------------
replica     |                  |
segment0001 |                0 |        536870913
segment0002 |        536870914 |       1073741825
segment0003 |       1073741826 |       1610612737
segment0004 |       1610612738 |       2147483649
segment0005 |       2147483650 |       2684354561
segment0006 |       2684354562 |       3221225473
segment0007 |       3221225474 |       3758096385
segment0008 |       3758096386 |       4294967295
(9 rows)

The namespace now has eight shards. Because re-sharding cut the boundary range of each shard in half, each shard is responsible for about half as much of the communal storage data.

The following query returns the namespace's new node subscriptions:

=> SELECT node_name, shard_name, is_primary, is_resubscribing, is_participating_primary FROM node_subscriptions;

node_name | shard_name  | is_primary | is_resubscribing | is_participating_primary
-----------+-------------+------------+------------------+--------------------------
initiator | replica     | t          | f                | t
e0        | replica     | f          | f                | t
e1        | replica     | f          | f                | t
e2        | replica     | f          | f                | t
initiator | segment0001 | t          | f                | t
e0        | segment0002 | t          | f                | t
e1        | segment0003 | t          | f                | t
e2        | segment0004 | t          | f                | t
initiator | segment0005 | t          | f                | t
e0        | segment0006 | t          | f                | t
e1        | segment0007 | t          | f                | t
e2        | segment0008 | t          | f                | t
(12 rows)

After re-sharding, each node now subscribes to two shards instead of one.

You can use the following query to see how re-sharding affected the database's storage container catalog objects:

=> SELECT node_name, projection_name, storage_oid, sal_storage_id, total_row_count, deleted_row_count, segment_lower_bound, segment_upper_bound, shard_name FROM storage_containers WHERE projection_name = 't_super';

node_name | projection_name |    storage_oid    |                  sal_storage_id                  | total_row_count | deleted_row_count | segment_lower_bound | segment_upper_bound | shard_name
----------+-----------------+-------------------+--------------------------------------------------+-----------------+-------------------+---------------------+---------------------+-------------
initiator | t_super         | 45035996273843145 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b35 |               8 |                 0 |          2147483650 |          3221225473 | segment0005
initiator | t_super         | 45035996273843149 | 022e836bff54b0aed318df2fe73b5afe00a0000000021b2d |               4 |                 0 |                   0 |          1073741825 | segment0001
e0        | t_super         | 49539595901213641 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b35 |               8 |                 0 |          2147483650 |          3221225473 | segment0006
e0        | t_super         | 49539595901213645 | 022e836bff54b0aed318df2fe73b5afe00a0000000021b2d |               4 |                 0 |                   0 |          1073741825 | segment0002
e1        | t_super         | 54043195528584141 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b31 |               6 |                 0 |          3221225474 |          4294967295 | segment0007
e1        | t_super         | 54043195528584143 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b31 |               6 |                 0 |          1073741826 |          2147483649 | segment0003
e2        | t_super         | 54043195528584137 | 024bbf043c1ca3f5c7a86a423fc7e1e300b0000000021b2d |               3 |                 0 |          3221225474 |          4294967295 | segment0008
e2        | t_super         | 54043195528584139 | 024bbf043c1ca3f5c7a86a423fc7e1e300b0000000021b2d |               3 |                 0 |          1073741826 |          2147483649 | segment0004
(8 rows)

The shards point to storage files with the same sal_storage_id as before the re-shard. Eventually, the TM's mergeout processes will automatically update the storage containers.

You can query the RESHARDING_EVENTS system table for information about current and historical resharding operations, such as a node's previous shard subscription bounds and the current status of the resharding operation:

=> SELECT node_name, running_status, old_shard_name, old_shard_lower_bound, old_shard_upper_bound FROM RESHARDING_EVENTS;
node_name | running_status  | old_shard_name |   old_shard_lower_bound   |   old_shard_upper_bound
----------+-----------------+----------------+---------------------------+-------------------------
e0        | Running         | segment0001    |                         0 |              1073741825
e0        | Running         | segment0002    |                1073741826 |              2147483649
e0        | Running         | segment0003    |                2147483650 |              3221225473
e0        | Running         | segment0004    |                3221225474 |              4294967295
e1        | Running         | segment0001    |                         0 |              1073741825
e1        | Running         | segment0002    |                1073741826 |              2147483649
e1        | Running         | segment0003    |                2147483650 |              3221225473
e1        | Running         | segment0004    |                3221225474 |              4294967295
initiator | Running         | segment0001    |                         0 |              1073741825
initiator | Running         | segment0002    |                1073741826 |              2147483649
initiator | Running         | segment0003    |                2147483650 |              3221225473
initiator | Running         | segment0004    |                3221225474 |              4294967295
(12 rows)

After you re-shard the namespace, you can query the DC_ROSES_CREATED table to track the original ROS containers and DVMiniROSes from which the new storage containers were derived:

=> SELECT node_name, projection_name, storage_oid, old_storage_oid, is_dv FROM DC_ROSES_CREATED;
     node_name       | projection_name  |    storage_oid    |   old_storage_oid  | is_dv
---------------------+------------------+-------------------+-----------------------------
initiator            | t_super          | 45035996273860625 | 45035996273843149  |  f
initiator            | t_super          | 45035996273860632 |                 0  |  f
e0                   | t_super          | 45035996273843149 |                 0  |  f
(3 rows)