Change the number of shards in the database

The initial number of shards is set when you create a database. You might choose to change the number of shards in a database for the following reasons:

  • Improve large subcluster performance. For example, if you have a 24-node subcluster in a database with 6 shards, the subcluster uses Elastic Crunch Scaling (ECS) to split responsibility for processing the data in each shard among its nodes. Re-sharding the database to 24 shards removes the need for ECS and improves performance, because ECS is less efficient than a one-to-one shard-to-node ratio. For more information, see Using elastic crunch scaling to improve query performance.

  • Reduce catalog size. If your catalog size has grown due to a high number of shards in your database, you might choose to reduce the number of shards.

  • Improve performance after migrating from Enterprise Mode to Eon Mode. When you migrate your database from Enterprise Mode to Eon Mode, the number of shards in your Eon database is initially set to the number of nodes that you had in your Enterprise database. This default number of shards might not be ideal. For details, see Choosing the Number of Shards and the Initial Node Count.

  • Scale your database effectively. To evenly distribute work among nodes, the number of nodes in the database should be a multiple, or even a divisor, of the number of shards. You might re-shard your database if you plan to scale the subclusters to a size that is incompatible with this guidance. For example, a database with seven shards should only have subclusters whose node counts are multiples of seven. Choosing a shard count with more divisors, such as eight, gives you greater flexibility in choosing the number of nodes in a subcluster. See the query following this list for one way to compare your current shard and node counts.
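
To see how your current shard count relates to your node count, you can compare the SHARDS and NODES system tables. The following query is a minimal sketch: it assumes the unsegmented replica shard is named replica, as in the example output later in this section, and it counts nodes across the whole database (group the NODES table by subcluster_name for per-subcluster counts):

=> SELECT (SELECT COUNT(*) FROM shards WHERE shard_name != 'replica') AS shard_count,
          (SELECT COUNT(*) FROM nodes) AS node_count;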

You should not re-shard your database every time you scale subclusters. While in progress, re-sharding might affect the database's performance. After re-sharding, the storage containers on the subcluster are not immediately aligned with the new shard subscription bounds. This misalignment adds overhead to query execution.

Re-sharding an Eon Mode database

To re-shard your database, call the RESHARD_DATABASE function with the new shard count as the argument. This function takes a global catalog lock, so avoid running it during busy periods or when performing heavy ETL loads. The runtime depends on the size of your catalog.

After RESHARD_DATABASE completes, the nodes in the cluster use the new catalog shard definitions. However, the re-sharding process does not immediately alter the storage containers in communal storage. The shards continue to point to the existing storage containers. For example, if you double the number of shards in your database, each storage container now has two associated shards. During queries, each node filters out the data in the storage containers that does not apply to its subscribed shard. This filtering adds a small overhead to the query. Eventually, the Tuple Mover's background reflexive mergeout processes automatically update the storage containers so they align with the new shard definitions. To have the Tuple Mover realign the storage containers immediately, call DO_TM_TASK to run a 'RESHARDMERGEOUT' task.
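
For example, the following call (a sketch; DO_TM_TASK also accepts an optional table name as a second argument if you want to limit the task to a single table) asks the Tuple Mover to realign the storage containers immediately:

=> SELECT DO_TM_TASK('RESHARDMERGEOUT');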

The following query returns the details of any storage containers that the Tuple Mover has not yet realigned:

=> SELECT * FROM storage_containers WHERE original_segment_lower_bound IS NOT NULL AND original_segment_upper_bound IS NOT NULL;

Example

This example demonstrates the re-sharding process and its effect on shard assignments and storage containers by comparing their details before and after re-sharding. The following three queries return information about the database's shards, node subscriptions, and storage container catalog objects:

=> SELECT shard_name, lower_hash_bound, upper_hash_bound FROM shards ORDER BY shard_name;

shard_name  | lower_hash_bound | upper_hash_bound
------------+------------------+------------------
replica     |                  |
segment0001 |                0 |       1073741825
segment0002 |       1073741826 |       2147483649
segment0003 |       2147483650 |       3221225473
segment0004 |       3221225474 |       4294967295
(5 rows)

=> SELECT node_name, shard_name, is_primary, is_resubscribing, is_participating_primary FROM node_subscriptions;

node_name | shard_name  | is_primary | is_resubscribing | is_participating_primary
----------+-------------+------------+------------------+--------------------------
initiator | replica     | t          | f                | t
e0        | replica     | f          | f                | t
e1        | replica     | f          | f                | t
e2        | replica     | f          | f                | t
e0        | segment0002 | t          | f                | t
e1        | segment0003 | t          | f                | t
e2        | segment0004 | t          | f                | t
initiator | segment0001 | t          | f                | t
(8 rows)

=> SELECT node_name, projection_name, storage_oid, sal_storage_id, total_row_count, deleted_row_count, segment_lower_bound, segment_upper_bound, shard_name FROM storage_containers WHERE projection_name = 't_super';

node_name | projection_name |    storage_oid    |                  sal_storage_id                  | total_row_count | deleted_row_count | segment_lower_bound | segment_upper_bound | shard_name
----------+-----------------+-------------------+--------------------------------------------------+-----------------+-------------------+---------------------+---------------------+-------------
initiator | t_super         | 45035996273842990 | 022e836bff54b0aed318df2fe73b5afe00a0000000021b2d |               4 |                 0 |                   0 |          1073741825 | segment0001
e0        | t_super         | 49539595901213486 | 024bbf043c1ca3f5c7a86a423fc7e1e300b0000000021b2d |               3 |                 0 |          1073741826 |          2147483649 | segment0002
e1        | t_super         | 54043195528583990 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b35 |               8 |                 0 |          2147483650 |          3221225473 | segment0003
e2        | t_super         | 54043195528583992 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b31 |               6 |                 0 |          3221225474 |          4294967295 | segment0004
(4 rows)

The following call to RESHARD_DATABASE changes the number of shards to eight:
=> SELECT RESHARD_DATABASE(8);

                RESHARD_DATABASE
----------------------------------------------------------
The database has been re-sharded from 4 shards to 8 shards
(1 row)

You can use the following query to view the database's new shard definitions:
=> SELECT shard_name, lower_hash_bound, upper_hash_bound FROM shards ORDER BY shard_name;

shard_name  | lower_hash_bound | upper_hash_bound
------------+------------------+------------------
replica     |                  |
segment0001 |                0 |        536870913
segment0002 |        536870914 |       1073741825
segment0003 |       1073741826 |       1610612737
segment0004 |       1610612738 |       2147483649
segment0005 |       2147483650 |       2684354561
segment0006 |       2684354562 |       3221225473
segment0007 |       3221225474 |       3758096385
segment0008 |       3758096386 |       4294967295
(9 rows)

The database now has eight shards. Because re-sharding cut the boundary range of each shard in half, each shard is responsible for about half as much of the communal storage data.

The following query returns the database's new node subscriptions:
=> SELECT node_name, shard_name, is_primary, is_resubscribing, is_participating_primary FROM node_subscriptions;

node_name | shard_name  | is_primary | is_resubscribing | is_participating_primary
----------+-------------+------------+------------------+--------------------------
initiator | replica     | t          | f                | t
e0        | replica     | f          | f                | t
e1        | replica     | f          | f                | t
e2        | replica     | f          | f                | t
initiator | segment0001 | t          | f                | t
e0        | segment0002 | t          | f                | t
e1        | segment0003 | t          | f                | t
e2        | segment0004 | t          | f                | t
initiator | segment0005 | t          | f                | t
e0        | segment0006 | t          | f                | t
e1        | segment0007 | t          | f                | t
e2        | segment0008 | t          | f                | t
(12 rows)

After re-sharding, each node now subscribes to two shards instead of one.

You can use the following query to see how re-sharding affected the database's storage container catalog objects:
=> SELECT node_name, projection_name, storage_oid, sal_storage_id, total_row_count, deleted_row_count, segment_lower_bound, segment_upper_bound, shard_name FROM storage_containers WHERE projection_name = 't_super';

node_name | projection_name |    storage_oid    |                  sal_storage_id                  | total_row_count | deleted_row_count | segment_lower_bound | segment_upper_bound | shard_name
----------+-----------------+-------------------+--------------------------------------------------+-----------------+-------------------+---------------------+---------------------+-------------
initiator | t_super         | 45035996273843145 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b35 |               8 |                 0 |          2147483650 |          3221225473 | segment0005
initiator | t_super         | 45035996273843149 | 022e836bff54b0aed318df2fe73b5afe00a0000000021b2d |               4 |                 0 |                   0 |          1073741825 | segment0001
e0        | t_super         | 49539595901213641 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b35 |               8 |                 0 |          2147483650 |          3221225473 | segment0006
e0        | t_super         | 49539595901213645 | 022e836bff54b0aed318df2fe73b5afe00a0000000021b2d |               4 |                 0 |                   0 |          1073741825 | segment0002
e1        | t_super         | 54043195528584141 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b31 |               6 |                 0 |          3221225474 |          4294967295 | segment0007
e1        | t_super         | 54043195528584143 | 02dac7dc405a1620c92bae1a17c7bbad00c0000000021b31 |               6 |                 0 |          1073741826 |          2147483649 | segment0003
e2        | t_super         | 54043195528584137 | 024bbf043c1ca3f5c7a86a423fc7e1e300b0000000021b2d |               3 |                 0 |          3221225474 |          4294967295 | segment0008
e2        | t_super         | 54043195528584139 | 024bbf043c1ca3f5c7a86a423fc7e1e300b0000000021b2d |               3 |                 0 |          1073741826 |          2147483649 | segment0004
(8 rows)

The shards point to storage files with the same sal_storage_id values as before the re-shard. Eventually, the Tuple Mover's mergeout processes automatically realign the storage containers with the new shard definitions.
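
After the Tuple Mover realigns a storage container, its original_segment_lower_bound and original_segment_upper_bound values are cleared, so the alignment query shown earlier returns no rows. For example, you can count the containers that still await realignment; a count of zero means realignment is complete:

=> SELECT COUNT(*) FROM storage_containers
   WHERE original_segment_lower_bound IS NOT NULL
     AND original_segment_upper_bound IS NOT NULL;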

You can query the RESHARDING_EVENTS system table for information about current and historical re-sharding operations, such as a node's previous shard subscription bounds and the current status of the re-sharding operation:

=> SELECT node_name, running_status, old_shard_name, old_shard_lower_bound, old_shard_upper_bound FROM RESHARDING_EVENTS;

node_name | running_status  | old_shard_name |   old_shard_lower_bound   |   old_shard_upper_bound
----------+-----------------+----------------+---------------------------+-------------------------
e0        | Running         | segment0001    |                         0 |              1073741825
e0        | Running         | segment0002    |                1073741826 |              2147483649
e0        | Running         | segment0003    |                2147483650 |              3221225473
e0        | Running         | segment0004    |                3221225474 |              4294967295
e1        | Running         | segment0001    |                         0 |              1073741825
e1        | Running         | segment0002    |                1073741826 |              2147483649
e1        | Running         | segment0003    |                2147483650 |              3221225473
e1        | Running         | segment0004    |                3221225474 |              4294967295
initiator | Running         | segment0001    |                         0 |              1073741825
initiator | Running         | segment0002    |                1073741826 |              2147483649
initiator | Running         | segment0003    |                2147483650 |              3221225473
initiator | Running         | segment0004    |                3221225474 |              4294967295
(12 rows)