Reshard the default namespace
Important
You can only change the number of shards in thedefault_namespace
if it is the only namespace in your database. If your database contains any non-default namespaces, running RESHARD_DATABASE results in an error.
The initial number of shards for the default_namespace
is set when you create a database. You might choose to change the number of shards in this namespace for the following reasons:
-
Improve large subcluster performance. For example, if you have a 24-node subcluster and a namespace with 6 shards, the subcluster uses Elastic Crunch Scaling (ECS) to split the responsibility for processing the data in each shard among the nodes. Re-sharding the namespace to 24 shards avoids using ECS, which improves performance because ECS is not as efficient as having a 1:1 shard:node ratio. For more information, see Using elastic crunch scaling to improve query performance.
-
Reducing the number of shards in
default_namespace
reduces the catalog size. -
Improve performance after migrating from Enterprise Mode to Eon Mode. When you migrate your database from Enterprise Mode to Eon Mode, the number of shards in your the
default_namespace
is initially set to the number of nodes that you had in your Enterprise database. This default number of shards might not be ideal. For details, see Choosing the initial node and shard counts. -
Scale your database effectively. To evenly distribute work among nodes, the number of nodes in the database should be a multiple of the number of shards. You might re-shard the namespace if you plan to scale the subclusters to a size that is incompatible with this guidance. For example, a namespace with 7 shards works best with subclusters that have a multiple of 7 nodes. Choosing a shard count with more divisors, such as 8, gives you greater flexibility in choosing the number of nodes in a subcluster.
You should not re-shard your default_namespace
every time you scale subclusters. While in progress, re-sharding might affect the database's performance. After re-sharding, the storage containers on the subcluster are not immediately aligned with the new shard subscription bounds. This misalignment adds overhead to query execution.
Re-sharding the default_namespace
To re-shard the default_namespace
, call the RESHARD_DATABASE function with the new shard count as the argument. This function takes a global catalog lock, so avoid running it during busy periods or when performing heavy ETL loads. The runtime depends on the size of your catalog.
After RESHARD_DATABASE completes, the nodes in the cluster use the new catalog shard definitions. However, the re-sharding process does not immediately alter the storage containers in communal storage. The shards continue to point to the existing storage containers. For example, if you double the number of shards in the namespace, each storage container now has two associated shards. During queries, each node filters out the data in the storage containers that does not apply to its subscribed shard. This adds a small overheard to the query. Eventually, the Tuple Mover's background reflexive mergeout processes automatically update the storage containers so they align with the new shard definitions. You can call DO_TM_TASK to run a 'RESHARDMERGEOUT' task that has the Tuple Mover immediately realign the storage containers.
The following query returns the details of any storage containers that Tuple Mover has not yet realigned:
Example
This example demonstrates the re-sharding process and how it affects shard assignments and storage containers. To illustrate the impact of re-sharding, the shard assignment and storage container details are compared before and after re-sharding. The following four queries return information about the namespace's shard count, shard bounds, node subscriptions, and storage container catalog objects:
The following call to RESHARD_DATABASE changes the shard count of default_namespace
to eight:
To confirm that the reshard operation was successful, query the NAMESPACES system table:
You can use the following query to view the namespace's new shard definitions:
The namespace now has eight shards. Because re-sharding cut the boundary range of each shard in half, each shard is responsible for about half as much of the communal storage data.
The following query returns the namespace's new node subscriptions:
After re-sharding, each node now subscribes to two shards instead of one.
You can use the following query to see how re-sharding affected the database's storage container catalog objects:
The shards point to storage files with the same sal_storage_id
as before the re-shard. Eventually, the TM's mergeout processes will automatically update the storage containers.
You can query the RESHARDING_EVENTS system table for information about current and historical resharding operations, such as a node's previous shard subscription bounds and the current status of the resharding operation:
After you re-shard the namespace, you can query the DC_ROSES_CREATED table to track the original ROS containers and DVMiniROSes from which the new storage containers were derived: