Namespaces and shards

In Eon Mode, Vertica stores data communally in a shared data storage location (for example, in S3 when running on AWS).

In Eon Mode, Vertica stores data communally in a shared data storage location, such as in S3 when running on AWS. All nodes are capable of accessing all of the data in the communal storage location. In order for nodes to divide the work of processing queries, Vertica must divide the data between them in some way. It breaks the data in communal storage into segments called shards. Each node in your database subscribes to a subset of the shards in the communal storage location. The shards in your communal storage location are similar to a collection of segmented projections in an Enterprise Mode database.

The number of shards that data is segmented into depends on the namespace to which the data belongs. Namespaces are a collection of schemas and tables in your database that are grouped under a common name and segmented into the number of shards defined by that namespace. In Eon Mode databases, namespaces represent the top-level data structure in the Vertica object hierarchy. Every table and schema in the database belongs to a namespace. By default, the database contains a single default namespace, default_namespace, which is created on database creation with the shard count specified during setup. For details about creating and managing additional namespaces, see Managing namespaces.

When you create a table or schema, you can specify the namespace under which the object is created. If no namespace is specified, the table or schema is created under the default namespace.

The recommended shard count for a namespace depends primarily on the size of the tables it will store and the expected complexity of its query workload. Generally, the database performs best when large tables with complex analytic workloads are segmented with a larger shard count, and small tables with simpler query workloads are segmented with a smaller shard count. Aligning table size and query complexity to shard count also minimizes catalog overhead. If your database contains namespaces with a range of shard counts, you can create tables and schemas under the namespace that best fits these basic sizing guidelines, or those of your particular use case. For a more detailed exploration of shard count and various use cases, see Configuring your Vertica cluster for Eon Mode.

For the best performance, the number of shards you choose for a namespace should be no greater than twice the number of nodes. At most, you should limit the shard:node ratio to no greater than 3:1. MC warns you to take all aspects of shard count into consideration. The number of shards should always be a multiple (or an even divisor) of the number of nodes in your database. See Choosing shard and initial node counts for more information.

For efficiency, Vertica transfers metadata about shards directly between database nodes. This peer-to-peer transfer applies only to metadata; the actual data that is stored on each node gets copied from communal storage to the node's depot as needed.

Shard subscriptions and k-safety

When K-safety is 1 or higher (high availability), each shard has more than one node subscribing to it in each subcluster. One of the subscribers is responsible for executing queries involving the shard. The other subscribers act as backups. If the main subscriber shuts down or is stopped, then another subscriber takes its place. See Data integrity and high availability in an Eon Mode database for more information.

Each shard in a namespace has a primary subscriber. This subscriber is a primary node that maintains the data in the shard by planning Tuple Mover operations on it. This node can delegate executing these actions to another node in the database cluster. See Tuple mover for more information about these operations. If the primary subscriber node is stopped or goes down, Vertica chooses another primary node that subscribes to the shard as the shard's primary subscriber. If all of the primary nodes that subscribe to a shard go down or are stopped, your database goes into read-only mode to maintain data integrity. Any primary node that is the sole subscriber to a shard is a critical node.