The Vertica Glossary defines terms that are common and specific to Vertica.
This is the multi-page printable view of this section. Click here to print.
Glossary
- 1: Access rank
- 2: Administration host
- 3: Administration tools
- 4: Agent
- 5: Anchor table
- 6: Ancient history mark (AHM)
- 7: Apportioned load
- 8: Authentication service (AS)
- 9: Bitstring
- 10: Buddy projection
- 11: Bulk loading
- 12: C-store
- 13: Cardinality
- 14: Catalog
- 15: Catalog path
- 16: Cell
- 17: Cluster
- 18: Columnar tables
- 19: Communal storage
- 20: Connection node
- 21: Control node
- 22: Execution node
- 23: Cooperative parse
- 24: Correlated columns
- 25: Critical node
- 26: Current epoch
- 27: Data collector
- 28: Data path
- 29: Database
- 30: Database Designer
- 31: Database superuser
- 32: Default subcluster
- 33: DELTAVAL (delta encoding)
- 34: Depot
- 35: Depot warming
- 36: Derived column
- 37: Dimension table
- 38: Directed query
- 39: Encoding
- 40: Enterprise Mode
- 41: Eon Mode
- 42: Epoch
- 43: Epoch map
- 44: Executor node
- 45: Expression
- 46: External procedure
- 47: Event series
- 48: Flexible tables
- 49: Flattening (subqueries and views)
- 50: Full backup
- 51: GENERAL pool
- 52: Grant
- 53: Grouped ROS
- 54: Hash segmentation
- 55: Historical data
- 56: Historical query
- 57: Immutable (invariant) functions
- 58: Initiator node
- 59: K-safety
- 60: Last epoch
- 61: Last good epoch (LGE)
- 62: Live aggregate projection
- 63: Locale
- 64: Logical schema
- 65: Location label
- 66: Management Console
- 67: MC SUPER administrator (superuser)
- 68: MC-managed database
- 69: Mergeout
- 70: Meta-functions
- 71: Metadata
- 72: Mutual mode
- 73: Node
- 74: Non-repeatable read
- 75: NUL
- 76: NULL
- 77: OID
- 78: Parallel load
- 79: Path (quality plan)
- 80: Partitioning
- 81: Phantom read
- 82: Physical schema
- 83: Physical schema design
- 84: Primary column
- 85: Primary node
- 86: Primary subcluster
- 87: Primary subscriber
- 88: Projection
- 89: Projection set
- 90: Query cluster level
- 91: Query optimizer
- 92: Quorum
- 93: Recovery
- 94: Referential integrity
- 95: Refresh (projections)
- 96: Resegmentation
- 97: Resource pool
- 98: Resource manager
- 99: Role
- 100: Rollback
- 101: ROS (Read optimized store)
- 102: Savepoint
- 103: Secondary subcluster
- 104: Segmentation
- 105: Server mode
- 106: Session
- 107: Shard
- 108: Shard coverage
- 109: Snapshot isolation
- 110: Spread
- 111: Sort-merge join
- 112: Stable functions
- 113: Statement
- 114: Storage location
- 115: Storage policy
- 116: Strict
- 117: Subcluster
- 118: Superprojection
- 119: Temp location
- 120: Temp space
- 121: Top-k projection
- 122: Transaction
- 123: Tuple mover (TM)
- 124: UDx
- 125: Up-to-date (projection)
- 126: User-defined SQL function
- 127: View
- 128: Volatile functions
- 129: vsql
- 130: Window (analytic)
- 131: Working data size
- 132: Workload analyzer
1 - Access rank
Determines the speed at which a column can be accessed. Columns are stored on disk from the highest ranking to the lowest ranking in which the highest ranking columns are placed on the fastest disks and the lowest ranking columns are placed on the slowest disks.
2 - Administration host
The host on which the Vertica rpm package was manually installed. Always run the Administration Tools on this host if possible.
3 - Administration tools
Vertica Administration Tools provides a graphical user interface for managing a Vertica database. With this tool, you can perform various tasks, including:
-
View the state of the database cluster.
-
Create a database.
-
Start a database.
-
Stop a database.
-
Run Database Designer.
-
Connect to a database using vsql.
4 - Agent
A daemon process that runs on each Vertica cluster node. The agent is used by certain clients, such as Management Console, to administer Vertica.
Agents monitor Vertica database clusters and communicate with their clients to provide the following functionality:
-
Provide local access, command, and control over database instances on a given node, using functionality similar to Administration tools
-
Report log-level data from the Administration Tools and Vertica log files
-
Cache details from long-running jobs—such as create/start/stop database operations—that you can view through your browser
-
Track changes to data-collection and monitoring utilities and communicate updates to clients
-
Specifically for MC, communicate between all cluster nodes and MC through a webhook subscription, which automates information sharing and reports on cluster-specific issues like node state, alerts, events, and so on
The agent runs on port 5444, which must be accessible to agent clients.
5 - Anchor table
Database table that is the source for data in a projection. The anchor table and its projection must be in the same schema. The privileges to create, access, or alter a projection are based on the anchor tables that the projection references, as well as the schemas that contain them.
In the following simple statement, public data is the anchor table:
=> CREATE PROJECTION publicdataproj AS (SELECT * FROM publicdata);
6 - Ancient history mark (AHM)
Also known as AHM, the ancient history mark is the oldest epoch whose data is accessible to historical queries. Any data that precedes the AHM is eligible to be purged.
For detailed information, see Understanding Vertica Epochs in the Vertica Knowledge Base.
7 - Apportioned load
An apportioned load is a divisible load, such that you can load a single data file on more than one node. Apportioned load divides the load at planning time, based on available nodes and cores on each node. Each load starts at a different offset, requiring a parser that supports apportioning. Some of the parsers built into Vertica support apportioned load. Using the SDK, you can write parsers that perform apportioned loads.
8 - Authentication service (AS)
A service that usually runs on the same host as the Kerberos Key Distribution Center (KDC). The AS issues tickets for a requested service. The tickets are in turn given to users for access to the service. The AS answers requests from clients that do not send credentials with a request. The AS is generally used to gain access to the ticket-granting service (TGS) by issuing a ticket-granting ticket (TGT).
9 - Bitstring
A sequence of bits.
10 - Buddy projection
Required for K-safety. Two projections are considered to be buddies if they contain the same columns and have the same hash segmentation, using different node ordering.
For more information, see:
11 - Bulk loading
A process of loading large amounts of data, such as an initial load of historic data.
12 - C-store
A research project at MIT, Brandeis, Brown, and the University of Massachusetts (Boston), on which Vertica is based.
13 - Cardinality
Refers to the number of unique values for a given column in a relational table:
-
High cardinality: Refers to columns containing values that are highly unique, such as a customer ID or an employee e-mail address. For example, in the Vertica VMart schema, the
employee_dimension
table contains anemployee_key
column. This column contains values that uniquely identify each employee. Since the values in this column are unique and could be numerous, the column's cardinality type is referred to as high cardinality. -
Normal cardinality: Refers to columns containing values that are less unique, such as job titles and street addresses. An example of a normal-cardinality column would be
job_title
oremployee_first_name
in theemployee_dimension
table, where many employees could share the same job title or same first name. -
Low cardinality: Refers to a low number of unique values, relative to the overall number of records in a table. For example, in the
employee_dimension
table, the column calledemployee_gender
would contain two unique values: 'Male' or 'Female'. Since there are only two values possible in this column, cardinality is low.
14 - Catalog
A set of files that contains information (metadata) about the objects—such as nodes, tables, constraints, and projections—in a database. Vertica maintains a catalog on each node in the cluster.
15 - Catalog path
A storage location used to store the database catalog.
16 - Cell
A cell is space in a database object allocated for a single piece of data. For a list of supported data types that can populate a cell, see Data types.
17 - Cluster
The concept of Cluster in the Vertica Analytics Platform is a collection of hosts with the Vertica software packages (RPM or DEB) that are in one admin tools domain. You can access and manage a cluster from one admintools initiator host.
18 - Columnar tables
Vertica database tables consisting of structured data columns. The term differentiates these tables from Flex (or Flexible) tables, which minimally contain one column of unstructured, or semi-structured data. Flex tables can also have structured data columns, but they are not required. Compare with Flexible tables.
19 - Communal storage
The shared storage locations containing an Eon Mode database's data. The data within the communal storage is the canonical copy of the data. Vertica does not consider data as being committed until it has been written to communal storage.
Communal data storage locations are based on object stores, such as an S3 bucket in the AWS cloud or a PureStorage FlashBlade appliance.
20 - Connection node
In the context of workload routing, the connection node is a node in the default subcluster that the client first connects to before being routed to an execution node in some subcluster as specified by the routing rule associated with the client's workload. The connection node then acts as a proxy between the client and the execution node.
21 - Control node
A node that connects to the Spread service to send and receive cluster-wide broadcast messages. In databases where the large cluster feature is disabled, all nodes are control nodes. In databases where the large cluster feature is enabled, a subset of nodes are control nodes. Vertica assigns non-control nodes to a control node. These dependent nodes rely on their control node to relay broadcast messages to them. See Large Cluster for more information.
22 - Execution node
In the context of workload routing, the execution node handles the execution of client queries. Clients are routed to a subcluster containing the execution node based on routing rules.
23 - Cooperative parse
A cooperative parse occurs when a single node uses multiple threads to parse data for loading. Cooperative parse divides a load at execution time, based on how threads are scheduled, if the parser supports cooperation. Some of the parsers built into Vertica support cooperative parse. Using the SDK, you can write parsers that perform cooperative parse.
24 - Correlated columns
Two columns are correlated if the value of one column is related to the value of the other column. For example, state name and country name columns are strongly correlated because the city name usually, but perhaps not always, identifies the state name. The city of Conshohoken is uniquely associated with Pennsylvania, whereas the city of Boston exists in Georgia, Indiana, Kentucky, New York, Virginia, and Massachusetts. In this case, city name is strongly correlated with state name.
25 - Critical node
A critical node is a node whose failure would cause the database to become unsafe and force a shutdown. Nodes can become critical for the following reasons:
-
A node has the only copy of a particular projection.
-
Fewer than half of your nodes are active. In an Eon Mode database, at least half of your primary nodes must be up to maintain data integrity.
The V_MONITOR.CRITICAL_NODES system table lists the critical nodes, if any, in your cluster.
For more information, see K-safety in an Enterprise Mode database for Enterprise Mode databases, and Data integrity and high availability in an Eon Mode database for Eon Mode databases.
26 - Current epoch
The epoch into which data (COPY, INSERT, UPDATE, and DELETE operations) is currently being written.
27 - Data collector
A utility that collects and retains database monitoring information. For details, see Data Collector utility.
28 - Data path
A storage location that contains actual database data files.
29 - Database
A cluster of nodes that, when active, can perform distributed data storage and SQL statement execution through administrative, interactive, and programmatic user interfaces.
30 - Database Designer
A tool that analyzes a logical schema definition, sample queries, and sample data, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. The script creates a minimal set of superprojections to ensure K-safety, and, optionally, non-superprojections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints.
The Database Designer can create two distinct design types. The design you choose depends on what you are trying to accomplish:
You can also create custom designs if the Database Designer does not meet your needs.
For detailed information, see Creating a database design.
31 - Database superuser
The automatically-created database user who has the same name as the Linux database administrator account and who can bypass all GRANT/REVOKE authorization, or any user that has been granted the PSEUDOSUPERUSER role. Do not confuse the concept of a database superuser with Linux superuser (root) privilege. A database superuser cannot have Linux superuser privilege.
32 - Default subcluster
The default subcluster is the subcluster Vertica adds new nodes to if you do not specify a subcluster to contain the new nodes. Your database can only have a single default subcluster. You can make a subcluster the default using the ALTER SUBCLUSTER statement.
33 - DELTAVAL (delta encoding)
Stores only the differences between sequential data values instead of the values themselves.
34 - Depot
A cache of data maintained by the nodes in an Eon Mode database to limit reads from communal storage. Retrieving data from communal storage often has high latency and potentially limited bandwidth, especially in cloud environments. Each node in the database caches the data it reads from and writes to the communal storage. When processing a query, they first check this local cache for the data they need. If the data is cached locally, the nodes use the cached version instead of retrieving the data from communal storage.
35 - Depot warming
On startup, the depots of new nodes are empty, while the depots of restarted nodes often contain stale data that must be refreshed. When depot warming is enabled, a node that is undergoing startup preemptively loads its depot with frequently queried and pinned data. When the node completes startup and begins to execute queries, its depot already contains much of the data it needs to process those queries. This reduces the need to fetch data from communal storage, and expedites query performance accordingly.
Note
Fetching data to a warming depot can delay node startup.For details, see Managing depot caching.
36 - Derived column
A column whose values are calculated by an expression at load time. The expression is specified within the COPY statement, and the column exists in the target database.
37 - Dimension table
Sometimes called a lookup or reference table, a dimension table is one of a set of companion tables to a large (fact/anchor) table in a star schema. It contains the PRIMARY KEY column corresponding to the join columns in fact tables. For example, a business might use a dimension table to contain item codes and descriptions.
Dimension tables can be connected to other dimension tables to form a hierarchy of dimensions in a snowflake schema.
38 - Directed query
A saved set of instructions that direct the optimizer to generate a query plan for a given query. The query plan consists of SQL annotated with hints. A directed query pairs two components:
-
Input query: A query that triggers use of this directed query when it is active.
-
Annotated query: A SQL statement with embedded optimizer hints. The annotated query is used by the optimizer in creating a query plan for the specified input query.
39 - Encoding
The process of converting data into a standard format. In Vertica, encoded data can be processed directly, while compressed data cannot. Vertica uses a number of different encoding strategies, depending on column data type, table cardinality, and sort order.
40 - Enterprise Mode
A database mode that optimizes your database for analytic speed. This is the "original" mode of the Vertica database—the only mode it supported before Eon Mode was introduced. In this mode, data is stored by the database nodes. This proximity allows nodes to operate on locally stored data for most queries, reducing query times. An Enterprise Mode database is harder to scale up and down than an Eon Mode database, which is optimized for scalability.
41 - Eon Mode
Eon Mode is the database mode that optimizes your database for scalability. This mode separates data storage from computing resources. By separating the two, you can add or remove nodes from your cluster to adjust growing or shrinking analytic workloads. Scaling the cluster size does not affect running queries. Eon Mode is especially well-suited for cloud environments. See Eon Mode concepts for more information.
42 - Epoch
A logical unit of time in which a single change is made to data in your Vertica database.
For detailed information, see Epochs.
43 - Epoch map
A catalog object that provides mapping between time and epochs. Specifically, an epoch map contains a list of epoch numbers and their associated timestamps.
44 - Executor node
Any node that participates in executing a specific SQL statement. The initiator node can, and usually does, also function as an executor node.
45 - Expression
An expression is anything that can represent a single column in a SELECT statement.
For example, the following are all expressions:
1+2
foo(3)
x
46 - External procedure
A procedure external to Vertica that you create, maintain, and store on the server.
47 - Event series
Tables with a time column, most typically a timestamp data type.
48 - Flexible tables
Vertica database tables that minimally contain two columns:
__identity__
: A real column with an incrementing IDENTITY value for partitioning and sorting. Used if no other columns serve this purpose.
__raw__
: A real LONG VARBINARY
column containing unstructured, or semi-structured data.
You can create Flex tables with additional real columns, but they are not required. Compare with Columnar tables.
49 - Flattening (subqueries and views)
Occurs when a subquery or named view is internally rewritten so the subquery is combined with the outer query block. The result sets of the original and flattened queries are exactly the same, but the flattened query usually benefits from significant performance improvements.
50 - Full backup
Consists of copying each catalog and all data files (ROS containers) on each node, as well as the complete
/opt/vertica/config
directory.
51 - GENERAL pool
A special built-in pool that represents the total amount of RAM available to the resource manager for use by queries. Other pools can borrow memory from the GENERAL pool. See also Built-in pools.
52 - Grant
Vertica defines GRANT in two ways:
-
Grant a user privileges to access database objects using any GRANT statement, except GRANT (Authentication).
-
Associate a user-defined authentication method with a user through GRANT (Authentication). This operation differs from GRANT <privileges> as authentication methods are “associated” with a user or role and privileges are “granted” to a user or role.
Examples
Grant Access Privileges to a User
This example shows how to grant a user, Joe, privileges to access the online_sales schema:
=> GRANT USAGE ON SCHEMA online_sales TO Joe;
Associate an Authentication Method with a User
This example shows how to associate the v_ldap authentication method to user jsmith:
=> GRANT AUTHENTICATION v_ldap TO jsmith;
53 - Grouped ROS
A grouped ROS is a highly-optimized, read-oriented physical storage structure organized by projection. A grouped ROS makes heavy use of compression and indexing. Unlike a ROS, a grouped ROS stores data for two or more grouped columns in one disk file.
54 - Hash segmentation
Specifies how to segment projection data for distribution across all cluster nodes. You can specify segmentation for a table and a projection. If a table definition specifies segmentation, Vertica uses it for that table's auto-projections.
It is strongly recommended that you use Vertica's built-in
HASH
function, which distributes data evenly across the cluster, and facilitates optimal query execution.
For more detailed information, see Segmented projections.
55 - Historical data
Refers to any data in memory or physical storage other than the current epoch. Historical data includes all COPY, INSERT, UPDATE, and DELETE operations, including deleted rows. This allows Vertica to run a historical query on data written up to and including the epoch representing the specified date and time.
56 - Historical query
A query that retrieves data from a snapshot of the database, taken at a specific timestamp or epoch. For details, see Historical queries.
57 - Immutable (invariant) functions
When run with a given set of arguments, immutable functions always produce the same result, regardless of environment or session settings such as locale.
58 - Initiator node
In the context of a client connection, the initiator node is the node associated with the specific host to which the connection was made. The initiator node can, and usually does, also function as an executor node, and is generally where the most descriptive log messages reside.
59 - K-safety
K-safety sets fault tolerance for the database cluster, where K can be set to 0, 1, or 2. The value of K specifies how many copies Vertica creates of Segmented projectionsdata. If K-safety for a database is set to 1 or 2, Vertica creates K+1 instances, or buddies, of each projection segment. Vertica distributes these buddies across the database cluster, such that projection data is protected in the event of node failure. If any node fails, the database can continue to process queries so long as buddies of data on the failed node remain available elsewhere on the cluster.
For more information, see Designing for K-safety.
60 - Last epoch
The last epoch is the current epoch minus one. SELECT statements under READ COMMITTED isolation read from the last epoch.
61 - Last good epoch (LGE)
A term used in manual recovery, LGE (Last Good Epoch) refers to the most recent epoch that can be recovered.
62 - Live aggregate projection
A live aggregate projection contains columns with values that are aggregated from columns in its anchor table. When you load data into the table, Vertica aggregates the data before loading it into the live aggregate projection. On subsequent loads—for example, through INSERTor COPY—Vertica recalculates aggregations with the new data and updates the projection.
63 - Locale
Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY
and ORDER BY
clauses, joins, and the analytic ORDER BY
clause.
The default locale for a Vertica database is en_US@collation=binary
(English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary
collation, regardless of the session collation. Any locale-specific collation is applied at query time.
If you set the locale to null, Vertica sets the locale to en_US_POSIX
. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale
. For example:
Note
=> set locale to '';
INFO 2567: Canonical locale: 'en_US_POSIX'
Standard collation: 'LEN'
English (United States, Computer)
SET
=> \locale en_US@collation=binary;
INFO 2567: Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)
=> \locale
en_US@collation-binary;
64 - Logical schema
Consists of a set of tables and referential integrity constraints in a Vertica database. The objects in the logical schema are visible to SQL users. The logical schema does not include projections, which make up the physical schema.
65 - Location label
A label assigned to a storage location. Location labels identify the location so you can create object storage policies. The labeled location you use in a storage policy becomes the default storage location for the object.
66 - Management Console
A database management tool that provides a unified view of your Vertica database and lets you monitor multiple clusters from a single point of access. See Management Console.
67 - MC SUPER administrator (superuser)
Called Super on the MC interface, the MC SUPER administrator is the Linux user account that gets created when you configure MC. For details, see the following:
68 - MC-managed database
A Vertica database that an MC SUPER or ADMIN user creates on MC or imports into the MC interface, along with the database cluster. When MC users are granted database privileges, their privileges are defined though the MC itself and pertain only to databases managed on the MC. See Users, roles, and privileges in MC.
69 - Mergeout
Mergeout is a Tuple Mover process that consolidates ROS containers and purges deleted records.
70 - Meta-functions
Used to query or change the internal state of Vertica and are not part of the SQL standard.
Meta-functions can be used in a top-level SELECT statement only, and the statement cannot contain other clauses such as FROM or WHERE.
71 - Metadata
Data that describes the data in a database, such as data type, compression, constraints, and so forth, for each column in the tables. Metadata is stored in the Vertica catalog.
72 - Mutual mode
When a database is configured for TLS/SSL security in mutual mode, incoming client requests verify the certificate of the server, and the server also requires that each client present a certificate and private key so that the server can verify the client.
73 - Node
A host configured to run an instance of Vertica. It is a member of the database cluster. For a database to have the ability to recover from the failure of a node requires a database K-safety value of at least 1 (3+ nodes).
74 - Non-repeatable read
Occurs in a READ COMMITTED isolation level when two identical queries in the same transaction produce different results. This occurs when another transaction commits changes that alter the results of the query after the first query has completed and before the second query has begun.
75 - NUL
Represents a character whose ASCII/Unicode code is zero, sometimes qualified "ASCII NUL".
76 - NULL
Means no value, and is true of a field (column) or constant but not of a character.
77 - OID
An object identifier (OID) is an identifier used to name an object. Structurally, an OID consists of a node in a hierarchically-assigned namespace, formally defined using the International Telecommunication Union-Telecommunication's (ITU-T) Abstract Syntax Notation standard (ASN.1). Vertica database objects are always assigned an OID. The OID can be used directly in some statements, such as HAS_TABLE_PRIVILEGE.
Source: Wikipedia
78 - Parallel load
Parallel load is the process of loading data on any available node in the cluster (not necessarily the local node). Use this approach in combination with wildcards (such as *.dat) to load data files in parallel in a distributed manner. See COPY ON ANY NODE in Parameters.
79 - Path (quality plan)
The execution strategy of the Vertica cost-based query optimizer, denoting a sub operation in the query plan.
80 - Partitioning
Specifies how data is organized within individual nodes. Partitioning attempts to introduce hot spots within the node, providing a convenient way to drop data and reclaim the disk space.
Note
Partitioning and segmentation are terms often used interchangeably for other databases, but they have completely separate goals in Vertica regarding data localization. See Partitioning and segmentation in the Administrator's Guide for details.81 - Phantom read
Occurs in a READ COMMITTED isolation level when two identical queries in the same transaction produce different collections of rows. This occurs because a table lock is not acquired on SELECT during the initial query.
82 - Physical schema
Consists of a set of projections used to store data on disk. The projections in the physical schema are based on the objects in the logical schema.
83 - Physical schema design
A usable K=1 design based on an analysis of the sample data files and queries (if available). The physical schema design contains segmented (fact table) superprojections and replicated (dimension table) superprojections.
84 - Primary column
A column that is loaded from raw data, and not derived from an expression.
85 - Primary node
In Eon Mode, a primary node is a node that is a member of a primary subcluster. Primary nodes are the only nodes in the database that Vertica considers when determining whether the database is able to maintain data integrity. Vertica requires that more than half of the primary nodes in the database be up. Also, all shards must have at least one primary node as a subscriber. If either of these conditions are not met, Vertica shuts the database down to prevent potential data corruption.
86 - Primary subcluster
In Eon Mode, a primary subcluster is a type of subcluster that is intended to form the core of your database. Vertica only considers the nodes in primary subclusters when determining the K-safety of your database. Your database can remain running as long as half the nodes in your primary subclusters are up, and all shards have at least one node in a primary subcluster as a subscriber. See Subcluster Types and Elastic Scaling for more information.
87 - Primary subscriber
In an Eon Mode database, each shard has a primary subscriber node. This node is responsible for running Tuple Mover processes that maintain the data in the shard.
88 - Projection
Optimized collections of table columns that provide physical storage for data. A projection can contain some or all of the columns of one or more tables.
For more information about Vertica projections, see Projections.
89 - Projection set
A group of buddy projections that are safe for a given level of K-safety. When K=1, there are two buddies in a set; when K=2, there are three buddies. The Database Designer assigns all projections in a projection set the same base name so they can be identified as a group.
A projection must be part of a projection set before it is refreshed. Once a projection set is created (by creating buddies), the set is refreshed in a single transaction.
90 - Query cluster level
Determines the number of sets used to group similar queries. The query cluster level can be any integer from one (1) to the number of queries to be included in the physical schema design.
Queries are generally grouped based on the columns they access and the way in which they are used. The following work loads typically use different types of queries and are placed in different query clusters: drill downs, large aggregations, and large joins. For example, if a reporting tool and dashboard both access the same database, the reporting tool is likely to use a drill down to access a subset of data and the dashboard is likely to use a large aggregation to look across a large range of data. In this case, there would be at least two (2) query clusters.
91 - Query optimizer
The component that evaluates different strategies for running a query and picks the best one.
92 - Quorum
Your database has a minimum number of nodes that must be up and part of the database cluster for the database to continue operating. When this minimum requirement is met, your database has a quorum of nodes.
The number of nodes that make up a quorum depends on the database mode:
-
In Enterprise Mode, over half the nodes (50% of total nodes + 1) must be up.
-
In Eon Mode, over half of the primary nodes in the cluster must be up.
The database can lose quorum if too many nodes go down. The database's reaction to losing quorum also depends on the database's mode:
-
An Enterprise Mode database that loses quorum shuts down to prevent potential data corruption.
-
An Eon Mode database that loses quorum goes into read-only mode to prevent potential data corruption. In this mode, you can usually execute queries, but DDL and DML statements fail with an error.
For more on database availability, see K-safety in an Enterprise Mode database and Data integrity and high availability in an Eon Mode database.
93 - Recovery
Vertica can restore the database to a fully functional state after one or more nodes in the system experiences a software- or hardware-related failure. Vertica recovers nodes by querying replicas of the data stored on other nodes. For example, a hardware failure can cause a node to lose database objects or to miss changes made to the database (INSERTs, UPDATEs, and so on) while offline. When the node comes back online, queries other nodes in the cluster to recover lost objects and catch up with database changes.
94 - Referential integrity
Consists of a set of constraints (logical schema objects) that define primary key and foreign key columns.
-
Each small table must have a PRIMARY KEY constraint.
-
The large table must contain columns that can be used to join the large table to smaller tables.
-
Outer join queries produce the same results as the corresponding inner join query if there is a FOREIGN KEY constraint on the outer table. Note that the inner table of the outer join query must always have a PRIMARY KEY constraint on its join columns.
95 - Refresh (projections)
Ensures that all projections on a node are up-to-date (can participate in query execution). This process could take a long time, depending on how much data is in the table(s).
For more information, see Refreshing projections.
96 - Resegmentation
A process that Vertica performs automatically during query execution that distributes the rows of an existing projection or intermediate relation evenly to each node in the cluster. At the end of resegmentation, every row from the input relation is on exactly one node. Vertica resegments data when the input does not have the segmentation required to compute the requested result efficiently and correctly.
97 - Resource pool
A resource pool comprises a pre-allocated subset of the system resources, with an associated queue. A resource pool is created using the CREATE RESOURCE POOL command as described in the SQL Reference Manual.
98 - Resource manager
In a single-user environment, the system can devote all resources to a single query and get the most efficient execution for that one query. However, in a environment where several concurrent queries are expected to run at once, there is tension between providing each query the maximum amount of resources (thereby getting fastest run time for that query) and serving multiple queries simultaneously with a reasonable run time. The Resource Manager provides options and controls for resolving this tension, while ensuring that every query eventually gets serviced and that true system limits are respected at all times.
99 - Role
A role groups together a set of privileges that can be assigned to a user or another role. You can use roles to quickly grant or revoke privileges on multiple tables, schemas, functions or other database entities to one or more users with a single command.
Vertica's implementation of roles conforms to the SQL Standard T331 for basic roles.
100 - Rollback
Transaction rollbacks restore a database to an earlier state by discarding changes made by that transaction. Statement-level rollbacks discard only the changes initiated by the reverted statements. Transaction-level rollbacks discard all changes made by the transaction.
With a ROLLBACK
statement, you can explicitly roll back to a named savepoint within the transaction, or discard the entire transaction. Vertica can also initiate automatic rollbacks in two cases:
-
An individual statement returns an
ERROR
message. In this case, Vertica rolls back the statement. -
DDL errors, systemic failures, dead locks, and resource constraints return a
ROLLBACK
message. In this case, Vertica rolls back the entire transaction.
Explicit and automatic rollbacks always release any locks that the transaction holds.
101 - ROS (Read optimized store)
Read Optimized Store (ROS) is a highly optimized, read-oriented, disk storage structure, organized by projection. ROS makes heavy use of compression and indexing.
102 - Savepoint
A savepoint is a special marker inside a transaction that allows commands that execute after the savepoint to be rolled back. The transaction is restored to the state that preceded the savepoint.
Vertica supports two types of savepoints:
-
An implicit savepoint is automatically established after each successful command within a transaction. This savepoint is used to roll back the next statement if it returns an error. A transaction maintains one implicit savepoint, which it rolls forward with each successful command. Implicit savepoints are available to Vertica only and cannot be referenced directly.
-
Named savepoints are labeled markers within a transaction that you set through
SAVEPOINT
statements. A named savepoint can later be referenced in the same transaction throughRELEASE SAVEPOINT
, which destroys it, andROLLBACK TO SAVEPOINT
, which rolls back all operations that followed the savepoint. Named savepoints can be especially useful in nested transactions: a nested transaction that begins with a savepoint can be rolled back entirely, if necessary.
103 - Secondary subcluster
A secondary subcluster is a type of subcluster that is easy to start and shutdown on demand. In Eon Mode, Vertica does not consider the nodes in a secondary subcluster when determining whether the database has shard coverage or a quorum of nodes. You can stop or remove secondary subclusters without causing the database to go into read-only mode. See Subcluster Types and Elastic Scaling for more information.
104 - Segmentation
Defines how physical data storage (projections) is stored in a database cluster using the CREATE PROJECTION statement. The goal is to distribute data evenly across multiple nodes in the database so that all nodes can participate in query execution.
Note
Partitioning and segmentation are terms often used interchangeably for other databases, but they have completely separate goals in Vertica regarding data localization. See Partitioning and segmentation in the Administrator's Guide for details.105 - Server mode
When a database is configured for TLS/SSL security in server mode, incoming client requests do verify the certificate of the server, but the server does not verify the clients. In Vertica, the server is the Vertica database server, and the client is any Vertica user who logs into the database. The Vertica client user may log in to the database directly in a command window, or may connect to the Vertica database server via the Management Console running in a browser.
106 - Session
An occurrence of a user interacting with a database through the use of SQL statements. You can start a session using vsql or a JDBC application. In Vertica, the scope of a session is the same as that of a connection. Connecting to the database starts a session, and exiting ends it.
Session-scoped data is preserved beyond the lifetime of a single transaction. Terminating a session truncates a table and deletes all rows.
107 - Shard
A subset of the data and associated metadata stored in an Eon Mode database. Shards are how Vertica divides the work of processing queries, data loads, and data maintenance between the nodes in a subcluster. Nodes subscribe to one or more shards. When processing a query, each node is responsible for processing the data in the shard or shards to which it subscribes.
The number of shards that data is segmented into depends on the shard count of the namespace to which the data belongs. By default, the database contains a single namespace, default_namespace
, with the shard count specified during database setup. You can create additional namespaces using the CREATE NAMESPACE statement.
Shards have a single primary subscriber in the database cluster. The primary subscriber node maintains the data in the shard by running Tuple Mover processes on it.
108 - Shard coverage
In an Eon Mode database, nodes in a subcluster subscribe to one or more shards in communal storage. Each node handles the data in the shard or shards it subscribes to when processing queries. When a subcluster has at least one node subscribed to each shard, it has shard coverage. It is able to process queries because it has access to all of the data in the database. In a K-safe database, each subcluster has at least two nodes subscribing to each node. If the subcluster loses too many nodes in rapid succession, it may no longer have a node subscribing to each shard. In this case, the subcluster loses shard coverage and cannot process queries.
The database cluster as a whole must have at least one primary node subscribed to each shard in communal storage. This state is called having primary shard coverage. Primary nodes coordinate the maintenance of data in communal storage. If the cluster loses primary nodes to the point that it does not have at least one subscribed to each shard, it loses primary shard coverage. In this case, the database goes into read-only mode because it is unsafe to alter the data in communal storage when one or more shards do not have a primary node subscriber.
See Data integrity and high availability in an Eon Mode database.
109 - Snapshot isolation
An historical query that gets data from the latest epoch (AT LATEST EPOCH
). For details, see Historical queries.
110 - Spread
An open source toolkit used in Vertica to provide a high performance messaging service that is resilient to network faults. Spread daemons start automatically when a database starts up for the first time, and the spread process runs on control nodes in the cluster.
111 - Sort-merge join
If both inputs are pre-sorted, merge joins do not have to do any pre-processing. Vertica uses the term sort-merge join to refer to the case when one of the inputs must be sorted prior to the merge join. Vertica sorts the inner input side but only if the outer input side is already sorted on the join keys.
112 - Stable functions
When run with a given set of arguments, stable functions produce the same result within a single query or scan operation. However, a stable function can produce different results when issued under different environments or at different times, such as change of locale and time zone—for example, SYSDATE.
See also Immutable (invariant) functions.
113 - Statement
A line of SQL, excluding the semicolon. For example, the following are all statements:
SELECT * FROM foo WHERE bar = 6
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES(3)
114 - Storage location
A directory path used by Vertica to store catalog, actual data files, and temporary data files. You can also create storage locations for users, and then grant one or more users access to the storage. You can also create a storage location with a location label, for use in storage policies.
115 - Storage policy
A database object you create to associate a labeled location as the default storage location for the object. Database object can be a database, schema, table, or min- and max- ranges of a partition.
116 - Strict
Indicates that a function always returns null when any of its input arguments is null.
117 - Subcluster
A subset of a cluster in an Eon Mode database. Subclusters isolate workloads to a group of nodes, letting you separate tasks such as load and query.
Subclusters come in two types:
-
Primary subclusters count when Vertica determines whether the database can continue to run safely. Your database shuts down if less than half the primary nodes in your database are up, or does not have a primary node as a subscriber. Primary subclusters are best used for data loading and DDE workloads.
-
Secondary subclusters do not count towards maintaining data integrity. You can shut them down without impacting the stability of the database. Secondary subclusters are best used for query workloads. They cannot commit transactions directly, so they are less efficient at performing data loads and DDE statements.
118 - Superprojection
A projection that includes all columns in an anchor table. Vertica uses superprojections to ensure support for all queries and other DML operations.
Under certain conditions, Vertica automatically creates a table's superprojection immediately on table creation. Vertica also creates a superprojection when you first load data into that table, if none already exists.
CREATE PROJECTION
can create a superprojection if it specifies to include all table columns. A table can have multiple superprojections.
For more information, see Creating projections.
119 - Temp location
A storage location used as temp space by Vertica.
120 - Temp space
Disk space temporarily occupied by temporary files created by certain query execution operations, such as hash joins and sorts, in the case when they have to spill to disk. Such operations might also be encountered during queries, recovery, refreshing projections, and so on. If a temp location is provided to the database, Vertica uses it as temp space.
121 - Top-k projection
A projection that configures the projection data for a Top-K query. A Top-K query is one that retrieves the top k rows from a group of tuples.
A Top-K projection is a type of live aggregate projection.
122 - Transaction
One or more operations that are executed as a unit of work. At the user level, transactions occur in the current session by a user or script running one or more SQL statements. When you commit a transaction, any changes you make to data in tables using INSERT
, DELETE
, UPDATE
, MERGE
, and COPY
during the transaction become permanent. If you roll back the transaction, all changes made to table data are undone.
Vertica supports Atomicity, Consistency, Isolation, and Durability (ACID) for SQL transactions.
123 - Tuple mover (TM)
The Tuple Mover manages ROS data storage. On mergeout, it combines small ROS containers into larger ones and purges deleted data.
124 - UDx
User-defined extensions (UDxs) are functions contained in external libraries that are developed in C++, Python, Java, or R using the Vertica SDK. The external libraries are defined in the Vertica catalog using the CREATE LIBRARY statement. They are best suited for analytic operations that are difficult to perform in SQL, or that need to be performed frequently enough that their speed is a major concern.
125 - Up-to-date (projection)
A projection is up-to-date (or up to date) if it is eligible to participate in query execution. Projections on empty tables are up-to-date upon creation. If the table has data loaded already, newly created projections are marked not up-to-date until refreshed. If a projection is refreshed while a node is down, that projection can be marked up-to-date even though it is missing data on one of the nodes. This is because the node will build the data during the recovery process before participating in queries.
126 - User-defined SQL function
User-defined SQL functions let you define and store commonly-used SQL expressions as a function. User-defined SQL functions are useful for executing complex queries and combining Vertica built-in functions. You simply call the function name you assigned in your query.
A user-defined SQL function can be used anywhere in a query where an ordinary SQL expression can be used, except in a table partition clause or the projection segmentation clause.
127 - View
A named logical relation specified by an associated query that can be accessed similarly to a table in the FROM clause of a SQL statement. The results of the query are not stored but obtained on the fly when the SQL referencing the view is executed.
128 - Volatile functions
Regardless of their arguments or environment, volatile functions can return a different result with each invocation—for example, UUID_GENERATE.
129 - vsql
vsql is a character-based, interactive, front-end utility that lets you type SQL statements and see the results. It also provides a number of meta-commands and various shell-like features that facilitate writing scripts and automating a variety of tasks.
For more information, see Installing the vsql client and the more general topic, Using vsql.
130 - Window (analytic)
An analytic function's OVER
clause specifies how to partition, sort, and frame function input with respect to the current row. The input data is the result set that the query returns after it evaluates FROM
, WHERE
, GROUP BY
, and HAVING
clauses.
131 - Working data size
The amount of data that most of your queries operate on. In most Vertica databases, you store more data than you regularly query. For example, suppose you are storing sales data for a corporation. You may keep many years of sales data in your database. However, the majority of queries you run may only query the last year of data to determine current sales trends. In that case, your working data size is the amount of sales data you load into the database in a year.
Knowing your working data size is important when configuring an Eon Mode database. You want the total size of the depots in all of the nodes a subcluster to be large enough to fit the working data size. Note that the working data size may vary per subcluster. Some subclusters may only work on recently-loaded data, while others do many queries on a larger data set.
132 - Workload analyzer
An advisor tool that analyzes system information held in SQL system tables (monitoring APIs) and returns a set of tuning recommendations. See Analyzing workloads in the Administrator's Guide for details.