This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Depot management

The nodes of an Eon Mode database fetch data from communal storage as needed to process queries, and cache that data locally on disk.

The nodes of an Eon Mode database fetch data from communal storage as needed to process queries, and cache that data locally on disk. The cached data of all nodes within a subcluster comprise that cluster's depot. Vertica uses depots to facilitate query execution: when processing a query, Vertica first checks the current depot for the required data. If the data is unavailable, Vertica fetches it from communal storage and saves a copy in the depot to expedite future queries. Vertica also uses the depot for load operations, caching newly-loaded data in the depot before uploading it to communal storage.

1 - Managing depot caching

You can control depot caching in several ways:.

You can control depot caching in several ways:

You can monitor depot activity and settings with several V_MONITOR system tables, or with the Management Console.

Depot gateway parameters

Vertica depots can cache two types of data:

  • Queried data: The depot facilitates query execution by fetching queried data from communal storage and caching it in the depot. The cached data remains available until it is evicted to make room for fresher data, or for data that is fetched for more recent queries.

  • Loaded data: The depot expedites load operations such as COPY by temporarily caching data until it is uploaded to communal storage.

By default, depots are configured to cache both types of data.

Two configuration parameters determine whether a depot caches queried or loaded data:

Parameter Settings
UseDepotForReads

Boolean:

  • 1 (default): Search the depot for the queried data; if not found, fetch the data from communal storage.

  • 0: Bypass the depot and get queried data from communal storage.

UseDepotForWrites

Boolean:

  • 1 (default): Write loaded data to the depot, then upload files to communal storage.

  • 0: Bypass the depot and write directly to communal storage.

Both parameters can be set at session, user and database levels.

If set on the session or user levels, these parameters can be used to segregate read and write activity on the depots of different subclusters. For example, parameters UseDepotForReads and UseDepotForWrites might be set as follows for users joe and rhonda:

=> SHOW USER joe ALL;
          name           | setting
-------------------------+---------
 UseDepotForReads        | 1
 UseDepotForWrites       | 0
(2 rows)
=> SHOW USER rhonda ALL;
          name           | setting
-------------------------+---------
 UseDepotForReads        | 0
 UseDepotForWrites       | 1
(2 rows)

Given these user settings, when joe connects to a Vertica subcluster, his session only uses the current depot to process queries; all load operations are uploaded to communal storage. Conversely, rhonda's sessions only use the depot to process load operations; all queries must fetch their data from communal storage.

Depot fetching

If a depot is enabled to cache queried data (UseDepotForReads = 1), you can configure how it fetches data from communal storage with configuration parameter DepotOperationsForQuery. This parameter has three settings:

  • ALL (default): Fetch file data from communal storage, if necessary displace existing files by evicting them from the depot.

  • FETCHES: Fetch file data from communal storage only if space is available; otherwise, read the queried data directly from communal storage.

  • NONE: Do not fetch file data to the depot, read the queried data directly from communal storage.

You can set fetching behavior at four levels, in ascending levels of precedence:

For example, you can set DepotOperationsForQuery at the database level as follows:

=> ALTER DATABASE default SET PARAMETER DepotOperationsForQuery = FETCHES;
ALTER DATABASE

This setting applies to all database depots unless overridden at other levels. For example, the following ALTER USER statement specifies fetching behavior for a depot when it processes queries from user joe:

=> ALTER USER joe SET PARAMETER DepotOperationsForQuery = ALL;
ALTER USER

Finally, joe can override his own DepotOperationsForQuery setting by including the DEPOT_FETCH hint in individual queries:


SELECT /*+DEPOT_FETCH(NONE)*/ count(*) FROM bar;

Evicting depot data

In general, Vertica evicts data from the depot as needed to provide room for new data, and expedite request processing. Before writing new data to the depot, Vertica evaluates it as follows:

  • Data fetched from communal storage: Vertica sizes the download and evicts data from the depot accordingly.

  • Data uploaded from a DML operation such as COPY: Vertica cannot estimate the total size of the upload before it is complete, so it sizes individual buffers and evicts data from the depot as needed.

In both cases, Vertica assesses existing depot data and determines which objects to evict from the depot as follows, in descending order of precedence (most to least vulnerable):

  1. Least recently used unpinned object evicted for any new object, pinned or unpinned.

  2. Least recently used pinned object evicted for a new pinned object.

Pinning depot objects

You can set depot pinning policies on database objects to reduce their exposure to eviction. Pinning policies can be set on individual subclusters, or on the entire database, and at different levels of granularity—table, projection, and partitions:

Pinning of... Supported by...
Tables SET_DEPOT_PIN_POLICY_TABLE
Projections SET_DEPOT_PIN_POLICY_PROJECTION
Partitions SET_DEPOT_PIN_POLICY_PARTITION

By default, pinned objects are queued for download from communal storage as needed to execute a query or DML operation. SET_DEPOT_PIN_POLICY functions can specify to override this behavior and immediately queue newly pinned objects for download: set the last Boolean argument of the function to true.

In the following example, SET_DEPOT_PIN_POLICY_TABLE pins the data of table foo and specifies to queue the data immediately for download:

=> SELECT SET_DEPOT_PIN_POLICY_TABLE ('foo', 'default_subluster', true );

Usage guidelines

Pinning one or more objects on a depot affects its retention of fetched (queried) data and uploaded (newly loaded) data. If too much depot space is claimed by pinned objects, the depot might be unable to handle load operations on unpinned objects. In this case, set configuration parameter UseDepotForWrites to 0, so load operations are routed directly to communal storage for processing. Otherwise, load operations are liable to return with an error.

To minimize contention over depot usage, consider the following guidelines:

  • Pin only those objects that are most active in DML operations and queries.

  • Minimize the size of pinned data by setting policies at the smallest effective level—for example, pin only the data of a table's active partition.

Depot warming

On startup, the depots of new nodes are empty, while the depots of restarted nodes often contain stale data that must be refreshed. When depot warming is enabled, a node that is undergoing startup preemptively loads its depot with frequently queried and pinned data. When the node completes startup and begins to execute queries, its depot already contains much of the data it needs to process those queries. This reduces the need to fetch data from communal storage, and expedites query performance accordingly.

By default, depot warming is disabled (EnableDepotWarmingFromPeers = 0). A node executes depot warming as follows:

  1. The node checks configuration parameter PreFetchPinnedObjectsToDepotAtStartup. If enabled (set to 1), the node:

    • Gets from the database catalog a list of all objects that are pinned on this node's subcluster.

    • Queues the pinned objects for fetching and calculates their total size.

  2. The node checks configuration parameter EnableDepotWarmingFromPeers. If enabled (set to 1), the node:

    • Identifies a peer node in the same subcluster whose depot contents it can copy.

    • After taking into account all pinned objects, calculates how much space remains available in the warming depot.

    • Gets from the peer node a list of the most recently used objects that can fit in the depot.

    • Queues the objects for fetching.

  3. If BackgroundDepotWarming is enabled (set to 1, default), the node loads queued objects into its depot while it is warming, and continues to do so in the background after the node becomes active and starts executing queries. Otherwise (BackgroundDepotWarming = 0), node activation is deferred until the depot fetches and loads all queued objects.

Monitoring the depot

You can monitor depot activity and settings with several V_MONITOR system tables.

System table... Shows...
DATA_READS All storage locations that a query reads to obtain data.
DEPOT_EVICTIONS Details about objects that were evicted from the depot.
DEPOT_FETCH_QUEUE Pending depot requests for queried file data to fetch from communal storage.
DEPOT_FILES Objects that are cached in database depots.
DEPOT_PIN_POLICIES Objects —tables and table partitions—that are pinned to database depots.
DEPOT_SIZES Depot caching capacity per node.
DEPOT_UPLOADS Details about depot uploads to communal storage.

2 - Resizing depot caching capacity

Each node in an Eon database caches depot data in a predefined storage location.

Each node in an Eon database caches depot data in a predefined storage location. The storage location path depends on your Vertica installation's filesystem. By default, each node in a cluster can use up to 60 percent of disk space on the storage location's filesystem to cache depot data. You can change caching capacity with ALTER_LOCATION_SIZE, by specifying to a fixed size or a percentage of total disk space. The function can specify a single node, a subcluster, or all nodes in the database cluster. You can increase depot caching capacity for each node up to 80 percent.

In the following example, ALTER_LOCATION_SIZE increases depot caching capacity to 80 percent of disk space on the storage location's filesystem. The function supplies an empty string as the second (node-name) argument, so the change applies to all nodes:

=> SELECT node_name, location_label, location_path, max_size, disk_percent FROM storage_locations WHERE location_usage = 'DEPOT' ORDER BY node_name;
    node_name     | location_label  |      location_path      |  max_size   | disk_percent
------------------+-----------------+-------------------------+-------------+--------------
 v_vmart_node0001 | auto-data-depot | /home/dbadmin/verticadb | 36060108800 | 70%
 v_vmart_node0002 | auto-data-depot | /home/dbadmin/verticadb | 36059377664 | 70%
 v_vmart_node0003 | auto-data-depot | /home/dbadmin/verticadb | 36060108800 | 70%
(3 rows)

=> SELECT alter_location_size('depot', '','80%');
 alter_location_size
---------------------
 depotSize changed.
(1 row)

=> SELECT node_name, location_label, location_path, max_size, disk_percent FROM storage_locations WHERE location_usage = 'DEPOT' ORDER BY node_name;
    node_name     | location_label  |      location_path      |  max_size   | disk_percent
------------------+-----------------+-------------------------+-------------+--------------
 v_vmart_node0001 | auto-data-depot | /home/dbadmin/verticadb | 41211552768 | 80%
 v_vmart_node0002 | auto-data-depot | /home/dbadmin/verticadb | 41210717184 | 80%
 v_vmart_node0003 | auto-data-depot | /home/dbadmin/verticadb | 41211552768 | 80%
(3 rows)

Rescaling depot capacity

When a database is revived on an instance with greater or lesser disk space than it had previously, Vertica evaluates the depot size settings that were previously in effect. If depot size was specified as a percentage of available disk space, Vertica proportionately rescales depot capacity. For example, if depot caching capacity for a given node was set to 70 percent, the revived node applies that setting to the new disk space and adjusts depot caching capacity accordingly. If depot capacity was set to a fixed size, Vertica applies that setting, unless doing so will consume more than 80 percent of available disk space. In that case, Vertica automatically adjusts depot size as needed.