Managing depot caching
You can control depot caching in several ways:
-
Configure gateway parameters so a depot caches only queried data or loaded data.
-
Control fetching of queried data from communal storage.
-
Manage eviction of cached data.
-
Enable depot warming on new and restarted nodes.
You can monitor depot activity and settings with several V_MONITOR system tables, or with the Management Console.
Note
Depot caching is supported only on primary shard subscriber nodes.
Depot gateway parameters
Vertica depots can cache two types of data:
-
Queried data: The depot facilitates query execution by fetching queried data from communal storage and caching it in the depot. The cached data remains available until it is evicted to make room for fresher data, or for data that is fetched for more recent queries.
-
Loaded data: The depot expedites load operations such as COPY by temporarily caching data until it is uploaded to communal storage.
By default, depots are configured to cache both types of data.
Two configuration parameters determine whether a depot caches queried or loaded data:
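Parameter | Settings |
---|---|
UseDepotForReads | Boolean: 1 (default) caches queried data in the depot; 0 bypasses the depot, so queries read directly from communal storage. |
UseDepotForWrites | Boolean: 1 (default) caches loaded data in the depot until it is uploaded; 0 bypasses the depot, so load operations write directly to communal storage. |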
Both parameters can be set at session, user and database levels.
If set at the session or user level, these parameters can be used to segregate read and write activity on the depots of different subclusters. For example, parameters UseDepotForReads and UseDepotForWrites might be set as follows for users joe and rhonda:
=> SHOW USER joe ALL;
name | setting
-------------------------+---------
UseDepotForReads | 1
UseDepotForWrites | 0
(2 rows)
=> SHOW USER rhonda ALL;
name | setting
-------------------------+---------
UseDepotForReads | 0
UseDepotForWrites | 1
(2 rows)
Given these user settings, when joe connects to a Vertica subcluster, his session uses the current depot only to process queries; all load operations are uploaded directly to communal storage. Conversely, rhonda's sessions use the depot only to process load operations; all queries must fetch their data from communal storage.
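One way to arrive at the user settings shown above is to set each parameter at the user level. The following is a minimal sketch; it assumes that ALTER USER...SET PARAMETER accepts these two parameters in the same form it accepts other user-level parameters. The user names are the ones from the example above.

```sql
-- joe: cache queried data in the depot, but send loads straight to communal storage
ALTER USER joe SET PARAMETER UseDepotForReads = 1;
ALTER USER joe SET PARAMETER UseDepotForWrites = 0;

-- rhonda: bypass the depot for queries, but cache loaded data before upload
ALTER USER rhonda SET PARAMETER UseDepotForReads = 0;
ALTER USER rhonda SET PARAMETER UseDepotForWrites = 1;

-- Confirm the resulting settings
SHOW USER joe ALL;
SHOW USER rhonda ALL;
```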
Depot fetching
If a depot is enabled to cache queried data (UseDepotForReads = 1), you can configure how it fetches data from communal storage with configuration parameter DepotOperationsForQuery. This parameter has three settings:
-
ALL (default): Fetch file data from communal storage and, if necessary, displace existing files by evicting them from the depot.
-
FETCHES: Fetch file data from communal storage only if space is available; otherwise, read the queried data directly from communal storage.
-
NONE: Do not fetch file data to the depot; read the queried data directly from communal storage.
You can set fetching behavior at four levels, in ascending levels of precedence:
-
Database: ALTER DATABASE...SET PARAMETER
-
Per user: ALTER USER...SET PARAMETER
-
Per session: ALTER SESSION...SET PARAMETER
-
Per query: DEPOT_FETCH hint
For example, you can set DepotOperationsForQuery at the database level as follows:
=> ALTER DATABASE default SET PARAMETER DepotOperationsForQuery = FETCHES;
ALTER DATABASE
This setting applies to all database depots unless overridden at other levels. For example, the following ALTER USER statement specifies fetching behavior for a depot when it processes queries from user joe:
=> ALTER USER joe SET PARAMETER DepotOperationsForQuery = ALL;
ALTER USER
Finally, joe can override his own DepotOperationsForQuery setting by including the DEPOT_FETCH hint in individual queries:
=> SELECT /*+DEPOT_FETCH(NONE)*/ count(*) FROM bar;
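The session level is the one level not illustrated above. As a sketch, assuming ALTER SESSION takes the same SET PARAMETER form as the other levels, the following turns off depot fetching for the current session and then overrides that setting for a single query with the hint. Table bar is the one used in the preceding example.

```sql
-- Session level: takes precedence over the user and database settings
ALTER SESSION SET PARAMETER DepotOperationsForQuery = NONE;

-- Reads directly from communal storage; nothing is fetched to the depot
SELECT count(*) FROM bar;

-- Query level: the hint takes precedence over the session setting,
-- so this query can fetch files to the depot and evict others if needed
SELECT /*+DEPOT_FETCH(ALL)*/ count(*) FROM bar;
```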
Evicting depot data
In general, Vertica evicts data from the depot as needed to make room for new data and to expedite request processing. Before writing new data to the depot, Vertica evaluates it as follows:
-
Data fetched from communal storage: Vertica sizes the download and evicts data from the depot accordingly.
-
Data uploaded from a DML operation such as COPY: Vertica cannot estimate the total size of the upload before it is complete, so it sizes individual buffers and evicts data from the depot as needed.
In both cases, Vertica assesses existing depot data and determines which objects to evict, in descending order of vulnerability (most to least vulnerable):
-
The least recently used unpinned object is evicted for any new object, pinned or unpinned.
-
The least recently used pinned object is evicted for a new pinned object.
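To get a rough sense of which cached files are most exposed to eviction, you can sort a depot's contents in approximately the order described above. This query only approximates the policy; it is not a view of Vertica's internal eviction decisions. The column names (node_name, is_pinned, last_access_time, file_size_bytes) are assumptions about the DEPOT_FILES system table, so check them against the reference for your version.

```sql
-- Per node: unpinned files first (assumes false sorts before true),
-- then least recently used first
SELECT node_name, is_pinned, last_access_time, file_size_bytes
FROM v_monitor.depot_files
ORDER BY node_name, is_pinned ASC, last_access_time ASC
LIMIT 20;
```

Each node's depot is managed separately, so compare rows only within the same node_name.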
Pinning depot objects
You can set depot pinning policies on database objects to reduce their exposure to eviction. Pinning policies can be set on individual subclusters or on the entire database, and at different levels of granularity (table, projection, and partition):
Pinning of... | Supported by... |
---|---|
Tables | SET_DEPOT_PIN_POLICY_TABLE |
Projections | SET_DEPOT_PIN_POLICY_PROJECTION |
Partitions | SET_DEPOT_PIN_POLICY_PARTITION |
By default, pinned objects are queued for download from communal storage as needed to execute a query or DML operation. To override this behavior and immediately queue newly pinned objects for download, set the last Boolean argument of the SET_DEPOT_PIN_POLICY function to true.
In the following example, SET_DEPOT_PIN_POLICY_TABLE pins the data of table foo and queues the data immediately for download:
=> SELECT SET_DEPOT_PIN_POLICY_TABLE ('foo', 'default_subcluster', true);
Tip
How soon Vertica downloads a pinned object from communal storage depends on a number of factors, including space availability and precedence of other pinned objects that are queued for download. You can force immediate download of queued objects by calling FINISH_FETCHING_FILES.
Usage guidelines
Pinning one or more objects on a depot affects its retention of fetched (queried) data and uploaded (newly loaded) data. If too much depot space is claimed by pinned objects, the depot might be unable to handle load operations on unpinned objects. In this case, set configuration parameter UseDepotForWrites to 0, so load operations are routed directly to communal storage for processing. Otherwise, load operations are liable to return with an error.
To minimize contention over depot usage, consider the following guidelines:
-
Pin only those objects that are most active in DML operations and queries.
-
Minimize the size of pinned data by setting policies at the smallest effective level—for example, pin only the data of a table's active partition.
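As an illustration of these guidelines, the following sketch pins one range of partition keys instead of a whole table. The table name store_orders, the key range, and the argument order (table, minimum key, maximum key, subcluster, download flag) are assumptions for illustration; check the SET_DEPOT_PIN_POLICY_PARTITION reference for the exact signature. The last statement applies the UseDepotForWrites workaround described above at the session level.

```sql
-- Pin only one year's partitions of a hypothetical table
SELECT SET_DEPOT_PIN_POLICY_PARTITION (
    'store_orders',        -- hypothetical table name
    '2024-01-01',          -- minimum partition key (hypothetical)
    '2024-12-31',          -- maximum partition key (hypothetical)
    'default_subcluster',  -- subcluster whose depots pin the data
    true                   -- queue the pinned data for download immediately
);

-- If pinned data crowds out loads, bypass the depot for writes in this session
ALTER SESSION SET PARAMETER UseDepotForWrites = 0;
```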
Depot warming
On startup, the depots of new nodes are empty, while the depots of restarted nodes often contain stale data that must be refreshed. When depot warming is enabled, a node that is undergoing startup preemptively loads its depot with frequently queried and pinned data. When the node completes startup and begins to execute queries, its depot already contains much of the data it needs to process those queries. This reduces the need to fetch data from communal storage, and expedites query performance accordingly.
Note
Fetching data to a warming depot can delay node startup.
By default, depot warming is disabled (EnableDepotWarmingFromPeers = 0). A node executes depot warming as follows:
-
The node checks configuration parameter PreFetchPinnedObjectsToDepotAtStartup. If enabled (set to 1), the node:
  -
  Gets from the database catalog a list of all objects that are pinned on this node's subcluster.
  -
  Queues the pinned objects for fetching and calculates their total size.
-
The node checks configuration parameter EnableDepotWarmingFromPeers. If enabled (set to 1), the node:
  -
  Identifies a peer node in the same subcluster whose depot contents it can copy.
  -
  After taking into account all pinned objects, calculates how much space remains available in the warming depot.
  -
  Gets from the peer node a list of the most recently used objects that can fit in the depot.
  -
  Queues the objects for fetching.
-
If BackgroundDepotWarming is enabled (set to 1, default), the node loads queued objects into its depot while it is warming, and continues to do so in the background after the node becomes active and starts executing queries. Otherwise (BackgroundDepotWarming = 0), node activation is deferred until the depot fetches and loads all queued objects.
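Putting the three parameters together, a database that warms its depots from both pinned objects and peers, without delaying node activation, might be configured as follows. This is a sketch; it assumes all three parameters can be set at the database level with ALTER DATABASE...SET PARAMETER.

```sql
-- Queue objects that are pinned on the starting node's subcluster
ALTER DATABASE default SET PARAMETER PreFetchPinnedObjectsToDepotAtStartup = 1;

-- Fill the remaining depot space with a peer's most recently used objects
ALTER DATABASE default SET PARAMETER EnableDepotWarmingFromPeers = 1;

-- Keep loading queued objects in the background after the node becomes active
-- (1 is the default; set to 0 to defer activation until warming completes)
ALTER DATABASE default SET PARAMETER BackgroundDepotWarming = 1;
```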
Monitoring the depot
You can monitor depot activity and settings with several V_MONITOR system tables.
Tip
You can also use the Management Console to monitor depot activity. For details, see Monitoring depot activity in MC.
System table... | Shows... |
---|---|
DATA_READS | All storage locations that a query reads to obtain data. |
DEPOT_EVICTIONS | Details about objects that were evicted from the depot. |
DEPOT_FETCH_QUEUE | Pending depot requests for queried file data to fetch from communal storage. |
DEPOT_FILES | Objects that are cached in database depots. |
DEPOT_PIN_POLICIES | Objects (tables and table partitions) that are pinned to database depots. |
DEPOT_SIZES | Depot caching capacity per node. |
DEPOT_UPLOADS | Details about depot uploads to communal storage. |
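For example, a quick check of how full each depot is, and whether fetch requests are backing up, might look like the following. The column names node_name, max_size_bytes, and current_usage_bytes are assumptions about DEPOT_SIZES; verify them against the system table reference for your version.

```sql
-- Depot capacity and current usage per node
SELECT node_name,
       max_size_bytes,
       current_usage_bytes,
       ROUND(100 * current_usage_bytes / NULLIF(max_size_bytes, 0), 1) AS pct_used
FROM v_monitor.depot_sizes
ORDER BY node_name;

-- Number of fetch requests still pending from communal storage
SELECT COUNT(*) AS pending_fetches
FROM v_monitor.depot_fetch_queue;
```

A depot that stays close to full while DEPOT_EVICTIONS keeps growing can be a sign that pinned data or the working set is larger than the depot.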