Writing bundle metadata to the catalog

Vertica internally stores physical table data in bundles together with metadata on the bundle contents. The query optimizer uses bundle metadata to look up and fetch the data it needs for a given query.

Vertica stores bundle metadata in the database catalog. This is especially beneficial in Eon mode: instead of fetching this metadata from remote (S3) storage, the optimizer can find it in the local catalog. This minimizes S3 reads, and facilitates faster query planning and overall execution.

Vertica writes bundle metadata to the catalog in two cases:

  • Any DML operation that changes table content, such as INSERT, UPDATE, or COPY. Vertica writes bundle metadata to the catalog for the new or changed data only; DML operations have no effect on bundle metadata for existing table data. (See the example after this list.)

  • Calls to UPDATE_STORAGE_CATALOG, passed as an argument to the Vertica meta-function DO_TM_TASK, on existing data. You can narrow the scope of the catalog update to a specific projection or table. If you specify no scope, the operation applies to the entire database.
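
For example, loading new rows with COPY causes Vertica to write bundle metadata only for the storage containers created by the load; metadata for the table's existing data is untouched. The following minimal sketch reuses the table name from the example below; the source file path is hypothetical:

=> -- Hypothetical file path; adjust for your environment.
=> COPY store.store_sales_fact FROM '/data/store_sales_fact.dat' DELIMITER '|';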

The following DO_TM_TASK call writes bundle metadata for all projections in table store.store_sales_fact:

=> SELECT DO_TM_TASK ('update_storage_catalog', 'store.store_sales_fact');
                                  do_tm_task
-------------------------------------------------------------------------------
 Task: update_storage_catalog
(Table: store.store_sales_fact) (Projection: store.store_sales_fact_b0)
(Table: store.store_sales_fact) (Projection: store.store_sales_fact_b1)
(1 row)
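
You can also narrow the scope to a single projection. The following sketch reuses a projection name from the output above; adapt it to your own projections:

=> SELECT DO_TM_TASK ('update_storage_catalog', 'store.store_sales_fact_b0');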

Validating bundle metadata

You can query system table STORAGE_BUNDLE_INFO_STATISTICS to determine which projections have invalid bundle metadata in the database catalog. For example, results from the following query show that the database catalog has invalid metadata for projections inventory_fact_b0 and inventory_fact_b1:

=> SELECT node_name, projection_name, total_ros_count, ros_without_bundle_info_count
    FROM v_monitor.storage_bundle_info_statistics WHERE ros_without_bundle_info_count > 0
    ORDER BY projection_name, node_name;
    node_name     |  projection_name  | total_ros_count | ros_without_bundle_info_count
------------------+-------------------+-----------------+-------------------------------
 v_vmart_node0001 | inventory_fact_b0 |               1 |                             1
 v_vmart_node0002 | inventory_fact_b0 |               1 |                             1
 v_vmart_node0003 | inventory_fact_b0 |               1 |                             1
 v_vmart_node0001 | inventory_fact_b1 |               1 |                             1
 v_vmart_node0002 | inventory_fact_b1 |               1 |                             1
 v_vmart_node0003 | inventory_fact_b1 |               1 |                             1
(6 rows)
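
If the query reports invalid metadata, you can refresh it with UPDATE_STORAGE_CATALOG and then re-run the query to confirm that ros_without_bundle_info_count returns to zero. The sketch below assumes the anchor table for these projections is named inventory_fact; add a schema qualifier if your database requires one:

=> -- Table name inferred from the projection names above; qualify with a schema if necessary.
=> SELECT DO_TM_TASK ('update_storage_catalog', 'inventory_fact');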

Best practices

Updating the database catalog with UPDATE_STORAGE_CATALOG is recommended only for Eon users. Enterprise users are unlikely to see measurable performance improvements from this update.

Calls to UPDATE_STORAGE_CATALOG can incur considerable overhead, as the update process typically requires numerous and expensive S3 reads. Vertica advises against running this operation on the entire database. Instead, consider an incremental approach:

  • Call UPDATE_STORAGE_CATALOG on a single large fact table. You can use performance metrics from that call to estimate how much time updating other tables will require.

  • Identify which tables are subject to frequent queries and prioritize catalog updates accordingly (see the sketch after this list).
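
As a starting point for prioritization, the following sketch counts projection usage by anchor table. It assumes the PROJECTION_USAGE system table is available and exposes an anchor_table_name column; verify the column names against your Vertica version. Enabling \timing in vsql before a trial UPDATE_STORAGE_CATALOG call also gives a rough per-table cost estimate.

=> -- Column names are assumptions; verify them against your system tables.
=> SELECT anchor_table_name, COUNT(*) AS query_count
     FROM v_monitor.projection_usage
     GROUP BY anchor_table_name
     ORDER BY query_count DESC
     LIMIT 10;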