Query plan cost estimation
The query optimizer chooses a query plan based on cost estimates. The query optimizer uses information from a number of sources to develop potential plans and determine their relative costs. These include:
-
Number of table rows
-
Column statistics, including: number of distinct values (cardinality), minimum/maximum values, distribution of values, and disk space usage
-
Access path that is likely to require fewest I/O operations, and lowest CPU, memory, and network usage
-
Available eligible projections
-
Join options: join types (merge versus hash joins), join order
-
Query predicates
-
Data segmentation across cluster nodes
Many important optimizer decisions rely on statistics, which the query optimizer uses to determine the final plan to execute a query. Therefore, it is important that statistics be up to date. Without reasonably accurate statistics, the optimizer could choose a suboptimal plan, which might affect query performance.
Vertica provides hints about statistics in the query plan. See Query plan statistics.
Cost versus execution runtime
Although costs correlate to query runtime, they do not provide an estimate of actual runtime. For example, if the optimizer determines that Plan A costs twice as much as Plan B, it is likely that Plan A will require more time to run. However, this cost estimate does not necessarily indicate that Plan A will run twice as long as Plan B.
Also, plan costs for different queries are not directly comparable. For example, if the estimated cost of Plan X for query1 is greater than the cost of Plan Y for query2, it is not necessarily true that Plan X's runtime is greater than Plan Y's runtime.