Profiling query plans

To monitor real-time flow of data through a query plan and its individual paths, query the following system tables:

EXECUTION_ENGINE_PROFILES and QUERY_PLAN_PROFILES. These tables provide data on how Vertica executed a query plan and its individual paths.

Each query plan path has a unique ID, which appears in EXPLAIN output and in the path_id column of these tables.

Both tables provide path-specific data. For example, QUERY_PLAN_PROFILES provides high-level data for each path, which includes:

  • How long the path's operation took to execute

  • How much memory that path's operation consumed

  • Size of data sent/received over the network

For example, you might observe that a GROUP BY HASH operation executed in 0.2 seconds using 100MB of memory.
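
For instance, a query along the following lines returns those per-path metrics. This is a sketch: the transaction and statement IDs are placeholders for the values you obtain as described under Requirements below.

    => SELECT path_id, running_time, memory_allocated_bytes,
         sent_bytes, received_bytes
       FROM query_plan_profiles
       WHERE transaction_id=45035996273955065 AND statement_id=4
       ORDER BY path_id, path_line_index;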

Requirements

Real-time profiling minimally requires the ID of the transaction to monitor. If the transaction includes multiple statements, you also need the statement ID. You can get statement and transaction IDs by issuing PROFILE on the query to profile. You can then use these identifiers to query system tables EXECUTION_ENGINE_PROFILES and QUERY_PLAN_PROFILES.
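
For example, once PROFILE has reported the identifiers, a query like the following (the IDs are placeholders) shows the per-operator counters that EXECUTION_ENGINE_PROFILES collects:

    => SELECT node_name, operator_name, counter_name, counter_value
       FROM execution_engine_profiles
       WHERE transaction_id=45035996273955065 AND statement_id=4
       ORDER BY node_name, operator_name, counter_name;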

For more information, see Profiling single statements.

1 - Getting query plan status for small queries

Real-time profiling counters, stored in system table EXECUTION_ENGINE_PROFILES, are available for all currently executing statements, including internal operations, such as a mergeout.

Profiling counters are available after query execution completes, if any one of the following conditions is true:

  • The query was run via the PROFILE command.

  • Systemwide profiling is enabled with the Vertica meta-function ENABLE_PROFILING (see the example after this list).

  • The query ran for more than two seconds.
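
For example, the following call turns on systemwide query profiling; it uses the same 'query' profiling category that appears with CLEAR_PROFILING later in this section:

    => SELECT ENABLE_PROFILING('query');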

Profiling counters are saved in system table EXECUTION_ENGINE_PROFILES until the storage quota is exceeded.

For example:

  1. Profile the query to get the transaction_id and statement_id to use with EXECUTION_ENGINE_PROFILES. For example:

    => PROFILE SELECT * FROM t1 JOIN t2 ON t1.x = t2.y;
    NOTICE 4788:  Statement is being profiled
    HINT:  Select * from v_monitor.execution_engine_profiles where transaction_id=45035996273955065 and statement_id=4;
    NOTICE 3557:  Initiator memory for query: [on pool general: 248544 KB, minimum: 248544 KB]
    NOTICE 5077:  Total memory required by query: [248544 KB]
     x | y |   z
    ---+---+-------
     3 | 3 | three
    (1 row)
    
  2. Query system table QUERY_PLAN_PROFILES.

    => SELECT ... FROM query_plan_profiles
         WHERE transaction_id=45035996273955065 and statement_id=4
         ORDER BY transaction_id, statement_id, path_id, path_line_index;
    

2 - Getting query plan status for large queries

Real-time profiling is designed to monitor large (long-running) queries. Take the following steps to monitor plans for large queries:

  1. Get the statement and transaction IDs for the query plan you want to profile by querying system table CURRENT_SESSION:

    => SELECT transaction_id, statement_id from current_session;
      transaction_id   | statement_id
    -------------------+--------------
     45035996273955001 |            4
    (1 row)
    
  2. Run the query:

    => SELECT * FROM t1 JOIN t2 ON x=y JOIN ext on y=z;
    
  3. Query system table QUERY_PLAN_PROFILES, and sort on the transaction_id, statement_id, path_id, and path_line_index columns.

    => SELECT ... FROM query_plan_profiles WHERE transaction_id=45035996273955001 and statement_id=4
       ORDER BY transaction_id, statement_id, path_id, path_line_index;
    

You can also use the Linux watch command to monitor long-running queries (see Real-time profiling).

Example

The following series of commands creates a table for a long-running query and then queries system table QUERY_PLAN_PROFILES:

  1. Create table longq:

    => CREATE TABLE longq(x int);
    CREATE TABLE
    => COPY longq FROM STDIN;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1
    >> 2
    >> 3
    >> 4
    >> 5
    >> 6
    >> 7
    >> 8
    >> 9
    >> 10
    >> \.
    => INSERT INTO longq SELECT f1.x+f2.x+f3.x+f4.x+f5.x+f6.x+f7.x
          FROM longq f1
          CROSS JOIN longq f2
          CROSS JOIN longq f3
          CROSS JOIN longq f4
          CROSS JOIN longq f5
          CROSS JOIN longq f6
          CROSS JOIN longq f7;
      OUTPUT
    ----------
     10000000
    (1 row)
    => COMMIT;
    COMMIT
    
  2. Suppress query output on the terminal window by using the vsql \o command:

    => \o /home/dbadmin/longQprof
    
  3. Query the new table:

    => SELECT * FROM longq;
    
  4. Get the transaction and statement IDs:

    => SELECT transaction_id, statement_id from current_session;
      transaction_id   | statement_id
    -------------------+--------------
     45035996273955021 |            4
    (1 row)
    
  5. Turn off the \o command to close the output file, so the redirected query output is saved to the file you specified. Alternatively, leave redirection on and examine the file after you query system table QUERY_PLAN_PROFILES.

    => \o
    
  6. Query system table QUERY_PLAN_PROFILES:

    => SELECT
         transaction_id,
         statement_id,
         path_id,
         path_line_index,
         is_executing,
         running_time,
         path_line
       FROM query_plan_profiles
       WHERE transaction_id=45035996273955021 AND statement_id=4
       ORDER BY transaction_id, statement_id, path_id, path_line_index;
    

3 - Improving readability of QUERY_PLAN_PROFILES output

Output from the QUERY_PLAN_PROFILES table can be very wide because of the path_line column. To facilitate readability, query QUERY_PLAN_PROFILES using one or more of the following options:

  • Sort output by transaction_id, statement_id, path_id, and path_line_index:

    => SELECT ... FROM query_plan_profiles
         WHERE ...
        ORDER BY transaction_id, statement_id, path_id, path_line_index;
    
  • Use column aliases to decrease column width:

    => SELECT statement_id AS sid, path_id AS id, path_line_index AS "order",
         is_started AS "start", is_completed AS "end", is_executing AS exe,
         running_time AS run, memory_allocated_bytes AS mem,
         read_from_disk_bytes AS "read", received_bytes AS rec,
         sent_bytes AS sent FROM query_plan_profiles
         WHERE transaction_id=45035996273910558 AND statement_id=3
         ORDER BY transaction_id, statement_id, path_id, path_line_index;
    
  • Use the vsql \o command to redirect EXPLAIN output to a file:

    => \o /home/dbadmin/long-queries
    => EXPLAIN SELECT * FROM customer_dimension;
    => \o
    

4 - Managing query profile data

Vertica retains data for queries until the storage quota for the table is exceeded, at which point it automatically purges the oldest queries to make room for new ones. You can also clear profiled data by calling one of the following functions:

  • CLEAR_PROFILING clears profiled data from memory. For example, the following command clears profiling for general query-run information, such as the query strings used and the duration of queries.

    => SELECT CLEAR_PROFILING('query');
    
  • CLEAR_DATA_COLLECTOR clears all memory and disk records on the Data Collector tables and functions and resets collection statistics in system table DATA_COLLECTOR.

  • FLUSH_DATA_COLLECTOR waits until memory logs are moved to disk and then flushes the Data Collector, synchronizing its log with the disk storage (see the examples after this list).
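
For example, calling either function with no arguments operates on all Data Collector components:

    => SELECT CLEAR_DATA_COLLECTOR();
    => SELECT FLUSH_DATA_COLLECTOR();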

Configuring data retention policies

Vertica retains the historical data it gathers as specified by the configured retention policies.
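
For example, you can inspect or adjust a component's retention policy with GET_DATA_COLLECTOR_POLICY and SET_DATA_COLLECTOR_POLICY. This is a sketch: the component name below is illustrative (the components on your system are listed in system table DATA_COLLECTOR), and the two numeric arguments are memory and disk quotas in kilobytes:

    => SELECT GET_DATA_COLLECTOR_POLICY('ResourceAcquisitions');
    => SELECT SET_DATA_COLLECTOR_POLICY('ResourceAcquisitions', '1000', '10000');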

5 - Analyzing suboptimal query plans

If profiling uncovers a suboptimal query, invoking one of the following functions might help:

  • ANALYZE_WORKLOAD analyzes system information held in system tables and provides tuning recommendations that are based on a combination of statistics, system and data collector events, and database-table-projection design.

  • ANALYZE_STATISTICS collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table or column (see the examples after this list).
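
For example, both functions can be pointed at a single table; the table name below reuses t1 from the earlier profiling examples and is illustrative:

    => SELECT ANALYZE_WORKLOAD('t1');
    => SELECT ANALYZE_STATISTICS('t1');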

You can also run your query through the Database Designer. See Incremental Design.