Monitoring exports

You can review information about exports, including numbers of row groups, file sizes, and file names.

You can review information about exports, including numbers of row groups, file sizes, and file names.

The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (__RAW__), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.

The following statement defines a view showing only the events from EXPORT TO PARQUET, materializing the VMap values.

=> CREATE VIEW parquet_export_events AS
   SELECT
    report_time,
    node_name,
    session_id,
    user_id,
    user_name,
    transaction_id,
    statement_id,
    request_id,
    udx_name,
    event_type,
    file,
    created,
    closed,
    rows,
    row_groups,
    size_mb,
    row_group_file,
    row_group_n,
    row_group_rows,
    row_group_bytes,
    row_group_expected_size_mb, 
    row_groups_exceeding_target
   FROM
    v_monitor.udx_events
   WHERE
    udx_name ilike 'ParquetExport%';

The exporters report the following UDx-specific columns:

Column Name Data Type Description
EVENT_TYPE VARCHAR What was written. All exporters report FILE events. In addition, the Parquet exporter reports ROW_GROUP events. Row groups are only logged if debugging is enabled.
FILE VARCHAR For EVENT_TYPE of FILE, name of the output file.
CREATED TIMESTAMPTZ For EVENT_TYPE of FILE, when the file was created.
CLOSED TIMESTAMPTZ For EVENT_TYPE of FILE, when the file was closed after writing.
ROWS INTEGER For EVENT_TYPE of FILE, the total number of rows in the file.
ROW_GROUPS INTEGER For EVENT_TYPE of FILE, the number of row groups, for formats that use them (Parquet and ORC).
SIZE_MB FLOAT For EVENT_TYPE of FILE, file size.
ROW_GROUP_FILE VARCHAR For EVENT_TYPE of ROW_GROUP, name of the file containing the row group.
ROW_GROUP_N INTEGER For EVENT_TYPE of ROW_GROUP, row group number.
ROW_GROUP_ROWS INTEGER For EVENT_TYPE of ROW_GROUP, number of rows written in this row group.
ROW_GROUP_BYTES INTEGER For EVENT_TYPE of ROW_GROUP, number of bytes written in this row group.
ROW_GROUP_EXPECTED_SIZE_MB INTEGER For Parquet exports and an EVENT_TYPE of FILE, target size of each row group.
ROW_GROUPS_EXCEEDING_TARGET INTEGER For Parquet exports and an EVENT_TYPE of FILE, number of row groups exceeding the target size.

The following example shows the results of a single export.

=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
                                  file                                 | rows  | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
 /data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1          | 0.667203
 /data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1          | 0.660922
(2 rows)

In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.

ROW_GROUP events are only recorded when debugging is enabled:

=> SELECT SET_DEBUG_LOG_ALLNODES('UDX', 'ALL');