Monitoring exports
You can review information about exports, including numbers of row groups, file sizes, and file names.
The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (__RAW__), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.
The following statement defines a view showing only the events from EXPORT TO PARQUET
, materializing the VMap values.
=> CREATE VIEW parquet_export_events AS
SELECT
report_time,
node_name,
session_id,
user_id,
user_name,
transaction_id,
statement_id,
request_id,
udx_name,
event_type,
file,
created,
closed,
rows,
row_groups,
size_mb,
row_group_file,
row_group_n,
row_group_rows,
row_group_bytes,
row_group_expected_size_mb,
row_groups_exceeding_target
FROM
v_monitor.udx_events
WHERE
udx_name ilike 'ParquetExport%';
The exporters report the following UDx-specific columns:
Column Name | Data Type | Description |
---|---|---|
EVENT_TYPE | VARCHAR | What was written. All exporters report FILE events. In addition, the Parquet exporter reports ROW_GROUP events. Row groups are only logged if debugging is enabled. |
FILE | VARCHAR | For EVENT_TYPE of FILE, name of the output file. |
CREATED | TIMESTAMPTZ | For EVENT_TYPE of FILE, when the file was created. |
CLOSED | TIMESTAMPTZ | For EVENT_TYPE of FILE, when the file was closed after writing. |
ROWS | INTEGER | For EVENT_TYPE of FILE, the total number of rows in the file. |
ROW_GROUPS | INTEGER | For EVENT_TYPE of FILE, the number of row groups, for formats that use them (Parquet and ORC). |
SIZE_MB | FLOAT | For EVENT_TYPE of FILE, file size. |
ROW_GROUP_FILE | VARCHAR | For EVENT_TYPE of ROW_GROUP, name of the file containing the row group. |
ROW_GROUP_N | INTEGER | For EVENT_TYPE of ROW_GROUP, row group number. |
ROW_GROUP_ROWS | INTEGER | For EVENT_TYPE of ROW_GROUP, number of rows written in this row group. |
ROW_GROUP_BYTES | INTEGER | For EVENT_TYPE of ROW_GROUP, number of bytes written in this row group. |
ROW_GROUP_EXPECTED_SIZE_MB | INTEGER | For Parquet exports and an EVENT_TYPE of FILE, target size of each row group. |
ROW_GROUPS_EXCEEDING_TARGET | INTEGER | For Parquet exports and an EVENT_TYPE of FILE, number of row groups exceeding the target size. |
The following example shows the results of a single export.
=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
file | rows | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
/data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1 | 0.667203
/data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1 | 0.660922
(2 rows)
In this table, the output directory name (/data/out
) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.
ROW_GROUP events are only recorded when debugging is enabled:
=> SELECT SET_DEBUG_LOG_ALLNODES('UDX', 'ALL');