Monitoring exports
You can review information about exports, including numbers of row groups, file sizes, and file names.
The export statements are UDxs. The UDX_EVENTS system table records events logged during UDx execution, including timestamps, node names, and session IDs. This table contains a column (RAW), which holds a VMap of whatever additional data an individual UDx logged. The export statements log details about the exported files in this table. While you can work with this table directly and materialize values from the VMap column, you might prefer to define a view to simplify your access.
The following statement defines a view showing only the events from EXPORT TO PARQUET
, materializing the VMap values.
=> CREATE VIEW parquet_export_events AS
SELECT
report_time,
node_name,
session_id,
user_id,
user_name,
transaction_id,
statement_id,
request_id,
udx_name,
file,
created,
closed,
rows,
row_groups,
size_mb
FROM
v_monitor.udx_events
WHERE
udx_name ilike 'ParquetExport%';
The exporters report the following UDx-specific columns:
Column Name | Data Type | Description |
---|---|---|
FILE | VARCHAR | Name of the output file. |
CREATED | TIMESTAMPTZ | When the file was created. |
CLOSED | TIMESTAMPTZ | When the file was closed after writing. |
ROWS | INTEGER | The total number of rows in the file. |
ROW_GROUPS | INTEGER | The number of row groups, for formats that use them (Parquet and ORC). |
SIZE_MB | FLOAT | File size. |
The following example shows the results of a single export.
=> SELECT file,rows,row_groups,size_mb FROM PARQUET_EXPORT_EVENTS;
file | rows | row_groups | size_mb
-----------------------------------------------------------------------+-------+------------+----------
/data/outgZxN3irt/450c4213-v_vmart_node0001-139770732459776-0.parquet | 29696 | 1 | 0.667203
/data/outgZxN3irt/9df1c797-v_vmart_node0001-139770860660480-0.parquet | 29364 | 1 | 0.660922
(2 rows)
In this table, the output directory name (/data/out) is appended with a generated string (gZxN3irt). For exports to HDFS or to local file systems (including NFS), EXPORT TO PARQUET first writes data into a scratch directory and then renames it at the end of the operation. The events are logged during export and so show the temporary name. Some output destinations, such as AWS S3, do not support rename operations, so in those cases this table does not show generated names.