Viewing license compliance for Hadoop file formats

You can use the EXTERNAL_TABLE_DETAILS system table to gather information about all of your tables based on Hadoop file formats.

You can use the EXTERNAL_TABLE_DETAILS system table to gather information about all of your tables based on Hadoop file formats. This information can help you understand how much of your license's data allowance is used by ORC and Parquet-based data.

Vertica computes the values in this table at query time, so to avoid performance problems, restrict your queries to filter by table_schema, table_name, or source_format. These three columns are the only columns you can use in a predicate, but you may use all of the usual predicate operators.

=> SELECT * FROM EXTERNAL_TABLE_DETAILS
    WHERE source_format = 'PARQUET' OR source_format = 'ORC';
-[ RECORD 1 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schema_oid            | 45035996273704978
table_schema          | public
table_oid             | 45035996273760390
table_name            | ORC_demo
source_format         | ORC
total_file_count      | 5
total_file_size_bytes | 789
source_statement      | COPY FROM 'ORC_demo/*' ORC
file_access_error     |
-[ RECORD 2 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schema_oid            | 45035196277204374
table_schema          | public
table_oid             | 45035996274460352
table_name            | Parquet_demo
source_format         | PARQUET
total_file_count      | 3
total_file_size_bytes | 498
source_statement      | COPY FROM 'Parquet_demo/*' PARQUET
file_access_error     |

When computing the size of an external table, Vertica counts all data found in the location specified by the COPY FROM clause. If you have a directory that contains ORC and delimited files, for example, and you define your external table with "COPY FROM *" instead of "COPY FROM *.orc", this table includes the size of the delimited files. (You would probably also encounter errors when querying that external table.) When you query this table Vertica does not validate your table definition; it just uses the path to find files to report.

You can also use the AUDIT function to find the size of a specific table or schema. When using the AUDIT function on ORC or PARQUET external tables, the error tolerance and confidence level parameters are ignored. Instead, the AUDIT always returns the size of the ORC or Parquet files on disk.

=> select AUDIT('customers_orc');
   AUDIT
-----------
 619080883
(1 row)