Verify license compliance for ORC and Parquet data

If you are upgrading from a version before 9.1.0 and:

  • Your database has external tables based on ORC or Parquet files (whether stored locally on the Vertica cluster or on a Hadoop cluster)

  • Your Vertica license has a raw data allowance

follow the steps in this topic before upgrading.

Background

Vertica licenses can include a raw data allowance. Since 2016, Vertica licenses have allowed you to use ORC and Parquet data in external tables. This data has always counted against any raw data allowance in your license. Previously, the audit of data in ORC and Parquet format was handled manually. Because this audit was not automated, the total amount of data in your native tables and external tables could exceed your licensed allowance for some time before being spotted.

Starting in version 9.1.0, Vertica automatically audits ORC and Parquet data in external tables. This auditing begins soon after you install or upgrade to version 9.1.0. If your Vertica license includes a raw data allowance and you have data in external tables based on Parquet or ORC files, review your license compliance before upgrading to Vertica 9.1.x. Verifying beforehand that your database complies with your license terms keeps it from being found non-compliant soon after you upgrade.

Verifying your ORC and Parquet usage complies with your license terms

To verify your data usage is compliant with your license, run the following query as the database administrator:

SELECT (database_size_bytes + file_size_bytes) <= license_size_bytes
       "license_compliant?"
       FROM (SELECT database_size_bytes,
                    license_size_bytes FROM license_audits
                    WHERE audited_data='Total'
                    ORDER BY audit_end_timestamp DESC LIMIT 1) dbs,
            (SELECT sum(total_file_size_bytes) file_size_bytes
                    FROM external_table_details
                    WHERE source_format IN ('ORC', 'PARQUET')) ets;

This query returns one of three results:

  • If you do not have any external data in ORC or Parquet format, the query returns 0 rows:

     license_compliant?
    --------------------
    (0 rows)
    

    In this case, you can proceed with your upgrade.

  • If you have data in external tables based on ORC or Parquet format, and that data does not cause your database to exceed your raw data allowance, the query returns t:

     license_compliant?
    --------------------
     t
    (1 row)
    

    In this case, you can proceed with your upgrade.

  • If the data in your external tables based on ORC and Parquet causes your database to exceed your raw data allowance, the query returns f:

     license_compliant?
    --------------------
     f
    (1 row)
    

    In this case, resolve the compliance issue before you upgrade. See Resolving non-compliance, below.
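The compliance query reports only true or false. To see how close the database is to its allowance, the same subqueries can be reused to show the underlying figures. The following is a minimal sketch, assuming the same system tables and columns as the query above; the aliases orc_parquet_bytes and headroom_bytes are illustrative names, not part of Vertica:

```sql
-- Sketch: show the sizes behind the compliance check.
-- Uses the same two subqueries as the compliance query above.
SELECT dbs.database_size_bytes,
       ets.file_size_bytes AS orc_parquet_bytes,   -- illustrative alias
       dbs.license_size_bytes,
       -- Remaining allowance once external ORC/Parquet data is counted.
       dbs.license_size_bytes
         - (dbs.database_size_bytes + ets.file_size_bytes)
         AS headroom_bytes                         -- illustrative alias
       FROM (SELECT database_size_bytes,
                    license_size_bytes FROM license_audits
                    WHERE audited_data='Total'
                    ORDER BY audit_end_timestamp DESC LIMIT 1) dbs,
            (SELECT sum(total_file_size_bytes) file_size_bytes
                    FROM external_table_details
                    WHERE source_format IN ('ORC', 'PARQUET')) ets;
```

A negative headroom_bytes value corresponds to the f result: it is the number of bytes by which the combined data exceeds the raw data allowance.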

Resolving non-compliance

If the query in the previous section indicates that your database is not in compliance with your license, resolve this issue before upgrading. There are two ways to bring your database into compliance:

  • Contact Vertica to upgrade your license to a larger data size allowance. See Obtaining a license key file.

  • Delete data (from ORC- or Parquet-based external tables or from Vertica native tables) to bring your data size into compliance with your license. Always back up any data you are about to delete from Vertica. Dropping external tables is a less disruptive way to reduce the size of your database because the data is not lost; it remains in the files that your external table is based on.
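If you choose to drop external tables, it helps to know which ones contribute the most data. The following is a minimal sketch that lists the largest ORC and Parquet external tables, assuming the table_schema and table_name columns of external_table_details; the limit of 10 and the table name in the commented DROP statement are hypothetical:

```sql
-- Sketch: list the largest ORC/Parquet external tables as candidates to drop.
SELECT table_schema,
       table_name,
       source_format,
       total_file_size_bytes
       FROM external_table_details
       WHERE source_format IN ('ORC', 'PARQUET')
       ORDER BY total_file_size_bytes DESC
       LIMIT 10;   -- arbitrary cutoff for illustration

-- Dropping an external table removes only its definition in Vertica;
-- the underlying files are left in place.
-- Hypothetical table name:
-- DROP TABLE public.sales_parquet_ext;
```

After dropping tables, rerun the compliance query in the previous section to confirm that it returns t before you upgrade.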