This section contains the functions associated with Vertica table management.
Table management functions
- 1: COPY_TABLE
- 2: INFER_EXTERNAL_TABLE_DDL
- 3: INFER_TABLE_DDL
- 4: PURGE_TABLE
- 5: REBALANCE_TABLE
1 - COPY_TABLE
Copies one table to another. This lightweight, in-memory function copies the DDL and all user-created projections from the source table. Projection statistics for the source table are also copied. Thus, the source and target tables initially have identical definitions and share the same storage.
Note
Although they share storage space, Vertica regards the tables as discrete objects for license capacity purposes. For example, a single-terabyte table and its copy initially consume only one TB of space. However, your Vertica license regards them as separate objects that consume two TB of space.
After the copy operation is complete, the source and copy tables are independent of each other, so you can perform DML operations on one table without impacting the other. These operations can increase the overall storage required for both tables.
Caution
If you create multiple copies of the same table concurrently, one or more of the copy operations is liable to fail. Instead, copy tables sequentially.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
COPY_TABLE (
'[[database.]schema.]source-table',
'[[database.]schema.]target-table'
)
Parameters
[database.]schema
- Database and schema. The default schema is public. If you specify a database, it must be the current database.
source-table
- The source table to copy. Vertica copies all data from this table to the target table.
target-table
- The target table of the source table. If the target table already exists, Vertica appends the source to the existing table.
If the table does not exist, Vertica creates a table from the source table's definition by calling CREATE TABLE with the LIKE and INCLUDING PROJECTIONS clauses. The new table inherits ownership from the source table. For details, see Replicating a table.
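For a new target, the statement Vertica runs internally is roughly equivalent to the following (a sketch; the table names are taken from the example below):
=> CREATE TABLE public.newtable LIKE public.product_dimension INCLUDING PROJECTIONS;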
Privileges
Non-superuser:
- Source table: SELECT
- Target schema/table (new): CREATE
- Target table (existing): INSERT
Table attribute requirements
The following attributes of both tables must be identical:
- Column definitions, including NULL/NOT NULL constraints
- Segmentation
- Partitioning expression
- Number of projections
- Projection sort order
- Primary and unique key constraints. However, the key constraints do not have to be identically enabled.
Note
If the target table has primary or unique key constraints enabled and moving the partitions will insert duplicate key values into the target table, Vertica rolls back the operation. Enforcing constraints requires disk reads and can slow the copy process.
- Number and definitions of text indices.
- If the destination table already exists, the source and destination tables must have identical access policies.
Additionally, if access policies exist on the source table, the following must be true:
- Access policies on both tables must be identical.
- One of the following must be true:
  - The executing user owns the source table.
  - AccessPolicyManagementSuperuserOnly is set to true. See Managing access policies for details.
Table restrictions
The following restrictions apply to the source and target tables:
- If the source and target partitions are in different storage tiers, Vertica returns a warning but the operation proceeds. The partitions remain in their existing storage tier.
- If the source table contains a sequence, Vertica converts the sequence to an integer before copying it to the target table. If the target table contains auto-increment, identity, or named sequence columns, Vertica cancels the copy and displays an error message.
- The following tables cannot be used as sources or targets:
  - Temporary tables
  - Virtual tables
  - System tables
  - External tables
Examples
If you call COPY_TABLE and the target table does not exist, the function creates the table automatically. In the following example, COPY_TABLE creates the target table public.newtable. Vertica also copies all the constraints associated with the source table public.product_dimension, except foreign key constraints:
=> SELECT COPY_TABLE ( 'public.product_dimension', 'public.newtable');
-[ RECORD 1 ]--------------------------------------------------
copy_table | Created table public.newtable.
Copied table public.product_dimension to public.newtable
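Because the target table now exists, calling COPY_TABLE again appends the source data to it rather than recreating it. A sketch continuing the example (the output message is illustrative):
=> SELECT COPY_TABLE ( 'public.product_dimension', 'public.newtable');
-[ RECORD 1 ]--------------------------------------------------
copy_table | Copied table public.product_dimension to public.newtable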
See also
Creating a table from other tables
2 - INFER_EXTERNAL_TABLE_DDL
Deprecated
This function is deprecated and will be removed in a future release. Instead, use INFER_TABLE_DDL.
Inspects a file in Parquet, ORC, or Avro format and returns a CREATE EXTERNAL TABLE AS COPY statement that can be used to read the file. This statement might be incomplete. It could also contain more columns or columns with longer names than what Vertica supports; this function does not enforce Vertica system limits. Always inspect the output and address any issues before using it to create a table.
This function supports partition columns for the ORC and Parquet formats only. Parquet and ORC files contain insufficient information to infer the type of partition columns, so this function shows these columns with a data type of UNKNOWN and emits a warning.
The function handles most data types, including complex types. If an input type is not supported in Vertica, the function emits a warning.
By default, the function uses strong typing for complex types. You can instead treat the column as a flexible complex type by setting the vertica_type_for_complex_type parameter to LONG VARBINARY.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
INFER_EXTERNAL_TABLE_DDL( path USING PARAMETERS param=value[,...] )
Arguments
path
- Path to a file or directory. Any path that is valid for COPY and uses a file format supported by this function is valid.
Parameters
format
- Input format (string), one of 'Parquet', 'ORC', or 'Avro'. This parameter is required.
table_name
- The name of the external table to create. This parameter is required. Do not include a schema name as part of the table name; use the table_schema parameter.
table_schema
- The schema in which to create the external table. If omitted, the function does not include a schema in the output.
vertica_type_for_complex_type
- Type used to represent all columns of complex types, if you do not want to expand them fully. The only supported value is LONG VARBINARY. For more information, see Flexible complex types.
Privileges
Non-superuser: READ privileges on the USER-accessible storage location.
Examples
In the following example, the input file contains data for a table with two integer columns. The table definition can be fully inferred, and you can use the returned SQL statement as-is.
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders/*.orc'
USING PARAMETERS format = 'orc', table_name = 'orders');
INFER_EXTERNAL_TABLE_DDL
--------------------------------------------------------------------------------------------------
create external table "orders" (
"id" int,
"quantity" int
) as copy from '/data/orders/*.orc' orc;
(1 row)
To create a table in a schema, use the table_schema parameter. Do not add it to the table name; the function treats it as a name with a period in it, not a schema.
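For example, to generate the table in the store schema instead (a sketch; the schema name is illustrative):
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders/*.orc'
        USING PARAMETERS format = 'orc', table_name = 'orders', table_schema = 'store');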
The following example shows output with complex types. You can use the definition as-is or modify the VARCHAR sizes:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/people/*.parquet'
USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_EXTERNAL_TABLE_DDL
-------------------------------------------------------------------------
create external table "employees"(
"employeeID" int,
"personal" Row(
"name" varchar,
"address" Row(
"street" varchar,
"city" varchar,
"zipcode" int
),
"taxID" int
),
"department" varchar
) as copy from '/data/people/*.parquet' parquet;
(1 row)
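If you would rather not expand the complex types, you can set vertica_type_for_complex_type instead. A sketch against the same file; in the returned statement, the personal column is then represented as a single LONG VARBINARY column rather than a ROW:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/people/*.parquet'
        USING PARAMETERS format = 'parquet', table_name = 'employees',
        vertica_type_for_complex_type = 'LONG VARBINARY');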
In the following example, the input file contains a map in the "prods" column. You can read a map as an array of rows:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders.parquet'
USING PARAMETERS format='parquet', table_name='orders');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_EXTERNAL_TABLE_DDL
------------------------------------------------------------------------
create external table "orders"(
"orderkey" int,
"custkey" int,
"prods" Array[Row(
"key" varchar,
"value" numeric(12,2)
)],
"orderdate" date
) as copy from '/data/orders.parquet' parquet;
(1 row)
The following example uses partition columns. The types of partition columns cannot be determined from the data, so you must edit the returned statement to specify them. In this example, the date and region columns are in the data in addition to being partition columns, so the table definition shows them twice:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/sales/*/*/*'
USING PARAMETERS format = 'parquet', table_name = 'sales');
WARNING 9262: This generated statement is incomplete because of one or more unknown column types.
Fix these data types before creating the table
INFER_EXTERNAL_TABLE_DDL
------------------------------------------------------------------------
create external table "sales"(
"tx_id" int,
"date" UNKNOWN,
"region" UNKNOWN
) as copy from '/data/sales/*/*/*' parquet(hive_partition_cols='date,region');
(1 row)
For VARCHAR and VARBINARY columns, this function does not specify a length. The Vertica default length for these types is 80 bytes. If the data values are longer, using this table definition unmodified could cause data to be truncated. Always review VARCHAR and VARBINARY columns to determine if you need to specify a length. This function emits a warning if the input file contains columns of these types:
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
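For example, based on the employees definition above, you might set explicit lengths before creating the table (a sketch; the lengths shown are assumptions about the data):
=> CREATE EXTERNAL TABLE "employees"(
       "employeeID" int,
       "personal" Row(
           "name" varchar(256),
           "address" Row(
               "street" varchar(256),
               "city" varchar(128),
               "zipcode" int
           ),
           "taxID" int
       ),
       "department" varchar(64)
   ) AS COPY FROM '/data/people/*.parquet' PARQUET;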
3 - INFER_TABLE_DDL
Inspects a file in Parquet, ORC, or Avro format and returns a CREATE TABLE or CREATE EXTERNAL TABLE statement based on its contents. If you use a glob to specify more than one file, the function inspects only one.
The returned statement might be incomplete if the file contains ambiguous or unknown data types. It could also contain more columns or columns with longer names than what Vertica supports; this function does not enforce Vertica system limits. Always inspect the output and address any issues before using it to create a table.
This function supports partition columns, inferred from the input path, for the ORC and Parquet formats only. Because partitioning is done through the directory structure, there is insufficient information to infer the type of partition columns. This function shows these columns with a data type of UNKNOWN and emits a warning. Partition columns apply only to external tables.
The function handles most data types, including complex types. If an input type is not supported in Vertica, the function emits a warning.
For VARCHAR and VARBINARY columns, this function does not specify a length. The Vertica default length for these types is 80 bytes. If the data values are longer, using the returned table definition unmodified could cause data to be truncated. Always review VARCHAR and VARBINARY columns to determine if you need to specify a length. This function emits a warning if the input file contains columns of these types:
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
INFER_TABLE_DDL( path USING PARAMETERS param=value[,...] )
Arguments
path
- Path to a file or glob. Any path that is valid for COPY and uses a file format supported by this function is valid. If a glob specifies more than one file, this function reads a single file.
Parameters
format
- Input format (string), one of 'Parquet', 'ORC', or 'Avro'. This parameter is required.
table_name
- The name of the table to create. This parameter is required. Do not include a schema name as part of the table name; use the table_schema parameter.
table_schema
- The schema in which to create the table. If omitted, the function does not include a schema in the output.
table_type
- The type of table to create, either 'native' (the default) or 'external'.
with_copy_statement
- For native tables, whether to include a COPY statement in addition to the CREATE TABLE statement. The default is false.
Privileges
Non-superuser: READ privileges on the USER-accessible storage location.
Examples
In the following example, the input path contains data for a table with two integer columns. The external table definition can be fully inferred, and you can use the returned SQL statement as-is. The function reads one file from the input path:
=> SELECT INFER_TABLE_DDL('/data/orders/*.orc'
USING PARAMETERS format = 'orc', table_name = 'orders', table_type = 'external');
INFER_TABLE_DDL
-----------------------------------------------------------------------------------
create external table "orders" (
"id" int,
"quantity" int
) as copy from '/data/orders/*.orc' orc;
(1 row)
To create a table in a schema, use the table_schema parameter. Do not add it to the table name; the function treats it as a name with a period in it, not a schema.
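For example, to create the orders table in the store schema (a sketch; the schema name is illustrative):
=> SELECT INFER_TABLE_DDL('/data/orders/*.orc'
        USING PARAMETERS format = 'orc', table_name = 'orders',
        table_schema = 'store', table_type = 'external');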
The following example shows output with complex types. You can use the definition as-is or modify the VARCHAR sizes:
=> SELECT INFER_TABLE_DDL('/data/people/*.parquet'
USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_TABLE_DDL
-------------------------------------------------------------------------
create table "employees"(
"employeeID" int,
"personal" Row(
"name" varchar,
"address" Row(
"street" varchar,
"city" varchar,
"zipcode" int
),
"taxID" int
),
"department" varchar
);
(1 row)
In the following example, the input file contains a map in the "prods" column. You can read a map as an array of rows:
=> SELECT INFER_TABLE_DDL('/data/orders.parquet'
USING PARAMETERS format='parquet', table_name='orders');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_TABLE_DDL
------------------------------------------------------------------------
create table "orders"(
"orderkey" int,
"custkey" int,
"prods" Array[Row(
"key" varchar,
"value" numeric(12,2)
)],
"orderdate" date
);
(1 row)
The following example returns the definition of a native table and the COPY statement:
=> SELECT INFER_TABLE_DDL('/data/orders/*.orc'
USING PARAMETERS format = 'orc', table_name = 'orders',
table_type = 'native', with_copy_statement = true);
INFER_TABLE_DDL
--------------------------------------------------------------------------------------------------
create table "orders" (
"id" int,
"quantity" int
);
copy "orders" from '/data/orders/*.orc' orc;
In the following example, the data contains one materialized column and two partition columns. The date and region columns are in the data in addition to being partition columns, and so the table definition shows them twice. Partition columns are always of unknown type:
=> SELECT INFER_TABLE_DDL('/data/sales/*/*/*'
USING PARAMETERS format = 'parquet', table_name = 'sales', table_type = 'external');
WARNING 9262: This generated statement is incomplete because of one or more unknown column types.
Fix these data types before creating the table
INFER_TABLE_DDL
------------------------------------------------------------------------
create external table "sales"(
"tx_id" int,
"date" UNKNOWN,
"region" UNKNOWN
) as copy from '/data/sales/*/*/*' parquet(hive_partition_cols='date,region');
(1 row)
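Before creating the table, replace the UNKNOWN placeholders with the actual partition column types (a sketch; DATE and VARCHAR(20) are assumptions about this data):
=> CREATE EXTERNAL TABLE "sales"(
       "tx_id" int,
       "date" date,
       "region" varchar(20)
   ) AS COPY FROM '/data/sales/*/*/*' PARQUET(hive_partition_cols='date,region');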
4 - PURGE_TABLE
Note
This function was formerly named PURGE_TABLE_PROJECTIONS(). Vertica still supports the former function name.
Permanently removes deleted data from physical storage so disk space can be reused. You can purge historical data up to and including the Ancient History Mark epoch.
Purges all projections of the specified table. You cannot use this function to purge temporary tables.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
PURGE_TABLE ( '[[database.]schema.]table' )
Parameters
[database.]schema
- Database and schema. The default schema is public. If you specify a database, it must be the current database.
table
- The table to purge.
Privileges
- Table owner
- USAGE privilege on schema
Caution
PURGE_TABLE could temporarily take up significant disk space while the data is being purged.
Examples
The following example purges all projections for the store.store_sales_fact table in the VMart example database:
=> SELECT PURGE_TABLE('store.store_sales_fact');
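Purging typically follows a bulk delete. A sketch of the workflow (the DELETE predicate is hypothetical; deleted rows become purgeable only once the deletes are older than the Ancient History Mark):
=> DELETE FROM store.store_sales_fact WHERE pos_transaction_number = 0;
=> COMMIT;
=> SELECT PURGE_TABLE('store.store_sales_fact');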
5 - REBALANCE_TABLE
Synchronously rebalances data in the specified table.
A rebalance operation performs the following tasks:
- Distributes data based on:
  - User-defined fault groups, if specified
  - Large cluster automatic fault groups
- Redistributes database projection data across all nodes.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
REBALANCE_TABLE('[[database.]schema.]table-name')
Parameters
[database.]schema
- Database and schema. The default schema is public. If you specify a database, it must be the current database.
table-name
- The table to rebalance.
Privileges
Superuser
When to rebalance
Rebalancing is useful or even necessary after you perform the following tasks:
- Mark one or more nodes as ephemeral in preparation for removing them from the cluster (see the sketch after this list).
- Add one or more nodes to the cluster so that Vertica can populate the empty nodes with data.
- Change the scaling factor of an elastic cluster, which determines the number of storage containers used to store a projection across the database.
- Set the control node size or realign control nodes on a large cluster layout.
- Add nodes to or remove nodes from a fault group.
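For instance, to prepare a node for removal and then rebalance a table (a sketch; the node and table names are illustrative):
=> ALTER NODE v_vmart_node0004 IS EPHEMERAL;
=> SELECT REBALANCE_TABLE('online_sales.online_sales_fact');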
Tip
By default, before performing a rebalance, Vertica queries system tables to compute the size of all projections involved in the rebalance task. This query can add significant overhead to the rebalance operation. To disable this query, set the projection configuration parameter RebalanceQueryStorageContainers to 0.
Examples
The following command shows how to rebalance data on the specified table.
=> SELECT REBALANCE_TABLE('online_sales.online_sales_fact');
REBALANCE_TABLE
-------------------
REBALANCED
(1 row)