This section contains the functions associated with Vertica table management.
Table management functions
- 1: COPY_TABLE
- 2: INFER_EXTERNAL_TABLE_DDL
- 3: INFER_TABLE_DDL
- 4: PURGE_TABLE
- 5: REBALANCE_TABLE
1 - COPY_TABLE
Copies one table to another. This lightweight, in-memory function copies the DDL and all user-created projections from the source table. Projection statistics for the source table are also copied. Thus, the source and target tables initially have identical definitions and share the same storage.
Note
Although they share storage space, Vertica regards the tables as discrete objects for license capacity purposes. For example, a single-terabyte table and its copy initially consume only one TB of space. However, your Vertica license regards them as separate objects that consume two TB of space.
After the copy operation is complete, the source and copy tables are independent of each other, so you can perform DML operations on one table without impacting the other. These operations can increase the overall storage required for both tables.
Caution
If you create multiple copies of the same table concurrently, one or more of the copy operations is liable to fail. Instead, copy tables sequentially.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
COPY_TABLE (
'[[database.]schema.]source-table',
'[[database.]schema.]target-table'
)
Parameters
[database.]schema
- Database and schema. The default schema is public. If you specify a database, it must be the current database.
source-table
- The source table to copy. Vertica copies all data from this table to the target table.
target-table
- The target table of the source table. If the target table already exists, Vertica appends the source to the existing table.
If the table does not exist, Vertica creates a table from the source table's definition by calling CREATE TABLE with the LIKE and INCLUDING PROJECTIONS clauses. The new table inherits ownership from the source table. For details, see Replicating a table.
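For a new target, the statement Vertica runs internally is roughly equivalent to the following (a sketch; the table names are taken from the example below):
=> CREATE TABLE public.newtable LIKE public.product_dimension INCLUDING PROJECTIONS;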
Privileges
Non-superuser:
- Source table: SELECT
- Target schema/table (new): CREATE
- Target table (existing): INSERT
Table attribute requirements
The following attributes of both tables must be identical:
- Column definitions, including NULL/NOT NULL constraints
- Segmentation
- Partitioning expression
- Number of projections
- Projection sort order
- Primary and unique key constraints. However, the key constraints do not have to be identically enabled.
Note
If the target table has primary or unique key constraints enabled and moving the partitions will insert duplicate key values into the target table, Vertica rolls back the operation. Enforcing constraints requires disk reads and can slow the copy process.
- Number and definitions of text indices.
- If the destination table already exists, the source and destination tables must have identical access policies.
Additionally, if access policies exist on the source table, the following must be true:
- Access policies on both tables must be identical.
- One of the following must be true:
  - The executing user owns the source table.
  - AccessPolicyManagementSuperuserOnly is set to true. See Managing access policies for details.
Table restrictions
The following restrictions apply to the source and target tables:
- If the source and target partitions are in different storage tiers, Vertica returns a warning but the operation proceeds. The partitions remain in their existing storage tier.
- If the source table contains a sequence, Vertica converts the sequence to an integer before copying it to the target table. If the target table contains auto-increment, identity, or named sequence columns, Vertica cancels the copy and displays an error message.
- The following tables cannot be used as sources or targets:
  - Temporary tables
  - Virtual tables
  - System tables
  - External tables
Examples
If you call COPY_TABLE and the target table does not exist, the function creates the table automatically. In the following example, COPY_TABLE creates the target table public.newtable. Vertica also copies all the constraints associated with the source table public.product_dimension, except foreign key constraints:
=> SELECT COPY_TABLE ( 'public.product_dimension', 'public.newtable');
-[ RECORD 1 ]--------------------------------------------------
copy_table | Created table public.newtable.
Copied table public.product_dimension to public.newtable
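Because the target table now exists, calling COPY_TABLE again appends the source data to it rather than recreating it. A sketch continuing the example (the output message is illustrative):
=> SELECT COPY_TABLE ( 'public.product_dimension', 'public.newtable');
-[ RECORD 1 ]--------------------------------------------------
copy_table | Copied table public.product_dimension to public.newtable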
See also
Creating a table from other tables
2 - INFER_EXTERNAL_TABLE_DDL
Deprecated
This function is deprecated and will be removed in a future release. Instead, use INFER_TABLE_DDL.
Inspects a file in Parquet, ORC, or Avro format and returns a CREATE EXTERNAL TABLE AS COPY statement that can be used to read the file. This statement might be incomplete. It could also contain more columns or columns with longer names than what Vertica supports; this function does not enforce Vertica system limits. Always inspect the output and address any issues before using it to create a table.
This function supports partition columns for the ORC and Parquet formats only. Parquet and ORC files contain insufficient information to infer the type of partition columns, so this function shows these columns with a data type of UNKNOWN and emits a warning.
The function handles most data types, including complex types. If an input type is not supported in Vertica, the function emits a warning.
By default, the function uses strong typing for complex types. You can instead treat the column as a flexible complex type by setting the vertica_type_for_complex_type parameter to LONG VARBINARY.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
INFER_EXTERNAL_TABLE_DDL( path USING PARAMETERS param=value[,...] )
Arguments
path
- Path to a file or directory. Any path that is valid for COPY and uses a file format supported by this function is valid.
Parameters
format
- Input format (string), one of 'Parquet', 'ORC', or 'Avro'. This parameter is required.
table_name
- The name of the external table to create. This parameter is required. Do not include a schema name as part of the table name; use the table_schema parameter.
table_schema
- The schema in which to create the external table. If omitted, the function does not include a schema in the output.
vertica_type_for_complex_type
- Type used to represent all columns of complex types, if you do not want to expand them fully. The only supported value is LONG VARBINARY. For more information, see Flexible complex types.
Privileges
Non-superuser: READ privileges on the USER-accessible storage location.
Examples
In the following example, the input file contains data for a table with two integer columns. The table definition can be fully inferred, and you can use the returned SQL statement as-is.
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders/*.orc'
USING PARAMETERS format = 'orc', table_name = 'orders');
INFER_EXTERNAL_TABLE_DDL
--------------------------------------------------------------------------------------------------
create external table "orders" (
"id" int,
"quantity" int
) as copy from '/data/orders/*.orc' orc;
(1 row)
To create a table in a schema, use the table_schema parameter. Do not add it to the table name; the function treats it as a name with a period in it, not a schema.
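For example, to generate the table in the store schema instead (a sketch; the schema name is illustrative):
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders/*.orc'
        USING PARAMETERS format = 'orc', table_name = 'orders', table_schema = 'store');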
The following example shows output with complex types. You can use the definition as-is or modify the VARCHAR sizes:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/people/*.parquet'
USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_EXTERNAL_TABLE_DDL
-------------------------------------------------------------------------
create external table "employees"(
"employeeID" int,
"personal" Row(
"name" varchar,
"address" Row(
"street" varchar,
"city" varchar,
"zipcode" int
),
"taxID" int
),
"department" varchar
) as copy from '/data/people/*.parquet' parquet;
(1 row)
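If you would rather not expand the complex types, you can set vertica_type_for_complex_type instead. A sketch against the same file; in the returned statement, the personal column is then represented as a single LONG VARBINARY column rather than a ROW:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/people/*.parquet'
        USING PARAMETERS format = 'parquet', table_name = 'employees',
        vertica_type_for_complex_type = 'LONG VARBINARY');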
In the following example, the input file contains a map in the "prods" column. You can read a map as an array of rows:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/orders.parquet'
USING PARAMETERS format='parquet', table_name='orders');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_EXTERNAL_TABLE_DDL
------------------------------------------------------------------------
create external table "orders"(
"orderkey" int,
"custkey" int,
"prods" Array[Row(
"key" varchar,
"value" numeric(12,2)
)],
"orderdate" date
) as copy from '/data/orders.parquet' parquet;
(1 row)
The following example uses partition columns. The types of partition columns cannot be determined from the data, so you must edit the returned statement to specify them. In this example, the date and region columns are in the data in addition to being partition columns, so the table definition shows them twice:
=> SELECT INFER_EXTERNAL_TABLE_DDL('/data/sales/*/*/*'
USING PARAMETERS format = 'parquet', table_name = 'sales');
WARNING 9262: This generated statement is incomplete because of one or more unknown column types.
Fix these data types before creating the table
INFER_EXTERNAL_TABLE_DDL
------------------------------------------------------------------------
create external table "sales"(
"tx_id" int,
"date" UNKNOWN,
"region" UNKNOWN
) as copy from '/data/sales/*/*/*' parquet(hive_partition_cols='date,region');
(1 row)
For VARCHAR and VARBINARY columns, this function does not specify a length. The Vertica default length for these types is 80 bytes. If the data values are longer, using this table definition unmodified could cause data to be truncated. Always review VARCHAR and VARBINARY columns to determine if you need to specify a length. This function emits a warning if the input file contains columns of these types:
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
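For example, based on the employees definition above, you might set explicit lengths before creating the table (a sketch; the lengths shown are assumptions about the data):
=> CREATE EXTERNAL TABLE "employees"(
       "employeeID" int,
       "personal" Row(
           "name" varchar(256),
           "address" Row(
               "street" varchar(256),
               "city" varchar(128),
               "zipcode" int
           ),
           "taxID" int
       ),
       "department" varchar(64)
   ) AS COPY FROM '/data/people/*.parquet' PARQUET;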
3 - INFER_TABLE_DDL
Inspects a file in Parquet, ORC, or Avro format and returns a CREATE TABLE or CREATE EXTERNAL TABLE statement based on its contents. If you use a glob to specify more than one file, the function inspects only one.
The returned statement might be incomplete if the file contains ambiguous or unknown data types. It could also contain more columns or columns with longer names than what Vertica supports; this function does not enforce Vertica system limits. Always inspect the output and address any issues before using it to create a table.
This function supports partition columns, inferred from the input path, for the ORC and Parquet formats only. Because partitioning is done through the directory structure, there is insufficient information to infer the type of partition columns. This function shows these columns with a data type of UNKNOWN and emits a warning. Partition columns apply only to external tables.
The function handles most data types, including complex types. If an input type is not supported in Vertica, the function emits a warning.
For VARCHAR and VARBINARY columns, this function does not specify a length. The Vertica default length for these types is 80 bytes. If the data values are longer, using the returned table definition unmodified could cause data to be truncated. Always review VARCHAR and VARBINARY columns to determine if you need to specify a length. This function emits a warning if the input file contains columns of these types:
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
INFER_TABLE_DDL( path USING PARAMETERS param=value[,...] )
Arguments
path
- Path to a file or glob. Any path that is valid for COPY and uses a file format supported by this function is valid. If a glob specifies more than one file, this function reads a single file.
Parameters
format
- Input format (string), one of 'Parquet', 'ORC', or 'Avro'. This parameter is required.
table_name
- The name of the table to create. This parameter is required. Do not include a schema name as part of the table name; use the table_schema parameter.
table_schema
- The schema in which to create the table. If omitted, the function does not include a schema in the output.
table_type
- The type of table to create, either 'native' (the default) or 'external'.
with_copy_statement
- For native tables, whether to include a COPY statement in addition to the CREATE TABLE statement. The default is false.
Privileges
Non-superuser: READ privileges on the USER-accessible storage location.
Examples
In the following example, the input path contains data for a table with two integer columns. The external table definition can be fully inferred, and you can use the returned SQL statement as-is. The function reads one file from the input path:
=> SELECT INFER_TABLE_DDL('/data/orders/*.orc'
USING PARAMETERS format = 'orc', table_name = 'orders', table_type = 'external');
INFER_TABLE_DDL
-----------------------------------------------------------------------------------
create external table "orders" (
"id" int,
"quantity" int
) as copy from '/data/orders/*.orc' orc;
(1 row)
To create a table in a schema, use the table_schema parameter. Do not add it to the table name; the function treats it as a name with a period in it, not a schema.
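For example, to create the orders table in the store schema (a sketch; the schema name is illustrative):
=> SELECT INFER_TABLE_DDL('/data/orders/*.orc'
        USING PARAMETERS format = 'orc', table_name = 'orders',
        table_schema = 'store', table_type = 'external');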
The following example shows output with complex types. You can use the definition as-is or modify the VARCHAR sizes:
=> SELECT INFER_TABLE_DDL('/data/people/*.parquet'
USING PARAMETERS format = 'parquet', table_name = 'employees');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_TABLE_DDL
-------------------------------------------------------------------------
create table "employees"(
"employeeID" int,
"personal" Row(
"name" varchar,
"address" Row(
"street" varchar,
"city" varchar,
"zipcode" int
),
"taxID" int
),
"department" varchar
);
(1 row)
In the following example, the input file contains a map in the "prods" column. You can read a map as an array of rows:
=> SELECT INFER_TABLE_DDL('/data/orders.parquet'
USING PARAMETERS format='parquet', table_name='orders');
WARNING 9311: This generated statement contains one or more varchar/varbinary columns which default to length 80
INFER_TABLE_DDL
------------------------------------------------------------------------
create table "orders"(
"orderkey" int,
"custkey" int,
"prods" Array[Row(
"key" varchar,
"value" numeric(12,2)
)],
"orderdate" date
);
(1 row)
The following example returns the definition of a native table and the COPY statement:
=> SELECT INFER_TABLE_DDL('/data/orders/*.orc'
USING PARAMETERS format = 'orc', table_name = 'orders',
table_type = 'native', with_copy_statement = true);
INFER_TABLE_DDL
--------------------------------------------------------------------------------------------------
create table "orders" (
"id" int,
"quantity" int
);
copy "orders" from '/data/orders/*.orc' orc;
In the following example, the data contains one materialized column and two partition columns. The date and region columns are in the data in addition to being partition columns, and so the table definition shows them twice. Partition columns are always of unknown type:
=> SELECT INFER_TABLE_DDL('/data/sales/*/*/*'
USING PARAMETERS format = 'parquet', table_name = 'sales', table_type = 'external');
WARNING 9262: This generated statement is incomplete because of one or more unknown column types.
Fix these data types before creating the table
INFER_TABLE_DDL
------------------------------------------------------------------------
create external table "sales"(
"tx_id" int,
"date" UNKNOWN,
"region" UNKNOWN
) as copy from '/data/sales/*/*/*' parquet(hive_partition_cols='date,region');
(1 row)
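Before creating the table, replace the UNKNOWN placeholders with the actual partition column types (a sketch; DATE and VARCHAR(20) are assumptions about this data):
=> CREATE EXTERNAL TABLE "sales"(
       "tx_id" int,
       "date" date,
       "region" varchar(20)
   ) AS COPY FROM '/data/sales/*/*/*' PARQUET(hive_partition_cols='date,region');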
4 - PURGE_TABLE
Note
This function was formerly named PURGE_TABLE_PROJECTIONS(). Vertica still supports the former function name.
Permanently removes deleted data from physical storage so disk space can be reused. You can purge historical data up to and including the Ancient History Mark epoch.
Purges all projections of the specified table. You cannot use this function to purge temporary tables.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
PURGE_TABLE ( '[[database.]schema.]table' )
Parameters
[database.]schema
- Database and schema. The default schema is public. If you specify a database, it must be the current database.
table
- The table to purge.
Privileges
- Table owner
- USAGE privilege on schema
Caution
PURGE_TABLE could temporarily take up significant disk space while the data is being purged.
Examples
The following example purges all projections for the store.store_sales_fact table in the VMart example database:
=> SELECT PURGE_TABLE('store.store_sales_fact');
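Purging typically follows a bulk delete. A sketch of the workflow (the DELETE predicate is hypothetical; deleted rows become purgeable only once the deletes are older than the Ancient History Mark):
=> DELETE FROM store.store_sales_fact WHERE pos_transaction_number = 0;
=> COMMIT;
=> SELECT PURGE_TABLE('store.store_sales_fact');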
5 - REBALANCE_TABLE
Synchronously rebalances data in the specified table.
A rebalance operation performs the following tasks:
- Distributes data based on:
  - User-defined fault groups, if specified
  - Large cluster automatic fault groups
- Redistributes database projection data across all nodes.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
REBALANCE_TABLE('[[database.]schema.]table-name')
Parameters
[database.]schema
- Database and schema. The default schema is public. If you specify a database, it must be the current database.
table-name
- The table to rebalance.
Privileges
Superuser
When to rebalance
Rebalancing is useful or even necessary after you perform the following tasks:
- Mark one or more nodes as ephemeral in preparation for removing them from the cluster (see the sketch after this list).
- Add one or more nodes to the cluster so that Vertica can populate the empty nodes with data.
- Change the scaling factor of an elastic cluster, which determines the number of storage containers used to store a projection across the database.
- Set the control node size or realign control nodes on a large cluster layout.
- Add nodes to or remove nodes from a fault group.
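For instance, to prepare a node for removal and then rebalance a table (a sketch; the node and table names are illustrative):
=> ALTER NODE v_vmart_node0004 IS EPHEMERAL;
=> SELECT REBALANCE_TABLE('online_sales.online_sales_fact');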
Tip
By default, before performing a rebalance, Vertica queries system tables to compute the size of all projections involved in the rebalance task. This query can add significant overhead to the rebalance operation. To disable this query, set the projection configuration parameter RebalanceQueryStorageContainers to 0.
Examples
The following command shows how to rebalance data on the specified table.
=> SELECT REBALANCE_TABLE('online_sales.online_sales_fact');
REBALANCE_TABLE
-------------------
REBALANCED
(1 row)