This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Complex types

Complex types such as structures (also known as rows), arrays, and maps are composed of primitive types and sometimes other complex types.

Complex types such as structures (also known as rows), arrays, and maps are composed of primitive types and sometimes other complex types. Complex types can be used in the following ways:

  • Arrays and rows (in any combination) can be used as column data types in both native and external tables.

  • Sets of primitive element types can be used as column data types in native and external tables.

  • Arrays and rows, but not combinations of them, can be created as literals, for example to use in query expressions.

The MAP type is a legacy type. To represent maps, use ARRAY[ROW].

If a flex table has a real column that uses a complex type, the values from that column are not included in the __raw__ column. For more information, see Loading Data into Flex Table Real Columns.

1 - ARRAY

Represents array data.

Represents array data. There are two types of arrays in Vertica:

  • Native array: a one-dimensional array of a primitive type. Native arrays are tracked in the TYPES system table and used in native tables.

  • Non-native array: all other supported arrays, including arrays that contain other arrays (multi-dimensional arrays) or structs (ROWs). Non-native arrays have some usage restrictions. Non-native arrays are tracked in the COMPLEX_TYPES system table.

Both types of arrays operate in the same way, but they have different OIDs.

Arrays can be bounded, meaning they specify a maximum element count, or unbounded. Unbounded arrays have a maximum binary size, which can be set explicitly or defaulted. See Limits on Element Count and Collection Size.

Selected parsers support using COPY to load arrays. See the documentation of individual parsers for more information.

Syntax

In column definitions:


ARRAY[data_type, max_elements] |
ARRAY[data_type](max_size) |
ARRAY[data_type]

In literals:

ARRAY[value[, ...] ]

Restrictions

  • Native arrays support only data of primitive types, for example, int, UUID, and so on.

  • Array dimensionality is enforced. A column cannot contain arrays of varying dimensions. For example, a column that contains a three-dimensional array can only contain other three-dimensional arrays; it cannot simultaneously include a one-dimensional array. However, the arrays in a column can vary in size, where one array can contain four elements while another contains ten.

  • Array bounds, if specified, are enforced for all operations that load or alter data. Unbounded arrays may have as many elements as will fit in the allotted binary size.

  • An array has a maximum binary size. If this size is not set when the array is defined, a default value is used.

  • Arrays do not support LONG types (like LONG VARBINARY or LONG VARCHAR) or user-defined types (like Geometry).

Syntax for column definition

Arrays used in column definitions can be either bounded or unbounded. Bounded arrays must specify a maximum number of elements. Unbounded arrays can specify a maximum binary size (in bytes) for the array, or the value of DefaultArrayBinarySize is used. You can specify a bound or a binary size but not both. For more information about these values, see Limits on Element Count and Collection Size.

Type Syntax Semantics
Bounded array

ARRAY[data_type, max_elements]

Example:

ARRAY[VARCHAR(50),100]

Can contain no more than max_elements elements. Attempting to add more is an error.

Has a binary size of the size of the data type multiplied by the maximum number of elements (possibly rounded up).

Unbounded array with maximum binary size

ARRAY[data_type](max_size)

Example:

ARRAY[VARCHAR(50)](32000)

Can contain as many elements as fit in max_size. Ignores the value of DefaultArrayBinarySize.
Unbounded array with default binary size

ARRAY[data_type]

Example:

ARRAY[VARCHAR(50)]

Can contain as many elements as fit in the default binary size.

Equivalent to:

ARRAY[data_type](DefaultArrayBinarySize)

The following example defines a table for customers using an unbounded array:

=> CREATE TABLE customers (id INT, name VARCHAR, email ARRAY[VARCHAR(50)]);

The following example uses a bounded array for customer email addresses and an unbounded array for order history:

=> CREATE TABLE customers (id INT, name VARCHAR, email ARRAY[VARCHAR(50),5], orders ARRAY[INT]);

The following example uses an array that has ROW elements:

=> CREATE TABLE orders(
  orderid INT,
  accountid INT,
  shipments ARRAY[
    ROW(
      shipid INT,
      address ROW(
        street VARCHAR,
        city VARCHAR,
        zip INT
        ),
      shipdate DATE
    )
  ]
 );

To declare a multi-dimensional array, use nesting. For example, ARRAY[ARRAY[int]] specifies a two-dimensional array.

Syntax for direct construction (literals)

Use the ARRAY keyword to construct an array value. The following example creates an array of integer values.

=> SELECT ARRAY[1,2,3];
 array
-------
 [1,2,3]
(1 row)

You can nest an array inside another array, as in the following example.

=> SELECT ARRAY[ARRAY[1],ARRAY[2]];
   array
-----------
 [[1],[2]]
(1 row)

If an array of arrays contains no null elements and no function calls, you can abbreviate the syntax:

=> SELECT ARRAY[[1,2],[3,4]];
     array
---------------
 [[1,2],[3,4]]
(1 row)

---not valid:
=> SELECT ARRAY[[1,2],null,[3,4]];

ERROR 4856:  Syntax error at or near "null" at character 20
LINE 1: SELECT ARRAY[[1,2],null,[3,4]];
                           ^

Array literals can contain elements of all scalar types, ROW, and ARRAY. ROW elements must all have the same set of fields:

=> SELECT ARRAY[ROW(1,2),ROW(1,3)];
               array
-----------------------------------
 [{"f0":1,"f1":2},{"f0":1,"f1":3}]
(1 row)

=> SELECT ARRAY[ROW(1,2),ROW(1,3,'abc')];
ERROR 3429:  For 'ARRAY', types ROW(int,int) and ROW(int,int,unknown) are inconsistent

Because the elements are known at the time you directly construct an array, these arrays are implicitly bounded.

You can use ARRAY literals in comparisons, as in the following example:

=> SELECT id.name, id.num, GPA FROM students
   WHERE major = ARRAY[ROW('Science','Physics')];
 name  | num | GPA
-------+-----+-----
 bob   | 121 | 3.3
 carol | 123 | 3.4
(2 rows)

Output format

Queries of array columns return JSON format, with the values shown in comma-separated lists in brackets. The following example shows a query that includes array columns.

=> SELECT cust_custkey,cust_custstaddress,cust_custcity,cust_custstate from cust;
cust_custkey |               cust_custstaddress                      |                cust_custcity                | cust_custstate
-------------+-------  ----------------------------------------------+---------------------------------------------+----------------
      342176 | ["668 SW New Lane","518 Main Ave","7040 Campfire Dr"] | ["Winchester","New Hyde Park","Massapequa"] | ["VA","NY","NY"]
      342799 | ["2400 Hearst Avenue","3 Cypress Street"]             | ["Berkeley","San Antonio"]                  | ["CA","TX"]
      342845 | ["336 Boylston Street","180 Clarkhill Rd"]            | ["Boston","Amherst"]                        | ["MA","MA"]
      342321 | ["95 Fawn Drive"]                                     | ["Allen Park"]                              | ["MI"]
      342989 | ["5 Thompson St"]                                     | ["Massillon"]                               | ["OH"]
(5 rows)

Note that JSON format escapes some characters that would not be escaped in native VARCHARs. For example, if you insert "c:\users\data" into an array, the JSON output for that value is "c:\\users\\data".

Element access

Arrays are 0-indexed. The first element's ordinal position is 0, second is 1, and so on.

You can access (dereference) elements from an array by index:

=> SELECT (ARRAY['a','b','c','d','e'])[1];
array
-------
b
(1 row)

To specify a range, use the format start:end. The end of the range is non-inclusive.

=> SELECT(ARRAY['a','b','c','d','e','f','g'])[1:4];
array
---------
["b","c","d"]
(1 row)

To dereference an element from a multi-dimensional array, put each index in brackets:

=> SELECT(ARRAY[ARRAY[1,2],ARRAY[3,4]])[0][0];
 array
-------
 1
(1 row)

Out-of-bound index references return NULL.

Limits on element count and collection size

When declaring a collection type for a table column, you can limit either the number of elements or the total binary size of the collection. During query processing, Vertica always reserves the maximum memory needed for the column, based on either the element count or the binary size. If this size is much larger than your data actually requires, setting one of these limits can improve query performance by reducing the amount of memory that must be reserved for the column.

You can change the bounds of a collection, including changing between bounded and unbounded collections, by casting. See Casting.

A bounded collection specifies a maximum element count. A value in a bounded collection column may contain fewer elements, but it may not contain more. Any attempt to insert more elements into a bounded collection than the declared maximum is an error. A bounded collection has a binary size that is the product of the data-type size and the maximum number of elements, possibly rounded up.

An unbounded collection specifies a binary size in bytes, explicitly or implicitly. It may contain as many elements as can fit in that binary size.

If a nested array specifies bounds for all dimensions, Vertica sets a single bound that is the product of the bounds. In the following example, the inner and outer arrays each have a bound of 10, but only a total element count of 100 is enforced.

ARRAY[ARRAY[INT,10],10]

If a nested array specifies a bound for only the outer collection, it is treated as the total bound. The previous example is equivalent to the following:

ARRAY[ARRAY[INT],100]

You must either specify bounds for all nested collections or specify a bound only for the outer one. For any other distribution of bounds, Vertica treats the collection as unbounded.

Instead of specifying a bound, you can specify a maximum binary size for an unbounded collection. The binary size acts as an absolute limit, regardless of how many elements the collection contains. Collections that do not specify a maximum binary size use the value of DefaultArrayBinarySize. This size is set at the time the collection is defined and is not affected by later changes to the value of DefaultArrayBinarySize.

You cannot set a maximum binary size for a bounded collection, only an unbounded one.

You can change the bounds or the binary size of an array column using ALTER TABLE as in the following example:

=> ALTER TABLE cust ALTER COLUMN orders SET DATA TYPE ARRAY[INTEGER](100);

If the change reduces the size of the collection and would result in data loss, the change fails.

Comparisons

All collections support equality (=), inequality (<>), and null-safe equality (<=>). 1D collections also support comparison operators (<, <=, >, >=) between collections of the same type (arrays or sets). Comparisons follow these rules:

  • A null collection is ordered last.

  • Non-null collections are compared element by element, using the ordering rules of the element's data type. The relative order of the first pair of non-equal elements determines the order of the two collections.

  • If all elements in both collections are equal up to the length of the shorter collection, the shorter collection is ordered before the longer one.

  • If all elements in both collections are equal and the collections are of equal length, the collections are equal.

Null-handling

Null semantics for collections are consistent with normal columns in most regards. See NULL sort order for more information on null-handling.

The null-safe equality operator (<=>) behaves differently from equality (=) when the collection is null rather than empty. Comparing a collection to NULL strictly returns null:

=> SELECT ARRAY[1,3] = NULL;
?column?
----------

(1 row)

=> SELECT ARRAY[1,3] <=> NULL;
 ?column?
----------
 f
(1 row)

In the following example, the grants column in the table is null for employee 99:

=> SELECT grants = NULL FROM employees WHERE id=99;
 ?column?
----------

(1 row)

=> SELECT grants <=> NULL FROM employees WHERE id=99;
 ?column?
----------
 t
(1 row)

Empty collections are not null and behave as expected:

=> SELECT ARRAY[]::ARRAY[INT] = ARRAY[]::ARRAY[INT];
 ?column?
----------
 t
(1 row)

Collections are compared element by element. If a comparison depends on a null element, the result is unknown (null), not false. For example, ARRAY[1,2,null]=ARRAY[1,2,null] and ARRAY[1,2,null]=ARRAY[1,2,3] both return null, but ARRAY[1,2,null]=ARRAY[1,4,null] returns false because the second elements do not match.

Casting

Casting an array casts each element of the array. You can therefore cast between data types following the same rules as for casts of scalar values.

You can cast both literal arrays and array columns explicitly:

=> SELECT ARRAY['1','2','3']::ARRAY[INT];
  array
---------
[1,2,3]
(1 row)

You can change the bound of an array or set by casting. When casting to a bounded native array, inputs that are too long are truncated. When casting to a non-native array (an array containing complex data types including other arrays), if the new bounds are too small for the data the cast fails:

=> SELECT ARRAY[1,2,3]::ARRAY[VARCHAR,2];
   array
-----------
 ["1","2"]
(1 row)

=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,2],2];
ERROR 9227:  Output array isn't big enough
DETAIL:  Type limit is 4 elements, but value has 6 elements

If you cast to a bounded multi-dimensional array, you must specify the bounds at all levels:

=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,5],10];
             array
-------------------------------
 [["1","2","3"],["4","5","6"]]
(1 row)

=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,2]];
WARNING 9753:  Collection type bound will not be used
DETAIL:  A bound was provided for an inner dimension, but not for an outer dimension
             array
-------------------------------
 [["1","2","3"],["4","5","6"]]
(1 row)

Assignment casts and implicit casts work the same way as for scalars:

=> CREATE TABLE transactions (tid INT, prod_ids ARRAY[VARCHAR,100], quantities ARRAY[INT,100]);
CREATE TABLE

=> INSERT INTO transactions VALUES (12345, ARRAY['p1265', 'p4515'], ARRAY[15,2]);
 OUTPUT
--------
      1
(1 row)

=> CREATE TABLE txreport (prod_ids ARRAY[VARCHAR(12),100], quants ARRAY[VARCHAR(32),100]);
CREATE TABLE

=> INSERT INTO txreport SELECT prod_ids, quantities FROM transactions;
 OUTPUT
--------
      1
(1 row)

=> SELECT * FROM txreport;
     prod_ids      |   quants
-------------------+------------
 ["p1265","p4515"] | ["15","2"]
(1 row)

You can perform explicit casts, but not implicit casts, between the ARRAY and SET types (native arrays only). If the collection is unbounded and the data type does not change, the binary size is preserved. For example, if you cast an ARRAY[INT] to a SET[INT], the set has the same binary size as the array.

If you cast from one element type to another, the resulting collection uses the default binary size. If this would cause the data not to fit, the cast fails.

You cannot cast from an array to an array with a different dimensionality, for example from a two-dimensional array to a one-dimensional array.

Functions and operators

See Collection functions for a comprehensive list of functions that can be used to manipulate arrays and sets.

Collections can be used in the following ways:

Collections cannot be used in the following ways:

  • As part of an IN or NOT IN expression.

  • As partition columns when creating tables.

  • With ANALYZE_STATISTICS or TopK projections.

  • Non-native arrays only: ORDER BY, PARTITION BY, DEFAULT, SET USING, or constraints.

2 - MAP

Represents map data in external tables in the Parquet, ORC, and Avro formats only.

Represents map data in external tables in the Parquet, ORC, and Avro formats only. A MAP must use only primitive types and may not contain other complex types. You can use the MAP type in a table definition to consume columns in the data, but you cannot query those columns.

A superior alternative to MAP is ARRAY[ROW]. An array of rows can use all supported complex types and can be queried. This is the representation that INFER_TABLE_DDL suggests. For Avro data, the ROW must have fields named key and value.

Within a single table you must define all map columns using the same approach, MAP or ARRAY[ROW].

Syntax

In column definitions:

MAP<key,value>

Map input format for column definition

In a column definition in an external table, a MAP consists of a key-value pair, specified as types. The table in the following example defines a map of product IDs to names.

=> CREATE EXTERNAL TABLE store (storeID INT, inventory MAP<INT,VARCHAR(100)>)
    AS COPY FROM '...' PARQUET;

3 - ROW

Represents structured data (structs).

Represents structured data (structs). A ROW can contain fields of any primitive or complex type supported by Vertica.

Syntax

In column definitions:

ROW([field] type[, ...])

If the field name is omitted, Vertica generates names starting with "f0".

In literals:

ROW(value [AS field] [, ...]) [AS name(field[, ...])]

Syntax for column definition

In a column definition, a ROW consists of one or more comma-separated pairs of field names and types. In the following example, the Parquet data file contains a struct for the address, which is read as a ROW in an external table:

=> CREATE EXTERNAL TABLE customers (name VARCHAR,
    address ROW(street VARCHAR, city VARCHAR, zipcode INT))
    AS COPY FROM '...' PARQUET;

ROWs can be nested; a field can have a type of ROW:

=> CREATE TABLE employees(
    employeeID INT,
    personal ROW(
      name VARCHAR,
      address ROW(street VARCHAR, city VARCHAR, zipcode INT),
      taxID INT),
    department VARCHAR);

ROWs can contain arrays:

=> CREATE TABLE customers(
  name VARCHAR,
  contact ROW(
    street VARCHAR,
    city VARCHAR,
    zipcode INT,
    email ARRAY[VARCHAR]
  ),
  accountid INT );

When loading data, the primitive types in the table definition must match those in the data. The ROW structure must also match; a ROW must contain all and only the fields in the struct in the data.

Restrictions on ROW columns

ROW columns have several restrictions:

  • Maximum nesting depth is 100.
  • Vertica tables support up to 9800 columns and fields. The ROW itself is not counted, only its fields.
  • ROW columns cannot use any constraints (such as NOT NULL) or defaults.
  • ROW fields cannot be auto_increment or setof.
  • ROW definition must include at least one field.
  • Row is a reserved keyword within a ROW definition, but is permitted as the name of a table or column.
  • Tables containing ROW columns cannot also contain IDENTITY, default, SET USING, or named sequence columns.

Syntax for direct construction (literals)

In a literal, such as a value in a comparison operation, a ROW consists of one or more values. If you omit field names in the ROW expression, Vertica generates them automatically. If you do not coerce types, Vertica infers the types from the data values.

=> SELECT ROW('Amy',2,false);
                row
--------------------------------------------
 {"f0":"Amy","f1":2,"f2":false}
(1 row)

You can use an AS clause to name the ROW and its fields:

=> SELECT ROW('Amy',2,false) AS student(name, id, current);
               student
--------------------------------------------
 {"name":"Amy","id":2,"current":false}
(1 row)

You can also name individual fields using AS. This query produces the same output as the previous one:

=> SELECT ROW('Amy' AS name, 2 AS id, false AS current) AS student;

You do not need to name all fields.

In an array of ROW elements, if you use AS to name fields and the names differ among the elements, Vertica uses the right-most names for all elements:

=> SELECT ARRAY[ROW('Amy' AS name, 2 AS id),ROW('Fred' AS first_name, 4 AS id)];
                           array
------------------------------------------------------------
 [{"first_name":"Amy","id":2},{"first_name":"Fred","id":4}]
(1 row)

You can coerce types explicitly:

=> SELECT ROW('Amy',2.5::int,false::varchar);
               row
------------------------------------------
 {"f0":"Amy","f1":3,"f2":"f"}
(1 row)

Escape single quotes in literal inputs using single quotes, as in the following example:

=> SELECT ROW('Howard''s house',2,false);
                      row
---------------------------------------------------
 {"f0":"Howard's house","f1":2,"f2":false}
(1 row)

You can use fields of all scalar types, ROW, and ARRAY, as in the following example:

=> SELECT id.name, major, GPA FROM students
   WHERE id = ROW('alice',119, ARRAY['alice@example.com','ap16@cs.example.edu']);
 name  |               major                | GPA
-------+------------------------------------+-----
 alice | [{"school":"Science","dept":"CS"}] | 3.8
(1 row)

Output format

ROW values are output in JSON format as in the following example.

=> CREATE EXTERNAL TABLE customers (name VARCHAR,
    address ROW(street VARCHAR, city VARCHAR, zipcode INT))
    AS COPY FROM '...' PARQUET;

=> SELECT address FROM customers WHERE address.city ='Pasadena';
                              address
--------------------------------------------------------------------
 {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":91001}
 {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":91001}
 {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":91001}
 {"street":"15 Raymond Dr","city":"Pasadena","zipcode":91003}
(4 rows)

The following table specifies the mappings from Vertica data types to JSON data types.

Vertica Type JSON Type
Integer Integer
Float Numeric
Numeric Numeric
Boolean Boolean
All others String

Comparisons

ROW supports equality (=), inequality (<>), and null-safe equality (<=>) between inputs that have the same set of fields. ROWs that contain only primitive types, including nested ROWs of primitive types, also support comparison operators (<, <=, >, >=).

Two ROWs are equal if and only if all fields are equal. Vertica compares fields in order until an inequality is found or all fields have been compared. The evaluation of the first non-equal field determines which ROW is greater:

=> SELECT ROW(1, 'joe') > ROW(2, 'bob');
?column?
----------
f
(1 row)

Comparisons between ROWs with different schemas fail:

=> SELECT ROW(1, 'joe') > ROW(2, 'bob', 123);
ERROR 5162:  Unequal number of entries in row expressions

If the result of a comparison depends on a null field, the result is null:

=>  select row(1, null, 3) = row(1, 2, 3);
 ?column?
----------

(1 row)

Null-handling

If a struct exists but a field value is null, Vertica assigns NULL as its value in the ROW. A struct where all fields are null is treated as a ROW with null fields. If the struct itself is null, Vertica reads the ROW as NULL.

Casting

Casting a ROW casts each field. You can therefore cast between data types following the same rules as for casts of scalar values.

The following example casts the contact ROW in the customers table, changing the zipcode field from INT to VARCHAR and adding a bound to the array:

=> SELECT contact::ROW(VARCHAR,VARCHAR,VARCHAR,ARRAY[VARCHAR,20]) FROM customers;
                                                         contact

--------------------------------------------------------------------------------
-----------------------------------------
 {"street":"911 San Marcos St","city":"Austin","zipcode":"73344","email":["missy@mit.edu","mcooper@cern.gov"]}
 {"street":"100 Main St Apt 4B","city":"Pasadena","zipcode":"91001","email":["shelly@meemaw.name","cooper@caltech.edu"]}
 {"street":"100 Main St Apt 4A","city":"Pasadena","zipcode":"91001","email":["hofstadter@caltech.edu"]}
 {"street":"23 Fifth Ave Apt 8C","city":"Pasadena","zipcode":"91001","email":[]}
 {"street":null,"city":"Pasadena","zipcode":"91001","email":["raj@available.com"]}

(6 rows)

You can specify new field names to change them in the output:

=> SELECT contact::ROW(str VARCHAR, city VARCHAR, zip VARCHAR, email ARRAY[VARCHAR,
20]) FROM customers;
                                                     contact

--------------------------------------------------------------------------------
----------------------------------
 {"str":"911 San Marcos St","city":"Austin","zip":"73344","email":["missy@mit.edu","mcooper@cern.gov"]}
 {"str":"100 Main St Apt 4B","city":"Pasadena","zip":"91001","email":["shelly@meemaw.name","cooper@caltech.edu"]}
 {"str":"100 Main St Apt 4A","city":"Pasadena","zip":"91001","email":["hofstadter@caltech.edu"]}
 {"str":"23 Fifth Ave Apt 8C","city":"Pasadena","zip":"91001","email":[]}
 {"str":null,"city":"Pasadena","zip":"91001","email":["raj@available.com"]}
(6 rows)

Supported operators and predicates

ROW values may be used in queries in the following ways:

  • INNER and OUTER JOIN

  • Comparisons, IN, BETWEEN (non-nullable filters only)

  • IS NULL, IS NOT NULL

  • CASE

  • GROUP BY, ORDER BY

  • SELECT DISTINCT

  • Arguments to user-defined scalar, transform, and analytic functions

The following operators and predicates are not supported for ROW values:

  • Math operators

  • Type coercion of whole rows (coercion of field values is supported)

  • BITWISE, LIKE

  • MLA (ROLLUP, CUBE, GROUPING SETS)

  • Aggregate functions including MAX, MIN, and SUM

  • Set operators including UNION, UNION ALL, MINUS, and INTERSECT

COUNT is not supported for ROWs returned from user-defined scalar functions, but is supported for ROW columns and literals.

In comparison operations (including implicit comparisons like ORDER BY), a ROW literal is treated as the sequence of its field values. For example, the following two statements are equivalent:

GROUP BY ROW(zipcode, city)
GROUP BY zipcode, city

Using rows in views and subqueries

You can use ROW columns to construct views and in subqueries. Consider employee and customer tables with the following definitions:

=> CREATE EXTERNAL TABLE customers(name VARCHAR,
        address ROW(street VARCHAR, city VARCHAR, zipcode INT), accountID INT)
    AS COPY FROM '...' PARQUET;

=> CREATE EXTERNAL TABLE employees(employeeID INT,
        personal ROW(name VARCHAR,
        address ROW(street VARCHAR, city VARCHAR, zipcode INT),
        taxID INT), department VARCHAR)
    AS COPY FROM '...' PARQUET;

The following example creates a view and queries it.

=> CREATE VIEW neighbors (num_neighbors, area(city, zipcode))
AS SELECT count(*), ROW(address.city, address.zipcode)
FROM customers GROUP BY address.city, address.zipcode;
CREATE VIEW

=> SELECT employees.personal.name, neighbors.area FROM neighbors, employees
WHERE employees.personal.address.zipcode=neighbors.area.zipcode AND neighbors.nu
m_neighbors > 1;
        name        |                area
--------------------+-------------------------------------
 Sheldon Cooper     | {"city":"Pasadena","zipcode":91001}
 Leonard Hofstadter | {"city":"Pasadena","zipcode":91001}
(2 rows)

4 - SET

Represents a collection of unordered, unique elements.

Represents a collection of unordered, unique elements. Sets may contain only primitive types. In sets, unlike in arrays, element position is not meaningful.

Sets do not support LONG types (like LONG VARBINARY or LONG VARCHAR) or user-defined types (like Geometry).

If you populate a set from an array, Vertica sorts the values and removes duplicate elements. If you do not care about element position and plan to run queries that check for the presence of specific elements (find, contains), using a set could improve query performance.

Sets can be bounded, meaning they specify a maximum element count, or unbounded. Unbounded sets have a maximum binary size, which can be set explicitly or defaulted. See Limits on Element Count and Collection Size.

Syntax

In column definitions:

SET[data_type, max_elements] |
SET[data_type](max_size) |
SET[data_type]

In literals:

SET[value[, ...] ]

Restrictions

  • Sets support only data of primitive (scalar) types.

  • Bounds, if specified, are enforced for all operations that load or alter data. Unbounded sets may have as many elements as will fit in the allotted binary size.

  • A set has a maximum binary size. If this size is not set when the set is defined, a default value is used.

Syntax for column definition

Sets used in column definitions can be either bounded or unbounded. Bounded sets must specify a maximum number of elements. Unbounded sets can specify a maximum binary size for the set, or the value of DefaultArrayBinarySize is used. You can specify a bound or a binary size but not both. For more information about these values, see Limits on Element Count and Collection Size.

Type Syntax Semantics
Bounded set

SET[data_type, max_elements]

Example:

SET[VARCHAR(50),100]

Can contain no more than max_elements elements. Attempting to add more is an error.

Has a binary size of the size of the data type multiplied by the maximum number of elements (possibly rounded up).

Unbounded set with maximum size

SET[data_type](max_size)

Example:

SET[VARCHAR(50)](32000)

Can contain as many elements as fit in max_size. Ignores the value of DefaultArrayBinarySize.
Unbounded set

SET[data_type]

Example:

SET[VARCHAR(50)]

Can contain as many elements as fit in the default binary size.

Equivalent to:

SET[data_type](DefaultArrayBinarySize)

The following example defines a table with an unbounded set colum.

=> CREATE TABLE users
(
user_id INTEGER,
display_name VARCHAR,
email_addrs SET[VARCHAR]
);

When you load array data into a column defined as a set, the array data is automatically converted to a set.

Syntax for direct construction (literals)

Use the SET keyword to construct a set value. Literal set values are contained in brackets. For example, to create a set of INT, you would do the following:

=> SELECT SET[1,2,3];
  set
-------
 [1,2,3]
(1 row)

You can explicitly convert an array to a set by casting, as in the following example:

=> SELECT ARRAY[1, 5, 2, 6, 3, 0, 6, 4]::SET[INT];
     set
-----------------
[0,1,2,3,4,5,6]
(1 row)

Notice that duplicate elements have been removed and the elements have been sorted.

Because the elements are known at the time you directly construct a set, these sets are implicitly bounded.

Output format

Sets are shown in a JSON-like format, with comma-separated elements contained in brackets (like arrays). In the following example, the email_addrs column is a set.

=> SELECT custkey,email_addrs FROM customers LIMIT 4;
 custkey |                           email_addrs
---------+------------------------------------------------------------------------
 342176  | ["joe.smith@example.com"]
 342799  | ["bob@example,com","robert.jones@example.com"]
 342845  | ["br92@cs.example.edu"]
 342321  | ["789123@example-isp.com","sjohnson@eng.example.com","sara@johnson.example.name"]

Limits on element count and collection size

When declaring a collection type for a table column, you can limit either the number of elements or the total binary size of the collection. During query processing, Vertica always reserves the maximum memory needed for the column, based on either the element count or the binary size. If this size is much larger than your data actually requires, setting one of these limits can improve query performance by reducing the amount of memory that must be reserved for the column.

You can change the bounds of a collection, including changing between bounded and unbounded collections, by casting. See Casting.

A bounded collection specifies a maximum element count. A value in a bounded collection column may contain fewer elements, but it may not contain more. Any attempt to insert more elements into a bounded collection than the declared maximum is an error. A bounded collection has a binary size that is the product of the data-type size and the maximum number of elements, possibly rounded up.

An unbounded collection specifies a binary size in bytes, explicitly or implicitly. It may contain as many elements as can fit in that binary size.

Instead of specifying a bound, you can specify a maximum binary size for an unbounded collection. The binary size acts as an absolute limit, regardless of how many elements the collection contains. Collections that do not specify a maximum binary size use the value of DefaultArrayBinarySize. This size is set at the time the collection is defined and is not affected by later changes to the value of DefaultArrayBinarySize.

You cannot set a maximum binary size for a bounded collection, only an unbounded one.

Comparisons

All collections support equality (=), inequality (<>), and null-safe equality (<=>). 1D collections also support comparison operators (<, <=, >, >=) between collections of the same type (arrays or sets). Comparisons follow these rules:

  • A null collection is ordered last.

  • Non-null collections are compared element by element, using the ordering rules of the element's data type. The relative order of the first pair of non-equal elements determines the order of the two collections.

  • If all elements in both collections are equal up to the length of the shorter collection, the shorter collection is ordered before the longer one.

  • If all elements in both collections are equal and the collections are of equal length, the collections are equal.

Null handling

Null semantics for collections are consistent with normal columns in most regards. See NULL sort order for more information on null-handling.

The null-safe equality operator (<=>) behaves differently from equality (=) when the collection is null rather than empty. Comparing a collection to NULL strictly returns null:

=> SELECT ARRAY[1,3] = NULL;
?column?
----------

(1 row)

=> SELECT ARRAY[1,3] <=> NULL;
 ?column?
----------
 f
(1 row)

In the following example, the grants column in the table is null for employee 99:

=> SELECT grants = NULL FROM employees WHERE id=99;
 ?column?
----------

(1 row)

=> SELECT grants <=> NULL FROM employees WHERE id=99;
 ?column?
----------
 t
(1 row)

Empty collections are not null and behave as expected:

=> SELECT ARRAY[]::ARRAY[INT] = ARRAY[]::ARRAY[INT];
 ?column?
----------
 t
(1 row)

Collections are compared element by element. If a comparison depends on a null element, the result is unknown (null), not false. For example, ARRAY[1,2,null]=ARRAY[1,2,null] and ARRAY[1,2,null]=ARRAY[1,2,3] both return null, but ARRAY[1,2,null]=ARRAY[1,4,null] returns false because the second elements do not match.

Casting

Casting a set casts each element of the set. You can therefore cast between data types following the same rules as for casts of scalar values.

You can cast both literal sets and set columns explicitly:

=> SELECT SET['1','2','3']::SET[INT];
   set
---------
[1,2,3]
(1 row)

=> CREATE TABLE transactions (tid INT, prod_ids SET[VARCHAR], quantities SET[VARCHAR(32)]);

=> INSERT INTO transactions VALUES (12345, SET['p1265', 'p4515'], SET['15','2']);

=> SELECT quantities :: SET[INT] FROM transactions;
 quantities
------------
   [15,2]
(1 row)

Assignment casts and implicit casts work the same way as for scalars.

You can perform explicit casts, but not implicit casts, between ARRAY and SET types. If the collection is unbounded and the data type does not change, the binary size is preserved. For example, if you cast an ARRAY[INT] to a SET[INT], the set has the same binary size as the array.

When casting an array to a set, Vertica first casts each element and then sorts the set and removes duplicates. If two source values are cast to the same target value, one of them will be removed. For example, if you cast an array of FLOAT to a set of INT, two values in the array might be rounded to the same integer and then be treated as duplicates. This also happens if the array contains more than one value that is cast to NULL.

If you cast from one element type to another, the resulting collection uses the default binary size. If this would cause the data not to fit, the cast fails.

Functions and operators

See Collection functions for a comprehensive list of functions that can be used to manipulate arrays and sets.

Collections can be used in the following ways:

Collections cannot be used in the following ways:

  • As part of an IN or NOT IN expression.

  • As partition columns when creating tables.

  • With ANALYZE_STATISTICS or TopK projections.

  • Non-native arrays only: ORDER BY, PARTITION BY, DEFAULT, SET USING, or constraints.