ARRAY
Represents array data. There are two types of arrays in Vertica:
-
Native array: a one-dimensional array of a primitive type. Native arrays are tracked in the TYPES system table and used in native tables.
-
Non-native array: all other supported arrays, including arrays that contain other arrays (multi-dimensional arrays) or structs (ROWs). Non-native arrays have some usage restrictions. Non-native arrays are tracked in the COMPLEX_TYPES system table.
Both types of arrays operate in the same way, but they have different OIDs.
Arrays can be bounded, meaning they specify a maximum element count, or unbounded. Unbounded arrays have a maximum binary size, which can be set explicitly or defaulted. See Limits on Element Count and Collection Size.
Selected parsers support using COPY to load arrays. See the documentation of individual parsers for more information.
Syntax
ARRAY[data_type, max_elements] |
ARRAY[data_type](max_size) |
ARRAY[data_type]
In literals:
ARRAY[value[, ...] ]
Restrictions
-
Native arrays support only data of primitive types, for example, int, UUID, and so on.
-
Array dimensionality is enforced. A column cannot contain arrays of varying dimensions. For example, a column that contains a three-dimensional array can only contain other three-dimensional arrays; it cannot simultaneously include a one-dimensional array. However, the arrays in a column can vary in size, where one array can contain four elements while another contains ten.
-
Array bounds, if specified, are enforced for all operations that load or alter data. Unbounded arrays may have as many elements as will fit in the allotted binary size.
-
An array has a maximum binary size. If this size is not set when the array is defined, a default value is used.
-
Arrays do not support LONG types (like LONG VARBINARY or LONG VARCHAR) or user-defined types (like Geometry).
Syntax for column definition
Arrays used in column definitions can be either bounded or unbounded. Bounded arrays must specify a maximum number of elements. Unbounded arrays can specify a maximum binary size (in bytes) for the array, or the value of DefaultArrayBinarySize is used. You can specify a bound or a binary size but not both. For more information about these values, see Limits on Element Count and Collection Size.
Type | Syntax | Semantics |
---|---|---|
Bounded array |
Example:
|
Can contain no more than Has a binary size of the size of the data type multiplied by the maximum number of elements (possibly rounded up). |
Unbounded array with maximum binary size |
Example:
|
Can contain as many elements as fit in max_size . Ignores the value of DefaultArrayBinarySize. |
Unbounded array with default binary size |
Example:
|
Can contain as many elements as fit in the default binary size. Equivalent to:
|
The following example defines a table for customers using an unbounded array:
=> CREATE TABLE customers (id INT, name VARCHAR, email ARRAY[VARCHAR(50)]);
The following example uses a bounded array for customer email addresses and an unbounded array for order history:
=> CREATE TABLE customers (id INT, name VARCHAR, email ARRAY[VARCHAR(50),5], orders ARRAY[INT]);
The following example uses an array that has ROW elements:
=> CREATE TABLE orders(
orderid INT,
accountid INT,
shipments ARRAY[
ROW(
shipid INT,
address ROW(
street VARCHAR,
city VARCHAR,
zip INT
),
shipdate DATE
)
]
);
To declare a multi-dimensional array, use nesting. For example, ARRAY[ARRAY[int]] specifies a two-dimensional array.
Syntax for direct construction (literals)
Use the ARRAY keyword to construct an array value. The following example creates an array of integer values.
=> SELECT ARRAY[1,2,3];
array
-------
[1,2,3]
(1 row)
You can nest an array inside another array, as in the following example.
=> SELECT ARRAY[ARRAY[1],ARRAY[2]];
array
-----------
[[1],[2]]
(1 row)
If an array of arrays contains no null elements and no function calls, you can abbreviate the syntax:
=> SELECT ARRAY[[1,2],[3,4]];
array
---------------
[[1,2],[3,4]]
(1 row)
---not valid:
=> SELECT ARRAY[[1,2],null,[3,4]];
ERROR 4856: Syntax error at or near "null" at character 20
LINE 1: SELECT ARRAY[[1,2],null,[3,4]];
^
Array literals can contain elements of all scalar types, ROW, and ARRAY. ROW elements must all have the same set of fields:
=> SELECT ARRAY[ROW(1,2),ROW(1,3)];
array
-----------------------------------
[{"f0":1,"f1":2},{"f0":1,"f1":3}]
(1 row)
=> SELECT ARRAY[ROW(1,2),ROW(1,3,'abc')];
ERROR 3429: For 'ARRAY', types ROW(int,int) and ROW(int,int,unknown) are inconsistent
Because the elements are known at the time you directly construct an array, these arrays are implicitly bounded.
You can use ARRAY literals in comparisons, as in the following example:
=> SELECT id.name, id.num, GPA FROM students
WHERE major = ARRAY[ROW('Science','Physics')];
name | num | GPA
-------+-----+-----
bob | 121 | 3.3
carol | 123 | 3.4
(2 rows)
Output format
Queries of array columns return JSON format, with the values shown in comma-separated lists in brackets. The following example shows a query that includes array columns.
=> SELECT cust_custkey,cust_custstaddress,cust_custcity,cust_custstate from cust;
cust_custkey | cust_custstaddress | cust_custcity | cust_custstate
-------------+------- ----------------------------------------------+---------------------------------------------+----------------
342176 | ["668 SW New Lane","518 Main Ave","7040 Campfire Dr"] | ["Winchester","New Hyde Park","Massapequa"] | ["VA","NY","NY"]
342799 | ["2400 Hearst Avenue","3 Cypress Street"] | ["Berkeley","San Antonio"] | ["CA","TX"]
342845 | ["336 Boylston Street","180 Clarkhill Rd"] | ["Boston","Amherst"] | ["MA","MA"]
342321 | ["95 Fawn Drive"] | ["Allen Park"] | ["MI"]
342989 | ["5 Thompson St"] | ["Massillon"] | ["OH"]
(5 rows)
Note that JSON format escapes some characters that would not be escaped in native VARCHARs. For example, if you insert "c:\users\data"
into an array, the JSON output for that value is "c:\\users\\data"
.
Element access
Arrays are 0-indexed. The first element's ordinal position is 0, second is 1, and so on.
You can access (dereference) elements from an array by index:
=> SELECT (ARRAY['a','b','c','d','e'])[1];
array
-------
b
(1 row)
To specify a range, use the format start
:end
. The end of the range is non-inclusive.
=> SELECT(ARRAY['a','b','c','d','e','f','g'])[1:4];
array
---------
["b","c","d"]
(1 row)
To dereference an element from a multi-dimensional array, put each index in brackets:
=> SELECT(ARRAY[ARRAY[1,2],ARRAY[3,4]])[0][0];
array
-------
1
(1 row)
Out-of-bound index references return NULL.
Limits on element count and collection size
When declaring a collection type for a table column, you can limit either the number of elements or the total binary size of the collection. During query processing, Vertica always reserves the maximum memory needed for the column, based on either the element count or the binary size. If this size is much larger than your data actually requires, setting one of these limits can improve query performance by reducing the amount of memory that must be reserved for the column.
You can change the bounds of a collection, including changing between bounded and unbounded collections, by casting. See Casting.
A bounded collection specifies a maximum element count. A value in a bounded collection column may contain fewer elements, but it may not contain more. Any attempt to insert more elements into a bounded collection than the declared maximum is an error. A bounded collection has a binary size that is the product of the data-type size and the maximum number of elements, possibly rounded up.
An unbounded collection specifies a binary size in bytes, explicitly or implicitly. It may contain as many elements as can fit in that binary size.
If a nested array specifies bounds for all dimensions, Vertica sets a single bound that is the product of the bounds. In the following example, the inner and outer arrays each have a bound of 10, but only a total element count of 100 is enforced.
ARRAY[ARRAY[INT,10],10]
If a nested array specifies a bound for only the outer collection, it is treated as the total bound. The previous example is equivalent to the following:
ARRAY[ARRAY[INT],100]
You must either specify bounds for all nested collections or specify a bound only for the outer one. For any other distribution of bounds, Vertica treats the collection as unbounded.
Instead of specifying a bound, you can specify a maximum binary size for an unbounded collection. The binary size acts as an absolute limit, regardless of how many elements the collection contains. Collections that do not specify a maximum binary size use the value of DefaultArrayBinarySize. This size is set at the time the collection is defined and is not affected by later changes to the value of DefaultArrayBinarySize.
You cannot set a maximum binary size for a bounded collection, only an unbounded one.
You can change the bounds or the binary size of an array column using ALTER TABLE as in the following example:
=> ALTER TABLE cust ALTER COLUMN orders SET DATA TYPE ARRAY[INTEGER](100);
If the change reduces the size of the collection and would result in data loss, the change fails.
Comparisons
All collections support equality (=
), inequality (<>
), and null-safe equality (<=>
). 1D collections also support comparison operators (<
, <=
, >
, >=
) between collections of the same type (arrays or sets). Comparisons follow these rules:
-
A null collection is ordered last.
-
Non-null collections are compared element by element, using the ordering rules of the element's data type. The relative order of the first pair of non-equal elements determines the order of the two collections.
-
If all elements in both collections are equal up to the length of the shorter collection, the shorter collection is ordered before the longer one.
-
If all elements in both collections are equal and the collections are of equal length, the collections are equal.
Null-handling
Null semantics for collections are consistent with normal columns in most regards. See NULL sort order for more information on null-handling.
The null-safe equality operator (<=>) behaves differently from equality (=) when the collection is null rather than empty. Comparing a collection to NULL strictly returns null:
=> SELECT ARRAY[1,3] = NULL;
?column?
----------
(1 row)
=> SELECT ARRAY[1,3] <=> NULL;
?column?
----------
f
(1 row)
In the following example, the grants column in the table is null for employee 99:
=> SELECT grants = NULL FROM employees WHERE id=99;
?column?
----------
(1 row)
=> SELECT grants <=> NULL FROM employees WHERE id=99;
?column?
----------
t
(1 row)
Empty collections are not null and behave as expected:
=> SELECT ARRAY[]::ARRAY[INT] = ARRAY[]::ARRAY[INT];
?column?
----------
t
(1 row)
Collections are compared element by element. If a comparison depends on a null element, the result is unknown (null), not false. For example, ARRAY[1,2,null]=ARRAY[1,2,null]
and ARRAY[1,2,null]=ARRAY[1,2,3]
both return null, but ARRAY[1,2,null]=ARRAY[1,4,null]
returns false because the second elements do not match.
Casting
Casting an array casts each element of the array. You can therefore cast between data types following the same rules as for casts of scalar values.
You can cast both literal arrays and array columns explicitly:
=> SELECT ARRAY['1','2','3']::ARRAY[INT];
array
---------
[1,2,3]
(1 row)
You can change the bound of an array or set by casting. When casting to a bounded native array, inputs that are too long are truncated. When casting to a non-native array (an array containing complex data types including other arrays), if the new bounds are too small for the data the cast fails:
=> SELECT ARRAY[1,2,3]::ARRAY[VARCHAR,2];
array
-----------
["1","2"]
(1 row)
=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,2],2];
ERROR 9227: Output array isn't big enough
DETAIL: Type limit is 4 elements, but value has 6 elements
If you cast to a bounded multi-dimensional array, you must specify the bounds at all levels:
=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,5],10];
array
-------------------------------
[["1","2","3"],["4","5","6"]]
(1 row)
=> SELECT ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]::ARRAY[ARRAY[VARCHAR,2]];
WARNING 9753: Collection type bound will not be used
DETAIL: A bound was provided for an inner dimension, but not for an outer dimension
array
-------------------------------
[["1","2","3"],["4","5","6"]]
(1 row)
Assignment casts and implicit casts work the same way as for scalars:
=> CREATE TABLE transactions (tid INT, prod_ids ARRAY[VARCHAR,100], quantities ARRAY[INT,100]);
CREATE TABLE
=> INSERT INTO transactions VALUES (12345, ARRAY['p1265', 'p4515'], ARRAY[15,2]);
OUTPUT
--------
1
(1 row)
=> CREATE TABLE txreport (prod_ids ARRAY[VARCHAR(12),100], quants ARRAY[VARCHAR(32),100]);
CREATE TABLE
=> INSERT INTO txreport SELECT prod_ids, quantities FROM transactions;
OUTPUT
--------
1
(1 row)
=> SELECT * FROM txreport;
prod_ids | quants
-------------------+------------
["p1265","p4515"] | ["15","2"]
(1 row)
You can perform explicit casts, but not implicit casts, between the ARRAY and SET types (native arrays only). If the collection is unbounded and the data type does not change, the binary size is preserved. For example, if you cast an ARRAY[INT] to a SET[INT], the set has the same binary size as the array.
If you cast from one element type to another, the resulting collection uses the default binary size. If this would cause the data not to fit, the cast fails.
You cannot cast from an array to an array with a different dimensionality, for example from a two-dimensional array to a one-dimensional array.
Functions and operators
See Collection functions for a comprehensive list of functions that can be used to manipulate arrays and sets.
Collections can be used in the following ways:
-
As the grouping column in a GROUP BY clause.
-
For native arrays only, as the sort key in an ORDER BY clause in a query, in an OVER clause (see Window partitioning), or in a CREATE PROJECTION statement.
-
As the sort key in the PARTITION BY part of an OVER clause.
-
As a JOIN key (see Joined-table).
-
In CASE expressions.
Collections cannot be used in the following ways:
-
As part of an IN or NOT IN expression.
-
As partition columns when creating tables.
-
With ANALYZE_STATISTICS or TopK projections.
-
Non-native arrays only: ORDER BY, PARTITION BY, DEFAULT, SET USING, or constraints.