COLLATION
Applies a collation to two or more strings. Use COLLATION
with ORDER BY
, GROUP BY
, and equality clauses.
Syntax
COLLATION ( 'expression' [ , 'locale_or_collation_name' ] )
Arguments
'expression'
- Any expression that evaluates to a column name or to two or more values of type
CHAR
orVARCHAR
. 'locale_or_collation_name'
- The ICU (International Components for Unicode) locale or collation name to use when collating the string. If you omit this parameter,
COLLATION
uses the collation associated with the session locale.To determine the current session locale, enter the vsql meta-command
\locale
:=> \locale en_US@collation=binary
To set the locale and collation, use
\locale
as follows:=> \locale en_US@collation=binary INFO 2567: Canonical locale: 'en_US' Standard collation: 'LEN_KBINARY' English (United States)
Locales
The locale used for COLLATION
can be one of the following:
-
The default locale
-
A session locale
-
A locale that you specify when you call
COLLATION
. If you specify the locale, Vertica applies the collation associated with that locale to the data.COLLATION
does not modify the collation for any other columns in the table.
For a list of valid ICU locales, go to Locale Explorer (ICU).
Binary and non-binary collations
The Vertica default locale is en_US@collation=binary
, which uses binary collation
. Binary collation compares binary representations of strings. Binary collation is fast, but it can result in a sort order where K
precedes c
because the binary representation of K
is lower than c
.
For non-binary collation, Vertica transforms the data according to the rules of the locale or the specified collation, and then applies the sorting rules. Suppose the locale collation is non-binary and you request a GROUP BY on string data. In this case,Vertica calls COLLATION
, whether or not you specify the function in your query.
For information about collation naming, see Collator Naming Scheme.
Examples
Collating GROUP BY results
The following examples are based on a Premium_Customer
table that contains the following data:
=> SELECT * FROM Premium_Customer;
ID | LName | FName
----+--------+---------
1 | Mc Coy | Bob
2 | Mc Coy | Janice
3 | McCoy | Jody
4 | McCoy | Peter
5 | McCoy | Brendon
6 | Mccoy | Cameron
7 | Mccoy | Lisa
The first statement shows how COLLATION
applies the collation for the EN_US
locale to the LName
column for the locale EN_US
. Vertica sorts the GROUP BY
output as follows:
-
Last names with spaces
-
Last names where "coy" starts with a lowercase letter
-
Last names where "Coy" starts with an uppercase letter
=> SELECT * FROM Premium_Customer ORDER BY COLLATION(LName, 'EN_US'), FName;
ID | LName | FName
----+--------+---------
1 | Mc Coy | Bob
2 | Mc Coy | Janice
6 | Mccoy | Cameron
7 | Mccoy | Lisa
5 | McCoy | Brendon
3 | McCoy | Jody
4 | McCoy | Peter
The next statement shows how COLLATION
collates the LName
column for the locale LEN_AS
:
-
LEN
indicates the language (L) is English (EN
). -
AS
(Alternate Shifted) instructsCOLLATION
that lowercase letters come before uppercase (shifted) letters.
In the results, the last names in which "coy" starts with a lowercase letter precede the last names where "Coy" starts with an uppercase letter.
=> SELECT * FROM Premium_Customer ORDER BY COLLATION(LName, 'LEN_AS'), FName;
ID | LName | FName
----+--------+---------
6 | Mccoy | Cameron
7 | Mccoy | Lisa
1 | Mc Coy | Bob
5 | McCoy | Brendon
2 | Mc Coy | Janice
3 | McCoy | Jody
4 | McCoy | Peter
Comparing strings with an equality clause
In the following query, COLLATION
removes spaces and punctuation when comparing two strings in English. It then determines whether the two strings still have the same value after the punctuation has been removed:
=> SELECT COLLATION ('U.S.A', 'LEN_AS') = COLLATION('USA', 'LEN_AS');
?column?
----------
t
Sorting strings in non-english languages
The following table contains data that uses the German character eszett, ß:
=> SELECT * FROM t1;
a | b | c
------------+---+----
ßstringß | 1 | 10
SSstringSS | 2 | 20
random1 | 3 | 30
random1 | 4 | 40
random2 | 5 | 50
When you specify the collation LDE_S1
:
-
LDE
indicates the language (L
) is German (DE
). -
S1
indicates the strength (S
) of 1 (primary). This value indicates that the collation does not need to consider accents and case.
The query returns the data in the following order:
=> SELECT a FROM t1 ORDER BY COLLATION(a, 'LDE_S1'));
a
------------
random1
random1
random2
SSstringSS
ßstringß