COLLATION
Applies a collation to two or more strings. Use COLLATION with ORDER BY, GROUP BY, and equality clauses.
Syntax
COLLATION ( 'expression' [ , 'locale_or_collation_name' ] )
Arguments
'expression'- Any expression that evaluates to a column name or to two or more values of type
CHARorVARCHAR. 'locale_or_collation_name'- The ICU (International Components for Unicode) locale or collation name to use when collating the string. If you omit this parameter,
COLLATIONuses the collation associated with the session locale.To determine the current session locale, enter the vsql meta-command
\locale:=> \locale en_US@collation=binaryTo set the locale and collation, use
\localeas follows:=> \locale en_US@collation=binary INFO 2567: Canonical locale: 'en_US' Standard collation: 'LEN_KBINARY' English (United States)
Locales
The locale used for COLLATION can be one of the following:
-
The default locale
-
A session locale
-
A locale that you specify when you call
COLLATION. If you specify the locale, OpenText™ Analytics Database applies the collation associated with that locale to the data.COLLATIONdoes not modify the collation for any other columns in the table.
For a list of valid ICU locales, go to Locale Explorer (ICU).
Binary and non-binary collations
The database default locale is en_US@collation=binary, which uses binary collation. Binary collation compares binary representations of strings. Binary collation is fast, but it can result in a sort order where K precedes c because the binary representation of K is lower than c.
For non-binary collation, the database transforms the data according to the rules of the locale or the specified collation, and then applies the sorting rules. Suppose the locale collation is non-binary and you request a GROUP BY on string data. In this case, the database calls COLLATION regardless of whether you specify the function in your query.
For information about collation naming, see Collator Naming Scheme.
Examples
Collating GROUP BY results
The following examples are based on a Premium_Customer table that contains the following data:
=> SELECT * FROM Premium_Customer;
ID | LName | FName
----+--------+---------
1 | Mc Coy | Bob
2 | Mc Coy | Janice
3 | McCoy | Jody
4 | McCoy | Peter
5 | McCoy | Brendon
6 | Mccoy | Cameron
7 | Mccoy | Lisa
The first statement shows how COLLATION applies the collation for the EN_US locale to the LName column for the locale EN_US. Vertica sorts the GROUP BY output as follows:
-
Last names with spaces
-
Last names where "coy" starts with a lowercase letter
-
Last names where "Coy" starts with an uppercase letter
=> SELECT * FROM Premium_Customer ORDER BY COLLATION(LName, 'EN_US'), FName;
ID | LName | FName
----+--------+---------
1 | Mc Coy | Bob
2 | Mc Coy | Janice
6 | Mccoy | Cameron
7 | Mccoy | Lisa
5 | McCoy | Brendon
3 | McCoy | Jody
4 | McCoy | Peter
The next statement shows how COLLATION collates the LName column for the locale LEN_AS:
-
LENindicates the language (L) is English (EN). -
AS(Alternate Shifted) instructsCOLLATIONthat lowercase letters come before uppercase (shifted) letters.
In the results, the last names in which "coy" starts with a lowercase letter precede the last names where "Coy" starts with an uppercase letter.
=> SELECT * FROM Premium_Customer ORDER BY COLLATION(LName, 'LEN_AS'), FName;
ID | LName | FName
----+--------+---------
6 | Mccoy | Cameron
7 | Mccoy | Lisa
1 | Mc Coy | Bob
5 | McCoy | Brendon
2 | Mc Coy | Janice
3 | McCoy | Jody
4 | McCoy | Peter
Comparing strings with an equality clause
In the following query, COLLATION removes spaces and punctuation when comparing two strings in English. It then determines whether the two strings still have the same value after the punctuation has been removed:
=> SELECT COLLATION ('U.S.A', 'LEN_AS') = COLLATION('USA', 'LEN_AS');
?column?
----------
t
Sorting strings in non-english languages
The following table contains data that uses the German character eszett, ß:
=> SELECT * FROM t1;
a | b | c
------------+---+----
ßstringß | 1 | 10
SSstringSS | 2 | 20
random1 | 3 | 30
random1 | 4 | 40
random2 | 5 | 50
When you specify the collation LDE_S1:
-
LDEindicates the language (L) is German (DE). -
S1indicates the strength (S) of 1 (primary). This value indicates that the collation does not need to consider accents and case.
The query returns the data in the following order:
=> SELECT a FROM t1 ORDER BY COLLATION(a, 'LDE_S1'));
a
------------
random1
random1
random2
SSstringSS
ßstringß