COLLATION

Applies a collation to two or more strings.

Applies a collation to two or more strings. Use COLLATION with ORDER BY, GROUP BY, and equality clauses.

Syntax

COLLATION ( 'expression' [ , 'locale_or_collation_name' ] )

Arguments

'expression'

Any expression that evaluates to a column name or to two or more values of type CHAR or VARCHAR.

'locale_or_collation_name'

The ICU (International Components for Unicode) locale or collation name to use when collating the string. If you omit this parameter, COLLATION uses the collation associated with the session locale.

To determine the current session locale, enter the vsql meta-command \locale:

=> \locale
en_US@collation=binary

To set the locale and collation, use \locale as follows:

=> \locale en_US@collation=binary
INFO 2567:  Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)

Locales

The locale used for COLLATION can be one of the following:

The default locale
A session locale
A locale that you specify when you call COLLATION. If you specify the locale, Vertica applies the collation associated with that locale to the data. COLLATION does not modify the collation for any other columns in the table.

For a list of valid ICU locales, go to Locale Explorer (ICU).

Binary and non-binary collations

The Vertica default locale is en_US@collation=binary, which uses binary collation. Binary collation compares binary representations of strings. Binary collation is fast, but it can result in a sort order where K precedes c because the binary representation of K is lower than c.

For non-binary collation, Vertica transforms the data according to the rules of the locale or the specified collation, and then applies the sorting rules. Suppose the locale collation is non-binary and you request a GROUP BY on string data. In this case,Vertica calls COLLATION, whether or not you specify the function in your query.

For information about collation naming, see Collator Naming Scheme.

Examples

Collating GROUP BY results

The following examples are based on a Premium_Customer table that contains the following data:

=> SELECT * FROM Premium_Customer;
 ID | LName  | FName
----+--------+---------
  1 | Mc Coy | Bob
  2 | Mc Coy | Janice
  3 | McCoy  | Jody
  4 | McCoy  | Peter
  5 | McCoy  | Brendon
  6 | Mccoy  | Cameron
  7 | Mccoy  | Lisa

The first statement shows how COLLATION applies the collation for the EN_US locale to the LName column for the locale EN_US. Vertica sorts the GROUP BY output as follows:

Last names with spaces
Last names where "coy" starts with a lowercase letter
Last names where "Coy" starts with an uppercase letter

=> SELECT * FROM Premium_Customer ORDER BY COLLATION(LName, 'EN_US'), FName;
 ID | LName  | FName
----+--------+---------
  1 | Mc Coy | Bob
  2 | Mc Coy | Janice
  6 | Mccoy  | Cameron
  7 | Mccoy  | Lisa
  5 | McCoy  | Brendon
  3 | McCoy  | Jody
  4 | McCoy  | Peter

The next statement shows how COLLATION collates the LName column for the locale LEN_AS:

LEN indicates the language (L) is English (EN).
AS (Alternate Shifted) instructs COLLATION that lowercase letters come before uppercase (shifted) letters.

In the results, the last names in which "coy" starts with a lowercase letter precede the last names where "Coy" starts with an uppercase letter.

=> SELECT * FROM Premium_Customer ORDER BY COLLATION(LName, 'LEN_AS'), FName;
 ID | LName  | FName
----+--------+---------
  6 | Mccoy  | Cameron
  7 | Mccoy  | Lisa
  1 | Mc Coy | Bob
  5 | McCoy  | Brendon
  2 | Mc Coy | Janice
  3 | McCoy  | Jody
  4 | McCoy  | Peter

Comparing strings with an equality clause

In the following query, COLLATION removes spaces and punctuation when comparing two strings in English. It then determines whether the two strings still have the same value after the punctuation has been removed:

=> SELECT COLLATION ('U.S.A', 'LEN_AS') = COLLATION('USA', 'LEN_AS');
?column?
----------
 t

Sorting strings in non-english languages

The following table contains data that uses the German character eszett, ß:

=> SELECT * FROM t1;
     a      | b | c
------------+---+----
 ßstringß   | 1 | 10
 SSstringSS | 2 | 20
 random1    | 3 | 30
 random1    | 4 | 40
 random2    | 5 | 50

When you specify the collation LDE_S1:

LDE indicates the language (L) is German (DE).
S1 indicates the strength (S) of 1 (primary). This value indicates that the collation does not need to consider accents and case.

The query returns the data in the following order:

=> SELECT a FROM t1 ORDER BY COLLATION(a, 'LDE_S1'));
     a
------------
 random1
 random1
 random2
 SSstringSS
 ßstringß