Implement locales for international data sets
Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY and ORDER BY clauses, joins, and the analytic ORDER BY clause.
The default locale for a Vertica database is en_US@collation=binary (English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary collation, regardless of the session collation. Any locale-specific collation is applied at query time.
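To see why the difference between binary and locale-specific collation matters, the following Python sketch contrasts a byte-wise ordering with a simple case-insensitive ordering. It does not call Vertica. The case-folding key is only a stand-in assumption for a language-aware collation and is not the ICU algorithm.

```python
# Illustrative only: mimic binary collation by comparing raw UTF-8 bytes.
words = ["apple", "Zebra", "éclair", "zoo"]


def binary_key(s: str) -> bytes:
    """Order strings by their UTF-8 bytes, as a binary collation would."""
    return s.encode("utf-8")


def folded_key(s: str) -> str:
    """Stand-in for a language-aware ordering: ignore case only.

    Assumption for illustration; real ICU collation is more elaborate.
    """
    return s.casefold()


binary_order = sorted(words, key=binary_key)
folded_order = sorted(words, key=folded_key)

print(binary_order)  # ['Zebra', 'apple', 'zoo', 'éclair']
print(folded_order)  # ['apple', 'Zebra', 'zoo', 'éclair']
```

In the byte-wise ordering, uppercase letters sort before lowercase letters and accented characters sort after all ASCII letters, which is rarely what a reader of sorted text expects.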
If you set the locale to null, Vertica sets the locale to en_US_POSIX. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale. For example:
=> set locale to '';
INFO 2567: Canonical locale: 'en_US_POSIX'
Standard collation: 'LEN'
English (United States, Computer)
SET
=> \locale en_US@collation=binary;
INFO 2567: Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)
=> \locale
en_US@collation=binary;
You can set locale through ODBC, JDBC, and ADO.NET.
ICU locale support
Vertica uses the ICU library for locale support; you must specify locale using the ICU locale syntax. The locale used by the database session is not derived from the operating system (through the LANG variable), so Vertica recommends that you set the LANG variable for each node running vsql, as described in the next section.
While ICU library services can specify collation, currency, and calendar preferences, Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation regardless of the session collation. Any locale-specific collation is applied at query time.
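As a rough illustration of the ICU locale syntax, the sketch below splits a locale ID into its base subtags and its keyword/value pairs. The function name parse_locale_id is hypothetical, and this is not how Vertica or ICU parses locales internally; it only shows where a keyword such as collation sits in the string.

```python
def parse_locale_id(locale_id: str) -> dict:
    """Split an ICU-style locale ID such as 'en_US@collation=binary'
    into its base subtags and its keyword/value pairs.

    Hypothetical helper for illustration; not a Vertica or ICU API.
    """
    base, _, keyword_part = locale_id.partition("@")
    subtags = base.split("_") if base else []
    keywords = {}
    for item in keyword_part.split(";"):
        if not item:
            continue  # tolerate empty segments, such as a trailing ';'
        name, _, value = item.partition("=")
        keywords[name.strip()] = value.strip()
    return {
        "language": subtags[0] if subtags else "",
        "rest": subtags[1:],  # script, country, and/or variant subtags
        "keywords": keywords,
    }


default_locale = parse_locale_id("en_US@collation=binary")
thai_locale = parse_locale_id("th_TH")

print(default_locale)
print(thai_locale)
```

A keyword such as currency would show up in the keywords dictionary in the same way; according to the text above, Vertica rejects keywords that do not relate to collation.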
The SET DATESTYLE TO ... command provides some aspects of the calendar, but Vertica supports only dollars as currency.
Changing DB locale for a session
This example sets the session locale to Thai.
- At the operating-system level for each node running vsql, set the LANG variable to the locale language as follows:
  export LANG=th_TH.UTF-8
  Note: If setting LANG= as shown does not work, the operating system support for locales may not be installed.
- For each Vertica session (from ODBC/JDBC or vsql), set the language locale.
  From vsql:
  \locale th_TH
- From ODBC/JDBC:
  "SET LOCALE TO th_TH;"
- In PuTTY (or an ssh terminal), change the settings as follows:
  settings > window > translation > UTF-8
- Click Apply and then click Save.
All data loaded must be in UTF-8 format, not an ISO format, as described in Delimited data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not supported, so functions like SUBSTRING do not work correctly for multibyte characters, and locale settings do not work correctly either. If the data displays correctly under the translation setting ISO-8859-11:2001 (Latin/Thai), the data was loaded incorrectly. To convert data correctly, use a utility program such as Linux iconv.
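The sketch below illustrates, in Python rather than with iconv, why an ISO-encoded Thai string is not interchangeable with its UTF-8 form and what a conversion does. It uses only Python's built-in codecs (iso8859_11 and utf-8); the sample string and variable names are chosen for illustration.

```python
text = "ไทย"  # three Thai characters

utf8_bytes = text.encode("utf-8")         # three bytes per character here
legacy_bytes = text.encode("iso8859_11")  # one byte per character

print(len(text), len(utf8_bytes), len(legacy_bytes))  # 3 9 3

# Convert legacy bytes to UTF-8, the same job iconv does for files.
converted = legacy_bytes.decode("iso8859_11").encode("utf-8")
print(converted == utf8_bytes)  # True

# The legacy bytes are not valid UTF-8, so reading them as UTF-8 fails.
try:
    legacy_bytes.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False
print(legacy_is_valid_utf8)  # False

# Cutting UTF-8 data at a byte offset can split a multibyte character.
first_char = text[:1]          # character-based slice: one whole character
partial = utf8_bytes[:2]       # byte-based slice: incomplete sequence
try:
    partial.decode("utf-8")
    partial_is_valid = True
except UnicodeDecodeError:
    partial_is_valid = False
print(partial_is_valid)  # False
```

The same string occupies a different number of bytes in each encoding, which is why character functions that expect UTF-8 give wrong results on data loaded in an ISO encoding.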