This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
About locale
Vertica locale specifications follow a subset of the Unicode LDML standard as implemented by the ICU library.
Locale specifies the user's language, country, and any special variant preferences, such as collation. Vertica uses locale to determine the behavior of certain string functions. Locale also determines the collation for various SQL commands that require ordering and comparison, such as aggregate GROUP BY
and ORDER BY
clauses, joins, and the analytic ORDER BY
clause.
The default locale for a Vertica database is en_US@collation=binary
(English US). You can define a new default locale that is used for all sessions on the database. You can also override the locale for individual sessions. However, projections are always collated using the default en_US@collation=binary
collation, regardless of the session collation. Any locale-specific collation is applied at query time.
If you set the locale to null, Vertica sets the locale to en_US_POSIX
. You can set the locale back to the default locale and collation by issuing the vsql meta-command \locale
. For example:
Note
=> set locale to '';
INFO 2567: Canonical locale: 'en_US_POSIX'
Standard collation: 'LEN'
English (United States, Computer)
SET
=> \locale en_US@collation=binary;
INFO 2567: Canonical locale: 'en_US'
Standard collation: 'LEN_KBINARY'
English (United States)
=> \locale
en_US@collation-binary;
You can set locale through ODBC, JDBC, and ADO.net.
Vertica locale specifications follow a subset of the Unicode LDML standard as implemented by the ICU library.
1 - Locale handling in Vertica
The following sections describes how Vertica handles locale.
The following sections describes how Vertica handles locale.
Session locale
Locale is session-scoped and applies only to queries executed in that session. You cannot specify locale for individual queries. When you start a session it obtains its locale from the configuration parameter DefaultSessionLocale
.
Query restrictions
The following restrictions apply when queries are run with locale other than the default en_US@collation=binary
:
-
When one or more of the left-side NOT IN
columns is CHAR
or VARCHAR
, multi-column NOT IN
subqueries are not supported . For example:
=> CREATE TABLE test (x VARCHAR(10), y INT);
=> SELECT ... FROM test WHERE (x,y) NOT IN (SELECT ...);
ERROR: Multi-expression NOT IN subquery is not supported because a left
hand expression could be NULL
Note
Even if columns test.x
and test.y
have a NOT NULL constraint, an error occurs.
-
If the outer query contains a GROUP BY
clause on a CHAR
or VARCHAR
column, correlated HAVING
clause subqueries are not supported. In the following example, the GROUP BY x
in the outer query causes the error:
=> DROP TABLE test CASCADE;
=> CREATE TABLE test (x VARCHAR(10));
=> SELECT COUNT(*) FROM test t GROUP BY x HAVING x
IN (SELECT x FROM test WHERE t.x||'a' = test.x||'a' );
ERROR: subquery uses ungrouped column "t.x" from outer query
-
Subqueries that use analytic functions in the HAVING
clause are not supported. For example:
=> DROP TABLE test CASCADE;
=> CREATE TABLE test (x VARCHAR(10));
=> SELECT MAX(x)OVER(PARTITION BY 1 ORDER BY 1) FROM test
GROUP BY x HAVING x IN (SELECT MAX(x) FROM test);
ERROR: Analytics query with having clause expression that involves
aggregates and subquery is not supported
Collation and projections
Projection data is sorted according to the default en_US@collation=binary
collation. Thus, regardless of the session setting, issuing the following command creates a projection sorted by col1
according to the binary collation:
=> CREATE PROJECTION p1 AS SELECT * FROM table1 ORDER BY col1;
In such cases, straße
and strasse
are not stored near each other on disk.
Sorting by binary collation also means that sort optimizations do not work in locales other than binary. Vertica returns the following warning if you create tables or projections in a non-binary locale:
WARNING: Projections are always created and persisted in the default
Vertica locale. The current locale is de_DE
When the locale is non-binary, Vertica uses the
COLLATION
function to transform input to a binary string that sorts in the proper order.
This transformation increases the number of bytes required for the input according to this formula:
result_column_width = input_octet_width * CollationExpansion + 4
The default value of configuration parameter CollationExpansion
is 5.
Character data type handling
-
CHAR
fields are displayed as fixed length, including any trailing spaces. When CHAR
fields are processed internally, they are first stripped of trailing spaces. For VARCHAR
fields, trailing spaces are usually treated as significant characters; however, trailing spaces are ignored when sorting or comparing either type of character string field using a non-binary locale.
-
The maximum length parameter for VARCHAR
and CHAR
data type refers to the number of octets (bytes) that can be stored in that field and not number of characters. When using multi-byte UTF-8 characters, the fields must be sized to accommodate from 1 to 4 bytes per character, depending on the data.
2 - Specifying locale: long form
Vertica supports long forms that specify the collation keyword.
Vertica supports long forms that specify the collation
keyword. Vertica extends long-form processing to accept collation arguments.
Syntax
[language][_script][_country][_variant][@collation-spec]
Note
The following syntax options apply:
-
Locale specification strings are case insensitive. For example, en_us
and EN_US
, are equivalent.
-
You can substitute underscores with hyphens. For example: [-script
]
Parameters
language
- A two- or three-letter lowercase code for a particular language. For example, Spanish is
es
English is en
and French is fr
. The two-letter language code uses the ISO-639 standard.
_
script
- An optional four-letter script code that follows the language code. If specified, it should be a valid script code as listed on the Unicode ISO 15924 Registry.
_
country
- A specific language convention within a generic language for a specific country or region. For example, French is spoken in many countries, but the currencies are different in each country. To allow for these differences among specific geographical, political, or cultural regions, locales are specified by two-letter, uppercase codes. For example,
FR
represents France and CA
represents Canada. The two letter country code uses the ISO-3166 standard.
_
variant
- Differences may also appear in language conventions used within the same country. For example, the Euro currency is used in several European countries while the individual country's currency is still in circulation. To handle variations inside a language and country pair, add a third code, the variant code. The variant code is arbitrary and completely application-specific. ICU adds
_EURO
to its locale designations for locales that support the Euro currency. Variants can have any number of underscored key words. For example, EURO_WIN
is a variant for the Euro currency on a Windows computer.
Another use of the variant code is to designate the Collation (sorting order) of a locale. For instance, the es__TRADITIONAL
locale uses the traditional sorting order which is different from the default modern sorting of Spanish.
@
collation-spec
- Vertica only supports the keyword
collation
, as follows:
@collation=collation-type[;arg]...
Collation can specify one or more semicolon-delimited arguments, described below.
collation-type
is set to one of the following values:
-
big5han
: Pinyin ordering for Latin, big5 charset ordering for CJK characters (used in Chinese).
-
dict
: For a dictionary-style ordering (such as in Sinhala).
-
direct
: Hindi variant.
-
gb2312/gb2312han
: Pinyin ordering for Latin, gb2312han charset ordering for CJK characters (used in Chinese).
-
phonebook
: For a phonebook-style ordering (such as in German).
-
pinyin
: Pinyin ordering for Latin and for CJK characters; that is, an ordering for CJK characters based on a character-by-character transliteration into a pinyin (used in Chinese).
-
reformed
: Reformed collation (such as in Swedish).
-
standard
: The default ordering for each language. For root it is [UCA] order; for each other locale it is the same as UCA (Unicode Collation Algorithm) ordering except for appropriate modifications to certain characters for that language. The following are additional choices for certain locales; they have effect only in certain locales.
-
stroke
: Pinyin ordering for Latin, stroke order for CJK characters (used in Chinese) not supported.
-
traditional
: For a traditional-style ordering (such as in Spanish).
-
unihan
: Pinyin ordering for Latin, Unihan radical-stroke ordering for CJK characters (used in Chinese) not supported.
-
binary
: Vertica default, providing UTF-8 octet ordering.
Notes:
-
Collations might default to root, the ICU default collation.
-
Invalid values of the collation keyword and its synonyms do not cause an error. For example, the following does not generate an error. It simply ignores the invalid value:
=> \locale en_GB@collation=xyz
INFO 2567: Canonical locale: 'en_GB@collation=xyz'
Standard collation: 'LEN'
English (United Kingdom, collation=xyz)
For more about collation options, see Unicode Locale Data Markup Language (LDML).
Collation arguments
collation
can specify one or more of the following arguments :
Parameter |
Short form |
Description |
colstrength |
S |
Sets the default strength for comparison. This feature is locale dependent.
Set colstrength to one of the following:
-
1 | primary : Ignores case and accents. Only primary differences are used during comparison—for example, a versus z .
-
2 | secondary : Ignores case. Only secondary and above differences are considered for comparison—for example, different accented forms of the same base letter such as a versus \u00E4 .
-
3 | tertiary (default): Only tertiary differences and higher are considered for comparison. Tertiary comparisons are typically used to evaluate case differences—for example, Z versus z .
-
4 | quarternary : For example, used with Hiragana.
|
colAlternate |
A |
Sets alternate handling for variable weights, as described in UCA, one of the following:
-
non-ignorable | N | D
-
shifted | S
|
colBackwards |
F |
For Latin with accents, this parameter determines which accents are sorted. It sets the comparison for the second level to be backwards.
Note
colBackwards is automatically set for French accents.
Set colBackwards to one of the following:
-
on | O : The normal UCA algorithm is used.
-
off | X: All strings that are in Fast C or D normalization form (FCD) sort correctly, but others do not necessarily sort correctly. Set to off if the strings to be compared are in FCD.
|
colNormalization |
N |
Set to one of the following:
-
on | O : The normal UCA algorithm is used.
-
off | X : All strings that are in Fast C or D normalization form (FCD) sort correctly, but others won't necessarily sort correctly. It should only be set off if the strings to be compared are in FCD.
|
colCaseLevel |
E |
Set to one of the following:
-
on | O : A level consisting only of case characteristics is inserted in front of tertiary level. To ignore accents but take cases into account, set strength to primary and case level to on.
-
off | X : This level is omitted.
|
colCaseFirst |
C |
Set to one of the following:
-
upper | U : Upper case sorts before lower case.
-
lower | L : Lower case sorts before upper case. This is useful for locales that have already supported ordering but require different order of cases. It affects case and tertiary levels.
-
off | short : Tertiary weights unaffected
|
colHiraganaQuarternary |
H |
Controls special treatment of Hiragana code points on quaternary level, one of the following:
-
on | O : Hiragana codepoints get lower values than all the other non-variable code points. The strength must be greater or equal than quaternary for this attribute to take effect.
-
off | X : Hiragana letters are treated normally.
|
colNumeric |
D |
If set to on, any sequence of Decimal Digits (General_Category = Nd in the [UCD]) is sorted at a primary level with its numeric value. For example, A-21 < A-123 . |
variableTop |
B |
Sets the default value for the variable top. All code points with primary weights less than or equal to the variable top will be considered variable, and are affected by the alternate handling.
For example, the following command sets variableTop to be HYPHEN (u2010 )
=> \locale en_US@colalternate=shifted;variabletop=u2010
|
Locale processing notes
-
Incorrect locale strings are accepted if the prefix can be resolved to a known locale version.
For example, the following works because the language can be resolved:
=> \locale en_XX
INFO 2567: Canonical locale: 'en_XX'
Standard collation: 'LEN'
English (XX)
The following does not work because the language cannot be resolved:
=> \locale xx_XX
xx_XX: invalid locale identifier
-
POSIX-type locales such as en_US.UTF-8
work to some extent in that the encoding part "UTF-8" is ignored.
-
Vertica uses the icu4c-4_2_1 library to support basic locale/collation processing with some extensions. This does not currently meet current standards for locale processing (https://tools.ietf.org/html/rfc5646).
Examples
Specify German locale as used in Germany (de
), with phonebook
-style collation:
=> \locale de_DE@collation=phonebook
INFO 2567: Canonical locale: 'de_DE@collation=phonebook'
Standard collation: 'KPHONEBOOK_LDE'
German (Germany, collation=Phonebook Sort Order)
Deutsch (Deutschland, Sortierung=Telefonbuch-Sortierregeln)
Specify German locale as used in Germany (de
), with phonebook
-style collation and strength set to secondary:
=> \locale de_DE@collation=phonebook;colStrength=secondary
INFO 2567: Canonical locale: 'de_DE@collation=phonebook'
Standard collation: 'KPHONEBOOK_LDE_S2'
German (Germany, collation=Phonebook Sort Order)
Deutsch (Deutschland, Sortierung=Telefonbuch-Sortierregeln)
3 - Specifying locale: short form
Vertica accepts locales in short form.
Vertica accepts locales in short form. You can use the short form to specify the locale and keyname pair/value names.
To determine the short form for a locale, type in the long form and view the last line of INFO, as follows:
\locale frINFO: Locale: 'fr'
INFO: French
INFO: français
INFO: Short form: 'LFR'
Examples
Specify en
(English) locale:
\locale LENINFO: Locale: 'en'
INFO: English
INFO: Short form: 'LEN'
Specify German locale as used in Germany (de
), with phonebook
-style collation:
\locale LDE_KPHONEBOOKINFO: Locale: 'de@collation=phonebook'
INFO: German (collation=Phonebook Sort Order)
INFO: Deutsch (Sortierung=Telefonbuch-Sortierregeln)
INFO: Short form: 'KPHONEBOOK_LDE'
Specify German locale as used in Germany (de
), with phonebook
-style collation:
\locale LDE_KPHONEBOOK_S2INFO: Locale: 'de@collation=phonebook'
INFO: German (collation=Phonebook Sort Order)
INFO: Deutsch (Sortierung=Telefonbuch-Sortierregeln)
INFO: Short form: 'KPHONEBOOK_LDE_S2'
4 - Supported locales
The following are the supported locale strings for Vertica.
The following are the supported locale strings for Vertica. Each locale can optionally have a list of key/value pairs (see Specifying locale: long form).
Locale Name |
Language or Variant |
Region |
af |
Afrikaans |
|
af_NA |
Afrikaans |
Namibian Afrikaans |
af_ZA |
Afrikaans |
South Africa |
am |
Ethiopic |
|
am_ET |
Ethiopic |
Ethiopia |
ar |
Arabic |
|
ar_AE |
Arabic |
United Arab Emirates |
ar_BH |
Arabic |
Bahrain |
ar_DZ |
Arabic |
Algeria |
ar_EG |
Arabic |
Egypt |
ar_IQ |
Arabic |
Iraq |
ar_JO |
Arabic |
Jordan |
ar_KW |
Arabic |
Kuwait |
ar_LB |
Arabic |
Lebanon |
ar_LY |
Arabic |
Libya |
ar_MA |
Arabic |
Morocco |
ar_OM |
Arabic |
Oman |
ar_QA |
Arabic |
Qatar |
ar_SA |
Arabic |
Saudi Arabia |
ar_SD |
Arabic |
Sudan |
ar_SY |
Arabic |
Syria |
ar_TN |
Arabic |
Tunisia |
ar_YE |
Arabic |
Yemen |
as |
Assamese |
|
as_IN |
Assamese |
India |
az |
Azerbaijani |
|
az_Cyrl |
Azerbaijani |
Cyrillic |
az_Cyrl_AZ |
Azerbaijani |
Azerbaijan Cyrillic |
az_Latn |
Azerbaijani |
Latin |
az_Latn_AZ |
Azerbaijani |
Azerbaijan Latin |
be |
Belarusian |
|
be_BY |
Belarusian |
Belarus |
bg |
Bulgarian |
|
bg_BG |
Bulgarian |
Bulgaria |
bn |
Bengali |
|
bn_BD |
Bengali |
Bangladesh |
bn_IN |
Bengali |
India |
bo |
Tibetan |
|
bo_CN |
Tibetan |
PR China |
bo_IN |
Tibetan |
India |
ca |
Catalan |
|
ca_ES |
Catalan |
Spain |
cs |
Czech |
|
cs_CZ |
Czech |
Czech Republic |
cy |
Welsh |
|
cy_GB |
Welsh |
United Kingdom |
da |
Danish |
|
da_DK |
Danish |
Denmark |
de |
German |
|
de_AT |
German |
Austria |
de_BE |
German |
Belgium |
de_CH |
German |
Switzerland |
de_DE |
German |
Germany |
de_LI |
German |
Liechtenstein |
de_LU |
German |
Luxembourg |
el |
Greek |
|
el_CY |
Greek |
Cyprus |
el_GR |
Greek |
Greece |
en |
English |
|
en_AU |
English |
Australia |
en_BE |
English |
Belgium |
en_BW |
English |
Botswana |
en_BZ |
English |
Belize |
en_CA |
English |
Canada |
en_GB |
English |
United Kingdom |
en_HK |
English |
Hong Kong S.A.R. of China |
en_IE |
English |
Ireland |
en_IN |
English |
India |
en_JM |
English |
Jamaica |
en_MH |
English |
Marshall Islands |
en_MT |
English |
Malta |
en_NA |
English |
Namibia |
en_NZ |
English |
New Zealand |
en_PH |
English |
Philippines |
en_PK |
English |
Pakistan |
en_SG |
English |
Singapore |
en_TT |
English |
Trinidad and Tobago |
en_US |
English |
United States |
en_US_POSIX |
English |
United States Posix |
en_VI |
English |
U.S. Virgin Islands |
en_ZA |
English |
Zimbabwe or South Africa |
en_ZW |
English |
Zimbabwe |
eo |
Esperanto |
|
es |
Spanish |
|
es_AR |
Spanish |
Argentina |
es_BO |
Spanish |
Bolivia |
es_CL |
Spanish |
Chile |
es_CO |
Spanish |
Columbia |
es_CR |
Spanish |
Costa Rica |
es_DO |
Spanish |
Dominican Republic |
es_EC |
Spanish |
Ecuador |
es_ES |
Spanish |
Spain |
es_GT |
Spanish |
Guatemala |
es_HN |
Spanish |
Honduras |
es_MX |
Spanish |
Mexico |
es_NI |
Spanish |
Nicaragua |
es_PA |
Spanish |
Panama |
es_PE |
Spanish |
Peru |
es_PR |
Spanish |
Puerto Rico |
es_PY |
Spanish |
Paraguay |
es_SV |
Spanish |
El Salvador |
es_US |
Spanish |
United States |
es_UY |
Spanish |
Uruguay |
es_VE |
Spanish |
Venezuela |
et |
Estonian |
|
et_EE |
Estonian |
Estonia |
eu |
Basque |
Spain |
eu_ES |
Basque |
Spain |
fa |
Persian |
|
fa_AF |
Persian |
Afghanistan |
fa_IR |
Persian |
Iran |
fi |
Finnish |
|
fi_FI |
Finnish |
Finland |
fo |
Faroese |
|
fo_FO |
Faroese |
Faroe Islands |
fr |
French |
|
fr_BE |
French |
Belgium |
fr_CA |
French |
Canada |
fr_CH |
French |
Switzerland |
fr_FR |
French |
France |
fr_LU |
French |
Luxembourg |
fr_MC |
French |
Monaco |
fr_SN |
French |
Senegal |
ga |
Gaelic |
|
ga_IE |
Gaelic |
Ireland |
gl |
Gallegan |
|
gl_ES |
Gallegan |
Spain |
gsw |
German |
|
gsw_CH |
German |
Switzerland |
gu |
Gujurati |
|
gu_IN |
Gujurati |
India |
gv |
Manx |
|
gv_GB |
Manx |
United Kingdom |
ha |
Hausa |
|
ha_Latn |
Hausa |
Latin |
ha_Latn_GH |
Hausa |
Ghana (Latin) |
ha_Latn_NE |
Hausa |
Niger (Latin) |
ha_Latn_NG |
Hausa |
Nigeria (Latin)
|
haw |
Hawaiian |
Hawaiian |
haw_US |
Hawaiian |
United States |
he |
Hebrew |
|
he_IL |
Hebrew |
Israel |
hi |
Hindi |
|
hi_IN |
Hindi |
India |
hr |
Croation |
|
hr_HR |
Croation |
Croatia |
hu |
Hungarian |
|
hu_HU |
Hungarian |
Hungary |
hy |
Armenian |
|
hy_AM |
Armenian |
Armenia |
hy_AM_REVISED |
Armenian |
Revised Armenia |
id |
Indonesian |
|
id_ID |
Indonesian |
Indonesia |
ii |
Sichuan |
|
ii_CN |
Sichuan |
Yi |
is |
Icelandic |
|
is_IS |
Icelandic |
Iceland |
it |
Italian |
|
it_CH |
Italian |
Switzerland |
it_IT |
Italian |
Italy |
ja |
Japanese |
|
ja_JP |
Japanese |
Japan |
ka |
Georgian |
|
ka_GE |
Georgian |
Georgia |
kk |
Kazakh |
|
kk_Cyrl |
Kazakh |
Cyrillic |
kk_Cyrl_KZ |
Kazakh |
Kazakhstan (Cyrillic) |
kl |
Kalaallisut |
|
kl_GL |
Kalaallisut |
Greenland |
km |
Khmer |
|
km_KH |
Khmer |
Cambodia |
kn |
Kannada |
|
kn-IN |
Kannada |
India |
ko |
Korean |
|
ko_KR |
Korean |
Korea |
kok |
Konkani |
|
kok_IN |
Konkani |
India |
kw |
Cornish |
|
kw_GB |
Cornish |
United Kingdom |
lt |
Lithuanian |
|
lt_LT |
Lithuanian |
Lithuania |
lv |
Latvian |
|
lv_LV |
Latvian |
Latvia |
mk |
Macedonian |
|
mk_MK |
Macedonian |
Macedonia |
ml |
Malayalam |
|
ml_IN |
Malayalam |
India |
mr |
Marathi |
|
mr_IN |
Marathi |
India |
ms |
Malay |
|
ms_BN |
Malay |
Brunei |
ms_MY |
Malay |
Malaysia |
mt |
Maltese |
|
mt_MT |
Maltese |
Malta |
nb |
Norwegian Bokml |
|
nb_NO |
Norwegian Bokml |
Norway |
ne |
Nepali |
|
ne_IN |
Nepali |
India |
ne_NP |
Nepali |
Nepal |
nl |
Dutch |
|
nl_BE |
Dutch |
Belgium |
nl_NL |
Dutch |
Netherlands |
nn |
Norwegian nynorsk |
|
nn_NO |
Norwegian nynorsk |
Norway |
om |
Oromo |
|
om_ET |
Oromo |
Ethiopia |
om_KE |
Oromo |
Kenya |
or |
Oriya |
|
or_IN |
Oriya |
India |
pa |
Punjabi |
|
pa_Arab |
Punjabi |
Arabic |
pa_Arab_PK |
Punjabi |
Pakistan (Arabic) |
pa_Guru |
Punjabi |
Gurmukhi |
pa_Guru_IN |
Punjabi |
India (Gurmukhi) |
pl |
Polish |
|
pl_PL |
Polish |
Poland |
ps |
Pashto |
|
ps_AF |
Pashto |
Afghanistan |
pt |
Portuguese |
|
pt_BR |
Portuguese |
Brazil |
pt_PT |
Portuguese |
Portugal |
ro |
Romanian |
|
ro_MD |
Romanian |
Moldavia |
ro_RO |
Romanian |
Romania |
ru |
Russian |
|
ru_RU |
Russian |
Russia |
ru_UA |
Russian |
Ukraine |
si |
Sinhala |
|
si_LK |
Sinhala |
Sri Lanka |
sk |
Slovak |
|
sk_SK |
Slovak |
Slovakia |
sl |
Slovenian |
|
sl_SL |
Slovenian |
Slovenia |
so |
Somali |
|
so_DJ |
Somali |
Djibouti |
so_ET |
Somali |
Ethiopia |
so_KE |
Somali |
Kenya |
so_SO |
Somali |
Somalia |
sq |
Albanian |
|
sq_AL |
Albanian |
Albania |
sr |
Serbian |
|
sr_Cyrl |
Serbian |
Cyrillic |
sr_Cyrl_BA |
Serbian |
Bosnia and Herzegovina (Cyrillic) |
sr_Cyrl_ME |
Serbian |
Montenegro (Cyrillic) |
sr_Cyrl_RS |
Serbian |
Serbia (Cyrillic)
|
sr_Latn |
Serbian |
Latin |
sr_Latn_BA |
Serbian |
Bosnia and Herzegovina (Latin) |
sr_Latn_ME |
Serbian |
Montenegro (Latin) |
sr_Latn_RS |
Serbian |
Serbia (Latin)
|
sv |
Swedish |
|
sv_FI |
Swedish |
Finland |
sv_SE |
Swedish |
Sweden |
sw |
Swahili |
|
sw_KE |
Swahili |
Kenya |
sw_TZ |
Swahili |
Tanzania |
ta |
Tamil |
|
ta_IN |
Tamil |
India |
te |
Telugu |
|
te_IN |
Telugu |
India |
th |
Thai |
|
th_TH |
Thai |
Thailand |
ti |
Tigrinya |
|
ti_ER |
Tigrinya |
Eritrea |
ti_ET |
Tigrinya |
Ethiopia |
tr |
Turkish |
|
tr_TR |
Turkish |
Turkey |
uk |
Ukrainian |
|
uk_UA |
Ukrainian |
Ukraine |
ur |
Urdu |
|
ur_IN |
Urdu |
India |
ur_PK |
Urdu |
Pakistan |
uz |
Uzbek |
|
uz_Arab |
Uzbek |
Arabic |
uz_Arab_AF |
Uzbek |
Afghanistan (Arabic) |
uz_Cryl |
Uzbek |
Cyrillic |
uz_Cryl_UZ |
Uzbek |
Uzbekistan (Cyrillic) |
uz_Latin |
Uzbek |
Latin |
us_Latin_UZ |
|
Uzbekistan (Latin) |
vi |
Vietnamese |
|
vi_VN |
Vietnamese |
Vietnam |
zh |
Chinese |
|
zh_Hans |
Chinese |
Simplified Han |
zh_Hans_CN |
Chinese |
China (Simplified Han) |
zh_Hans_HK |
Chinese |
Hong Kong SAR China (Simplified Han) |
zh_Hans_MO |
Chinese |
Macao SAR China (Simplified Han) |
zh_Hans_SG |
Chinese |
Singapore (Simplified Han) |
zh_Hant |
Chinese |
Traditional Han |
zh_Hant_HK |
Chinese |
Hong Kong SAR China (Traditional Han) |
zh_Hant_MO |
Chinese |
Macao SAR China (Traditional Han) |
zh_Hant_TW |
Chinese |
Taiwan (Traditional Han) |
zu |
Zulu |
|
zu_ZA |
Zulu |
South Africa |
5 - Locale and UTF-8 support
Vertica supports Unicode Transformation Format-8, or UTF8, where 8 equals 8-bit.
Vertica supports Unicode Transformation Format-8, or UTF8, where 8 equals 8-bit. UTF-8 is a variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. UTF-8 can represent any universal character in the Unicode standard. Initial encoding of byte codes and character assignments for UTF-8 coincides with ASCII. Thus, UTF8 requires little or no change for software that handles ASCII but preserves other values.
Vertica database servers expect to receive all data in UTF-8, and Vertica outputs all data in UTF-8. The ODBC API operates on data in UCS-2 on Windows systems, and normally UTF-8 on Linux systems. JDBC and ADO.NET APIs operate on data in UTF-16. Client drivers automatically convert data to and from UTF-8 when sending to and receiving data from Vertica using API calls. The drivers do not transform data loaded by executing a COPY or COPY LOCAL statement.
UTF-8 string functions
The following string functions treat VARCHAR
arguments as UTF-8 strings (when USING OCTETS
is not specified) regardless of locale setting.
String function |
Description |
LOWER |
Returns a VARCHAR value containing the argument converted to lowercase letters. |
UPPER |
Returns a VARCHAR value containing the argument converted to uppercase letters. |
INITCAP |
Capitalizes first letter of each alphanumeric word and puts the rest in lowercase. |
INSTR |
Searches string for substring and returns an integer indicating the position of the character in string that is the first character of this occurrence. |
SPLIT_PART |
Splits string on the delimiter and returns the location of the beginning of the given field (counting from one). |
POSITION |
Returns an integer value representing the character location of a specified substring with a string (counting from one). |
STRPOS |
Returns an integer value representing the character location of a specified substring within a string (counting from one). |
6 - Locale-aware string functions
Vertica provides string functions to support internationalization.
Vertica provides string functions to support internationalization. Unless otherwise specified, these string functions can optionally specify whether VARCHAR
arguments should be interpreted as octet (byte) sequences, or as (locale-aware) sequences of characters. Specify this information by adding the parameter USING OCTETS
and USING CHARACTERS
(default) to the function.
The following table lists all string functions that are locale-aware:
String function |
Description |
BTRIM |
Removes the longest string consisting only of specified characters from the start and end of a string. |
CHARACTER_LENGTH |
Returns an integer value representing the number of characters or octets in a string. |
GREATEST |
Returns the largest value in a list of expressions. |
GREATESTB |
Returns its greatest argument, using binary ordering, not UTF-8 character ordering. |
INITCAP |
Capitalizes first letter of each alphanumeric word and puts the rest in lowercase. |
INSTR |
Searches string for substring and returns an integer indicating the position of the character in string that is the first character of this occurrence. |
LEAST |
Returns the smallest value in a list of expressions. |
LEASTB |
Returns its least argument, using binary ordering, not UTF-8 character ordering. |
LEFT |
Returns the specified characters from the left side of a string. |
LENGTH |
Takes one argument as an input and returns returns an integer value representing the number of characters in a string. |
LTRIM |
Returns a VARCHAR value representing a string with leading blanks removed from the left side (beginning). |
OVERLAY |
Returns a VARCHAR value representing a string having had a substring replaced by another string. |
OVERLAYB |
Returns an octet value representing a string having had a substring replaced by another string. |
REPLACE |
replaces all occurrences of characters in a string with another set of characters. |
RIGHT |
Returns the length right-most characters of string. |
SUBSTR |
Returns a VARCHAR value representing a substring of a specified string. |
SUBSTRB |
Returns a byte value representing a substring of a specified string. |
SUBSTRING |
Given a value, a position, and an optional length, returns a value representing a substring of the specified string at the given position. |
TRANSLATE |
Replaces individual characters in string_to_replace with other characters. |
UPPER |
Returns a VARCHAR value containing the argument converted to uppercase letters. |