Locale and UTF-8 support
Vertica supports Unicode Transformation Format-8 (UTF-8), where 8 refers to the 8-bit code units. UTF-8 is a variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. UTF-8 can represent any character in the Unicode standard. The initial byte codes and character assignments for UTF-8 coincide with ASCII, so software that handles ASCII needs little or no change to handle UTF-8, provided that it passes non-ASCII byte values through unchanged.
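The variable-length and ASCII-compatible properties described above can be checked with a short Python sketch. It uses only Python's built-in codecs and does not involve Vertica; the sample characters and the helper name `utf8_lengths` are chosen purely for illustration.

```python
# Illustration only: uses Python's built-in UTF-8 codec, not Vertica.
# Sample characters: "A", e-acute, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_lengths(chars):
    """Map each character to the number of bytes in its UTF-8 encoding."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}


lengths = utf8_lengths(samples)
for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} encodes to {n} byte(s)")

# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "Vertica"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII and UTF-8 bytes identical:", same_bytes)
```

The byte counts range from one to four, which is why a character count and a byte count can differ for the same string.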
Vertica database servers expect to receive all data in UTF-8, and Vertica outputs all data in UTF-8. The ODBC API operates on data in UCS-2 on Windows systems, and normally UTF-8 on Linux systems. JDBC and ADO.NET APIs operate on data in UTF-16. Client drivers automatically convert data to and from UTF-8 when sending data to and receiving data from Vertica through API calls. The drivers do not transform data loaded by executing a COPY or COPY LOCAL statement, so that data must already be encoded in UTF-8.
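As a rough picture of the kind of conversion a client driver performs, the following Python sketch re-encodes text between UTF-16 and UTF-8, and checks whether raw bytes are valid UTF-8 before a bulk load. This is not driver code: the function names `utf16_to_utf8`, `utf8_to_utf16`, and `is_valid_utf8` are hypothetical, and the little-endian byte order is an assumption made for the example.

```python
# Conceptual sketch, not actual driver code. Function names are hypothetical,
# and UTF-16 little-endian is assumed for the client-side representation.

def utf16_to_utf8(raw_utf16: bytes) -> bytes:
    """Decode UTF-16 (little-endian) bytes and re-encode them as UTF-8."""
    return raw_utf16.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(raw_utf8: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16 (little-endian)."""
    return raw_utf8.decode("utf-8").encode("utf-16-le")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "na\u00efve caf\u00e9"            # 10 characters, 2 of them non-ASCII
client_side = text.encode("utf-16-le")   # what a UTF-16 API would hold
wire = utf16_to_utf8(client_side)        # what would be sent as UTF-8
round_trip = utf8_to_utf16(wire)         # converted back on receipt

print(len(client_side), len(wire), round_trip == client_side)
print(is_valid_utf8(wire), is_valid_utf8(b"caf\xe9"))
```

A check like `is_valid_utf8` is one way to screen a file before loading it with COPY, since the drivers do not transform that data.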
UTF-8 string functions
The following string functions treat VARCHAR arguments as UTF-8 strings (when USING OCTETS is not specified) regardless of locale setting.
String function | Description |
---|---|
LOWER | Returns a VARCHAR value containing the argument converted to lowercase letters. |
UPPER | Returns a VARCHAR value containing the argument converted to uppercase letters. |
INITCAP | Capitalizes the first letter of each alphanumeric word and puts the rest in lowercase. |
INSTR | Searches string for substring and returns an integer indicating the position of the character in string that is the first character of this occurrence. |
SPLIT_PART | Splits string on the delimiter and returns the string at the location of the beginning of the given field (counting from one). |
POSITION | Returns an integer value representing the character location of a specified substring within a string (counting from one). |
STRPOS | Returns an integer value representing the character location of a specified substring within a string (counting from one). |
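
The difference between counting characters and counting octets can be illustrated outside the database. The Python sketch below is only an analogy for the two ways of measuring a position and does not call any Vertica function; the helper names `char_position` and `octet_position` are hypothetical.

```python
# Analogy only: Python string/bytes searches, not Vertica functions.
# Helper names are hypothetical. Positions are one-based; 0 means "not found".

def char_position(string: str, substring: str) -> int:
    """One-based position of substring, counted in characters."""
    return string.find(substring) + 1


def octet_position(string: str, substring: str) -> int:
    """One-based position of substring, counted in UTF-8 bytes (octets)."""
    return string.encode("utf-8").find(substring.encode("utf-8")) + 1


s = "caf\u00e9 au lait"   # the e-acute occupies two bytes in UTF-8

print(char_position(s, "au"))    # counted in characters
print(octet_position(s, "au"))   # counted in octets
```

Here the character position is 6 and the octet position is 7, because the two-byte character before the match shifts every later byte offset by one. For strings that contain only ASCII, the two positions are the same.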