Locale handling in Vertica
The following sections describes how Vertica handles locale.
Session locale
Locale is session-scoped and applies only to queries executed in that session. You cannot specify locale for individual queries. When you start a session it obtains its locale from the configuration parameter DefaultSessionLocale
.
Query restrictions
The following restrictions apply when queries are run with locale other than the default en_US@collation=binary
:
-
When one or more of the left-side
NOT IN
columns isCHAR
orVARCHAR
, multi-columnNOT IN
subqueries are not supported . For example:=> CREATE TABLE test (x VARCHAR(10), y INT); => SELECT ... FROM test WHERE (x,y) NOT IN (SELECT ...); ERROR: Multi-expression NOT IN subquery is not supported because a left hand expression could be NULL
Note
Even if columnstest.x
andtest.y
have a NOT NULL constraint, an error occurs. -
If the outer query contains a
GROUP BY
clause on aCHAR
orVARCHAR
column, correlatedHAVING
clause subqueries are not supported. In the following example, theGROUP BY x
in the outer query causes the error:=> DROP TABLE test CASCADE; => CREATE TABLE test (x VARCHAR(10)); => SELECT COUNT(*) FROM test t GROUP BY x HAVING x IN (SELECT x FROM test WHERE t.x||'a' = test.x||'a' ); ERROR: subquery uses ungrouped column "t.x" from outer query
-
Subqueries that use analytic functions in the
HAVING
clause are not supported. For example:=> DROP TABLE test CASCADE; => CREATE TABLE test (x VARCHAR(10)); => SELECT MAX(x)OVER(PARTITION BY 1 ORDER BY 1) FROM test GROUP BY x HAVING x IN (SELECT MAX(x) FROM test); ERROR: Analytics query with having clause expression that involves aggregates and subquery is not supported
Collation and projections
Projection data is sorted according to the default en_US@collation=binary
collation. Thus, regardless of the session setting, issuing the following command creates a projection sorted by col1
according to the binary collation:
=> CREATE PROJECTION p1 AS SELECT * FROM table1 ORDER BY col1;
In such cases, straße
and strasse
are not stored near each other on disk.
Sorting by binary collation also means that sort optimizations do not work in locales other than binary. Vertica returns the following warning if you create tables or projections in a non-binary locale:
WARNING: Projections are always created and persisted in the default
Vertica locale. The current locale is de_DE
Non-binary locale input handling
When the locale is non-binary, Vertica uses the
COLLATION
function to transform input to a binary string that sorts in the proper order.
This transformation increases the number of bytes required for the input according to this formula:
result_column_width = input_octet_width * CollationExpansion + 4
The default value of configuration parameter CollationExpansion
is 5.
Character data type handling
-
CHAR
fields are displayed as fixed length, including any trailing spaces. WhenCHAR
fields are processed internally, they are first stripped of trailing spaces. ForVARCHAR
fields, trailing spaces are usually treated as significant characters; however, trailing spaces are ignored when sorting or comparing either type of character string field using a non-binary locale. -
The maximum length parameter for
VARCHAR
andCHAR
data type refers to the number of octets (bytes) that can be stored in that field and not number of characters. When using multi-byte UTF-8 characters, the fields must be sized to accommodate from 1 to 4 bytes per character, depending on the data.