Locale handling in Vertica

The following sections describes how Vertica handles locale.

The following sections describes how Vertica handles locale.

Session locale

Locale is session-scoped and applies only to queries executed in that session. You cannot specify locale for individual queries. When you start a session it obtains its locale from the configuration parameter DefaultSessionLocale.

Query restrictions

The following restrictions apply when queries are run with locale other than the default en_US@collation=binary:

  • When one or more of the left-side NOT IN columns is CHAR or VARCHAR, multi-column NOT IN subqueries are not supported . For example:

    => CREATE TABLE test (x VARCHAR(10), y INT);
    => SELECT ... FROM test WHERE (x,y) NOT IN (SELECT ...);
       ERROR: Multi-expression NOT IN subquery is not supported because a left
       hand expression could be NULL
    
  • If the outer query contains a GROUP BY clause on a CHAR or VARCHAR column, correlated HAVING clause subqueries are not supported. In the following example, the GROUP BY x in the outer query causes the error:

    => DROP TABLE test CASCADE;
    => CREATE TABLE test (x VARCHAR(10));
    => SELECT COUNT(*) FROM test t GROUP BY x HAVING x
         IN (SELECT x FROM test WHERE t.x||'a' = test.x||'a' );
       ERROR: subquery uses ungrouped column "t.x" from outer query
    
  • Subqueries that use analytic functions in the HAVING clause are not supported. For example:

    => DROP TABLE test CASCADE;
    => CREATE TABLE test (x VARCHAR(10));
    => SELECT MAX(x)OVER(PARTITION BY 1 ORDER BY 1) FROM test
         GROUP BY x HAVING x IN (SELECT MAX(x) FROM test);
       ERROR: Analytics query with having clause expression that involves
       aggregates and subquery is not supported
    

Collation and projections

Projection data is sorted according to the default en_US@collation=binary collation. Thus, regardless of the session setting, issuing the following command creates a projection sorted by col1 according to the binary collation:

=> CREATE PROJECTION p1 AS SELECT * FROM table1 ORDER BY col1;

In such cases, straße and strasse are not stored near each other on disk.

Sorting by binary collation also means that sort optimizations do not work in locales other than binary. Vertica returns the following warning if you create tables or projections in a non-binary locale:

WARNING:  Projections are always created and persisted in the default
Vertica locale. The current locale is de_DE

Non-binary locale input handling

When the locale is non-binary, Vertica uses the COLLATION function to transform input to a binary string that sorts in the proper order.

This transformation increases the number of bytes required for the input according to this formula:

result_column_width = input_octet_width * CollationExpansion + 4

The default value of configuration parameter CollationExpansion is 5.

Character data type handling

  • CHAR fields are displayed as fixed length, including any trailing spaces. When CHAR fields are processed internally, they are first stripped of trailing spaces. For VARCHAR fields, trailing spaces are usually treated as significant characters; however, trailing spaces are ignored when sorting or comparing either type of character string field using a non-binary locale.

  • The maximum length parameter for VARCHAR and CHAR data type refers to the number of octets (bytes) that can be stored in that field and not number of characters. When using multi-byte UTF-8 characters, the fields must be sized to accommodate from 1 to 4 bytes per character, depending on the data.