This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Regular expression functions
A regular expression lets you perform pattern matching on strings of characters.
A regular expression lets you perform pattern matching on strings of characters. The regular expression syntax allows you to precisely define the pattern used to match strings, giving you much greater control than wildcard matching used in the LIKE predicate. The Vertica regular expression functions let you perform tasks such as determining if a string value matches a pattern, extracting a portion of a string that matches a pattern, or counting the number of times a pattern occurs within a string.
Vertica uses the Perl Compatible Regular Expression (PCRE) library to evaluate regular expressions. As its name implies, PCRE's regular expression syntax is compatible with the syntax used by the Perl 5 programming language. You can read PCRE's documentation about its library. However, if you are unfamiliar with using regular expressions, the Perl Regular Expressions Documentation is a good introduction.
Note
The regular expression functions only operate on valid UTF-8 strings. If you try using a regular expression function on a string that is not valid UTF-8, the query fails with an error. To prevent an error from occurring, use the
ISUTF8 function as an initial clause to ensure the strings you pass to the regular expression functions are valid UTF-8 strings. Alternatively, or you can use the 'b' argument to treat the strings as binary octets, rather than UTF-8 encoded strings.
1 - MATCH_COLUMNS
Specified as an element in a SELECT list, returns all columns in queried tables that match the specified pattern.
Specified as an element in a SELECT list, returns all columns in queried tables that match the specified pattern. For example:
=> SELECT MATCH_COLUMNS ('%order%') FROM store.store_orders_fact LIMIT 3;
order_number | date_ordered | quantity_ordered | total_order_cost | reorder_level
--------------+--------------+------------------+------------------+---------------
191119 | 2003-03-09 | 15 | 4021 | 23
89985 | 2003-05-04 | 19 | 2692 | 23
246962 | 2007-06-01 | 77 | 4419 | 42
(3 rows)
Syntax
MATCH_COLUMNS ('pattern')
Arguments
pattern
- The pattern to match against all column names in the queried tables, where
pattern
typically contains one or both of the following wildcard characters:
The pattern can also include backslash (\
) characters to escape reserved characters that are embedded in column names: _
(underscore), %
(percent sign), and backlash (\
) itself.
Privileges
None
DDL usage
You can use MATCH_COLUMNS to define database objects—for example, specify it in CREATE PROJECTION to identify projection columns, or in CREATE TABLE...AS to identify columns in the new table. In all cases, Vertica expands the MATCH_COLUMNS output before it stores the object DDL. Subsequent changes to the original source table have no effect on the derived object definitions.
Restrictions
In general, MATCH_COLUMNS is specified as an element in a SELECT list. For example, CREATE PROJECTION can call MATCH_COLUMNS to specify the columns to include in a projection. However, attempts to specify columns in the projection's segmentation clause return with an error:
=> CREATE PROJECTION p_store_orders AS SELECT
MATCH_COLUMNS('%product%'),
MATCH_COLUMNS('%store%'),
order_number FROM store.store_orders_fact SEGMENTED BY MATCH_COLUMNS('products%') ALL NODES;
ERROR 0: MATCH_COLUMNS() function can only be specified as an element in a SELECT list
=> CREATE PROJECTION p_store_orders AS SELECT
MATCH_COLUMNS('%product%'),
MATCH_COLUMNS('%store%'),
order_number FROM store.store_orders_fact;
WARNING 4468: Projection <store.p_store_orders_b0> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
WARNING 4468: Projection <store.p_store_orders_b1> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
If you call MATCH_COLUMNS from a function that supports a fixed number of arguments, Vertica returns an error. For example, the UPPER function supports only one argument; so calling MATCH_COLUMNS from UPPER as follows returns an error:
=> SELECT MATCH_COLUMNS('emp%') FROM employee_dimension LIMIT 1;
-[ RECORD 1 ]-----------+---------------------------------
employee_key | 1
employee_gender | Male
employee_first_name | Craig
employee_middle_initial | F
employee_last_name | Robinson
employee_age | 22
employee_street_address | 5 Bakers St
employee_city | Thousand Oaks
employee_state | CA
employee_region | West
=> SELECT UPPER (MATCH_COLUMNS('emp%')) FROM employee_dimension;
ERROR 10465: MATCH_COLUMNS() function can only be specified as an element in a SELECT list
In contrast, the HASH function accepts an unlimited number of arguments, so calling MATCH_COLUMNS as an argument succeeds:
=> select HASH(MATCH_COLUMNS('emp%')) FROM employee_dimension LIMIT 10;
HASH
---------------------
2047284364908178817
1421997332260827278
7981613309330877388
792898558199431621
5275639269069980417
7892790768178152349
184601038712735208
3020263228621856381
7056305566297085916
3328422577712931057
(10 rows)
Other constraints
The following usages of MATCH_COLUMNS are invalid and return with an error:
-
Including MATCH_COLUMNS in the non-recursive (base) term query of a RECURSIVE WITH clause
-
Concatenating the results of MATCH_COLUMNS calls:
=> SELECT MATCH_COLUMNS ('%store%')||MATCH_COLUMNS('%store%') FROM store.store_orders_fact;
ERROR 0: MATCH_COLUMNS() function can only be specified as an element in a SELECT list
-
Setting an alias on MATCH_COLUMNS
Examples
The following CREATE PROJECTION statement uses MATCH_COLUMNS to specify table columns in the new projection:
=> CREATE PROJECTION p_store_orders AS SELECT
MATCH_COLUMNS('%product%'),
MATCH_COLUMNS('%store%'),
order_number FROM store.store_orders_fact;
WARNING 4468: Projection <store.p_store_orders_b0> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
WARNING 4468: Projection <store.p_store_orders_b1> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
=> SELECT export_objects('', 'store.p_store_orders_b0');
...
CREATE PROJECTION store.p_store_orders_b0 /*+basename(p_store_orders)*/
(
product_key,
product_version,
store_key,
order_number
)
AS
SELECT store_orders_fact.product_key,
store_orders_fact.product_version,
store_orders_fact.store_key,
store_orders_fact.order_number
FROM store.store_orders_fact
ORDER BY store_orders_fact.product_key,
store_orders_fact.product_version,
store_orders_fact.store_key,
store_orders_fact.order_number
SEGMENTED BY hash(store_orders_fact.product_key, store_orders_fact.product_version, store_orders_fact.store_key, store_orders_fact.order_number) ALL NODES OFFSET 0;
SELECT MARK_DESIGN_KSAFE(1);
(1 row)
As shown in the EXPORT_OBJECTS output, Vertica stores the result sets of the two MATCH_COLUMNS calls in the new projection's DDL. Later changes in the anchor table DDL have no effect on this projection.
2 - REGEXP_COUNT
Returns the number times a regular expression matches a string.
Returns the number times a regular expression matches a string.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_COUNT ( string-expession, pattern [, position [, regexp-modifier ]... ] )
Parameters
string-expression
The VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
The regular expression to match against string-expression
. The regular expression must conform with Perl regular expression syntax.
position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the *
n
*th character you specify.
Default: 1
regexp-modifier
One or more single-character flags that modify how the regular expression pattern
is matched to string-expression
:
-
b
: Treat strings as binary octets, rather than UTF-8 characters.
-
c
(default): Force the match to be case sensitive.
-
i
: Force the match to be case insensitive.
-
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without the m
modifier, the start and end of line operators match only the start and end of the string.
-
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the .
operator matches any character except a newline.
-
x
: Add comments to regular expressions. The x
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
Examples
Count the number of occurrences of the substring an
in the specified string (a man, a plan, a canal: Panama
):
=> SELECT REGEXP_COUNT('a man, a plan, a canal: Panama', 'an');
REGEXP_COUNT
--------------
4
(1 row)
Find the number of occurrences of the substring an
, starting with the fifth character.
=> SELECT REGEXP_COUNT('a man, a plan, a canal: Panama', 'an',5);
REGEXP_COUNT
--------------
3
(1 row)
Find the number of occurrences of a substring containing a lower-case character followed by an
:
=> SELECT REGEXP_COUNT('a man, a plan, a canal: Panama', '[a-z]an');
REGEXP_COUNT
--------------
3
(1 row
REGEXP_COUNT specifies the i
modifier, so it ignores case:
=> SELECT REGEXP_COUNT('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
REGEXP_COUNT
--------------
4
3 - REGEXP_ILIKE
Returns true if the string contains a match for the regular expression.
Returns true if the string contains a match for the regular expression. REGEXP_ILIKE is similar to the LIKE, except that it uses a case insensitive regular expression, rather than simple wildcard character matching.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_ILIKE ( string-expression, pattern )
Parameters
string-expression
``
- The
VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
``
- The regular expression to match against
string-expression
. The regular expression must conform with Perl regular expression syntax.
Examples
This example creates a table containing several strings to demonstrate regular expressions.
-
Create table longvc
with a single, long varchar column body
, and insert data with distinct characters:
=> CREATE table longvc(body long varchar (1048576));
CREATE TABLE
=> insert into longvc values ('На берегу пустынных волн');
=> insert into longvc values ('Voin syödä lasia, se ei vahingoita minua');
=> insert into longvc values ('私はガラスを食べられます。それは私を傷つけません。');
=> insert into longvc values ('Je peux manger du verre, ça ne me fait pas mal.');
=> insert into longvc values ('zésbaésbaa');
=> insert into longvc values ('Out of the frying pan, he landed immediately in the fire');
=> SELECT * FROM longvc;
body
------------------------------------------------
На берегу пустынных волн
Voin syödä lasia, se ei vahingoita minua
私はガラスを食べられます。それは私を傷つけません。
Je peux manger du verre, ça ne me fait pas mal.
zésbaésbaa
Out of the frying pan, he landed immediately in the fire
(6 rows)
-
Pattern match table rows containing the character ç
:
=> SELECT * FROM longvc where regexp_ilike(body, 'ç');
body
-------------------------------------------------
Je peux manger du verre, ça ne me fait pas mal.
(1 row)
-
Select all rows that contain the characters A
/a
:
=> SELECT * FROM longvc where regexp_ilike(body, 'A');
body
-------------------------------------------------
Je peux manger du verre, ça ne me fait pas mal.
Voin syödä lasia, se ei vahingoita minua
zésbaésbaa
(3 rows)
-
Select all rows that contain the characters O
/o
:
=> SELECT * FROM longvc where regexp_ilike(body, 'O');
body
----------------------------------------------------------
Voin syödä lasia, se ei vahingoita minua
Out of the frying pan, he landed immediately in the fire
(2 rows)
4 - REGEXP_INSTR
Returns the starting or ending position in a string where a regular expression matches.
Returns the starting or ending position in a string where a regular expression matches. REGEXP_INSTR returns 0 if no match for the regular expression is found in the string.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_INSTR ( string-expression, pattern
[, position [, occurrence [, return-position [, regexp-modifier ]... [, captured-subexp ]]]] )
Parameters
string-expression
The VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
The regular expression to match against string-expression
. The regular expression must conform with Perl regular expression syntax.
position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the *
n
*th character you specify.
Default: 1
occurrence
- Controls which occurrence of a pattern match in the string to return. By default, the function returns the position of the first matching substring. Use this parameter to find the position of subsequent matching substrings. For example, setting this parameter to 3 returns the position of the third substring that matches the pattern.
Default: 1
return-position
- Sets the position within the string to return. Using the default position (0), the function returns the string position of the first character of the substring that matches the pattern. If you set
return-position
to 1, the function returns the position of the first character after the end of the matching substring.
Default: 0
regexp-modifier
One or more single-character flags that modify how the regular expression pattern
is matched to string-expression
:
-
b
: Treat strings as binary octets, rather than UTF-8 characters.
-
c
(default): Force the match to be case sensitive.
-
i
: Force the match to be case insensitive.
-
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without the m
modifier, the start and end of line operators match only the start and end of the string.
-
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the .
operator matches any character except a newline.
-
x
: Add comments to regular expressions. The x
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
captured-subexp
- The captured subexpression whose position to return. By default, the function returns the position of the first character in
string
that matches the regular expression. If you set this value from 1 – 9, the function returns the subexpression captured by the corresponding set of parentheses in the regular expression. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression.
Default: 0
Note
The subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression \s*(\w+\s+(\w+))
, subexpression 1 is the one that captures everything but any leading whitespaces.
Examples
Find the first occurrence of a sequence of letters starting with the letter e
and ending with the letter y
in the specified string (easy come, easy go
).
=> SELECT REGEXP_INSTR('easy come, easy go','e\w*y');
REGEXP_INSTR
--------------
1
(1 row)
Starting at the second character (2
), find the first sequence of letters starting with the letter e
and ending with the letter y
:
=> SELECT REGEXP_INSTR('easy come, easy go','e\w*y',2);
REGEXP_INSTR
--------------
12
(1 row)
Starting at the first character (1
), find the second sequence of letters starting with the letter e
and ending with the letter y
:
=> SELECT REGEXP_INSTR('easy come, easy go','e\w*y',1,2);
REGEXP_INSTR
--------------
12
(1 row)
Find the position of the first character after the first whitespace:
=> SELECT REGEXP_INSTR('easy come, easy go','\s',1,1,1);
REGEXP_INSTR
--------------
6
(1 row)
Find the position of the start of the third word in a string by capturing each word as a subexpression, and returning the third subexpression's start position.
=> SELECT REGEXP_INSTR('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
REGEXP_INSTR
--------------
9
(1 row)
5 - REGEXP_LIKE
Returns true if the string matches the regular expression.
Returns true if the string matches the regular expression. REGEXP_LIKE is similar to the LIKE, except that it uses regular expressions rather than simple wildcard character matching.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_LIKE ( string-expression, pattern [, regexp-modifier ]... )
Parameters
string-expression
The VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
The regular expression to match against string-expression
. The regular expression must conform with Perl regular expression syntax.
regexp-modifier
One or more single-character flags that modify how the regular expression pattern
is matched to string-expression
:
-
b
: Treat strings as binary octets, rather than UTF-8 characters.
-
c
(default): Force the match to be case sensitive.
-
i
: Force the match to be case insensitive.
-
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without the m
modifier, the start and end of line operators match only the start and end of the string.
-
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the .
operator matches any character except a newline.
-
x
: Add comments to regular expressions. The x
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
Examples
Create a table that contains several strings:
=> CREATE TABLE t (v VARCHAR);
CREATE TABLE
=> CREATE PROJECTION t1 AS SELECT * FROM t;
CREATE PROJECTION
=> COPY t FROM stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> aaa
>> Aaa
>> abc
>> abc1
>> 123
>> \.
=> SELECT * FROM t;
v
-------
aaa
Aaa
abc
abc1
123
(5 rows)
Select all records from table t
that contain the letter a
:
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'a');
v
------
Aaa
aaa
abc
abc1
(4 rows)
Select all rows from table t
that start with the letter a
:
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'^a');
v
------
aaa
abc
abc1
(3 rows)
Select all rows that contain the substring aa
:
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'aa');
v
-----
Aaa
aaa
(2 rows)
Select all rows that contain a digit.
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'\d');
v
------
123
abc1
(2 rows)
Select all rows that contain the substring aaa
.
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'aaa');
v
-----
aaa
(1 row)
Select all rows that contain the substring aaa
using case-insensitive matching.
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'aaa', 'i');
v
-----
Aaa
aaa
(2 rows)
Select rows that contain the substring a b c
.
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'a b c');
v
---
(0 rows)
Select rows that contain the substring a b c
, ignoring space within the regular expression.
=> SELECT v FROM t WHERE REGEXP_LIKE(v,'a b c','x');
v
------
abc
abc1
(2 rows)
Add multi-line rows to table t
:
=> COPY t FROM stdin RECORD TERMINATOR '!';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Record 1 line 1
>> Record 1 line 2
>> Record 1 line 3!
>> Record 2 line 1
>> Record 2 line 2
>> Record 2 line 3!
>> \.
Select rows from table t
that start with the substring Record
and end with the substring line 2
.
=> SELECT v from t WHERE REGEXP_LIKE(v,'^Record.*line 2$');
v
---
(0 rows)
Select rows that start with the substring Record
and end with the substring line 2
, treating multiple lines as separate strings.
=> SELECT v from t WHERE REGEXP_LIKE(v,'^Record.*line 2$','m');
v
--------------------------------------------------
Record 2 line 1
Record 2 line 2
Record 2 line 3
Record 1 line 1
Record 1 line 2
Record 1 line 3
(2 rows)
6 - REGEXP_NOT_ILIKE
Returns true if the string does not match the case-insensitive regular expression.
Returns true if the string does not match the case-insensitive regular expression.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_NOT_ILIKE ( string-expression, pattern )
Parameters
string-expression
``
- The
VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
``
- The regular expression to match against
string-expression
. The regular expression must conform with Perl regular expression syntax.
Examples
-
Create a table (longvc
) with a single, long varchar column (body
). Then, insert data with some distinct characters, and query the table contents:
=> CREATE table longvc(body long varchar (1048576));
CREATE TABLE
=> insert into longvc values ('На берегу пустынных волн');
=> insert into longvc values ('Voin syödä lasia, se ei vahingoita minua');
=> insert into longvc values ('私はガラスを食べられます。それは私を傷つけません。');
=> insert into longvc values ('Je peux manger du verre, ça ne me fait pas mal.');
=> insert into longvc values ('zésbaésbaa');
=> SELECT * FROM longvc;
body
------------------------------------------------
На берегу пустынных волн
Voin syödä lasia, se ei vahingoita minua
私はガラスを食べられます。それは私を傷つけません。
Je peux manger du verre, ça ne me fait pas mal.
zésbaésbaa
(5 rows)
-
Find all rows that do not contain the character ç
:
=> SELECT * FROM longvc where regexp_not_ilike(body, 'ç');
body
----------------------------------------------------
Voin syödä lasia, se ei vahingoita minua
zésbaésbaa
На берегу пустынных волн
私はガラスを食べられます。それは私を傷つけません。
(4 rows)
-
Find all rows that do not contain the substring a
:
=> SELECT * FROM longvc where regexp_not_ilike(body, 'a');
body
----------------------------------------------------
На берегу пустынных волн
私はガラスを食べられます。それは私を傷つけません。
(2 rows)
7 - REGEXP_NOT_LIKE
Returns true if the string does not contain a match for the regular expression.
Returns true if the string does not contain a match for the regular expression. REGEXP_NOT_LIKE is a case sensitive regular expression.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_NOT_LIKE ( string-expression, pattern )
Parameters
string-expression
``
- The
VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
``
- The regular expression to match against
string-expression
. The regular expression must conform with Perl regular expression syntax.
Examples
-
Create a table (longvc
) with the LONG VARCHAR column body
. Then, insert data with some distinct characters and query the table contents:
=> CREATE table longvc(body long varchar (1048576));
CREATE TABLE
=> insert into longvc values ('На берегу пустынных волн');
=> insert into longvc values ('Voin syödä lasia, se ei vahingoita minua');
=> insert into longvc values ('私はガラスを食べられます。それは私を傷つけません。');
=> insert into longvc values ('Je peux manger du verre, ça ne me fait pas mal.');
=> insert into longvc values ('zésbaésbaa');
=> SELECT * FROM longvc;
body
------------------------------------------------
На берегу пустынных волн
Voin syödä lasia, se ei vahingoita minua
私はガラスを食べられます。それは私を傷つけません。
Je peux manger du verre, ça ne me fait pas mal.
zésbaésbaa
(5 rows)
-
Use REGEXP_NOT_LIKE
to return rows that do not contain the character ç
:
=> SELECT * FROM longvc where regexp_not_like(body, 'ç');
body
----------------------------------------------------
Voin syödä lasia, se ei vahingoita minua
zésbaésbaa
На берегу пустынных волн
私はガラスを食べられます。それは私を傷つけません。
(4 rows)
-
Return all rows that do not contain the characters *ö
and *ä
:
=> SELECT * FROM longvc where regexp_not_like(body, '.*ö.*ä');
body
----------------------------------------------------
Je peux manger du verre, ça ne me fait pas mal.
zésbaésbaa
На берегу пустынных волн
私はガラスを食べられます。それは私を傷つけません。
(4 rows)
-
Pattern match all rows that do not contain the characters z
and *ésbaa
:
=> SELECT * FROM longvc where regexp_not_like(body, 'z.*ésbaa');
body
----------------------------------------------------
Je peux manger du verre, ça ne me fait pas mal.
Voin syödä lasia, se ei vahingoita minua
zésbaésbaa
На берегу пустынных волн
私はガラスを食べられます。それは私を傷つけません。
(5 rows)
8 - REGEXP_REPLACE
Replaces all occurrences of a substring that match a regular expression with another substring.
Replaces all occurrences of a substring that match a regular expression with another substring. REGEXP_REPLACE is similar to the REPLACE function, except it uses a regular expression to select the substring to be replaced.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_REPLACE ( string-expression, target
[, replacement [, position [, occurrence[...] [, regexp-modifier]]]] )
Parameters
string-expression
The VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
The regular expression to match against string-expression
. The regular expression must conform with Perl regular expression syntax.
replacement
- The string to replace matched substrings. If you do not supply a
replacement
, the function deletes matched substrings. The replacement string can contain backreferences for substrings captured by the regular expression. The first captured substring is inserted into the replacement string using \1
, the second \2
, and so on.
position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the
n
-th character you specify.
Default: 1
occurrence
- Controls which occurrence of a pattern match in the string to replace. By default, the function replaces all matching substrings. For example, setting this parameter to 3 replaces the third matching instance.
Default: 1
regexp-modifier
One or more single-character flags that modify how the regular expression pattern
is matched to string-expression
:
-
b
: Treat strings as binary octets, rather than UTF-8 characters.
-
c
(default): Force the match to be case sensitive.
-
i
: Force the match to be case insensitive.
-
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without the m
modifier, the start and end of line operators match only the start and end of the string.
-
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the .
operator matches any character except a newline.
-
x
: Add comments to regular expressions. The x
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
How Oracle handles subexpressions
Unlike Oracle, Vertica can handle an unlimited number of captured subexpressions, while Oracle is limited to nine.
In Vertica, you can use \10
in the replacement pattern to access the substring captured by the tenth set of parentheses in the regular expression. In Oracle, \10
is treated as the substring captured by the first set of parentheses, followed by a zero. To force this Oracle behavior in Vertica, use the \g
back reference and enclose the number of the captured subexpression in curly braces. For example, \g{1}0
is the substring captured by the first set of parentheses followed by a zero.
You can also name captured subexpressions to make your regular expressions less ambiguous. See the PCRE documentation for details.
Examples
Find groups of word characters—letters, numbers and underscore—that end with thy
in the string healthy, wealthy, and wise
, and replace them with nothing.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy');
REGEXP_REPLACE
----------------
, , and wise
(1 row)
Find groups of word characters ending with thy
and replace with the string something
.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something');
REGEXP_REPLACE
--------------------------------
something, something, and wise
(1 row)
Find groups of word characters ending with thy
and replace with the string something
starting at the third character in the string.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something', 3);
REGEXP_REPLACE
----------------------------------
hesomething, something, and wise
(1 row)
Replace the second group of word characters ending with thy
with the string something
.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
REGEXP_REPLACE
------------------------------
healthy, something, and wise
(1 row)
Find groups of word characters ending with thy
capturing the letters before the thy
, and replace with the captured letters plus the letters ish
.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','(\w+)thy', '\1ish');
REGEXP_REPLACE
----------------------------
healish, wealish, and wise
(1 row)
Create a table to demonstrate replacing strings in a query.
=> CREATE TABLE customers (name varchar(50), phone varchar(11));
CREATE TABLE
=> CREATE PROJECTION customers1 AS SELECT * FROM customers;
CREATE PROJECTION
=> COPY customers FROM stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Able, Adam|17815551234
>> Baker,Bob|18005551111
>> Chu,Cindy|16175559876
>> Dodd,Dinara|15083452121
>> \.
Query the customers, using REGEXP_REPLACE to format phone numbers.
=> SELECT name, REGEXP_REPLACE(phone, '(\d)(\d{3})(\d{3})(\d{4})',
'\1-(\2) \3-\4') as phone FROM customers;
name | phone
-------------+------------------
Able, Adam | 1-(781) 555-1234
Baker,Bob | 1-(800) 555-1111
Chu,Cindy | 1-(617) 555-9876
Dodd,Dinara | 1-(508) 345-2121
(4 rows)
9 - REGEXP_SUBSTR
Returns the substring that matches a regular expression within a string.
Returns the substring that matches a regular expression within a string. If no matches are found, REGEXP_SUBSTR returns NULL. This is different from an empty string, which the function can return if the regular expression matches a zero-length string.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
Syntax
REGEXP_SUBSTR ( string-expression, pattern
[, position [, occurrence [, regexp-modifier [, captured-subexp ]]... ]] )
Parameters
string-expression
The VARCHAR
or LONG VARCHAR
expression to evaluate for matches with the regular expression specified in pattern
. If string-expression
is in the __raw__
column of a flex or columnar table, cast the string to a LONG VARCHAR
before searching for pattern
.
pattern
The regular expression to match against string-expression
. The regular expression must conform with Perl regular expression syntax.
position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the
n
-th character you specify.
Default: 1
occurrence
- Controls which occurrence of a pattern match in the string to return. By default, the function returns the first matching substring. For example, setting this parameter to 3 returns the third matching instance.
Default: 1
regexp-modifier
One or more single-character flags that modify how the regular expression pattern
is matched to string-expression
:
-
b
: Treat strings as binary octets, rather than UTF-8 characters.
-
c
(default): Force the match to be case sensitive.
-
i
: Force the match to be case insensitive.
-
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without the m
modifier, the start and end of line operators match only the start and end of the string.
-
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the .
operator matches any character except a newline.
-
x
: Add comments to regular expressions. The x
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
captured-subexp
- The group to return. By default, the function returns all matching groups. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression.
Default: 0
Note
The subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression \s*(\w+\s+(\w+))
, subexpression 1 is the one that captures everything but any leading whitespaces.
Examples
Select the first substring of letters that end with thy
.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy');
REGEXP_SUBSTR
---------------
healthy
(1 row)
Select the first substring of letters that ends with thy
starting at the second character in the string.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',2);
REGEXP_SUBSTR
---------------
ealthy
(1 row)
Select the second substring of letters that ends with thy
.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',1,2);
REGEXP_SUBSTR
---------------
wealthy
(1 row)
Return the contents of the third captured subexpression, which captures the third word in the string.
=> SELECT REGEXP_SUBSTR('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
REGEXP_SUBSTR
---------------
three
(1 row)