REGEXP_REPLACE
Replaces all occurrences of a substring that match a regular expression with another substring. REGEXP_REPLACE is similar to the REPLACE function, except it uses a regular expression to select the substring to be replaced.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.Syntax
REGEXP_REPLACE ( string-expression, target
[, replacement [, position [, occurrence[...] [, regexp-modifier]]]] )
Parameters
string-expression
The
VARCHAR
orLONG VARCHAR
expression to evaluate for matches with the regular expression specified inpattern
. Ifstring-expression
is in the__raw__
column of a flex or columnar table, cast the string to aLONG VARCHAR
before searching forpattern
.pattern
The regular expression to match against
string-expression
. The regular expression must conform with Perl regular expression syntax.replacement
- The string to replace matched substrings. If you do not supply a
replacement
, the function deletes matched substrings. The replacement string can contain backreferences for substrings captured by the regular expression. The first captured substring is inserted into the replacement string using\1
, the second\2
, and so on. position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the
n
-th character you specify.Default: 1
occurrence
- Controls which occurrence of a pattern match in the string to replace. By default, the function replaces all matching substrings. For example, setting this parameter to 3 replaces the third matching instance.
Default: 1
regexp-modifier
One or more single-character flags that modify how the regular expression
pattern
is matched tostring-expression
:-
b
: Treat strings as binary octets, rather than UTF-8 characters. -
c
(default): Force the match to be case sensitive. -
i
: Force the match to be case insensitive. -
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without them
modifier, the start and end of line operators match only the start and end of the string. -
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the.
operator matches any character except a newline. -
x
: Add comments to regular expressions. Thex
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
-
How Oracle handles subexpressions
Unlike Oracle, Vertica can handle an unlimited number of captured subexpressions, while Oracle is limited to nine.
In Vertica, you can use \10
in the replacement pattern to access the substring captured by the tenth set of parentheses in the regular expression. In Oracle, \10
is treated as the substring captured by the first set of parentheses, followed by a zero. To force this Oracle behavior in Vertica, use the \g
back reference and enclose the number of the captured subexpression in curly braces. For example, \g{1}0
is the substring captured by the first set of parentheses followed by a zero.
You can also name captured subexpressions to make your regular expressions less ambiguous. See the PCRE documentation for details.
Examples
Find groups of word characters—letters, numbers and underscore—that end with thy
in the string healthy, wealthy, and wise
, and replace them with nothing.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy');
REGEXP_REPLACE
----------------
, , and wise
(1 row)
Find groups of word characters ending with thy
and replace with the string something
.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something');
REGEXP_REPLACE
--------------------------------
something, something, and wise
(1 row)
Find groups of word characters ending with thy
and replace with the string something
starting at the third character in the string.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something', 3);
REGEXP_REPLACE
----------------------------------
hesomething, something, and wise
(1 row)
Replace the second group of word characters ending with thy
with the string something
.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
REGEXP_REPLACE
------------------------------
healthy, something, and wise
(1 row)
Find groups of word characters ending with thy
capturing the letters before the thy
, and replace with the captured letters plus the letters ish
.
=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','(\w+)thy', '\1ish');
REGEXP_REPLACE
----------------------------
healish, wealish, and wise
(1 row)
Create a table to demonstrate replacing strings in a query.
=> CREATE TABLE customers (name varchar(50), phone varchar(11));
CREATE TABLE
=> CREATE PROJECTION customers1 AS SELECT * FROM customers;
CREATE PROJECTION
=> COPY customers FROM stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Able, Adam|17815551234
>> Baker,Bob|18005551111
>> Chu,Cindy|16175559876
>> Dodd,Dinara|15083452121
>> \.
Query the customers, using REGEXP_REPLACE to format phone numbers.
=> SELECT name, REGEXP_REPLACE(phone, '(\d)(\d{3})(\d{3})(\d{4})',
'\1-(\2) \3-\4') as phone FROM customers;
name | phone
-------------+------------------
Able, Adam | 1-(781) 555-1234
Baker,Bob | 1-(800) 555-1111
Chu,Cindy | 1-(617) 555-9876
Dodd,Dinara | 1-(508) 345-2121
(4 rows)