REGEXP_SUBSTR
Returns the substring that matches a regular expression within a string. If no matches are found, REGEXP_SUBSTR returns NULL. This is different from an empty string, which the function can return if the regular expression matches a zero-length string.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.Syntax
REGEXP_SUBSTR ( string-expression, pattern
[, position [, occurrence [, regexp-modifier [, captured-subexp ]]... ]] )
Parameters
string-expression
The
VARCHAR
orLONG VARCHAR
expression to evaluate for matches with the regular expression specified inpattern
. Ifstring-expression
is in the__raw__
column of a flex or columnar table, cast the string to aLONG VARCHAR
before searching forpattern
.pattern
The regular expression to match against
string-expression
. The regular expression must conform with Perl regular expression syntax.position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the
n
-th character you specify.Default: 1
occurrence
- Controls which occurrence of a pattern match in the string to return. By default, the function returns the first matching substring. For example, setting this parameter to 3 returns the third matching instance.
Default: 1
regexp-modifier
One or more single-character flags that modify how the regular expression
pattern
is matched tostring-expression
:-
b
: Treat strings as binary octets, rather than UTF-8 characters. -
c
(default): Force the match to be case sensitive. -
i
: Force the match to be case insensitive. -
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without them
modifier, the start and end of line operators match only the start and end of the string. -
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the.
operator matches any character except a newline. -
x
: Add comments to regular expressions. Thex
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
-
captured-subexp
- The group to return. By default, the function returns all matching groups. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression.
Default: 0
Note
The subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression\s*(\w+\s+(\w+))
, subexpression 1 is the one that captures everything but any leading whitespaces.
Examples
Select the first substring of letters that end with thy
.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy');
REGEXP_SUBSTR
---------------
healthy
(1 row)
Select the first substring of letters that ends with thy
starting at the second character in the string.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',2);
REGEXP_SUBSTR
---------------
ealthy
(1 row)
Select the second substring of letters that ends with thy
.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',1,2);
REGEXP_SUBSTR
---------------
wealthy
(1 row)
Return the contents of the third captured subexpression, which captures the third word in the string.
=> SELECT REGEXP_SUBSTR('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
REGEXP_SUBSTR
---------------
three
(1 row)