Returns the substring that matches a regular expression within a string. If no matches are found, REGEXP_SUBSTR returns NULL. This is different from an empty string, which the function can return if the regular expression matches a zero-length string.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
ImportantIf you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.
REGEXP_SUBSTR ( string-expression, pattern [, position [, occurrence [, regexp-modifier [, captured-subexp ]]... ]] )
LONG VARCHARexpression to evaluate for matches with the regular expression specified in
string-expressionis in the
__raw__column of a flex or columnar table, cast the string to a
LONG VARCHARbefore searching for
The regular expression to match against
string-expression. The regular expression must conform with Perl regular expression syntax.
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the
n-th character you specify.
- Controls which occurrence of a pattern match in the string to return. By default, the function returns the first matching substring. For example, setting this parameter to 3 returns the third matching instance.
One or more single-character flags that modify how the regular expression
patternis matched to
b: Treat strings as binary octets, rather than UTF-8 characters.
c(default): Force the match to be case sensitive.
i: Force the match to be case insensitive.
m: Treat the string to match as multiple lines. Using this modifier, the start of line (
^) and end of line (
$)regular expression operators match line breaks (
\n) within the string. Without the
mmodifier, the start and end of line operators match only the start and end of the string.
n: Match the regular expression operator (
.) to a newline (
\n). By default, the
.operator matches any character except a newline.
x: Add comments to regular expressions. The
xmodifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (
#) and end with a newline (
\n). All spaces in the regular expression to be matched in strings must be escaped with a backslash (
- The group to return. By default, the function returns all matching groups. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression.
NoteThe subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression
\s*(\w+\s+(\w+)), subexpression 1 is the one that captures everything but any leading whitespaces.
Select the first substring of letters that end with
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy'); REGEXP_SUBSTR --------------- healthy (1 row)
Select the first substring of letters that ends with
thy starting at the second character in the string.
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',2); REGEXP_SUBSTR --------------- ealthy (1 row)
Select the second substring of letters that ends with
=> SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',1,2); REGEXP_SUBSTR --------------- wealthy (1 row)
Return the contents of the third captured subexpression, which captures the third word in the string.
=> SELECT REGEXP_SUBSTR('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3); REGEXP_SUBSTR --------------- three (1 row)