REGEXP_INSTR
Returns the starting or ending position in a string where a regular expression matches. REGEXP_INSTR returns 0 if no match for the regular expression is found in the string.
This function operates on UTF-8 strings using the default locale, even if the locale is set otherwise.
Important
If you port a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.Syntax
REGEXP_INSTR ( string-expression, pattern
[, position [, occurrence [, return-position [, regexp-modifier ]... [, captured-subexp ]]]] )
Parameters
string-expression
The
VARCHAR
orLONG VARCHAR
expression to evaluate for matches with the regular expression specified inpattern
. Ifstring-expression
is in the__raw__
column of a flex or columnar table, cast the string to aLONG VARCHAR
before searching forpattern
.pattern
The regular expression to match against
string-expression
. The regular expression must conform with Perl regular expression syntax.position
- The number of characters from the start of the string where the function should start searching for matches. By default, the function begins searching for a match at the first (leftmost) character. Setting this parameter to a value greater than 1 begins searching for a match at the *
n
*th character you specify.Default: 1
occurrence
- Controls which occurrence of a pattern match in the string to return. By default, the function returns the position of the first matching substring. Use this parameter to find the position of subsequent matching substrings. For example, setting this parameter to 3 returns the position of the third substring that matches the pattern.
Default: 1
return-position
- Sets the position within the string to return. Using the default position (0), the function returns the string position of the first character of the substring that matches the pattern. If you set
return-position
to 1, the function returns the position of the first character after the end of the matching substring.Default: 0
regexp-modifier
One or more single-character flags that modify how the regular expression
pattern
is matched tostring-expression
:-
b
: Treat strings as binary octets, rather than UTF-8 characters. -
c
(default): Force the match to be case sensitive. -
i
: Force the match to be case insensitive. -
m
: Treat the string to match as multiple lines. Using this modifier, the start of line (^
) and end of line ($)
regular expression operators match line breaks (\n
) within the string. Without them
modifier, the start and end of line operators match only the start and end of the string. -
n
: Match the regular expression operator (.
) to a newline (\n
). By default, the.
operator matches any character except a newline. -
x
: Add comments to regular expressions. Thex
modifier causes the function to ignore all un-escaped space characters and comments in the regular expression. Comments start with hash (#
) and end with a newline (\n
). All spaces in the regular expression to be matched in strings must be escaped with a backslash (\
).
-
captured-subexp
- The captured subexpression whose position to return. By default, the function returns the position of the first character in
string
that matches the regular expression. If you set this value from 1 – 9, the function returns the subexpression captured by the corresponding set of parentheses in the regular expression. For example, setting this value to 3 returns the substring captured by the third set of parentheses in the regular expression.Default: 0
Note
The subexpressions are numbered left to right, based on the appearance of opening parenthesis, so nested regular expressions . For example, in the regular expression\s*(\w+\s+(\w+))
, subexpression 1 is the one that captures everything but any leading whitespaces.
Examples
Find the first occurrence of a sequence of letters starting with the letter e
and ending with the letter y
in the specified string (easy come, easy go
).
=> SELECT REGEXP_INSTR('easy come, easy go','e\w*y');
REGEXP_INSTR
--------------
1
(1 row)
Starting at the second character (2
), find the first sequence of letters starting with the letter e
and ending with the letter y
:
=> SELECT REGEXP_INSTR('easy come, easy go','e\w*y',2);
REGEXP_INSTR
--------------
12
(1 row)
Starting at the first character (1
), find the second sequence of letters starting with the letter e
and ending with the letter y
:
=> SELECT REGEXP_INSTR('easy come, easy go','e\w*y',1,2);
REGEXP_INSTR
--------------
12
(1 row)
Find the position of the first character after the first whitespace:
=> SELECT REGEXP_INSTR('easy come, easy go','\s',1,1,1);
REGEXP_INSTR
--------------
6
(1 row)
Find the position of the start of the third word in a string by capturing each word as a subexpression, and returning the third subexpression's start position.
=> SELECT REGEXP_INSTR('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
REGEXP_INSTR
--------------
9
(1 row)