SOUNDEX
Takes a VARCHAR argument and returns a four-character code that enables comparison of that argument with other SOUNDEX-encoded strings that are spelled differently in English, but are phonetically similar. SOUNDEX implements an algorithm that was developed by Robert C. Russell and Margaret King Odell, and is described in The Art of Computer Programming, Vol. 3.
Behavior type
ImmutableSyntax
SOUNDEX ( string-expression )
Arguments
string-expression
- The VARCHAR expression to encode.
Soundex encoding algorithm
Vertica uses the following Soundex encoding algorithm, which complies with most SQL implementations:
-
Save the first letter. Map all occurrences of a, e, i, o, u, y, h, w to zero (0).
-
Replace all consonants (include the first letter) with digits:
-
b, f, p, v → 1
-
c, g, j, k, q, s, x, z → 2
-
d, t → 3
-
l → 4
-
m, n → 5
-
r → 6
-
-
Replace all adjacent same digits with one digit, and then remove all zero (0) digits
-
If the saved letter's digit is the same as the resulting first digit, remove the digit (keep the letter).
-
Append 3 zeros if result contains less than 3 digits. Remove all except first letter and 3 digits after it.
Note
Encoding ignores all non-alphabetic characters—for example, the apostrophe in O'Connor.Examples
Find last names in the employee_dimension
table that are phonetically similar to Lee
:
SELECT employee_last_name, employee_first_name, employee_state
FROM public.employee_dimension
WHERE SOUNDEX(employee_last_name) = SOUNDEX('Lee')
ORDER BY employee_state, employee_last_name, employee_first_name;
Lea | James | AZ
Li | Sam | AZ
Lee | Darlene | CA
Lee | Juanita | CA
Li | Amy | CA
Li | Barbara | CA
Li | Ben | CA
...