Vertica stemmers

Vertica stemmers use the Porter stemming algorithm to find words derived from the same base/root word.

Vertica stemmers use the Porter stemming algorithm to find words derived from the same base/root word. For example, if you perform a search on a text index for the keyword database, you might also want to get results containing the word databases.

To achieve this type of matching, Vertica stores words in their stemmed form when using any of the v_txtindex stemmers.

The Vertica Analytics Platform provides the following stemmers:

Name Description
v_txtindex.Stemmer(long varchar)

Not sensitive to case; outputs lowercase words. Stems strings from a Vertica table.

Alias of StemmerCaseInsensitive.

v_txtindex.StemmerCaseSensitive(long varchar) Sensitive to case. Stems strings from a Vertica table.
v_txtindex.StemmerCaseInsensitive(long varchar)

Default stemmer used if no stemmer is specified when creating a text index.

Not sensitive to case; outputs lowercase words. Stems strings from a Vertica table.

v_txtindex.caseInsensitiveNoStemming(long varchar) Not sensitive to case; outputs lowercase words. Does not use the Porter Stemming algorithm.

Examples

The following examples show how to use a stemmer when creating a text index.

Create a text index using the StemmerCaseInsensitive stemmer:

=> CREATE TEXT INDEX idx_100 ON top_100 (id, feedback) STEMMER v_txtindex.StemmerCaseInsensitive(long varchar)
                                                              TOKENIZER v_txtindex.StringTokenizer(long varchar);

Create a text index using the StemmerCaseSensitive stemmer:

=> CREATE TEXT INDEX idx_unstruc ON unstruc_data (__identity__, __raw__) STEMMER v_txtindex.StemmerCaseSensitive(long varchar)
                                                                                  TOKENIZER public.FlexTokenizer(long varbinary);

Create a text index without using a stemmer:

=> CREATE TEXT INDEX idx_logs FROM sys_logs ON (id, message) STEMMER NONE TOKENIZER v_txtindex.StringTokenizer(long varchar);