Searching a text index

After you create a text index, write a query to run against the index to search for a specific keyword.

After you create a text index, write a query to run against the index to search for a specific keyword.

In the following example, you use a WHERE clause to search for the keyword <WARNING> in the text index. The WHERE clause should use the stemmer you used to create the text index. When you use the STEMMER keyword, it stems the keyword to match the keywords in your text index. If you did not use the STEMMER keyword, then the default stemmer is v_txtindex.StemmerCaseInsensitive. If you used STEMMER NONE, then you can omit STEMMER keyword from the WHERE clause.

=> SELECT * FROM text_index WHERE token = v_txtindex.StemmerCaseInsensitive('<WARNING>');
  token    | doc_id
-----------+--------
<warning>  |     2
<warning>  |     3
<warning>  |     4
<warning>  |     5
(4 rows)

Next, write a query to display the full contents of the source table that match the keyword you searched for in the text index.

=> SELECT * FROM t_log WHERE id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseInsensitive('<WARNING>'));
id |    date    |                                     text
---+------------+-----------------------------------------------------------------------------------------------
4  | 2014-06-04 | 11:00:49.568 unknown:0x7f9207607700 [Catalog] <WARNING> validateDependencies: Object 45035968
5  | 2014-06-04 | 11:00:49.568 unknown:0x7f9207607700 [Catalog] <WARNING> validateDependencies: Object 45030
2  | 2013-06-04 | 11:00:49.568 unknown:0x7f9207607700 [Catalog] <WARNING> validateDependencies: Object 4503
3  | 2013-06-04 | 11:00:49.568 unknown:0x7f9207607700 [Catalog] <WARNING> validateDependencies: Object 45066
(4 rows)

Use the doc_id to find the exact location of the keyword in the source table.The doc_id matches the unique identifier from the source table. This matching allows you to quickly find the instance of the keyword in your table.

Performing a case-sensitive and case-insensitive text search query

Your text index is optimized to match all instances of words depending upon your stemmer. By default, the case insensitive stemmer is applied to all text indices that do not specify a stemmer. Therefore, if the queries you plan to write against your text index are case sensitive, then Vertica recommends you use a case sensitive stemmer to build your text index.

The following examples show queries that match case-sensitive and case-insensitive words that you can use when performing a text search.

This query finds case-insensitive records in a case insensitive text index:

=> SELECT * FROM t_log WHERE id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseInsensitive('warning'));

This query finds case-sensitive records in a case sensitive text index:

=> SELECT * FROM t_log_case_sensitive WHERE id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseSensitive('Warning'));

Including and excluding keywords in a text search query

Your text index also allows you to perform more detailed queries to find multiple keywords or omit results with other keywords. The following example shows a more detailed query that you can use when performing a text search.

In this example, t_log is the source table, and text_index is the text index. The query finds records that either contain:

  • Both the words '<WARNING>' and 'validate'

  • Only the word '[Log]' and does not contain 'validateDependencies'

SELECT * FROM t_log where (
   id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseSensitive('<WARNING>'))
      AND (   id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseSensitive('validate')
      OR id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseSensitive('[Log]')))
      AND NOT (id IN (SELECT doc_id FROM text_index WHERE token = v_txtindex.StemmerCaseSensitive('validateDependencies'))));

This query returns the following results:

id  |   date     |                               text
----+------------+------------------------------------------------------------------------------------------------
11  | 2014-05-04 | 11:00:49.568 unknown:0x7f9207607702 [Log] <WARNING> validate: Object 4503 via fld num_all_roles
13  | 2014-05-04 | 11:00:49.568 unknown:0x7f9207607706 [Log] <WARNING> validate: Object 45035 refers to root_i3
14  | 2014-05-04 | 11:00:49.568 unknown:0x7f9207607708 [Log] <WARNING> validate: Object 4503 refers to int_2
17  | 2014-05-04 | 11:00:49.568 unknown:0x7f9207607700 [Txn] <WARNING> Begin validate Txn: fff0ed17 catalog editor
(4 rows)