Whitespace log tokenizer

Returns only tokens surrounded by whitespace.

Deprecated

This tokenizer is deprecated and will be removed in a future release.

Returns only tokens surrounded by whitespace. You can use this tokenizer in situations where you want to the tokens in your source document to be separated by whitespace characters only. This approach lets you retain the ability to set stop words and token length limits.

Important

If you create a database with no tables and the k-safety has increased, you must rebalance your data using REBALANCE_CLUSTER before using a Vertica tokenizer.

Parameters

Parameter Name	Parameter Value
stopwordscaseinsensitive	`''`
minorseparators	`''`
majorseparators	`E' \t\n\f\r'`
minLength	`'2'`
maxLength	`'128'`
used	`'True'`

Examples

The following example shows how you can create a text index, from the table foo, using the Whitespace Log Tokenizer without a stemmer.

=> CREATE TABLE foo (id INT PRIMARY KEY NOT NULL,text VARCHAR(250));
=> COPY foo FROM STDIN;
End with a backslash and a period on a line by itself.
>> 1|2014-05-10 00:00:05.700433 %ASA-6-302013: Built outbound TCP connection 998 6454 for outside:101.123.123.111/443 (101.123.123.111/443)
>> \.
=> CREATE PROJECTION foo_projection AS SELECT * FROM foo ORDER BY id
                                     SEGMENTED BY HASH(id) ALL NODES KSAFE;
=> CREATE TEXT INDEX indexfoo_WhitespaceLogTokenizer ON foo (id, text)
                TOKENIZER v_txtindex.WhitespaceLogTokenizer(LONG VARCHAR) STEMMER NONE;
=> SELECT * FROM indexfoo_WhitespaceLogTokenizer;
            token            | doc_id
-----------------------------+--------
 %ASA-6-302013:              |      1
 (101.123.123.111/443)       |      1
 00:00:05.700433             |      1
 2014-05-10                  |      1
 6454                        |      1
 998                         |      1
 Built                       |      1
 TCP                         |      1
 connection                  |      1
 for                         |      1
 outbound                    |      1
 outside:101.123.123.111/443 |      1
(12 rows)