Advanced log tokenizer

Returns tokens that can include minor separators.

Returns tokens that can include minor separators. You can use this tokenizer in situations when your tokens are separated by whitespace or various punctuation. The advanced log tokenizer offers more granularity than the basic log tokenizer in defining separators through the addition of minor separators. This approach is frequently appropriate for analyzing log files.

Parameters

Parameter Name Parameter Value
stopwordscaseinsensitive ''
minorseparators E'/:=@.-$#%\\_'
majorseparators E' []<>(){}|!;,''"*&?+\r\n\t'
minLength '2'
maxLength '128'
used 'True'

Examples

The following example shows how you can create a text index, from the table foo, using the Advanced Log Tokenizer without a stemmer.

=> CREATE TABLE foo (id INT PRIMARY KEY NOT NULL,text VARCHAR(250));
=> COPY foo FROM STDIN;
End with a backslash and a period on a line by itself.
>> 1|2014-05-10 00:00:05.700433 %ASA-6-302013: Built outbound TCP connection 9986454 for outside:101.123.123.111/443 (101.123.123.111/443)
>> \.
=> CREATE PROJECTION foo_projection AS SELECT * FROM foo ORDER BY id
                                    SEGMENTED BY HASH(id) ALL NODES KSAFE;
=> CREATE TEXT INDEX indexfoo_AdvancedLogTokenizer ON foo (id, text)
                  TOKENIZER v_txtindex.AdvancedLogTokenizer(LONG VARCHAR) STEMMER NONE;
=> SELECT * FROM indexfoo_AdvancedLogTokenizer;
            token            | doc_id
-----------------------------+--------
 %ASA-6-302013:              |      1
 00                          |      1
 00:00:05.700433             |      1
 05                          |      1
 10                          |      1
 101                         |      1
 101.123.123.111/443         |      1
 111                         |      1
 123                         |      1
 2014                        |      1
 2014-05-10                  |      1
 302013                      |      1
 443                         |      1
 700433                      |      1
 9986454                     |      1
 ASA                         |      1
 Built                       |      1
 TCP                         |      1
 connection                  |      1
 for                         |      1
 outbound                    |      1
 outside                     |      1
 outside:101.123.123.111/443 |      1
(23 rows)