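Basic log tokenizer
Returns tokens that exclude the specified minor separators. Use this tokenizer when tokens are separated by whitespace or by various punctuation characters. This approach is typically suited to analyzing log files.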
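Important
If you create a database that contains no tables and then increase the k-safety, you must rebalance your data with REBALANCE_CLUSTER before using a Vertica tokenizer.
Parameters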
Example
The following example shows how to use the basic log tokenizer, without a stemmer, to create a text index from the table foo.
=> CREATE TABLE foo (id INT PRIMARY KEY NOT NULL,text VARCHAR(250));
=> COPY foo FROM STDIN;
End with a backslash and a period on a line by itself.
>> 1|2014-05-10 00:00:05.700433 %ASA-6-302013: Built outbound TCP connection 9986454 for outside:101.123.123.111/443 (101.123.123.111/443)
>> \.
=> CREATE PROJECTION foo_projection AS SELECT * FROM foo ORDER BY id
   SEGMENTED BY HASH(id) ALL NODES KSAFE;
=> CREATE TEXT INDEX indexfoo_BasicLogTokenizer ON foo (id, text)
   TOKENIZER v_txtindex.BasicLogTokenizer(LONG VARCHAR) STEMMER NONE;
=> SELECT * FROM indexfoo_BasicLogTokenizer;
            token            | doc_id
-----------------------------+--------
 %ASA-6-302013:              |      1
 00:00:05.700433             |      1
 101.123.123.111/443         |      1
 2014-05-10                  |      1
 9986454                     |      1
 Built                       |      1
 TCP                         |      1
 connection                  |      1
 for                         |      1
 outbound                    |      1
 outside:101.123.123.111/443 |      1
(11 rows)
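
Because the text index exposes token and doc_id columns, it can be queried like any other table. The following is a minimal sketch, assuming the table foo and the index indexfoo_BasicLogTokenizer from the example above; the search term 'TCP' is chosen only for illustration. It looks up the source rows whose text contains that token. Since the index was built with STEMMER NONE, the comparison is expected to match the token exactly as it was stored, including case. Output is not shown.

```sql
-- Sketch: find rows in foo whose indexed text contains the token 'TCP'.
-- Assumes foo and indexfoo_BasicLogTokenizer exist as created above.
=> SELECT id, text
   FROM foo
   WHERE id IN (
       SELECT doc_id
       FROM indexfoo_BasicLogTokenizer
       WHERE token = 'TCP'
   );
```

Filtering through the index in a subquery keeps the token lookup separate from the base table, so the same pattern should apply to any other token listed in the index.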