R example: log tokenizer
The LogTokenizer transform function reads a varchar column of log messages from a table. It tokenizes each message and returns each token as its own row.
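The tokenizing step described above can be sketched in plain R. This is a minimal illustration, not the contents of log_tokenizer.R. It assumes Vertica passes the input rows and the USING PARAMETERS values to the main function as two data frames, and the argument names and sample messages are hypothetical (the messages are inferred from the node001 output in the query example). The factory function (LogTokenizerFactory) and the type callbacks that register the function with Vertica are omitted.

```r
# Sketch of a main function for the tokenizer (assumed shape, see lead-in).
# input.data.frame:      one varchar column holding the log messages
# parameters.data.frame: holds the user-supplied 'spliton' delimiter
LogTokenizer <- function(input.data.frame, parameters.data.frame) {
  split.on <- parameters.data.frame[["spliton"]]
  if (is.null(split.on)) {
    stop("Expected parameter spliton")
  }

  # Split every message on the literal delimiter (fixed = TRUE disables
  # regular-expression matching) and flatten the per-message token lists
  # into a single character vector.
  messages <- as.character(input.data.frame[, 1])
  tokens <- as.character(
    unlist(strsplit(messages, as.character(split.on), fixed = TRUE))
  )

  # One output row per token.
  data.frame(Token = tokens, stringsAsFactors = FALSE)
}

# Local check with hypothetical sample messages; no Vertica required.
logs <- data.frame(
  error_log = c("ERROR 345 - Broken pipe", "WARN - Nearly filled disk"),
  stringsAsFactors = FALSE
)
params <- data.frame(spliton = " ", stringsAsFactors = FALSE)
result <- LogTokenizer(logs, params)
print(result$Token)
```

Run locally, this should yield the same ten tokens that the query example lists for node001. Splitting with fixed = TRUE is a design choice for this sketch: it treats the delimiter as a literal string, so a delimiter such as "." or "|" is not interpreted as a regular expression.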
You can find more UDx examples in the Vertica GitHub repository: https://github.com/vertica/UDx-Examples.
Load the function and library
Create the library from the R source file, then create the transform function from the factory function that the library defines.
=> CREATE OR REPLACE LIBRARY rLib AS 'log_tokenizer.R' LANGUAGE 'R';
CREATE LIBRARY
=> CREATE OR REPLACE TRANSFORM FUNCTION LogTokenizer AS LANGUAGE 'R' NAME 'LogTokenizerFactory' LIBRARY rLib FENCED;
CREATE FUNCTION
Querying data with the function
The following query calls the UDTF on the error_log column, splitting each message on spaces and partitioning the input by machine.
=> SELECT machine,
LogTokenizer(error_log USING PARAMETERS spliton = ' ') OVER(PARTITION BY machine)
FROM error_logs;
machine | Token
---------+---------
node001 | ERROR
node001 | 345
node001 | -
node001 | Broken
node001 | pipe
node001 | WARN
node001 | -
node001 | Nearly
node001 | filled
node001 | disk
node002 | ERROR
node002 | 111
node002 | -
node002 | Flooded
node002 | roads
node003 | ERROR
node003 | 222
node003 | -
node003 | Plain
node003 | old
node003 | broken
(21 rows)