FREGEXPARSER

Parses a regular expression, matching columns to the contents of the named regular expression groups.

Parses a regular expression, matching columns to the contents of the named regular expression groups.

This parser is for use in Flex tables only. All flex parsers store the data as a single VMap in the LONG VARBINAR_raw__ column. If a data row is too large to fit in the column, it is rejected. Vertica supports null values for loading data with NULL-specified columns.

Syntax

FREGEXPARSER ( pattern=[parameter-name='value'[,...]] )

Parameters

pattern
Specifies the regular expression of data to match.

Default: Empty string ("")

use_jit
Boolean, specifies whether to use just-in-time compiling when parsing the regular expression.

Default: false

record_terminator
Specifies the character used to separate input records.

Default: \n

logline_column
A string that captures the destination column containing the full string that the regular expression matched.

Default: Empty string ("")

Example

These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.

This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).

'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'
  1. Create a flex table (vlog) to contain the results of a Vertica log file. For this example, we made a copy of a log file in the directory /home/dbadmin/data/vertica.log:

    => CREATE FLEX TABLE vlog1();
    CREATE TABLE
    
  2. Use the fregexparser with the sample regular expression to load data from the log file. Be sure to remove any line characters before using this expression shown here:

    =>  COPY vlog1 FROM '/home/dbadmin/tempdat/KMvertica.log'
    PARSER FREGEXPARSER(pattern=
    '^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
     (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
    -?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
    \<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
    ?(?<text>.*)'
    
    ); Rows Loaded ------------- 31049 (1 row)
  3. After successfully loading data, use the MAPTOSTRING() function with the table's __raw__ column. The four rows (limt 4) that the query returns are regular expression results of the KMvertica.log file, parsed with fregexparser. The output shows thread_id values with a preceding 0x or without:

    => SELECT maptostring(__raw__) FROM vlog1 LIMIT 4;
                            maptostring
    -------------------------------------------------------------------------------------
    {
      "text" : " [Init] <INFO> Log /home/dbadmin/VMart/v_vmart_node0001_catalog/vertica.log 
    opened; #2",
      "thread_id" : "0x7f2157e287c0",
      "thread_name" : "Main",
      "time" : "2017-03-21 23:30:01.704"
    }
    
    {
      "text" : " [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D 
    /home/dbadmin/VMart/v_vmart_node0001_catalog -C VMart -n v_vmart_node0001 -h 
    10.20.100.247 -p 5433 -P 4803 -Y ipv4",
      "thread_id" : "0x7f2157e287c0",
      "thread_name" : "Main",
      "time" : "2017-03-21 23:30:01.704"
    }
    
    {
      "text" : " [Init] <INFO> Starting up Vertica Analytic Database v8.1.1-20170321",
      "thread_id" : "7f2157e287c0",
      "thread_name" : "Main",
      "time" : "2017-03-21 23:30:01.704"
    }
    
    {
      "text" : " [Init] <INFO> Compiler Version: 4.8.2 20140120 (Red Hat 4.8.2-15)",
      "thread_id" : "7f2157e287c0",
      "thread_name" : "Main",
      "time" : "2017-03-21 23:30:01.704"
    }
    (4 rows)
    

See also