This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Flex parsers reference

Vertica supports several parsers to load different types of data into flex tables.

1: FCEFPARSER
2: FCSVPARSER
3: FDELIMITEDPAIRPARSER
4: FDELIMITEDPARSER
5: FREGEXPARSER

Vertica supports several parsers to load different types of data into flex tables. The parsers in this section are specific to flex tables. The parsers described in Data formats can also load data into flex tables.

All flex parsers store the data as a single Vmap in the LONG VARBINARY __raw__ column. If a data row is too large to fit in the column, it is rejected. Vertica supports null values for loading data with NULL-specified columns.

For information about how you can use each type of flex parser, see Using flex table parsers.

1 - FCEFPARSER

Parses ArcSight Common Event Format (CEF) log files.

Parses ArcSight Common Event Format (CEF) log files. The fcefparser loads values directly into any table column with a column name that matches a source data key. The parser stores the data loaded into a flex table in a single VMap.

Syntax

FAVROPARSER ( [parameter-name='value'[,...]] )

Parameters

`delimiter`	Single-character delimiter. Default: `' '`
`record_terminator`	Single-character record terminator. Default value: `newline`
`trim`	Boolean, specifies whether to trim white space from header names and key values. Default: `true`
`reject_on_unescaped_delimiter`	Boolean, specifies whether to reject rows containing unescaped delimiters. The CEF standard does not permit them. Default: `false`

Examples

The following example illustrates creating a sample flex table for CEF data, with two real columns, eventId and priority.

Create a flex table cefdata:

=> create flex table cefdata();
CREATE TABLE

Load some basic CEF data, using the flex parser fcefparser:

=> copy cefdata from stdin parser fcefparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> CEF:0|ArcSight|ArcSight|2.4.1|machine:20|New alert|High|
>> \.

Use the maptostring() function to view the contents of your cefdata flex table:

=> select maptostring(__raw__) from cefdata;
                  maptostring
-------------------------------------------------------------
 {
   "deviceproduct" : "ArcSight",
   "devicevendor" : "ArcSight",
   "deviceversion" : "2.4.1",
   "name" : "New alert",
   "severity" : "High",
   "signatureid" : "machine:20",
   "version" : "0"
}


(1 row)

Select some virtual columns from the cefdata flex table:
```
= select deviceproduct, severity, deviceversion from cefdata;
 deviceproduct | severity | deviceversion
---------------+----------+---------------
 ArcSight      | High     | 2.4.1
(1 row)
```
For more information, see Common event format (CEF) data

See also

2 - FCSVPARSER

Parses CSV format (comma-separated values) data.

Parses CSV format (comma-separated values) data. Use this parser to load CSV data into columnar, flex, and hybrid tables. All data must be encoded in Unicode UTF-8 format. The fcsvparser parser supports the RFC 4180 standard for CSV data, and other options, to accommodate variations in CSV file format definitions. Invalid records are rejected. For more information about data formats, see Handling Non-UTF-8 input.

Syntax

FCSVPARSER ( [parameter='value'[,...]] )

Parameters

`type`	The default parameter values for the parser, one of the following strings: `rfc4180` `traditional` You do not have to use the type parameter when loading data that conforms to the RFC 4180 standard (such as MS Excel files). See Loading CSV data for the `RFC4180` default parameters, and other options you can specify for traditional CSV files. Default: `RFC4180`
`delimiter`	A single-character value used to separate fields in the CSV data. Default: `,` (for `rfc4180` and `traditional`)
`escape`	A single-character value used as an escape character to interpret the next character in the data literally. Default: `rfc4180`: `"` `traditional`: `\`
`enclosed_by`	A single-character value. Use `enclosed_by` to include a value that is identical to the delimiter, but should be interpreted literally. For example, if the data delimiter is a comma (`,`), and you want to use a comma within the data (`"my name is jane, and his is jim"`). Default: `"`
`record_terminator`	A single-character value used to specify the end of a record. Default: `rfc4180`: `\n` `traditional`: `\r\n`
`header`	Boolean, specifies whether to use the first row of data as a header column. When `header=true` (default), and no header exists, fcsvparser uses a default column heading. The default header consists of `ucol``n`, where n is the column offset number, starting with `0` for the first column. You can specify custom column heading names using the `header_names` parameter, described next. If you specify `header=false`, the `fcsvparser` parses the first row of input as data, rather than as column headers. Default: true
`header_names`	A list of column header names, delimited by the character defined by the parser's delimiter parameter. Use this parameter to specify header names in a CSV file without a header row, or to override the column names present in the CSV source. To override one or more existing column names, specify the header names to use. This parameter overrides any header row in the data.
`trim`	Boolean, specifies whether to trim white space from header names and key values. Default: true
`omit_empty_keys`	Boolean, specifies how the parser handles header keys without values. If true, keys with an empty value in the `header` row are not loaded. Default: false
`reject_on_duplicate`	Boolean, specifies whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues. Default: false
`reject_on_empty_key`	Boolean, specifies whether to reject any row containing a key without a value. Default: false
`reject_on_materialized_type_error`	Boolean, specifies whether to reject any materialized column value that the parser cannot coerce into a compatible data type. See Loading CSV data. Default: false

Examples

This example shows how you can use fcsvparser to load a flex table, build a view, and then query that view.

Create a flex table for CSV data:

=> CREATE FLEX TABLE rfc();
CREATE TABLE

Use fcsvparser to load the data from STDIN. Specify that no header exists, and enter some data as shown:

=> COPY rfc FROM stdin PARSER fcsvparser(header='false');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 10,10,20
>> 10,"10",30
>> 10,"20""5",90
>> \.

Run the compute_flextable_keys_and_build_view function, and query the rfc_view. Notice that the default enclosed_by character permits an escape character (") within a field ("20""5"). Thus, the resulting value was parsed correctly. Since no header existed in the input data, the function added ucoln for each column:

=> SELECT compute_flextable_keys_and_build_view('rfc');
                           compute_flextable_keys_and_build_view
--------------------------------------------------------------------------------------------
 Please see public.rfc_keys for updated keys
The view public.rfc_view is ready for querying
(1 row)

=> SELECT * FROM rfc_view;
 ucol0 | ucol1 | ucol2
-------+-------+-------
 10    | 10    | 20
 10    | 10    | 30
 10    | 20"5  | 90
(3 rows)

For more information and examples using other parameters of this parser, see Loading CSV data.

3 - FDELIMITEDPAIRPARSER

Parses delimited data files.

Parses delimited data files. This parser provides a subset of the functionality in the parser fdelimitedparser. Use the fdelimitedpairparser when the data you are loading specifies pairs of column names with data in each row.

Syntax

FDELIMITEDPAIRPARSER ( [parameter-name='value'[,...]] )

Parameters

delimiter

Specifies a single-character delimiter.

Default: ' '

record_terminator

Specifies a single-character record terminator.

Default: newline

trim

Boolean specifies whether to trim white space from header names and key values.

Default: true

Examples

The following example illustrates creating a sample flex table for simple delimited data, with two real columns, eventId and priority.

Create a table:

=> create flex table CEFData(eventId int default(eventId::int), priority int default(priority::int) );
CREATE TABLE

Load a sample delimited OpenText ArcSight log file into the CEFData table, using the fcefparser:

=> copy CEFData from '/home/release/kmm/flextables/sampleArcSight.txt' parser fdelimitedpairparser();
Rows Loaded | 200

After loading the sample data file, use maptostring() to display the virtual columns in the __raw__ column of CEFData:

=> select maptostring(__raw__) from CEFData limit 1;                                                                                                                                                                                                                                        maptostring
-----------------------------------------------------------
   "agentassetid" : "4-WwHuD0BABCCQDVAeX21vg==",
   "agentzone" : "3083",
   "agt" : "265723237",
   "ahost" : "svsvm0176",
   "aid" : "3tGoHuD0BABCCMDVAeX21vg==",
   "art" : "1099267576901",
   "assetcriticality" : "0",
   "at" : "snort_db",
   "atz" : "America/Los_Angeles",
   "av" : "5.3.0.19524.0",
   "cat" : "attempted-recon",
   "categorybehavior" : "/Communicate/Query",
   "categorydevicegroup" : "/IDS/Network",
   "categoryobject" : "/Host",
   "categoryoutcome" : "/Attempt",
   "categorysignificance" : "/Recon",
   "categorytechnique" : "/Scan",
   "categorytupledescription" : "An IDS observed a scan of a host.",
   "cnt" : "1",
   "cs2" : "3",
   "destinationgeocountrycode" : "US",
   "destinationgeolocationinfo" : "Richardson",
   "destinationgeopostalcode" : "75082",
   "destinationgeoregioncode" : "TX",
   "destinationzone" : "3133",
   "device product" : "Snort",
   "device vendor" : "Snort",
   "device version" : "1.8",
   "deviceseverity" : "2",
   "dhost" : "198.198.121.200",
   "dlat" : "329913940429",
   "dlong" : "-966644973754",
   "dst" : "3334896072",
   "dtz" : "America/Los_Angeles",
   "dvchost" : "unknown:eth1",
   "end" : "1364676323451",
   "eventid" : "1219383333",
   "fdevice product" : "Snort",
   "fdevice vendor" : "Snort",
   "fdevice version" : "1.8",
   "fdtz" : "America/Los_Angeles",
   "fdvchost" : "unknown:eth1",
   "lblstring2label" : "sig_rev",
   "locality" : "0",
   "modelconfidence" : "0",
   "mrt" : "1364675789222",
   "name" : "ICMP PING NMAP",
   "oagentassetid" : "4-WwHuD0BABCCQDVAeX21vg==",
   "oagentzone" : "3083",
   "oagt" : "265723237",
   "oahost" : "svsvm0176",
   "oaid" : "3tGoHuD0BABCCMDVAeX21vg==",
   "oat" : "snort_db",
   "oatz" : "America/Los_Angeles",
   "oav" : "5.3.0.19524.0",
   "originator" : "0",
   "priority" : "8",
   "proto" : "ICMP",
   "relevance" : "10",
   "rt" : "1099267573000",
   "severity" : "8",
   "shost" : "198.198.104.10",
   "signature id" : "[1:469]",
   "slat" : "329913940429",
   "slong" : "-966644973754",
   "sourcegeocountrycode" : "US",
   "sourcegeolocationinfo" : "Richardson",
   "sourcegeopostalcode" : "75082",
   "sourcegeoregioncode" : "TX",
   "sourcezone" : "3133",
   "src" : "3334891530",
   "start" : "1364676323451",
   "type" : "0"
}

(1 row)

Select the eventID and priority real columns, along with two virtual columns, atz and destinationgeoregioncode:


=>  select eventID, priority, atz, destinationgeoregioncode from CEFData limit 10;
  eventID   | priority |         atz         | destinationgeoregioncode
------------+----------+---------------------+--------------------------
 1218325417 |        5 | America/Los_Angeles |
 1219383333 |        8 | America/Los_Angeles | TX
 1219533691 |        9 | America/Los_Angeles | TX
 1220034458 |        5 | America/Los_Angeles | TX
 1220034578 |        9 | America/Los_Angeles |
 1220067119 |        5 | America/Los_Angeles | TX
 1220106960 |        5 | America/Los_Angeles | TX
 1220142122 |        5 | America/Los_Angeles | TX
 1220312009 |        5 | America/Los_Angeles | TX
 1220321355 |        5 | America/Los_Angeles | CA
(10 rows)

4 - FDELIMITEDPARSER

Parses data using a delimiter character to separate values.

Parses data using a delimiter character to separate values. The fdelimitedparser loads delimited data, storing it in a single-value VMap. You can use this parser to load data into columnar and flex tables.

Note

By default, fdelimitedparser treats empty fields as NULL, rather than as an empty string (''). This behavior makes casting easier. Casting a NULL to an integer (NULL::int) is valid, while casting an empty string to an integer (''::int) is not. If required, use the treat_empty_val_as_null parameter to change the default behavior of fdelimitedparser.

Syntax

FDLIMITEDPARSER ( [parameter-name='value'[,...]] )

Parameters

`delimiter`	Single character delimiter. Default: `\|`
`record_terminator`	Single-character record terminator. Default: `\n`
`trim`	Boolean, specifies whether to trim white space from header names and key values. Default: `true`
`header`	Boolean, specifies that a header column exists. The parser uses `col###` for the column names if you use this parameter but no header exists. Default: `true`
`omit_empty_keys`	Boolean, specifies how the parser handles header keys without values. If `omit_empty_keys=true`, keys with an empty value in the `header`row are not loaded. Default: `false`
`reject_on_duplicate`	Boolean, specifies whether to ignore duplicate records (`false`), or to reject duplicates (`true`). In either case, the load continues. Default: `false`
`reject_on_empty_key`	Boolean, specifies whether to reject any row containing a key without a value. Default: `false`
`reject_on_materialized_type_error`	Boolean, specifies whether to reject any row value for a materialized column that the parser cannot coerce into a compatible data type. See Using flex table parsers. Default: `false`
`treat_empty_val_as_null`	Boolean, specifies that empty fields become `NULLs`, rather than empty strings (`''`). Default: `true`

Examples

Create a flex table for delimited data:

t=> CREATE FLEX TABLE delim_flex ();
CREATE TABLE

Use the fdelimitedparser to load some delimited data from STDIN, specifying a comma (,) column delimiter:

=> COPY delim_flex FROM STDIN parser fdelimitedparser (delimiter=',');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> deviceproduct, severity, deviceversion
>> ArcSight, High, 2.4.1
>> \.

You can now query virtual columns in the delim_flex flex table:

=> SELECT deviceproduct, severity, deviceversion from delim_flex;
 deviceproduct | severity | deviceversion
---------------+----------+---------------
 ArcSight      | High     | 2.4.1
(1 row)

5 - FREGEXPARSER

Parses a regular expression, matching columns to the contents of the named regular expression groups.

Syntax

FREGEXPARSER ( pattern=[parameter-name='value'[,...]] )

Parameters

`pattern`	Specifies the regular expression of data to match. Default: Empty string (`""`)
`use_jit`	Boolean, specifies whether to use just-in-time compiling when parsing the regular expression. Default: false
`record_terminator`	Specifies the character used to separate input records. Default: `\n`
`logline_column`	A string that captures the destination column containing the full string that the regular expression matched. Default: Empty string (`""`)

Example

These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.

Caution

For display purposes, this sample regular expression adds new line characters to split long lines of text. To use this expression in a query, first copy and edit the example to remove any new line characters.

This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).

'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'

Create a flex table (vlog) to contain the results of a Vertica log file. For this example, we made a copy of a log file in the directory /home/dbadmin/data/vertica.log:
```
=> CREATE FLEX TABLE vlog1();
CREATE TABLE
```

Use the fregexparser with the sample regular expression to load data from the log file. Be sure to remove any line characters before using this expression shown here:

=>  COPY vlog1 FROM '/home/dbadmin/tempdat/KMvertica.log'
PARSER FREGEXPARSER(pattern='^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'


);

 Rows Loaded
-------------
       31049
(1 row)

After successfully loading data, use the MAPTOSTRING() function with the table's __raw__ column. The four rows (limt 4) that the query returns are regular expression results of the KMvertica.log file, parsed with fregexparser. The output shows thread_id values with a preceding 0x or without:

=> SELECT maptostring(__raw__) FROM vlog1 LIMIT 4;
                        maptostring
-------------------------------------------------------------------------------------
{
  "text" : " [Init] <INFO> Log /home/dbadmin/VMart/v_vmart_node0001_catalog/vertica.log 
opened; #2",
  "thread_id" : "0x7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}

{
  "text" : " [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -D 
/home/dbadmin/VMart/v_vmart_node0001_catalog -C VMart -n v_vmart_node0001 -h 
10.20.100.247 -p 5433 -P 4803 -Y ipv4",
  "thread_id" : "0x7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}

{
  "text" : " [Init] <INFO> Starting up Vertica Analytic Database v8.1.1-20170321",
  "thread_id" : "7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}

{
  "text" : " [Init] <INFO> Compiler Version: 4.8.2 20140120 (Red Hat 4.8.2-15)",
  "thread_id" : "7f2157e287c0",
  "thread_name" : "Main",
  "time" : "2017-03-21 23:30:01.704"
}
(4 rows)

Flex parsers reference

1 - FCEFPARSER

Syntax

Parameters

Examples

See also

2 - FCSVPARSER

Syntax

Parameters

Examples

See also

3 - FDELIMITEDPAIRPARSER

Syntax

Parameters

Examples

See also

4 - FDELIMITEDPARSER

Note

Syntax

Parameters

Examples

See also

5 - FREGEXPARSER

Syntax

Parameters

Example

Caution

See also