This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Flex extractor functions

The flex extractor scalar functions process polystructured data.

1: MAPDELIMITEDEXTRACTOR
2: MAPJSONEXTRACTOR
3: MAPREGEXEXTRACTOR

The flex extractor scalar functions process polystructured data. Each function accepts input data that is any of:

Existing database content
A table
Returned from an expression
Entered directly

These functions do not parse data from an external file source. All functions return a single VMap value. The extractor functions can return data with NULL-specified columns.

1 - MAPDELIMITEDEXTRACTOR

Extracts data with a delimiter character and other optional arguments, returning a single VMap value.

Syntax

MAPDELIMITEDEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])

Arguments

record-value: String containing a JSON or delimited format record on which to apply the expression.

Parameters

delimiter: Single delimiter character.
Default: |
header_names: Delimiter-separated list of column header names.
Default: ucoln, where n is the column offset number, starting with 0 for the first column.
trim: Boolean, trim white space from header names and field values.
Default: true
treat_empty_val_as_null: Boolean, set empty fields to NULL rather than an empty string ('').
Default: true

Examples

These examples use a short set of delimited data:

Name|CITY|New city|State|zip
Tom|BOSTON|boston|MA|01
Eric|Burlington|BURLINGTON|MA|02
Jamie|cambridge|CAMBRIDGE|MA|08

To begin, save this data as delim.dat.

Create a flex table, dflex:

=> CREATE FLEX TABLE dflex();
CREATE TABLE

Use COPY to load the delim.dat file. Use the flex tables fdelimitedparser with the header='false' option:

=> COPY dflex FROM '/home/release/kmm/flextables/delim.dat' parser fdelimitedparser(header='false');
 Rows Loaded
-------------
         4
(1 row)

Create a columnar table, dtab, with an identity id column, a delim column, and a vmap column to hold a VMap:

=> CREATE TABLE dtab (id IDENTITY(1,1), delim varchar(128), vmap long varbinary(512));
CREATE TABLE

Use COPY to load the delim.dat file into the dtab table. MAPDELIMITEDEXTRACTOR uses the header_names parameter to specify a header row for the sample data, along with delimiter '!' :

=> COPY dtab(delim, vmap AS MAPDELIMITEDEXTRACTOR (delim
   USING PARAMETERS header_names='Name|CITY|New City|State|Zip')) FROM '/home/dbadmin/data/delim.dat'
DELIMITER '!';

 Rows Loaded
-------------
           4
(1 row)

Use MAPTOSTRING for the flex table dflex to view the __raw__ column contents. Notice the default header names in use (ucol0 – ucol4), since you specified header='false' when you loaded the flex table:

=> SELECT MAPTOSTRING(__raw__) FROM dflex limit 10;
                              maptostring
-------------------------------------------------------------------------------------
 {
   "ucol0" : "Jamie",
   "ucol1" : "cambridge",
   "ucol2" : "CAMBRIDGE",
   "ucol3" : "MA",
   "ucol4" : "08"
}

 {
   "ucol0" : "Name",
   "ucol1" : "CITY",
   "ucol2" : "New city",
   "ucol3" : "State",
   "ucol4" : "zip"
}

 {
   "ucol0" : "Tom",
   "ucol1" : "BOSTON",
   "ucol2" : "boston",
   "ucol3" : "MA",
   "ucol4" : "01"
}

 {
   "ucol0" : "Eric",
   "ucol1" : "Burlington",
   "ucol2" : "BURLINGTON",
   "ucol3" : "MA",
   "ucol4" : "02"
}

(4 rows)

Use MAPTOSTRING again, this time with the dtab table's vmap column. Compare the results of this output to those for the flex table. Note that MAPTOSTRING returns the header_name parameter values you specified when you loaded the data:

=> SELECT MAPTOSTRING(vmap) FROM dtab;
                                                      maptostring
------------------------------------------------------------------------------------------------------------------------
 {
   "CITY" : "CITY",
   "Name" : "Name",
   "New City" : "New city",
   "State" : "State",
   "Zip" : "zip"
}

 {
   "CITY" : "BOSTON",
   "Name" : "Tom",
   "New City" : "boston",
   "State" : "MA",
   "Zip" : "02121"
}

 {
   "CITY" : "Burlington",
   "Name" : "Eric",
   "New City" : "BURLINGTON",
   "State" : "MA",
   "Zip" : "02482"
}

 {
   "CITY" : "cambridge",
   "Name" : "Jamie",
   "New City" : "CAMBRIDGE",
   "State" : "MA",
   "Zip" : "02811"
}

(4 rows)

Query the delim column to view the contents differently:

=> SELECT delim FROM dtab;
                delim
-------------------------------------
 Name|CITY|New city|State|zip
 Tom|BOSTON|boston|MA|02121
 Eric|Burlington|BURLINGTON|MA|02482
 Jamie|cambridge|CAMBRIDGE|MA|02811
(4 rows)

2 - MAPJSONEXTRACTOR

Extracts content of repeated JSON data objects,, including nested maps, or data with an outer list of JSON elements.

Produces a VMap of key/value pairs from an input JSON string. You typically use this function with COPY to populate a VMap column from another string column containing the JSON.

Empty input does not generate warnings or errors.

Syntax

MAPJSONEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])

Arguments

record-value: String containing a JSON or delimited format record on which to apply the expression.

Parameters

flatten_maps (Boolean): If true, flatten sub-maps within the JSON data, separating map levels with periods (.).
Default: true
flatten_arrays (Boolean): If true, convert lists to sub-maps with integer keys. Lists are not flattened by default.
Default value: false
reject_on_duplicate (Boolean): If true, reject duplicates. If false, ignore duplicate records. In either case, loading is unaffected.
Default: false
reject_on_empty_key (Boolean): If true, reject any row that contains a key without a value.
Default: false
omit_empty_keys (Boolean): If true, omit any key from the data without a value.
Default: false
start_point (String): Name of a key in the JSON load data at which to begin parsing. The parser ignores all data before the start_point value. The parser processes data after the first instance, and up to the second, ignoring any remaining data.
Default: none

Examples

This example uses the following sample JSON data in a file named bakery.json:

{ "id": "5001", "type": "None" }
{ "id": "5002", "type": "Glazed" }
{ "id": "5005", "type": "Sugar" }
{ "id": "5007", "type": "Powdered Sugar" }
{ "id": "5004", "type": "Maple" }

In addition to loading this data as a string, you can load it as a VMap using this function:

=> CREATE TABLE bakery(id IDENTITY(1,1), json VARCHAR(128), vmap LONG VARBINARY(10000));
CREATE TABLE

=> COPY bakery (json, vmap AS MapJSONExtractor(json)) 
   FROM '/home/dbadmin/data/bakery.json';
 Rows Loaded
-------------
           5
(1 row)

You can now use MAPTOSTRING to show the values from the VMap:

=> SELECT MAPTOSTRING(vmap) FROM bakery limit 5;
                     maptostring
-----------------------------------------------------
 {
   "id" : "5001",
   "type" : "None"
}

 {
   "id" : "5002",
   "type" : "Glazed"
}

 {
   "id" : "5004",
   "type" : "Maple"
}

 {
   "id" : "5005",
   "type" : "Sugar"
}

 {
   "id" : "5007",
   "type" : "Powdered Sugar"
}

(5 rows)

If you load the data into a flex table, you must qualify the filler column to disambiguate it from possible fields in the VMap. Use the following syntax to refer to a filler column when loading into a flex table:

=> CREATE FLEX TABLE bakery2(id IDENTITY(1,1), vmap LONG VARBINARY(10000));
CREATE TABLE

=> COPY bakery2 (json FILLER VARCHAR(128), 
                 vmap AS MapJSONExtractor("*FILLER*".json))
   FROM '/home/dbadmin/data/bakery.json';
 Rows Loaded
-------------
           5
(1 row)

If you call MAPTOSTRING on the __raw__ column in this flex table, the order of elements might be different from the previous example, but the output is otherwise the same.

3 - MAPREGEXEXTRACTOR

Extracts data with a regular expression and returns results as a VMap.

Syntax

MAPREGEXEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])

Arguments

record-value: String containing a JSON or delimited format record on which to apply the regular expression.

Parameters

pattern: Regular expression used to extract the desired data.
Default: Empty string ('')
use_jit: Boolean, use just-in-time compiling when parsing the regular expression.
Default: false
record_terminator: Character used to separate input records.
Default: \n
logline_column: Destination column containing the full string that the regular expression matched.
Default: Empty string ('')

Examples

These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.

Caution

For display purposes, this sample regular expression adds new line characters to split long lines of text. To use this expression in a query, first copy and edit the example to remove any new line characters.

This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).

'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'

The following examples may include newline characters for display purposes.

Create a flex table, flogs:

=> CREATE FLEX TABLE flogs();
CREATE TABLE

Use COPY to load a sample log file (vertica.log), using the flex table fregexparser. Note that this example includes added line characters for displaying long text lines.

=> COPY flogs FROM '/home/dbadmin/tempdat/vertica.log' PARSER FREGEXPARSER(pattern='
^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+) (?<thread_name>[A-Za-z ]+):
(?<thread_id>(?:0x)?[0-9a-f])-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )?(?<text>.*)');
Rows Loaded
-------------
81399
(1 row)

Use to return the results from calling MAPREGEXEXTRACTOR with a regular expression. The output returns the results of the function in string format.

=> SELECT MAPTOSTRING(MapregexExtractor(E'2014-04-02 04:02:51.011
TM Moveout:0x2aab9000f860-a0000000002067 [Txn] <INFO>
Begin Txn: a0000000002067 \'Moveout: Tuple Mover\'' using PARAMETERS
pattern='^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'


)) FROM flogs where __identity__=13;
maptostring
--------------------------------------------------------------------------------------------------
{
"component" : "Txn",
"level" : "INFO",
"text" : "Begin Txn: a0000000002067 'Moveout: Tuple Mover'",
"thread_id" : "0x2aab9000f860",
"thread_name" : "TM Moveout",
"time" : "2014-04-02 04:02:51.011",
"transaction_id" : "a0000000002067"
}
(1 row)

Flex extractor functions

1 - MAPDELIMITEDEXTRACTOR

Syntax

Arguments

Parameters

Examples

See also

2 - MAPJSONEXTRACTOR

Syntax

Arguments

Parameters

Examples

See also

3 - MAPREGEXEXTRACTOR

Syntax

Arguments

Parameters

Examples

Caution

See also