This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Flex extractor functions

The flex extractor scalar functions process polystructured data.

The flex extractor scalar functions process polystructured data. Each function accepts input data that is any of:

  • Existing database content

  • A table

  • Returned from an expression

  • Entered directly

These functions do not parse data from an external file source. All functions return a single VMap value. The extractor functions can return data with NULL-specified columns.

1 - MAPDELIMITEDEXTRACTOR

Extracts data with a delimiter character and other optional arguments, returning a single VMap value.

Extracts data with a delimiter character and other optional arguments, returning a single VMap value.

Syntax

MAPDELIMITEDEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])

Arguments

record-value
String containing a JSON or delimited format record on which to apply the expression.

Parameters

delimiter
Single delimiter character.

Default: |

header_names
Delimiter-separated list of column header names.

Default: ucoln, where n is the column offset number, starting with 0 for the first column.

trim
Boolean, trim white space from header names and field values.

Default: true

treat_empty_val_as_null
Boolean, set empty fields to NULL rather than an empty string ('').

Default: true

Examples

These examples use a short set of delimited data:

Name|CITY|New city|State|zip
Tom|BOSTON|boston|MA|01
Eric|Burlington|BURLINGTON|MA|02
Jamie|cambridge|CAMBRIDGE|MA|08

To begin, save this data as delim.dat.

  1. Create a flex table, dflex:

    => CREATE FLEX TABLE dflex();
    CREATE TABLE
    
  2. Use COPY to load the delim.dat file. Use the flex tables fdelimitedparser with the header='false' option:

    => COPY dflex FROM '/home/release/kmm/flextables/delim.dat' parser fdelimitedparser(header='false');
     Rows Loaded
    -------------
             4
    (1 row)
    
  3. Create a columnar table, dtab, with an identity id column, a delim column, and a vmap column to hold a VMap:

    => CREATE TABLE dtab (id IDENTITY(1,1), delim varchar(128), vmap long varbinary(512));
    CREATE TABLE
    
  4. Use COPY to load the delim.dat file into the dtab table. MAPDELIMITEDEXTRACTOR uses the header_names parameter to specify a header row for the sample data, along with delimiter '!' :

    => COPY dtab(delim, vmap AS MAPDELIMITEDEXTRACTOR (delim
       USING PARAMETERS header_names='Name|CITY|New City|State|Zip')) FROM '/home/dbadmin/data/delim.dat'
    DELIMITER '!';
    
     Rows Loaded
    -------------
               4
    (1 row)
    
  5. Use MAPTOSTRING for the flex table dflex to view the __raw__ column contents. Notice the default header names in use (ucol0ucol4), since you specified header='false' when you loaded the flex table:

    => SELECT MAPTOSTRING(__raw__) FROM dflex limit 10;
                                  maptostring
    -------------------------------------------------------------------------------------
     {
       "ucol0" : "Jamie",
       "ucol1" : "cambridge",
       "ucol2" : "CAMBRIDGE",
       "ucol3" : "MA",
       "ucol4" : "08"
    }
    
     {
       "ucol0" : "Name",
       "ucol1" : "CITY",
       "ucol2" : "New city",
       "ucol3" : "State",
       "ucol4" : "zip"
    }
    
     {
       "ucol0" : "Tom",
       "ucol1" : "BOSTON",
       "ucol2" : "boston",
       "ucol3" : "MA",
       "ucol4" : "01"
    }
    
     {
       "ucol0" : "Eric",
       "ucol1" : "Burlington",
       "ucol2" : "BURLINGTON",
       "ucol3" : "MA",
       "ucol4" : "02"
    }
    
    (4 rows)
    
  6. Use MAPTOSTRING again, this time with the dtab table's vmap column. Compare the results of this output to those for the flex table. Note that MAPTOSTRING returns the header_name parameter values you specified when you loaded the data:

    => SELECT MAPTOSTRING(vmap) FROM dtab;
                                                          maptostring
    ------------------------------------------------------------------------------------------------------------------------
     {
       "CITY" : "CITY",
       "Name" : "Name",
       "New City" : "New city",
       "State" : "State",
       "Zip" : "zip"
    }
    
     {
       "CITY" : "BOSTON",
       "Name" : "Tom",
       "New City" : "boston",
       "State" : "MA",
       "Zip" : "02121"
    }
    
     {
       "CITY" : "Burlington",
       "Name" : "Eric",
       "New City" : "BURLINGTON",
       "State" : "MA",
       "Zip" : "02482"
    }
    
     {
       "CITY" : "cambridge",
       "Name" : "Jamie",
       "New City" : "CAMBRIDGE",
       "State" : "MA",
       "Zip" : "02811"
    }
    
    (4 rows)
    
  7. Query the delim column to view the contents differently:

    => SELECT delim FROM dtab;
                    delim
    -------------------------------------
     Name|CITY|New city|State|zip
     Tom|BOSTON|boston|MA|02121
     Eric|Burlington|BURLINGTON|MA|02482
     Jamie|cambridge|CAMBRIDGE|MA|02811
    (4 rows)
    

See also

2 - MAPJSONEXTRACTOR

Extracts content of repeated JSON data objects,, including nested maps, or data with an outer list of JSON elements.

Extracts content of repeated JSON data objects,, including nested maps, or data with an outer list of JSON elements. You can set one or more optional parameters to control the extraction process.

Syntax

MAPJSONEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])

Arguments

record-value
String containing a JSON or delimited format record on which to apply the expression.

Parameters

flatten_maps
Boolean, flatten sub-maps within the JSON data, separating map levels with a period (.).

Default: true

flatten_arrays
Boolean, convert lists to sub-maps with integer keys. Lists are not flattened by default.

Default value: false

reject_on_duplicate
Boolean, ignore duplicate records (false), or reject duplicates (true). In either case, loading is unaffected.

Default: false

reject_on_empty_key
Boolean, reject any row that contains a key without a value.

Default: false

omit_empty_keys
Boolean, omit any key from the load data without a value.

Default: false

start_point
Name of a key in the JSON load data at which to begin parsing. The parser ignores all data before the start_point value. The parser processes data after the first instance, and up to the second, ignoring any remaining data.

Default: none

Examples

These examples use the following sample JSON data:

{ "id": "5001", "type": "None" }
{ "id": "5002", "type": "Glazed" }
{ "id": "5005", "type": "Sugar" }
{ "id": "5007", "type": "Powdered Sugar" }
{ "id": "5004", "type": "Maple" }

Save this example data as bake_single.json, and load that file.

  1. Create a flex table, flexjson:

    => CREATE FLEX TABLE flexjson();
    CREATE TABLE
    
  2. Use COPY to load the bake_single.json file with the fjsonparser parser:

    => COPY flexjson FROM '/home/dbadmin/data/bake_single.json' parser fjsonparser();
     Rows Loaded
    -------------
               5
    (1 row)
    
  3. Create a columnar table, coljson, with an IDENTITY column (id), a json column, and a column to hold a VMap, called vmap:

    => CREATE TABLE coljson(id IDENTITY(1,1), json varchar(128), vmap long varbinary(10000));
    CREATE TABLE
    
  4. Use COPY to load the bake_single.json file into the coljson table, using MAPJSONEXTRACTOR:

    => COPY coljson (json, vmap AS MapJSONExtractor(json)) FROM '/home/dbadmin/data/bake_single.json';
     Rows Loaded
    -------------
               5
    (1 row)
    
  5. Use the MAPTOSTRING function for the flex table flexjson to output the __raw__ column contents as strings:

    => SELECT MAPTOSTRING(__raw__) FROM flexjson limit 5;
                         maptostring
    -----------------------------------------------------
     {
       "id" : "5001",
       "type" : "None"
    }
    
     {
       "id" : "5002",
       "type" : "Glazed"
    }
    
     {
       "id" : "5005",
       "type" : "Sugar"
    }
    
     {
       "id" : "5007",
       "type" : "Powdered Sugar"
    }
    
     {
       "id" : "5004",
       "type" : "Maple"
    }
    
    (5 rows)
    
  6. Use MAPTOSTRING again, this time with the coljson table's vmap column and compare the results. The element order differs:

    => SELECT MAPTOSTRING(vmap) FROM coljson limit 5;
                         maptostring
    -----------------------------------------------------
     {
       "id" : "5001",
       "type" : "None"
    }
    
     {
       "id" : "5002",
       "type" : "Glazed"
    }
    
     {
       "id" : "5004",
       "type" : "Maple"
    }
    
     {
       "id" : "5005",
       "type" : "Sugar"
    }
    
     {
       "id" : "5007",
       "type" : "Powdered Sugar"
    }
    
    (5 rows)
    

See also

3 - MAPREGEXEXTRACTOR

Extracts data with a regular expression and returns results as a VMap.

Extracts data with a regular expression and returns results as a VMap.

Syntax

MAPREGEXEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])

Arguments

record-value
String containing a JSON or delimited format record on which to apply the regular expression.

Parameters

pattern
Regular expression used to extract the desired data.

Default: Empty string ('')

use_jit
Boolean, use just-in-time compiling when parsing the regular expression.

Default: false

record_terminator
Character used to separate input records.

Default: \n

logline_column
Destination column containing the full string that the regular expression matched.

Default: Empty string ('')

Examples

These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.

This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).

'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'

The following examples may include newline characters for display purposes.

  1. Create a flex table, flogs:

    => CREATE FLEX TABLE flogs();
    CREATE TABLE
    
  2. Use COPY to load a sample log file (vertica.log), using the flex table fregexparser. Note that this example includes added line characters for displaying long text lines.

    => COPY flogs FROM '/home/dbadmin/tempdat/vertica.log' PARSER FREGEXPARSER(pattern='
    ^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+) (?<thread_name>[A-Za-z ]+):
    (?<thread_id>(?:0x)?[0-9a-f])-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
    \<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )?(?<text>.*)');
    Rows Loaded
    -------------
    81399
    (1 row)
    
  3. Use to return the results from calling MAPREGEXEXTRACTOR with a regular expression. The output returns the results of the function in string format.

    => SELECT MAPTOSTRING(MapregexExtractor(E'2014-04-02 04:02:51.011
    TM Moveout:0x2aab9000f860-a0000000002067 [Txn] <INFO>
    Begin Txn: a0000000002067 \'Moveout: Tuple Mover\'' using PARAMETERS
    pattern=
    '^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
     (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
    -?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
    \<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
    ?(?<text>.*)'
    
    )) FROM flogs where __identity__=13; maptostring -------------------------------------------------------------------------------------------------- { "component" : "Txn", "level" : "INFO", "text" : "Begin Txn: a0000000002067 'Moveout: Tuple Mover'", "thread_id" : "0x2aab9000f860", "thread_name" : "TM Moveout", "time" : "2014-04-02 04:02:51.011", "transaction_id" : "a0000000002067" } (1 row)

See also