This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
 
Flex extractor functions
The flex extractor scalar functions process polystructured data.
    
      The flex extractor scalar functions process polystructured data. Each function accepts input data that is any of:
These functions do not parse data from an external file source. All functions return a single VMap value. The extractor functions can return data with NULL-specified columns.
 
 
  
  
  
  
  
  
  
    
    
	
    
    
	1 - MAPDELIMITEDEXTRACTOR
    Extracts data with a delimiter character and other optional arguments, returning a single VMap value.
	Extracts data with a delimiter character and other optional arguments, returning a single VMap value.
Syntax
MAPDELIMITEDEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])
Arguments
- record-value
- String containing a JSON or delimited format record on which to apply the expression.
Parameters
- delimiter
- Single delimiter character.
Default: |
 
- header_names
- Delimiter-separated list of column header names.
Default: ucoln, wherenis the column offset number, starting with0for the first column.
 
- trim
- Boolean, trim white space from header names and field values.
Default: true
 
- treat_empty_val_as_null
- Boolean, set empty fields to NULLrather than an empty string ('').Default: true
 
Examples
These examples use a short set of delimited data:
Name|CITY|New city|State|zip
Tom|BOSTON|boston|MA|01
Eric|Burlington|BURLINGTON|MA|02
Jamie|cambridge|CAMBRIDGE|MA|08
To begin, save this data as delim.dat.
- 
Create a flex table, dflex:
 => CREATE FLEX TABLE dflex();
CREATE TABLE
 
- 
Use COPY to load the delim.datfile. Use the flex tablesfdelimitedparserwith theheader='false'option:
 => COPY dflex FROM '/home/release/kmm/flextables/delim.dat' parser fdelimitedparser(header='false');
 Rows Loaded
-------------
         4
(1 row)
 
- 
Create a columnar table, dtab, with an identityidcolumn, adelimcolumn, and avmapcolumn to hold a VMap:
 => CREATE TABLE dtab (id IDENTITY(1,1), delim varchar(128), vmap long varbinary(512));
CREATE TABLE
 
- 
Use COPY to load the delim.datfile into thedtabtable. MAPDELIMITEDEXTRACTOR uses theheader_namesparameter to specify a header row for the sample data, along withdelimiter '!':
 => COPY dtab(delim, vmap AS MAPDELIMITEDEXTRACTOR (delim
   USING PARAMETERS header_names='Name|CITY|New City|State|Zip')) FROM '/home/dbadmin/data/delim.dat'
DELIMITER '!';
 Rows Loaded
-------------
           4
(1 row)
 
- 
Use MAPTOSTRING for the flex table dflexto view the__raw__column contents. Notice the default header names in use (ucol0–ucol4), since you specifiedheader='false'when you loaded the flex table:
 => SELECT MAPTOSTRING(__raw__) FROM dflex limit 10;
                              maptostring
-------------------------------------------------------------------------------------
 {
   "ucol0" : "Jamie",
   "ucol1" : "cambridge",
   "ucol2" : "CAMBRIDGE",
   "ucol3" : "MA",
   "ucol4" : "08"
}
 {
   "ucol0" : "Name",
   "ucol1" : "CITY",
   "ucol2" : "New city",
   "ucol3" : "State",
   "ucol4" : "zip"
}
 {
   "ucol0" : "Tom",
   "ucol1" : "BOSTON",
   "ucol2" : "boston",
   "ucol3" : "MA",
   "ucol4" : "01"
}
 {
   "ucol0" : "Eric",
   "ucol1" : "Burlington",
   "ucol2" : "BURLINGTON",
   "ucol3" : "MA",
   "ucol4" : "02"
}
(4 rows)
 
- 
Use MAPTOSTRING again, this time with the dtabtable'svmapcolumn. Compare the results of this output to those for the flex table. Note that MAPTOSTRING returns theheader_nameparameter values you specified when you loaded the data:
 => SELECT MAPTOSTRING(vmap) FROM dtab;
                                                      maptostring
------------------------------------------------------------------------------------------------------------------------
 {
   "CITY" : "CITY",
   "Name" : "Name",
   "New City" : "New city",
   "State" : "State",
   "Zip" : "zip"
}
 {
   "CITY" : "BOSTON",
   "Name" : "Tom",
   "New City" : "boston",
   "State" : "MA",
   "Zip" : "02121"
}
 {
   "CITY" : "Burlington",
   "Name" : "Eric",
   "New City" : "BURLINGTON",
   "State" : "MA",
   "Zip" : "02482"
}
 {
   "CITY" : "cambridge",
   "Name" : "Jamie",
   "New City" : "CAMBRIDGE",
   "State" : "MA",
   "Zip" : "02811"
}
(4 rows)
 
- 
Query the delimcolumn to view the contents differently:
 => SELECT delim FROM dtab;
                delim
-------------------------------------
 Name|CITY|New city|State|zip
 Tom|BOSTON|boston|MA|02121
 Eric|Burlington|BURLINGTON|MA|02482
 Jamie|cambridge|CAMBRIDGE|MA|02811
(4 rows)
 
See also
 
    
	
  
    
    
	
    
    
	2 - MAPJSONEXTRACTOR
    Extracts content of repeated JSON data objects,, including nested maps, or data with an outer list of JSON elements.
	Produces a VMap of key/value pairs from an input JSON string. You typically use this function with COPY to populate a VMap column from another string column containing the JSON.
Empty input does not generate warnings or errors.
Syntax
MAPJSONEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])
Arguments
- record-value
- String containing a JSON or delimited format record on which to apply the expression.
Parameters
- flatten_maps(Boolean)
- If true, flatten sub-maps within the JSON data, separating map levels with periods (.).Default: true
 
- flatten_arrays(Boolean)
- If true, convert lists to sub-maps with integer keys. Lists are not flattened by default.
Default value: false
 
- reject_on_duplicate(Boolean)
- If true, reject duplicates. If false, ignore duplicate records. In either case, loading is unaffected.
Default: false
 
- reject_on_empty_key(Boolean)
- If true, reject any row that contains a key without a value.
Default: false
 
- omit_empty_keys(Boolean)
- If true, omit any key from the data without a value.
Default: false
 
- start_point(String)
- Name of a key in the JSON load data at which to begin parsing. The parser ignores all data before the start_pointvalue. The parser processes data after the first instance, and up to the second, ignoring any remaining data.Default: none 
Examples
This example uses the following sample JSON data in a file named bakery.json:
{ "id": "5001", "type": "None" }
{ "id": "5002", "type": "Glazed" }
{ "id": "5005", "type": "Sugar" }
{ "id": "5007", "type": "Powdered Sugar" }
{ "id": "5004", "type": "Maple" }
In addition to loading this data as a string, you can load it as a VMap using this function:
=> CREATE TABLE bakery(id IDENTITY(1,1), json VARCHAR(128), vmap LONG VARBINARY(10000));
CREATE TABLE
=> COPY bakery (json, vmap AS MapJSONExtractor(json)) 
   FROM '/home/dbadmin/data/bakery.json';
 Rows Loaded
-------------
           5
(1 row)
You can now use MAPTOSTRING to show the values from the VMap:
=> SELECT MAPTOSTRING(vmap) FROM bakery limit 5;
                     maptostring
-----------------------------------------------------
 {
   "id" : "5001",
   "type" : "None"
}
 {
   "id" : "5002",
   "type" : "Glazed"
}
 {
   "id" : "5004",
   "type" : "Maple"
}
 {
   "id" : "5005",
   "type" : "Sugar"
}
 {
   "id" : "5007",
   "type" : "Powdered Sugar"
}
(5 rows)
If you load the data into a flex table, you must qualify the filler column to disambiguate it from possible fields in the VMap.  Use the following syntax to refer to a filler column when loading into a flex table:
=> CREATE FLEX TABLE bakery2(id IDENTITY(1,1), vmap LONG VARBINARY(10000));
CREATE TABLE
=> COPY bakery2 (json FILLER VARCHAR(128), 
                 vmap AS MapJSONExtractor("*FILLER*".json))
   FROM '/home/dbadmin/data/bakery.json';
 Rows Loaded
-------------
           5
(1 row)
If you call MAPTOSTRING on the __raw__ column in this flex table, the order of elements might be different from the previous example, but the output is otherwise the same.
See also
 
    
	
  
    
    
	
    
    
	3 - MAPREGEXEXTRACTOR
    Extracts data with a regular expression and returns results as a VMap.
	Extracts data with a regular expression and returns results as a VMap.
Syntax
MAPREGEXEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])
Arguments
- record-value
- String containing a JSON or delimited format record on which to apply the regular expression.
Parameters
- pattern
- Regular expression used to extract the desired data.
Default: Empty string ('')
 
- use_jit
- Boolean, use just-in-time compiling when parsing the regular expression.
Default: false
 
- record_terminator
- Character used to separate input records.
Default: \n
 
- logline_column
- Destination column containing the full string that the regular expression matched.
Default: Empty string ('')
 
Examples
These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.
Caution
For display purposes, this sample regular expression adds new line characters to split long lines of text. To use this expression in a query, first copy and edit the example to remove any new line characters.
This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).
'^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'
The following examples may include newline characters for display purposes.
- 
Create a flex table, flogs:
 => CREATE FLEX TABLE flogs();
CREATE TABLE
 
- 
Use COPY to load a sample log file (vertica.log), using the flex tablefregexparser. Note that this example includes added line characters for displaying long text lines.
 => COPY flogs FROM '/home/dbadmin/tempdat/vertica.log' PARSER FREGEXPARSER(pattern='
^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+) (?<thread_name>[A-Za-z ]+):
(?<thread_id>(?:0x)?[0-9a-f])-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )?(?<text>.*)');
Rows Loaded
-------------
81399
(1 row)
 
- 
Use to return the results from calling MAPREGEXEXTRACTOR with a regular expression. The output returns the results of the function in string format. => SELECT MAPTOSTRING(MapregexExtractor(E'2014-04-02 04:02:51.011
TM Moveout:0x2aab9000f860-a0000000002067 [Txn] <INFO>
Begin Txn: a0000000002067 \'Moveout: Tuple Mover\'' using PARAMETERS
pattern='^(?<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)
 (?<thread_name>[A-Za-z ]+):(?<thread_id>(?:0x)?[0-9a-f]+)
-?(?<transaction_id>[0-9a-f])?(?:[(?<component>\w+)]
\<(?<level>\w+)\> )?(?:<(?<elevel>\w+)> @[?(?<enode>\w+)]?: )
?(?<text>.*)'
 
 )) FROM flogs where __identity__=13;
maptostring
--------------------------------------------------------------------------------------------------
{
"component" : "Txn",
"level" : "INFO",
"text" : "Begin Txn: a0000000002067 'Moveout: Tuple Mover'",
"thread_id" : "0x2aab9000f860",
"thread_name" : "TM Moveout",
"time" : "2014-04-02 04:02:51.011",
"transaction_id" : "a0000000002067"
}
(1 row)
 
See also