The flex extractor scalar functions process polystructured data. Each function accepts input data that is any of:
Existing database content
A table
Returned from an expression
Entered directly
These functions do not parse data from an external file source. All functions return a single VMap value. The extractor functions can return data with NULL-specified columns.
1 - MAPDELIMITEDEXTRACTOR
Extracts data with a delimiter character and other optional arguments, returning a single VMap value.
Extracts data with a delimiter character and other optional arguments, returning a single VMap value.
Syntax
MAPDELIMITEDEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])
Arguments
record-value
String containing a JSON or delimited format record on which to apply the expression.
Parameters
delimiter
Single delimiter character.
Default:|
header_names
Delimiter-separated list of column header names.
Default:ucoln, where n is the column offset number, starting with 0 for the first column.
trim
Boolean, trim white space from header names and field values.
Default:true
treat_empty_val_as_null
Boolean, set empty fields to NULL rather than an empty string ('').
Create a columnar table, dtab, with an identity id column, a delim column, and a vmap column to hold a VMap:
=> CREATE TABLE dtab (id IDENTITY(1,1), delim varchar(128), vmap long varbinary(512));
CREATE TABLE
Use COPY to load the delim.dat file into the dtab table. MAPDELIMITEDEXTRACTOR uses the header_names parameter to specify a header row for the sample data, along with delimiter '!' :
=> COPY dtab(delim, vmap AS MAPDELIMITEDEXTRACTOR (delim
USING PARAMETERS header_names='Name|CITY|New City|State|Zip')) FROM '/home/dbadmin/data/delim.dat'
DELIMITER '!';
Rows Loaded
-------------
4
(1 row)
Use MAPTOSTRING for the flex table dflex to view the __raw__ column contents. Notice the default header names in use (ucol0 – ucol4), since you specified header='false' when you loaded the flex table:
Use MAPTOSTRING again, this time with the dtab table's vmap column. Compare the results of this output to those for the flex table. Note that MAPTOSTRING returns the header_name parameter values you specified when you loaded the data:
Extracts content of repeated JSON data objects,, including nested maps, or data with an outer list of JSON elements.
Produces a VMap of key/value pairs from an input JSON string. You typically use this function with COPY to populate a VMap column from another string column containing the JSON.
Empty input does not generate warnings or errors.
Syntax
MAPJSONEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])
Arguments
record-value
String containing a JSON or delimited format record on which to apply the expression.
Parameters
flatten_maps (Boolean)
If true, flatten sub-maps within the JSON data, separating map levels with periods (.).
Default:true
flatten_arrays (Boolean)
If true, convert lists to sub-maps with integer keys. Lists are not flattened by default.
Default value:false
reject_on_duplicate (Boolean)
If true, reject duplicates. If false, ignore duplicate records. In either case, loading is unaffected.
Default:false
reject_on_empty_key (Boolean)
If true, reject any row that contains a key without a value.
Default:false
omit_empty_keys (Boolean)
If true, omit any key from the data without a value.
Default:false
start_point (String)
Name of a key in the JSON load data at which to begin parsing. The parser ignores all data before the start_point value. The parser processes data after the first instance, and up to the second, ignoring any remaining data.
Default: none
Examples
This example uses the following sample JSON data in a file named bakery.json:
If you load the data into a flex table, you must qualify the filler column to disambiguate it from possible fields in the VMap. Use the following syntax to refer to a filler column when loading into a flex table:
If you call MAPTOSTRING on the __raw__ column in this flex table, the order of elements might be different from the previous example, but the output is otherwise the same.
Extracts data with a regular expression and returns results as a VMap.
Extracts data with a regular expression and returns results as a VMap.
Syntax
MAPREGEXEXTRACTOR (record-value [ USING PARAMETERS param=value[,...] ])
Arguments
record-value
String containing a JSON or delimited format record on which to apply the regular expression.
Parameters
pattern
Regular expression used to extract the desired data.
Default: Empty string ('')
use_jit
Boolean, use just-in-time compiling when parsing the regular expression.
Default:false
record_terminator
Character used to separate input records.
Default:\n
logline_column
Destination column containing the full string that the regular expression matched.
Default: Empty string ('')
Examples
These examples use the following regular expression, which searches for information that includes the timestamp, date, thread_name, and thread_id strings.
Caution
For display purposes, this sample regular expression adds new line characters to split long lines of text. To use this expression in a query, first copy and edit the example to remove any new line characters.
This example expression loads any thread_id hex value, regardless of whether it has a 0x prefix, (<thread_id>(?:0x)?[0-9a-f]+).
The following examples may include newline characters for display purposes.
Create a flex table, flogs:
=> CREATE FLEX TABLE flogs();
CREATE TABLE
Use COPY to load a sample log file (vertica.log), using the flex table fregexparser. Note that this example includes added line characters for displaying long text lines.