FJSONPARSER

Parses and loads a JSON file.

Parses and loads a JSON file. This file can contain either repeated JSON data objects (including nested maps), or an outer list of JSON elements.

When loading into a flex or hybrid table, the parser stores the JSON data in a single-value VMap. When loading into a hybrid or columnar table, the parser loads data directly into any table column with a column name that matches a key in the JSON source data.

You can load complex types in the JSON source (arrays, structs, or combinations) with strong typing or as flexible complex types. A flexible complex type is loaded into a VMap column, as in flex tables. To load complex types as VMap columns, specify a column type of LONG VARBINARY. To preserve the indexing in complex types, set flatten_maps to false.

This parser can notify you if it finds keys in the data that are not part of the table definition. See Unmatched Keys.

FJSONPARSER supports cooperative parse only if record_terminator is specified. It does not support apportioned load.

Syntax

FJSONPARSER ( [parameter=value[,...]] )

Parameters

flatten_maps
Boolean, whether to flatten sub-maps within the JSON data, separating map levels with a period (.). This value affects all data in the load, including nested maps.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: true

flatten_arrays
Boolean, whether to convert lists to sub-maps with integer keys. When lists are flattened, key names are concatenated as for maps. Lists are not flattened by default. This value affects all data in the load, including nested lists.

This parameter applies only to flex tables or VMap columns and is ignored when loading strongly-typed complex types.

Default: false

reject_on_duplicate
Boolean, whether to ignore duplicate records (false), or to reject duplicates (true). In either case, the load continues.

Default: false

reject_on_empty_key
Boolean, whether to reject any row containing a field key without a value.

Default: false

omit_empty_keys
Boolean, whether to omit any field key from the data that does not have a value. Other fields in the same record are loaded.

Default: false

record_terminator
When set, any invalid JSON records are skipped and parsing continues with the next record. Records must be terminated uniformly. For example, if your input file has JSON records terminated by newline characters, set this parameter to E'\n'). If any invalid JSON records exist, parsing continues after the next record_terminator.

Even if the data does not contain invalid records, specifying an explicit record terminator can improve load performance by allowing cooperative parse and apportioned load to operate more efficiently.

When you omit this parameter, parsing ends at the first invalid JSON record.

reject_on_materialized_type_error

Boolean, whether to reject a data row that contains a materialized column value that cannot be coerced into a compatible data type. If the value is false and the type cannot be coerced, the parser sets the value in that column to NULL.

If the column is an array and the data to be loaded is too large, then false sets the column value to NULL and true rejects the row.

If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types.

Default: false

start_point
String, the name of a key in the JSON load data at which to begin parsing. The parser ignores all data before the start_point value. The value is loaded for each object in the file. The parser processes data after the first instance, and up to the second, ignoring any remaining data.
start_point_occurrence
Integer, the nth occurrence of the value you specify with start_point. Use in conjunction with start_point when the data has multiple start values and you know the occurrence at which to begin parsing.

Default: 1

suppress_nonalphanumeric_key_chars
Boolean, whether to suppress non-alphanumeric characters in JSON key values. The parser replaces these characters with an underscore (_) when this parameter is true.

Default: false

key_separator
Character for the parser to use when concatenating key names.

Default: period (.)

suppress_warnings
String, which warnings to suppress:
  • unmatched_key (see Unmatched Keys)

  • true or t (suppress all warnings)

  • false or f (do not suppress warnings)

Default: false

Data types

If the total size of an array exceeds the size defined by the target table, the parser sets the value to null.

Unmatched keys

Data being loaded can contain keys that are not part of the table definition. If you are loading into a flex table (or a flexible complex type column), no data is lost. For a table with strongly-defined columns, however, new keys cannot be loaded because the table does not have a place to put them.

This parser emits warnings if it finds new keys and if both of the following are true:

  • The target table is not a flex table.

  • The new key is not nested within a flexible complex type column.

New keys are logged in the UDX_EVENTS system table. If a new key is a complex type with nested keys, only the top-level key is logged. When you see a warning about unmatched keys, you can query this table and then use ALTER TABLE to modify your table definition for future loads.

Querying an external table loads data and thus can trigger these warnings. To prevent them, set the suppress_warnings parameter to 'unmatched_keys' or 'true':

=> CREATE EXTERNAL TABLE restaurants(
            name VARCHAR(50),
            menu ARRAY[ROW(item VARCHAR(50), price NUMERIC(8,2)),100])
   AS COPY FROM '/data/rest.json'
   PARSER FJSONPARSER(suppress_warnings='unmatched_key');

Examples

The following example loads JSON data from STDIN using the default parameters:

=> CREATE TABLE people(age INT, name VARCHAR);
CREATE TABLE

=> COPY people FROM STDIN PARSER FJSONPARSER();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"age": 5, "name": "Tim"}
>>  {"age": 3}
>>  {"name": "Fred"}
>>  {"name": "Bob", "age": 10}
>> \.
=> SELECT * FROM people;
 age | name
-----+------
     | Fred
  10 | Bob
   5 | Tim
   3 |
(4 rows)

The following example uses the reject_on_duplicate parameter to reject duplicate values:

=> CREATE FLEX TABLE json_dupes();
CREATE TABLE
=> COPY json_dupes FROM stdin PARSER fjsonparser(reject_on_duplicate=true)
exceptions '/home/dbadmin/load_errors/json_e.out'
rejected data '/home/dbadmin/load_errors/json_r.out';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"a":"1","a":"2","b":"3"}
>> \.
=>  \!cat /home/dbadmin/load_errors/json_e.out
COPY: Input record 1 has been rejected (Rejected by user-defined parser).
Please see /home/dbadmin/load_errors/json_r.out, record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.

The following example loads array data:

$ cat addrs.json
 {"number": 301, "street": "Grant", "attributes": [1, 2, 3, 4]}

=> CREATE EXTERNAL TABLE customers(number INT, street VARCHAR, attributes ARRAY[INT])
    AS COPY FROM 'addrs.json' PARSER fjsonparser();

=> SELECT number, street, attributes FROM customers;
 num | street| attributes
-----+-----------+---------------
301  | Grant | [1,2,3,4]
(1 row)

The following example loads a flexible complex type, rejecting rows that have empty keys within the nested records. Notice that while the data has two restaurants, one has a key name that is an empty string. This one is rejected:

$ cat rest1.json
{
    "name" : "Bob's pizzeria",
    "cuisine" : "Italian",
    "location_city" : ["Cambridge", "Pittsburgh"],
    "menu" : [{"item" : "cheese pizza", "" : "$8.25"},
              {"item" : "spinach pizza", "price" : "$10.50"}]
}
{
    "name" : "Bakersfield Tacos",
    "cuisine" : "Mexican",
    "location_city" : ["Pittsburgh"],
    "menu" : [{"item" : "veggie taco", "price" : "$9.95"},
              {"item" : "steak taco", "price" : "$10.95"}]
}

=> CREATE TABLE rest (name VARCHAR, cuisine VARCHAR, location_city LONG VARBINARY, menu LONG VARBINARY);

=> COPY rest FROM '/data/rest1.json'
    PARSER fjsonparser(flatten_maps=false, reject_on_empty_key=true);
Rows Loaded
------------
       1
(1 row)

=> SELECT maptostring(location_city), maptostring(menu) FROM rest;
        maptostring        |                          maptostring
---------------------------+-------------------------------------------------------
 {
    "0": "Pittsburgh"
} | {
    "0": {
        "item": "veggie taco",
        "price": "$9.95"
    },
    "1": {
        "item": "steak taco",
        "price": "$10.95"
    }
}
(1 row)

To instead load partial data, use omit_empty_keys to bypass the missing keys while loading everything else:



=> COPY rest FROM '/data/rest1.json'
    PARSER fjsonparser(flatten_maps=false, omit_empty_keys=true);
 Rows Loaded
-------------
           2
(1 row)

=> SELECT maptostring(location_city), maptostring(menu) from rest;
                   maptostring                   |               maptostring
-------------------------------------------------+---------------------------------
 {
    "0": "Pittsburgh"
}                       | {
    "0": {
        "item": "veggie taco",
        "price": "$9.95"
    },
    "1": {
        "item": "steak taco",
        "price": "$10.95"
    }
}
 {
    "0": "Cambridge",
    "1": "Pittsburgh"
} | {
    "0": {
        "item": "cheese pizza"
    },
    "1": {
        "item": "spinach pizza",
        "price": "$10.50"
    }
}
(2 rows)

To instead load this data with strong typing, define the complex types in the table:

=> CREATE EXTERNAL TABLE restaurants
  (name VARCHAR, cuisine VARCHAR,
   location_city ARRAY[VARCHAR(80)],
   menu ARRAY[ ROW(item VARCHAR(80), price FLOAT) ]
  )
 AS COPY FROM '/data/rest.json' PARSER fjsonparser();

=> SELECT * FROM restaurants;
       name        | cuisine |       location_city        |                    \
                menu
-------------------+---------+----------------------------+--------------------\
--------------------------------------------------------
 Bob's pizzeria    | Italian | ["Cambridge","Pittsburgh"] | [{"item":"cheese pi\
zza","price":0.0},{"item":"spinach pizza","price":0.0}]
 Bakersfield Tacos | Mexican | ["Pittsburgh"]             | [{"item":"veggie ta\
co","price":0.0},{"item":"steak taco","price":0.0}]
(2 rows)

In the following example, the data contains two new fields. One is a top-level field (a new column), and the other is a new field on an existing struct. The new fields are recorded in the UDX_EVENTS system table:

=> COPY rest FROM '/data/rest2.json' PARSER FJSONPARSER();
WARNING 10596:  Warning in UDx call in user-defined object [FJSONParser], code: 0, message:
Data source contained keys which did not match table schema
HINT:  SELECT key, sum(num_instances) FROM v_monitor.udx_events WHERE event_type = 'UNMATCHED_KEY' GROUP BY key
 Rows Loaded
-------------
           2
(1 row)

=> SELECT key, SUM(num_instances) FROM v_monitor.UDX_EVENTS
   WHERE event_type = 'UNMATCHED_KEY' GROUP BY key;
          key           | SUM
------------------------+-----
 chain                  |   1
 menu.elements.calories |   7
(2 rows)

For other examples, see JSON data.