JSON data
Use FJSONPARSER to load data in JSON format.
The schema for JSON data is the set of property names in the property/value pairs. When you load JSON data into a columnar table or materialized columns in a Flex table, the property names in the data must match the column names in the table. You do not need to load all of the properties in the data.
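As an illustration of the matching rule only, the following Python sketch projects one parsed JSON record onto a list of table column names. Properties that match a column are kept and extra properties are dropped. The function name `project_record` is hypothetical. Two points are assumptions of this sketch and are not stated above: names are matched exactly, and a column with no matching property becomes None (standing in for NULL).

```python
import json

def project_record(record: dict, columns: list) -> dict:
    """Hypothetical model: keep only properties whose names match a column.

    Assumptions: exact name matching; a column with no matching
    property gets None (standing in for SQL NULL).
    """
    return {col: record.get(col) for col in columns}

raw = '{"id": 7, "address": "12 Main St", "transactions": [1, 2], "notes": "vip"}'
row = project_record(json.loads(raw), ["id", "address", "transactions", "phone"])
print(row)
```

The extra `notes` property never reaches the row, which mirrors the statement that you do not need to load every property in the data.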
The JSON parser can load data into columns of any scalar type, strongly-typed complex type, or flexible complex type. A flexible complex type means you do not fully specify the schema for that column. You define these columns in the table as LONG VARBINARY, and you can use Flex functions to extract values from them.
In the COPY statement, use the PARSER parameter to specify the JSON parser as in the following example:
=> CREATE EXTERNAL TABLE customers(id INT, address VARCHAR, transactions ARRAY[INT,10])
AS COPY FROM 'cust.json' PARSER FJSONPARSER();
This parser has several optional parameters, some of which are specific to use with Flex tables and flexible complex types.
Before loading JSON data, consider using a tool such as JSONLint to verify that the data is valid.
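If you prefer to check files locally, Python's standard `json` module can serve a similar purpose. This sketch assumes newline-delimited records (one JSON object per line, as in the STDIN examples later on this page) and reports the line numbers that fail to parse. The function name `invalid_lines` is hypothetical.

```python
import json

def invalid_lines(text: str) -> list:
    """Return 1-based numbers of non-blank lines that are not valid JSON.

    Assumes one JSON record per line (newline-delimited JSON).
    """
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are not records
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
    return bad

sample = '{"name": "apple", "rating": 5}\n"type": "fruit", "rating": 7 }\n'
print(invalid_lines(sample))
```

For a file that holds a single JSON document spread over many lines, call `json.loads` on the whole text instead of line by line.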
If you load JSON data into a Flex table, Vertica loads all data into the __raw__ (VMap) column, including complex types found in the data. You can use Flex functions to extract values.
Strongly-typed complex types
JSON data can contain arrays, structs, and combinations of the two. You can load this data either as flexible (VMap) columns or with strong typing. Strong typing allows you to query values directly, without having to use functions to unpack a VMap column.
Use the ARRAY and ROW types in the table definition as usual:
=> CREATE EXTERNAL TABLE rest
(name VARCHAR, cuisine VARCHAR,
location_city ARRAY[VARCHAR(80),50],
menu ARRAY[ ROW(item VARCHAR(80), price FLOAT), 100 ]
)
AS COPY FROM :restdata PARSER FJSONPARSER();
=> SELECT name, location_city, menu FROM rest;
name | location_city | menu
-------------------+----------------------------+------------------------------------------------------------------------------
Bob's pizzeria | ["Cambridge","Pittsburgh"] | [{"item":"cheese pizza","price":8.25},{"item":"spinach pizza","price":10.5}]
Bakersfield Tacos | ["Pittsburgh"] | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)
When loading JSON data into a table with strong typing for complex types, Vertica ignores the parser's flatten_maps and flatten_arrays parameters.
Strong and flexible complex types
An advantage of strong typing is easier (and more efficient) access in queries. A disadvantage is that values found in the data but not included in the column definition are ignored. If the menu struct in this data includes more attributes, such as calories, they are not loaded because the column definition specifies only item and price. The following example uses flexible complex types to reveal the extra attributes:
=> CREATE EXTERNAL TABLE rest
(name VARCHAR, cuisine VARCHAR,
location_city LONG VARBINARY, menu LONG VARBINARY)
AS COPY FROM :restdata
PARSER FJSONPARSER(flatten_maps=false);
=> SELECT name, MAPTOSTRING(location_city) as location_city, MAPTOSTRING(menu) AS menu FROM rest;
name | location_city | menu
-------------------+--------------------------+---------------------------------------------------
Bob's pizzeria | {
"0": "Cambridge",
"1": "Pittsburgh"
} | {
"0": {
"calories": "1200",
"item": "cheese pizza",
"price": "8.25"
},
"1": {
"calories": "900",
"item": "spinach pizza",
"price": "10.50"
}
}
Bakersfield Tacos | {
"0": "Pittsburgh"
} | {
"0": {
"item": "veggie taco",
"price": "9.95",
"vegetarian": "true"
},
"1": {
"item": "steak taco",
"price": "10.95"
}
}
(2 rows)
Loading from a specific start point
You need not load an entire JSON file. You can use the start_point parameter to load data beginning at a specific key, rather than at the beginning of a file. Data is parsed from after the start_point key until the end of the file, or to the end of the first start_point key's value. The parser ignores any subsequent instance of the start_point key, even if that key appears multiple times in the input file. If the input data contains only one copy of the start_point key, and its value is a list of JSON elements, the parser loads each element in the list as a row.
If the start_point key occurs more than once in your JSON data, you can use the start_point_occurrence integer parameter to specify the occurrence at which to start parsing.
This example uses the following JSON data, saved to a file named alphanums.json:
{ "A": { "B": { "C": [ { "d": 1, "e": 2, "f": 3 }, { "g": 4, "h": 5, "i": 6 },
{ "j": 7, "k": 8, "l": 9 } ] } } }
Loading this data into a flex table produces the following results:
=> CREATE FLEX TABLE start_json;
CREATE TABLE
=> COPY start_json FROM '/home/dbadmin/data/alphanums.json' PARSER FJSONPARSER();
Rows Loaded
-------------
1
(1 row)
=> SELECT maptostring(__raw__) FROM start_json;
maptostring
-------------------------------------------------------------------------------
{
"A.B.C" : {
"0.d" : "1",
"0.e" : "2",
"0.f" : "3",
"1.g" : "4",
"1.h" : "5",
"1.i" : "6",
"2.j" : "7",
"2.k" : "8",
"2.l" : "9"
}
}
(1 row)
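The dotted key names in this output can be reproduced with a small recursive model. The model assumes that flatten_maps is on and flatten_arrays is off for this load, which is consistent with the output above: nested object keys are joined with a period, and arrays become nested maps keyed by element index. The Python below is a hypothetical model written to match the output shown on this page, not Vertica's implementation. `to_vmap` is an illustrative name.

```python
import json

def _scalar(value):
    # Simplified rendering of leaf values as strings. JSON booleans become
    # "true"/"false" and null becomes None. Python's str() does not preserve
    # the original numeric text (10.50 would print as 10.5).
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return None
    return str(value)

def to_vmap(value, flatten_maps=True, flatten_arrays=False):
    """Hypothetical model of nested JSON (a dict or list) as a map of string keys."""
    if isinstance(value, dict):
        items = value.items()
    else:  # a list: its keys are the element indices
        items = ((str(i), v) for i, v in enumerate(value))
    out = {}
    for key, child in items:
        flatten = (isinstance(child, dict) and flatten_maps) or (
            isinstance(child, list) and flatten_arrays)
        if flatten:
            # Merge the child's keys into this level, joined with a period.
            for sub_key, sub_val in to_vmap(child, flatten_maps, flatten_arrays).items():
                out[f"{key}.{sub_key}"] = sub_val
        elif isinstance(child, (dict, list)):
            # Keep the child as a nested map under its own key.
            out[key] = to_vmap(child, flatten_maps, flatten_arrays)
        else:
            out[key] = _scalar(child)
    return out

alphanums = json.loads(
    '{ "A": { "B": { "C": [ { "d": 1, "e": 2, "f": 3 }, { "g": 4, "h": 5, "i": 6 },'
    ' { "j": 7, "k": 8, "l": 9 } ] } } }'
)
print(json.dumps(to_vmap(alphanums), indent=2))
```

Calling `to_vmap(..., flatten_maps=False)` keeps every level nested and keys array elements by index, similar to the "0"/"1" structure in the MAPTOSTRING output of the flexible-complex-type example earlier on this page.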
The following load specifies a start point:
=> TRUNCATE TABLE start_json;
TRUNCATE TABLE
=> COPY start_json FROM '/home/dbadmin/data/alphanums.json' PARSER FJSONPARSER(start_point='B');
Rows Loaded
-------------
1
(1 row)
=> SELECT maptostring(__raw__) FROM start_json;
maptostring
--------------------------------------------------------------------------------
{
"C" : {
"0.d" : "1",
"0.e" : "2",
"0.f" : "3",
"1.g" : "4",
"1.h" : "5",
"1.i" : "6",
"2.j" : "7",
"2.k" : "8",
"2.l" : "9"
}
}
(1 row)
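The start_point selection rules described in this section can be modeled in Python. This sketch assumes that occurrences are counted in depth-first document order, which this page does not specify. The names `rows_from_start_point` and `occurrence` are hypothetical, and `occurrence` stands in for start_point_occurrence.

```python
import json

def _occurrences(node, key):
    """Yield every value stored under `key`, in depth-first document order."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from _occurrences(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from _occurrences(item, key)

def rows_from_start_point(doc, start_point, occurrence=1):
    """Hypothetical model of start_point / start_point_occurrence selection."""
    for n, value in enumerate(_occurrences(doc, start_point), start=1):
        if n == occurrence:
            # A list value yields one row per element; anything else is one row.
            return value if isinstance(value, list) else [value]
    return []  # the start_point key was not found

alphanums = json.loads(
    '{ "A": { "B": { "C": [ { "d": 1, "e": 2, "f": 3 }, { "g": 4, "h": 5, "i": 6 },'
    ' { "j": 7, "k": 8, "l": 9 } ] } } }'
)
rows = rows_from_start_point(alphanums, "B")
print(len(rows), list(rows[0]))
```

With start_point `B`, the model yields one row whose only key is `C`, as in the load above. With start_point `C`, whose value is a list, it yields three rows.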
Dealing with invalid JSON records
If your JSON data contains syntax errors, your load can fail due to invalid records. If your JSON records are consistently delimited by a character such as a line break, you can use the parser's RECORD_TERMINATOR parameter to skip these invalid records. Setting a record terminator allows the parser to skip over invalid records and continue parsing the rest of the data.
If your records are not consistently delimited by a character, you can use the COPY statement's ERROR TOLERANCE option. ERROR TOLERANCE skips entire source files that contain invalid JSON records, while RECORD_TERMINATOR skips individual malformed JSON records. You can use the two options together.
The following example loads data that includes one invalid record:
=> CREATE FLEX TABLE fruits();
CREATE TABLE
=> COPY fruits FROM STDIN PARSER FJSONPARSER(RECORD_TERMINATOR=E'\n');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"name": "orange", "type": "fruit", "color": "orange", "rating": 5 }
>> {"name": "apple", "type": "fruit", "color": "green" }
>> {"name": "blueberry", "type": "fruit", "color": "blue", "rating": 10 }
>> "type": "fruit", "rating": 7 }
>> {"name": "banana", "type" : "fruit", "color": "yellow", "rating": 3 }
>> \.
View the flex table using MAPTOSTRING to confirm that the invalid record was skipped while the rest of the records were successfully loaded:
=> SELECT MAPTOSTRING(__raw__) FROM fruits;
maptostring
--------------------------------------------------------------------------------------------
{
"color" : "orange",
"name" : "orange",
"rating" : "5",
"type" : "fruit"
}
{
"color" : "green",
"name" : "apple",
"type" : "fruit"
}
{
"color" : "blue",
"name" : "blueberry",
"rating" : "10",
"type" : "fruit"
}
{
"color" : "yellow",
"name" : "banana",
"rating" : "3",
"type" : "fruit"
}
(4 rows)
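The skip-and-continue behavior shown above can be modeled client-side. This Python sketch splits the input on a terminator, parses each chunk independently, and sets aside chunks that are not valid JSON objects. `load_with_terminator` is a hypothetical name. Treating a valid non-object value (such as a bare number) as invalid is an assumption of this sketch.

```python
import json

def load_with_terminator(text: str, terminator: str = "\n"):
    """Hypothetical model: split on the terminator and skip records that fail to parse."""
    loaded, rejected = [], []
    for chunk in text.split(terminator):
        if not chunk.strip():
            continue  # nothing between two terminators
        try:
            record = json.loads(chunk)
        except json.JSONDecodeError:
            rejected.append(chunk)
            continue
        if not isinstance(record, dict):
            rejected.append(chunk)  # assumption: only JSON objects count as records
            continue
        loaded.append(record)
    return loaded, rejected

data = (
    '{"name": "orange", "type": "fruit", "color": "orange", "rating": 5 }\n'
    '{"name": "apple", "type": "fruit", "color": "green" }\n'
    '{"name": "blueberry", "type": "fruit", "color": "blue", "rating": 10 }\n'
    '"type": "fruit", "rating": 7 }\n'
    '{"name": "banana", "type" : "fruit", "color": "yellow", "rating": 3 }\n'
)
loaded, rejected = load_with_terminator(data)
print(len(loaded), rejected)
```

Because each chunk is parsed on its own, one malformed line cannot disturb its neighbors. This is why a consistent delimiter is required for this approach.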
Rejecting data on materialized column type errors
By default, if FJSONPARSER cannot coerce a data value to a type that matches the column definition, it sets the value to NULL. You can instead reject these values by using the reject_on_materialized_type_error parameter. If this parameter is true, COPY rejects such rows and reports an error.
If the column is a strongly-typed complex type, as opposed to a flexible complex type, then a type mismatch anywhere in the complex type causes the entire column to be treated as a mismatch. The parser does not partially load complex types; if any ROW field or ARRAY element cannot be coerced, it loads NULL for the column.
The following example attempts to load invalid data. Note that the invalid row is missing from the query results:
=> CREATE TABLE test(one VARCHAR, two INT);
CREATE TABLE
=> COPY test FROM stdin
PARSER FJSONPARSER(reject_on_materialized_type_error=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"one": 1, "two": 2}
>> {"one": "one", "two": "two"}
>> {"one": "one", "two": 2}
>> \.
=> SELECT one, two FROM test;
one | two
-----+-----
1 | 2
one | 2
(2 rows)
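The difference between the default NULL behavior and rejection, plus the all-or-nothing rule for complex types, can be sketched in Python. The coercion rules here are deliberately simplified assumptions, not Vertica's casting rules. `coerce`, `load_rows`, and `reject_on_type_error` are hypothetical names, and `reject_on_type_error` stands in for reject_on_materialized_type_error.

```python
import json

class CoercionError(ValueError):
    pass

def coerce(value, col_type):
    """Simplified, hypothetical coercion rules for this sketch only."""
    if value is None:
        return None
    if col_type == "varchar":
        if isinstance(value, (str, int, float)) and not isinstance(value, bool):
            return str(value)
        raise CoercionError(value)
    if col_type == "int":
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        if isinstance(value, str):
            try:
                return int(value)
            except ValueError:
                raise CoercionError(value) from None
        raise CoercionError(value)
    if isinstance(col_type, tuple) and col_type[0] == "array":
        if not isinstance(value, list):
            raise CoercionError(value)
        # All-or-nothing: one bad element fails the whole array column.
        return [coerce(v, col_type[1]) for v in value]
    raise CoercionError(f"unsupported type {col_type!r}")

def load_rows(records, schema, reject_on_type_error=False):
    loaded, rejected = [], []
    for rec in records:
        row, bad = {}, False
        for col, col_type in schema.items():
            try:
                row[col] = coerce(rec.get(col), col_type)
            except CoercionError:
                row[col] = None  # default: the mismatched column becomes NULL
                bad = True
        if bad and reject_on_type_error:
            rejected.append(rec)  # the whole row is rejected instead
        else:
            loaded.append(row)
    return loaded, rejected

records = [json.loads(s) for s in (
    '{"one": 1, "two": 2}', '{"one": "one", "two": "two"}', '{"one": "one", "two": 2}')]
schema = {"one": "varchar", "two": "int"}
loaded, rejected = load_rows(records, schema, reject_on_type_error=True)
print(loaded)
```

With rejection enabled, the model keeps the same two rows as the query output above. Without it, the second row would load with `two` set to None.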
Rejecting or omitting empty keys in flex tables
Valid JSON files can include key/value pairs whose key is empty. By default, for a Flex table, FJSONPARSER loads them, as in the following example:
=> CREATE FLEX TABLE fruits();
CREATE TABLE
=> COPY fruits FROM STDIN PARSER FJSONPARSER();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"name": "orange", "rating": 5}
>> {"name": "apple", "rating" : 10}
>> {"": "banana", "rating" : 2}
>> \.
=> SELECT MAPTOSTRING(__raw__) FROM fruits;
maptostring
---------------------------------------
{
"name": "orange",
"rating": "5"
}
{
"name": "apple",
"rating": "10"
}
{
"": "banana",
"rating": "2"
}
(3 rows)
To omit fields with empty keys, use the omit_empty_keys parameter:
=> TRUNCATE TABLE fruits;
TRUNCATE TABLE
=> COPY fruits FROM STDIN PARSER FJSONPARSER(omit_empty_keys=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"name": "apple", "rating": 5}
>> {"": "missing", "rating" : 1}
>> {"name": "", "rating" : 3}
>> \.
=> SELECT MAPTOSTRING(__raw__) FROM fruits;
maptostring
--------------------------------------
{
"name": "apple",
"rating": "5"
}
{
"rating": "1"
}
{
"name": "",
"rating": "3"
}
(3 rows)
Note that the second record still loads, keeping only its field with a non-empty key (rating). To instead reject the row entirely, use the reject_on_empty_key parameter:
=> TRUNCATE TABLE fruits;
TRUNCATE TABLE
=> COPY fruits FROM STDIN PARSER FJSONPARSER(reject_on_empty_key=true);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"name" : "apple", "rating" : 5}
>> {"" : "missing", "rating" : 1}
>> {"name" : "", "rating" : 3}
>> \.
=> SELECT MAPTOSTRING(__raw__) FROM fruits;
maptostring
--------------------------------------
{
"name": "apple",
"rating": "5"
}
{
"name": "",
"rating": "3"
}
(2 rows)
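The three empty-key behaviors above (load, omit, reject) can be summarized in a short Python model. This sketch checks top-level keys only. If both options are set, it applies rejection first. Both choices are assumptions, not documented behavior. `apply_empty_key_policy` is a hypothetical name. An empty value, such as `"name": ""`, is never affected, which matches the third record in the outputs above.

```python
import json

def apply_empty_key_policy(records, omit_empty_keys=False, reject_on_empty_key=False):
    """Hypothetical model of the empty-key options (top-level keys only)."""
    loaded = []
    for rec in records:
        if "" in rec and reject_on_empty_key:
            continue  # drop the whole record
        if omit_empty_keys:
            rec = {k: v for k, v in rec.items() if k != ""}  # drop only the empty-key field
        loaded.append(rec)
    return loaded

records = [json.loads(s) for s in (
    '{"name": "apple", "rating": 5}',
    '{"": "missing", "rating": 1}',
    '{"name": "", "rating": 3}',
)]
print(apply_empty_key_policy(records, omit_empty_keys=True))
print(apply_empty_key_policy(records, reject_on_empty_key=True))
```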