Querying nested data

If you load JSON or Avro data with flatten_arrays=FALSE (the default), VMap data in the raw column can contain multiple nested structures.

If you load JSON or Avro data with flatten_arrays=FALSE (the default), VMap data in the __raw__ column can contain multiple nested structures. In fact, any VMap JSON or Avro data can contain nested structures. This section describes how best to query such data.

Query VMap nested values

To query a nested structure, you can use multiple maplookup() functions, one for each level. However, the most efficient method is to use bracket ([]) operators.

When parsing or extracting VMap data, the default behavior is to flatten data. Flattened VMap data concatenates key names into one long name, delimiting elements with either the default delimiter (.), or a user-defined delimiter character.

To use bracket operators for nested structures in your VMap data, the data must not be flattened. Further, you cannot use bracket operators on any existing, flattened VMap data.

To load or extract VMap data correctly, specify flatten_maps=FALSE for fjsonparser, favroparser, and the mapjsonextractor() function.

Bracket operators for nested JSON

This example uses the following JSON data as an example of nested data. Save this data as restaurant.json:

{
  "restaurant" : {
      "_name_" : "Bob's pizzeria",
      "cuisine" : "Italian",
      "location" : {"city" : "Cambridge", "zip" : "02140"},
      "menu" : [{"item" : "cheese pizza", "price" : "$8.25"},
                {"item" : "chicken pizza", "price" : "$11.99"},
                {"item" : "spinach pizza", "price" : "$10.50"}]
  }
}

Create a flex table, rests, and load it with the restaurant.json file:

=> COPY rests FROM '/home/dbadmin/tempdat/restaurant.json' PARSER fjsonparser (flatten_maps=false);
 Rows Loaded
-------------
           1
(1 row)

After loading your data into a flex table, there are two ways to access nested data using brackets:

  • Beginning with the __raw__ column, followed by the character values in brackets

  • Starting with the name of the top-most element, followed by the character values in brackets

Both methods are equally efficient. Here are examples of both:

=> SELECT __raw__['restaurant']['location']['city'] FROM rests;
  __raw__
-----------
 Cambridge
(1 row)
=> SELECT restaurant['location']['city'] from rests;
 restaurant
------------
 Cambridge
(1 row)

Bracket operators for twitter data

This example shows how to extract some basic information from Twitter data.

After creating a flex table, tweets, and loading in some data, the flex table has a block of tweets.

In the following SELECT statement, notice how to specificy the __raw__ column of table tweets, followed by the bracket operators to define the virtual columns of interest (['delete']['status']['user_id']). This query uses the COUNT() function to calculate the number of deleted tweets and outputs 10 results:

=> SELECT  __raw__['delete']['status']['user_id'] as UserId, COUNT(*) as TweetsDelete from tweets
-> WHERE mapcontainskey(__raw__, 'delete')
-> GROUP BY __raw__['delete']['status']['user_id']
-> ORDER BY TweetsDelete DESC, UserID ASC LIMIT 10;
  UserId   | TweetsDelete
-----------+--------------
 106079547 |            4
 403474369 |            4
 181188657 |            3
 223136123 |            3
 770139481 |            3
 154602299 |            2
 192127653 |            2
 215011332 |            2
 23321883  |            2
 242173898 |            2
(10 rows)