EXPORT TO JSON

Exports a table, columns from a table, or query results to JSON files.

Exports a table, columns from a table, or query results to JSON files. The files can be read back into Vertica using FJSONPARSER.

There are some limitations on the queries you can use in an export statement. See Query Restrictions.

You can export data stored in Vertica in ROS format and data from external tables.

File exporters are UDxs that cannot be run in fenced mode. If the ForceUDxFencedMode configuration parameter is set to true to support other UDxs, you can set it to false at the session level before performing the export.

This statement returns the number of rows written and logs information about exported files in a system table. See Monitoring exports.

During an export to HDFS or an NFS mount point, Vertica writes files to a temporary directory in the same location as the destination and renames the directory when the export is complete. Do not attempt to use the files in the temporary directory. During an export to S3, GCS, or Azure, Vertica writes files directly to the destination path, so you must wait for the export to finish before reading the files. For more information, see Exporting to object stores.

Syntax


EXPORT [ /*+LABEL (label)*/ ] TO JSON 
   ( directory='path'[, param=value[,...] ] )
   [ OVER (over-clause ) ] AS SELECT query-expression

Arguments

LABEL

Assigns a label to a statement to identify it for profiling and debugging.

over-clause

Specifies how to partition table data using PARTITION BY. Within partitions you can sort using ORDER BY. See SQL analytics. This clause may contain column references but not expressions.

If you partition data, Vertica creates a partition directory structure, transforming column names to lowercase. See Partitioned data for a description of the directory structure. If you use the fileName parameter, you cannot use partitioning.

If you omit this clause, Vertica optimizes for maximum parallelism.

query-expression

Specifies the data to export. See Query Restrictions for important limitations.

Parameters

directory

The destination directory for the output files. The current user must have permission to write it. The destination can be on any of the following file systems:

HDFS file system
S3 object store
Google Cloud Storage (GCS) object store
Azure Blob Storage object store
Linux file system, either an NFS mount or local storage on each node

Privileges

Non-superusers:

Source table: SELECT
Source table schema: USAGE
Destination directory: Write

Query restrictions

You must provide an alias column label for selected column targets that are expressions.

If you partition the output, you cannot specify schema and table names in the SELECT statement. Specify only the column name.

The query can contain only a single outer SELECT statement. For example, you cannot use UNION:

=> EXPORT TO JSON(directory = '/mnt/shared_nfs/accounts/rm')
   OVER(PARTITION BY hash)
   AS
   SELECT 1 as account_id, '{}' as json, 0 hash
   UNION ALL
   SELECT 2 as account_id, '{}' as json, 1 hash;
ERROR 8975:  Only a single outer SELECT statement is supported
HINT:  Please use a subquery for multiple outer SELECT statements

Instead, rewrite the query to use a subquery:

=> EXPORT TO JSON(directory = '/mnt/shared_nfs/accounts/rm')
   OVER(PARTITION BY hash)
   AS
   SELECT
    account_id,
    json
   FROM
   (
     SELECT 1 as account_id, '{}' as json, 0 hash
     UNION ALL
     SELECT 2 as account_id, '{}' as json, 1 hash
   ) a;
 Rows Exported
---------------
             2
(1 row)

To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.

Data types

EXPORT TO JSON can export ARRAY and ROW types in any combination.

EXPORT TO JSON does not support binary output (VARBINARY).

Output

The export operation always creates (or appends to) an output directory, even if all output is written to a single file or the query produces zero rows.

By default, output file names follow the pattern: prefix-nodename-threadId[-sequenceNumber].extension. prefix is typically an 8-character hash, but can be longer if the export appended to an existing directory. A sequence number is added if an export needs to be broken into pieces to satisfy fileSizeMB. You can change the prefix using the filePrefix parameter or, for appends, by setting ifDirExists=appendNoHash. You can write all output to a single named file using the filename parameter.

Column names in partition directories are lowercase.

Files exported to a local file system by any Vertica user are owned by the Vertica superuser. Files exported to HDFS or object stores are owned by the Vertica user who exported the data.

Making concurrent exports to the same output destination is an error and can produce incorrect results.

Exports to the local file system can be to an NFS mount (shared) or to the Linux file system on each node (non-shared). For details, see Exporting to the Linux file system. Exports to non-shared local file systems have the following restrictions:

The output directory must not exist on any node.
You must have a USER storage location or superuser privileges.
You cannot override the permissions mode of 700 for directories and 600 for files.

Exports to object-store file systems are not atomic. Be careful to wait for the export to finish before using the data. For details, see Exporting to object stores.

Examples

In the following example, one of the ROW elements has a null value, which is omitted in the output. EXPORT TO JSON writes each JSON record on one line; line breaks have been inserted into the following output for readability:

=> SELECT name, menu FROM restaurants;
       name        |                                     menu

-------------------+------------------------------------------------------------
------------------
 Bob's pizzeria    | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
 Bakersfield Tacos | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)

=> EXPORT TO JSON (directory='/output/json', omitNullFields=true)
   AS SELECT * FROM restaurants;
 Rows Exported
---------------
             2
(1 row)

=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],
 "menu":[{"item":"cheese pizza"},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],
 "menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}