EXPORT TO JSON

Exports a table, columns from a table, or query results to JSON files.

Exports a table, columns from a table, or query results to JSON files. The files can be read back into Vertica using FJSONPARSER (parser).

There are some limitations on the queries you can use in an export statement. See Query Restrictions.

You can export data stored in Vertica in ROS format and data from external tables.

This statement returns the number of rows written and logs information about exported files in a system table. See Monitoring exports.

During an export to HDFS or an NFS mount point, Vertica writes files to a temporary directory in the same location as the destination and renames the directory when the export is complete. Do not attempt to use the files in the temporary directory. During an export to S3, GCS, or Azure, Vertica writes files directly to the destination path, so you must wait for the export to finish before reading the files. For more information, see Exporting to object stores.

Syntax


EXPORT [ /*+LABEL (label)*/ ] TO JSON ( directory=path[, param=value [,...] ]
   [ OVER (over-clause ) ] AS SELECT query-expression

Parameters

LABEL

Assigns a label to a statement to identify it for profiling and debugging.

directory

The destination directory for the output files. The directory must not exist, and the current user must have permission to write it. The destination can be on any of the following file systems:

filename

If specified, all output is written to a single file of this name in the location specified by directory. While the query can be processed by multiple nodes, only a single node generates the output data. The fileSizeMB parameter is ignored, and the query cannot use partitioning in the OVER() clause.

omitNullFields
Boolean, whether to omit ROW fields with null values.

Default: false

compression
Compression type, one of:
  • Uncompressed

  • BZip

  • GZip

Default: Uncompressed

fileSizeMB

The maximum file size of a single output file. This value is a hint, not a hard limit. A value of 0 specifies no limit. If filename is also specified, fileSizeMB is ignored.

This value affects the size of individual output files, not the total output size. For smaller values, Vertica divides the output into more files; all data is still exported.

Default: 10GB

fileMode

For writes to HDFS only, permission to apply to all exported files. You can specify the value in Unix octal format (such as 665) or user-group-other format—for example, rwxr-xr-x. The value must be formatted as a string even if using the octal format.

Valid octal values range between 0 and 1777, inclusive. See HDFS Permissions in the Apache Hadoop documentation.

When writing files to any destination other than HDFS, this parameter has no effect.

Default: 660, regardless of the value of fs.permissions.umask-mode in hdfs-site.xml.

dirMode

For writes to HDFS only, permission to apply to all exported directories. Values follow the same rules as those for fileMode. Further, you must give the Vertica HDFS user full permission, at least rwx------ or 700.

When writing files to any destination other than HDFS, this parameter has no effect.

Default: 755, regardless of the value of fs.permissions.umask-mode in hdfs-site.xml.

Arguments

over-clause
Specifies how to partition table data using PARTITION BY. Within partitions you can sort using ORDER BY. See SQL analytics. This clause may contain column references but not expressions.

If you partition data, Vertica creates a Hive-style partition directory structure, transforming column names to lowercase. See Using partition columns for a description of the directory structure. If you use the fileName parameter, you cannot use partitioning. Not all parsers can read partition columns as columns.

If you omit this clause, Vertica optimizes for maximum parallelism.

query-expression
Specifies the data to export. See Query Restrictions for important limitations.

Privileges

Non-superusers:

  • Source table: SELECT

  • Source table schema: USAGE

  • Destination directory: Write

Query restrictions

The following requirements and restrictions apply:

  • You must provide an alias column label for selected column targets that are expressions.

  • The query can contain only a single outer SELECT statement. For example, you cannot use UNION:

    => EXPORT TO JSON(directory = '/mnt/shared_nfs/accounts/rm')
       OVER(PARTITION BY hash)
       AS
       SELECT 1 as account_id, '{}' as json, 0 hash
       UNION ALL
       SELECT 2 as account_id, '{}' as json, 1 hash;
    ERROR 8975:  Only a single outer SELECT statement is supported
    HINT:  Please use a subquery for multiple outer SELECT statements
    

    Instead, rewrite the query to use a subquery:

    => EXPORT TO JSON(directory = '/mnt/shared_nfs/accounts/rm')
       OVER(PARTITION BY hash)
       AS
       SELECT
        account_id,
        json
       FROM
       (
         SELECT 1 as account_id, '{}' as json, 0 hash
         UNION ALL
         SELECT 2 as account_id, '{}' as json, 1 hash
       ) a;
     Rows Exported
    ---------------
                 2
    (1 row)
    
  • To use composite statements such as UNION, INTERSECT, and EXCEPT, rewrite them as subqueries.

Data types

EXPORT TO JSON can export ARRAY and ROW types in any combination.

EXPORT TO JSON does not support binary output (VARBINARY).

Output

The export operation always creates an output directory, even if all output is written to a single file or the query produces zero rows.

Output file names follow the pattern: [8-character-hash]-[nodename]-[thread-id].json.

Column names in partition directories are lowercase.

Files exported to a local file system by any Vertica user are owned by the Vertica superuser. Files exported to HDFS or object stores are owned by the Vertica user who exported the data.

The following requirements and restrictions apply:

  • Avoid concurrent exports to the same output destination. Doing so is an error on any file system and can produce incorrect results.

  • Use a shared file location for output. If you use a directory in the local file system, it must be an NFS-mounted directory.

  • For output to the local file system, you must have a USER storage location.

  • When exporting to the local file system, the permission mode is 700 for directories and 600 for files. You cannot override these values.

For restrictions specific to object stores (S3, GCS, or Azure), see Exporting to object stores.

Examples

In the following example, one of the ROW elements has a null value, which is omitted in the output. EXPORT TO JSON writes each JSON record on one line; line breaks have been inserted into the following output for readability:

=> SELECT name, menu FROM restaurants;
       name        |                                     menu

-------------------+------------------------------------------------------------
------------------
 Bob's pizzeria    | [{"item":"cheese pizza","price":null},{"item":"spinach pizza","price":10.5}]
 Bakersfield Tacos | [{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]
(2 rows)

=> EXPORT TO JSON (directory='/output/json', omitNullFields=true)
   AS SELECT * FROM restaurants;
 Rows Exported
---------------
             2
(1 row)

=> \! cat /output/json/*.json
{"name":"Bob's pizzeria","cuisine":"Italian","location_city":["Cambridge","Pittsburgh"],
 "menu":[{"item":"cheese pizza"},{"item":"spinach pizza","price":10.5}]}
{"name":"Bakersfield Tacos","cuisine":"Mexican","location_city":["Pittsburgh"],
 "menu":[{"item":"veggie taco","price":9.95},{"item":"steak taco","price":10.95}]}