Loading sample data

Vertica offers several solutions for loading files with structured and unstructred data, and from several formats.

Vertica offers several solutions for loading files with structured and unstructred data, and from several formats.

Creating a sample data file

Create a sample CSV file called cities.csv with the following contents and save it to /home/dbadmin/cities.csv.

City,State,Zip,Population
Boston,MA,02108,694583
Chicago,IL,60601,2705994
Seattle,WA,98101,744955
Dallas,TX,75201,1345047
New York,NY,10001,8398748

Loading structured data from a file

  1. Run vsql.

    $ vsql
    
  2. Create the cities table.

    => CREATE TABLE cities (
    city       varchar(20),
    state      char(2),
    zip        int,
    population int
    );
    
  3. Use the COPY statement to load the data from the cities.csv file. The following command logs exceptions and rejections in the home directory.

    => COPY cities FROM LOCAL '/home/dbadmin/cities.csv' DELIMITER ',' NULL '' EXCEPTIONS '/home/dbadmin/cities_exceptions.log'
    REJECTED DATA '/home/dbadmin/cities_rejections.log';
    
  4. Review the rejections log for what data was excluded. Here, the header was excluded.

    $ cat /home/dbadmin/cities_rejections.log
    
    City,State,Zip,Population
    
  5. Review the exceptions for details on the error. In this case, the header failed Vertica's integer data type verification.

    $ cat /home/dbadmin/cities_exceptions.log
    
    COPY: Input record 1 has been rejected (Invalid integer format 'Zip' for column 3 (zip)).
    Please see /home/dbadmin/cities_rejections.log, record 1 for the rejected record. This record was record 1 from cities.csv
    
  6. To fix this, add SKIP 1 to the original COPY statement. This excludes the first row.

    => COPY cities FROM LOCAL '/home/dbadmin/cities.csv' DELIMITER, 'NULL'
    EXCEPTIONS '/home/dbadmin/cities_exceptions.log'
    REJECTED DATA '/home/dbadmin/cities_rejections.log' SKIP 1;
    

Loading unstructured data with flex tables

To load data from another source, Vertica uses Flex tables. Flex tables simplify data loading by allowing you to load unstructured or "semi-structured" data without having to create a schema or column definitions.

Supported formats include:

  • Avro Data

  • CEF

  • CSV

  • Delimited

  • JSON

  1. Create a table called cities_flex. Notice how it does not include column names or data types.

    => CREATE FLEXIBLE TABLE cities_flex();
    
  2. Load the CSV file into the table.

    => COPY cities_flex FROM '/source/cities.csv' PARSER FDELIMITEDPARSER (delimiter=',');
    
  3. Query the cities_flex table, specifying the column names from the original CSV file.

    => SELECT city, state FROM cities_flex;