Fixed-width format data

Use the FIXEDWIDTH parser option to bulk load fixed-width data.

Use the FIXEDWIDTH parser option to bulk load fixed-width data. You must specify the COLSIZES option values to specify the number of bytes for each column. The definition of the table you are loading (COPY table f (x, y, z)) determines the number of COLSIZES values to declare.

To load fixed-width data, use the COLSIZES option to specify the number of bytes for each input column. If any records do not have values, COPY inserts one or more null characters to equal the specified number of bytes. The last record in a fixed-width data file must include a record terminator to determine the end of the load data.

The following COPY options are not supported:

  • DELIMITER

  • ENCLOSED BY

  • ESCAPE AS

  • TRAILING NULLCOLS

Using nulls in fixed-width data

The default NULL string for a fixed-width load cannot be an empty string, and instead, consists of all spaces. The number of spaces depends on the column width declared with the COLSIZES (integer, [,...]) option.

For fixed-width loads, the NULL definition depends on whether you specify NULL at the column or statement level:

  • Statement level: NULL must be defined as a single-character. The default (or custom) NULL character is repeated for the entire width of the column.

  • Column level: NULL must be defined as a string whose length matches the column width.

For fixed-width loads, if the input data column has fewer values than the specified column size, COPY inserts NULL characters. The number of NULLs must match the declared column width. If you specify a NULL string at the column level, COPY matches the string with the column width.

Defining a null character (statement level)

  1. Create a two-column table (fw):

    => CREATE TABLE fw(co int, ci int);
    CREATE TABLE
    
  2. Copy the table, specifying null as 'N', and enter some data:

    => COPY fw FROM STDIN FIXEDWIDTH colsizes(2,2) null AS 'N' NO COMMIT;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> NN12
    >> 23NN
    >> NNNN
    >> nnnn
    >> \.
    
  3. Select all (*) from the table:

    => SELECT * FROM fw;
     co | ci
    ----+----
        | 12
     23 |
    (2 rows)
    

Defining a custom record terminator

To define a record terminator other than the COPY default when loading fixed-width data, take these steps:

  1. Create table fw with two columns, co and ci:

    => CREATE TABLE fw(co int, ci int);
    CREATE TABLE
    
  2. Copy table fw, specifying two 2-byte column sizes, and specifying a comma (,) as the record terminator:

    => COPY fw FROM STDIN FIXEDWIDTH colsizes(2,2) RECORD TERMINATOR ',';
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1234,1444,6666
    >> \.
    
  3. Query all data in table fw:

    => SELECT * FROM fw;
     co | ci
    ----+----
     12 | 34
     14 | 44
    (2 rows)
    

The SELECT output indicates only two values. COPY rejected the third value (6666) because it was not followed by a comma (,) record terminator. Fixed-width data requires a trailing record terminator only if you explicitly specify a record terminator explicitly.

Copying fixed-width data

Use COPY FIXEDWIDTH COLSIZES (n [,...) to load files into a Vertica database. By default, all spaces are NULLs. For example:

=> CREATE TABLE mytest(co int, ci int);
=> CREATE PROJECTION mytest_p1 AS SELECT * FROM mytest SEGMENTED BY HASH(co) ALL NODES;
=> COPY mytest(co,ci) FROM STDIN FIXEDWIDTH colsizes(6,4) NO COMMIT;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> \.
=> SELECT * FROM mytest ORDER BY co;
 co | ci
----+----
(0 rows)

Skipping content in fixed-width data

The COPY statement has two options to skip input data. The SKIP BYTES option is only for fixed-width data loads:

SKIP BYTES num-bytes Skips the specified number of bytes from the input data.
SKIP num-records Skips the specified number of records.

The following example uses SKIP BYTES to skip 11 bytes when loading a fixed-width table with two columns (4 and 6 bytes):

  1. Copy a table using SKIP BYTES:

    => COPY fw FROM STDIN FIXEDWIDTH colsizes (4,6) SKIP BYTES 11;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 2222666666
    >> 1111999999
    >> 1632641282
    >> \.
    
  2. Query all data in table fw:

    => SELECT * FROM fw ORDER BY co;
      co  |   ci
    ------+--------
     1111 | 999999
     1632 | 641282
    (2 rows)
    

The output confirms that COPY skipped the first 11 bytes of loaded data.

The following example uses SKIP when loading a fixed-width (4,6) table:

  1. Copy a table, using SKIP to skip two records of input data:

    => COPY fw FROM STDIN FIXEDWIDTH colsizes (4,6) SKIP 2;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 2222666666
    >> 1111999999
    >> 1632641282
    >> 3333888888
    >> \.
    
  2. Query all data in table fw:

    => SELECT * FROM fw ORDER BY co;
      co  |   ci
    ------+--------
     1632 | 641282
     3333 | 888888
    (2 rows)
    

The output confirms that COPY skipped the first two records of load data.

Trimming characters in fixed-width data loads

Use the TRIM option to trim a character. TRIM accepts a single-byte character, which is trimmed at the beginning and end of the data. For fixed-width data loads, when you specify a TRIM character, COPY first checks to see if the row is NULL. If the row is not null, COPY trims the character(s). The next example instructs COPY to trim the character A, and shows the results:

  1. Copy table fw, specifying TRIM character A:

    => COPY fw FROM STDIN FIXEDWIDTH colsizes(4,6) TRIM 'A';
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> A22A444444
    >> A22AA4444A
    >> \.
    
  2. Query all data in table fw:

    => SELECT * FROM fw ORDER BY co;
     co |   ci
    ----+--------
     22 |   4444
     22 | 444444
    (2 rows)
    

Using padding in fixed-width data loads

By default, the padding character is ' ' (a single space). The padding behavior for fixed-width data loads is similar to how a space is treated in other formats, differing by data type as follows:

Data type Padding
Integer Leading and trailing spaces
Bool Leading and trailing spaces
Float Leading and trailing spaces
[var]Binary None, all characters are significant.
[Var]Char Trailing spaces if string is too large
DateInterval, Time, Timestamp, TimeTZ None, all characters are significant. COPY uses an internal algorithm to parse these data types.
Date (formatted) Use the COPY FORMAT option string to match the expected column length.
Numerics Leading and trailing spaces