Fixed-width format data
Use the FIXEDWIDTH
parser option to bulk load fixed-width data. You must specify the COLSIZES
option values to specify the number of bytes for each column. The definition of the table you are loading (COPY table f (x, y, z))
determines the number of COLSIZES
values to declare.
To load fixed-width data, use the COLSIZES
option to specify the number of bytes for each input column. If any records do not have values, COPY
inserts one or more null characters to equal the specified number of bytes. The last record in a fixed-width data file must include a record terminator to determine the end of the load data.
The following COPY options are not supported:
-
DELIMITER
-
ENCLOSED BY
-
ESCAPE AS
-
TRAILING NULLCOLS
Using nulls in fixed-width data
The default NULL
string for a fixed-width load cannot be an empty string, and instead, consists of all spaces. The number of spaces depends on the column width declared with the COLSIZES (integer, [,...])
option.
For fixed-width loads, the NULL definition depends on whether you specify NULL at the column or statement level:
-
Statement level: NULL must be defined as a single-character. The default (or custom) NULL character is repeated for the entire width of the column.
-
Column level: NULL must be defined as a string whose length matches the column width.
For fixed-width loads, if the input data column has fewer values than the specified column size, COPY
inserts NULL characters. The number of NULLs must match the declared column width. If you specify a NULL string at the column level, COPY
matches the string with the column width.
Note
To turn off NULLs, use theNULL AS
option and specify NULL AS ''
.
Defining a null character (statement level)
-
Create a two-column table (
fw
):=> CREATE TABLE fw(co int, ci int); CREATE TABLE
-
Copy the table, specifying null as
'N'
, and enter some data:=> COPY fw FROM STDIN FIXEDWIDTH colsizes(2,2) null AS 'N' NO COMMIT; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> NN12 >> 23NN >> NNNN >> nnnn >> \.
-
Select all (
*
) from the table:=> SELECT * FROM fw; co | ci ----+---- | 12 23 | (2 rows)
Defining a custom record terminator
To define a record terminator other than the COPY
default when loading fixed-width data, take these steps:
-
Create table
fw
with two columns,co
andci
:=> CREATE TABLE fw(co int, ci int); CREATE TABLE
-
Copy table
fw
, specifying two 2-byte column sizes, and specifying a comma (,) as the record terminator:=> COPY fw FROM STDIN FIXEDWIDTH colsizes(2,2) RECORD TERMINATOR ','; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 1234,1444,6666 >> \.
-
Query all data in table
fw
:=> SELECT * FROM fw; co | ci ----+---- 12 | 34 14 | 44 (2 rows)
The SELECT
output indicates only two values. COPY
rejected the third value (6666
) because it was not followed by a comma (,
) record terminator. Fixed-width data requires a trailing record terminator only if you explicitly specify a record terminator explicitly.
Copying fixed-width data
Use COPY FIXEDWIDTH COLSIZES (n [,...)
to load files into a Vertica database. By default, all spaces are NULLs. For example:
=> CREATE TABLE mytest(co int, ci int);
=> CREATE PROJECTION mytest_p1 AS SELECT * FROM mytest SEGMENTED BY HASH(co) ALL NODES;
=> COPY mytest(co,ci) FROM STDIN FIXEDWIDTH colsizes(6,4) NO COMMIT;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> \.
=> SELECT * FROM mytest ORDER BY co;
co | ci
----+----
(0 rows)
Skipping content in fixed-width data
The COPY
statement has two options to skip input data. The SKIP BYTES
option is only for fixed-width data loads:
SKIP BYTES num-bytes |
Skips the specified number of bytes from the input data. |
SKIP num-records |
Skips the specified number of records. |
The following example uses SKIP BYTES
to skip 11 bytes when loading a fixed-width table with two columns (4 and 6 bytes):
-
Copy a table using
SKIP BYTES
:=> COPY fw FROM STDIN FIXEDWIDTH colsizes (4,6) SKIP BYTES 11; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 2222666666 >> 1111999999 >> 1632641282 >> \.
-
Query all data in table
fw
:=> SELECT * FROM fw ORDER BY co; co | ci ------+-------- 1111 | 999999 1632 | 641282 (2 rows)
The output confirms that COPY
skipped the first 11 bytes of loaded data.
The following example uses SKIP
when loading a fixed-width (4,6) table:
-
Copy a table, using
SKIP
to skip two records of input data:=> COPY fw FROM STDIN FIXEDWIDTH colsizes (4,6) SKIP 2; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 2222666666 >> 1111999999 >> 1632641282 >> 3333888888 >> \.
-
Query all data in table
fw
:=> SELECT * FROM fw ORDER BY co; co | ci ------+-------- 1632 | 641282 3333 | 888888 (2 rows)
The output confirms that COPY
skipped the first two records of load data.
Trimming characters in fixed-width data loads
Use the TRIM
option to trim a character. TRIM
accepts a single-byte character, which is trimmed at the beginning and end of the data. For fixed-width data loads, when you specify a TRIM
character, COPY
first checks to see if the row is NULL. If the row is not null, COPY
trims the character(s). The next example instructs COPY
to trim the character A, and shows the results:
-
Copy table
fw
, specifyingTRIM
characterA
:=> COPY fw FROM STDIN FIXEDWIDTH colsizes(4,6) TRIM 'A'; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> A22A444444 >> A22AA4444A >> \.
-
Query all data in table
fw
:=> SELECT * FROM fw ORDER BY co; co | ci ----+-------- 22 | 4444 22 | 444444 (2 rows)
Using padding in fixed-width data loads
By default, the padding character is ' ' (a single space). The padding behavior for fixed-width data loads is similar to how a space is treated in other formats, differing by data type as follows:
Data type | Padding |
---|---|
Integer |
Leading and trailing spaces |
Bool |
Leading and trailing spaces |
Float |
Leading and trailing spaces |
[var]Binary |
None, all characters are significant. |
[Var]Char |
Trailing spaces if string is too large |
DateInterval , Time , Timestamp , TimeTZ |
None, all characters are significant. COPY uses an internal algorithm to parse these data types. |
Date (formatted) |
Use the COPY FORMAT option string to match the expected column length. |
Numerics |
Leading and trailing spaces |