Using VerticaCopyStream
The VerticaCopyStream class lets you stream data from the client system to a Vertica database. It lets you use COPY directly without first copying the data to a host in the database cluster. Using COPY to load data from the host requires superuser privileges to access the host's file system. The COPY statement used to load data from a stream does not require superuser privileges, so your client can connect with any user account that has INSERT privileges on the target table.
To copy streams into the database:
-
Disable the database connections AutoCommit connection parameter.
-
Instantiate a
VerticaCopyStreamObject
, passing it at least the database connection objects and a string containing a COPY statement to load the data. This statement must copy data from the STDIN into your table. You can use any parameters that are appropriate for your data load.Note
TheVerticaCopyStreamObject
constructor optionally takes a singleInputStream
object, or aList
ofInputStream
objects. This option lets you pre-populate the list of streams to be copied into the database. -
Call
VerticaCopyStreamObject.start()
to start the COPY statement and begin streaming the data in any streams you have already added to theVerticaCopyStreamObject
. -
Call
VerticaCopyStreamObject.addStream()
to add additional streams to the list of streams to send to the database. You can then callVerticaCopyStreamObject.execute()
to stream them to the server. -
Optionally, call
VerticaCopyStreamObject.getRejects()
to get a list of rejected rows from the last.execute()
call. The list of rejects is reset by each call to.execute()
or.finish()
.Note
If you used either the REJECTED DATA or EXCEPTIONS options in the COPY statement you passed toVerticaCopyStreamObject
the object in step 2,.getRejects()
returns an empty list. You can only use one method of tracking the rejected rows at a time. -
When you are finished adding streams, call
VerticaCopyStreamObject.finish()
to send any remaining streams to the database and close the COPY statement. -
Call
Connection.commit()
to commit the loaded data.
Getting rejected rows
The VerticaCopyStreamObject.getRejects()
method returns a List containing the row numbers of rows that were rejected after the previous .execute()
method call. Each call to .execute()
clears the list of rejected rows, so you need to call .getRejects()
after each call to .execute()
. Since .start()
and .finish()
also call .execute()
to send any pending streams to the server, you should also call .getRejects()
after these methods as well.
The following example demonstrates loading the content of five text files stored on the client system into a table.
Running the above example on some sample data results in the following output:
Loading file: C:\Data\customers-1.txtNumber of rows rejected in load #1: 3
Rejected row #1 is row 3
Rejected row #2 is row 7
Rejected row #3 is row 51
Loading file: C:\Data\customers-2.txt
Number of rows rejected in load #2: 5Rejected row #1 is row 4143
Rejected row #2 is row 6132
Rejected row #3 is row 9998
Rejected row #4 is row 10000
Rejected row #5 is row 10050
Loading file: C:\Data\customers-3.txt
Number of rows rejected in load #3: 9
Rejected row #1 is row 14142
Rejected row #2 is row 16131
Rejected row #3 is row 19999
Rejected row #4 is row 20001
Rejected row #5 is row 20005
Rejected row #6 is row 20049
Rejected row #7 is row 20056
Rejected row #8 is row 20144
Rejected row #9 is row 20236
Loading file: C:\Data\customers-4.txt
Number of rows rejected in load #4: 8
Rejected row #1 is row 23774
Rejected row #2 is row 24141
Rejected row #3 is row 25906
Rejected row #4 is row 26130
Rejected row #5 is row 27317
Rejected row #6 is row 28121
Rejected row #7 is row 29321
Rejected row #8 is row 29998
Loading file: C:\Data\customers-5.txt
Number of rows rejected in load #5: 1
Rejected row #1 is row 39997
Finish returned 39995
Number of rows accepted: 39995
Total number of rows rejected: 26