This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Batch inserts using JDBC prepared statements

You can load batches of data into Vertica using prepared INSERT statements—server-side statements that you set up once, and then call repeatedly.

You can load batches of data into Vertica using prepared INSERT statements—server-side statements that you set up once, and then call repeatedly. You instantiate a member of the PreparedStatement class with a SQL statement that contains question mark placeholders for data. For example:

PreparedStatement pstmt = conn.prepareStatement(
                    "INSERT INTO customers(last, first, id) VALUES(?,?,?)");

You then set the parameters using data-type-specific methods on the PreparedStatement object, such as setString() and setInt(). Once your parameters are set, call the addBatch() method to add the row to the batch. When you have a complete batch of data ready, call the executeBatch() method to execute the insert batch.

Behind the scenes, the batch insert is converted into a COPY statement. When the connection's AutoCommit parameter is disabled, Vertica keeps the COPY statement open and uses it to load subsequent batches until the transaction is committed, the cursor is closed, or your application executes anything else (or executes any statement using another Statement or PreparedStatement object). Using a single COPY statement for multiple batch inserts makes loading data more efficient. If you are loading multiple batches, you should disable the AutoCommit property of the database to take advantage of this increased efficiency.

When performing batch inserts, experiment with various batch and row sizes to determine the settings that provide the best performance.

The following example demonstrates using a prepared statement to batch insert data.

import java.sql.*;
import java.util.Properties;

public class BatchInsertExample {
    public static void main(String[] args) {
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");

     //Set streamingBatchInsert to True to enable streaming mode for batch inserts.
     //myProp.put("streamingBatchInsert", "True");

     Connection conn;
        try {
            conn = DriverManager.getConnection(
                            "jdbc:vertica://VerticaHost:5433/ExampleDB",
                            myProp);
            // establish connection and make a table for the data.
            Statement stmt = conn.createStatement();


            // Set AutoCommit to false to allow Vertica to reuse the same
            // COPY statement
            conn.setAutoCommit(false);


            // Drop table and recreate.
            stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
            stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
                            + " char(50), First_Name char(50),Email char(50), "
                            + "Phone_Number char(12))");
            // Some dummy data to insert.
            String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
                            "Don", "Eric" };
            String[] lastNames = new String[] { "Allen", "Brown", "Chu",
                            "Dodd", "Estavez" };
            String[] emails = new String[] { "aang@example.com",
                            "b.brown@example.com", "cindy@example.com",
                            "d.d@example.com", "e.estavez@example.com" };
            String[] phoneNumbers = new String[] { "123-456-7890",
                            "555-444-3333", "555-867-5309",
                            "555-555-1212", "781-555-0000" };
            // Create the prepared statement
            PreparedStatement pstmt = conn.prepareStatement(
                            "INSERT INTO customers (CustID, Last_Name, " +
                            "First_Name, Email, Phone_Number)" +
                            " VALUES(?,?,?,?,?)");
            // Add rows to a batch in a loop. Each iteration adds a
            // new row.
            for (int i = 0; i < firstNames.length; i++) {
                // Add each parameter to the row.
                pstmt.setInt(1, i + 1);
                pstmt.setString(2, lastNames[i]);
                pstmt.setString(3, firstNames[i]);
                pstmt.setString(4, emails[i]);
                pstmt.setString(5, phoneNumbers[i]);
                // Add row to the batch.
                pstmt.addBatch();
            }

            try {
                // Batch is ready, execute it to insert the data
                pstmt.executeBatch();
            } catch (SQLException e) {
                System.out.println("Error message: " + e.getMessage());
                return; // Exit if there was an error
            }

            // Commit the transaction to close the COPY command
            conn.commit();


            // Print the resulting table.
            ResultSet rs = null;
            rs = stmt.executeQuery("SELECT CustID, First_Name, "
                            + "Last_Name FROM customers ORDER BY CustID");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " - "
                                + rs.getString(2).trim() + " "
                                + rs.getString(3).trim());
            }
            // Cleanup
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

The result of running the example code is:

1 - Anna Allen
2 - Bill Brown
3 - Cindy Chu
4 - Don Dodd
5 - Eric Estavez

Streaming batch inserts

By default, Vertica performs batch inserts by caching each row and inserting the cache when the user calls the executeBatch() method. Vertica also supports streaming batch inserts. A streaming batch insert adds a row to the database each time the user calls addBatch(). Streaming batch inserts improve database performance by allowing parallel processing and reducing memory demands.

To enable streaming batch inserts, set the streamingBatchInsert property to True. The preceding code sample includes a line enabling streamingBatchInsert mode. Remove the // comment marks to enable this line and activate streaming batch inserts.

The following table explains the various batch insert methods and how their behavior differs between default batch insert mode and streaming batch insert mode.

Method Default Batch Insert Behavior Streaming Batch Insert Behavior
addBatch() Adds a row to the row cache. Inserts a row into the database.
executeBatch() Adds the contents of the row cache to the database in a single action. Sends an end-of-batch message to the server and returns an array of integers indicating the success or failure of each addBatch() attempt.
clearBatch() Clears the row cache without inserting any rows. Not supported. Triggers an exception if used when streaming batch inserts are enabled.

Notes

  • Using the PreparedStatement.setFloat() method can cause rounding errors. If precision is important, use the .setDouble() method instead.

  • The PreparedStatement object caches the connection's AutoCommit property when the statement is prepared. Later changes to the AutoCommit property have no effect on the prepared statement.

1 - Error handling during batch loads

When loading individual batches, you can find how many rows were accepted and what rows were rejected (see Identifying Accepted and Rejected Rows for details).

When loading individual batches, you can find how many rows were accepted and what rows were rejected (see Identifying Accepted and Rejected Rows for details). If you have disabled the AutoCommit connection setting, other errors (such as disk space errors, for example) do not occur while inserting individual batches. This behavior is caused by having a single SQL COPY statement perform the loading of multiple consecutive batches (which makes the load process more efficient). It is only when the COPY statement closes that the batched data is committed and Vertica reports other types of errors.

Therefore, your bulk loading application should be prepared to check for errors when the COPY statement closes. You can trigger the COPY statement to close by:

  • ending the batch load transaction by calling Connection.commit()

  • closing the statement using Statement.close()

  • setting the connection's AutoCommit property to true before inserting the last batch in the load

2 - Identifying accepted and rejected rows (JDBC)

The return value of PreparedStatement.executeBatch is an integer array containing the success or failure status of inserting each row.

The return value of PreparedStatement.executeBatch is an integer array containing the success or failure status of inserting each row. A value 1 means the row was accepted and a value of -3 means that the row was rejected. In the case where an exception occurred during the batch execution, you can also get the array using BatchUpdateException.getUpdateCounts().

The following example extends the example shown in Batch inserts using JDBC prepared statements to retrieve this array and display the results the batch load.

import java.sql.*;
import java.util.Arrays;
import java.util.Properties;

public class BatchInsertErrorHandlingExample {
    public static void main(String[] args) {
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;

        // establish connection and make a table for the data.
        try {
            conn = DriverManager.getConnection(
                            "jdbc:vertica://VerticaHost:5433/ExampleDB",
                            myProp);


            // Disable auto commit
            conn.setAutoCommit(false);

            // Create a statement
            Statement stmt = conn.createStatement();
            // Drop table and recreate.
            stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
            stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
                            + " char(50), First_Name char(50),Email char(50), "
                            + "Phone_Number char(12))");

            // Some dummy data to insert. The one row won't insert because
            // the phone number is too long for the phone column.
            String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
                            "Don", "Eric" };
            String[] lastNames = new String[] { "Allen", "Brown", "Chu",
                            "Dodd", "Estavez" };
            String[] emails = new String[] { "aang@example.com",
                            "b.brown@example.com", "cindy@example.com",
                            "d.d@example.com", "e.estavez@example.com" };
            String[] phoneNumbers = new String[] { "123-456-789",
                            "555-444-3333", "555-867-53093453453",
                            "555-555-1212", "781-555-0000" };

            // Create the prepared statement
            PreparedStatement pstmt = conn.prepareStatement(
                            "INSERT INTO customers (CustID, Last_Name, " +
                            "First_Name, Email, Phone_Number)" +
                            " VALUES(?,?,?,?,?)");

            // Add rows to a batch in a loop. Each iteration adds a
            // new row.
            for (int i = 0; i < firstNames.length; i++) {
                // Add each parameter to the row.
                pstmt.setInt(1, i + 1);
                pstmt.setString(2, lastNames[i]);
                pstmt.setString(3, firstNames[i]);
                pstmt.setString(4, emails[i]);
                pstmt.setString(5, phoneNumbers[i]);
                // Add row to the batch.
                pstmt.addBatch();
            }

            // Integer array to hold the results of inserting
            // the batch. Will contain an entry for each row,
            // indicating success or failure.
            int[] batchResults = null;

            try {
                // Batch is ready, execute it to insert the data
                batchResults = pstmt.executeBatch();
            } catch (BatchUpdateException e) {
                // We expect an exception here, since one of the
                // inserted phone numbers is too wide for its column. All of the
                // rest of the rows will be inserted.
                System.out.println("Error message: " + e.getMessage());

                // Batch results isn't set due to exception, but you
                // can get it from the exception object.
                //
                // In your own code, you shouldn't assume the a batch
                // exception occurred, since exceptions can be thrown
                // by the server for a variety of reasons.
                batchResults = e.getUpdateCounts();
            }
            // You should also be prepared to catch SQLExceptions in your own
            // application code, to handle dropped connections and other general
            // problems.

            // Commit the transaction
            conn.commit();


            // Print the array holding the results of the batch insertions.
            System.out.println("Return value from inserting batch: "
                            + Arrays.toString(batchResults));
            // Print the resulting table.
            ResultSet rs = null;
            rs = stmt.executeQuery("SELECT CustID, First_Name, "
                            + "Last_Name FROM customers ORDER BY CustID");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " - "
                                + rs.getString(2).trim() + " "
                                + rs.getString(3).trim());
            }

            // Cleanup
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

Running the above example produces the following output on the console:

Error message: [Vertica][VJDBC](100172) One or more rows were rejected by the server.Return value from inserting batch: [1, 1, -3, 1, 1]
1 - Anna Allen
2 - Bill Brown
4 - Don Dodd
5 - Eric Estavez

Notice that the third row failed to insert because its phone number is too long for the Phone_Number column. All of the rest of the rows in the batch (including those after the error) were correctly inserted.

3 - Rolling back batch loads on the server

Batch loads always insert all of their data, even if one or more rows is rejected.

Batch loads always insert all of their data, even if one or more rows is rejected. Only the rows that caused errors in a batch are not loaded. When the database connection's AutoCommit property is true, batches automatically commit their transactions when they complete, so once the batch finishes loading, the data is committed.

In some cases, you may want all of the data in a batch to be successfully inserted—none of the data should be committed if an error occurs. The best way to accomplish this is to turn off the database connection's AutoCommit property to prevent batches from automatically committing themselves. Then, if a batch encounters an error, you can roll back the transaction after catching the BatchUpdateException caused by the insertion error.

The following example demonstrates performing a rollback if any error occurs when loading a batch.

import java.sql.*;
import java.util.Arrays;
import java.util.Properties;

public class RollbackBatchOnError {
    public static void main(String[] args) {
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                            "jdbc:vertica://VerticaHost:5433/ExampleDB",
                            myProp);
            // Disable auto-commit. This will allow you to roll back a
            // a batch load if there is an error.
            conn.setAutoCommit(false);
            // establish connection and make a table for the data.
            Statement stmt = conn.createStatement();
            // Drop table and recreate.
            stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
            stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
                            + " char(50), First_Name char(50),Email char(50), "
                            + "Phone_Number char(12))");

            // Some dummy data to insert. The one row won't insert because
            // the phone number is too long for the phone column.
            String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
                            "Don", "Eric" };
            String[] lastNames = new String[] { "Allen", "Brown", "Chu",
                            "Dodd", "Estavez" };
            String[] emails = new String[] { "aang@example.com",
                            "b.brown@example.com", "cindy@example.com",
                            "d.d@example.com", "e.estavez@example.com" };
            String[] phoneNumbers = new String[] { "123-456-789",
                            "555-444-3333", "555-867-53094535", "555-555-1212",
                            "781-555-0000" };
            // Create the prepared statement
            PreparedStatement pstmt = conn.prepareStatement(
                            "INSERT INTO customers (CustID, Last_Name, " +
                            "First_Name, Email, Phone_Number) "+
                            "VALUES(?,?,?,?,?)");
            // Add rows to a batch in a loop. Each iteration adds a
            // new row.
            for (int i = 0; i < firstNames.length; i++) {
                // Add each parameter to the row.
                pstmt.setInt(1, i + 1);
                pstmt.setString(2, lastNames[i]);
                pstmt.setString(3, firstNames[i]);
                pstmt.setString(4, emails[i]);
                pstmt.setString(5, phoneNumbers[i]);
                // Add row to the batch.
                pstmt.addBatch();
            }
            // Integer array to hold the results of inserting
            // the batch. Will contain an entry for each row,
            // indicating success or failure.
            int[] batchResults = null;
            try {
                // Batch is ready, execute it to insert the data
                batchResults = pstmt.executeBatch();
                // If we reach here, we inserted the batch without errors.
                // Commit it.
                System.out.println("Batch insert successful. Committing.");
                conn.commit();
            } catch (BatchUpdateException e) {
                    System.out.println("Error message: " + e.getMessage());
                    // Batch results isn't set due to exception, but you
                    // can get it from the exception object.
                    batchResults =  e.getUpdateCounts();
                    // Roll back the batch transaction.
                    System.out.println("Rolling back batch insertion");
                    conn.rollback();
            }
            catch  (SQLException e) {
                // General SQL errors, such as connection issues, throw
                // SQLExceptions. Your application should do something more
                // than just print a stack trace,
                e.printStackTrace();
            }
            System.out.println("Return value from inserting batch: "
                            + Arrays.toString(batchResults));
            System.out.println("Customers table contains:");


            // Print the resulting table.
            ResultSet rs = null;
            rs = stmt.executeQuery("SELECT CustID, First_Name, "
                            + "Last_Name FROM customers ORDER BY CustID");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " - "
                                + rs.getString(2).trim() + " "
                                + rs.getString(3).trim());
            }

            // Cleanup
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

Running the above example prints the following on the system console:

Error message: [Vertica][VJDBC](100172) One or more rows were rejected by the server.Rolling back batch insertion
Return value from inserting batch: [1, 1, -3, 1, 1]
Customers table contains:

The return values indicate whether each rows was successfully inserted. The value 1 means the row inserted without any issues, and a -3 indicates the row failed to insert.

The customers table is empty since the batch insert was rolled back due to the error caused by the third column.