Using batch inserts
You use batch inserts to insert chunks of data into the database. By breaking the data into batches, you can monitor the progress of the load by receiving information about any rejected rows after each batch is loaded. To perform a batch load through ODBC, you typically use a prepared statement with the parameters bound to arrays that contain the data to be loaded. For each batch, you load a new set of data into the arrays then execute the prepared statement.
When you perform a batch load, Vertica uses a COPY statement to load the data. Each additional batch you load uses the same COPY statement. The statement remains open until you end the transaction, close the cursor for the statement, or execute a non-INSERT statement.
Using a single COPY statement for multiple batches improves batch loading efficiency by:
-
reducing the overhead of inserting individual batches
-
combining individual batches into larger ROS containers
Note
If the database connection has AUTOCOMMIT enabled, then the transaction is automatically committed after each batch insert statement which closes the COPY statement. Leaving AUTOCOMMIT enabled makes your batch load much less efficient, and can cause added overhead in your database as all of the smaller loads are consolidated.Even though Vertica uses a single COPY statement to insert multiple batches within a transaction, you can locate which (if any) rows were rejected due to invalid row formats or data type issues after each batch is loaded. See Tracking load status (ODBC) for details.
Note
While you can find rejected rows during the batch load transaction, other types of errors (such as running out of disk space or a node shutdown that makes the database unsafe) are only reported when the COPY statement ends.Since the batch loads share a COPY statement, errors in one batch can cause earlier batches in the same transaction to be rolled back.
Batch insert steps
The steps your application needs to take in order to perform an ODBC Batch Insert are:
-
Connect to the database.
-
Disable autocommit for the connection.
-
Create a prepared statement that inserts the data you want to load.
-
Bind the parameters of the prepared statement to arrays that will contain the data you want to load.
-
Populate the arrays with the data for your batches.
-
Execute the prepared statement.
-
Optionally, check the results of the batch load to find rejected rows.
-
Repeat the previous three steps until all of the data you want to load is loaded.
-
Commit the transaction.
-
Optionally, check the results of the entire batch transaction.
The following example code demonstrates a simplified version of the above steps.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
int main()
{
// Number of data rows to insert
const int NUM_ENTRIES = 4;
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated an environment handle.\n");
}
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
} else {
printf("Set application to ODBC 3.\n");
}
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate database handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated Database handle.\n");
}
// Connect to the database
printf("Connecting to database.\n");
const char *dsnName = "ExampleDB";
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not connect to database.\n");
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Disable AUTOCOMMIT
printf("Disabling autocommit.\n");
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF,
SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not disable autocommit.\n");
exit(EXIT_FAILURE);
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
// Create a table to hold the data
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers "
"(CustID int, CustName varchar(100), Phone_Number char(15));",
SQL_NTS);
// Create the prepared statement. This will insert data into the
// table we created above.
printf("Creating prepared statement\n");
ret = SQLPrepare (hdlStmt, (SQLTCHAR*)"INSERT INTO customers (CustID, "
"CustName, Phone_Number) VALUES(?,?,?)", SQL_NTS) ;
if(!SQL_SUCCEEDED(ret)) {
printf("Could not create prepared statement\n");
exit(EXIT_FAILURE);
} else {
printf("Created prepared statement.\n");
}
// This is the data to be inserted into the database.
SQLCHAR custNames[][50] = { "Allen, Anna", "Brown, Bill", "Chu, Cindy",
"Dodd, Don" };
SQLINTEGER custIDs[] = { 100, 101, 102, 103};
// year, month, day, hour, minute, second, fraction
// struct accepts fraction in billionths of a second, but Vertica supports millionths
SQL_TIMESTAMP_STRUCT accountCreationDates[] = {
{1997, 4, 1, 12, 35, 29, 0}, // 1997-04-01 12:35:29
{2002, 6, 13, 1, 0, 12, 1000}, // 2002-06-13 01:00:12.000001
{2000, 9, 2, 2, 59, 37, 999000000}, // 2000-09-02 02:59:37.999
{2009, 1, 25, 3, 7, 59, 999999000}, // 2009-01-25 03:07:59.999999
};
SQLCHAR phoneNums[][15] = {"1-617-555-1234", "1-781-555-1212",
"1-508-555-4321", "1-617-555-4444"};
// Bind the data arrays to the parameters in the prepared SQL
// statement. First is the custID.
ret = SQLBindParameter(hdlStmt, 1, SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0, (SQLPOINTER)custIDs, sizeof(SQLINTEGER) , NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind custID array\n");
exit(EXIT_FAILURE);
} else {
printf("Bound CustIDs array to prepared statement\n");
}
// Bind the customer account creation date
// timestamp column size is safe at 23 + length of longest fractional component precision
// Max precision that vertica supports for timestamp second precision is 6 digits
ret = SQLBindParameter(stmt.hstmt, 2, SQL_PARAM_INPUT, SQL_C_TYPE_TIMESTAMP, SQL_TIMESTAMP, 29, 6,
(SQLPOINTER)accountCreationDates, sizeof(SQL_TIMESTAMP_STRUCT), NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind account creation dates\n");
exit(EXIT_FAILURE);
} else {
printf("Bound account creation date array to prepared statement\n");
}
// Bind CustNames
ret = SQLBindParameter(hdlStmt, 3, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
50, 0, (SQLPOINTER)custNames, 50, NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind custNames\n");
exit(EXIT_FAILURE);
} else {
printf("Bound CustNames array to prepared statement\n");
}
// Bind phoneNums
ret = SQLBindParameter(hdlStmt, 4, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
15, 0, (SQLPOINTER)phoneNums, 15, NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind phoneNums\n");
exit(EXIT_FAILURE);
} else {
printf("Bound phoneNums array to prepared statement\n");
}
// Tell the ODBC driver how many rows we have in the
// array.
ret = SQLSetStmtAttr( hdlStmt, SQL_ATTR_PARAMSET_SIZE,
(SQLPOINTER)NUM_ENTRIES, 0 );
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind set parameter size\n");
exit(EXIT_FAILURE);
} else {
printf("Bound phoneNums array to prepared statement\n");
}
// Add multiple batches to the database. This just adds the same
// batch of data four times for simplicity's sake. Each call adds
// the 4 rows into the database.
for (int batchLoop=1; batchLoop<=5; batchLoop++) {
// Execute the prepared statement, loading all of the data
// in the arrays.
printf("Adding Batch #%d...", batchLoop);
ret = SQLExecute(hdlStmt);
if(!SQL_SUCCEEDED(ret)) {
printf("not successful!\n");
} else {
printf("successful.\n");
}
}
// Done with batches, commit the transaction
printf("Committing transaction\n");
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not commit transaction\n");
} else {
printf("Committed transaction\n");
}
// Clean up
printf("Free handles.\n");
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
printf("Error disconnecting. Transaction still open?\n");
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
The result of running the above code is shown below.
Allocated an environment handle.
Set application to ODBC 3.
Allocated Database handle.
Connecting to database.
Connected to database.
Creating prepared statement
Created prepared statement.
Bound CustIDs array to prepared statement
Bound CustNames array to prepared statement
Bound phoneNums array to prepared statement
Adding Batch #1...successful.
Adding Batch #2...successful.
Adding Batch #3...successful.
Adding Batch #4...successful.
Adding Batch #5...successful.
Committing transaction
Committed transaction
Free handles.
The resulting table looks like this:
=> SELECT * FROM customers;
CustID | CustName | Phone_Number
--------+-------------+-----------------
100 | Allen, Anna | 1-617-555-1234
101 | Brown, Bill | 1-781-555-1212
102 | Chu, Cindy | 1-508-555-4321
103 | Dodd, Don | 1-617-555-4444
100 | Allen, Anna | 1-617-555-1234
101 | Brown, Bill | 1-781-555-1212
102 | Chu, Cindy | 1-508-555-4321
103 | Dodd, Don | 1-617-555-4444
100 | Allen, Anna | 1-617-555-1234
101 | Brown, Bill | 1-781-555-1212
102 | Chu, Cindy | 1-508-555-4321
103 | Dodd, Don | 1-617-555-4444
100 | Allen, Anna | 1-617-555-1234
101 | Brown, Bill | 1-781-555-1212
102 | Chu, Cindy | 1-508-555-4321
103 | Dodd, Don | 1-617-555-4444
100 | Allen, Anna | 1-617-555-1234
101 | Brown, Bill | 1-781-555-1212
102 | Chu, Cindy | 1-508-555-4321
103 | Dodd, Don | 1-617-555-4444
(20 rows)
Note
An input parameter bound with the SQL_C_NUMERIC data type uses the default numeric precision (37) and the default scale (0) instead of the precision and scale set by the SQL_NUMERIC_STRUCT input value. This behavior adheres to the ODBC standard. If you do not want to use the default precision and scale, useSQLSetDescField()
or SQLSetDescRec()
to change them in the statement's attributes.