使用 Perl 批量加载数据

要使用 Perl 将大批量数据加载到 Vertica:

  1. 将 DBI 的 AutoCommit 连接属性设置为 false,以提高批量加载速度。有关禁用 AutoCommit 的示例,请参阅设置 Perl DBI 连接属性

  2. 调用数据库句柄的 prepare 函数以准备 SQL INSERT 语句(包含要插入的数据值的占位符)。例如:

    # Prepare an INSERT statement for the test table
    $sth = $dbh->prepare("INSERT into test values(?,?,?,?,?,?,?)");
    

    prepare 函数将返回一个语句句柄,您可以使用此句柄来插入数据。

  3. 为占位符分配数据。有几种方法可用于执行此操作。最简单的方法是使用 INSERT 语句中每个占位符的值来填充数组。

  4. 调用该语句句柄的 execute 函数以将数据行插入到 Vertica。此函数调用的返回值指示 Vertica 已接受还是已拒绝该行。

  5. 重复步骤 3 和 4,直至已加载所需的所有数据为止。

  6. 调用数据库句柄的 commit 函数以提交已插入的数据。

以下示例演示了通过以下方法插入小批量数据:使用数据填充一系列数组,然后在其中循环并插入每个行。

#!/usr/bin/perl
use strict;
use DBI;
# Create a hash reference that holds a hash of parameters for the
# connection.
 my $attr = {AutoCommit => 0, # Turn off autocommit
             PrintError => 0   # Turn off automatic error printing.
                               # This is handled manually.
             };
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
    $attr);
if (defined DBI::err) {
    # Conection failed.
    die "Failed to connect: $DBI::errstr";
}
print "Connection AutoCommit state is: " . $dbh->{AutoCommit} . "\n";
# Create table to hold inserted data
$dbh->do("DROP TABLE IF EXISTS TEST CASCADE;") or die "Could not drop table";
$dbh->do("CREATE TABLE TEST( \
               C_ID  INT, \
               C_FP  FLOAT,\
               C_VARCHAR VARCHAR(100),\
               C_DATE DATE, C_TIME TIME,\
               C_TS TIMESTAMP,\
               C_BOOL BOOL)") or die "Could not create table";
# Populate an array of arrays with values. One of these rows contains
# data that will not be sucessfully inserted. Another contains an
# undef value, which gets inserted into the database as a NULL.
my @data = (
                [1,1.111,'Hello World!','2001-01-01','01:01:01'
                    ,'2001-01-01 01:01:01','t'],
                [2,2.22222,'How are you?','2002-02-02','02:02:02'
                    ,'2002-02-02 02:02:02','f'],
                ['bad value',2.22222,'How are you?','2002-02-02','02:02:02'
                    ,'2002-02-02 02:02:02','f'],
                [4,4.22222,undef,'2002-02-02','02:02:02'
                    ,'2002-02-02 02:02:02','f'],
             );
# Create a prepared statement to use parameters for inserting values.
my $sth = $dbh->prepare_cached("INSERT into test values(?,?,?,?,?,?,?)");
my $rowcount = 0; # Count # of rows
# Loop through the arrays to insert values
foreach my $tuple (@data) {
    $rowcount++;
    # Insert the row
    my $retval = $sth->execute(@$tuple);

    # See if the row was successfully inserted.
    if ($retval == 1) {
        # Value of 1 means the row was inserted (1 row was affected by insert)
        print "Row $rowcount successfully inserted\n";
    } else {
        print "Inserting row $rowcount failed";
        # Error message is not set on some platforms/versions of DBUI. Check to
        # ensure a message exists to avoid getting an unitialized var warning.
        if ($sth->err()) {
                print ": " . $sth->errstr();
        }
        print "\n";
    }
}
# Commit changes. With AutoCommit off, you need to use commit for batched
# data to actually be committed into the database. If your Perl script exits
# without committing its data, Vertica rolls back the transaction and the
# data is not committed.
$dbh->commit();
$dbh->disconnect();

成功运行以上示例后,将显示以下内容:

Connection AutoCommit state is: 0
Row 1 successfully inserted
Row 2 successfully inserted
Inserting row 3 failed with error 01000 [Vertica][VerticaDSII] (20) An
error occurred during query execution: Row rejected by server; see
server log for details (SQL-01000)
Row 4 successfully inserted

请注意,未插入其中一个行,因为该行包含无法存储在整数列中的字符串值。有关与 Vertica 通信的 Perl 脚本中数据类型处理的详细信息,请参阅 Perl 数据类型和 Vertica 数据类型之间的转换