Querying using Perl

To query Vertica using Perl:.

To query Vertica using Perl:

  1. Prepare a query statement using the Perl DBI module's prepare function. This function returns a statement handle that you use to execute the query and get the result set.

  2. Execute the prepared statement by calling the execute function on the statement handle.

  3. Retrieve the results of the query from the statement handle using one of several methods, such as calling the statement handle's fetchrow_array function to retrieve a row of data, or fetchall_array to get an array of arrays containing the entire result set (not a good idea if your result set may be very large!).

The following example demonstrates querying the table created by the example shown in Batch loading data using Perl. It executes a query to retrieve all of the content of the table, then repeatedly calls the statement handle's fetchrow_array function to get rows of data in an array. It repeats this process until fetchrow_array returns undef, which means that there are no more rows to be read.

#!/usr/bin/perl
use strict;
use DBI;
my $attr = {RaiseError => 1 }; # Make errors fatal to the Perl script.
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                        $attr);
# Prepare a query to get the content of the table
my $sth = $dbh->prepare("SELECT * FROM TEST ORDER BY C_ID ASC");
# Execute the query by calling execute on the statement handle
$sth->execute();
# Loop through result rows while we have them, getting each row as an array
while (my @row = $sth->fetchrow_array()) {
    # The @row array contains the column values for this row of data
    # Loop through the column values
    foreach my $column (@row) {
        if (!defined $column) {
            # NULLs are signaled by undefs. Set to NULL for clarity
            $column = "NULL";
        }
        print "$column\t"; # Output the column separated by a tab
    }
    print "\n";
}
$dbh->disconnect();

The example prints the following when run:

1    1.111    Hello World!    2001-01-01    01:01:01    2001-01-01 01:01:01    1
2    2.22222    How are you?    2002-02-02    02:02:02    2002-02-02 02:02:02    0
4    4.22222    NULL    2002-02-02    02:02:02    2002-02-02 02:02:02    0

Binding variables to column values

Another method of retrieving the query results is to bind variables to columns in the result set using the statement handle's bind_columns function. You may find this method convenient if you need to perform extensive processing on the returned data, since your code can use variables rather than array references to access the data. The following example demonstrates binding variables to the result set, rather than looping through the row and column values.

#!/usr/bin/perl
use strict;
use DBI;
my $attr = {RaiseError => 1 }; # Make SQL errors fatal to the Perl script.
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN32","ExampleUser","password123",
                        $attr);
# Prepare a query to get the content of the table
my $sth = $dbh->prepare("SELECT * FROM TEST ORDER BY C_ID ASC");
$sth->execute();
# Create a set of variables to bind to the column values.
my ($C_ID, $C_FP, $C_VARCHAR, $C_DATE, $C_TIME, $C_TS, $C_BOOL);
# Bind the variable references to the columns in the result set.
$sth->bind_columns(\$C_ID, \$C_FP, \$C_VARCHAR, \$C_DATE, \$C_TIME,
                    \$C_TS, \$C_BOOL);

# Now, calling fetch() to get a row of data updates the values of the bound
# variables. Continue calling fetch until it returns undefined.
while ($sth->fetch()) {
    # Note, you should always check that values are defined before using them,
    # since NULL values are translated into Perl as undefined. For this
    # example, just check the VARCHAR column for undefined values.
    if (!defined $C_VARCHAR) {
        $C_VARCHAR = "NULL";
    }
    # Just print values separated by tabs.
    print "$C_ID\t$C_FP\t$C_VARCHAR\t$C_DATE\t$C_TIME\t$C_TS\t$C_BOOL\n";
}
$dbh->disconnect();

The output of this example is identical to the output of the previous example.

Preparing, querying, and returning a single row

If you expect a single row as the result of a query (for example, when you execute a COUNT (*) query), you can use the DBI module's selectrow_array function to combine executing a statement and retrieving an array as a result.

The following example shows using selectrow_array to execute and get the results of the SHOW LOCALE statement. It also demonstrates changing the locale using the do function.

#!/usr/bin/perl
use strict;
use DBI;
my $attr = {RaiseError => 1 }; # Make SQL errors fatal to the Perl script.
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                        $attr);
# Demonstrate setting/getting locale.
# Use selectrow_array to combine preparing a statement, executing it, and
# getting an array as a result.
my @localerv = $dbh->selectrow_array("SHOW LOCALE;");
# The locale name is the 2nd column (array index 1) in the result set.
print "Locale: $localerv[1]\n";
# Use do() to execute a SQL statement to set the locale.
$dbh->do("SET LOCALE TO en_GB");
# Get the locale again.
@localerv = $dbh->selectrow_array("SHOW LOCALE;");
print "Locale is now: $localerv[1]\n";
$dbh->disconnect();

The result of running the example is:

Locale: en_US@collation=binary (LEN_KBINARY)
Locale is now: en_GB (LEN)

Executing queries and ResultBufferSize settings

When you call the execute() function on a prepared statement, the client library retrieves results up to the size of the result buffer. The result buffer size is set using ODBC's ResultBufferSize setting.

Vertica does not allow multiple active queries per connection. However, you can simulate multiple active queries by setting the result buffer to be large enough to accommodate the entire results from the first query. To ensure that the ODBC client driver's buffer is large enough to store result set for first query you can set ResultBufferSize to 0. Setting this parameter to 0 makes the result buffer size unlimited. The ODBC driver allocates enough memory to read the entire result set. With the entire result set from the first query stored in the result set buffer, the database connection is free to perform another query. Your client can execute this second query even though it has not processed the entire result set from the first query.

However, if you set the ResultBufferSize to 0, you may find that your calls to execute() result in the operating system killing your Perl client script. The operating system may terminate your script if the ODBC driver allocates too much memory to store a large result set.

A workaround for this behavior is limit the number of rows returned by your query. Then you can set the ResultBufferSize to a value that accommodates this limited result set. For example, you can estimate the amount of memory needed to store a single row of your query result. Then use the LIMIT and OFFSET clauses to get a specific number of rows that will fit into the space you allocated using ResultBufferSize. If the results of your query is able to fit within the limited result set buffer, you can then perform additional queries with the same database connection. This solution makes your code more complex as you will need to perform multiple queries to get the entire result set. Also, it is not appropriate in cases where you need to operate on an entire result set at once, rather than just a portion of it at a time.

A better solution is to use separate database connections for each query you want to perform. The overhead of the additional database connection is small compared to the resources needed to process large data sets.