Export data to Amazon S3 using the AWS library
Deprecated
The AWS library is deprecated. To export delimited data to S3 or any other destination, use EXPORT TO DELIMITED.
The Vertica library for Amazon Web Services (AWS) is a set of functions and configurable session parameters. These parameters allow you to export delimited data from Vertica to Amazon S3 storage without any third-party scripts or programs.
To use the AWS library, you must have access to an Amazon S3 storage account.
1 - Configure the Vertica library for Amazon Web Services
You use the Vertica library for Amazon Web Services (AWS) to export data from Vertica to S3. This library does not support IAM authentication. You must configure it to authenticate with S3 by using session parameters containing your AWS access key credentials. You can set your session parameters directly, or you can store your credentials in a table and set them with the AWS_SET_CONFIG function.
Because the AWS library uses session parameters, you must reconfigure the library with each new session.
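Because the configuration must be repeated, one convenient pattern is to keep your ALTER SESSION statements in a SQL file and source it at the start of each vsql session. A sketch, assuming a file named aws_setup.sql that contains the statements shown later in this section:
$ cat aws_setup.sql
ALTER SESSION SET UDPARAMETER FOR awslib aws_id='AKABCOEXAMPLEPKPXYZQ';
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='CEXAMPLE3tEXAMPLE1wEXAMPLEFrFEXAMPLE6+Yz';
Then, in each new session:
=> \i aws_setup.sql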
Note
Your AWS access key ID and secret access key are different from your AWS account sign-in credentials. For more information about AWS access keys, see Managing Access Keys for IAM Users in the AWS documentation.
Set AWS authentication parameters
The following AWS authentication parameters allow you to access AWS and work with the data in your Vertica database:
- aws_id: The 20-character AWS access key ID used to authenticate your account.
- aws_secret: The 40-character AWS secret access key used to authenticate your account.
- aws_session_token: The AWS temporary security token generated by running the AWS STS command get-session-token. This command generates temporary credentials that you can use to implement multi-factor authentication for security purposes. See Implementing Multi-factor Authentication.
Implement multi-factor authentication
Implement multi-factor authentication as follows:
- Run the AWS STS command get-session-token. This returns the following:
$ Credentials": {
"SecretAccessKey": "bQid6jNuSWRqUzkIJCFG7c71gDHZY3h7aDSW2DU6",
"SessionToken":
"FQoDYXdzEBcaDKM1mWpeu88nDTTFICKsAbaiIDTWe4BTh33tnUvo9F/8mZicKKLLy7WIcpT4FLfr6ltIm242/U2CI9G/
XdC6eoysUi3UGH7cxdhjxAW4fjgCKKYuNL764N2xn0issmIuJOku3GTDyc4U4iNlWyEng3SlshdiqVlk1It2Mk0isEQXKtx
F9VgfncDQBxjZUCkYIzseZw5pULa9YQcJOzl+Q2JrdUCWu0iFspSUJPhOguH+wTqiM2XdHL5hcUcomqm41gU=",
"Expiration": "2018-04-12T01:58:50Z",
"AccessKeyId": "ASIAJ4ZYGTOSVSLUIN7Q"
}
}
For more information on get-session-token, see the AWS documentation.
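For reference, a sketch of one way to invoke the command with an MFA device; the account ID, MFA device ARN, and token code are placeholders for your own values:
$ aws sts get-session-token \
      --serial-number arn:aws:iam::123456789012:mfa/example-user \
      --token-code 123456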
- Using the SecretAccessKey returned by get-session-token, set your temporary aws_secret:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='bQid6jNuSWRqUzkIJCFG7c71gDHZY3h7aDSW2DU6';
- Using the SessionToken returned by get-session-token, set your temporary aws_session_token:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_session_token='FQoDYXdzEBcaDKM1mWpeu88nDTTFICKsAbaiIDTWe4B
Th33tnUvo9F/8mZicKKLLy7WIcpT4FLfr6ltIm242/U2CI9G/XdC6eoysUi3UGH7cxdhjxAW4fjgCKKYuNL764N2xn0issmIuJOku3GTDy
c4U4iNlWyEng3SlshdiqVlk1It2Mk0isEQXKtxF9VgfncDQBxjZUCkYIzseZw5pULa9YQcJOzl+Q2JrdUCWu0iFspSUJPhOguH+wTq
iM2XdHL5hcUcomqm41gU=';
- Using the AccessKeyId returned by get-session-token, set your temporary aws_id:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_id='ASIAJ4ZYGTOSVSLUIN7Q';
The Expiration value indicates when the temporary credentials expire. In this example, expiration occurs on April 12, 2018 at 01:58:50 UTC.
These examples show how to implement multi-factor authentication using session parameters. You can use either of the following methods to securely set and store your AWS account credentials: set the session parameters directly, or store your credentials in a table and set them with the AWS_SET_CONFIG function.
AWS access key requirements
To communicate with AWS, your access key must have the following permissions:
- s3:GetObject
- s3:PutObject
- s3:ListBucket
For security purposes, Vertica recommends that you create a separate access key with limited permissions specifically for use with the Vertica Library for AWS.
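For illustration, a minimal IAM policy granting only these permissions might look like the following sketch; the bucket name exampleBucket is a placeholder for your own bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::exampleBucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::exampleBucket"
    }
  ]
}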
Set session parameters directly
These examples show how to set the session parameters for AWS using your own credentials. Parameter values are case sensitive:
- aws_id: This value is your AWS access key ID.
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_id='AKABCOEXAMPLEPKPXYZQ';
- aws_secret: This value is your AWS secret access key.
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='CEXAMPLE3tEXAMPLE1wEXAMPLEFrFEXAMPLE6+Yz';
- aws_region: This value is the AWS region associated with the S3 bucket you intend to access. If left unconfigured, aws_region defaults to us-east-1, which identifies the default server used by Amazon S3.
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_region='us-east-1';
When using ALTER SESSION:
- Using ALTER SESSION to change the values of S3 configuration parameters also changes the values of the corresponding UDParameters.
- Setting a UDParameter changes only the UDParameter.
- Setting a configuration parameter changes both the AWS configuration parameter and the UDParameter.
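For illustration, a sketch of the difference, assuming your Vertica version provides the AWSRegion configuration parameter:
-- Changes only the awslib UDParameter:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_region='us-west-1';
-- Changes the configuration parameter and its corresponding UDParameter:
=> ALTER SESSION SET AWSRegion='us-west-1';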
Set session parameters using a table
You can place your credentials in a table and secure them with a row-level access policy, then read them with the AWS_SET_CONFIG scalar meta-function. This approach allows you to store your credentials on your cluster for future session parameter configuration. You must have dbadmin access to create access policies.
- Create a table with columns corresponding to your credentials:
=> CREATE TABLE keychain(accesskey varchar, secretaccesskey varchar);
- Store your credentials in the corresponding columns:
=> COPY keychain FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> AEXAMPLEI5EXAMPLEYXQ|CCEXAMPLEtFjTEXAMPLEiEXAMPLE6+Yz
>> \.
- Set a row-level access policy appropriate to your security situation, as in the sketch below.
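For example, a minimal sketch of such a policy, assuming a role named aws_exporter that you have created and granted to the users permitted to read the credentials:
=> CREATE ACCESS POLICY ON keychain
   FOR ROWS WHERE ENABLED_ROLE('aws_exporter')
   ENABLE;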
- With each new session, configure your session parameters by calling the AWS_SET_CONFIG function in a SELECT statement:
=> SELECT AWS_SET_CONFIG('aws_id', accesskey), AWS_SET_CONFIG('aws_secret', secretaccesskey)
FROM keychain;
aws_set_config | aws_set_config
----------------+----------------
aws_id | aws_secret
(1 row)
- After you have configured your session parameters, verify them:
=> SHOW SESSION UDPARAMETER ALL;
2 - Export data to Amazon S3 from Vertica
After you configure the library for Amazon Web Services (AWS), you can export Vertica data to Amazon S3 by calling the S3EXPORT() transform function. S3EXPORT() writes data to files, based on the URL you provide. Vertica performs all communication over HTTPS, regardless of the URL type you use. Vertica does not support virtual-host-style URLs; if you use HTTPS URL constructions, you must use path-style URLs.
Note
If your S3 bucket contains a period in its path, set the prepend_hash parameter to true.
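For example, a sketch of an export to such a bucket; the bucket name example.bucket.name is a placeholder:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://example.bucket.name/object',
                    prepend_hash=true) OVER(PARTITION BEST)
   FROM exampleTable;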
You can control the output of S3EXPORT() in the following ways:
Adjust the query provided to S3EXPORT
By adjusting the query given to S3EXPORT(), you can export anything from tables to reporting queries.
This example exports a whole table:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object') OVER(PARTITION BEST)
FROM exampleTable;
rows | url
------+------------------------------
606 | https://exampleBucket/object
(1 row)
This example exports the results of a query:
=> SELECT S3EXPORT(customer_name, annual_income USING PARAMETERS url='s3://exampleBucket/object') OVER()
FROM public.customer_dimension
WHERE (customer_gender, annual_income) IN
(SELECT customer_gender, MAX(annual_income)
FROM public.customer_dimension
GROUP BY customer_gender);
rows | url
------+------------------------------
25 | https://exampleBucket/object
(1 row)
Adjust the partition of your result set with the OVER clause
Use the OVER clause to control your export partitions. Using the OVER() clause without qualification results in a single partition processed by the initiator for all of the query data. This example shows how to call the function with an unqualified OVER() clause:
=> SELECT S3EXPORT(name, company USING PARAMETERS url='s3://exampleBucket/object',
delimiter=',') OVER()
FROM exampleTable WHERE company='Vertica';
rows | url
------+------------------------------
10 | https://exampleBucket/object
(1 row)
You can also use window clauses, such as window partition clauses and window order clauses, to manage exported objects.
This example shows how you can use a window partition clause to partition S3 objects based on company values:
=> SELECT S3EXPORT(name, company
USING PARAMETERS url='s3://exampleBucket/object',
delimiter=',') OVER(PARTITION BY company) AS MEDIAN
FROM exampleTable;
Adjust the export chunk size for wide tables
You may encounter the following error when exporting extremely wide tables or tables with long data types such as LONG VARCHAR or LONG VARBINARY:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object') OVER(PARTITION BEST)
FROM veryWideTable;
ERROR 5861: Error calling setup() in User Function s3export
at [/data/.../S3.cpp:787],
error code: 0, message: The specified buffer of 10485760 bytesRead is too small,
it should be at least 11279701 bytesRead.
Vertica returns this error if the data for a single row overflows the buffer storing the data before export. By default, this buffer is 10MB. You can increase the size of this buffer using the chunksize parameter, which sets the size of the buffer in bytes. This example sets it to around 60MB:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object', chunksize=60485760)
OVER(PARTITION BEST) FROM veryWideTable;
rows | url
------+------------------------------
606 | https://exampleBucket/object
(1 row)