Add the required network inbound and outbound rules to the Network ACL associated with the VPC.
Note
A Vertica cluster must be operated within a single availability zone.
For information about VPCs, including how to create one, visit the AWS documentation.
1.3 - Install Vertica with manually deployed AWS resources
Vertica provides an AMI that you can install on AWS resources that you manually deploy. This section will guide you through configuring your network settings on AWS, launching and preparing EC2 instances using the Vertica AMI, and creating a Vertica cluster on those EC2 instances.
Choose this method of installation if you are familiar with configuring AWS and have many specific AWS configuration needs. (To automatically deploy AWS resources and a Vertica cluster instead, see Installing Vertica with CloudFormation templates.)
1.3.1 - Configure your network
Before you create your cluster, you must configure the network on which Vertica will run. Vertica requires a number of specific network configurations to operate on AWS. You may also have specific network configuration needs beyond the default Vertica settings.
Important
You can create a Vertica database running on AWS that uses IPv6 for internal communications. However, if you do so, you must identify the hosts in your cluster using IP addresses rather than host names, because the AWS DNS resolution service is incompatible with IPv6.
The following sections explain which Amazon EC2 features you need to configure for instance creation.
1.3.1.1 - Create a placement group, key pair, and VPC
Part of configuring your network for AWS is to create the following: a placement group, a key pair, and a virtual private cloud (VPC).
Create a placement group
A placement group is a logical grouping of instances in a single availability zone. Placement groups are required for clusters, and all Vertica nodes must be in the same placement group.
Vertica recommends placement groups for applications that benefit from low network latency, high network throughput, or both. To provide the lowest latency and the highest packet-per-second network performance for your placement group, choose an instance type that supports enhanced networking.
For information on creating placement groups, see Placement Groups in the AWS documentation.
Create a key pair
You need a key pair to access your instances using SSH. Create the key pair using the AWS interface and store a copy of your key (*.pem) file on your local machine. When you access an instance, you need to know the local path of your key.
For information on creating a key pair, see Amazon EC2 Key Pairs in the AWS documentation.
Create a virtual private cloud (VPC)
You create a Virtual Private Cloud (VPC) on Amazon so that you can create a network of your EC2 instances. Your instances in the VPC all share the same network and security settings.
A Vertica cluster on AWS must be logically located in the same network. Create a VPC to ensure that the nodes in your cluster can communicate with each other in AWS.
Create a single public subnet VPC with the following configurations:
Note
A Vertica cluster must be operated in a single availability zone.
For information on creating a VPC, see Create a Virtual Private Cloud (VPC) in the AWS documentation.
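If you prefer to script this setup, the AWS CLI can create each of these resources. The following is a hedged sketch; the names, CIDR blocks, and IDs are placeholders, not values that Vertica requires:
# Placement group for low-latency, single-AZ clustering
$ aws ec2 create-placement-group --group-name vertica-placement-group --strategy cluster
# Key pair; keep the private key file on your local machine
$ aws ec2 create-key-pair --key-name vertica-key --query 'KeyMaterial' --output text > vertica-key.pem
$ chmod 600 vertica-key.pem
# VPC with a single public subnet
$ aws ec2 create-vpc --cidr-block 10.0.0.0/16
$ aws ec2 create-subnet --vpc-id vpc-0abc1234 --cidr-block 10.0.0.0/24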
1.3.1.2 - Network ACL settings
Vertica requires the following basic network access control list (ACL) settings on an AWS instance running the Vertica AMI. Vertica recommends that you secure your network with additional ACL settings that are appropriate to your situation.
Inbound Rules

| Type | Protocol | Port Range | Use | Source | Allow/Deny |
|---|---|---|---|---|---|
| SSH | TCP (6) | 22 | SSH (Optional—for access to your cluster from outside your VPC) | User Specific | Allow |
| Custom TCP Rule | TCP (6) | 5450 | MC (Optional—for MC running outside of your VPC) | User Specific | Allow |
| Custom TCP Rule | TCP (6) | 5433 | SQL Clients (Optional—for access to your cluster from SQL clients) | User Specific | Allow |
| Custom TCP Rule | TCP (6) | 50000 | Rsync (Optional—for backup outside of your VPC) | User Specific | Allow |
| Custom TCP Rule | TCP (6) | 1024-65535 | Ephemeral Ports (Needed if you use any of the above) | User Specific | Allow |
| ALL Traffic | ALL | ALL | N/A | 0.0.0.0/0 | Deny |
Outbound Rules

| Type | Protocol | Port Range | Use | Source | Allow/Deny |
|---|---|---|---|---|---|
| Custom TCP Rule | TCP (6) | 0–65535 | Ephemeral Ports | 0.0.0.0/0 | Allow |
You can use the entire port range specified in the previous table, or find your specific ephemeral ports by entering the following command:
$ cat /proc/sys/net/ipv4/ip_local_port_range
For detailed information on network ACLs within AWS, refer to Network ACLs in the Amazon documentation.
For detailed information on ephemeral ports within AWS, refer to Ephemeral Ports in the Amazon documentation.
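If you manage the ACL from the command line, rules like those in the tables above can be added with the AWS CLI. This is a hedged sketch; the ACL ID, rule numbers, and CIDR ranges are placeholders that you must replace with your own values:
$ aws ec2 create-network-acl-entry --network-acl-id acl-0abc1234 --ingress \
    --rule-number 100 --protocol tcp --port-range From=22,To=22 \
    --cidr-block 203.0.113.0/24 --rule-action allow
$ aws ec2 create-network-acl-entry --network-acl-id acl-0abc1234 --ingress \
    --rule-number 110 --protocol tcp --port-range From=5433,To=5433 \
    --cidr-block 203.0.113.0/24 --rule-action allow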
1.3.1.3 - Configure TCP keepalive with AWS network load balancer
AWS supports three types of elastic load balancers (ELBs):
- Classic Load Balancers
- Application Load Balancers
- Network Load Balancers
Vertica strongly recommends the AWS Network Load Balancer (NLB), which provides the best performance with your Vertica database. The Network Load Balancer acts as a proxy between clients (such as JDBC) and Vertica servers. The Classic and Application Load Balancers do not work with Vertica, in Enterprise Mode or Eon Mode.
To avoid timeouts and hangs when connecting to Vertica through the NLB, it is important to understand how the AWS NLB handles idle connections. AWS sets the NLB idle timeout to 350 seconds, and you cannot change this value. The timeout applies to both sides of the connection: client to NLB, and NLB to Vertica server.
For a long-running query, if either the client or the server fails to send a timely keepalive, that side of the connection is terminated. This can lead to situations where a JDBC client hangs waiting for results that would never be returned because the server fails to send a keepalive within 350 seconds.
To identify an idle timeout/keepalive issue, run a query like this via a client such as JDBC:
=> SELECT SLEEP(355);
If there’s a problem, one of the following situations occurs:
- The client connection terminates before 355 seconds. In this case, lower the JDBC keepalive setting so that keepalives are sent less than 350 seconds apart.
- The client connection doesn’t return a result after 355 seconds. In this case, you need to adjust the server keepalive settings (tcp_keepalive_time and tcp_keepalive_intvl) so that keepalives are sent less than 350 seconds apart.
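For example, on Linux you can lower the server-side keepalive interval so that keepalives are sent well under the 350-second limit. The values below are illustrative only, not Vertica-mandated settings:
$ sudo sysctl -w net.ipv4.tcp_keepalive_time=120
$ sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
# Persist the settings across reboots
$ echo 'net.ipv4.tcp_keepalive_time = 120' | sudo tee -a /etc/sysctl.conf
$ echo 'net.ipv4.tcp_keepalive_intvl = 60' | sudo tee -a /etc/sysctl.conf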
For detailed information about AWS Network Load Balancers, see What is a Network Load Balancer? in the AWS documentation.
1.3.1.4 - Create and assign an internet gateway
When you create a VPC, an Internet gateway is automatically assigned to it. You can use that gateway, or you can assign your own. If you are using the default Internet gateway, continue with the procedure described in Create a security group.
Otherwise, create an internet gateway specific to your needs, and associate it with your VPC and subnet.
For information about how to create an Internet Gateway, see Internet Gateways in the AWS documentation.
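If you script this step, the AWS CLI equivalent looks roughly like the following; the gateway, VPC, and route table IDs are placeholders:
$ aws ec2 create-internet-gateway
$ aws ec2 attach-internet-gateway --internet-gateway-id igw-0abc1234 --vpc-id vpc-0abc1234
# Route internet-bound traffic from the subnet's route table through the gateway
$ aws ec2 create-route --route-table-id rtb-0abc1234 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0abc1234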
1.3.1.5 - Assign an elastic IP address
An elastic IP address is an unchanging IP address that you can use to connect to your cluster externally. Vertica recommends you assign a single elastic IP to a node in your cluster. You can then connect to other nodes in your cluster from your primary node using their internal IP addresses dictated by your VPC settings.
Create an elastic IP address. For information, see Elastic IP Addresses in the AWS documentation.
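As a hedged CLI sketch, you can allocate an elastic IP and attach it to your primary instance as follows; the instance and allocation IDs are placeholders:
$ aws ec2 allocate-address --domain vpc
$ aws ec2 associate-address --instance-id i-0abc1234 --allocation-id eipalloc-0abc1234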
1.3.1.6 - Create a security group
The Vertica AMI has specific security group requirements. When you create a Virtual Private Cloud (VPC), AWS automatically creates a default security group and assigns it to the VPC. You can use the default security group, or you can name and assign your own.
Create and name your own security group using the following basic security group settings. You may make additional modifications based on your specific needs.
Inbound

| Type | Use | Protocol | Port Range | IP |
|---|---|---|---|---|
| SSH | | TCP | 22 | The CIDR address range of administrative systems that require SSH access to the Vertica nodes. Make this range as restrictive as possible. You can add multiple rules for separate network ranges, if necessary. |
| DNS (UDP) | | UDP | 53 | Your private subnet address range (for example, 10.0.0.0/24). |
| Custom UDP | Spread | UDP | 4803 and 4804 | Your private subnet address range (for example, 10.0.0.0/24). |
| Custom TCP | Spread | TCP | 4803 | Your private subnet address range (for example, 10.0.0.0/24). |
| Custom TCP | VSQL/SQL | TCP | 5433 | The CIDR address range of client systems that require access to the Vertica nodes. This range should be as restrictive as possible. You can add multiple rules for separate network ranges, if necessary. |
| Custom TCP | Inter-node Communication | TCP | 5434 | Your private subnet address range (for example, 10.0.0.0/24). |
| Custom TCP | | TCP | 5444 | Your private subnet address range (for example, 10.0.0.0/24). |
| Custom TCP | MC | TCP | 5450 | The CIDR address of client systems that require access to the management console. This range should be as restrictive as possible. You can add multiple rules for separate network ranges, if necessary. |
| Custom TCP | Rsync | TCP | 50000 | Your private subnet address range (for example, 10.0.0.0/24). |
| ICMP | Installer | Echo Reply | N/A | Your private subnet address range (for example, 10.0.0.0/24). |
| ICMP | Installer | Traceroute | N/A | Your private subnet address range (for example, 10.0.0.0/24). |
Note
In Management Console (MC), the Java IANA discovery process uses port 7 once to detect if an IP address is reachable before the database import operation. Vertica tries port 7 first. If port 7 is blocked, Vertica switches to port 22.
Outbound

| Type | Protocol | Port Range | Destination | IP |
|---|---|---|---|---|
| All TCP | TCP | 0-65535 | Anywhere | 0.0.0.0/0 |
| All ICMP | ICMP | 0-65535 | Anywhere | 0.0.0.0/0 |
| All UDP | UDP | 0-65535 | Anywhere | 0.0.0.0/0 |
For information about what a security group is, as well as how to create one, see Amazon EC2 Security Groups for Linux Instances in the AWS documentation.
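If you prefer to script the group, the AWS CLI can create it and add the ingress rules shown above. This is a hedged sketch; the group name, VPC ID, and CIDR ranges are placeholders:
$ aws ec2 create-security-group --group-name vertica-sg --description "Vertica cluster" --vpc-id vpc-0abc1234
# SSH from an administrative network
$ aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 22 --cidr 203.0.113.0/24
# Spread between cluster nodes (UDP 4803-4804 and TCP 4803)
$ aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol udp --port 4803-4804 --cidr 10.0.0.0/24
$ aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 4803 --cidr 10.0.0.0/24
# SQL clients
$ aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 5433 --cidr 203.0.113.0/24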
1.3.2 - Deploy AWS instances for your Vertica database cluster
Once you have configured your network, you are ready to create your AWS instances and install Vertica. Follow these procedures to install and run Vertica on AWS.
1.3.2.1 - Configure and launch an instance
After you configure your network settings on AWS, configure and launch the instances onto which you will install Vertica. An Elastic Compute Cloud (EC2) instance without a Vertica AMI is similar to a traditional host. Just like with an on-premises cluster, you must prepare and configure your cluster and network at the hardware level before you can install Vertica.
When you create an EC2 instance on AWS using a Vertica AMI, the instance includes the Vertica software and the recommended configuration. The Vertica AMI acts as a template, requiring fewer configuration steps. Vertica recommends that you use the Vertica AMI as is—without modification.
1. Select the Vertica AMI from the AWS marketplace.
2. Select the desired fulfillment method.
3. Configure the following:
Add storage to your instances
Consider the following issues when you add storage to your instances:
- Add a number of drives equal to the number of physical cores in your instance. For example, for a c3.8xlarge instance, add 16 drives; for an r3.4xlarge, add 8 drives.
- Do not store your information on the root volume.
- Amazon EBS provides durable, block-level storage volumes that you can attach to running instances. For guidance on selecting and configuring an Amazon EBS volume type, see Amazon EBS Volume Types in the Amazon Web Services documentation.
You can choose to configure your EBS volumes into a RAID 0 array to improve disk performance. Before doing so, use the vioperf utility to determine whether the performance of the EBS volumes is fast enough without using them in a RAID array. Pass vioperf the path to a mount point for an EBS volume. In this example, an EBS volume is mounted on a directory named /vertica/data:
[dbadmin@ip-10-11-12-13 ~]$ /opt/vertica/bin/vioperf /vertica/data
The minimum required I/O is 20 MB/s read and write per physical processor core on
each node, in full duplex i.e. reading and writing at this rate simultaneously,
concurrently on all nodes of the cluster. The recommended I/O is 40 MB/s per
physical core on each node. For example, the I/O rate for a server node with 2
hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.
Using direct io (buffer size=1048576, alignment=512) for directory "/vertica/data"
test | directory | counter name | counter | counter | counter | counter | thread | %CPU | %IO Wait | elapsed | remaining
| | | value | value (10 | value/core | value/core | count | | | time (s)| time (s)
| | | | sec avg) | | (10 sec avg) | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------
Write | /vertica/data | MB/s | 259 | 259 | 32.375 | 32.375 | 8 | 4 | 11 | 10 | 65
Write | /vertica/data | MB/s | 248 | 232 | 31 | 29 | 8 | 4 | 11 | 20 | 55
Write | /vertica/data | MB/s | 240 | 234 | 30 | 29.25 | 8 | 4 | 11 | 30 | 45
Write | /vertica/data | MB/s | 240 | 233 | 30 | 29.125 | 8 | 4 | 13 | 40 | 35
Write | /vertica/data | MB/s | 240 | 233 | 30 | 29.125 | 8 | 4 | 13 | 50 | 25
Write | /vertica/data | MB/s | 240 | 232 | 30 | 29 | 8 | 4 | 12 | 60 | 15
Write | /vertica/data | MB/s | 240 | 238 | 30 | 29.75 | 8 | 4 | 12 | 70 | 5
Write | /vertica/data | MB/s | 240 | 235 | 30 | 29.375 | 8 | 4 | 12 | 75 | 0
ReWrite | /vertica/data | (MB-read+MB-write)/s| 237+237 | 237+237 | 29.625+29.625 | 29.625+29.625 | 8 | 4 | 22 | 10 | 65
ReWrite | /vertica/data | (MB-read+MB-write)/s| 235+235 | 234+234 | 29.375+29.375 | 29.25+29.25 | 8 | 4 | 20 | 20 | 55
ReWrite | /vertica/data | (MB-read+MB-write)/s| 234+234 | 235+235 | 29.25+29.25 | 29.375+29.375 | 8 | 4 | 20 | 30 | 45
ReWrite | /vertica/data | (MB-read+MB-write)/s| 233+233 | 234+234 | 29.125+29.125 | 29.25+29.25 | 8 | 4 | 18 | 40 | 35
ReWrite | /vertica/data | (MB-read+MB-write)/s| 233+233 | 234+234 | 29.125+29.125 | 29.25+29.25 | 8 | 4 | 20 | 50 | 25
ReWrite | /vertica/data | (MB-read+MB-write)/s| 234+234 | 235+235 | 29.25+29.25 | 29.375+29.375 | 8 | 3 | 19 | 60 | 15
ReWrite | /vertica/data | (MB-read+MB-write)/s| 233+233 | 236+236 | 29.125+29.125 | 29.5+29.5 | 8 | 4 | 21 | 70 | 5
ReWrite | /vertica/data | (MB-read+MB-write)/s| 232+232 | 236+236 | 29+29 | 29.5+29.5 | 8 | 4 | 21 | 75 | 0
Read | /vertica/data | MB/s | 248 | 248 | 31 | 31 | 8 | 4 | 12 | 10 | 65
Read | /vertica/data | MB/s | 241 | 236 | 30.125 | 29.5 | 8 | 4 | 15 | 20 | 55
Read | /vertica/data | MB/s | 240 | 232 | 30 | 29 | 8 | 4 | 10 | 30 | 45
Read | /vertica/data | MB/s | 240 | 232 | 30 | 29 | 8 | 4 | 12 | 40 | 35
Read | /vertica/data | MB/s | 240 | 234 | 30 | 29.25 | 8 | 4 | 12 | 50 | 25
Read | /vertica/data | MB/s | 238 | 235 | 29.75 | 29.375 | 8 | 4 | 15 | 60 | 15
Read | /vertica/data | MB/s | 238 | 232 | 29.75 | 29 | 8 | 4 | 13 | 70 | 5
Read | /vertica/data | MB/s | 238 | 238 | 29.75 | 29.75 | 8 | 3 | 9 | 75 | 0
SkipRead | /vertica/data | seeks/s | 22909 | 22909 | 2863.62 | 2863.62 | 8 | 0 | 6 | 10 | 65
SkipRead | /vertica/data | seeks/s | 21989 | 21068 | 2748.62 | 2633.5 | 8 | 0 | 6 | 20 | 55
SkipRead | /vertica/data | seeks/s | 21639 | 20936 | 2704.88 | 2617 | 8 | 0 | 7 | 30 | 45
SkipRead | /vertica/data | seeks/s | 21478 | 20999 | 2684.75 | 2624.88 | 8 | 0 | 6 | 40 | 35
SkipRead | /vertica/data | seeks/s | 21381 | 20995 | 2672.62 | 2624.38 | 8 | 0 | 5 | 50 | 25
SkipRead | /vertica/data | seeks/s | 21310 | 20953 | 2663.75 | 2619.12 | 8 | 0 | 5 | 60 | 15
SkipRead | /vertica/data | seeks/s | 21280 | 21103 | 2660 | 2637.88 | 8 | 0 | 8 | 70 | 5
SkipRead | /vertica/data | seeks/s | 21272 | 21142 | 2659 | 2642.75 | 8 | 0 | 6 | 75 | 0
If the EBS volume read and write performance (the entries with Read and Write in column 1 of the output) is greater than 20MB/s per physical processor core (columns 6 and 7), you do not need to configure the EBS volumes as a RAID array to meet the minimum requirements to run Vertica. You may still consider configuring your EBS volumes as a RAID array if the performance is less than the optimal 40MB/s per physical core (as is the case in this example).
Note
If your EC2 instance has hyper-threading enabled, vioperf may incorrectly count the number of cores in your system. The 20MB/s throughput-per-core requirement applies only to physical cores, not virtual cores. If your EC2 instance has hyper-threading enabled, divide the counter value (column 4 in the output) by the number of physical cores. See the CPU Cores and Threads Per CPU Core Per Instance Type section in the AWS documentation topic Optimizing CPU Options for a list of physical cores in each instance type.
If you determine that you need to configure your EBS volumes as a RAID 0 array, see the AWS documentation topic RAID Configuration on Linux for the steps you need to take.
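If you build the array yourself, the general Linux procedure looks like the following sketch (the configure_software_raid.sh script described later in Configure storage automates this). The device names and mount point are examples only; confirm them against your own instance before running anything:
# Combine two EBS volumes into a single RAID 0 device
$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
# Create a file system and mount it as the Vertica data directory
$ sudo mkfs.ext4 /dev/md0
$ sudo mkdir -p /vertica/data
$ sudo mount /dev/md0 /vertica/data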
Security group and access
- Choose either your previously configured security group or the default security group.
- Configure S3 access for your nodes by creating and assigning an IAM role to your EC2 instance. See AWS authentication for more information.
Launch instances
Verify that your instances are running.
1.3.2.2 - Connect to an instance
Using your private key, take these steps to connect to your cluster through the instance to which you attached an elastic IP address:
1. As the dbadmin user, type the following command, substituting your SSH key:
$ ssh -i <ssh key> dbadmin@<elastic-ip-address>
2. Select Instances from the Navigation panel.
3. Select the instance that is attached to the Elastic IP.
4. Click Connect.
5. On Connect to Your Instance, choose one of the following options:
   - A Java SSH Client directly from my browser—Add the path to your private key in the field Private key path, and click Launch SSH Client.
   - Connect with a standalone SSH client—Follow the steps required by your standalone SSH client.
Connect to an instance from Windows using PuTTY
If you connect to the instance from the Windows operating system and plan to use PuTTY:
1. Convert your key file using PuTTYgen.
2. Connect with PuTTY or WinSCP (connect via the elastic IP), using your converted key (the *.ppk file).
3. Move your key file (the *.pem file) to the root directory using PuTTY or WinSCP.
1.3.2.3 - Prepare instances for cluster formation
After you create your instances, you need to prepare them for cluster formation. Prepare your instances by adding your AWS .pem key and your Vertica license.
By default, each AMI includes a Community Edition license. Once Vertica is installed, you can find the license at this location:
/opt/vertica/config/licensing/vertica_community_edition.license.key
1. As the dbadmin user, copy your *.pem file (from where you saved it locally) onto your primary instance. (A hedged scp sketch appears after this list.)
Depending upon the procedure you use to copy the file, the permissions on the file may change. If permissions change, the install_vertica script fails with a message similar to the following:
FATAL (19): Failed Login Validation 10.0.3.158, cannot resolve or connect to host as root.
If you receive a failure message, enter the following command to correct permissions on your *.pem file:
$ chmod 600 /<name-of-pem>.pem
2. Copy your Vertica license over to your primary instance, placing it in your home directory or another known location.
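One way to copy both files from your local machine is scp with your key pair. This is a sketch; the key and license file names and the address are placeholders:
$ scp -i ~/name-of-pem.pem ~/name-of-pem.pem dbadmin@<elastic-ip-address>:/home/dbadmin/
$ scp -i ~/name-of-pem.pem ~/vertica-license.dat dbadmin@<elastic-ip-address>:/home/dbadmin/
$ ssh -i ~/name-of-pem.pem dbadmin@<elastic-ip-address> 'chmod 600 /home/dbadmin/name-of-pem.pem'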
1.3.2.4 - Change instances on AWS
You can change instance types on AWS. For example, you can downgrade a c3.8xlarge instance to c3.4xlarge. See Supported AWS instance types for a list of valid AWS instances.
When you change AWS instances, you may need to:
- Reconfigure memory settings
- Reset memory size in a resource pool
- Reset the number of CPUs in a resource pool
If you change to an AWS instance type that requires a different amount of memory, you may need to recompute the following and then reset the values:
Note
You may need root user permissions to reset these values.
Reset memory size in a resource pool
If you used absolute memory in a resource pool, you may need to reconfigure the memory using the MEMORYSIZE parameter in ALTER RESOURCE POOL.
Note
If you set memory size as a percentage when you created the original resource pool, you do not need to change it here.
Reset number of CPUs in a resource pool
If your new instance requires a different number of CPUs, you may need to reset the CPUAFFINITYSET parameter in ALTER RESOURCE POOL.
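For example, assuming a resource pool named analytics_pool that was created with an absolute MEMORYSIZE, you might adjust it for the new instance type as follows; the pool name and values are illustrative only:
=> ALTER RESOURCE POOL analytics_pool MEMORYSIZE '8G';
=> ALTER RESOURCE POOL analytics_pool CPUAFFINITYSET '0-7';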
1.3.2.5 - Configure storage
Vertica recommends that you store information — especially your data and catalog directories — on dedicated Amazon EBS volumes formatted with a supported file system. The /opt/vertica/sbin/configure_software_raid.sh script automates the storage configuration process.
Caution
Do not store information on the root volume because it might result in data loss.
Vertica performance tests Eon Mode with a per-node EBS volume of up to 2TB. For best performance, combine multiple EBS volumes into a RAID 0 array.
For more information about RAID 0 arrays and EBS volumes, see RAID configuration on Linux.
Determining volume names
Because the storage configuration script requires the names of the volumes that you want to configure, you must identify the volumes on your machine. The following command lists the contents of the /dev directory. Search for the volumes that begin with xvd:
$ ls /dev
Important
Ignore the root volume. Do not include any of your root volumes in the RAID creation process.
Combining volumes for storage
The configure_software_raid.sh shell script combines your EBS volumes into a RAID 0 array.
Caution
Run configure_software_raid.sh in the default setting only if you have a fresh configuration with no existing RAID settings.
If you have existing RAID settings, open the script in a text editor and manually edit the raid_dev value to reflect your current RAID settings. If you have existing RAID settings and you do not edit the script, the script deletes important operating system device files.
Alternately, use the Management Console (MC) to add storage nodes without unwanted changes to operating system device files. For more information, see Managing database clusters.
The following steps combine your EBS volumes into RAID 0 with the configure_software_raid.sh script:
1. Edit the /opt/vertica/sbin/configure_software_raid.sh shell file as follows:
   - Comment out the safety exit command at the beginning.
   - Change the sample volume names to your own volume names, which you noted previously. Add more volumes, if necessary.
2. Run the /opt/vertica/sbin/configure_software_raid.sh shell file. Running this file creates a RAID 0 volume and mounts it to /vertica/data.
3. Change the owner of the newly created volume to dbadmin with chown. (A consolidated sketch of steps 1-3 appears after this list.)
4. Repeat steps 1-3 for each node on your cluster.
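Taken together, the sequence on each node might look like the following sketch. The volume names and the verticadba group are assumptions to confirm on your own system:
$ ls /dev | grep '^xvd'
# Edit the script: comment out the safety exit and set your volume names
$ sudo vi /opt/vertica/sbin/configure_software_raid.sh
$ sudo /opt/vertica/sbin/configure_software_raid.sh
$ sudo chown -R dbadmin:verticadba /vertica/data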
1.3.2.6 - Create a cluster
On AWS, use the install_vertica script to combine instances and create a cluster. Check your My Instances page on AWS for a list of current instances and their associated IP addresses. You need these IP addresses when you run install_vertica.
Create a cluster as follows:
1. While connected to your primary instance, enter the following command to combine your instances into a cluster. Substitute the IP addresses of your instances and include your root *.pem file name.
$ sudo /opt/vertica/sbin/install_vertica --hosts 10.0.11.164,10.0.11.165,10.0.11.166 \
--dba-user-password-disabled --point-to-point --data-dir /vertica/data \
--ssh-identity ~/name-of-pem.pem --license license.file
Note
- If you are using Vertica Community Edition, which limits you to three instances, you can specify -L CE with no license file.
- When you run install_vertica or update_vertica on a Vertica AMI, --point-to-point is the default. This parameter configures Spread to use direct point-to-point communication between all Vertica nodes, which is a requirement for clusters on AWS.
- If you are using IPv6 network addresses to identify the hosts in your cluster, use the --ipv6 flag in your install_vertica command. You must also use IP addresses instead of host names, because the AWS DNS server cannot resolve host names to IPv6 addresses.
2. After combining your instances, Vertica recommends deleting your *.pem key from your cluster to reduce security risks. The example below uses the shred command to delete the file:
$ shred name-of-pem.pem
3. After creating one or more clusters, create your database.
For complete information on the install_vertica script and its parameters, see Installing Vertica with the installation script.
Important
Stopping or rebooting an instance or cluster without first shutting down the database may result in disk or database corruption. To safely shut down and restart your cluster, see Operating the database.
Check open ports manually using the netcat utility
Once your cluster is up and running, you can check ports manually through the command line using the netcat (nc) utility. What follows is an example using the utility to check ports.
Before performing the procedure, choose the private IP addresses of two nodes in your cluster.
The examples given below use nodes with the private IPs 10.0.11.60 and 10.0.11.61.
Install the nc utility on your nodes. Once installed, you can issue commands to check the ports on one node from another node.
To check a TCP port:
1. Put one node in listen mode and specify the port. The following sample shows how to put IP 10.0.11.60 into listen mode for port 4804.
[root@ip-10-0-11-60 ~]# nc -l 4804
2. From the other node, run nc, specifying the IP address of the node you just put in listen mode and the same port number.
[root@ip-10-0-11-61 ~]# nc 10.0.11.60 4804
3. Enter sample text from either node and it should show up on the other node. To cancel after you have checked a port, enter Ctrl+C.
Note
To check a UDP port, use the same nc commands with the -u option:
[root@ip-10-0-11-60 ~]# nc -u -l 4804
[root@ip-10-0-11-61 ~]# nc -u 10.0.11.60 4804
1.3.2.7 - Use Management Console (MC) on AWS
Management Console (MC) is a database management tool that allows you to view and manage aspects of your cluster. Vertica provides an MC AMI, which you can use with AWS. The MC AMI allows you to create an instance, dedicated to running MC, that you can attach to a new or existing Vertica cluster on AWS. You can create and attach an MC instance to your Vertica on AWS cluster at any time.
For information on requirements and installing MC, see Installing Management Console.
1.3.2.7.1 - Log in to MC and manage your cluster
After you launch your MC instance and configure your security group settings, log in to your database. To do so, use the elastic IP you specified during instance creation.
From this elastic IP, you can manage your Vertica database on AWS using standard MC procedures.
Considerations when using MC on AWS
- Because MC is already installed on the MC AMI, the MC installation process does not apply.
- To uninstall MC on AWS, follow the procedures provided in Uninstalling Management Console before terminating the MC instance.
1.4 - Export data to Amazon S3 using the AWS library
Deprecated
The AWS library is deprecated. To export delimited data to S3 or any other destination, use EXPORT TO DELIMITED.
The Vertica library for Amazon Web Services (AWS) is a set of functions and configurable session parameters. These parameters allow you to export delimited data from Vertica to Amazon S3 storage without any third-party scripts or programs.
To use the AWS library, you must have access to an Amazon S3 storage account.
1.4.1 - Configure the Vertica library for Amazon Web Services
You use the Vertica library for Amazon Web Services (AWS) to export data from Vertica to S3. This library does not support IAM authentication. You must configure it to authenticate with S3 by using session parameters containing your AWS access key credentials. You can set your session parameters directly, or you can store your credentials in a table and set them with the AWS_SET_CONFIG function.
Because the AWS library uses session parameters, you must reconfigure the library with each new session.
Important
Your AWS access key ID and secret access key are different from your account access credentials. For more information about AWS access keys, see Managing Access Keys for IAM Users in the AWS documentation.
Set AWS authentication parameters
The following AWS authentication parameters allow you to access AWS and work with the data in your Vertica database:
- aws_id: The 20-character AWS access key used to authenticate your account.
- aws_secret: The 40-character AWS secret access key used to authenticate your account.
- aws_session_token: The AWS temporary security token generated by running the AWS STS command get-session-token. This command generates temporary credentials you can use to implement multi-factor authentication for security purposes. See Implementing Multi-factor Authentication.
Implement multi-factor authentication
Implement multi-factor authentication as follows:
1. Run the AWS STS command get-session-token. This returns the following:
{
    "Credentials": {
"SecretAccessKey": "bQid6jNuSWRqUzkIJCFG7c71gDHZY3h7aDSW2DU6",
"SessionToken":
"FQoDYXdzEBcaDKM1mWpeu88nDTTFICKsAbaiIDTWe4BTh33tnUvo9F/8mZicKKLLy7WIcpT4FLfr6ltIm242/U2CI9G/
XdC6eoysUi3UGH7cxdhjxAW4fjgCKKYuNL764N2xn0issmIuJOku3GTDyc4U4iNlWyEng3SlshdiqVlk1It2Mk0isEQXKtx
F9VgfncDQBxjZUCkYIzseZw5pULa9YQcJOzl+Q2JrdUCWu0iFspSUJPhOguH+wTqiM2XdHL5hcUcomqm41gU=",
"Expiration": "2018-04-12T01:58:50Z",
"AccessKeyId": "ASIAJ4ZYGTOSVSLUIN7Q"
}
}
For more information on get-session-token, see the AWS documentation.
2. Using the SecretAccessKey returned from get-session-token, set your temporary aws_secret:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='bQid6jNuSWRqUzkIJCFG7c71gDHZY3h7aDSW2DU6';
3. Using the SessionToken returned from get-session-token, set your temporary aws_session_token:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_session_token='FQoDYXdzEBcaDKM1mWpeu88nDTTFICKsAbaiIDTWe4B
Th33tnUvo9F/8mZicKKLLy7WIcpT4FLfr6ltIm242/U2CI9G/XdC6eoysUi3UGH7cxdhjxAW4fjgCKKYuNL764N2xn0issmIuJOku3GTDy
c4U4iNlWyEng3SlshdiqVlk1It2Mk0isEQXKtxF9VgfncDQBxjZUCkYIzseZw5pULa9YQcJOzl+Q2JrdUCWu0iFspSUJPhOguH+wTq
iM2XdHL5hcUcomqm41gU=';
4. Using the AccessKeyId returned from get-session-token, set your temporary aws_id:
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_id='ASIAJ4ZYGTOSVSLUIN7Q';
The Expiration value returned indicates when the temporary credentials expire. In this example, expiration occurs on April 12, 2018 at 01:58:50.
These examples show how to implement multi-factor authentication using session parameters. To securely set and store your AWS account credentials, you can either set the session parameters directly, or store your credentials in a table and set them with the AWS_SET_CONFIG function, as described below.
AWS access key requirements
To communicate with AWS, your access key must have the following permissions:
- s3:GetObject
- s3:PutObject
- s3:ListBucket
For security purposes, Vertica recommends that you create a separate access key with limited permissions specifically for use with the Vertica Library for AWS.
These examples show how to set the session parameters for AWS using your own credentials. Parameter values are case sensitive:
- aws_id: This value is your AWS access key ID.
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_id='AKABCOEXAMPLEPKPXYZQ';
- aws_secret: This value is your AWS secret access key.
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='CEXAMPLE3tEXAMPLE1wEXAMPLEFrFEXAMPLE6+Yz';
- aws_region: This value is the AWS region associated with the S3 bucket you intend to access. Left unconfigured, aws_region defaults to us-east-1, which identifies the default server used by Amazon S3.
=> ALTER SESSION SET UDPARAMETER FOR awslib aws_region='us-east-1';
When using ALTER SESSION:
- Using ALTER SESSION to change the values of S3 parameters also changes the values of the corresponding UDParameters.
- Setting a UDParameter changes only the UDParameter.
- Setting a configuration parameter changes both the AWS parameter and the UDParameter.
You can place your credentials in a table and secure them with a row-level access policy. You can then call your credentials with the AWS_SET_CONFIG scalar meta-function. This approach allows you to store your credentials on your cluster for future session parameter configuration. You must have dbadmin access to create access policies.
1. Create a table with columns corresponding to your credentials:
=> CREATE TABLE keychain(accesskey varchar, secretaccesskey varchar);
2. Store your credentials in the corresponding columns:
=> COPY keychain FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> AEXAMPLEI5EXAMPLEYXQ|CCEXAMPLEtFjTEXAMPLEiEXAMPLE6+Yz
>> \.
3. Set a row-level access policy appropriate to your security situation.
4. With each new session, configure your session parameters by calling the AWS_SET_CONFIG function in a SELECT statement:
=> SELECT AWS_SET_CONFIG('aws_id', accesskey), AWS_SET_CONFIG('aws_secret', secretaccesskey)
FROM keychain;
aws_set_config | aws_set_config
----------------+----------------
aws_id | aws_secret
(1 row)
5. After you have configured your session parameters, verify them:
=> SHOW SESSION UDPARAMETER ALL;
1.4.2 - Export data to Amazon S3 from Vertica
After you configure the library for Amazon Web Services (AWS), you can export Vertica data to Amazon S3 by calling the S3EXPORT() transform function. S3EXPORT() writes data to files, based on the URL you provide. Vertica performs all communication over HTTPS, regardless of the URL type you use. Vertica does not support virtual host style URLs. If you use HTTPS URL constructions, you must use path style URLs.
Note
If your S3 bucket contains a period in its path, set the prepend_hash parameter to True.
You can control the output of S3EXPORT() in the following ways:
Adjust the query provided to S3EXPORT
By adjusting the query given to S3EXPORT(), you can export anything from tables to reporting queries.
This example exports a whole table:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object') OVER(PARTITION BEST)
FROM exampleTable;
rows | url
------+------------------------------
606 | https://exampleBucket/object
(1 row)
This example exports the results of a query:
=> SELECT S3EXPORT(customer_name, annual_income USING PARAMETERS url='s3://exampleBucket/object') OVER()
FROM public.customer_dimension
WHERE (customer_gender, annual_income) IN
(SELECT customer_gender, MAX(annual_income)
FROM public.customer_dimension
GROUP BY customer_gender);
rows | url
------+------------------------------
25 | https://exampleBucket/object
(1 row)
Adjust the partition of your result set with the OVER clause
Use the OVER clause to control your export partitions. Using the OVER() clause without qualification results in a single partition processed by the initiator for all of the query data. This example shows how to call the function with an unqualified OVER() clause:
=> SELECT S3EXPORT(name, company USING PARAMETERS url='s3://exampleBucket/object',
delimiter=',') OVER()
FROM exampleTable WHERE company='Vertica';
rows | url
------+------------------------------
10 | https://exampleBucket/object
(1 row)
You can also use window clauses, such as window partition clauses and window order clauses, to manage exported objects.
This example shows how you can use a window partition clause to partition S3 objects based on company values:
=> SELECT S3EXPORT(name, company
USING PARAMETERS url='s3://exampleBucket/object',
delimiter=',') OVER(PARTITION BY company) AS MEDIAN
FROM exampleTable;
Adjust the export chunk size for wide tables
You may encounter the following error when exporting extremely wide tables or tables with long data types such as LONG VARCHAR or LONG VARBINARY:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object') OVER(PARTITION BEST)
FROM veryWideTable;
ERROR 5861: Error calling setup() in User Function s3export
at [/data/.../S3.cpp:787],
error code: 0, message: The specified buffer of 10485760 bytesRead is too small,
it should be at least 11279701 bytesRead.
Vertica returns this error if the data for a single row overflows the buffer storing the data before export. By default, this buffer is 10MB. You can increase the size of this buffer using the chunksize parameter, which sets the size of the buffer in bytes. This example sets it to around 60MB:
=> SELECT S3EXPORT( * USING PARAMETERS url='s3://exampleBucket/object', chunksize=60485760)
OVER(PARTITION BEST) FROM veryWideTable;
rows | url
------+------------------------------
606 | https://exampleBucket/object
(1 row)
1.5 - Add nodes to a running cluster on the cloud
There are two ways to add nodes to an AWS cluster:
- Using Management Console
- Using admintools
When you use MC to add nodes to a cluster in the cloud, MC provisions the instances, adds the new instances to the existing Vertica cluster, and then adds those hosts to the database. However, when you add nodes to a cluster using admintools, you need to execute those steps yourself, as explained in Adding Nodes Using admintools.
Adding nodes using Management Console
In the Vertica Management Console, you can add nodes in several ways, depending on your database mode.
For Eon Mode databases, MC supports actions for subcluster and node management for the following public and private cloud providers:
Note
Enterprise Mode does not support subclusters.
For Enterprise Mode databases, MC supports these actions:
Note
In the cloud on GCP, Enterprise Mode databases are not supported.
Adding nodes in an Eon Mode database
In an Eon Mode database, every node must belong to a subcluster. To add nodes, you always add them to one of the subclusters in the database:
Adding nodes in an Enterprise Mode database on AWS
In an Enterprise Mode database on AWS, to add an instance to your cluster:
1. On the MC Home page, click View Infrastructure to go to the Infrastructure page. This page lists all the clusters the MC is monitoring.
2. Click any cluster shown on the Infrastructure page.
3. Select View or Manage from the dialog that displays, to view its Cluster page. (In a cloud environment, if MC was deployed from a cloud template, the button says "Manage". Otherwise, the button says "View".)
Note
You can click the pencil icon beside the cluster name to rename the cluster. Enter a name that is unique within MC.
4. Click the Add (+) icon on the Instance List on the Cluster Management page.
MC adds a node to the selected cluster.
Adding nodes using admintools
This section gives an overview of how to add nodes if you are managing your cluster using admintools. Each main step points to another topic with the complete instructions.
Step 1: before you start
Before you add nodes to a cluster, verify that you have an AWS cluster up and running and that you have:
- Created a database.
- Defined a database schema.
- Loaded data.
- Run the Database Designer.
- Connected to your database.
Step 2: launch new instances to add to an existing cluster
Perform the procedure in Configure and launch an instance to create new instances (hosts) that you then will add to your existing cluster. Be sure to choose the same details you chose when you created the original instances (VPC, placement group, subnet, and security group).
Step 3: include new instances as cluster nodes
You need the IP addresses when you run the install_vertica script to include new instances as cluster nodes.
If you are configuring Amazon Elastic Block Store (EBS) volumes, be sure to configure the volumes on the node before you add the node to your cluster.
To add the new instances as nodes to your existing cluster:
1. Configure and launch your new instances.
2. Connect to the instance that is assigned to the Elastic IP. See Connect to an instance if you need more information.
3. Run the Vertica installation script to add the new instances as nodes to your cluster. Specify the internal IP addresses for your instances and your *.pem file name.
$ sudo /opt/vertica/sbin/install_vertica --add-hosts instance-ip --dba-user-password-disabled \
--point-to-point --data-dir /vertica/data --ssh-identity ~/name-of-pem.pem
Step 4: add the nodes
After you have added the new instances to your existing cluster, add them as nodes to your cluster, as described in Adding nodes to a database.
Step 5: rebalance the database
After you add nodes to a database, always rebalance the database.
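As a command-line sketch of steps 4 and 5, you can add each new host to the database with the admintools db_add_node tool and then rebalance from vsql. The database name and IP address are placeholders:
$ admintools -t db_add_node -d mydb -s 10.0.11.167
$ vsql -d mydb -c 'SELECT REBALANCE_CLUSTER();'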
1.6 - Remove nodes from a running AWS cluster
Use the following procedures to remove instances/nodes from an AWS cluster.
To avoid data loss, Vertica strongly recommends that you back up your database before removing a node. For details, see Backing up and restoring the database.
1.6.1 - Remove hosts from the database
Before you remove hosts from the database, verify that you have:
Note
Do not stop the database.
To remove a host from the database:
1. While logged on as dbadmin, launch Administration Tools.
$ /opt/vertica/bin/admintools
2. From the Main Menu, select Advanced Menu.
3. From Advanced Menu, select Cluster Management. Click OK.
4. From Cluster Management, select Remove Host(s). Click OK.
5. From Select Database, choose the database from which you plan to remove hosts. Click OK.
6. Select the host(s) to remove. Click OK.
7. Click Yes to confirm removal of the hosts.
Note
Enter a password if necessary. Leave blank if there is no password.
8. Click OK. The system displays a message telling you that the hosts have been removed. Automatic rebalancing also occurs.
9. Click OK to confirm. Administration Tools brings you back to the Cluster Management menu.
1.6.2 - Remove nodes from the cluster
To remove nodes from a cluster, run the update_vertica script and specify:
- The option --remove-hosts, followed by the IP addresses of the nodes you are removing.
- The option --ssh-identity, followed by the location and name of your *.pem file.
- The option --dba-user-password-disabled.
The following example removes one node from the cluster:
$ sudo /opt/vertica/sbin/update_vertica --remove-hosts 10.0.11.165 --point-to-point \
--ssh-identity ~/name-of-pem.pem --dba-user-password-disabled
1.6.3 - Stop the AWS instances (optional)
After you have removed one or more nodes from your cluster, to save costs associated with running instances, you can choose to stop the AWS instances that were previously part of your cluster.
To stop an instance in AWS:
1. On AWS, navigate to your Instances page.
2. Right-click the instance, and choose Stop.
This step is optional because, after you have removed the node from your Vertica cluster, Vertica no longer sees the node as part of the cluster, even though it is still running within AWS.
1.7 - Upgrade Vertica on AWS
Before you upgrade to the latest Vertica version, do the following:
- Back up your existing database.
- Download the Vertica install packages described in Download and Install the Vertica Install Package.
Upgrade to the latest version of Vertica on AWS
To upgrade to the latest version of Vertica on AWS, follow the instructions in Upgrading Vertica.
If you are setting up a Vertica cluster on AWS for the first time, follow the procedure for installing and running on AWS.
Upgrade Vertica running on AWS
Vertica supports upgrades of Vertica server running on AWS instances created from the Vertica AMI. To upgrade Vertica, follow the instructions provided in Upgrading Vertica.
Make sure to add the following arguments to the upgrade script:
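Based on the install_vertica and update_vertica examples elsewhere in this guide, an upgrade invocation plausibly includes the flags below; treat this as a sketch rather than the authoritative argument list:
$ sudo /opt/vertica/sbin/update_vertica --dba-user-password-disabled \
    --point-to-point --ssh-identity ~/name-of-pem.pem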
1.8 - Copying and exporting data on AWS: what you need to know
There are common issues that occur when exporting or copying on AWS clusters, as described below. Except for these specific issues as they relate to AWS, copying and exporting data works as documented in Database export and import.
To copy or export data on AWS:
1. Verify that all nodes in source and destination clusters have their own elastic IPs (or public IPs) assigned.
If your destination cluster is located within the same VPC as your source cluster, proceed to step 3. Each node in one cluster must be able to communicate with each node in the other cluster, so each source and destination node needs an elastic IP (or public IP) assigned.
2. (For non-CloudFormation Template installs) Create an S3 gateway endpoint.
If you aren't using a CloudFormation Template (CFT) to install Vertica, you must create an S3 gateway endpoint in your VPC. For more information, see the AWS documentation. (A hedged CLI sketch appears after this list.)
For example, the Vertica CFT has the following VPC endpoint:
"S3Enpoint" : {
"Type" : "AWS::EC2::VPCEndpoint",
"Properties" : {
"PolicyDocument" : {
"Version":"2012-10-17",
"Statement":[{
"Effect":"Allow",
"Principal": "*",
"Action":["*"],
"Resource":["*"]
}]
},
"RouteTableIds" : [ {"Ref" : "RouteTable"} ],
"ServiceName" : { "Fn::Join": [ "", [ "com.amazonaws.", { "Ref": "AWS::Region" }, ".s3" ] ] },
"VpcId" : {"Ref" : "VPC"}
}
3. Verify that your security group allows the AWS clusters to communicate.
Check your security groups for both your source and destination AWS clusters. Verify that ports 5433 and 5434 are open. If one of your AWS clusters is on a separate VPC, verify that your network access control list (ACL) allows communication on port 5434.
Note
This communication method exports and copies (imports) data across the Internet. You can alternatively use non-public IPs and gateways, or a VPN, to connect the source and destination clusters.
4. If there are one or more elastic load balancers (ELBs) between the clusters, verify that port 5433 is open between the ELBs and the clusters.
5. If you use the Vertica client to connect to one or more ELBs, the ELBs only distribute incoming connections. The data transmission path occurs between the clusters.
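For step 2, if you are not using the CFT, an equivalent S3 gateway endpoint can be created with the AWS CLI. The region, VPC ID, and route table ID are placeholders:
$ aws ec2 create-vpc-endpoint --vpc-id vpc-0abc1234 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0abc1234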
2 - Vertica on Microsoft Azure
You can deploy a Vertica database on the Microsoft Azure Cloud running in either Enterprise Mode or Eon Mode. In Eon Mode, Vertica stores its data communally using Azure block blob storage.
This section explains how to deploy a Vertica database to Microsoft Azure.
For more information about Azure, see the Azure documentation.
2.1 - Deploying Vertica from the Azure Marketplace
Deploy Vertica in the Microsoft Azure Cloud using the Vertica Analytics Platform entry in the Azure Marketplace. Vertica provides the following deployment options:
- Eon Mode: Deploy a Management Console (MC) instance, and then provision and create an Eon Mode database from the MC. For cluster and storage requirements, see Eon Mode on Azure prerequisites.
- Enterprise Mode: Deploy a four-node Enterprise Mode database consisting of one MC instance and three database nodes. This requires an Azure subscription with a minimum of 12 cores for the Vertica Marketplace solution.
The Enterprise Mode deployment uses the MC primarily as a monitoring tool. For example, you cannot provision and create a database with an Enterprise Mode MC. For information about creating and managing an Enterprise Mode database, see Create a database using administration tools.
Creating a deployment
Eon Mode and Enterprise Mode require much of the same information for deployment. Any information that is not required for both deployment types is clearly marked.
1. Selecting the deployment type
1. Sign in to your Microsoft Azure account. From the Home screen, select Create a resource under Azure services.
2. Search for Vertica Analytics Platform and select it from the search results.
3. On the Vertica Analytics Platform page, select one of the following:
   - To deploy an MC instance that can manage an Eon Mode database, select Vertica Data Warehouse, Eon BYOL.
   - To deploy an Enterprise Mode database, select Vertica Analytics Platform.
4. On the next screen, select Create.
After you select your deployment type, the Basics tab on the Create Vertica Analytics Platform page displays.
2. Adding project and instance details on the Basics tab
Provide the following information in the Project details and Instance details sections:
- Subscription: Azure bills this subscription for the cluster resources.
- Resource group: The location to save all of the Azure resources. Create a new resource group or choose an existing one from the dropdown list.
- Region: The location where the virtual machine running your MC instance is deployed.
- Vertica Management Console User: Eon Mode only. The administrator username for the MC.
- SSH public key for OS Access: Provide the SSH public key associated with the Vertica user, for command-line access to the virtual machine.
- Password for MC Access: Enter a password to log in to Management Console. Note that Management Console requires that you change your password after the initial login.
- Confirm password: Reenter the value you entered in Password for MC Access.
When you have entered these details, select Next: Virtual Machine Settings >.
3. Selecting virtual machine settings
Provide the following information on the Virtual Machine Settings tab:
- Management Console VM size: Select Change size to customize the VM settings, or accept the default. For a list of VM types recommended by use case, see Recommended Azure VM types.
- Storage account of Eon DB: Eon Mode only. The storage account associated with the database deployment.
- Number of Vertica Cluster nodes: Enterprise Mode only. The number of nodes to deploy in the cluster, in addition to the MC instance.
The Community Edition (CE) license is automatically applied to the cluster. This license is limited to 1 TB of RAW data and 3 Vertica nodes. If you select more than 3 nodes with a CE license, the initial database is created on the first 3 nodes. For information about upgrading your license, see Managing licenses.
- Vertica Node VM size: Enterprise Mode only. Select the VM type to deploy in your cluster. Use the default or select Change size to customize the VM settings. For a list of VM types recommended by use case, see Recommended Azure VM types.
- Total RAW storage per node: Enterprise Mode only. Select the amount of storage per node from the dropdown list. Each VM has a set of premium data disks that are configured and presented as a single storage location.
When you have entered these details, select Next: Network Settings >.
4. Selecting network settings
Provide the following information on the Network Settings tab:
- Virtual Network: The virtual network that hosts the Vertica cluster. Create a new virtual network or select an existing one from the dropdown list.
If you select an existing virtual network, Vertica recommends that you have already created a subnet to use for the deployment.
- First subnet: The subnet for the associated Virtual Network. Create a new subnet or select an existing one from the dropdown list.
- Public IP Address Resource Name: Each VM is configured with a publicly accessible IP address. This field allows you to specify the resource name for those IP addresses, and whether they are static or dynamic. The first public IP address resource is created exactly as entered and associated with the Vertica Management Console. Azure appends a number from 1 to 16 to the resource name for each additional Vertica cluster node created. This number associates each VM with a resource.
- Domain Name Label for Management Console: Because each VM has a public IP address, each node requires a DNS name. Enter a prefix for the name. The first DNS name is created exactly as entered and associated with the Vertica Management Console. Azure appends a number from 1 to 16 to the DNS name for each Vertica cluster node created. That number associates each VM with a resource. Azure adds the remaining part of the fully qualified domain name based on the location where you created the cluster.
When you have entered these details, select Next: Review + create >.
5. Verifying on Review + create
As the Review + create page loads, Azure validates your settings. After it passes validation, review your settings. When you are satisfied with your selections, select Create.
Accessing the MC after deployment
After your resources are successfully deployed, you are brought to the Overview page on Home > resources-name > Deployments. You must retrieve your Management Console IP address and username to log in.
1. From the Overview page, select Outputs in the left navigation.
2. Copy the vertica management console URL and vertica management console user name.
3. Paste the vertica management console URL into the browser address bar and press Enter.
4. Depending on your browser, you might receive a warning of a security risk. If you receive the warning, select the Advanced button and follow the browser's instructions to proceed to the Management Console.
5. On the Vertica Management Console login page, paste the vertica management console user name, and enter the Password for MC Access that you entered on Basics > Project details when you deployed your MC instance.
Deleting a resource group
For details about the Azure Resource Manager and deleting a resource group, see the Azure documentation.
2.2 - Manually deploy Vertica on Microsoft Azure
Manually creating a database cluster for your Vertica deployment lets you customize your VMs to meet your specific needs. You often want to manually configure your VMs when deploying a Vertica cluster to host an Eon Mode database.
To start creating your Vertica cluster in Azure using manual steps, you first need to create a VM. During the VM creation process, you create and configure the other resources required for your cluster, which are then available for any additional VMs that you create.
The topics in this section explain how to manually deploy Vertica on Azure.
2.2.1 - Recommended Azure VM types
Vertica supports a range of Microsoft Azure virtual machine (VM) types, each optimized for different purposes. Choose the VM type that best matches your performance and price needs.
For the best performance in most common scenarios, use one of the following VMs:
Virtual Machine Type | Virtual Machine Sizes
Memory optimized | DS13_v2, DS14_v2, DS15_v2, D8s_v3, D16s_v3, D32s_v3
High memory and I/O throughput | GS3, GS4, GS5, E8s_v3, E16s_v3, E32s_v3, L8s, L16s, L32s
2.2.2 - Supported Azure operating systems
For best performance, use one of the following operating systems when deploying Vertica on Azure.
For best performance, use one of the following operating systems when deploying Vertica on Azure:
For more information, see Supported platforms.
2.2.3 - Configuring and launching a new instance
An Azure VM is similar to a traditional host.
An Azure VM is similar to a traditional host. Just as with an on-premises cluster, you must prepare and configure the hardware settings for your cluster and network before you install Vertica.
The first steps are:
-
From the Azure marketplace, select an operating system that Vertica supports.
-
Select a VM type. See Recommended Azure VM types.
-
Choose a deployment model. For best results, choose the resource manager deployment model.
Vertica has specific network security group requirements, as described in Network security group settings.
Create and name your own network security group, following these guidelines.
You must configure SSH as:
You can make additional modifications, based on your specific requirements.
Add disk containers
Create an Azure storage account, which later contains your cluster storage disk containers.
For optimal throughput, select Premium storage and align the storage to your chosen VM type.
For more information about what a storage account is, and how to create one, refer to About Azure storage accounts.
For an Enterprise Mode database deployment, provision enough space for your catalog and data directories.
Create a password or assign an SSH key pair to use with Vertica.
For information about how to use key pairs in Azure, see How to create and use an SSH public and private key pair for Linux VMs in Azure.
Assign a public IP address
A public IP is an IP address that you can use to connect to your cluster externally. For best results, assign a single static public IP to a node in your cluster. You can then connect to other nodes in your cluster from your primary node using the internal IP addresses that Azure generated when you specified your virtual network settings.
By default, a public IP address is dynamic; it changes every time you shut down the server. You can choose a static IP address, but doing so can add cost to your deployment.
During a VM installation, you cannot set a DNS name. If you use dynamic public IPs, set the DNS name in the public IP resource for each VM after deployment.
For information about public IP addresses, refer to IP address types and allocation methods in Azure.
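If you manage your Azure resources from the command line, the following Azure CLI sketch shows one way to create a static public IP that you can later associate with your primary VM's network interface. The resource group and IP resource names are placeholders; adapt them to your environment.
# Create a static public IP (names are placeholders)
$ az network public-ip create --resource-group vertica-rg \
    --name vertica-node1-ip --allocation-method Static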
Create additional VMs
To create additional VMs as needed, repeat the previous instructions.
2.2.4 - Connect to a virtual machine
Before you can connect to any of the VMs you created, you must first make your virtual network externally accessible.
Before you can connect to any of the VMs you created, you must first make your virtual network externally accessible. To do so, you must attach the public IP address you created during network configuration to one of your VMs.
Connect to your VM
To connect to your VM, complete the following tasks:
-
Connect to your VM using SSH with the public IP address you created in the configuration steps.
-
Authenticate using the credentials and authentication method you specified during the VM creation process.
Connect to other VMs
Connect to other virtual machines in your virtual network by first using SSH to connect to your publicly connected VM. Then, use SSH again from that VM to connect through the private IP addresses of your other VMs.
If you are using private key authentication, you may need to move your key file to the root directory of your publicly connected VM. Then, use PuTTY or WinSCP to connect to other VMs in your virtual network.
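For example, assuming a Linux client, a key file named azure-key.pem, and placeholder addresses, the two-step hop might look like this:
# Copy the private key to the publicly connected VM (paths and addresses are placeholders)
$ scp -i ~/azure-key.pem ~/azure-key.pem azureuser@<public-ip>:~/
# Connect to the public VM, then hop to another node over its private IP
$ ssh -i ~/azure-key.pem azureuser@<public-ip>
$ ssh -i ~/azure-key.pem azureuser@10.1.0.5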
2.2.5 - Prepare the virtual machines
After you create your VMs, you need to prepare them for cluster formation.
After you create your VMs, you need to prepare them for cluster formation.
Add the Vertica license and private key
Prepare your nodes by adding your private key (if you are using one) to each node and by copying your Vertica license to your primary node. These steps assume that the initial user you configured is the DBADMIN user.
-
As the dbadmin user, copy your private key file from where you saved it locally onto your primary node.
Depending upon the procedure you use to copy the file, the permissions on the file may change. If permissions change, the install_vertica
script fails with a message similar to the following:
Failed Login Validation 10.0.2.158, cannot resolve or connect to host as root.
If you receive a failure message, enter the following command to correct permissions on your private key file:
$ chmod 600 /<name-of-key>.pem
-
Copy your Vertica license to your primary VM. Save it in your home directory or other known location.
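For example, assuming a license file named vlicense.dat, a key file named azure-key.pem, and a placeholder node address, you might copy the license from your local machine with:
$ scp -i ~/azure-key.pem ~/Downloads/vlicense.dat dbadmin@<primary-node-ip>:~/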
Install software dependencies for Vertica on Azure
In addition to the standard Vertica package dependencies, you must install the following packages as the root user before you install Vertica on Azure:
-
pstack
-
mcelog
-
sysstat
-
dialog
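For example, on a RHEL or CentOS image you might install these packages with yum, assuming they are available under these names in your configured repositories:
$ sudo yum install -y pstack mcelog sysstat dialog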
2.2.6 - Configure storage
Use a dedicated Azure storage account for node storage.
Use a dedicated Azure storage account for node storage.
Caution
Do not store your information on the root volume, especially your data and catalog directories. Storing information on the root volume may result in data loss.
When configuring your storage, make sure to use a supported file system. For details, see Recommended storage format types.
Attach disk containers to virtual machines (VMs)
Using your previously created storage account, attach disk containers to your VMs that are appropriate to your needs.
For best performance, combine multiple storage volumes into RAID-0. For most RAID-0 implementations, attach 6 storage disk containers per VM.
Combine disk containers for storage
If you are using RAID, follow these steps to create a RAID-0 drive on your VMs. The following example shows how to create a RAID-0 volume named md10 from six individual volumes, /dev/sdc through /dev/sdh:
-
Form a RAID-0 volume using the mdadm
utility:
$ mdadm --create /dev/md10 --level 0 --raid-devices=6 \
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
-
Format the file system to be one that Vertica supports:
$ mkfs.ext4 /dev/md10
-
Find the UUID on the newly-formed RAID volume using the blkid
command. In the output, look for the device you assigned to the RAID volume:
$ blkid
. . .
/dev/md10 : UUID="e7510a6f-2922-4413-b5fa-9dcd725967fd" TYPE="ext4" PARTUUID="fb9b7449-08c3-4231-9ee5-086f7b0c9001"
. . .
-
The RAID device can be renamed after a reboot. To ensure the filesystem is mounted in a predictable location on your VM, create a directory to use as its mount point. For example, you can create a mount point named /data that you will use to store your database's catalog and data (or depot, if you are running Vertica in Eon Mode).
$ mkdir /data
-
Using a text editor, add an entry to the /etc/fstab
file for the UUID of the filesystem and your mount point so it is mounted when the system boots:
UUID=RAID_UUID mountpoint ext4 defaults,nofail,nobarrier 0 2
For example, if you have the UUID shown in the previous example and the mount point /data, add the following line to the /etc/fstab file:
UUID=e7510a6f-2922-4413-b5fa-9dcd725967fd /data ext4 defaults,nofail,nobarrier 0 2
-
Mount the RAID filesystem you added to the fstab file. For example, to mount the /data mount point, use the command:
$ mount /data
-
Create folders for your Vertica data and catalog under your mount point.
$ mkdir /data/vertica
$ mkdir /data/vertica/data
If you are planning to run Vertica in Eon Mode, create a directory for the depot instead of data:
$ mkdir /data/vertica/depot
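Depending on how you created your initial user, you may also need to give the database administrator account ownership of these directories. Assuming the default dbadmin user and verticadba group, that might look like:
$ sudo chown -R dbadmin:verticadba /data/vertica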
Create a swap file
In addition to storage volumes to store your data, Vertica requires a swap volume or swap file to operate.
Create a swap file or swap volume of at least 2 GB. The following steps show how to create a swap file on your Azure VMs:
-
Create an empty swap file with the correct ownership and permissions by using the install command:
$ install -o root -g root -m 0600 /dev/null /swapfile
-
Create the swap file:
$ dd if=/dev/zero of=/swapfile bs=1024 count=2048k
-
Prepare the swap file using mkswap:
$ mkswap /swapfile
-
Use swapon to instruct Linux to swap on the swap file:
$ swapon /swapfile
-
Persist the swap file in /etc/fstab:
$ echo "/swapfile swap swap auto 0 0" >> /etc/fstab
Repeat the volume attachment, combination, and swap file creation procedures for each VM in your cluster.
2.2.7 - Download Vertica
To download the Vertica server appropriate for your operating system and license type, go to www.vertica.com/download/vertica.
Run the rpm to extract the files.
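For example, on a RHEL or CentOS VM the package might be installed as follows; the file name is a placeholder, so substitute the name of the package you downloaded:
$ sudo rpm -Uvh vertica-<version>.x86_64.rpm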
After you complete the download and extraction, the next section describes how to use the install_vertica
script to form a cluster and install the Vertica database software.
2.2.8 - Form a cluster and install Vertica
Use the install_vertica script to combine two or more individual VMs to form a cluster and install the Vertica database.
Use the install_vertica
script to combine two or more individual VMs to form a cluster and install the Vertica database.
Before you start
Before you run the install_vertica
script:
-
Check the Virtual Network page for a list of current VMs and their associated private IP addresses.
-
Identify your storage location. The installer assumes that you have mounted your storage to /vertica/data. To create your database's data directory on the mounted RAID drive, provide /vertica/data as the value of the --data-dir option when you run the install_vertica script. To use a different location, pass that path to --data-dir instead.
Caution
Caution: Do not store your data on the root drive.
Combine virtual machines (VMs)
The following example shows how to combine VMs using the install_vertica
script.
-
While connected to your primary node, construct the following command to combine your nodes into a cluster.
$ sudo /opt/vertica/sbin/install_vertica --hosts 10.2.0.164,10.2.0.165,10.2.0.166 --dba-user-password-disabled --point-to-point --data-dir /vertica/data --ssh-identity ~/<name-of-private-key>.pem --license <license.file>
-
Substitute the IP addresses of your VMs and include your private key file name, if applicable.
-
Include the --point-to-point parameter to configure spread to use direct point-to-point communication among all Vertica nodes. This parameter is required when installing or updating Vertica on clusters hosted on Azure.
-
If you are using Vertica Community Edition, which limits you to three nodes, specify -L CE
with no license file.
-
After you combine your nodes, to reduce security risks, keep your key file in a secure place—separate from your cluster—and delete your on-cluster key with the shred command:
$ shred examplekey.pem
Important
You need your key file to perform future Vertica updates.
-
Reboot your cluster to complete the cluster formation and Vertica installation.
For complete information on the install_vertica
script and its parameters, see Installing Vertica with the installation script.
2.2.9 - After your cluster is up and running
Now that your cluster is configured and running, take these steps:
Now that your cluster is configured and running, take these steps:
-
Log into one of the database nodes using the database administrator account (named dbadmin by default).
-
Create and start a database. A minimal admintools sketch follows this list.
-
Configure your database. See Configuring the database.
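The following is a minimal sketch for creating a database with admintools; the node addresses, database name, and password are placeholders, and you can run admintools -t create_db --help to see the full option list for your version:
$ admintools -t create_db \
    -s 10.2.0.164,10.2.0.165,10.2.0.166 \
    -d exampledb -p 'mypassword'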
2.3 - Eon Mode databases on Azure
You can create an Eon Mode database on a cluster that is hosted on Azure.
You can create an Eon Mode database on a cluster that is hosted on Azure. In this configuration, your database stores its data communally in Azure Blob storage. See Eon Mode to learn more about this database mode.
Eon Mode databases on Azure support some of the encryption features built into Azure Storage. You can use its encryption at rest feature transparently—you do not need to configure Vertica to take advantage of it. You can use Microsoft-managed or customer-managed keys for storage encryption. Vertica does not support Azure Storage's client-side encryption and encryption using customer-provided keys. See the Azure Data Encryption at rest page in the Azure documentation for more information about the encryption at rest features in Azure Storage.
This section explains how you create an Eon Mode database running on Azure cloud.
2.3.1 - Eon Mode on Azure prerequisites
Before you can create an Eon Mode database on Azure, you must have a database cluster and an Azure blob storage container to store your database's data.
Before you can create an Eon Mode database on Azure, you must have a database cluster and an Azure blob storage container to store your database's data.
Cluster requirements
Before you can create an Eon Mode database on Azure, you must provision a cluster to host it. See Configuring your Vertica cluster for Eon Mode for suggestions on choosing VM configurations and the number of nodes your cluster should start with.
Storage requirements
An Eon Mode database on Azure stores its data communally in Azure blob storage. Vertica only supports block blob storage for communal data storage, not append or page blob storage.
You must create a storage path for Vertica to use exclusively. This path can be a blob container or a folder within a blob container. This path must not contain any files. If you attempt to create an Eon Mode database with a container or folder that contains files, admintools returns an error.
You pass Vertica a URI for the storage path using the azb:// schema. See Azure Blob Storage object store for the format of this URI.
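For example, a communal storage path might look like the following, where mystore is the storage account, db_blobs is the blob container, and verticadb is a folder reserved for this database (all three names are placeholders):
azb://mystore/db_blobs/verticadb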
You must also configure the storage container so Vertica is authorized to access it. Depending on the authentication method you use, you may need to supply Vertica with credentials to access the container. Vertica can use one of the following methods to authenticate with the blob storage container:
-
Using Azure managed identities. This authentication method is transparent—you do not need to add any authentication configuration information to Vertica. Vertica automatically uses the managed identity bound to the VMs it runs on to authenticate with the blob storage container. See the Azure AD-managed identities for Azure resources documentation page in the Azure documentation for more information.
If you provide credentials for either of the other two supported authentication methods, Vertica uses them instead of authenticating using a managed identity bound to your VM.
Note
If your Azure VMs have more than one managed identity bound to them, you must tell Vertica which identity to use when authenticating with the blob storage container. Vertica gets the identity to use from a tag set on the VMs that it is running on.
On your VMs, create a tag whose key is VerticaManagedIdentityClientId and whose value is the name of a managed identity bound to your VMs. See the Use tags to organize your Azure resources and management hierarchy page in the Azure documentation for more information.
-
Using an account name and access key credentials for a service account that has full access to the blob storage container. In this case, you provide Vertica with the credentials when you create the Eon Mode database. See Creating an Authentication File for details.
-
Using a shared access signature (SAS) that grants Vertica access to the storage container. See Grant limited access to Azure Storage resources using shared access signatures (SAS) in the Azure documentation. See Creating an Authentication File for details.
For details on how Vertica accesses Azure blob storage, see Azure Blob Storage object store.
2.3.2 - Manually creating an Eon Mode database on Azure
Once you have met the cluster and storage requirements for using an Eon Mode database on Azure, you are ready to create an Eon Mode database.
Once you have met the cluster and storage requirements for using an Eon Mode database on Azure, you are ready to create an Eon Mode database. Use the admintools create_db
tool to create your Eon Mode database.
Creating an authentication file
If your database will use a managed identity to authenticate with the Azure storage container, you do not need to supply any additional configuration information to the create_db
tool.
If your database will not use a managed identity, you must supply create_db with authentication information in a configuration file. The file must contain at least the AzureStorageCredentials parameter, which defines one or more account names and keys that Vertica uses to access blob storage. It can also contain an AzureStorageEndpointConfig parameter that defines an alternate endpoint to use instead of the default Azure host name. This option is useful if you are creating a test environment using an Azure storage emulator such as Azurite.
Important
Vertica does not officially support Azure storage emulators as a communal storage location.
The following table defines the values that can be set in these two parameters.
- AzureStorageCredentials: Collection of JSON objects, each of which specifies connection credentials for one endpoint. This parameter takes precedence over Azure managed identities. The collection must contain at least one object and may contain more. Each object must specify at least one of accountName or blobEndpoint, and at least one of accountKey or sharedAccessSignature.
  - accountName: If not specified, uses the label of blobEndpoint.
  - blobEndpoint: Host name with optional port (host:port). If not specified, uses account.blob.core.windows.net.
  - accountKey: Access key for the account or endpoint.
  - sharedAccessSignature: Access token for finer-grained access control, if being used by the Azure endpoint.
- AzureStorageEndpointConfig: Collection of JSON objects, each of which specifies configuration elements for one endpoint. Each object must specify at least one of accountName or blobEndpoint.
  - accountName: If not specified, uses the label of blobEndpoint.
  - blobEndpoint: Host name with optional port (host:port). If not specified, uses account.blob.core.windows.net.
  - protocol: HTTPS (default) or HTTP.
  - isMultiAccountEndpoint: true if the endpoint supports multiple accounts, false otherwise (default is false). To use multiple-account access, you must include the account name in the URI. If a URI path contains an account, this value is assumed to be true unless explicitly set to false.
The authentication configuration file is a text file containing the configuration parameter names and their values. The values are in JSON format. The name of this file is not important. The following examples use the file name auth_params.conf.
The following example is a configuration file for a storage account hosted on Azure. The storage account name is mystore, and the key value is a placeholder. In your own configuration file, you must provide the storage account's access key. You can find this value by right-clicking the storage account in the Azure Storage Explorer and selecting Copy Primary Key.
AzureStorageCredentials=[{"accountName": "mystore", "accountKey": "access-key"}]
The following example shows a configuration file that defines an account for a storage container hosted on the local system using the Azurite storage system. The user account and key are the "well-known" account provided by Azurite by default. Because this configuration uses an alternate storage endpoint, it also defines the AzureStorageEndpointConfig parameter. In addition to reiterating the account name and endpoint definition, this example sets the protocol to the non-encrypted HTTP.
Important
This example wraps the contents of the JSON values for clarity. In an actual configuration file, you cannot wrap these values. They must be on a single line.
AzureStorageCredentials=[{"accountName": "devstoreaccount1", "blobEndpoint": "127.0.0.1:10000 ",
"accountKey":
"Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
}]
AzureStorageEndpointConfig=[{"accountName": "devstoreaccount1",
"blobEndpoint": "127.0.0.1:10000", "protocol": "http"}]
Creating the Eon Mode database
Use the admintools create_db
tool to create your Eon Mode database. The required arguments you pass to this tool are:
Argument | Description
--communal-storage-location | The URI for the storage container Vertica will use for communal storage. This URI must use the azb:// schema. See Azure Blob Storage object store for the format of this URI.
-x | The path to the file containing the authentication parameters Vertica needs to access the communal storage location. This argument is only required if your database will use a storage account name and key to authenticate with the storage container. If it is using a managed identity, you do not need to specify this argument.
--depot-path | The absolute path to store the depot on the nodes in the cluster.
--shard-count | The number of shards for the database. This is an integer that is usually either a multiple of the number of nodes in your cluster or an even divisor. See Planning for Scaling Your Cluster for more information.
-s | A comma-separated list of the nodes in your database.
-d | The name for your database.
Some other common optional arguments for create_db
are:
Argument | Description
-l | The absolute path to the Vertica license file to apply to the new database.
-p | The password for the new database.
--depot-size | The maximum size for the depot. Defaults to 60% of the filesystem containing the depot path. You can specify the size as a percentage of the filesystem's disk space (integer%) or as an amount of disk space in kilobytes, megabytes, gigabytes, or terabytes (integer followed by K, M, G, or T). However you specify this value, the depot size cannot be more than 80 percent of the disk space of the filesystem where the depot is stored.
To view all arguments for the create_db tool, run the command:
admintools -t create_db --help
The following example demonstrates creating an Eon Mode database with the following settings:
-
Vertica will use a storage account named mystore.
-
The communal data will be stored in a directory named verticadb
located in a storage container named db_blobs
.
-
The authentication information Vertica needs to access the storage container is in the file named auth_params.conf
in the current directory. The contents of this file are shown in the first example under Creating an Authentication File.
-
The hostnames of the nodes in the cluster are node01 through node03.
$ admintools -t create_db \
--communal-storage-location=azb://mystore/db_blobs/verticadb \
-x auth_params.conf -s node01,node02,node03 \
-d verticadb --depot-path /vertica/depot --shard-count 3 \
-p 'mypassword'
2.4 - Network security group settings
Vertica has the following network security group requirements.
Vertica has the following network security group requirements.
For details on security groups and how to create one, see the Azure documentation.
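As a sketch, if you manage resources with the Azure CLI, you could create a network security group and add the SSH rule from the inbound table below with commands like the following; the resource group, NSG name, and priority are placeholders:
# Create a network security group (names are placeholders)
$ az network nsg create --resource-group vertica-rg --name vertica-nsg
# Allow inbound SSH (port 22), matching the first row of the inbound table
$ az network nsg rule create --resource-group vertica-rg --nsg-name vertica-nsg \
    --name SSH --priority 100 --direction Inbound --access Allow \
    --protocol Tcp --destination-port-ranges 22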
Inbound settings
Name | Protocol | Source port range | Destination port range | Source | Destination
SSH | TCP | | 22 | Any | Any
HTTP | TCP | | 80 | Any | Any
HTTPS | TCP | | 80 | Any | Any
HTTPS | TCP | | 443 | Any | Any
DNS (UDP) | UDP | | 53 | Any | Any
Spread | UDP | | 4803-4805 | Any | Any
Spread | TCP | | 4803-4805 | Any | Any
VSQL/SQL | TCP | | 5433 | Any | Any
Inter-node communication | TCP | | 5434 | Any | Any
| TCP | | 5444 | Any | Any
MC | TCP | | 5450 | Any | Any
| TCP | | 8080 | Any | Any
| TCP | | 48073 | Any | Any
rsync | TCP | | 50000 | Any | Any
Outbound settings
Name | Protocol | Source port range | Destination port range | Source | Destination
All TCP | TCP | 0-65535 | | Any | Any
All ICMP | ICMP | 0-65535 | | Any | Any
All UDP | UDP | 0-65535 | | Any | Any
3 - Vertica on Google Cloud Platform
Welcome to the Vertica on Google Cloud Platform guide.
Welcome to the Vertica on Google Cloud Platform guide.
Vertica provides two templates to help you deploy a Vertica database running in either Enterprise Mode or Eon Mode. See Architecture for more information about these modes.
The following topics describe several deployment methods to run Vertica on Google Cloud Platform.
3.1 - Supported GCP machine types
Vertica Analytic Database supports a range of machine types, each optimized for different workloads.
Vertica Analytic Database supports a range of machine types, each optimized for different workloads. When you deploy your Vertica Analytic Database cluster to the Google Cloud Platform (GCP), different machine types are available depending on how you provision your database.
Note
Some machine types are not available across all regions.
The sections below list the GCP machine types that Vertica supports for Vertica cluster hosts, and for use in Management Console. For details on the configuration of the machine type options, see the Google Cloud documentation's Machine types page.
Machine types available for MC hosts
Vertica supports all N1, N2, E2, M1, M2, and C2 machine types to deploy an instance for running the Vertica Management Console.
Tip
In most cases, 8 vCPUs are sufficient when selecting a machine type for running the Management Console.
Machine types available for Vertica database cluster hosts
Vertica supports all N1, N2, E2, M1, M2, and C2 machine types to deploy cluster hosts.
Machine types for Vertica database cluster hosts provisioned from MC
The table below lists the GCP machine types that Vertica supports when you provision your cluster from Management Console.
Machine Type | Machine Name
N1 standard | n1-standard-16, n1-standard-32, n1-standard-64
N1 high-memory | n1-highmem-16, n1-highmem-32, n1-highmem-64
N2 standard | n2-standard-16, n2-standard-32, n2-standard-48, n2-standard-64
N2 high-memory | n2-highmem-16, n2-highmem-32, n2-highmem-48, n2-highmem-64
3.2 - Deploy Vertica from the Google cloud marketplace
The Vertica entries in the Google Cloud Launcher Marketplace let you quickly deploy a Vertica cluster in the Google Cloud Platform (GCP).
The Vertica entries in the Google Cloud Launcher Marketplace let you quickly deploy a Vertica cluster in the Google Cloud Platform (GCP). Currently, three entries let you select the database mode and the license you want to use:
-
The Enterprise Mode launcher deploys a Vertica database with 3 or more nodes, plus an additional VM running the Management Console (MC). See Deploying an Enterprise Mode database in GCP from the marketplace for more information.
-
The Eon Mode BYOL (bring your own license) launcher deploys a single instance running the MC. You use this MC instance to deploy a Vertica database running on Eon Mode. This database has a community license applied to it initially. You can later upgrade it to a license you have obtained from Vertica. See Deploying an Eon Mode database on GCP for more information.
-
The Eon Mode BTH (by the hour) launcher also deploys a single instance running the MC that you use to deploy a database. This database has a by-the-hour license applied to it. Instead of paying for a license up front, you pay an hourly fee that covers both Vertica and running your instances. The BTH license is automatically applied to all clusters you create using a BTH MC instance. See Deploying an Eon Mode database on GCP for more information. If you choose, you can upgrade this hourly license to a longer-term license you purchase from Vertica. To move a BTH cluster to a BYOL license, follow the instructions in Moving a cloud installation from by the hour (BTH) to bring your own license (BYOL).
Note
Vertica clusters that use IPv6 to identify hosts have not been tested on GCP. Vertica recommends you use IPv4 addresses to identify the hosts in your cluster on GCP.
3.2.1 - Deploying an Enterprise Mode database in GCP from the marketplace
The Vertica Cloud Launcher solution creates a Vertica Enterprise Mode database.
The Vertica Cloud Launcher solution creates a Vertica Enterprise Mode database. The solution includes the Vertica Management Console (MC) as the primary UI for you to get started.
The launcher automatically creates a database named vdb using the Community Edition (CE) license. The CE license is limited to a maximum of 3 nodes. You can tell the launcher to add more than 3 nodes to your deployment. In this case, it uses the first three nodes in the cluster to create the database. The remaining nodes are not part of the database, but are added to your cluster. To add these nodes to your database, you must replace the Community Edition license with a license key you receive from the Software Entitlement support site. See Managing licenses for more information.
After the launcher creates the initial database, it configures the MC to attach to that database automatically.
To get started with a deployment of Vertica from the Google Cloud Launcher, search for the Vertica Data Warehouse, Enterprise Mode entry.
Follow these steps:
-
Verify that your user account has the Editor role and the runtimeconfig.waiters.getIamPolicy
permission.
-
From the listing page, click LAUNCH.
-
On the New Vertica Analytics Platform deployment page, enter the following information:
-
Deployment name: Each deployment must have a unique name. That name is used as the prefix for the names of all VMs created during the deployment. The deployment name can only contain lowercase characters, numbers, and dashes. The name must start with a lowercase letter and cannot end with a dash.
-
Zone: GCP breaks its cloud data centers into regions and zones. Regions are a collection of zones in the same geographical location. Zones are collections of compute resources, which vary from zone to zone.
For best results, pick the zone in your designated region that supports the latest Intel CPUs. For a complete listing of regions and zones, including supported processors, see Regions and Zones.
-
Service Account: Service accounts allow automated processes to authenticate with GCP. Select the default service account, identified by project_number-compute@developer.gserviceaccount.com.
-
Under Vertica Management Console, choose the configuration for the virtual machine that will run the Management Console. The Vertica Analytics Platform in Cloud Launcher always deploys the Vertica Management Console (MC) as part of the solution.
The default machine type for MC is sufficient for most deployments. You can choose another machine type that better suits any additional purposes, such as serving as a target node for backups, data transformation, or additional management tools.
-
Node count for Vertica Cluster: The total number of VMs you want to deploy in the Vertica Cluster. The default is 3.
Note
As mentioned above, the Cloud Launcher automatically deploys the Vertica Community Edition license, which limits the database to 3 nodes and up to 1 TB in raw data. Any additional nodes will be part of your database cluster, but will not be part of your database.
If you intend to use the Community Edition license for your database, leave the setting at 3. Otherwise, you would add nodes that will sit idle and cost you money without being part of your database.
-
Machine type for Vertica Cluster nodes: The Cloud Launcher builds each node in the cluster using the same machine type. Modify the machine type for your nodes based on the workloads you expect your database to handle. See Supported GCP machine types for more information.
-
Data disk type: GCP offers two types of persistent disk storage: Standard and SSD. The costs associated with Standard are less, but the performance of SSD storage is much better. Vertica recommends you use SSD storage. For more information on Standard and SSD persistent disks, see Storage Options.
-
Disk size in GB: Disk performance is directly tied to the disk size in GCP. The default value of 2000 GB (2 TB) is the minimum disk size for SSD persistent disks that allows maximum throughput.
If you select a smaller disk size, the throughput performance decreases. If you select a large disk size, the performance remains the same as the 2 TB option.
-
Network: VMs in GCP must exist on a virtual private cloud (VPC). When you created your GCP account, a default VPC was created. Create additional VPCs to isolate solutions or projects from one another. The Vertica Analytics Platform creates all the nodes in the same VPC.
-
Subnetwork: Just as a GCP account may have multiple VPCs, each VPC may also have multiple subnets. Use additional subnets to group or isolate solutions within the same VPC.
-
Firewall: If you want your MC to be accessible via the internet, check the Allow access to the Management Console from the Internet box. Vertica recommends you protect your MC using a firewall that restricts access to just the IP addresses of users that need to access it. You can enter one or more comma-separated CIDR address ranges.
After you have entered all the required information, click Deploy to begin the deployment process.
Monitor the deployment
After the deployment begins, Google Cloud Launcher automatically opens the Deployment Manager page that displays the status of the deployment. Items that are still being processed have a spinning circle to the left of them and the text is a light gray color. Items that have been created are dark gray in color, with an icon designating that resource type on the left.
After the deployment completes, a green check mark appears next to the deployment name in the upper left-hand section of the screen.
Accessing the cluster after deployment
After the deployment completes, the right-hand section of the screen displays the following information:
-
dbadmin password: A randomly generated password for the dbadmin account on the nodes. For security reasons, change the dbadmin password when you first log in to one of the Vertica cluster nodes.
-
mcadmin password: A randomly generated password for the mcadmin account for accessing the Management Console. For security reasons, change the mcadmin password after you first log in to the MC.
-
Vertica Node 1 IP address: The external IP address for the first node in the Vertica cluster is exposed here so that you can connect to the VM using a standard SSH client. To access the MC, press the Access Vertica MC button in the Get Started section of the dialog box. Copy the mcadmin password and paste it when asked.
For more information on using the MC, see Management Console.
Access the cluster nodes
There are two ways to access the cluster nodes directly:
-
Use GCP's integrated SSH shell by selecting the SSH button in the Get Started section. This shell opens a pop-up in your browser that runs GCP's web-based SSH client. You are automatically logged on as the user you authenticated as in the GCP environment.
After you have access to the first Vertica cluster node, execute the su dbadmin
command, and authenticate using the dbadmin password.
-
In addition, use other standard SSH clients to connect directly to the first Vertica cluster node. Use the Vertica Node 1 IP address listed on the screen as the dbadmin user, and authenticate with the dbadmin password.
Follow the on-screen directions to log in using the mcadmin account and accept the EULA. After you've been authenticated, access the initial database by clicking the vdb icon (looks like a green cylinder) in the Recent Databases section.
Using a custom service account
In general, you should use the default service account created by the GCP deployment (project_number-compute@developer.gserviceaccount.com), but if you want to use a custom service account:
3.2.2 - Eon Mode databases on GCP
You deploy an Eon Mode database to GCP using Google Cloud Platform Launcher to deploy a Management Console (MC) instance.
You deploy an Eon Mode database to GCP using Google Cloud Platform Launcher to deploy a Management Console (MC) instance. You then use the MC instance to provision and deploy an Eon Mode database.
3.2.2.1 - GCP Eon Mode instance recommendations
When you use the MC to deploy an Eon Mode database to the Google Cloud Platform (GCP), you choose the instance type to deploy as the database's nodes.
When you use the MC to deploy an Eon Mode database to the Google Cloud Platform (GCP), you choose the instance type to deploy as the database's nodes. The default instance settings in the MC are the more conservative option (currently, n1-standard-16). They are sufficient for most workloads. However, you may choose instances with more memory (such as n1-highmem-16) if your queries perform complex joins that may otherwise spill to disk. You can also choose instances with more cores (such as n1-standard-32), if you perform highly-complex compute-intensive analysis. The following links provide additional information about GCP machine type instances and Vertica:
The more powerful instance you choose, the higher the cost per hour. You need to balance whether you want to use fewer, higher-powered but more expensive instances vs. relying on more lower-powered instances that cost less. Thanks to Eon Mode's elasticity, if you choose to use the less-powerful instances, you can always add more nodes to meet peak demands. When you reduce the number of instances to a minimum during off-peak times, you'll spend less than if you had a similar number of more-powerful instances.
Storage options
The MC's deployment wizard also asks you to select the type of local storage for your instances. You can select different options for each type of local storage that Vertica uses: the catalog, the depot, and temporary space. For all of these storage locations, you choose the type of disks to use (standard vs. SSD). You will see the best performance with SSD disks. However, SSD disks cost more.
For the depot, you also choose whether to use local or persistent disks. The local option is faster, as it resides directly on the virtual machine host. However, whenever you shut down the node, this storage is wiped clean. The persistent storage is slower than the local option, as it is not stored directly on the machine hosting the instance. However, it is not wiped out whenever you shut down the instance. See the Google Cloud documentation's Storage options page for more information.
Which of these options you choose depends on how much depot warming the nodes must perform when starting. If the contents of your nodes' depots change little over time (or you tend to frequently start and stop instances), using persistent storage makes sense. In this case, the depot's warming period will be shorter because most of the data the node needs to participate in queries may still be in its depot when it starts. It will perform fewer fetches of data from communal storage while participating in queries.
If your working data set is rapidly changing or you tend to leave nodes stopped for extended periods of time, your best choice is usually to use local storage. In this scenario, the data in the node's depot when it restarts is usually stale. To participate in queries, the node must fetch much of the data it needs from communal storage, resulting in slower performance until it has warmed its depot. Using local ephemeral storage makes sense here, because you will get the benefit of having faster depot storage. Because your nodes have to warm their depots anyhow, there is less of a downside of having the depot on ephemeral storage.
For general guidelines on scaling your cluster for Eon Mode database, see Configuring your Vertica cluster for Eon Mode.
3.2.2.2 - Eon Mode on GCP prerequisites
Before deploying an Eon Mode database on GCP, you must take several steps.
Before deploying an Eon Mode database on GCP, you must take several steps:
-
Review the default service account's permissions for your GCP project.
-
Create an HMAC key to use when creating your cluster.
-
Create a communal storage location.
Service account permissions
Service accounts allow automated processes to authenticate with GCP. The Eon Mode database deployment process uses the project's service account for your GCP project to deploy instances. When you create a new project, GCP automatically creates a default service account (identified by project_number-compute@developer.gserviceaccount.com) for the project and grants it the IAM role Editor. See the Google Cloud documentation's Understanding roles for details about this and other IAM roles.
The Editor role lets the service account create resources from the Marketplace. When you create an instance of the Management Console (MC), the MC uses the account to deploy further resources, such as provisioning instances for a database.
For details, see the Google Cloud documentation's Understanding service accounts page.
Permissions and roles
To deploy Vertica on GCP, your user account must have the:
Creating an HMAC key
Vertica uses a hash-based message authentication code (HMAC) key to authenticate requests to access the communal storage location. This key has two parts: an access ID and a secret. When you create an Eon Mode database in GCP, you provide both parts of an HMAC key for the nodes to use to access communal storage.
To create an HMAC key:
-
Log in to your Google Cloud account.
-
If the name of the project you will use to create your database does not appear in the top banner, click the dropdown and select the correct project.
-
In the navigation menu in the upper-left corner, under the Storage heading, click Storage and select Settings.
-
In the Settings page, click Interoperability.
-
Scroll to the bottom of the page and find the User account HMAC heading.
-
Unless you have already set a default project, you will see the message stating you haven’t set a default project for your user account yet. Click the Set project-id as default project button to choose the current project as your default for interoperability.
Note
The project ID appears in the button label, not the project name.
-
Under Access keys for your user account, click Create a key.
-
Your new access key and secret appear in the HMAC key list. You will need them when you create your Eon Mode database. You can copy them to a handy location (such as a text editor) or leave a browser tab open to this page while you use another tab or window to create your database. These keys remain available on this page, so you do not need to worry about saving them elsewhere.
Caution
It is vital that you protect the security of your HMAC key. It can grant others access to your Eon Mode database's communal storage location. This means they could access all of the data in your database. Do not write the HMAC key anyplace where it may be exposed, such as email, shared folders, or similar insecure locations.
Creating a communal storage location
Your Eon Mode database needs a storage location for its communal storage. Eon Mode databases running on GCP use Google Cloud Storage (GCS) for their communal storage location. When you create your new Eon Mode database, you will supply the MC's wizard with a GCS URL for the storage location.
This location needs to meet the following criteria:
-
The URL must include at least a bucket name. You can use one or more levels of folders, as well. For example, the following GCS URLs are valid:
Multiple databases can share the same bucket, as long as each has its own folder.
-
If provided, the lowest-level folder in the URL must not already exist. For example, in the GCS URL gs://verticabucket/databases/mydatabase, the bucket named verticabucket and the directory named databases must exist, but the subdirectory named mydatabase must not. The Vertica install process expects to create the final folder itself. If the folder already exists, the installation process fails.
-
The permissions on the bucket must be set to allow the service account read, write, and delete privileges on the bucket. The best role to assign to the user to gain these permissions is Storage Object Admin.
-
To prevent performance issues, the bucket must be in the same region as all of the nodes running the Eon Mode database.
-
If you create the database through the admintools UI, you must set gcsauth as a bootstrap parameter in admintools.conf. For more information on this and other GCP parameters, see Google Cloud Storage parameters.
[BootstrapParameters]
gcsauth = ID:secret
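As a sketch of how you might satisfy these requirements from the command line with gsutil, where the bucket name, region, and project number are placeholders:
# Create a regional bucket for communal storage
$ gsutil mb -l us-east1 gs://verticabucket
# Grant the project's default service account Storage Object Admin on the bucket
$ gsutil iam ch \
    serviceAccount:project_number-compute@developer.gserviceaccount.com:roles/storage.objectAdmin \
    gs://verticabucket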
3.2.2.3 - Deploying an Eon Mode database on GCP
Once you have taken the steps listed in Eon Mode on GCP Prerequisites, you are ready to deploy an Eon Mode database in GCP.
Once you have taken the steps listed in Eon Mode on GCP prerequisites, you are ready to deploy an Eon Mode database in GCP. This process has two steps: deploy a single-node MC instance, then use the MC to provision and deploy a database. The following topics explain these steps.
3.2.2.3.1 - Deploying an MC instance to GCP for Eon Mode
To deploy an MC instance that is able to deploy Eon Mode databases to GCP:
To deploy an MC instance that is able to deploy Eon Mode databases to GCP:
-
Log into your GCP account, if you are not currently logged in.
-
Verify that your user account has the Editor role and the runtimeconfig.waiters.getIamPolicy
permission.
-
Verify that the name of the GCP project you want to use for the deployment appears in the top banner. If it does not, click the down arrow next to the project name and select the correct project.
-
Click the navigation menu icon in the top left of the page and select Marketplace.
-
In the Search for solutions box, type Vertica Eon Mode and press enter.
-
Click the search result for Vertica Data Warehouse, Eon Mode. There are two license options: by the hour (BTH) and bring your own license (BYOL). See Deploy Vertica from the Google cloud marketplace for more information on this license choice.
-
Click Launch on the license option you prefer.
-
On the following page, fill in the fields to configure your MC instance:
-
Deployment name identifies your MC deployment in the GCP Deployments page.
-
Zone is the location where the virtual machine running your MC instance will be deployed. Make this the same location where your communal storage bucket is located.
-
Service Account: Service accounts allow automated processes to authenticate with GCP. Select the default service account, identified by project_number-compute@developer.gserviceaccount.com.
-
Machine Type is the virtual hardware configuration of the instance that will run the MC. The default values here are "middle of the road" settings which are sufficient for most use cases. If you are doing a small proof-of-concept deployment, you can choose a less powerful instance to save some money. If you are planning on deploying multiple large databases, consider increasing the count of virtual CPUs and RAM.
For details about Vertica's default volume configurations, see Eon Mode volume configuration defaults for GCP.
-
User Name for Access to MC is the administrator username for the MC. You can customize this if you want.
-
Network and Subnetwork are the virtual private cloud (VPC) network and subnet within that network you want your MC instance and your Vertica nodes to use. This setting does not affect your MC's external network address. If you want to isolate your Vertica cluster from other GCP instances in your project, create a custom VPC network and optionally a subnet in your GCP project and select them in these fields. See the Google Cloud documentation's VPC network overview page for more information.
-
Firewall enables access to the MC from the internet by opening port 5450 in the firewall. You can choose to not open this port by clearing the I accept opening a port in the firewall (5450) for Vertica box. However, if you do not open the port in the firewall, your MC instance will only be accessible from within the VPC network. Not opening the port will make accessing your MC instance much harder.
-
Source IP ranges for MC traffic: If you choose to open the MC for external access, add one or more CIDR address ranges to this box for the network addresses that you want to allow to access the MC.
Caution
Make the address ranges as limited as possible to reduce the chances of unauthorized access to your MC instance.
-
Click the Deploy button to start the deployment of your MC instance.
The deployment process will take several minutes.
Using a custom service account
In general, you should use the default service account created by the GCP deployment (project_number-compute@developer.gserviceaccount.com), but if you want to use a custom service account:
Connect and log into the MC instance
After the deployment process is finished, the Deployment Manager page for your MC instance contains links to connect to the MC via your browser or ssh.
To connect to the MC instance:
-
The MC administrator user has a randomly-generated password that you need to log into the MC. Copy the password in the MC Admin Password field to the clipboard.
-
Click Access Management Console.
-
A new browser tab or window opens, showing you a page titled Redirection Notice. Click the link for the MC URL to continue to the MC login page.
-
Your browser will likely show you a security warning. The MC instance uses a self-signed security certificate. Most browsers treat these certificates as a security hazard because they cannot verify their origin. You can safely ignore this warning and continue. In most browsers, click the Advanced button on the warning page, and select the option to proceed. In Chrome, this is a link titled Proceed to xxx.xxx.xxx.xxx (unsafe). In Firefox, it is a button labeled Accept the Risk and Continue.
-
At the login screen, enter the MC administrator user name into the Username box. This user name is mcadmin, unless you changed the user name in the MC deployment form.
-
Paste the automatically-generated password you copied from the MC Admin Password field earlier into the Password box.
-
Click Log In.
Once you have logged into the MC, change the MC administrator account's password.
Caution
The automatically-generated password appears on the MC instance's deployment page and can be revealed in several locations in the deployment logs. Failure to change this password can lead to unauthorized access to your MC instance.
To change the password:
-
On the home page of the MC, under the MC Tools section, click MC Settings.
-
In the left-hand menu, click User Management.
-
Select the entry for the MC administrator account and click Edit.
-
Click either the Generate new or Edit password button to change the password. If you click the Generate new button, be sure to save the automatically-generated password in a safe location. If you click Edit password, you are prompted to enter a new password twice.
-
Click Save to update the password.
Now that you have created your MC instance, you are ready to deploy a Vertica Eon Mode cluster. See Using the MC to provision and create an Eon Mode database in GCP.
3.2.2.3.2 - Using the MC to provision and create an Eon Mode database in GCP
After you deploy an MC instance to GCP, use it to deploy an Eon Mode database.
After you deploy an MC instance to GCP, use it to deploy an Eon Mode database.
Note
Currently, the admintools menu-based interface does not support creating an Eon Mode database on GCP.
To use the MC to provision and deploy a new Eon Mode database on GCP:
-
From the MC home screen, click Create new database to launch the Create a Vertica Cluster on Google Cloud wizard.
-
On the first page of the wizard enter the following information:
-
Google Cloud Storage HMAC Access Key and HMAC Secret Key: Copy and paste the HMAC access key and secret you created earlier. You find these values on the Interoperability tab of the Storage Settings page. See Eon Mode on GCP prerequisites for details.
-
Zone: This value defaults to the zone containing your MC instance. Make this value the same as the zone containing the Google Cloud Storage bucket that your database will use for communal storage.
-
Caution
You will see significant performance issues if you choose different zones for cluster instances, storage, or the MC.
-
CIDR Range: The IP address range for clients to whom you want to grant access to your database. Make this range as restrictive as possible to limit access to your database.
-
Click Next, and supply the following information:
-
Vertica Database Name: the name for your new database. See Creating a database name and password for database name requirements.
-
Vertica Version: select the desired Vertica database version. You can select from the latest hotfix of recent Vertica releases. For each database version, you can also select the operating system.
-
Vertica Database User Name: the name of the database superuser. This name defaults to dbadmin, but you can enter another user name here.
-
Password and Confirm Password: Enter a password for the database superuser account.
-
Database Size: The number of nodes in your initial database. If you specify more than three nodes here, you must supply a valid Vertica license file in the Vertica License field (below).
-
Vertica License: Click Browse to locate and upload your Vertica license key file. If you do not supply a license key file here, the wizard deploys your database with a Vertica Community Edition license. This license has a three node limit, so the value in the Database Size field cannot be larger than 3 if you do not supply a license. If you use a Community Edition license for your deployment, you can upgrade the license later to expand your cluster and load more than 1 TB of data. See Managing licenses for more information.
Note
This field does not appear if you created your MC instance using a by-the-hour (BTH) launcher. The BTH license is automatically applied to all clusters you create using a BTH MC instance. For a by-the-hour license, cloud vendors charge the customer for licensed Vertica usage along with their cloud infrastructure charges.
-
Load example data: Check this box if you want your deployed database to load some example clickstream data. This option is useful if you are testing features and just want some preloaded data in the database to query.
-
Click Next and supply the following information:
-
Instance Type: the specifications of the virtual machine instances the MC will use to deploy your database nodes. See the Google Cloud documentation's Machine types page for details of each instance type. Also see GCP Eon Mode instance recommendations.
-
Database Depot Path and Disk Type: the local mount point for the depot, and the type and number of local disks dedicated to the depot for each node. You cannot change the mount path for the depot. The disks you select in the Disk Type field are only used to store the depot. On the next page of the wizard, you will configure disks for the catalog and temporary disk space. You will see the best performance when using SSD disks, although at a higher cost. You can choose to use faster local storage for your depot. However, local storage is ephemeral—GCP wipes the disk clean whenever you stop the instance. This means each time you start a node, it will have to warm its depot from scratch, rather than taking advantage of any still-current data in its depot. See the Google Cloud documentation's Storage options page for more information about the local disk options.
-
Volume Size: the amount of disk space available on each disk attached to each node in your cluster. This field shows you the total disk space available per node in your cluster. For the best practices on choosing the amount of disk space for your nodes, see Configuring your Vertica cluster for Eon Mode.
-
Data Segmentation Shards: sets the number of shards in your database. After you set this value, you cannot change it later. See Configuring your Vertica cluster for Eon Mode for recommendations. The default value is based on the number of nodes you entered in the Database Size field earlier. It is usually sufficient, unless you anticipate greatly expanding your cluster beyond your initial node count.
-
Communal Location: a Google Cloud Storage URL that specifies where to store your database's communal data. See Eon Mode on GCP prerequisites for requirements.
-
Instance IP settings: specify whether the nodes in your database will have static or ephemeral network addresses that are accessible from the internet, or addresses that are only accessible from within the internal virtual network.
-
Click Next. The wizard validates your communal storage location URL. If there is a problem with the URL you entered, it displays an error message and prompts you to fix the URL.
After your communal storage URL passes validation, fill in the following information:
-
Database Catalog Path, Disk Type, and Size (GB) per Available Node: the mount point, disk type, and disk size for the local copy of the database catalog on each node. You cannot edit the mount point. You choose the type of local disk to use for the catalog, and its size. You can only choose persistent disk storage for the catalog. SSD drives are faster, but more expensive than standard disks. The default setting for the disk size is adequate for most medium-sized databases. Increase the size if you anticipate maintaining a large database.
-
Database Temp Path, Disk Type, and Size (GB) per Available Node: the mount point, disk type, and disk size for the temporary storage space on each node. You cannot edit the mount point. You choose the type of local disk to use, and its size. You can only choose persistent disk storage for the temporary disk space. SSD drives are faster, but more expensive than standard disks. The default setting is adequate for most databases. Consider increasing the temporary space if you perform many complex merges that spill to disk.
-
Label Instances: check this box to enable adding labels to your node's instances. Many organizations use labels to organize, track responsibility, and assign costs for instances. See the Google Cloud documentation's Labeling resources page for more information. If you choose to add labels, enter the label name and value, and click Add.
-
Click Next. Review the summary of all your database settings. If you need to make a correction, use the Back button to step back to previous pages of the wizard.
-
When you are satisfied with the database settings, check Accept terms and conditions and click Create.
The process of provisioning and creating the database takes several minutes. After it completes successfully, the MC displays a Get Started button. This button leads to a page of useful links for getting started with your new database.
See also
3.3 - Manually deploying an Enterprise Mode database on GCP
Before you create your Vertica cluster in Google Cloud Platform (GCP) using manual steps, you must create a virtual machine (VM) instance from the Compute Engine section of GCP.
Before you create your Vertica cluster in Google Cloud Platform (GCP) using manual steps, you must create a virtual machine (VM) instance from the Compute Engine section of GCP.
All VM instances that you create should be launched in the same virtual private cloud (VPC).
To configure and launch a new VM instance, follow these instructions:
-
From within the Compute Engine section of GCP, from the menu on the left-hand side of the screen, select VM Instances.
GCP displays all the VM instances that you have created so far.
-
Select the CREATE INSTANCE link.
-
Enter a name for the new instance.
-
Select the zone where you plan to deploy the instance.
GCP divides its cloud data centers into regions and zones. A region is a collection of zones in the same geographical location, and each zone is a collection of compute resources, which vary from zone to zone. Always pick the zone in your designated region that supports the latest Intel CPUs.
For a complete listing of regions and zones, including supported processors, see Regions and Zones.
-
Select a machine type.
GCE offers many different types of VM instances. For best results, only deploy Vertica on VM instances with 8 vCPUs or more and at least 30 GB of RAM.
-
Select the boot disk (image).
You create VM instances from a public or custom image. If you are starting with Vertica in GCP for the first time, select either the CentOS 7 or RHEL 7 public image. Those images have been tested thoroughly with Vertica.
For more information about deploying a VM instance, see Creating and Starting an Instance.
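If you prefer to script instance creation rather than use the console, the gcloud CLI can create a comparable VM. The following is only a sketch: the instance name, zone, machine type, image, and boot disk size are hypothetical placeholders that you should adjust to your project, region, and sizing requirements:
# Create one Vertica node VM with 8 vCPUs and 30 GB of RAM (n1-standard-8)
# and a CentOS 7 boot disk (hypothetical name, zone, and disk size).
$ gcloud compute instances create vertica-node-01 \
    --zone=us-central1-b \
    --machine-type=n1-standard-8 \
    --image-family=centos-7 \
    --image-project=centos-cloud \
    --boot-disk-size=100GB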
After you have configured the VM instance to be used as a Vertica cluster node, GCP allows you to convert that instance into a custom image. Doing so allows you to deploy multiple copies of that VM instance; each copy is identical except for its node name and IP address.
For more information about creating a custom image, see Creating, Deleting, and Deprecating Custom Images.
Connect to a virtual machine
Before you can connect to any of the VMs you created, you must first identify the external IP address. The VM instance section of GCP contains a list of all currently deployed VMs and their associated external IP addresses.
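If you have the gcloud CLI installed and authenticated, you can also retrieve this list from the command line; this is an optional convenience rather than a required step:
# List all VM instances in the current project with their internal and external IP addresses.
$ gcloud compute instances list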
Connect to your VM
To connect to your VM, complete the following tasks:
-
Connect to your VM using SSH with the external IP address you created in the configuration steps.
-
Authenticate using the credentials and SSH key that you provided to your GCP account upon creation.
Connect to other VMs
To connect to other virtual machines in your virtual network:
-
Use SSH to connect to your publicly connected VM.
-
Use SSH again from that VM to connect through the private IP addresses of your other VMs.
Because GCP forces the use of private key authentication, you may need to move your key file to the root directory of your publicly connected VM. Then, use SSH to connect to other VMs in your virtual network.
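As a sketch, assuming a hypothetical key file named examplekey.pem, the example user name dbadmin, an external address of 203.0.113.10 on your publicly connected VM, and a private address of 10.2.0.165 on another node:
# Connect to the publicly reachable VM from your local machine.
$ ssh -i examplekey.pem dbadmin@203.0.113.10
# From that VM, hop to another node over its private IP address.
$ ssh -i examplekey.pem dbadmin@10.2.0.165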
Prepare the virtual machines
After you create your VMs, you need to prepare them for cluster formation.
Add the Vertica license and private key
Prepare your nodes by copying your private key (if you are using one) and your Vertica license to your primary node. The following steps assume that the initial user you configured is the DBADMIN user:
-
As the DBADMIN user, copy your private key file from where you saved it locally onto your primary node.
Depending upon the procedure you use to copy the file, the permissions on the file may change. If permissions change, the install_vertica script fails with a message similar to the following:
Failed Login Validation 10.0.2.158, cannot resolve or connect to host as root.
If you see the previous failure message, enter the following command to correct permissions on your private key file:
$ chmod 600 /<name-of-key>.pem
-
Copy your Vertica license to your primary VM. Save it in your home directory or other known location.
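For example, you might copy both the key file and the license from your local machine with scp. The key file name, license file name, user name, and external IP address below are hypothetical; substitute your own values:
# Copy the private key and the Vertica license file to the DBADMIN home directory on the primary node.
$ scp -i examplekey.pem examplekey.pem vertica_license.dat dbadmin@203.0.113.10:/home/dbadmin/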
Install software dependencies for Vertica on GCP
In addition to the Vertica standard package dependencies, as the root user, you must install the following packages before you install Vertica:
-
pstack
-
mcelog
-
sysstat
-
dialog
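On the CentOS 7 or RHEL 7 images recommended earlier, for example, you might install these packages with yum; package names and repository availability can vary by distribution, so treat this as a sketch:
# Install the additional packages Vertica requires on GCP before running the installer.
$ sudo yum install -y pstack mcelog sysstat dialog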
For best disk performance in GCP, Vertica recommends that customers use SSD persistent storage of at least 2 TB (2,000 GB). Disk performance in GCP is directly tied to disk size, and 2 TB is the minimum SSD persistent disk size that allows maximum throughput.
Caution
Do not store your information on the root volume, especially in your data and catalog directories. Storing information on the root volume may result in data loss.
When configuring your storage, make sure to use a supported file system. See Recommended storage format types for details.
Create a swap file
In addition to storage volumes to store your data, Vertica requires a swap volume or swap file for the setup script to complete.
Create a swap file or swap volume of at least 2 GB. The following steps show how to create a swap file for Vertica on GCP:
-
Use the install command to create an empty swapfile file owned by root with 0600 permissions:
$ install -o root -g root -m 0600 /dev/null /swapfile
-
Create the swap file:
$ dd if=/dev/zero of=/swapfile bs=1024 count=2048k
-
Prepare the swap file using mkswap:
$ mkswap /swapfile
-
Use swapon to instruct Linux to swap on the swap file:
$ swapon /swapfile
-
Persist the swap file in /etc/fstab:
$ echo "/swapfile swap swap auto 0 0" >> /etc/fstab
-
Repeat the volume attachment, combination, and swap file creation procedures for each VM in your cluster.
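You can optionally confirm on each VM that the swap space is active before continuing, for example:
# Verify that the 2 GB swap file is enabled and counted in available memory.
$ swapon -s
$ free -h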
Download Vertica
To download the Vertica server appropriate for your operating system and license type, follow the steps described in Download and install the Vertica server package.
After you complete the download and extraction, use the install_vertica script to form a cluster and install the Vertica database software, as described in the next section.
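As a rough sketch, on a CentOS or RHEL node the downloaded server package is typically installed with rpm; the package file name below is a placeholder for the file you actually download:
# Install the Vertica server package on the node from which you will run install_vertica.
$ sudo rpm -Uvh vertica-<version>.rpm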
Use the install_vertica script to combine two or more individual VMs to form a cluster and install your Vertica database.
Before you run the install_vertica script, follow these steps:
-
Check the VM Instances page of the Compute Engine section on GCP to locate a list of current VMs and their associated internal IP addresses.
-
Identify your storage location on your VMs. The installer assumes that you have mounted your storage to /home/dbadmin. To specify another location, use the --data-dir argument.
Caution
Do not store your data on the root drive.
The following steps show how to combine virtual machines (VMs) into a cluster using the install_vertica script:
-
While connected to your primary node, construct the following command to combine your nodes into a cluster.
$ sudo /opt/vertica/sbin/install_vertica --hosts 10.2.0.164,10.2.0.165,10.2.0.166 --dba-user-password-disabled --point-to-point --data-dir /vertica/data --ssh-identity ~/.pem --license
-
Substitute the IP addresses for your VMs, and include your root key file name, if applicable.
-
Include the --point-to-point parameter to configure spread to use direct point-to-point communication among all Vertica nodes, as required for clusters on GCP when installing or updating Vertica.
-
If you are using Vertica Community Edition, which limits you to three nodes, specify -L CE with no license file (a sketch of this variant appears at the end of this section).
-
After you combine your nodes, to reduce security risks, keep your key file in a secure place—separate from your cluster—and delete your on-cluster key with the shred command:
$ shred examplekey.pem
Important
You need your key file to perform future Vertica updates.
For complete information about the install_vertica script and its parameters, see Installing Vertica with the installation script.
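If you deployed Community Edition, a variant of the cluster-forming command might look like the following sketch, which reuses the same hypothetical IP addresses from the earlier example and a hypothetical key file name:
$ sudo /opt/vertica/sbin/install_vertica --hosts 10.2.0.164,10.2.0.165,10.2.0.166 \
    --dba-user-password-disabled --point-to-point --data-dir /vertica/data \
    --ssh-identity ~/examplekey.pem -L CE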
After your cluster is up and running
Now that your cluster is configured and Vertica is running, take these steps:
- Create a database. See Creating a database for details.
- When you installed Vertica, a database administrator user was created with the DBADMIN role (usually named dbadmin). Use this account to create and start a database.
- See Configuring the database for important database configuration steps.
4 - Moving a cloud installation from by the hour (BTH) to bring your own license (BYOL)
Vertica offers two licensing options for some of the entries in the Amazon Web Services Marketplace and Google Cloud Marketplace.
Vertica offers two licensing options for some of the entries in the Amazon Web Services Marketplace and Google Cloud Marketplace:
- Bring Your Own License (BYOL): a long-term license that you obtain through an online licensing portal. These deployments also work with a free Community Edition license. Vertica uses a community license automatically if you do not install a license that you purchased. (For more about Vertica licenses, see Managing licenses and Understanding Vertica licenses.)
- Vertica by the Hour (BTH): a pay-as-you-go environment where you are charged an hourly fee for both the use of Vertica and the cost of the instances it runs on. The Vertica by the hour deployment offers an alternative to purchasing a term license. If you want to crunch large volumes of data within a short period of time, this option might work better for you. The BTH license is automatically applied to all clusters you create using a BTH MC instance.
If you start out with an hourly license, you can later decide to use a long-term license for your database. The support for an hourly versus a long-term license is built into the instances running your database. To move your database from an hourly license to a long-term license, you must create a new database cluster with a new set of instances.
To move from an hourly to a long-term license, follow these steps:
-
Purchase a BYOL license. Follow the process described in Obtaining a license key file.
-
Apply the new license to your database.
-
Shut down your database.
-
Create a new database cluster using a BYOL marketplace entry.
-
Revive your database onto the new cluster.
The exact steps you must take depend on your database mode and your preferred tool for managing your database:
Moving an Eon Mode database from BTH to BYOL using the command line
Follow these steps to move an Eon Mode database from an hourly to a long-term license.
-
Obtain a long-term BYOL license from the online licensing portal, described in Obtaining a license key file.
-
Upload the license file to a node in your database. Note the absolute path in the node's filesystem, as you will need this later when installing the license.
-
Connect to the node you uploaded the license file to in the previous step.
-
Connect to your database using vsql and view the licenses table:
=> SELECT * FROM licenses;
Note the name of the hourly license listed in the NAME column, so you can check if it is still present later.
-
Install the license in the database using the INSTALL_LICENSE function with the absolute path to the license file you uploaded in step 2:
=> SELECT install_license('absolute path to BYOL license');
-
View the licenses table again:
=> SELECT * FROM licenses;
If only the new BYOL license appears in the table, skip to step 8. If the hourly license whose name you noted in step 4 is still in the table, copy the name and proceed to step 7.
-
Call the DROP_LICENSE function to drop the hourly license:
=> SELECT drop_license('hourly license name');
-
You will need the path for your cluster's communal storage in a later step. If you do not already know the path, you can find this information by executing this query:
=> SELECT location_path FROM V_CATALOG.STORAGE_LOCATIONS
WHERE sharing_type = 'COMMUNAL';
-
Synchronize your database's metadata. See Synchronizing metadata.
-
Shut down the database by calling the SHUTDOWN function:
=> SELECT SHUTDOWN();
-
You now need to create a new BYOL cluster onto which you will revive your database. Deploy a new cluster including a new MC instance using a BYOL entry in the marketplace of your chosen cloud platform. See:
Important
Your new BYOL cluster must have the same number of primary nodes as your existing hourly license cluster.
-
Revive your database onto the new cluster. For instructions, see Reviving an Eon Mode database cluster. Because you created the new cluster using a BYOL entry in the marketplace, the database uses the BYOL you applied earlier.
-
After reviving the database on your new BYOL cluster, terminate the instances for your hourly license cluster and MC. For instructions, see your cloud provider's documentation.
Moving an Eon Mode database from BTH to BYOL using the MC
Follow this procedure to move to BYOL and revive your database using MC:
-
Purchase a long-term BYOL license from the online licensing portal, following the steps detailed in Obtaining a license key file. Save the file to a location on your computer.
-
You now need to install the new license on your database. Log into MC and click your database in the Recent Databases list.
-
At the bottom of your database's Overview page, click the License tab.
-
Under the Installed Licenses list, note the name of the BTH license in the License Name column. You will need this later to check whether it is still present after installing the new long-term license.
-
In the ribbon at the top of the License History page, click the Install New License button. The Settings: License page opens.
-
Click the Browse button next to the Upload a new license box.
-
Locate the license file you obtained in step 1, and click Open.
-
Click the Apply button on the top right of the page.
-
Select the checkbox to agree to the EULA terms and click OK.
-
After Vertica installs the license, click the Close button.
-
Click the License tab at the bottom of the page.
-
If only the new long-term license appears in the Installed Licenses list, skip to Step 16. If the by-the-hour license also appears in the list, copy down its name from the License Name column.
-
You must drop the by-the-hour license before you can proceed. At the bottom of the page, click the Query Execution tab.
-
In the query editor, enter the following statement:
SELECT DROP_LICENSE('hourly license name');
-
Click Execute Query. The query should complete indicating that the license has been dropped.
-
You will need the path for your cluster's communal storage in a later step. If you do not already know the path, you can find this information by executing this query in the Query Execution tab:
SELECT location_path FROM V_CATALOG.STORAGE_LOCATIONS
WHERE sharing_type = 'COMMUNAL';
-
Synchronize your database's metadata. See Synchronizing metadata.
-
You must now stop your by-the-hour database cluster. At the bottom of the page, click the Manage tab.
-
In the banner at the top of the page, click Stop Database and then click OK to confirm.
-
From the Amazon Web Services Marketplace or the Google Cloud Marketplace, deploy a new Vertica Management Console using a BYOL entry. Do not deploy a full cluster. You just need an MC deployment.
-
Log into your new MC instance and revive the database. See Reviving an Eon Mode database on AWS in MC for detailed instructions.
-
After reviving the database on your new environment, terminate the instances for your hourly license environment. To do so, on the AWS CloudFormation Stacks page, select the hourly environment's stack (its collection of AWS resources) and click Actions > Delete Stack.
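If you prefer the AWS CLI to the console for this cleanup step, deleting a stack looks like the following; the stack name is a hypothetical placeholder for your hourly environment's stack:
# Delete the CloudFormation stack that holds the hourly license environment's resources.
$ aws cloudformation delete-stack --stack-name hourly-vertica-stack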
Moving an Enterprise Mode database from hourly to BYOL using backup and restore
Note
Currently, AWS is the only platform supported for Enterprise Mode databases using hourly licenses.
In an Enterprise Mode database, follow this procedure to move to BYOL, and then back up and restore your database:
-
Obtain a long-term BYOL license from the online licensing portal, described in Obtaining a license key file.
-
Upload the license file to a node in your database. Note the absolute path in the node's filesystem, as you will need this later when installing the license.
-
Connect to the node you uploaded the license file to in the previous step.
-
Connect to your database using vsql and view the licenses table:
=> SELECT * FROM licenses;
Note the name of the hourly license listed in the NAME column, so you can check if it is still present later.
-
Install the license in the database using the INSTALL_LICENSE function with the absolute path to the license file you uploaded in step 2:
=> SELECT install_license('absolute path to BYOL license');
-
View the licenses table again:
=> SELECT * FROM licenses;
If only the new BYOL license appears in the table, skip to step 8. If the hourly license whose name you noted in step 4 is still in the table, copy the name and proceed to step 7.
-
Call the DROP_LICENSE function to drop the hourly license:
=> SELECT drop_license('hourly license name');
-
Back up the database. See Backing up and restoring the database.
-
Deploy a new cluster for your database using one of the BYOL entries in the Amazon Web Services Marketplace.
-
Restore the database from the backup you created earlier. See Backing up and restoring the database. When you restore the database, it will use the BYOL you loaded earlier.
-
After restoring the database on your new environment, terminate the instances for your hourly license environment. To do so, on the AWS CloudFormation Stacks page, select the hourly environment's stack (its collection of AWS resources) and click Actions > Delete Stack.
After completing one of these procedures, see Viewing your license status to confirm the license drop and install were successful.
5 - Adjusting Spread Daemon timeouts for virtual environments
You may see Vertica nodes leave the database even though they are still running.
You may see Vertica nodes leave the database even though they are still running. This issue can happen on networks that are prone to spikes in latency or in virtual environments where a node's VM may be paused for a short period of time. You can adjust a setting in Vertica to help prevent this issue from occurring.
Vertica relies on spread daemons to pass messages between database nodes. When a node fails to respond to a spread message after a timeout period, Vertica assumes the node is down and starts to remove it from the database.
The default Spread timeout depends on the number of configured Spread segments:
Configured Spread segments | Default timeout
1                          | 8 seconds
> 1                        | 25 seconds
If network delays or temporary pauses of a VM last longer than the spread timeout period, you may see UP nodes leave the database. In these cases, you can increase the spread timeout to reduce or eliminate instances where UP nodes leave the database.
Azure's memory-preserving updates and spread timeouts
In Azure, you might see running nodes leave the database due to scheduled maintenance. Azure's maintenance down time is usually well-defined. For example, Azure's memory-preserving updates can pause a VM for up to 30 seconds while performing maintenance on the system hosting the VM. This pause does not disrupt the node. It continues normal operation once Azure resumes it. See the Azure documentation's topic on Maintenance for virtual machines in Azure for more information about updates. If Azure pauses a node for longer than the spread timeout period, Vertica interprets the node's inability to respond to a spread message as the node going down, even though it will resume running normally.
Note
If you deploy your Vertica cluster using the Azure Marketplace, the spread timeout defaults to 35 seconds. If you manually create your cluster in Azure, the spread timeout defaults to 8 or 25 seconds, as described earlier.
Setting the spread timeout
When you know your network or nodes may be unable to respond for a specific amount of time, you can increase the spread timeout period to longer than this time. Adjust the timeout to the period of time the node may be unable to respond, plus an additional 5 seconds as a safety margin.
For example, if you know Azure's memory-preserving maintenance can pause your VMs for up to 30 seconds, set the spread timeout to 35 seconds.
If you do not know exactly how long network or node disruptions can last, you can try increasing the spread timeout gradually, until you see reduced instances of UP nodes leaving the database. Be as conservative with this setting as you can.
Important
Vertica cannot react to a node going down or being shut down improperly before the timeout period has elapsed. Setting the spread timeout too high can result in longer query restarts if a node goes down.
You can see the current setting of the spread timeout by querying the system table SPREAD_STATE:
=> SELECT * FROM V_MONITOR.SPREAD_STATE;
node_name | token_timeout
------------------+---------------
v_vmart_node0003 | 8000
v_vmart_node0001 | 8000
v_vmart_node0002 | 8000
(3 rows)
You change the spread timeout by calling the meta-function SET_SPREAD_OPTION to set the token timeout to a new value. This value is a string that sets the timeout in milliseconds.
Important
Changing spread settings with SET_SPREAD_OPTION has a minor impact on your cluster: it pauses while the new settings are propagated across the entire cluster.
This example sets the timeout to 35 seconds (35000ms):
=> SELECT SET_SPREAD_OPTION( 'TokenTimeout', '35000');
NOTICE 9003: Spread has been notified about the change
SET_SPREAD_OPTION
--------------------------------------------------------
Spread option 'TokenTimeout' has been set to '35000'.
(1 row)
=> SELECT * FROM V_MONITOR.SPREAD_STATE;
node_name | token_timeout
------------------+---------------
v_vmart_node0001 | 35000
v_vmart_node0002 | 35000
v_vmart_node0003 | 35000
(3 rows)
Note
The changes you make to the spread timeout might not take effect immediately. It might take some time before you see the change in the V_MONITOR.SPREAD_STATE system table.
See also