The topics described in this section are performed automatically by the install_vertica
script and are described in Installing using the command line. If you did not encounter any installation problems, proceed to the Administrator's guidefor instructions on how to configure and operate a database.
This is the multi-page printable view of this section. Click here to print.
Troubleshooting the Vertica install
1 - Validation scripts
Vertica provides several validation utilities that can be used prior to deploying Vertica to help determine if your hosts and network can properly handle the processing and network traffic required by Vertica. These utilities can also be used if you are encountering performance issues and need to troubleshoot the issue.
After you install the Vertica RPM, you have access to the following scripts in /opt/vertica/bin
:
-
Vcpuperf - a CPU performance test used to verify your CPU performance.
-
Vioperf - an Input/Output test used to verify the speed and consistency of your hard drives.
-
Vnetperf - a Network test used to test the latency and throughput of your network between hosts.
These utilities can be run at any time, but are well suited to use before running the install_vertica script.
1.1 - Vcpuperf
The vcpuperf utility measures your server's CPU processing speed and compares it against benchmarks for common server CPUs. The utility performs a CPU test and measures the time it takes to complete the test. The lower the number scored on the test, the better the performance of the CPU.
The vcpuperf utility also checks the high and low load times to determine if CPU throttling is enabled. If a server's low-load computation time is significantly longer than the high-load computation time, CPU throttling may be enabled. CPU throttling is a power-saving feature. However, CPU throttling can reduce the performance of your server. Vertica recommends disabling CPU throttling to enhance server performance.
Syntax
vcpuperf [-q]
Option
Option | Description |
---|---|
-q |
Run in quiet mode. Quiet mode displays only the CPU Time, Real Time, and high and low load times. |
Returns
-
CPU Time: the amount of time it took the CPU to run the test.
-
Real Time: the total time for the test to execute.
-
High load time: The amount of time to run the load test while simulating a high CPU load.
-
Low load time: The amount of time to run the load test while simulating a low CPU load.
Example
The following example shows a CPU that is running slightly slower than the expected time on a Xeon 5670 CPU that has CPU throttling enabled.
[root@node1 bin]# /opt/vertica/bin/vcpuperf
Compiled with: 4.1.2 20080704 (Red Hat 4.1.2-52) Expected time on Core 2, 2.53GHz: ~9.5s
Expected time on Nehalem, 2.67GHz: ~9.0s
Expected time on Xeon 5670, 2.93GHz: ~8.0s
This machine's time:
CPU Time: 8.540000s
Real Time:8.710000s
Some machines automatically throttle the CPU to save power.
This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
Low load times much larger than 100-200us or much larger than the corresponding high load time
indicate low-load throttling, which can adversely affect small query / concurrent performance.
This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
1.2 - Vioperf
The vioperf
utility quickly tests the performance of your host's input and output subsystem. The utility performs the following tests:
-
sequential write
-
sequential rewrite
-
sequential read
-
skip read (read non-contiguous data blocks)
The utility verifies that the host reads the same bytes that it wrote and prints its output to STDOUT. The utility also logs the output to a JSON formatted file.
For data in HDFS, the utility tests reads but not writes.
Syntax
vioperf [--help] [--duration=<INTERVAL>] [--log-interval=<INTERVAL>]
[--log-file=<FILE>] [--condense-log] [--thread-count=<N>] [--max-buffer-size=<SIZE>]
[--preserve-files] [--disable-crc] [--disable-direct-io] [--debug]
[<DIR>*]
Minimum and recommended I/O performance
-
The minimum required I/O is 20 MB/s read/write per physical processor core on each node, in full duplex (reading and writing) simultaneously, concurrently on all nodes of the cluster.
-
The recommended I/O is 40 MB/s per physical core on each node.
-
The minimum required I/O rate for a node with 2 hyper-threaded six-core CPUs (12 physical cores) is 240 MB/s. Vertica recommends 480 MB/s.
For example, the I/O rate for a node with 2 hyper-threaded six-core CPUs (12 physical cores) is 240 MB/s required minimum, 480 MB/s recommended.
Disk space vioperf needs
vioperf
requires about 4.5 GB to run.
Options
Option | Description |
---|---|
--help |
Prints a help message and exits. |
--duration |
The length of time
|
--log-interval |
The interval at which the log file reports summary information. The default interval is 10 seconds. This option uses the same interval notation as --duration . |
--log-file |
The path and name where log file contents are written, in JSON. If not specified, then vioperf creates a file named results date-time.JSON in the current directory. |
--condense-log |
Directs vioperf to write the log file contents in condensed format, one JSON entry per line, rather than as indented JSON syntax. |
--thread-count=<N> |
The number of execution threads to use. By default, vioperf uses all threads available on the host machine. |
--max-buffer-size=<SIZE> |
The maximum size of the in-memory buffer to use for reads or writes. Specify the units with any of these suffixes:
|
--preserve-files |
Directs vioperf to keep the files it writes. This parameter is ignored for HDFS tests, which are read-only. Inspecting the files can help diagnose write-related failures. |
--disable-crc |
Directs vioperf to ignore CRC checksums when validating writes. Verifying checksums can add overhead, particularly when running vioperf on slower processors. This parameter is ignored for HDFS tests. |
--disable-direct-io |
When reading from or writing to a local file system, Disabling this behavior can produce more realistic performance results but slows down the operation of |
--debug |
Directs vioperf to report verbose error messages. |
<DIR> |
Zero or more directories to test. If you do not specify a directory, To test reads from a directory on HDFS:
|
Returns
The utility returns the following information:
Heading | Description |
---|---|
test |
The test being run (Write, ReWrite, Read, or Skip Read) |
directory |
The directory in which the test is being run. |
counter name |
The counter type of the test being run. Can be either MB/s or Seeks per second. |
counter value |
The value of the counter in MB/s or Seeks per second across all threads. This measurement represents the bandwidth at the exact time of measurement. Contrast with counter value (avg). |
counter value (10 sec avg) |
The average amount of data in MB/s, or the average number of Seeks per second, for the test being run in the duration specified with --log-interval . The default interval is 10 seconds. The counter value (avg) is the average bandwidth since the last log message, across all threads. |
counter value/core |
The counter value divided by the number of cores. |
counter value/core (10 sec avg) |
The counter value (10 sec avg) divided by the number of cores. |
thread count |
The number of threads used to run the test. |
%CPU |
The available CPU percentage used during this test. |
%IO Wait |
The CPU percentage in I/O Wait state during this test. I/O wait state is the time working processes are blocked while waiting for I/O operations to complete. |
elapsed time |
The amount of time taken for a particular test. If you run the test multiple times, elapsed time increases the next time the test is run. |
remaining time |
The time remaining until the next test. Based on the --duration option, each of the tests is run at least once. If the test set is run multiple times, then remaining time is how much longer the test will run. The remaining time value is cumulative. Its total is added to elapsed time each time the same test is run again. |
Example
Invoking vioperf
from a terminal outputs the following message and sample results:
[dbadmin@v_vmart_node0001 ~]$ /opt/vertica/bin/vioperf --duration=60s
The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex
i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster.
The recommended I/O is 40 MB/s per physical core on each node.
For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.
Using direct io (buffer size=1048576, alignment=512) for directory "/home/dbadmin"
test | directory | counter name | counter value | counter value (10 sec avg) | counter value/core | counter value/core (10 sec avg) | thread count | %CPU | %IO Wait | elapsed time (s)| remaining time (s)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Write | /home/dbadmin | MB/s | 420 | 420 | 210 | 210 | 2 | 89 | 10 | 10 | 5
Write | /home/dbadmin | MB/s | 412 | 396 | 206 | 198 | 2 | 89 | 9 | 15 | 0
ReWrite | /home/dbadmin | (MB-read+MB-write)/s | 150+150 | 150+150 | 75+75 | 75+75 | 2 | 58 | 40 | 10 | 5
ReWrite | /home/dbadmin | (MB-read+MB-write)/s | 158+158 | 172+172 | 79+79 | 86+86 | 2 | 64 | 33 | 15 | 0
Read | /home/dbadmin | MB/s | 194 | 194 | 97 | 97 | 2 | 69 | 26 | 10 | 5
Read | /home/dbadmin | MB/s | 192 | 190 | 96 | 95 | 2 | 71 | 27 | 15 | 0
SkipRead | /home/dbadmin | seeks/s | 659 | 659 | 329.5 | 329.5 | 2 | 2 | 85 | 10 | 5
SkipRead | /home/dbadmin | seeks/s | 677 | 714 | 338.5 | 357 | 2 | 2 | 59 | 15 | 0
Note
When evaluating performance for minimum and recommended I/O, include the Write and Read values in your evaluation. ReWrite and SkipRead values are not relevant to determining minimum and recommended I/O.1.3 - Vnetperf
The vnetperf utility measures network performance of database hosts, as well as network latency and throughput for TCP and UDP protocols.
Caution
This utility incurs high network load, which degrades database performance. Do not use this utility on a Vertica production database.This utility helps identify the following issues:
-
Low throughput for all hosts or one
-
High latency for all hosts or one
-
Bottlenecks between one or more hosts or subnets
-
Too-low limit on the number of TCP connections that can be established simultaneously
-
High rates of network packet loss
Syntax
vnetperf [[options](#Options)] [[tests](#Tests)]
Options
Option | Description |
---|---|
--condense |
Condenses the log into one JSON entry per line, instead of indented JSON syntax. |
--collect-logs |
Collects test log files from each host. |
--datarate rate |
Limits throughput to this rate in MB/s. A rate of 0 loops the tests through several different rates. Default: 0 |
--duration seconds |
Time limit for each test to run in seconds. Default: 1 |
--hosts host-name [,...] |
Comma-separated list of host names or IP addresses on which to run the tests. The list must not contain embedded spaces. |
--hosts file |
File that specifies the hosts on which to run the tests. If you omit this option, then the vnetperf tries to access admintools to identify cluster hosts. |
--identity-file file |
If using passwordless SSH/SCP access between hosts, then specify the key file used to gain access to the hosts. |
--ignore-bad-hosts |
If set, runs tests on reachable hosts even if some hosts are not reachable. If you omit this option and a host is unreachable, then no tests are run on any hosts. |
--log-dir directory |
If Default: |
--log-level level |
Log level to use, one of the following:
Default: |
--list-tests |
Lists the tests that vnetperf can run. |
--output-file file |
The file to which JSON results are written. Default: |
--ports port#[,...] |
Comma-delimited list of port numbers to use. If only one port number is specified, then the next two numbers in sequence are also used. Default: 14159,14160,14161 |
--scp-options ' scp-args ' |
Specifies one or more standard SCP command line arguments. SCP is used to copy test binaries over to the target hosts. |
--ssh-options ' ssh-args ' |
Specifies one or more standard SSH command line arguments. SSH is used to issue test commands on the target hosts. |
--tmp-dir directory |
Specifies the temporary directory for vnetperf, where Default: |
--vertica-install directory |
Indicates that Vertica is installed on each of the hosts, so vnetperf uses test binaries on the target system rather than copying them over with SCP. |
Tests
vnetperf can specify one or more of the following tests. If no test is specified, vnetperf runs all tests. Test results are printed for each host.
Test | Description | Results |
---|---|---|
latency |
Measures latency from the host that is running the script to other hosts. Hosts with unusually high latency should be investigated further. |
|
tcp-throughput |
Tests TCP throughput among hosts. |
|
udp-throughput |
Tests UDP throughput among hosts |
Recommended network performance
-
Maximum recommended RTT (round-trip time) latency is 1000 microseconds. Ideal RTT latency is 200 microseconds or less. Vertica recommends that clock skew be less than 1 second.
-
Minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more.
Note
UDP throughput can be lower; multiple network switches can adversely affect performance.
Example
$ vnetperf latency tcp-throughput
The maximum recommended rtt latency is 2 milliseconds. The ideal rtt latency is 200 microseconds or less. It is recommended that clock skew be kept to under 1 second.
test | date | node | index | rtt latency (us) | clock skew (us)
-------------------------------------------------------------------------------------------------------------------------
latency | 2022-03-29_10:23:55,739 | 10.20.100.247 | 0 | 49 | 3
latency | 2022-03-29_10:23:55,739 | 10.20.100.248 | 1 | 272 | -702
latency | 2022-03-29_10:23:55,739 | 10.20.100.249 | 2 | 245 | 1037
The minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more. Note: UDP numbers may be lower, multiple network switches may reduce performance results.
date | test | rate limit (MB/s) | node | MB/s (sent) | MB/s (rec) | bytes (sent) | bytes (rec) | duration (s)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2022-03-29_10:23:55,742 | tcp-throughput | 32 | 10.20.100.247 | 30.579 | 30.579 | 32112640 | 32112640 | 1.00151
2022-03-29_10:23:55,742 | tcp-throughput | 32 | 10.20.100.248 | 30.5791 | 30.5791 | 32112640 | 32112640 | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput | 32 | 10.20.100.249 | 30.5791 | 30.5791 | 32112640 | 32112640 | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput | 32 | average | 30.579 | 30.579 | 32112640 | 32112640 | 1.0015
2022-03-29_10:23:57,749 | tcp-throughput | 64 | 10.20.100.247 | 61.0952 | 61.0952 | 64094208 | 64094208 | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput | 64 | 10.20.100.248 | 61.096 | 61.096 | 64094208 | 64094208 | 1.00048
2022-03-29_10:23:57,749 | tcp-throughput | 64 | 10.20.100.249 | 61.0952 | 61.0952 | 64094208 | 64094208 | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput | 64 | average | 61.0955 | 61.0955 | 64094208 | 64094208 | 1.00048
2022-03-29_10:23:59,753 | tcp-throughput | 128 | 10.20.100.247 | 122.131 | 122.131 | 128122880 | 128122880 | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput | 128 | 10.20.100.248 | 122.132 | 122.132 | 128122880 | 128122880 | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput | 128 | 10.20.100.249 | 122.132 | 122.132 | 128122880 | 128122880 | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput | 128 | average | 122.132 | 122.132 | 128122880 | 128122880 | 1.00046
2022-03-29_10:24:01,757 | tcp-throughput | 256 | 10.20.100.247 | 243.819 | 244.132 | 255754240 | 256081920 | 1.00036
2022-03-29_10:24:01,757 | tcp-throughput | 256 | 10.20.100.248 | 244.125 | 243.282 | 256049152 | 255164416 | 1.00025
2022-03-29_10:24:01,757 | tcp-throughput | 256 | 10.20.100.249 | 244.172 | 243.391 | 256114688 | 255295488 | 1.00032
2022-03-29_10:24:01,757 | tcp-throughput | 256 | average | 244.039 | 243.601 | 255972693 | 255513941 | 1.00031
2022-03-29_10:24:03,761 | tcp-throughput | 512 | 10.20.100.247 | 337.232 | 485.247 | 355893248 | 512098304 | 1.00645
2022-03-29_10:24:03,761 | tcp-throughput | 512 | 10.20.100.248 | 446.16 | 231.001 | 467894272 | 242253824 | 1.00013
2022-03-29_10:24:03,761 | tcp-throughput | 512 | 10.20.100.249 | 349.667 | 409.961 | 368476160 | 432013312 | 1.00497
2022-03-29_10:24:03,761 | tcp-throughput | 512 | average | 377.686 | 375.403 | 397421226 | 395455146 | 1.00385
2022-03-29_10:24:05,772 | tcp-throughput | 640 | 10.20.100.247 | 328.279 | 509.256 | 383975424 | 595656704 | 1.11548
2022-03-29_10:24:05,772 | tcp-throughput | 640 | 10.20.100.248 | 505.626 | 217.217 | 532250624 | 228655104 | 1.00389
2022-03-29_10:24:05,772 | tcp-throughput | 640 | 10.20.100.249 | 390.355 | 474.89 | 410812416 | 499777536 | 1.00365
2022-03-29_10:24:05,772 | tcp-throughput | 640 | average | 408.087 | 400.454 | 442346154 | 441363114 | 1.04101
2022-03-29_10:24:07,892 | tcp-throughput | 768 | 10.20.100.247 | 300.5 | 426.762 | 318734336 | 452657152 | 1.01154
2022-03-29_10:24:07,892 | tcp-throughput | 768 | 10.20.100.248 | 268.252 | 402.891 | 283017216 | 425066496 | 1.00616
2022-03-29_10:24:07,892 | tcp-throughput | 768 | 10.20.100.249 | 510.569 | 243.649 | 535592960 | 255590400 | 1.00042
2022-03-29_10:24:07,892 | tcp-throughput | 768 | average | 359.774 | 357.767 | 379114837 | 377771349 | 1.00604
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | 10.20.100.247 | 304.545 | 444.261 | 334987264 | 488669184 | 1.049
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | 10.20.100.248 | 422.246 | 192.773 | 474284032 | 216530944 | 1.07121
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | 10.20.100.249 | 353.206 | 446.809 | 378732544 | 479100928 | 1.0226
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | average | 359.999 | 361.281 | 396001280 | 394767018 | 1.0476
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | 10.20.100.247 | 343.324 | 414.559 | 387710976 | 468156416 | 1.07697
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | 10.20.100.248 | 292.44 | 246.254 | 308314112 | 259620864 | 1.00544
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | 10.20.100.249 | 437.559 | 405.02 | 459145216 | 425000960 | 1.00072
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | average | 357.774 | 355.278 | 385056768 | 384259413 | 1.02771
JSON results available at: ./results.2022-03-29_10:23:51,548.json
2 - Enable secure shell (SSH) logins
The administrative account must be able to use Secure Shell (SSH) to log in (ssh) to all hosts without specifying a password. The shell script install_vertica does this automatically. This section describes how to do it manually if necessary.
-
If you do not already have SSH installed on all hosts, log in as root on each host and install it now. You can download a free version of the SSH connectivity tools from OpenSSH.
-
Log in to the Vertica administrator account (dbadmin in this example).
-
Make your home directory (~) writable only by yourself. Choose one of:
$ chmod 700 ~
or
$ chmod 755 ~
where:
700 includes 755 includes 400 read by owner
200 write by owner
100 execute by owner
400 read by owner
200 write by owner
100 execute by owner
040 read by group
010 execute by group
004 read by anybody (other)
001 execute by anybody
-
Change to your home directory:
$ cd ~
- Generate a private key/ public key pair:
$ ssh-keygen -t rsaGenerating public/private rsa key pair.
Enter file in which to save the key (/home/dbadmin/.ssh/id_rsa):
Created directory '/home/dbadmin/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/dbadmin/.ssh/id_rsa.
Your public key has been saved in /home/dbadmin/.ssh/id_rsa.pub.
- Make your .ssh directory readable and writable only by yourself:
$ chmod 700 ~/.ssh
- Change to the .ssh directory:
$ cd ~/.ssh
- Copy the file
id_rsa.pub
onto the fileauthorized_keys2
.
$ cp id_rsa.pub authorized_keys2
- Make the files in your .ssh directory readable and writable only by yourself:
$ chmod 600 ~/.ssh/*
- For each cluster host:
$ scp -r ~/.ssh <host>:.
- Connect to each cluster host. The first time you ssh to a new remote machine, you could get a message similar to the following:
$ ssh dev0 Warning: Permanently added 'dev0,192.168.1.92' (RSA) to the list of known hosts.
This message appears only the first time you ssh to a particular remote host.