Troubleshooting the Vertica install

The topics described in this section are performed automatically by the install_vertica script and are described in Installing using the command line. If you did not encounter any installation problems, proceed to the Administrator's guide for instructions on how to configure and operate a database.

1 - Validation scripts

Vertica provides several validation utilities that can be used prior to deploying Vertica to help determine if your hosts and network can properly handle the processing and network traffic required by Vertica. These utilities can also be used if you are encountering performance issues and need to troubleshoot the issue.

After you install the Vertica RPM, you have access to the following scripts in /opt/vertica/bin:

  • Vcpuperf - a CPU performance test used to verify your CPU performance.

  • Vioperf - an Input/Output test used to verify the speed and consistency of your hard drives.

  • Vnetperf - a Network test used to test the latency and throughput of your network between hosts.

These utilities can be run at any time, but are well suited to use before running the install_vertica script.

1.1 - Vcpuperf

The vcpuperf utility measures your server's CPU processing speed and compares it against benchmarks for common server CPUs. The utility performs a CPU test and measures the time it takes to complete the test. The lower the number scored on the test, the better the performance of the CPU.

The vcpuperf utility also checks the high and low load times to determine if CPU throttling is enabled. If a server's low-load computation time is significantly longer than the high-load computation time, CPU throttling may be enabled. CPU throttling is a power-saving feature. However, CPU throttling can reduce the performance of your server. Vertica recommends disabling CPU throttling to enhance server performance.

Syntax

vcpuperf [-q]

Option

Option Description
-q Run in quiet mode. Quiet mode displays only the CPU Time, Real Time, and high and low load times.

Returns

  • CPU Time: the amount of time it took the CPU to run the test.

  • Real Time: the total time for the test to execute.

  • High load time: The amount of time to run the load test while simulating a high CPU load.

  • Low load time: The amount of time to run the load test while simulating a low CPU load.

Example

The following example shows a CPU that is running slightly slower than the expected time on a Xeon 5670 CPU that has CPU throttling enabled.

[root@node1 bin]# /opt/vertica/bin/vcpuperf
Compiled with: 4.1.2 20080704 (Red Hat 4.1.2-52)
Expected time on Core 2, 2.53GHz: ~9.5s
Expected time on Nehalem, 2.67GHz: ~9.0s
Expected time on Xeon 5670, 2.93GHz: ~8.0s

This machine's time:
  CPU Time: 8.540000s
  Real Time:8.710000s

Some machines automatically throttle the CPU to save power.
  This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
  Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
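
The comparison that vcpuperf asks you to make between the two load times can be sketched as a small shell helper. The function name check_throttling and both thresholds are assumptions for illustration: the 200-microsecond ceiling is the upper end of the "100-200us" range in the output above, and the factor of 2 is one possible reading of "much larger". Neither is part of vcpuperf.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vcpuperf: flags likely low-load throttling
# from the high and low load times (in whole microseconds) that vcpuperf prints.
# Assumed thresholds: low load time above 200 us, or more than
# `ratio` times the high load time (ratio defaults to 2).
check_throttling() {
  local high_us=$1 low_us=$2 ratio=${3:-2} ceiling_us=${4:-200}
  if (( low_us > ceiling_us || low_us > high_us * ratio )); then
    echo "throttling suspected"
  else
    echo "no throttling detected"
  fi
}

# Values from the example output above.
check_throttling 67 208
```

With the example values, the low load time exceeds both assumed thresholds, so the helper reports that throttling is suspected.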

1.2 - Vioperf

The vioperf utility quickly tests the performance of your host's input and output subsystem. The utility performs the following tests:

  • sequential write

  • sequential rewrite

  • sequential read

  • skip read (read non-contiguous data blocks)

The utility verifies that the host reads the same bytes that it wrote and prints its output to STDOUT. The utility also logs the output to a JSON formatted file.

For data in HDFS, the utility tests reads but not writes.

Syntax

vioperf [--help] [--duration=<INTERVAL>] [--log-interval=<INTERVAL>]
  [--log-file=<FILE>] [--condense-log] [--thread-count=<N>] [--max-buffer-size=<SIZE>]
  [--preserve-files] [--disable-crc] [--disable-direct-io] [--debug]
  [<DIR>*]

  • The minimum required I/O is 20 MB/s read/write per physical processor core on each node, in full duplex (reading and writing simultaneously), concurrently on all nodes of the cluster.

  • The recommended I/O is 40 MB/s per physical core on each node.

  • The minimum required I/O rate for a node with 2 hyper-threaded six-core CPUs (12 physical cores) is 240 MB/s. Vertica recommends 480 MB/s.
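
The arithmetic behind these figures can be sketched as a short shell function. The name io_targets is hypothetical; it only multiplies the physical core count by the per-core rates stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derives per-node I/O targets from the per-core
# figures stated above (20 MB/s minimum, 40 MB/s recommended).
io_targets() {
  local physical_cores=$1
  echo "minimum=$(( physical_cores * 20 )) MB/s recommended=$(( physical_cores * 40 )) MB/s"
}

# Two six-core CPUs: hyper-threading does not add physical cores.
io_targets 12
```

For 12 physical cores this prints a minimum of 240 MB/s and a recommendation of 480 MB/s, matching the figures above.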

Disk space vioperf needs

vioperf requires about 4.5 GB to run.

Options

Option Description
--help Prints a help message and exits.
--duration

The length of time vioperf runs performance tests. The default is 5 minutes. Specify the interval in seconds, minutes, or hours with any of these suffixes:

  • Seconds: s, sec, secs, second, seconds. Example: --duration=60sec

  • Minutes: m, min, mins, minute, minutes. Example: --duration=10min

  • Hours: h, hr, hrs, hour, hours. Example: --duration=1hrs

--log-interval The interval at which the log file reports summary information. The default interval is 10 seconds. This option uses the same interval notation as --duration.
--log-file The path and name where log file contents are written, in JSON. If not specified, then vioperf creates a file named results<date-time>.JSON in the current directory.
--condense-log Directs vioperf to write the log file contents in condensed format, one JSON entry per line, rather than as indented JSON syntax.
--thread-count=<N> The number of execution threads to use. By default, vioperf uses all threads available on the host machine.
--max-buffer-size=<SIZE>

The maximum size of the in-memory buffer to use for reads or writes. Specify the units with any of these suffixes:

  • Bytes: b, byte, bytes.

  • Kilobytes: k, kb, kilobyte, kilobytes.

  • Megabytes: m, mb, megabyte, megabytes.

  • Gigabytes: g, gb, gigabyte, gigabytes.

--preserve-files Directs vioperf to keep the files it writes. This parameter is ignored for HDFS tests, which are read-only. Inspecting the files can help diagnose write-related failures.
--disable-crc Directs vioperf to ignore CRC checksums when validating writes. Verifying checksums can add overhead, particularly when running vioperf on slower processors. This parameter is ignored for HDFS tests.
--disable-direct-io

When reading from or writing to a local file system, vioperf goes directly to disk by default, bypassing the operating system's page cache. Using direct I/O allows vioperf to measure performance quickly without having to fill the cache.

Disabling this behavior can produce more realistic performance results but slows down the operation of vioperf.

--debug Directs vioperf to report verbose error messages.
<DIR>

Zero or more directories to test. If you do not specify a directory, vioperf tests the current directory. To test the performance of each disk, specify different directories mounted on different disks.

To test reads from a directory on HDFS:

  • Use a URL in the hdfs scheme that points to a single directory (not a path) containing files at least 10MB in size. For best results, use 10GB files and verify that there is at least one file per vioperf thread.

  • If you do not specify a host and port, set the HADOOP_CONF_DIR environment variable to a path including the Hadoop configuration files. This value is the same value that you use for the HadoopConfDir configuration parameter in Vertica. For more information see Configuring HDFS access.

  • If the HDFS cluster uses Kerberos, set the HADOOP_USER_NAME environment variable to a Kerberos principal.
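
As an illustration of the interval notation accepted by --duration and --log-interval, the following sketch converts an interval string to seconds. interval_to_seconds is a hypothetical helper, not part of vioperf. It rejects input that has no suffix, which is an assumption about how strict the notation is.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vioperf: converts an interval written with
# one of the suffixes listed for --duration into a number of seconds.
interval_to_seconds() {
  local value=${1%%[a-z]*}   # leading digits, e.g. "10" from "10min"
  local unit=${1##*[0-9]}    # trailing suffix, e.g. "min" from "10min"
  if ! [[ $value =~ ^[0-9]+$ ]]; then
    echo "unrecognized interval: $1" >&2
    return 1
  fi
  case "$unit" in
    s|sec|secs|second|seconds) echo $(( 10#$value )) ;;
    m|min|mins|minute|minutes) echo $(( 10#$value * 60 )) ;;
    h|hr|hrs|hour|hours)       echo $(( 10#$value * 3600 )) ;;
    *) echo "unrecognized interval: $1" >&2; return 1 ;;
  esac
}

# The three examples from the --duration description.
interval_to_seconds 60sec
interval_to_seconds 10min
interval_to_seconds 1hrs
```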

Returns

The utility returns the following information:

Heading Description
test The test being run (Write, ReWrite, Read, or Skip Read)
directory The directory in which the test is being run.
counter name The counter type of the test being run. Can be either MB/s or Seeks per second.
counter value The value of the counter in MB/s or Seeks per second across all threads. This measurement represents the bandwidth at the exact time of measurement. Contrast with counter value (10 sec avg).
counter value (10 sec avg) The average amount of data in MB/s, or the average number of Seeks per second, for the test being run over the interval specified with --log-interval. The default interval is 10 seconds. This value is the average bandwidth since the last log message, across all threads.
counter value/core The counter value divided by the number of cores.
counter value/core (10 sec avg) The counter value (10 sec avg) divided by the number of cores.
thread count The number of threads used to run the test.
%CPU The available CPU percentage used during this test.
%IO Wait The CPU percentage in I/O Wait state during this test. I/O wait state is the time working processes are blocked while waiting for I/O operations to complete.
elapsed time The amount of time taken for a particular test. If you run the test multiple times, elapsed time increases the next time the test is run.
remaining time The time remaining until the next test. Based on the --duration option, each of the tests is run at least once. If the test set is run multiple times, then remaining time is how much longer the test will run. The remaining time value is cumulative. Its total is added to elapsed time each time the same test is run again.

Example

Invoking vioperf from a terminal outputs the following message and sample results:

[dbadmin@v_vmart_node0001 ~]$ /opt/vertica/bin/vioperf --duration=60s
The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex
i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster.
The recommended I/O is 40 MB/s per physical core on each node.
For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.

Using direct io (buffer size=1048576, alignment=512) for directory "/home/dbadmin"

test     | directory     | counter name         | counter value | counter value (10 sec avg) | counter value/core  | counter value/core (10 sec avg) | thread count  | %CPU  | %IO Wait  | elapsed time (s)| remaining time (s)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Write    | /home/dbadmin | MB/s                 | 420           | 420                             | 210                | 210                        | 2             | 89    | 10        | 10              | 5
Write    | /home/dbadmin | MB/s                 | 412           | 396                             | 206                 | 198                        | 2             | 89    | 9         | 15              | 0
ReWrite  | /home/dbadmin | (MB-read+MB-write)/s | 150+150       | 150+150                         | 75+75               | 75+75                      | 2             | 58    | 40        | 10              | 5
ReWrite  | /home/dbadmin | (MB-read+MB-write)/s | 158+158       | 172+172                         | 79+79              | 86+86                      | 2             | 64    | 33        | 15              | 0
Read     | /home/dbadmin | MB/s                 | 194           | 194                             | 97                 | 97                         | 2             | 69    | 26        | 10              | 5
Read     | /home/dbadmin | MB/s                 | 192           | 190                             | 96                 | 95                         | 2             | 71    | 27        | 15              | 0
SkipRead | /home/dbadmin | seeks/s              | 659           | 659                             | 329.5              | 329.5                      | 2             | 2     | 85        | 10              | 5
SkipRead | /home/dbadmin | seeks/s              | 677           | 714                             | 338.5              | 357                        | 2             | 2     | 59        | 15              | 0
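
One way to read this table against the stated I/O requirements is to compare the counter value/core column of the Write and Read rows with the 20 MB/s minimum and the 40 MB/s recommendation. The sketch below does that with awk. check_vioperf is a hypothetical helper, and it assumes the column order shown in the sample output. It skips ReWrite and SkipRead rows because they report combined or seek-based counters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a vioperf table on stdin and classifies the
# "counter value/core" column (6th field) of each Write and Read row.
check_vioperf() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      test = trim($1)
      if (test != "Write" && test != "Read") next   # skips header, rule, other tests
      per_core = trim($6) + 0
      if (per_core >= 40)      status = "meets recommendation"
      else if (per_core >= 20) status = "meets minimum only"
      else                     status = "below minimum"
      printf "%s %g MB/s per core: %s\n", test, per_core, status
    }'
}

# Two rows from the sample output above, with whitespace condensed.
check_vioperf <<'EOF'
Write | /home/dbadmin | MB/s | 420 | 420 | 210 | 210 | 2 | 89 | 10 | 10 | 5
Read  | /home/dbadmin | MB/s | 194 | 194 | 97  | 97  | 2 | 69 | 26 | 10 | 5
EOF
```

Both sample rows are well above the recommended per-core rate. Note that vioperf divides by the core count it detects, so on hyper-threaded hosts the figure may not be per physical core.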

1.3 - Vnetperf

The vnetperf utility measures network performance of database hosts, as well as network latency and throughput for TCP and UDP protocols.

This utility helps identify the following issues:

  • Low throughput for all hosts or one

  • High latency for all hosts or one

  • Bottlenecks between one or more hosts or subnets

  • Too-low limit on the number of TCP connections that can be established simultaneously

  • High rates of network packet loss

Syntax

vnetperf [options] [tests]

Options

Option Description
--condense Condenses the log into one JSON entry per line, instead of indented JSON syntax.
--collect-logs Collects test log files from each host.
--datarate rate

Limits throughput to this rate in MB/s. A rate of 0 loops the tests through several different rates.

Default: 0

--duration seconds

Time limit for each test to run in seconds.

Default: 1

--hosts host-name[,...] Comma-separated list of host names or IP addresses on which to run the tests. The list must not contain embedded spaces.
--hosts file File that specifies the hosts on which to run the tests. If you omit this option, then vnetperf tries to access admintools to identify cluster hosts.
--identity-file file If using passwordless SSH/SCP access between hosts, then specify the key file used to gain access to the hosts.
--ignore-bad-hosts If set, runs tests on reachable hosts even if some hosts are not reachable. If you omit this option and a host is unreachable, then no tests are run on any hosts.
--log-dir directory

If --collect-logs is set, specifies the directory in which to place the collected logs.

Default: logs.netperf.<timestamp>

--log-level level

Log level to use, one of the following:

  • INFO

  • ERROR

  • DEBUG

  • WARN

Default: WARN

--list-tests Lists the tests that vnetperf can run.
--output-file file

The file to which JSON results are written.

Default: results.<timestamp>.json

--ports port#[,...]

Comma-delimited list of port numbers to use. If only one port number is specified, then the next two numbers in sequence are also used.

Default: 14159,14160,14161

--scp-options 'scp-args' Specifies one or more standard SCP command line arguments. SCP is used to copy test binaries over to the target hosts.
--ssh-options 'ssh-args' Specifies one or more standard SSH command line arguments. SSH is used to issue test commands on the target hosts.
--tmp-dir directory

Specifies the temporary directory for vnetperf, where directory must have execute permission on all hosts, and does not include the unsupported characters ", `, or '.

Default: /tmp (execute permission required)

--vertica-install directory Indicates that Vertica is installed on each of the hosts, so vnetperf uses test binaries on the target system rather than copying them over with SCP.

Tests

vnetperf can specify one or more of the following tests. If no test is specified, vnetperf runs all tests. Test results are printed for each host.

Test Description Results
latency Measures latency from the host that is running the script to other hosts. Hosts with unusually high latency should be investigated further.
  • Round trip time latency for each host in milliseconds.

  • Clock skew—the difference in time shown by the clock on the target host relative to the host running the utility.

tcp-throughput Tests TCP throughput among hosts.
  • Date/time and test name

  • Rate limit in MB/s

  • Tested node

  • Sent and received data in MB/s and bytes

  • Duration of the test in seconds

udp-throughput Tests UDP throughput among hosts

  • Maximum recommended RTT (round-trip time) latency is 1000 microseconds. Ideal RTT latency is 200 microseconds or less. Vertica recommends that clock skew be less than 1 second.

  • Minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more.

Example

$ vnetperf latency tcp-throughput

The maximum recommended rtt latency is 2 milliseconds. The ideal rtt latency is 200 microseconds or less. It is recommended that clock skew be kept to under 1 second.
test              | date                    | node             | index | rtt latency (us)  | clock skew (us)
-------------------------------------------------------------------------------------------------------------------------
latency           | 2022-03-29_10:23:55,739 | 10.20.100.247    | 0     | 49                | 3
latency           | 2022-03-29_10:23:55,739 | 10.20.100.248    | 1     | 272               | -702
latency           | 2022-03-29_10:23:55,739 | 10.20.100.249    | 2     | 245               | 1037

The minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more. Note: UDP numbers may be lower, multiple network switches may reduce performance results.
date                    | test              | rate limit (MB/s) | node             | MB/s (sent) | MB/s (rec)  | bytes (sent)        | bytes (rec)         | duration (s)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | 10.20.100.247    | 30.579      | 30.579      | 32112640            | 32112640            | 1.00151
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | 10.20.100.248    | 30.5791     | 30.5791     | 32112640            | 32112640            | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | 10.20.100.249    | 30.5791     | 30.5791     | 32112640            | 32112640            | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | average          | 30.579      | 30.579      | 32112640            | 32112640            | 1.0015
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | 10.20.100.247    | 61.0952     | 61.0952     | 64094208            | 64094208            | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | 10.20.100.248    | 61.096      | 61.096      | 64094208            | 64094208            | 1.00048
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | 10.20.100.249    | 61.0952     | 61.0952     | 64094208            | 64094208            | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | average          | 61.0955     | 61.0955     | 64094208            | 64094208            | 1.00048
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | 10.20.100.247    | 122.131     | 122.131     | 128122880           | 128122880           | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | 10.20.100.248    | 122.132     | 122.132     | 128122880           | 128122880           | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | 10.20.100.249    | 122.132     | 122.132     | 128122880           | 128122880           | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | average          | 122.132     | 122.132     | 128122880           | 128122880           | 1.00046
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | 10.20.100.247    | 243.819     | 244.132     | 255754240           | 256081920           | 1.00036
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | 10.20.100.248    | 244.125     | 243.282     | 256049152           | 255164416           | 1.00025
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | 10.20.100.249    | 244.172     | 243.391     | 256114688           | 255295488           | 1.00032
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | average          | 244.039     | 243.601     | 255972693           | 255513941           | 1.00031
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | 10.20.100.247    | 337.232     | 485.247     | 355893248           | 512098304           | 1.00645
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | 10.20.100.248    | 446.16      | 231.001     | 467894272           | 242253824           | 1.00013
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | 10.20.100.249    | 349.667     | 409.961     | 368476160           | 432013312           | 1.00497
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | average          | 377.686     | 375.403     | 397421226           | 395455146           | 1.00385
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | 10.20.100.247    | 328.279     | 509.256     | 383975424           | 595656704           | 1.11548
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | 10.20.100.248    | 505.626     | 217.217     | 532250624           | 228655104           | 1.00389
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | 10.20.100.249    | 390.355     | 474.89      | 410812416           | 499777536           | 1.00365
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | average          | 408.087     | 400.454     | 442346154           | 441363114           | 1.04101
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | 10.20.100.247    | 300.5       | 426.762     | 318734336           | 452657152           | 1.01154
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | 10.20.100.248    | 268.252     | 402.891     | 283017216           | 425066496           | 1.00616
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | 10.20.100.249    | 510.569     | 243.649     | 535592960           | 255590400           | 1.00042
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | average          | 359.774     | 357.767     | 379114837           | 377771349           | 1.00604
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | 10.20.100.247    | 304.545     | 444.261     | 334987264           | 488669184           | 1.049
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | 10.20.100.248    | 422.246     | 192.773     | 474284032           | 216530944           | 1.07121
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | 10.20.100.249    | 353.206     | 446.809     | 378732544           | 479100928           | 1.0226
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | average          | 359.999     | 361.281     | 396001280           | 394767018           | 1.0476
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | 10.20.100.247    | 343.324     | 414.559     | 387710976           | 468156416           | 1.07697
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | 10.20.100.248    | 292.44      | 246.254     | 308314112           | 259620864           | 1.00544
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | 10.20.100.249    | 437.559     | 405.02      | 459145216           | 425000960           | 1.00072
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | average          | 357.774     | 355.278     | 385056768           | 384259413           | 1.02771

JSON results available at: ./results.2022-03-29_10:23:51,548.json
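
The latency rows in this output can be checked against the recommendations with a short awk filter. The sketch below is a hypothetical helper (check_latency is not part of vnetperf) and assumes the column order shown above. It defaults to the documented 1000-microsecond maximum; pass 2000 as the first argument to apply the 2-millisecond figure that the sample output prints instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads vnetperf output on stdin and classifies each
# latency row by round-trip time (5th field) and clock skew (6th field),
# both in microseconds.
check_latency() {
  local max_rtt_us=${1:-1000}
  awk -F'|' -v max_rtt="$max_rtt_us" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($1) == "latency" {
      node = trim($3); rtt = trim($5) + 0; skew = trim($6) + 0
      if (skew < 0) skew = -skew
      if (rtt > max_rtt)   verdict = "above recommended maximum"
      else if (rtt > 200)  verdict = "acceptable"
      else                 verdict = "ideal"
      if (skew >= 1000000) verdict = verdict ", clock skew too high"
      printf "%s rtt=%dus: %s\n", node, rtt, verdict
    }'
}

# The three latency rows from the sample output, with whitespace condensed.
check_latency <<'EOF'
latency | 2022-03-29_10:23:55,739 | 10.20.100.247 | 0 | 49  | 3
latency | 2022-03-29_10:23:55,739 | 10.20.100.248 | 1 | 272 | -702
latency | 2022-03-29_10:23:55,739 | 10.20.100.249 | 2 | 245 | 1037
EOF
```

In the sample, the first host is within the ideal range and the other two are within the recommended maximum. All three clock skews are far below one second.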

2 - Enable secure shell (SSH) logins

The administrative account must be able to use Secure Shell (SSH) to log in (ssh) to all hosts without specifying a password. The shell script install_vertica does this automatically. This section describes how to do it manually if necessary.

  1. If you do not already have SSH installed on all hosts, log in as root on each host and install it now. You can download a free version of the SSH connectivity tools from OpenSSH.

  2. Log in to the Vertica administrator account (dbadmin in this example).

  3. Make your home directory (~) writable only by yourself. Choose one of:

    $ chmod 700 ~
    

    or

    $ chmod 755 ~
    

    where:

    700 includes:

      • 400 read by owner

      • 200 write by owner

      • 100 execute by owner

    755 includes:

      • 400 read by owner

      • 200 write by owner

      • 100 execute by owner

      • 040 read by group

      • 010 execute by group

      • 004 read by anybody (other)

      • 001 execute by anybody

  4. Change to your home directory:

    $ cd ~

  5. Generate a private key/public key pair:

    $ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/dbadmin/.ssh/id_rsa):
    Created directory '/home/dbadmin/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/dbadmin/.ssh/id_rsa.
    Your public key has been saved in /home/dbadmin/.ssh/id_rsa.pub.

  6. Make your .ssh directory readable and writable only by yourself:

    $ chmod 700 ~/.ssh

  7. Change to the .ssh directory:

    $ cd ~/.ssh

  8. Copy the file id_rsa.pub onto the file authorized_keys2:

    $ cp id_rsa.pub authorized_keys2

  9. Make the files in your .ssh directory readable and writable only by yourself:

    $ chmod 600 ~/.ssh/*

  10. For each cluster host, copy the .ssh directory to that host:

    $ scp -r ~/.ssh <host>:.

  11. Connect to each cluster host. The first time you ssh to a new remote machine, you could get a message similar to the following:

    $ ssh dev0
    Warning: Permanently added 'dev0,192.168.1.92' (RSA) to the list of known hosts.

This message appears only the first time you ssh to a particular remote host.
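
The two chmod modes offered for the home directory are sums of the individual permission bits listed in that step. A minimal sketch that rebuilds each mode from its bits follows; mode_from_bits is a hypothetical helper name, and each bit is passed with a leading zero so that bash reads it as octal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combines octal permission bits into a single mode
# and prints the mode in octal.
mode_from_bits() {
  local mode=0 bit
  for bit in "$@"; do
    mode=$(( mode | bit ))   # a leading 0 makes bash treat the value as octal
  done
  printf '%o\n' "$mode"
}

# Owner read, write, and execute.
mode_from_bits 0400 0200 0100
# Owner bits plus read and execute for group and for everyone else.
mode_from_bits 0400 0200 0100 0040 0010 0004 0001
```

The first call prints 700 and the second prints 755. Neither mode contains a write bit for group or other, which is why either choice keeps the home directory writable only by its owner.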
