Validation scripts

1: Vcpuperf

2: Vioperf

3: Vnetperf

Vertica provides several validation utilities that you can run before deploying Vertica to determine whether your hosts and network can handle the processing and network traffic that Vertica requires. You can also use these utilities to troubleshoot performance issues on an existing deployment.

After you install the Vertica RPM, you have access to the following scripts in /opt/vertica/bin:

Vcpuperf - a CPU performance test used to verify your CPU performance.
Vioperf - an Input/Output test used to verify the speed and consistency of your hard drives.
Vnetperf - a Network test used to test the latency and throughput of your network between hosts.

You can run these utilities at any time, but they are particularly useful before you run the install_vertica script.

1 - Vcpuperf

The vcpuperf utility measures your server's CPU processing speed and compares it against benchmarks for common server CPUs. The utility performs a CPU test and measures the time it takes to complete the test. The lower the number scored on the test, the better the performance of the CPU.

The vcpuperf utility also checks the high and low load times to determine if CPU throttling is enabled. If a server's low-load computation time is significantly longer than the high-load computation time, CPU throttling may be enabled. CPU throttling is a power-saving feature. However, CPU throttling can reduce the performance of your server. Vertica recommends disabling CPU throttling to enhance server performance.

Syntax

vcpuperf [-q]

Options

-q: Run in quiet mode. Quiet mode displays only the CPU Time, Real Time, and high and low load times.

Returns

CPU Time: The amount of time it took the CPU to run the test.
Real Time: The total time for the test to execute.
High load time: The amount of time to run the load test while simulating a high CPU load.
Low load time: The amount of time to run the load test while simulating a low CPU load.

Example

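The following example shows a server whose CPU time (8.54s) is slightly slower than the expected time for a Xeon 5670 (~8.0s). Its low load time (208 microseconds) is much longer than its high load time (67 microseconds), which indicates that CPU throttling is enabled.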

[root@node1 bin]# /opt/vertica/bin/vcpuperf
Compiled with: 4.1.2 20080704 (Red Hat 4.1.2-52)
Expected time on Core 2, 2.53GHz: ~9.5s
Expected time on Nehalem, 2.67GHz: ~9.0s
Expected time on Xeon 5670, 2.93GHz: ~8.0s

This machine's time:
  CPU Time: 8.540000s
  Real Time:8.710000s

Some machines automatically throttle the CPU to save power.
  This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
  Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
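
The throttling check described above can be automated when you collect vcpuperf output from many hosts. The following is a minimal sketch, not part of Vertica: the function names and the two thresholds (`LOW_LOAD_LIMIT_US`, `RATIO_LIMIT`) are assumptions chosen to mirror the guidance that vcpuperf prints, and you may want different cutoffs.

```python
import re

# Hypothetical thresholds, loosely based on the guidance printed by vcpuperf
# ("much larger than 100-200us or much larger than the ... high load time").
LOW_LOAD_LIMIT_US = 200
RATIO_LIMIT = 2.0


def parse_load_times(output: str) -> tuple[int, int]:
    """Return (high_load_us, low_load_us) parsed from vcpuperf output text."""
    high = re.search(r"high load time:\s*(\d+)\s*microseconds", output)
    low = re.search(r"low load time:\s*(\d+)\s*microseconds", output)
    if high is None or low is None:
        raise ValueError("load times not found in output")
    return int(high.group(1)), int(low.group(1))


def throttling_suspected(high_us: int, low_us: int) -> bool:
    """Flag possible low-load throttling using the assumed thresholds."""
    return low_us > LOW_LOAD_LIMIT_US or low_us > RATIO_LIMIT * high_us


# The last two lines of the example output above.
sample = """\
This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
"""
high_us, low_us = parse_load_times(sample)
print(high_us, low_us, throttling_suspected(high_us, low_us))
```

With the example values, the low load time exceeds both assumed limits, so the sketch flags the host for a closer look at its power-saving settings.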

2 - Vioperf

The vioperf utility quickly tests the performance of your host's input and output subsystem. The utility performs the following tests:

sequential write
sequential rewrite
sequential read
skip read (read non-contiguous data blocks)

The utility verifies that the host reads the same bytes that it wrote and prints its output to STDOUT. The utility also logs the output to a JSON formatted file.

For data in HDFS, the utility tests reads but not writes.

Syntax

vioperf [--help] [--duration=<INTERVAL>] [--log-interval=<INTERVAL>]
  [--log-file=<FILE>] [--condense-log] [--thread-count=<N>] [--max-buffer-size=<SIZE>]
  [--preserve-files] [--disable-crc] [--disable-direct-io] [--debug]
  [<DIR>*]

Minimum and recommended I/O performance

The minimum required I/O is 20 MB/s read/write per physical processor core on each node, in full duplex (reading and writing) simultaneously, concurrently on all nodes of the cluster.

Note
Vertica supports some AWS instance types that do not meet these minimum I/O requirements. All supported AWS instance types can be used as Vertica cluster hosts, regardless of their vioperf results. See Supported AWS instance types for the full list.

The recommended I/O is 40 MB/s per physical core on each node.

For example, the I/O rate for a node with 2 hyper-threaded six-core CPUs (12 physical cores) is 240 MB/s required minimum, 480 MB/s recommended.
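
The per-node targets are the per-core figures multiplied by the number of physical cores. The following sketch works through that arithmetic; the function name `io_targets` is illustrative only.

```python
MIN_MBPS_PER_CORE = 20          # minimum required rate stated above
RECOMMENDED_MBPS_PER_CORE = 40  # recommended rate stated above


def io_targets(physical_cores: int) -> tuple[int, int]:
    """Return (minimum, recommended) node I/O in MB/s for a core count."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return (
        MIN_MBPS_PER_CORE * physical_cores,
        RECOMMENDED_MBPS_PER_CORE * physical_cores,
    )


# Two six-core CPUs: count physical cores, not hyper-threads.
print(io_targets(12))
```

Hyper-threading does not change the result, because the requirement is stated per physical core.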

Disk space vioperf needs

vioperf requires about 4.5 GB to run.

Options

--help

Prints a help message and exits.

--duration

The length of time vioperf runs performance tests. The default is 5 minutes. Specify the interval in seconds, minutes, or hours with any of these suffixes:

Seconds: s, sec, secs, second, seconds. Example: --duration=60sec
Minutes: m, min, mins, minute, minutes. Example: --duration=10min
Hours: h, hr, hrs, hour, hours. Example: --duration=1hrs
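
If you generate vioperf command lines from a script, it can help to validate interval strings before passing them to --duration or --log-interval. The following sketch converts the notation above to seconds. It is not vioperf's own parser: the helper name, the restriction to whole numbers, and the case-insensitive matching are all assumptions.

```python
import re

# Map every documented suffix to its length in seconds.
_UNIT_SECONDS = {}
for _names, _factor in (
    (("s", "sec", "secs", "second", "seconds"), 1),
    (("m", "min", "mins", "minute", "minutes"), 60),
    (("h", "hr", "hrs", "hour", "hours"), 3600),
):
    for _name in _names:
        _UNIT_SECONDS[_name] = _factor


def interval_to_seconds(text: str) -> int:
    """Convert a string such as '10min' to a number of seconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", text.strip().lower())
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


print(interval_to_seconds("60sec"),
      interval_to_seconds("10min"),
      interval_to_seconds("1hrs"))
```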

--log-interval

The interval at which the log file reports summary information. The default interval is 10 seconds. This option uses the same interval notation as --duration.

--log-file

The path and name where log file contents are written, in JSON. If not specified, then vioperf creates a file named results<date-time>.JSON in the current directory.

--condense-log

Directs vioperf to write the log file contents in condensed format, one JSON entry per line, rather than as indented JSON syntax.

--thread-count=<N>

The number of execution threads to use. By default, vioperf uses all threads available on the host machine.

--max-buffer-size=<SIZE>

The maximum size of the in-memory buffer to use for reads or writes. Specify the units with any of these suffixes:

Bytes: b, byte, bytes.
Kilobytes: k, kb, kilobyte, kilobytes.
Megabytes: m, mb, megabyte, megabytes.
Gigabytes: g, gb, gigabyte, gigabytes.
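
The size notation can be checked in a script in a similar way. The following sketch converts a size string to bytes. It is an illustration rather than vioperf's parser, and it assumes binary multiples (1 kilobyte = 1024 bytes), which this page does not state explicitly.

```python
import re

# Map every documented suffix to a byte count.
# Assumption: binary multiples (1 kilobyte = 1024 bytes).
_UNIT_BYTES = {}
for _names, _factor in (
    (("b", "byte", "bytes"), 1),
    (("k", "kb", "kilobyte", "kilobytes"), 1024),
    (("m", "mb", "megabyte", "megabytes"), 1024 ** 2),
    (("g", "gb", "gigabyte", "gigabytes"), 1024 ** 3),
):
    for _name in _names:
        _UNIT_BYTES[_name] = _factor


def size_to_bytes(text: str) -> int:
    """Convert a string such as '1mb' to a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", text.strip().lower())
    if match is None or match.group(2) not in _UNIT_BYTES:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNIT_BYTES[match.group(2)]


print(size_to_bytes("1mb"), size_to_bytes("512k"))
```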

--preserve-files

Directs vioperf to keep the files it writes. This parameter is ignored for HDFS tests, which are read-only. Inspecting the files can help diagnose write-related failures.

--disable-crc

Directs vioperf to ignore CRC checksums when validating writes. Verifying checksums can add overhead, particularly when running vioperf on slower processors. This parameter is ignored for HDFS tests.

--disable-direct-io

When reading from or writing to a local file system, vioperf goes directly to disk by default, bypassing the operating system's page cache. Using direct I/O allows vioperf to measure performance quickly without having to fill the cache.

Disabling this behavior can produce more realistic performance results but slows down the operation of vioperf.

--debug

Directs vioperf to report verbose error messages.

<DIR>

Zero or more directories to test. If you do not specify a directory, vioperf tests the current directory. To test the performance of each disk, specify different directories mounted on different disks.

To test reads from a directory on HDFS:

Use a URL in the hdfs scheme that points to a single directory (not a path) containing files at least 10MB in size. For best results, use 10GB files and verify that there is at least one file per vioperf thread.
If you do not specify a host and port, set the HADOOP_CONF_DIR environment variable to a path including the Hadoop configuration files. This value is the same value that you use for the HadoopConfDir configuration parameter in Vertica. For more information see Configuring HDFS access.
If the HDFS cluster uses Kerberos, set the HADOOP_USER_NAME environment variable to a Kerberos principal.
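
The HDFS steps above amount to setting environment variables and passing an hdfs URL as the directory argument. The following sketch assembles the command and environment without running anything. The helper name, the URL `hdfs:///tmp/vioperf_test`, and the configuration path `/etc/hadoop/conf` are hypothetical placeholders; substitute values from your own cluster.

```python
import os


def hdfs_read_test_command(hdfs_url, hadoop_conf_dir=None, kerberos_principal=None):
    """Return (argv, env) for a vioperf read test against an HDFS directory."""
    if not hdfs_url.startswith("hdfs://"):
        raise ValueError("expected a URL in the hdfs scheme")
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if hadoop_conf_dir:
        env["HADOOP_CONF_DIR"] = hadoop_conf_dir
    if kerberos_principal:
        env["HADOOP_USER_NAME"] = kerberos_principal
    argv = ["/opt/vertica/bin/vioperf", hdfs_url]
    return argv, env


# Hypothetical values for illustration only.
argv, env = hdfs_read_test_command(
    "hdfs:///tmp/vioperf_test", hadoop_conf_dir="/etc/hadoop/conf"
)
print(argv, env["HADOOP_CONF_DIR"])
```

To run the test, you could pass the result to `subprocess.run(argv, env=env)` on a host where vioperf is installed.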

Returns

The utility returns the following information:

test: The test being run (Write, ReWrite, Read, or Skip Read)
directory: The directory in which the test is being run.
counter name: The counter type of the test being run. Can be either MB/s or Seeks per second.
counter value: The value of the counter in MB/s or Seeks per second across all threads. This measurement represents the bandwidth at the exact time of measurement. Contrast with counter value (avg).
counter value (10 sec avg): The average amount of data in MB/s, or the average number of Seeks per second, for the test being run in the duration specified with --log-interval. The default interval is 10 seconds. The counter value (avg) is the average bandwidth since the last log message, across all threads.
counter value/core: The counter value divided by the number of cores.
counter value/core (10 sec avg): The counter value (10 sec avg) divided by the number of cores.
thread count: The number of threads used to run the test.
%CPU: The available CPU percentage used during this test.
%IO Wait: The CPU percentage in I/O Wait state during this test. I/O wait state is the time working processes are blocked while waiting for I/O operations to complete.
elapsed time: The amount of time taken for a particular test. If you run the test multiple times, elapsed time increases the next time the test is run.
remaining time: The time remaining until the next test. Based on the --duration option, each of the tests is run at least once. If the test set is run multiple times, then remaining time is how much longer the test will run. The remaining time value is cumulative. Its total is added to elapsed time each time the same test is run again.

Example

Invoking vioperf from a terminal outputs the following message and sample results:

[dbadmin@v_vmart_node0001 ~]$ /opt/vertica/bin/vioperf --duration=60s
The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex
i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster.
The recommended I/O is 40 MB/s per physical core on each node.
For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.

Using direct io (buffer size=1048576, alignment=512) for directory "/home/dbadmin"

test     | directory     | counter name         | counter value | counter value (10 sec avg) | counter value/core  | counter value/core (10 sec avg) | thread count  | %CPU  | %IO Wait  | elapsed time (s)| remaining time (s)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Write    | /home/dbadmin | MB/s                 | 420           | 420                             | 210                | 210                        | 2             | 89    | 10        | 10              | 5
Write    | /home/dbadmin | MB/s                 | 412           | 396                             | 206                 | 198                        | 2             | 89    | 9         | 15              | 0
ReWrite  | /home/dbadmin | (MB-read+MB-write)/s | 150+150       | 150+150                         | 75+75               | 75+75                      | 2             | 58    | 40        | 10              | 5
ReWrite  | /home/dbadmin | (MB-read+MB-write)/s | 158+158       | 172+172                         | 79+79              | 86+86                      | 2             | 64    | 33        | 15              | 0
Read     | /home/dbadmin | MB/s                 | 194           | 194                             | 97                 | 97                         | 2             | 69    | 26        | 10              | 5
Read     | /home/dbadmin | MB/s                 | 192           | 190                             | 96                 | 95                         | 2             | 71    | 27        | 15              | 0
SkipRead | /home/dbadmin | seeks/s              | 659           | 659                             | 329.5              | 329.5                      | 2             | 2     | 85        | 10              | 5
SkipRead | /home/dbadmin | seeks/s              | 677           | 714                             | 338.5              | 357                        | 2             | 2     | 59        | 15              | 0

Note

When evaluating performance for minimum and recommended I/O, include the Write and Read values in your evaluation. ReWrite and SkipRead values are not relevant to determining minimum and recommended I/O.
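
To apply this guidance to captured terminal output, you can pick out the Write and Read rows and compare the counter value/core column against the 20 MB/s and 40 MB/s figures. The following sketch does this for pipe-delimited rows like the ones in the example. The function names are illustrative, and the comparison assumes that the per-core column reflects physical cores.

```python
MIN_MBPS_PER_CORE = 20
RECOMMENDED_MBPS_PER_CORE = 40


def per_core_rates(table_text: str) -> list[tuple[str, float]]:
    """Return (test, MB/s per core) for each Write and Read row."""
    rates = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Column 0 is the test name; column 5 is "counter value/core".
        if len(cells) >= 6 and cells[0] in ("Write", "Read"):
            rates.append((cells[0], float(cells[5])))
    return rates


def rating(mbps_per_core: float) -> str:
    """Classify a per-core rate against the documented targets."""
    if mbps_per_core >= RECOMMENDED_MBPS_PER_CORE:
        return "meets recommended"
    if mbps_per_core >= MIN_MBPS_PER_CORE:
        return "meets minimum"
    return "below minimum"


# Three rows taken from the example output; ReWrite is skipped on purpose.
sample = """\
Write    | /home/dbadmin | MB/s                 | 412     | 396     | 206   | 198   | 2 | 89 | 9  | 15 | 0
ReWrite  | /home/dbadmin | (MB-read+MB-write)/s | 158+158 | 172+172 | 79+79 | 86+86 | 2 | 64 | 33 | 15 | 0
Read     | /home/dbadmin | MB/s                 | 192     | 190     | 96    | 95    | 2 | 71 | 27 | 15 | 0
"""
for test, rate in per_core_rates(sample):
    print(test, rate, rating(rate))
```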

3 - Vnetperf

The vnetperf utility measures network performance of database hosts, as well as network latency and throughput for TCP and UDP protocols.

Caution

This utility incurs high network load, which degrades database performance. Do not use this utility on a Vertica production database.

This utility helps identify the following issues:

Low throughput for all hosts or for a single host
High latency for all hosts or for a single host
Bottlenecks between one or more hosts or subnets
Too-low limit on the number of TCP connections that can be established simultaneously
High rates of network packet loss

Syntax

vnetperf [options] [tests]

Options

--condense

Condenses the log into one JSON entry per line, instead of indented JSON syntax.

--collect-logs

Collects test log files from each host.

--datarate rate

Limits throughput to this rate in MB/s. A rate of 0 loops the tests through several different rates.

Default: 0

--duration seconds

Time limit for each test to run in seconds.

Default: 1

--hosts host-name[,...]

Comma-separated list of host names or IP addresses on which to run the tests. The list must not contain embedded spaces.

--hosts file

File that specifies the hosts on which to run the tests. If you omit this option, then vnetperf tries to access admintools to identify cluster hosts.

--identity-file file

If using passwordless SSH/SCP access between hosts, then specify the key file used to gain access to the hosts.

--ignore-bad-hosts

If set, runs tests on reachable hosts even if some hosts are not reachable. If you omit this option and a host is unreachable, then no tests are run on any hosts.

--log-dir directory

If --collect-logs is set, specifies the directory in which to place the collected logs.

Default: logs.netperf.<timestamp>

--log-level level

Log level to use, one of the following:

INFO
ERROR
DEBUG
WARN

Default: WARN

--list-tests

Lists the tests that vnetperf can run.

--output-file file

The file to which JSON results are written.

Default: results.<timestamp>.json

--ports port#[,...]

Comma-delimited list of port numbers to use. If only one port number is specified, then the next two numbers in sequence are also used.

Default: 14159,14160,14161
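
The single-port rule can be illustrated with a short sketch that lists the ports to open in a firewall before a test. The function name is hypothetical, and returning a multi-port list unchanged is an assumption, because this page only describes the single-port case.

```python
def expand_ports(spec: str) -> list[int]:
    """Return the port numbers implied by a --ports value."""
    ports = [int(part) for part in spec.split(",")]
    if len(ports) == 1:
        # One port given: the next two numbers in sequence are also used.
        first = ports[0]
        ports = [first, first + 1, first + 2]
    return ports


print(expand_ports("14159"))
```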

--scp-options 'scp-args'

Specifies one or more standard SCP command line arguments. SCP is used to copy test binaries over to the target hosts.

--ssh-options 'ssh-args'

Specifies one or more standard SSH command line arguments. SSH is used to issue test commands on the target hosts.

--tmp-dir directory

Specifies the temporary directory for vnetperf, where directory must have execute permission on all hosts, and does not include the unsupported characters ", `, or '.

Default: /tmp (execute permission required)

--vertica-install directory

Indicates that Vertica is installed on each of the hosts, so vnetperf uses test binaries on the target system rather than copying them over with SCP.

Tests

vnetperf can specify one or more of the following tests. If no test is specified, vnetperf runs all tests. Test results are printed for each host.

Test	Description	Results
`latency`	Measures latency from the host that is running the script to other hosts. Hosts with unusually high latency should be investigated further.	Round-trip time (RTT) latency for each host, in microseconds; clock skew, which is the difference in time shown by the clock on the target host relative to the host running the utility.
`tcp-throughput`	Tests TCP throughput among hosts.	Date/time and test name; rate limit in MB/s; tested node; sent and received data in MB/s and bytes; duration of the test in seconds.
`udp-throughput`	Tests UDP throughput among hosts.	Same fields as `tcp-throughput`.
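
When you script vnetperf runs, it can be convenient to assemble the argument list from the options and tests described above. The following sketch builds the list without executing it. The helper name and its parameters are illustrative; only the flags and test names come from this page.

```python
KNOWN_TESTS = {"latency", "tcp-throughput", "udp-throughput"}


def vnetperf_command(hosts, tests=(), duration_s=None, output_file=None):
    """Build an argv list for vnetperf: options first, then test names."""
    argv = ["/opt/vertica/bin/vnetperf"]
    if hosts:
        host_list = ",".join(hosts)
        # The host list must not contain embedded spaces.
        if any(ch.isspace() for ch in host_list):
            raise ValueError("host list must not contain spaces")
        argv += ["--hosts", host_list]
    if duration_s is not None:
        argv += ["--duration", str(duration_s)]
    if output_file:
        argv += ["--output-file", output_file]
    unknown = set(tests) - KNOWN_TESTS
    if unknown:
        raise ValueError(f"unknown tests: {sorted(unknown)}")
    argv += list(tests)
    return argv


print(vnetperf_command(
    ["10.20.100.247", "10.20.100.248"],
    tests=["latency", "tcp-throughput"],
    duration_s=1,
))
```

Because vnetperf generates heavy network load, run the resulting command only on hosts that are not serving a production database.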

Recommended network performance

Maximum recommended RTT (round-trip time) latency is 1000 microseconds. Ideal RTT latency is 200 microseconds or less. Vertica recommends that clock skew be less than 1 second.
Minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more.

Note
UDP throughput can be lower; multiple network switches can adversely affect performance.
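
The recommendations above can be expressed as simple threshold checks, which is useful when you review results from many hosts. The following sketch uses the figures stated in this section; the function names and the labels they return are illustrative.

```python
def classify_latency(rtt_us: float) -> str:
    """Classify round-trip latency given in microseconds."""
    if rtt_us <= 200:
        return "ideal"
    if rtt_us <= 1000:
        return "acceptable"
    return "too high"


def classify_throughput(mbps: float) -> str:
    """Classify throughput given in MB/s."""
    if mbps >= 800:
        return "ideal"
    if mbps >= 100:
        return "acceptable"
    return "too low"


def clock_skew_ok(skew_us: float) -> bool:
    """Return True if clock skew (microseconds) is under 1 second."""
    return abs(skew_us) < 1_000_000


print(classify_latency(49), classify_latency(272))
print(classify_throughput(408.087))
print(clock_skew_ok(-702))
```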

Example

$ vnetperf latency tcp-throughput

The maximum recommended rtt latency is 2 milliseconds. The ideal rtt latency is 200 microseconds or less. It is recommended that clock skew be kept to under 1 second.
test              | date                    | node             | index | rtt latency (us)  | clock skew (us)
-------------------------------------------------------------------------------------------------------------------------
latency           | 2022-03-29_10:23:55,739 | 10.20.100.247    | 0     | 49                | 3
latency           | 2022-03-29_10:23:55,739 | 10.20.100.248    | 1     | 272               | -702
latency           | 2022-03-29_10:23:55,739 | 10.20.100.249    | 2     | 245               | 1037

The minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more. Note: UDP numbers may be lower, multiple network switches may reduce performance results.
date                    | test              | rate limit (MB/s) | node             | MB/s (sent) | MB/s (rec)  | bytes (sent)        | bytes (rec)         | duration (s)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | 10.20.100.247    | 30.579      | 30.579      | 32112640            | 32112640            | 1.00151
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | 10.20.100.248    | 30.5791     | 30.5791     | 32112640            | 32112640            | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | 10.20.100.249    | 30.5791     | 30.5791     | 32112640            | 32112640            | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput    | 32                | average          | 30.579      | 30.579      | 32112640            | 32112640            | 1.0015
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | 10.20.100.247    | 61.0952     | 61.0952     | 64094208            | 64094208            | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | 10.20.100.248    | 61.096      | 61.096      | 64094208            | 64094208            | 1.00048
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | 10.20.100.249    | 61.0952     | 61.0952     | 64094208            | 64094208            | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput    | 64                | average          | 61.0955     | 61.0955     | 64094208            | 64094208            | 1.00048
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | 10.20.100.247    | 122.131     | 122.131     | 128122880           | 128122880           | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | 10.20.100.248    | 122.132     | 122.132     | 128122880           | 128122880           | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | 10.20.100.249    | 122.132     | 122.132     | 128122880           | 128122880           | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput    | 128               | average          | 122.132     | 122.132     | 128122880           | 128122880           | 1.00046
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | 10.20.100.247    | 243.819     | 244.132     | 255754240           | 256081920           | 1.00036
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | 10.20.100.248    | 244.125     | 243.282     | 256049152           | 255164416           | 1.00025
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | 10.20.100.249    | 244.172     | 243.391     | 256114688           | 255295488           | 1.00032
2022-03-29_10:24:01,757 | tcp-throughput    | 256               | average          | 244.039     | 243.601     | 255972693           | 255513941           | 1.00031
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | 10.20.100.247    | 337.232     | 485.247     | 355893248           | 512098304           | 1.00645
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | 10.20.100.248    | 446.16      | 231.001     | 467894272           | 242253824           | 1.00013
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | 10.20.100.249    | 349.667     | 409.961     | 368476160           | 432013312           | 1.00497
2022-03-29_10:24:03,761 | tcp-throughput    | 512               | average          | 377.686     | 375.403     | 397421226           | 395455146           | 1.00385
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | 10.20.100.247    | 328.279     | 509.256     | 383975424           | 595656704           | 1.11548
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | 10.20.100.248    | 505.626     | 217.217     | 532250624           | 228655104           | 1.00389
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | 10.20.100.249    | 390.355     | 474.89      | 410812416           | 499777536           | 1.00365
2022-03-29_10:24:05,772 | tcp-throughput    | 640               | average          | 408.087     | 400.454     | 442346154           | 441363114           | 1.04101
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | 10.20.100.247    | 300.5       | 426.762     | 318734336           | 452657152           | 1.01154
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | 10.20.100.248    | 268.252     | 402.891     | 283017216           | 425066496           | 1.00616
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | 10.20.100.249    | 510.569     | 243.649     | 535592960           | 255590400           | 1.00042
2022-03-29_10:24:07,892 | tcp-throughput    | 768               | average          | 359.774     | 357.767     | 379114837           | 377771349           | 1.00604
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | 10.20.100.247    | 304.545     | 444.261     | 334987264           | 488669184           | 1.049
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | 10.20.100.248    | 422.246     | 192.773     | 474284032           | 216530944           | 1.07121
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | 10.20.100.249    | 353.206     | 446.809     | 378732544           | 479100928           | 1.0226
2022-03-29_10:24:09,911 | tcp-throughput    | 1024              | average          | 359.999     | 361.281     | 396001280           | 394767018           | 1.0476
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | 10.20.100.247    | 343.324     | 414.559     | 387710976           | 468156416           | 1.07697
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | 10.20.100.248    | 292.44      | 246.254     | 308314112           | 259620864           | 1.00544
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | 10.20.100.249    | 437.559     | 405.02      | 459145216           | 425000960           | 1.00072
2022-03-29_10:24:11,988 | tcp-throughput    | 2048              | average          | 357.774     | 355.278     | 385056768           | 384259413           | 1.02771

JSON results available at: ./results.2022-03-29_10:23:51,548.json
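
In the output above, average throughput stops rising once the rate limit exceeds what the network can sustain. To find that peak from captured terminal output, you can read the "average" rows of the throughput table. The following sketch parses pipe-delimited rows like those shown; the function name is illustrative, and it works on the printed table rather than the JSON file, whose schema is not described here.

```python
def average_rows(table_text: str) -> list[tuple[int, float, float]]:
    """Return (rate limit, MB/s sent, MB/s received) for each average row."""
    rows = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Columns: date, test, rate limit, node, sent, rec, bytes, bytes, duration
        if len(cells) == 9 and cells[3] == "average":
            rows.append((int(cells[2]), float(cells[4]), float(cells[5])))
    return rows


# Three average rows copied from the example output.
sample = """\
2022-03-29_10:24:03,761 | tcp-throughput | 512 | average | 377.686 | 375.403 | 397421226 | 395455146 | 1.00385
2022-03-29_10:24:05,772 | tcp-throughput | 640 | average | 408.087 | 400.454 | 442346154 | 441363114 | 1.04101
2022-03-29_10:24:07,892 | tcp-throughput | 768 | average | 359.774 | 357.767 | 379114837 | 377771349 | 1.00604
"""
rows = average_rows(sample)
peak = max(rows, key=lambda row: row[1])  # highest average MB/s sent
print(peak)
```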