Validation scripts
Vertica provides several validation utilities that can be used prior to deploying Vertica to help determine if your hosts and network can properly handle the processing and network traffic required by Vertica. These utilities can also be used if you are encountering performance issues and need to troubleshoot the issue.
After you install the Vertica RPM, you have access to the following scripts in /opt/vertica/bin:
-
Vcpuperf - a CPU performance test used to verify your CPU performance.
-
Vioperf - an Input/Output test used to verify the speed and consistency of your hard drives.
-
Vnetperf - a Network test used to test the latency and throughput of your network between hosts.
You can run these utilities at any time, but they are especially useful before you run the install_vertica script.
1 - Vcpuperf
The vcpuperf utility measures your server's CPU processing speed and compares it against benchmarks for common server CPUs. The utility performs a CPU test and measures the time it takes to complete the test. The lower the number scored on the test, the better the performance of the CPU.
The vcpuperf utility also checks the high and low load times to determine if CPU throttling is enabled. If a server's low-load computation time is significantly longer than the high-load computation time, CPU throttling may be enabled. CPU throttling is a power-saving feature. However, CPU throttling can reduce the performance of your server. Vertica recommends disabling CPU throttling to enhance server performance.
Syntax
vcpuperf [-q]
Options
-q
- Run in quiet mode. Quiet mode displays only the CPU Time, Real Time, and high and low load times.
Returns
-
CPU Time: the amount of time it took the CPU to run the test.
-
Real Time: the total time for the test to execute.
-
High load time: The amount of time to run the load test while simulating a high CPU load.
-
Low load time: The amount of time to run the load test while simulating a low CPU load.
Example
The following example shows a CPU that is running slightly slower than the expected time on a Xeon 5670 CPU that has CPU throttling enabled.
[root@node1 bin]# /opt/vertica/bin/vcpuperf
Compiled with: 4.1.2 20080704 (Red Hat 4.1.2-52)
Expected time on Core 2, 2.53GHz: ~9.5s
Expected time on Nehalem, 2.67GHz: ~9.0s
Expected time on Xeon 5670, 2.93GHz: ~8.0s
This machine's time:
CPU Time: 8.540000s
Real Time:8.710000s
Some machines automatically throttle the CPU to save power.
This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
Low load times much larger than 100-200us or much larger than the corresponding high load time
indicate low-load throttling, which can adversely affect small query / concurrent performance.
This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
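The throttling guidance printed by vcpuperf can be applied mechanically to its output. The following is a minimal Python sketch, not part of Vertica: the function names, the 200-microsecond limit (taken from the upper end of the "100-200us" guidance in the output above), and the 2x ratio used to interpret "much larger than the corresponding high load time" are all assumptions chosen for illustration.

```python
import re

# Assumption: treat low-load times above 200 us as suspicious, following
# the upper end of the "100-200us" guidance printed by vcpuperf.
LOW_LOAD_LIMIT_US = 200
# Assumption: "much larger than the high load time" is read as more than 2x.
LOW_TO_HIGH_RATIO = 2.0


def parse_load_times(output):
    """Return (high_load_us, low_load_us) parsed from vcpuperf output text."""
    high = re.search(r"high load time:\s*(\d+)\s*microseconds", output)
    low = re.search(r"low load time:\s*(\d+)\s*microseconds", output)
    if high is None or low is None:
        raise ValueError("load times not found in vcpuperf output")
    return int(high.group(1)), int(low.group(1))


def throttling_suspected(high_us, low_us):
    """Apply the two heuristics from the vcpuperf message."""
    return low_us > LOW_LOAD_LIMIT_US or low_us > LOW_TO_HIGH_RATIO * high_us


sample = """\
This machine's high load time: 67 microseconds.
This machine's low load time: 208 microseconds.
"""
high_us, low_us = parse_load_times(sample)
print(high_us, low_us, throttling_suspected(high_us, low_us))
```

With the sample values from the example, the low-load time exceeds both assumed limits, so the check reports that throttling is likely enabled.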
2 - Vioperf
The vioperf utility quickly tests the performance of your host's input and output subsystem. The utility performs the following tests: Write, ReWrite, Read, and Skip Read.
The utility verifies that the host reads the same bytes that it wrote and prints its output to STDOUT. The utility also logs the output to a JSON formatted file.
For data in HDFS, the utility tests reads but not writes.
Syntax
vioperf [--help] [--duration=<INTERVAL>] [--log-interval=<INTERVAL>]
[--log-file=<FILE>] [--condense-log] [--thread-count=<N>] [--max-buffer-size=<SIZE>]
[--preserve-files] [--disable-crc] [--disable-direct-io] [--debug]
[<DIR>*]
-
The minimum required I/O is 20 MB/s read/write per physical processor core on each node, in full duplex (reading and writing) simultaneously, concurrently on all nodes of the cluster.
Note
Vertica supports some AWS instance types that do not meet these minimum I/O requirements. However, all supported AWS instance types, regardless of vioperf performance, can be used as Vertica cluster hosts. See Supported AWS instance types for a list of all supported AWS instance types.
-
The recommended I/O is 40 MB/s per physical core on each node.
-
For example, the minimum required I/O rate for a node with 2 hyper-threaded six-core CPUs (12 physical cores) is 240 MB/s (12 × 20 MB/s). Vertica recommends 480 MB/s (12 × 40 MB/s).
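The per-core figures above scale linearly with the number of physical cores. A minimal Python sketch of that arithmetic follows; the function and constant names are hypothetical and exist only for illustration.

```python
# Per-core I/O rates stated above, in MB/s per physical core.
MIN_MB_PER_CORE = 20
RECOMMENDED_MB_PER_CORE = 40


def io_targets(physical_cores):
    """Return (minimum, recommended) node I/O rates in MB/s."""
    return (physical_cores * MIN_MB_PER_CORE,
            physical_cores * RECOMMENDED_MB_PER_CORE)


# Two six-core CPUs: hyper-threading does not change the physical core count.
print(io_targets(12))  # (240, 480)
```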
Disk space vioperf needs
vioperf requires about 4.5 GB to run.
Options
--help
- Prints a help message and exits.
--duration
- The length of time vioperf runs performance tests. The default is 5 minutes. Specify the interval in seconds, minutes, or hours with any of these suffixes:
-
Seconds: s, sec, secs, second, seconds. Example: --duration=60sec
-
Minutes: m, min, mins, minute, minutes. Example: --duration=10min
-
Hours: h, hr, hrs, hour, hours. Example: --duration=1hrs
--log-interval
- The interval at which the log file reports summary information. The default interval is 10 seconds. This option uses the same interval notation as --duration.
--log-file
- The path and name where log file contents are written, in JSON. If not specified, then vioperf creates a file named results<date-time>.JSON in the current directory.
--condense-log
- Directs vioperf to write the log file contents in condensed format, one JSON entry per line, rather than as indented JSON syntax.
--thread-count=<N>
- The number of execution threads to use. By default, vioperf uses all threads available on the host machine.
--max-buffer-size=<SIZE>
- The maximum size of the in-memory buffer to use for reads or writes. Specify the units with any of these suffixes:
-
Bytes: b, byte, bytes.
-
Kilobytes: k, kb, kilobyte, kilobytes.
-
Megabytes: m, mb, megabyte, megabytes.
-
Gigabytes: g, gb, gigabyte, gigabytes.
--preserve-files
- Directs vioperf to keep the files it writes. This parameter is ignored for HDFS tests, which are read-only. Inspecting the files can help diagnose write-related failures.
--disable-crc
- Directs vioperf to ignore CRC checksums when validating writes. Verifying checksums can add overhead, particularly when running vioperf on slower processors. This parameter is ignored for HDFS tests.
--disable-direct-io
- When reading from or writing to a local file system, vioperf goes directly to disk by default, bypassing the operating system's page cache. Using direct I/O allows vioperf to measure performance quickly without having to fill the cache.
Disabling this behavior can produce more realistic performance results but slows down the operation of vioperf.
--debug
- Directs vioperf to report verbose error messages.
<DIR>
- Zero or more directories to test. If you do not specify a directory, vioperf tests the current directory. To test the performance of each disk, specify different directories mounted on different disks.
To test reads from a directory on HDFS:
-
Use a URL in the hdfs scheme that points to a single directory (not a path) containing files at least 10MB in size. For best results, use 10GB files and verify that there is at least one file per vioperf thread.
-
If you do not specify a host and port, set the HADOOP_CONF_DIR environment variable to a path including the Hadoop configuration files. This value is the same value that you use for the HadoopConfDir configuration parameter in Vertica. For more information see Configuring HDFS access.
-
If the HDFS cluster uses Kerberos, set the HADOOP_USER_NAME environment variable to a Kerberos principal.
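When vioperf is launched from a script, it can help to assemble the argument list from the options described above. The following Python sketch is one possible way to do that and is not a Vertica tool: the helper name build_vioperf_command and the directories /data1 and /data2 are hypothetical, and only a subset of the documented options is covered.

```python
import shlex

VIOPERF = "/opt/vertica/bin/vioperf"


def build_vioperf_command(directories=(), duration=None, log_interval=None,
                          log_file=None, thread_count=None,
                          condense_log=False, preserve_files=False):
    """Build an argument list for vioperf from a subset of its options."""
    args = [VIOPERF]
    if duration is not None:
        args.append(f"--duration={duration}")
    if log_interval is not None:
        args.append(f"--log-interval={log_interval}")
    if log_file is not None:
        args.append(f"--log-file={log_file}")
    if thread_count is not None:
        args.append(f"--thread-count={thread_count}")
    if condense_log:
        args.append("--condense-log")
    if preserve_files:
        args.append("--preserve-files")
    # Directories come last; with none given, vioperf tests the current one.
    args.extend(directories)
    return args


# Hypothetical mount points, one per disk to be tested.
cmd = build_vioperf_command(["/data1", "/data2"], duration="60s",
                            log_interval="10s", condense_log=True)
print(shlex.join(cmd))
# To execute on a host where vioperf is installed:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems if a directory name contains spaces.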
Returns
The utility returns the following information:
test
- The test being run (Write, ReWrite, Read, or Skip Read)
directory
- The directory in which the test is being run.
counter name
- The counter type of the test being run. Can be either MB/s or Seeks per second.
counter value
- The value of the counter in MB/s or Seeks per second across all threads. This measurement represents the bandwidth at the exact time of measurement. Contrast with counter value (avg).
counter value (10 sec avg)
- The average amount of data in MB/s, or the average number of Seeks per second, for the test being run in the duration specified with --log-interval. The default interval is 10 seconds. The counter value (avg) is the average bandwidth since the last log message, across all threads.
counter value/core
- The counter value divided by the number of cores.
counter value/core (10 sec avg)
- The counter value (10 sec avg) divided by the number of cores.
thread count
- The number of threads used to run the test.
%CPU
- The available CPU percentage used during this test.
%IO Wait
- The CPU percentage in I/O Wait state during this test. I/O wait state is the time working processes are blocked while waiting for I/O operations to complete.
elapsed time
- The amount of time taken for a particular test. If you run the test multiple times, elapsed time increases the next time the test is run.
remaining time
- The time remaining until the next test. Based on the --duration option, each of the tests is run at least once. If the test set is run multiple times, then remaining time is how much longer the test will run. The remaining time value is cumulative. Its total is added to elapsed time each time the same test is run again.
Example
Invoking vioperf
from a terminal outputs the following message and sample results:
[dbadmin@v_vmart_node0001 ~]$ /opt/vertica/bin/vioperf --duration=60s
The minimum required I/O is 20 MB/s read and write per physical processor core on each node, in full duplex
i.e. reading and writing at this rate simultaneously, concurrently on all nodes of the cluster.
The recommended I/O is 40 MB/s per physical core on each node.
For example, the I/O rate for a server node with 2 hyper-threaded six-core CPUs is 240 MB/s required minimum, 480 MB/s recommended.
Using direct io (buffer size=1048576, alignment=512) for directory "/home/dbadmin"
test | directory | counter name | counter value | counter value (10 sec avg) | counter value/core | counter value/core (10 sec avg) | thread count | %CPU | %IO Wait | elapsed time (s)| remaining time (s)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Write | /home/dbadmin | MB/s | 420 | 420 | 210 | 210 | 2 | 89 | 10 | 10 | 5
Write | /home/dbadmin | MB/s | 412 | 396 | 206 | 198 | 2 | 89 | 9 | 15 | 0
ReWrite | /home/dbadmin | (MB-read+MB-write)/s | 150+150 | 150+150 | 75+75 | 75+75 | 2 | 58 | 40 | 10 | 5
ReWrite | /home/dbadmin | (MB-read+MB-write)/s | 158+158 | 172+172 | 79+79 | 86+86 | 2 | 64 | 33 | 15 | 0
Read | /home/dbadmin | MB/s | 194 | 194 | 97 | 97 | 2 | 69 | 26 | 10 | 5
Read | /home/dbadmin | MB/s | 192 | 190 | 96 | 95 | 2 | 71 | 27 | 15 | 0
SkipRead | /home/dbadmin | seeks/s | 659 | 659 | 329.5 | 329.5 | 2 | 2 | 85 | 10 | 5
SkipRead | /home/dbadmin | seeks/s | 677 | 714 | 338.5 | 357 | 2 | 2 | 59 | 15 | 0
Note
When evaluating performance for minimum and recommended I/O, include the Write and Read values in your evaluation. ReWrite and SkipRead values are not relevant to determining minimum and recommended I/O.
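The evaluation described in this note can be sketched in Python. The example below is illustrative only: it assumes the pipe-delimited column layout shown in the example output, reads the counter value/core column, and assumes that the core count vioperf divides by matches the physical cores the thresholds refer to. The function name and rating labels are hypothetical.

```python
MIN_MB_PER_CORE = 20          # minimum required, MB/s per core
RECOMMENDED_MB_PER_CORE = 40  # recommended, MB/s per core


def rate_io_rows(output):
    """Return (test, MB/s per core, rating) for each Write or Read row."""
    ratings = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip headers, separators, and the ReWrite and SkipRead rows.
        if len(cells) != 12 or cells[0] not in ("Write", "Read"):
            continue
        per_core = float(cells[5])  # the "counter value/core" column
        if per_core < MIN_MB_PER_CORE:
            rating = "below minimum"
        elif per_core < RECOMMENDED_MB_PER_CORE:
            rating = "meets minimum"
        else:
            rating = "meets recommended"
        ratings.append((cells[0], per_core, rating))
    return ratings


sample = """\
Write | /home/dbadmin | MB/s | 412 | 396 | 206 | 198 | 2 | 89 | 9 | 15 | 0
ReWrite | /home/dbadmin | (MB-read+MB-write)/s | 158+158 | 172+172 | 79+79 | 86+86 | 2 | 64 | 33 | 15 | 0
Read | /home/dbadmin | MB/s | 192 | 190 | 96 | 95 | 2 | 71 | 27 | 15 | 0
"""
for row in rate_io_rows(sample):
    print(row)
```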
3 - Vnetperf
The vnetperf utility measures network performance of database hosts, as well as network latency and throughput for TCP and UDP protocols.
Caution
This utility incurs high network load, which degrades database performance. Do not use this utility on a Vertica production database.
This utility helps identify the following issues:
-
Low throughput for one or all hosts
-
High latency for one or all hosts
-
Bottlenecks between one or more hosts or subnets
-
Too-low limit on the number of TCP connections that can be established simultaneously
-
High rates of network packet loss
Syntax
vnetperf [options] [tests]
Options
--condense
- Condenses the log into one JSON entry per line, instead of indented JSON syntax.
--collect-logs
- Collects test log files from each host.
--datarate rate
- Limits throughput to this rate in MB/s. A rate of 0 loops the tests through several different rates.
Default: 0
--duration seconds
- Time limit for each test to run in seconds.
Default: 1
--hosts host-name[,...]
- Comma-separated list of host names or IP addresses on which to run the tests. The list must not contain embedded spaces.
--hosts file
- File that specifies the hosts on which to run the tests. If you omit this option, then vnetperf tries to access admintools to identify cluster hosts.
--identity-file file
- If using passwordless SSH/SCP access between hosts, then specify the key file used to gain access to the hosts.
--ignore-bad-hosts
- If set, runs tests on reachable hosts even if some hosts are not reachable. If you omit this option and a host is unreachable, then no tests are run on any hosts.
--log-dir directory
- If --collect-logs is set, specifies the directory in which to place the collected logs.
Default: logs.netperf.<timestamp>
--log-level level
- Log level to use, one of the following:
Default: WARN
--list-tests
- Lists the tests that vnetperf can run.
--output-file file
- The file to which JSON results are written.
Default: results.<timestamp>.json
--ports port#[,...]
- Comma-delimited list of port numbers to use. If only one port number is specified, then the next two numbers in sequence are also used.
Default: 14159,14160,14161
--scp-options 'scp-args'
- Specifies one or more standard SCP command line arguments. SCP is used to copy test binaries over to the target hosts.
--ssh-options 'ssh-args'
- Specifies one or more standard SSH command line arguments. SSH is used to issue test commands on the target hosts.
--tmp-dir directory
- Specifies the temporary directory for vnetperf, where directory must have execute permission on all hosts, and does not include the unsupported characters ", `, or '.
Default: /tmp (execute permission required)
--vertica-install directory
- Indicates that Vertica is installed on each of the hosts, so vnetperf uses test binaries on the target system rather than copying them over with SCP.
Tests
You can specify one or more of the following tests. If no test is specified, vnetperf runs all tests. Test results are printed for each host.
Test | Description | Results
latency | Measures latency from the host that is running the script to other hosts. Hosts with unusually high latency should be investigated further. | Maximum recommended RTT (round-trip time) latency is 1000 microseconds. Ideal RTT latency is 200 microseconds or less. Vertica recommends that clock skew be less than 1 second.
tcp-throughput | Tests TCP throughput among hosts. | Minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more.
udp-throughput | Tests UDP throughput among hosts. | Minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more.
Note
UDP throughput can be lower; multiple network switches can adversely affect performance.
Example
$ vnetperf latency tcp-throughput
The maximum recommended rtt latency is 2 milliseconds. The ideal rtt latency is 200 microseconds or less. It is recommended that clock skew be kept to under 1 second.
test | date | node | index | rtt latency (us) | clock skew (us)
-------------------------------------------------------------------------------------------------------------------------
latency | 2022-03-29_10:23:55,739 | 10.20.100.247 | 0 | 49 | 3
latency | 2022-03-29_10:23:55,739 | 10.20.100.248 | 1 | 272 | -702
latency | 2022-03-29_10:23:55,739 | 10.20.100.249 | 2 | 245 | 1037
The minimum recommended throughput is 100 MB/s. Ideal throughput is 800 MB/s or more. Note: UDP numbers may be lower, multiple network switches may reduce performance results.
date | test | rate limit (MB/s) | node | MB/s (sent) | MB/s (rec) | bytes (sent) | bytes (rec) | duration (s)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2022-03-29_10:23:55,742 | tcp-throughput | 32 | 10.20.100.247 | 30.579 | 30.579 | 32112640 | 32112640 | 1.00151
2022-03-29_10:23:55,742 | tcp-throughput | 32 | 10.20.100.248 | 30.5791 | 30.5791 | 32112640 | 32112640 | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput | 32 | 10.20.100.249 | 30.5791 | 30.5791 | 32112640 | 32112640 | 1.0015
2022-03-29_10:23:55,742 | tcp-throughput | 32 | average | 30.579 | 30.579 | 32112640 | 32112640 | 1.0015
2022-03-29_10:23:57,749 | tcp-throughput | 64 | 10.20.100.247 | 61.0952 | 61.0952 | 64094208 | 64094208 | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput | 64 | 10.20.100.248 | 61.096 | 61.096 | 64094208 | 64094208 | 1.00048
2022-03-29_10:23:57,749 | tcp-throughput | 64 | 10.20.100.249 | 61.0952 | 61.0952 | 64094208 | 64094208 | 1.00049
2022-03-29_10:23:57,749 | tcp-throughput | 64 | average | 61.0955 | 61.0955 | 64094208 | 64094208 | 1.00048
2022-03-29_10:23:59,753 | tcp-throughput | 128 | 10.20.100.247 | 122.131 | 122.131 | 128122880 | 128122880 | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput | 128 | 10.20.100.248 | 122.132 | 122.132 | 128122880 | 128122880 | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput | 128 | 10.20.100.249 | 122.132 | 122.132 | 128122880 | 128122880 | 1.00046
2022-03-29_10:23:59,753 | tcp-throughput | 128 | average | 122.132 | 122.132 | 128122880 | 128122880 | 1.00046
2022-03-29_10:24:01,757 | tcp-throughput | 256 | 10.20.100.247 | 243.819 | 244.132 | 255754240 | 256081920 | 1.00036
2022-03-29_10:24:01,757 | tcp-throughput | 256 | 10.20.100.248 | 244.125 | 243.282 | 256049152 | 255164416 | 1.00025
2022-03-29_10:24:01,757 | tcp-throughput | 256 | 10.20.100.249 | 244.172 | 243.391 | 256114688 | 255295488 | 1.00032
2022-03-29_10:24:01,757 | tcp-throughput | 256 | average | 244.039 | 243.601 | 255972693 | 255513941 | 1.00031
2022-03-29_10:24:03,761 | tcp-throughput | 512 | 10.20.100.247 | 337.232 | 485.247 | 355893248 | 512098304 | 1.00645
2022-03-29_10:24:03,761 | tcp-throughput | 512 | 10.20.100.248 | 446.16 | 231.001 | 467894272 | 242253824 | 1.00013
2022-03-29_10:24:03,761 | tcp-throughput | 512 | 10.20.100.249 | 349.667 | 409.961 | 368476160 | 432013312 | 1.00497
2022-03-29_10:24:03,761 | tcp-throughput | 512 | average | 377.686 | 375.403 | 397421226 | 395455146 | 1.00385
2022-03-29_10:24:05,772 | tcp-throughput | 640 | 10.20.100.247 | 328.279 | 509.256 | 383975424 | 595656704 | 1.11548
2022-03-29_10:24:05,772 | tcp-throughput | 640 | 10.20.100.248 | 505.626 | 217.217 | 532250624 | 228655104 | 1.00389
2022-03-29_10:24:05,772 | tcp-throughput | 640 | 10.20.100.249 | 390.355 | 474.89 | 410812416 | 499777536 | 1.00365
2022-03-29_10:24:05,772 | tcp-throughput | 640 | average | 408.087 | 400.454 | 442346154 | 441363114 | 1.04101
2022-03-29_10:24:07,892 | tcp-throughput | 768 | 10.20.100.247 | 300.5 | 426.762 | 318734336 | 452657152 | 1.01154
2022-03-29_10:24:07,892 | tcp-throughput | 768 | 10.20.100.248 | 268.252 | 402.891 | 283017216 | 425066496 | 1.00616
2022-03-29_10:24:07,892 | tcp-throughput | 768 | 10.20.100.249 | 510.569 | 243.649 | 535592960 | 255590400 | 1.00042
2022-03-29_10:24:07,892 | tcp-throughput | 768 | average | 359.774 | 357.767 | 379114837 | 377771349 | 1.00604
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | 10.20.100.247 | 304.545 | 444.261 | 334987264 | 488669184 | 1.049
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | 10.20.100.248 | 422.246 | 192.773 | 474284032 | 216530944 | 1.07121
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | 10.20.100.249 | 353.206 | 446.809 | 378732544 | 479100928 | 1.0226
2022-03-29_10:24:09,911 | tcp-throughput | 1024 | average | 359.999 | 361.281 | 396001280 | 394767018 | 1.0476
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | 10.20.100.247 | 343.324 | 414.559 | 387710976 | 468156416 | 1.07697
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | 10.20.100.248 | 292.44 | 246.254 | 308314112 | 259620864 | 1.00544
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | 10.20.100.249 | 437.559 | 405.02 | 459145216 | 425000960 | 1.00072
2022-03-29_10:24:11,988 | tcp-throughput | 2048 | average | 357.774 | 355.278 | 385056768 | 384259413 | 1.02771
JSON results available at: ./results.2022-03-29_10:23:51,548.json
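The latency rows in this output can be checked against the recommendations from the Tests section. The following Python sketch is illustrative only: it assumes the pipe-delimited layout shown above, uses the 1000-microsecond maximum from the Tests section (the banner printed by the tool in this example states 2 milliseconds), and the function name and rating labels are hypothetical.

```python
MAX_RTT_US = 1000        # maximum recommended RTT latency, microseconds
IDEAL_RTT_US = 200       # ideal RTT latency, microseconds
MAX_SKEW_US = 1_000_000  # recommended clock skew limit: 1 second


def check_latency(output):
    """Return (node, rtt rating, skew_ok) for each latency row."""
    findings = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Latency rows have six columns and start with the test name.
        if len(cells) != 6 or cells[0] != "latency":
            continue
        node, rtt_us, skew_us = cells[2], int(cells[4]), int(cells[5])
        if rtt_us <= IDEAL_RTT_US:
            rating = "ideal"
        elif rtt_us <= MAX_RTT_US:
            rating = "acceptable"
        else:
            rating = "too high"
        # Clock skew can be negative, so compare its magnitude.
        findings.append((node, rating, abs(skew_us) < MAX_SKEW_US))
    return findings


sample = """\
latency | 2022-03-29_10:23:55,739 | 10.20.100.247 | 0 | 49 | 3
latency | 2022-03-29_10:23:55,739 | 10.20.100.248 | 1 | 272 | -702
latency | 2022-03-29_10:23:55,739 | 10.20.100.249 | 2 | 245 | 1037
"""
for finding in check_latency(sample):
    print(finding)
```

For the three hosts in the example, the first is within the ideal range, the other two are within the recommended maximum, and all clock skews are well under 1 second.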