You can use TLS/SSL encryption between Vertica, your scheduler, and Kafka. This encryption prevents others from accessing the data that is sent between Kafka and Vertica. It can also verify the identity of all parties involved in data streaming, so no impostor can pose as your Vertica cluster or a Kafka broker.
Note
Many people confuse the terms TLS and SSL. SSL is an older encryption protocol that has been largely replaced by the newer and more secure TLS standard. However, many people still use the term SSL to refer to encryption between servers and applications, even when that encryption is actually TLS. For example, Java and Kafka use the term SSL exclusively, even when dealing with TLS. This document uses SSL/TLS and SSL interchangeably.
Some common cases where you want to use SSL encryption between Vertica and Kafka are:
Your Vertica database and Kafka communicate over an insecure network. For example, suppose your Kafka cluster is located in a cloud service and your Vertica cluster is within your internal network. In this case, any data you read from Kafka travels over an insecure connection across the Internet.
You are required by security policies, laws, or other requirements to encrypt all of your network traffic.
For more information about TLS/SSL encryption in Vertica, see TLS protocol.
Using TLS/SSL between the scheduler and Vertica
The scheduler connects to Vertica the same way other client applications do. There are two ways you can configure Vertica to use SSL/TLS authentication and encryption with clients:
If Vertica is configured to use SSL/TLS server authentication, you can choose to have your scheduler confirm the identity of the Vertica server.
If Vertica is configured to use mutual SSL/TLS authentication, you can configure your scheduler to identify itself to Vertica as well as have it verify the identity of the Vertica server. Depending on your database's configuration, the Vertica server may require your scheduler to use TLS when connecting. See Client authentication with TLS for more information.
For information on encrypted client connections with Vertica, refer to TLS protocol.
The scheduler runs on a Java Virtual Machine (JVM) and uses JDBC to connect to Vertica. It acts like any other JDBC client when connecting to Vertica. To use TLS/SSL encryption for the scheduler's connection to Vertica, use the Java keystore and truststore mechanism to hold the keys and certificates the scheduler uses to identify itself and Vertica.
The keystore contains your scheduler's private encryption key and its certificate (public key).
The truststore contains CAs that you trust. If you enable authentication, the scheduler uses these CAs to verify the identity of the Vertica cluster it connects to. If one of the CAs in the truststore was used to sign the server's certificate, then the scheduler knows it can trust the identity of the Vertica server.
You can pass options to the JVM that executes the scheduler through the Linux environment variable named VKCONFIG_JVM_OPTS. You add parameters to this variable to alter the scheduler's JDBC settings (such as the truststore and keystore for the scheduler's JDBC connection). See Step 2: Set the VKCONFIG_JVM_OPTS Environment Variable for an example.
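For example, the following sketch sets the keystore and truststore for the scheduler's JVM through the standard javax.net.ssl system properties. The file paths and passwords are placeholders; substitute the actual locations and passwords of your scheduler's keystore and truststore.

```shell
# Placeholder paths and passwords: substitute the real locations of your
# scheduler's keystore and truststore and their passwords.
export VKCONFIG_JVM_OPTS="-Djavax.net.ssl.keyStore=/home/dbadmin/keystore.jks \
 -Djavax.net.ssl.keyStorePassword=keystore_password \
 -Djavax.net.ssl.trustStore=/home/dbadmin/truststore.jks \
 -Djavax.net.ssl.trustStorePassword=truststore_password"
```

Set this variable in the shell profile of the user that runs the vkconfig scripts so that it applies to every scheduler invocation.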
You can also use the --jdbc-url scheduler option to alter the JDBC configuration. See Common vkconfig script options for more information about the scheduler options and JDBC connection properties for more information about the properties they can alter.
Using TLS/SSL between Vertica and Kafka
You can stream data from Kafka into Vertica two ways: manually using a COPY statement and the KafkaSource UD source function, or automatically using the scheduler.
To directly copy data from Kafka via an SSL connection, you set session variables containing an SSL key and certificate. When KafkaSource finds that you have set these variables, it uses the key and certificate to create a secure connection to Kafka. See Kafka TLS/SSL Example Part 4: Loading Data Directly From Kafka for details.
When automatically streaming data from Kafka to Vertica, you configure the scheduler the same way you do to use an SSL connection to Vertica. When the scheduler executes COPY statements to load data from Kafka, it uses its own keystore and truststore to create an SSL connection to Kafka.
To use an SSL connection when producing data from Vertica to Kafka, you set the same session variables you use when directly streaming data from Kafka via an SSL connection. The KafkaExport function uses these variables to establish a secure connection to Kafka.
Note
Notifiers do not currently support using TLS/SSL connections.
1 - Planning TLS/SSL encryption between Vertica and Kafka
Some things to consider before you begin configuring TLS/SSL:
Which connections between the scheduler, Vertica, and Kafka need to be encrypted? In some cases, you may only need to enable encryption between Vertica and Kafka. This scenario is common when Vertica and the Kafka cluster are on different networks. For example, suppose Kafka is hosted in a cloud provider and Vertica is hosted in your internal network. Then the data must travel across the unsecured Internet between the two. However, if Vertica and the scheduler are both in your local network, you may decide that configuring them to use SSL/TLS is unnecessary. In other cases, you will want all parts of the system to be encrypted. For example, you want to encrypt all traffic when Kafka, Vertica, and the scheduler are all hosted in a cloud provider whose network may not be secure.
Which connections between the scheduler, Vertica, and Kafka require trust? You can opt to have any of these connections fail if one system cannot verify the identity of another. See Verifying Identities below.
Which CAs will you be using to sign each certificate? The simplest way to configure the scheduler, Kafka, and Vertica is to use the same CA to sign all of the certificates you will use when setting up TLS/SSL. Using the same root CA to sign the certificates requires you to be able to edit the configuration of Kafka, Vertica, and the scheduler. If you cannot use the same CA to sign all certificates, all truststores must contain the entire chain of CAs used to sign the certificates, all the way up to the root CA. Including the entire chain of trust ensures each system can verify each other's identities.
Verifying identities
Your primary challenge when configuring TLS/SSL encryption between Vertica, the scheduler, and Kafka is making sure the scheduler, Kafka, and Vertica can all verify each other's identity. The most common problem people encounter when setting up TLS/SSL encryption is ensuring the remote system can verify the authenticity of a system's certificate. The best way to prevent this problem is to ensure that all systems have their certificates signed by a CA that all of the systems explicitly trust.
When a system attempts to start an encrypted connection with another system, it sends its certificate to the remote system. This certificate can be signed by one or more certificate authorities (CAs) that help identify the system making the connection. These signatures form a "chain of trust." A certificate is signed by a CA. That CA, in turn, could have been signed by another CA, and so forth. Often, the chain ends with a CA (referred to as the root CA) from a well-known commercial provider of certificates, such as Comodo SSL or DigiCert, that is trusted by default on many platforms such as operating systems and web browsers.
If the remote system finds a CA in the chain that it trusts, it verifies the identity of the system making the connection, and the connection can continue. If the remote system cannot find the signature of a CA it trusts, it may block the connection, depending on its configuration. Systems can be configured to only allow connections from systems whose identity has been verified.
Note
Verifying the identity of another system making a TLS/SSL connection is often referred to as "authentication." Do not confuse this use of authentication with other forms of authentication used with Vertica. For example, TLS/SSL's authentication of a client connection has nothing to do with Vertica user authentication. Even if you successfully establish a TLS/SSL connection to Vertica using a client, Vertica still requires you to provide a user name and password before you can interact with it.
2 - Configuring your scheduler for TLS connections
The scheduler can use TLS for two different connections: the one it makes to Vertica, and the connection it creates when running COPY statements to retrieve data from Kafka. Because the scheduler is a Java application, you supply the TLS key and the certificate used to sign it in a keystore. You also supply a truststore that contains the certificates that the scheduler should trust. Both the connection to Vertica and to Kafka can use the same keystore and truststore. You can also choose to use separate keystores and truststores for these two connections by setting different JDBC settings for the scheduler. See JDBC connection properties for a list of these settings.
If you choose to use a file format other than the standard Java Keystore (JKS) format for your keystore or truststore files, you must use the correct file extension in the filename. For example, suppose you choose to use a keystore and truststore saved in PKCS#12 format. Then your keystore and truststore files must end with the .pfx or .p12 extension.
If the scheduler does not recognize the file's extension (or there is no extension in the file name), it assumes that the file is in JKS format. If the file is not in JKS format, you will see an error message when starting the scheduler, similar to "Failed to create an SSLSocketFactory when setting up TLS: keystore not found."
Note that if the Kafka server's ssl.client.auth parameter is set to none or requested, you do not need to create a keystore.
3 - Using TLS/SSL when directly loading data from Kafka
You can manually load data from Kafka using the COPY statement and the KafkaSource user-defined load function (see Manually consume data from Kafka). To have KafkaSource open a secure connection to Kafka, you must supply it with an SSL key and other information.
When starting, the KafkaSource function checks if several user session variables are defined. These variables contain the SSL key, the certificate used to sign the key, and other information that the function needs to create the SSL connection. See Kafka user-defined session parameters for a list of these variables. If KafkaSource finds these variables are defined, it uses them to create an SSL connection to Kafka.
These variables are also used by the KafkaExport function to establish a secure connection to Kafka when exporting data.
4 - Configure Kafka for TLS
This page covers procedures for configuring TLS connections between Vertica, Kafka, and the scheduler.
Note that the following example configures TLS for a Kafka server where ssl.client.auth=required, which requires the following:
kafka_SSL_Certificate
kafka_SSL_PrivateKey_secret
kafka_SSL_PrivateKeyPassword_secret
A keystore for the scheduler
If your configuration uses ssl.client.auth=none or ssl.client.auth=requested, these parameters and the scheduler keystore are optional.
Creating certificates for Vertica and clients
The CA certificate in this example is self-signed. In a production environment, you should instead use a trusted CA.
This example uses the same self-signed root CA to sign all of the certificates used by the scheduler, Kafka brokers, and Vertica. If you cannot use the same CA to sign the keys for all of these systems, make sure you include the entire chain of trust in your keystores.
Create the root key:
$ openssl genrsa -out root.key
Generating RSA private key, 2048 bit long modulus
..............................................................................
............................+++
...............+++
e is 65537 (0x10001)
Generate a self-signed CA certificate.
$ openssl req -new -x509 -key root.key -out root.crt
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:MA
Locality Name (eg, city) []:Cambridge
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:*.mycompany.com
Email Address []:myemail@mycompany.com
Restrict root.key to owner read/write permissions, and grant other groups read access to root.crt.
Create the private key for the Vertica server:
$ openssl genrsa -out server.key
Generating RSA private key, 2048 bit long modulus
....................................................................+++
......................................+++
e is 65537 (0x10001)
Create a certificate signing request (CSR) for the server certificate. Be sure to set the "Common Name" field to a wildcard (asterisk) so the certificate is accepted for all Vertica nodes in the cluster:
$ openssl req -new -key server.key -out server_reqout.txt
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:MA
Locality Name (eg, city) []:Cambridge
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:*.mycompany.com
Email Address []:myemail@mycompany.com
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []: server_key_password
An optional company name []:
Sign the server certificate with your CA. This creates the server certificate server.crt.
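The signing command mirrors the client-certificate signing shown later in this example. The sketch below also recreates the root CA and the server CSR non-interactively (using -subj with placeholder subjects) so it can run standalone; if you already created root.key, root.crt, server.key, and server_reqout.txt in the earlier steps, run only the final command.

```shell
# Recreated here non-interactively so the sketch runs standalone; skip these
# four commands if you created the files interactively in the earlier steps.
openssl genrsa -out root.key 2048
openssl req -new -x509 -key root.key -out root.crt -subj "/CN=Test CA"
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server_reqout.txt -subj "/CN=*.mycompany.com"

# The actual step: sign the server CSR with the root CA to produce server.crt.
openssl x509 -req -in server_reqout.txt -days 3650 -sha1 -CAcreateserial \
    -CA root.crt -CAkey root.key -out server.crt
```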
Set the appropriate permissions for the key and certificate.
$ chmod 600 server.key
$ chmod 644 server.crt
Create a client key and certificate (mutual mode only)
In Mutual Mode, clients and servers verify each other's certificates before establishing a connection. The following procedure creates a client key and certificate to present to Vertica. The certificate must be signed by a CA that Vertica trusts.
The steps for this are identical to those above for creating a server key and certificate for Vertica.
$ openssl genrsa -out client.key
Generating RSA private key, 2048 bit long modulus
................................................................+++
..............................+++
e is 65537 (0x10001)
$ openssl req -new -key client.key -out client_reqout.txt
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:MA
Locality Name (eg, city) []:Cambridge
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Company
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:*.mycompany.com
Email Address []:myemail@mycompany.com
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []: server_key_password
An optional company name []:
$ openssl x509 -req -in client_reqout.txt -days 3650 -sha1 -CAcreateserial -CA root.crt \
-CAkey root.key -out client.crt
Signature ok
subject=/C=US/ST=MA/L=Cambridge/O=My Company/CN=*.mycompany.com/emailAddress=myemail@mycompany.com
Getting CA Private Key
$ chmod 600 client.key
$ chmod 644 client.crt
Set up mutual mode client-server TLS
Configure Vertica for mutual mode
The following keys and certificates must be imported and then distributed to the nodes on your Vertica cluster with TLS Configuration for Mutual Mode:
=> CREATE KEY imported_key TYPE 'RSA' AS '-----BEGIN PRIVATE KEY-----...-----END PRIVATE KEY-----';
=> CREATE CA CERTIFICATE imported_ca AS '-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----';
=> CREATE CERTIFICATE imported_cert AS '-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----';
In this example, \set is used to retrieve the contents of root.key, root.crt, server.key, and server.crt.
=> \set ca_cert ''''`cat root.crt`''''
=> \set serv_key ''''`cat server.key`''''
=> \set serv_cert ''''`cat server.crt`''''
=> CREATE CA CERTIFICATE root_ca AS :ca_cert;
CREATE CERTIFICATE
=> CREATE KEY server_key TYPE 'RSA' AS :serv_key;
CREATE KEY
=> CREATE CERTIFICATE server_cert AS :serv_cert;
CREATE CERTIFICATE
Follow the steps for Mutual Mode in Configuring client-server TLS to set the proper TLSMODE and TLS Configuration parameters.
Configure a client for mutual mode
Clients must have their private key, certificate, and CA certificate. The certificate will be presented to Vertica when establishing a connection, and the CA certificate will be used to verify the server certificate from Vertica.
This example configures the vsql client for mutual mode.
Create a .vsql directory in the user's home directory.
$ mkdir ~/.vsql
Copy client.key, client.crt, and root.crt to the .vsql directory.
$ cp client.key client.crt root.crt ~/.vsql
Log into Vertica with vsql and query the SESSIONS system table to verify that the connection is using mutual mode:
$ vsql
Password: user-password
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256, protocol: TLSv1.2)
=> select user_name,ssl_state from sessions;
user_name | ssl_state
-----------+-----------
dbadmin | Mutual
(1 row)
Configure Kafka for TLS
Configure the Kafka brokers
This procedure configures Kafka to use TLS with client connections. You can also configure Kafka to use TLS to communicate between brokers. However, inter-broker TLS has no impact on establishing an encrypted connection between Vertica and Kafka.
Create a truststore file for all of your Kafka brokers, importing your CA certificate. This example uses the self-signed root.crt created above.
$ keytool -keystore kafka.truststore.jks -alias CARoot -import \
-file root.crt
Enter keystore password: some_password
Re-enter new password: some_password
Owner: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Issuer: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Serial number: c3f02e87707d01aa
Valid from: Fri Mar 22 13:37:37 EDT 2019 until: Sun Apr 21 13:37:37 EDT 2019
Certificate fingerprints:
MD5: 73:B1:87:87:7B:FE:F1:6E:94:55:FD:AF:5D:D0:C3:0C
SHA1: C0:69:1C:93:54:21:87:C7:03:93:FE:39:45:66:DE:22:18:7E:CD:94
SHA256: 23:03:BB:B7:10:12:50:D9:C5:D0:B7:58:97:41:1E:0F:25:A0:DB:
D0:1E:7D:F9:6E:60:8F:79:A6:1C:3F:DD:D5
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
#1: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 50 69 11 64 45 E9 CC C5 09 EE 26 B5 3E 71 39 7C Pi.dE.....&.>q9.
0010: E5 3D 78 16 .=x.
]
]
#2: ObjectId: 2.5.29.19 Criticality=false
BasicConstraints:[
CA:true
PathLen:2147483647
]
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 50 69 11 64 45 E9 CC C5 09 EE 26 B5 3E 71 39 7C Pi.dE.....&.>q9.
0010: E5 3D 78 16 .=x.
]
]
Trust this certificate? [no]: yes
Certificate was added to keystore
Create a keystore file for the Kafka broker named kafka01. Each broker's keystore should be unique.
The keytool command adds a Subject Alternative Name (SAN) used as a fallback when establishing a TLS connection. Use your Kafka broker's fully-qualified domain name (FQDN) as the value for both the SAN and the "What is your first and last name?" prompt.
In this example, the FQDN is kafka01.mycompany.com. The alias for keytool is set to localhost, so local connections to the broker use TLS.
$ keytool -keystore kafka01.keystore.jks -alias localhost -validity 365 -genkey -keyalg RSA \
-ext SAN=DNS:kafka01.mycompany.com
Enter keystore password: some_password
Re-enter new password: some_password
What is your first and last name?
[Unknown]: kafka01.mycompany.com
What is the name of your organizational unit?
[Unknown]:
What is the name of your organization?
[Unknown]: MyCompany
What is the name of your City or Locality?
[Unknown]: Cambridge
What is the name of your State or Province?
[Unknown]: MA
What is the two-letter country code for this unit?
[Unknown]: US
Is CN=Database Admin, OU=MyCompany, O=Unknown, L=Cambridge, ST=MA, C=US correct?
[no]: yes
Enter key password for <localhost>
(RETURN if same as keystore password):
Export the Kafka broker's certificate. In this example, the certificate is exported as kafka01.unsigned.crt.
Import the CA certificate into the broker's keystore.
Note
If you use different CAs to sign the certificates in your environment, you must add the entire chain of CAs you used to sign your certificate to the keystore, all the way up to the root CA. Including the entire chain of trust helps other systems verify the identity of your Kafka broker.
$ keytool -keystore kafka01.keystore.jks -alias CARoot -import -file root.crt
Enter keystore password: some_password
Owner: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Issuer: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Serial number: c3f02e87707d01aa
Valid from: Fri Mar 22 13:37:37 EDT 2019 until: Sun Apr 21 13:37:37 EDT 2019
Certificate fingerprints:
MD5: 73:B1:87:87:7B:FE:F1:6E:94:55:FD:AF:5D:D0:C3:0C
SHA1: C0:69:1C:93:54:21:87:C7:03:93:FE:39:45:66:DE:22:18:7E:CD:94
SHA256: 23:03:BB:B7:10:12:50:D9:C5:D0:B7:58:97:41:1E:0F:25:A0:DB:D0:1E:7D:F9:6E:60:8F:79:A6:1C:3F:DD:D5
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
#1: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 50 69 11 64 45 E9 CC C5 09 EE 26 B5 3E 71 39 7C Pi.dE.....&.>q9.
0010: E5 3D 78 16 .=x.
]
]
#2: ObjectId: 2.5.29.19 Criticality=false
BasicConstraints:[
CA:true
PathLen:2147483647
]
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 50 69 11 64 45 E9 CC C5 09 EE 26 B5 3E 71 39 7C Pi.dE.....&.>q9.
0010: E5 3D 78 16 .=x.
]
]
Trust this certificate? [no]: yes
Certificate was added to keystore
Import the signed Kafka broker certificate into the keystore.
$ keytool -keystore kafka01.keystore.jks -alias localhost \
-import -file kafka01.signed.crt
Enter keystore password: some_password
Owner: CN=Database Admin, OU=MyCompany, O=Unknown, L=Cambridge, ST=MA, C=US
Issuer: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Serial number: b4bba9a1828ecaaf
Valid from: Tue Mar 26 12:26:34 EDT 2019 until: Wed Mar 25 12:26:34 EDT 2020
Certificate fingerprints:
MD5: 17:EA:3E:15:B4:15:E9:93:67:EE:59:C0:4F:D1:4C:01
SHA1: D5:35:B7:F7:44:7C:D6:B4:56:6F:38:2D:CD:3A:16:44:19:C1:06:B7
SHA256: 25:8C:46:03:60:A7:4C:10:A8:12:8E:EA:4A:FA:42:1D:A8:C5:FB:65:81:74:CB:46:FD:B1:33:64:F2:A3:46:B0
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 1
Trust this certificate? [no]: yes
Certificate was added to keystore
If you are not logged into the Kafka broker for which you prepared the keystore, copy the truststore and keystore to it using scp. If you have already decided where to store the keystore and truststore files in the broker's filesystem, you can directly copy them to their final destination. This example just copies them to the root user's home directory temporarily. The next step moves them into their final location.
Repeat steps 2 through 7 for the remaining Kafka brokers.
Allow Kafka to read the keystore and truststore
If you did not copy the truststore and keystore to a directory where Kafka can read them in the previous step, you must copy them to a final location on the broker. You must also allow the user account you use to run Kafka to read these files. The easiest way to ensure the user has access is to give this user ownership of these files.
In this example, Kafka is run by a Linux user kafka. If you use another user to run Kafka, be sure to set the permissions on the truststore and keystore files appropriately.
Log into the Kafka broker as root.
Copy the truststore and keystore to a directory where Kafka can access them. There is no set location for these files: you can choose a directory under /etc, or some other location where configuration files are usually stored. This example copies them from root's home directory to Kafka's configuration directory named /opt/kafka/config/. In your own system, this configuration directory may be in a different location depending on how you installed Kafka.
~# cd /opt/kafka/config/
/opt/kafka/config# cp /root/kafka01.keystore.jks /root/kafka.truststore.jks .
If you aren't logged in as a user account that runs Kafka, change the ownership of the truststore and keystore files. This example changes the ownership from root (which is the user currently logged in) to the kafka user:
/opt/kafka/config# ls -l
total 80
...
-rw-r--r-- 1 kafka nogroup 1221 Feb 21 2018 consumer.properties
-rw------- 1 root root 3277 Mar 27 08:03 kafka01.keystore.jks
-rw-r--r-- 1 root root 1048 Mar 27 08:03 kafka.truststore.jks
-rw-r--r-- 1 kafka nogroup 4727 Feb 21 2018 log4j.properties
...
/opt/kafka/config# chown kafka kafka01.keystore.jks kafka.truststore.jks
/opt/kafka/config# ls -l
total 80
...
-rw-r--r-- 1 kafka nogroup 1221 Feb 21 2018 consumer.properties
-rw------- 1 kafka root 3277 Mar 27 08:03 kafka01.keystore.jks
-rw-r--r-- 1 kafka root 1048 Mar 27 08:03 kafka.truststore.jks
-rw-r--r-- 1 kafka nogroup 4727 Feb 21 2018 log4j.properties
...
Repeat steps 1 through 3 for the remaining Kafka brokers.
Configure Kafka to use TLS
With the truststore and keystore in place, your next step is to edit Kafka's server.properties configuration file to tell Kafka to use TLS/SSL encryption. This file is usually stored in the Kafka config directory. The location of this directory depends on how you installed Kafka. In this example, the file is located in /opt/kafka/config.
When editing the files, be sure you do not change their ownership. The best way to ensure Linux does not change the file's ownership is to use su to become the user account that runs Kafka, assuming you are not already logged in as that user:
/opt/kafka/config# su -s /bin/bash kafka
Note
The previous command lets you start a shell as the kafka system user even if that user cannot log in.
The server.properties file contains Kafka broker settings in a property=value format. To configure the Kafka broker to use SSL, alter or add the following property settings:
listeners
Host names and ports on which the Kafka broker listens. If you are not using SSL for connections between brokers, you must supply both a PLAINTEXT and an SSL option. For example:
listeners=PLAINTEXT://:9092,SSL://:9093
ssl.keystore.location
Path of the keystore file.
ssl.keystore.password
Password used to access the keystore.
ssl.key.password
Password for the Kafka broker's key in the keystore. You can make this password different from the keystore password if you choose.
ssl.truststore.location
Location of the truststore file.
ssl.truststore.password
Password to access the truststore.
ssl.enabled.protocols
TLS/SSL protocols that Kafka allows clients to use.
ssl.client.auth
Specifies whether SSL authentication is required or optional. The most secure value for this setting is required, which forces the client to present a certificate so the broker can verify the client's identity.
Important
These settings vary depending on your version of Kafka. Always consult the Apache Kafka documentation for your version of Kafka before making changes to server.properties. In particular, be aware that Kafka version 2.0 and later enables host name verification for clients and inter-broker communications by default.
This example configures Kafka to verify client identities via SSL authentication. It does not use SSL to communicate with other brokers, so the server.properties file defines both SSL and PLAINTEXT listener ports. It does not supply a host name for the listener ports, which tells Kafka to listen on the default network interface.
The lines added to the kafka01 broker's copy of server.properties for this configuration are:
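A sketch of those additions follows; the paths, ports, and password are the example values used earlier in this page, so adjust them for your deployment:

```properties
# Listen for plaintext inter-broker traffic on 9092 and TLS client
# connections on 9093 (no host name = default network interface).
listeners=PLAINTEXT://:9092,SSL://:9093
ssl.keystore.location=/opt/kafka/config/kafka01.keystore.jks
ssl.keystore.password=some_password
ssl.key.password=some_password
ssl.truststore.location=/opt/kafka/config/kafka.truststore.jks
ssl.truststore.password=some_password
ssl.enabled.protocols=TLSv1.2
ssl.client.auth=required
```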
You must make these changes to the server.properties file on all of your brokers.
After making your changes to your broker's server.properties files, restart Kafka. How you restart Kafka depends on your installation:
If Kafka is running as part of a Hadoop cluster, you can usually restart it from within whatever interface you use to control Hadoop (such as Ambari).
If you installed Kafka directly, you can restart it either by directly running the kafka-server-stop.sh and kafka-server-start.sh scripts or via the Linux system's service control commands (such as systemctl). You must run this command on each broker.
Test the configuration
If you have not configured client authentication, you can quickly test whether Kafka can access its keystore by running the command:
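One such check (a sketch; substitute your broker's host name and SSL port) is to open a TLS connection to the broker's SSL listener with openssl's s_client tool:

```
$ openssl s_client -connect kafka01.mycompany.com:9093
```

If Kafka can read its keystore, the output includes a Server certificate section showing the broker's certificate. If the connection fails or no certificate is returned, recheck the keystore path, password, and file permissions in server.properties.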
The above method is not conclusive, however; it only tells you if Kafka is able to find its keystore.
The best test of whether Kafka is able to accept TLS connections is to configure the command-line Kafka producer and consumer. In order to configure these tools, you must first create a client keystore. These steps are identical to creating a broker keystore.
Note
This example assumes that Kafka has a topic named test that you can send test messages to.
Respond to the "What is your first and last name?" with the FQDN of the system you will use to run the producer and/or consumer. Answer the rest of the prompts with the details of your organization.
Export the client certificate so it can be signed:
Copy the keystore to a location where you will use it. For example, you could choose to copy it to the same directory where you copied the keystore for the Kafka broker. If you choose to copy it to some other location, or intend to use some other user to run the command-line clients, be sure to add a copy of the truststore file you created for the brokers. Clients can reuse this truststore file for authenticating the Kafka brokers because the same CA is used to sign all of the certificates. Also set the file's ownership and permissions accordingly.
Next, you must create a properties file (similar to the broker's server.properties file) that configures the command-line clients to use TLS. For a client running on the Kafka broker named kafka01, your configuration file would look like this:
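A sketch of such a client.properties file, assuming the truststore and client keystore are stored in /opt/kafka/config and use the example passwords from the earlier steps:

```properties
security.protocol=SSL
ssl.truststore.location=/opt/kafka/config/kafka.truststore.jks
ssl.truststore.password=some_password
ssl.keystore.location=/opt/kafka/config/client.keystore.jks
ssl.keystore.password=some_password
ssl.key.password=some_password
```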
This property file assumes the keystore file is located in the Kafka configuration directory.
Finally, you can run the command-line producer or consumer to ensure they can connect and process data. Supply these clients with the properties file you just created. The following example assumes you stored the properties file in the Kafka configuration directory, and that Kafka is installed in /opt/kafka:
~# cd /opt/kafka
/opt/kafka# bin/kafka-console-producer.sh --broker-list kafka01.mycompany.com:9093 \
--topic test --producer.config config/client.properties
>test
>test again
>More testing. These messages seem to be getting through!
^D
/opt/kafka# bin/kafka-console-consumer.sh --bootstrap-server kafka01.mycompany.com:9093 --topic test \
--consumer.config config/client.properties --from-beginning
test
test again
More testing. These messages seem to be getting through!
^C
Processed a total of 3 messages
Loading data from Kafka
After you configure Kafka to accept TLS connections, verify that you can directly load data from it into Vertica. You should perform this step even if you plan to create a scheduler to automatically stream data.
You can choose to create a separate key and certificate for directly loading data from Kafka. This example reuses the key and certificate created for the Vertica server in part 2 of this example.
You directly load data from Kafka by using the KafkaSource data source function with the COPY statement (see Manually consume data from Kafka). The KafkaSource function creates the connection to Kafka, so it needs a key, certificate, and related passwords to create an encrypted connection. You pass this information via session parameters. See Kafka user-defined session parameters for a list of these parameters.
The easiest way to get the key and certificate into the parameters is to first read them into vsql variables. Use backquotes to read the file contents via the Linux shell, then set the session parameters from the variables. Before setting the session parameters, increase the MaxSessionUDParameterSize session parameter to provide enough storage space in the session variables for the key and the certificates, which can be larger than the default size limit for session variables (1000 bytes).
The following example reads the server key and certificate and the root CA from a directory named /home/dbadmin/SSL. Because the server's key password is not saved in a file, the example sets it in a Linux environment variable named KVERTICA_PASS before running vsql. The example sets MaxSessionUDParameterSize to 100000 before setting the session variables. Finally, it enables TLS for the Kafka connection and streams data from the topic named test.
$ export KVERTICA_PASS=server_key_password
$ vsql
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256, protocol: TLSv1.2)
=> \set cert '\''`cat /home/dbadmin/SSL/server.crt`'\''
=> \set pkey '\''`cat /home/dbadmin/SSL/server.key`'\''
=> \set ca '\''`cat /home/dbadmin/SSL/root.crt`'\''
=> \set pass '\''`echo $KVERTICA_PASS`'\''
=> alter session set MaxSessionUDParameterSize=100000;
ALTER SESSION
=> ALTER SESSION SET UDPARAMETER kafka_SSL_Certificate=:cert;
ALTER SESSION
=> ALTER SESSION SET UDPARAMETER kafka_SSL_PrivateKey_secret=:pkey;
ALTER SESSION
=> ALTER SESSION SET UDPARAMETER kafka_SSL_PrivateKeyPassword_secret=:pass;
ALTER SESSION
=> ALTER SESSION SET UDPARAMETER kafka_SSL_CA=:ca;
ALTER SESSION
=> ALTER SESSION SET UDPARAMETER kafka_Enable_SSL=1;
ALTER SESSION
=> CREATE TABLE t (a VARCHAR);
CREATE TABLE
=> COPY t SOURCE KafkaSource(brokers='kafka01.mycompany.com:9093',
stream='test|0|-2', stop_on_eof=true,
duration=interval '5 seconds')
PARSER KafkaParser();
Rows Loaded
-------------
3
(1 row)
=> SELECT * FROM t;
a
---------------------------------------------------------
test again
More testing. These messages seem to be getting through!
test
(3 rows)
Configure the scheduler
The final piece of the configuration is to set up the scheduler to use SSL when communicating with Kafka (and optionally with Vertica). When the scheduler runs a COPY command to get data from Kafka, it uses its own key and certificate to authenticate with Kafka. If you choose to have the scheduler use TLS/SSL to connect to Vertica, it can reuse the same keystore and truststore to make this connection.
Create a truststore and keystore for the scheduler
Because the scheduler is a separate component, it must have its own key and certificate. The scheduler runs in Java and uses the JDBC interface to connect to Vertica. Therefore, you must create a keystore (when ssl.client.auth=required) and truststore for it to use when making a TLS-encrypted connection to Vertica.
Keep in mind that creating a keystore is optional if your Kafka server sets ssl.client.auth to none or requested.
This process is similar to creating the truststores and keystores for Kafka brokers. The main difference is using the -dname option for keytool to set the Common Name (CN) for the key to a domain wildcard. Using this setting allows the key and certificate to match any host in the network. This option is especially useful if you run multiple schedulers on different servers to provide redundancy. The schedulers can use the same key and certificate, no matter which server they are running on in your domain.
Create a truststore file for the scheduler. Add the CA certificate that you used to sign the keystore of the Kafka cluster and Vertica cluster. If you are using more than one CA to sign your certificates, add all of the CAs you used.
$ keytool -keystore scheduler.truststore.jks -alias CARoot -import \
-file root.crt
Enter keystore password: some_password
Re-enter new password: some_password
Owner: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Issuer: EMAILADDRESS=myemail@mycompany.com, CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US
Serial number: c3f02e87707d01aa
Valid from: Fri Mar 22 13:37:37 EDT 2019 until: Sun Apr 21 13:37:37 EDT 2019
Certificate fingerprints:
MD5: 73:B1:87:87:7B:FE:F1:6E:94:55:FD:AF:5D:D0:C3:0C
SHA1: C0:69:1C:93:54:21:87:C7:03:93:FE:39:45:66:DE:22:18:7E:CD:94
SHA256: 23:03:BB:B7:10:12:50:D9:C5:D0:B7:58:97:41:1E:0F:25:A0:DB:
D0:1E:7D:F9:6E:60:8F:79:A6:1C:3F:DD:D5
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
#1: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 50 69 11 64 45 E9 CC C5 09 EE 26 B5 3E 71 39 7C Pi.dE.....&.>q9.
0010: E5 3D 78 16 .=x.
]
]
#2: ObjectId: 2.5.29.19 Criticality=false
BasicConstraints:[
CA:true
PathLen:2147483647
]
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 50 69 11 64 45 E9 CC C5 09 EE 26 B5 3E 71 39 7C Pi.dE.....&.>q9.
0010: E5 3D 78 16 .=x.
]
]
Trust this certificate? [no]: yes
Certificate was added to keystore
Initialize the keystore, passing it a wildcard host name as the Common Name. The alias parameter in this command is important, as you use it later to identify the key the scheduler must use when creating SSL connections:
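A sketch of this step (the keystore file name, the vsched alias, and the -dname details are assumptions; keep whatever alias you choose consistent with the --ssl-key-alias scheduler option described later):

```shell
# Create the scheduler's keystore with a wildcard Common Name so the
# certificate matches any host in the domain.
keytool -keystore scheduler.keystore.jks -alias vsched -validity 365 \
        -genkey -keyalg RSA \
        -dname "CN=*.mycompany.com, O=MyCompany, L=Cambridge, ST=MA, C=US"
```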
If you choose to use a file format other than the standard Java Keystore (JKS) format for your keystore or truststore files, you must use the correct file extension in the filename. For example, suppose you choose to use a keystore and truststore saved in PKCS#12 format. Then your keystore and truststore files must end with the .pfx or .p12 extension.
If the scheduler does not recognize the file's extension (or there is no extension in the file name), it assumes that the file is in JKS format. If the file is not in JKS format, you will see an error message when starting the scheduler, similar to "Failed to create an SSLSocketFactory when setting up TLS: keystore not found."
Export the scheduler's key so you can sign it with the root CA:
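For example, assuming the keystore and alias from the previous step (the output file name is a placeholder):

```shell
# Generate a signing request for the scheduler's key. The alias must
# match the one used when the keystore was initialized.
keytool -keystore scheduler.keystore.jks -alias vsched \
        -certreq -file scheduler_cert_request.csr
```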
You must pass several settings to the JDBC interface of the Java Virtual Machine (JVM) that runs the scheduler. These settings tell the JDBC driver where to find the keystore and truststore, as well as the key's password. The easiest way to pass in these settings is to set a Linux environment variable named VKCONFIG_JVM_OPTS. As it starts, the scheduler checks this environment variable and passes any properties defined in it to the JVM.
The properties that you need to set are:
javax.net.ssl.keyStore: the absolute path to the keystore file to use.
javax.net.ssl.keyStorePassword: the password for the scheduler's key.
javax.net.ssl.trustStore: the absolute path to the truststore file.
The Linux command line to set the environment variable is:
The previous command preserves any existing contents of the VKCONFIG_JVM_OPTS variable. If you find the variable has duplicate settings, remove the $VKCONFIG_JVM_OPTS from your statement so you override the existing values in the variable.
For example, suppose the scheduler's truststore and keystore are located in the directory /home/dbadmin/SSL. Then you could use the following command to set the VKCONFIG_JVM_OPTS variable:
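A sketch of that command, using the file names from earlier in this example (the key password is a placeholder):

```shell
# Append the TLS system properties to any existing VKCONFIG_JVM_OPTS
# settings so earlier options are preserved.
export VKCONFIG_JVM_OPTS="$VKCONFIG_JVM_OPTS \
-Djavax.net.ssl.trustStore=/home/dbadmin/SSL/scheduler.truststore.jks \
-Djavax.net.ssl.keyStore=/home/dbadmin/SSL/scheduler.keystore.jks \
-Djavax.net.ssl.keyStorePassword=key_password"
```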
To ensure that this variable is always set, add the command to the ~/.bashrc or other startup file of the user account that runs the scheduler.
If you require TLS on the JDBC connection to Vertica, add TLSmode=require to the JDBC URL that the scheduler uses. The easiest way to add this is to use the scheduler's --jdbc-url option. Assuming that you use a configuration file for your scheduler, you can add this line to it:
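For example, a jdbc-url entry could look like this (the host, port, database name, and user are placeholders for your own values):

```
jdbc-url=jdbc:vertica://VerticaHost:5433/DatabaseName?user=dbadmin&TLSmode=require
```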
For more information about using JDBC with Vertica, see Java.
Enable TLS in the scheduler configuration
Lastly, enable TLS. Every time you run vkconfig, you must pass it the following options:
--enable-ssl
true, to enable the scheduler to use SSL when connecting to Kafka.
--ssl-ca-alias
Alias for the CA you used to sign your Kafka broker's keys. This must match the value you supplied to the -alias argument of the keytool command to import the CA into the truststore.
Note
If you used more than one CA to sign keys, omit this option to allow all of the CAs in the truststore to be used.
--ssl-key-alias
Alias assigned to the scheduler's key. This value must match the value you supplied to the -alias argument of the keytool command when creating the scheduler's keystore.
--ssl-key-password
Password for the scheduler key.
See Common vkconfig script options for details of these options. For convenience and security, add these options to a configuration file that you pass to vkconfig. Otherwise, you run the risk of exposing the key password via the process list, which can be viewed by other users on the same system. See Configuration File Format for more information on setting up a configuration file.
Add the following to the scheduler configuration file to allow it to use the keystore and truststore and enable TLS when connecting to Vertica:
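A sketch of the relevant configuration file entries (the aliases, password, and JDBC URL are placeholders and must match the values used when building your truststore and keystore):

```
enable-ssl=true
ssl-ca-alias=CARoot
ssl-key-alias=vsched
ssl-key-password=key_password
jdbc-url=jdbc:vertica://VerticaHost:5433/DatabaseName?user=dbadmin&TLSmode=require
```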
Once you have configured the scheduler to use SSL, start it and verify that it can load data. For example, to start the scheduler with a configuration file named weblog.conf, use the command:
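For example (assuming vkconfig is in the path; run this as the user whose environment defines VKCONFIG_JVM_OPTS):

```shell
# Launch the scheduler with the configuration file; nohup keeps it
# running after the shell exits.
nohup vkconfig launch --conf weblog.conf >/dev/null 2>&1 &
```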
After configuring Vertica, Kafka, and your scheduler to use TLS/SSL authentication and encryption, you may encounter issues with data streaming. This section explains some of the more common errors you may encounter and how to troubleshoot them.
Errors when launching the scheduler
You may see errors like this when launching the scheduler:
$ vkconfig launch --conf weblog.conf
java.sql.SQLNonTransientException: com.vertica.solutions.kafka.exception.ConfigurationException:
No keystore system property found: null
at com.vertica.solutions.kafka.util.SQLUtilities.getConnection(SQLUtilities.java:181)
at com.vertica.solutions.kafka.cli.CLIUtil.assertDBConnectionWorks(CLIUtil.java:40)
at com.vertica.solutions.kafka.Launcher.run(Launcher.java:135)
at com.vertica.solutions.kafka.Launcher.main(Launcher.java:263)
Caused by: com.vertica.solutions.kafka.exception.ConfigurationException: No keystore system property found: null
at com.vertica.solutions.kafka.security.KeyStoreUtil.loadStore(KeyStoreUtil.java:77)
at com.vertica.solutions.kafka.security.KeyStoreUtil.<init>(KeyStoreUtil.java:42)
at com.vertica.solutions.kafka.util.SQLUtilities.getConnection(SQLUtilities.java:179)
... 3 more
The scheduler throws these errors when it cannot locate or read the keystore or truststore files. To resolve this issue:
Verify you have set the VKCONFIG_JVM_OPTS Linux environment variable. Without this variable, the scheduler will not know where to find the truststore and keystore to use when creating TLS/SSL connections. See Step 2: Set the VKCONFIG_JVM_OPTS Environment Variable for more information.
Verify that the keystore and truststore files are located in the path you set in the VKCONFIG_JVM_OPTS environment variable.
Verify that the user account that runs the scheduler has read access to the truststore and keystore files.
Verify that the key password you provide in the scheduler configuration is correct. Note that you must supply the password for the key, not the keystore.
Another possible error message is a failure to set up a TLS Keystore:
Exception in thread "main" java.sql.SQLRecoverableException: [Vertica][VJDBC](100024) IOException while communicating with server: java.io.IOException: Failed to create an SSLSocketFactory when setting up TLS: keystore not found.
at com.vertica.io.ProtocolStream.logAndConvertToNetworkException(Unknown Source)
at com.vertica.io.ProtocolStream.enableSSL(Unknown Source)
at com.vertica.io.ProtocolStream.initSession(Unknown Source)
at com.vertica.core.VConnection.tryConnect(Unknown Source)
at com.vertica.core.VConnection.connect(Unknown Source)
. . .
This error can be caused by using a keystore or truststore file in a format other than JKS without supplying the correct file extension. If the scheduler does not recognize the file extension of your keystore or truststore file name, it assumes the file is in JKS format. If the file is not in this format, the scheduler exits with the error message shown above. To correct this error, rename the keystore and truststore files to use the correct file extension. For example, if your files are in PKCS#12 format, change their extension to .p12 or .pfx.
Data does not load
If you find that the scheduler is not loading data into your database, first query the stream_microbatch_history table to determine whether the scheduler is executing microbatches, and if so, what their results are. A faulty TLS/SSL configuration usually results in a status of NETWORK_ISSUE:
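For example, assuming the scheduler's configuration schema is named weblog_sched (substitute your own schema name):

```
=> SELECT microbatch, target_table, end_reason
   FROM weblog_sched.stream_microbatch_history;
```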
If you suspect an SSL issue, you can verify that Vertica is establishing a connection to Kafka by looking at Kafka's server.log file. Failed SSL connection attempts appear in this log similar to the following example:
java.io.IOException: Unexpected status returned by SSLEngine.wrap, expected
CLOSED, received OK. Will not send close message to peer.
at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:172)
at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:703)
at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:61)
at org.apache.kafka.common.network.Selector.doClose(Selector.java:739)
at org.apache.kafka.common.network.Selector.close(Selector.java:727)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:520)
at org.apache.kafka.common.network.Selector.poll(Selector.java:412)
at kafka.network.Processor.poll(SocketServer.scala:551)
at kafka.network.Processor.run(SocketServer.scala:468)
at java.lang.Thread.run(Thread.java:748)
If you do not see errors of this sort, you likely have a network problem between Kafka and Vertica. If you do see these errors, consider the following debugging steps:
Verify that the configuration of your Kafka cluster is uniform. For example, you may see connection errors if some Kafka nodes are set to require client authentication and others aren't.
Verify that the Common Names (CN) in the certificates and keys match the host name of the system.
Verify that the Kafka cluster is accepting connections on the ports and host names you specify in the server.properties file's listeners property. For example, suppose you use IP addresses in this setting, but use host names when defining the cluster in the scheduler's configuration. Then Kafka may reject the connection attempt by Vertica or Vertica may reject the Kafka node's identity.
If you are using client authentication in Kafka, try turning it off to see if the scheduler can connect. If disabling authentication allows the scheduler to stream data, then you can isolate the problem to client authentication. In this case, review the certificates and CAs of both the Kafka cluster and the scheduler. Ensure that the truststores include all of the CAs used to sign the key, up to and including the root CA.
Avro schema registry and KafkaAvroParser
At minimum, the KafkaAvroParser requires the following parameters to create a TLS connection between the Avro Schema Registry and Vertica:
schema_registry_url with the https scheme
schema_registry_ssl_ca_path
If, and only if, TLS access fails, determine which of the following TLS schema registry information Vertica requires:
Certificate Authority (CA)
TLS server certificate
TLS key
Provide only the necessary TLS schema registry information with KafkaAvroParser parameters. The TLS information must be accessible in the filesystem of each Vertica node that processes Avro data.
The following example shows how to pass these parameters to KafkaAvroParser:
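A sketch of such a COPY statement, assuming a target table named avro_table, a topic named weather, and a schema registry host registry.mycompany.com (all hypothetical names); the CA file must exist at the same path on every Vertica node:

```
=> COPY avro_table
   SOURCE KafkaSource(brokers='kafka01.mycompany.com:9093',
                      stream='weather|0|-2', stop_on_eof=true)
   PARSER KafkaAvroParser(
       schema_registry_url='https://registry.mycompany.com:8081/',
       schema_registry_ssl_ca_path='/etc/ssl/certs/registry_ca.crt');
```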