Troubleshooting Kafka TLS/SSL connection issues

After configuring Vertica, Kafka, and your scheduler to use TLS/SSL authentication and encryption, you may encounter issues with data streaming.

After configuring Vertica, Kafka, and your scheduler to use TLS/SSL authentication and encryption, you may encounter issues with data streaming. This section explains some of the more common errors you may encounter, and how to trouble shoot them.

Errors when launching the scheduler

You may see errors like this when launching the scheduler:

$ vkconfig launch --conf weblog.conf
java.sql.SQLNonTransientException: com.vertica.solutions.kafka.exception.ConfigurationException:
       No keystore system property found: null
    at com.vertica.solutions.kafka.util.SQLUtilities.getConnection(SQLUtilities.java:181)
    at com.vertica.solutions.kafka.cli.CLIUtil.assertDBConnectionWorks(CLIUtil.java:40)
    at com.vertica.solutions.kafka.Launcher.run(Launcher.java:135)
    at com.vertica.solutions.kafka.Launcher.main(Launcher.java:263)
Caused by: com.vertica.solutions.kafka.exception.ConfigurationException: No keystore system property found: null
    at com.vertica.solutions.kafka.security.KeyStoreUtil.loadStore(KeyStoreUtil.java:77)
    at com.vertica.solutions.kafka.security.KeyStoreUtil.<init>(KeyStoreUtil.java:42)
    at com.vertica.solutions.kafka.util.SQLUtilities.getConnection(SQLUtilities.java:179)
    ... 3 more

The scheduler throws these errors when it cannot locate or read the keystore or truststore files. To resolve this issue:

  • Verify you have set the VKCONFIG_JVM_OPTS Linux environment variable. Without this variable, the scheduler will not know where to find the truststore and keystore to use when creating TLS/SSL connections. See Step 2: Set the VKCONFIG_JVM_OPTS Environment Variable for more information.

  • Verify that the keystore and truststore files are located in the path you set in the VKCONFIG_JVM_OPTS environment variable.

  • Verify that the user account that runs the scheduler has read access to the trustore and keystore files.

  • Verify that the key password you provide in the scheduler configuration is correct. Note that you must supply the password for the key, not the keystore.

Another possible error message is a failure to set up a TLS Keystore:

Exception in thread "main" java.sql.SQLRecoverableException: [Vertica][VJDBC](100024) IOException while communicating with server: java.io.IOException: Failed to create an SSLSocketFactory when setting up TLS: keystore not found.
        at com.vertica.io.ProtocolStream.logAndConvertToNetworkException(Unknown Source)
        at com.vertica.io.ProtocolStream.enableSSL(Unknown Source)
        at com.vertica.io.ProtocolStream.initSession(Unknown Source)
        at com.vertica.core.VConnection.tryConnect(Unknown Source)
        at com.vertica.core.VConnection.connect(Unknown Source)
        . . .

This error can be caused by using a keystore or truststore file in a format other than JKS and not supplying the correct file extension. If the scheduler does not recognize the file extension of your keystore or truststore file name, it assumes the file is in JKS format. If the file isn't in this format, the scheduler will exit with the error message shown above. To correct this error, rename the keystore and truststore files to use the correct file extension. For example, if your files are in PKCS 12 filemat, change their file extension to .p12 or .pks.

Data does not load

If you find that scheduler is not loading data into your database, you should first query the stream_microbatch_history table to determine whether the scheduler is executing microbatches, and if so, what their results are. A faulty TLS/SSL configuration usually results in a status of NETWORK_ISSUE:

=> SELECT frame_start, end_reason, end_reason_message FROM weblog_sched.stream_microbatch_history;
       frame_start       |  end_reason   | end_reason_message
-------------------------+---------------+--------------------
 2019-04-05 11:35:18.365 | NETWORK_ISSUE |
 2019-04-05 11:35:38.462 | NETWORK_ISSUE |

If you suspect an SSL issue, you can verify that Vertica is establishing a connection to Kafka by looking at Kafka's server.log file. Failed SSL connection attempts can appear in this log like this example:

java.io.IOException: Unexpected status returned by SSLEngine.wrap, expected
        CLOSED, received OK. Will not send close message to peer.
        at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:172)
        at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:703)
        at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:61)
        at org.apache.kafka.common.network.Selector.doClose(Selector.java:739)
        at org.apache.kafka.common.network.Selector.close(Selector.java:727)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:520)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:412)
        at kafka.network.Processor.poll(SocketServer.scala:551)
        at kafka.network.Processor.run(SocketServer.scala:468)
        at java.lang.Thread.run(Thread.java:748)

If you do not see errors of this sort, you likely have a network problem between Kafka and Vertica. If you do see these errors, consider the following debugging steps:

  • Verify that the configuration of your Kafka cluster is uniform. For example, you may see connection errors if some Kafka nodes are set to require client authentication and others aren't.

  • Verify that the Common Names (CN) in the certificates and keys match the host name of the system.

  • Verify that the Kafka cluster is accepting connections on the ports and host names you specify in the server.properties file's listeners property. For example, suppose you use IP addresses in this setting, but use host names when defining the cluster in the scheduler's configuration. Then Kafka may reject the connection attempt by Vertica or Vertica may reject the Kafka node's identity.

  • If you are using client authentication in Kafka, try turning it off to see if the scheduler can connect. If disabling authentication allows the scheduler to stream data, then you can isolate the problem to client authentication. In this case, review the certificates and CAs of both the Kafka cluster and the scheduler. Ensure that the truststores include all of the CAs used to sign the key, up to and including the root CA.

Avro schema registry and KafkaAvroParser

At minimum, the KafkaAvroParser requires the following parameters to create a TLS connection between the Avro Schema Registry and Vertica:

  • schema_registry_url with the https scheme

  • schema_registry_ssl_ca_path

If and only if TLS access fails, determine what TLS schema registry information Vertica requires from the following:

  • Certificate Authority (CA)

  • TLS server certificate

  • TLS key

Provide only the necessary TLS schema registry information with KafkaAvroParser parameters. The TLS information must be accessible in the filesystem of each Vertica node that processes Avro data.

The following example shows how to pass these parameters to KafkaAvroParser:

KafkaAvroParser(
    schema_registry_url='https://localhost:8081'
    schema_registry_ssl_ca_path='path/to/certificate-authority',
    schema_registry_ssl_cert_path='path/to/tls-server-certificate',
    schema_registry_ssl_key_path='path/to/private-tls-key',
    schema_registry_ssl_key_password_path='path/to/private-tls-key-password'
)