(Solved) Correct setup of TLS for FoundationDB

As a preface, I fundamentally love what has already been achieved with FoundationDB. Constructing a reliable and flexible primitive (KV) and building on top of it feels so clearly the right idea to me. I love that there are hints of BigTable history too :slight_smile:

I’ve been setting up a FoundationDB cluster on a public cloud and want to add TLS for communication. I have followed the instructions on FDB’s TLS page as best I can but have not yet succeeded.

The trace reports “FDBLibTLSVerifyCert VerifyError self signed certificate” followed by “TLSConnectionHandshakeError” so I’d imagine I’m just following the setup incorrectly. I also looked at the tests provided in the TLS plugin directory to try to shed some light but that hasn’t led me to success.

Given I’ve produced fdb.pem from scratch using the OpenSSL instructions on that page, can anyone tell me the minimal set of settings needed to have my servers and clients communicating over TLS? I may well be failing to set tls_certificate_file / tls_key_file / tls_ca_file correctly, though I’ve tried a few permutations so far.

If someone helps me even in loose form here I’ll happily update the documentation so you’ll get fewer questions from people like me :wink: Thanks! :slight_smile:

P.S. I’ve also found that if the Python client has an issue with forming a connection (either due to TLS or purposefully breaking the cluster file by changing the descriptor for example) then Python will freeze indefinitely and must be Ctrl + Z’ed. I will investigate that further and potentially create an issue.

I can’t speak to the TLS connection setup, but I’ve hit that freezing issue before too. Ctrl+Z only backgrounds a process - those Python processes will still be running. On a few occasions I’ve ended up needing to killall Python to stop it (or killall python or something similar on Linux).

I don’t know why it doesn’t respond to SIGINT, but the problem is that the cluster’s database doesn’t exist / isn’t configured. You can fix that by connecting with fdbcli and running configure new single memory. (Or one of the other variants of configure new based on the storage engine and redundancy mode you want - help configure for options). Once you do that those python processes will probably suddenly spring back to life and start running again as if nothing happened.
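As a rough sketch of the fix described above, the configure step can be scripted so that it cannot hang indefinitely. This assumes fdbcli is on the PATH and that its -C and --exec options are used to run a single command; the function names, the cluster file path, and the 30-second timeout are illustrative choices, not anything from this thread.

```python
import subprocess


def configure_new_command(cluster_file, mode="single", engine="memory"):
    """Build an fdbcli invocation that runs `configure new <mode> <engine>`."""
    return [
        "fdbcli",
        "-C", cluster_file,
        "--exec", f"configure new {mode} {engine}",
    ]


def configure_new(cluster_file, timeout_s=30):
    """Run the command; subprocess raises TimeoutExpired instead of hanging."""
    cmd = configure_new_command(cluster_file)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)


# Usage (hypothetical path): configure_new("/etc/foundationdb/fdb.cluster")
```

On a TLS-enabled cluster the same command would presumably also need the TLS options or environment variables discussed elsewhere in this thread.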

Yeah, this doesn’t seem like great behavior. Feel free to raise an issue on GitHub for somebody to take a look.

I’m not super familiar with our TLS configuration options, but I tried playing around with this myself to see what I could find. For my first attempt, I tried the following:

  1. Update the cluster file to add the :tls suffix to each coordinator
  2. Create a certificate and key file (I ran the commands listed in this section)
  3. Add tls_certificate_file, tls_key_file, and tls_verify_peers to the fdbserver section of foundationdb.conf.
tls_certificate_file=/path/to/tls/fdb.pem
tls_key_file=/path/to/tls/private.key
tls_verify_peers=Check.Valid=0
  4. I connected to the cluster using fdbcli, which required me to pass --tls_certificate_file and --tls_key_file. From a client, I think you would either use the corresponding client options or the environment variables.

Notice that I’m turning off peer verification, which allows me to get past the “VerifyError self signed certificate”.

Once I confirmed that worked, I tried turning back on peer verification by removing the tls_verify_peers parameter. I then got it to work by modifying foundationdb.conf to add tls_ca_file=/path/to/tls/cert.crt, which is one of the files generated in step #2 above.
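To make the steps above concrete, here is a minimal Python sketch of the two text changes involved: appending :tls to each coordinator in the cluster file line, and producing the TLS lines for the fdbserver section. The helper names and the example cluster description, ID, and addresses are made up for illustration; the file paths are the placeholders used earlier in this post.

```python
def add_tls_suffix(cluster_line):
    """Append ':tls' to every coordinator address that does not already have it."""
    prefix, _, coordinators = cluster_line.strip().partition("@")
    updated = [
        addr if addr.endswith(":tls") else addr + ":tls"
        for addr in coordinators.split(",")
    ]
    return prefix + "@" + ",".join(updated)


def tls_conf_lines(cert, key, ca=None):
    """Return the TLS lines for the fdbserver section of foundationdb.conf.

    With a CA file, peer verification stays on; without one, this sketch
    falls back to disabling it, mirroring the first attempt described above.
    """
    lines = [f"tls_certificate_file={cert}", f"tls_key_file={key}"]
    if ca is not None:
        lines.append(f"tls_ca_file={ca}")
    else:
        lines.append("tls_verify_peers=Check.Valid=0")
    return lines


# Hypothetical cluster file contents, for illustration only.
print(add_tls_suffix("mycluster:abc123@10.0.0.1:4500,10.0.0.2:4500"))
print("\n".join(tls_conf_lines("/path/to/tls/fdb.pem",
                               "/path/to/tls/private.key",
                               ca="/path/to/tls/cert.crt")))
```

Disabling verification is only useful for confirming that the rest of the setup works; the CA file variant is the one to keep.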

One thing I noticed, though, is that it doesn’t appear that backup agents can accept these TLS parameters, so my backup agent is not able to connect. I suppose it could be made to work using the environment variables, but I’ll raise an issue to support the command line options as well.

Good shout. Added an issue to github:

Thank you both @ajbeamon and @josephg, brilliant replies! I apologize for not replying earlier giving you proper thanks but I was essentially away from keyboard for two weeks.

Perfect step by step @ajbeamon. I was able to replicate your SSL setup after creating a new FDB cluster :sunny: For now I’m throwing the kitchen sink (cert file + key file + CA file + no verification) at fdbcli as my setup still doesn’t run without the CA file even with tls_verify_peers=Check.Valid=0 - but I really don’t mind as it works :wink:

I also saw that @bnamasivayam has already fixed the “Support TLS command line options in backup” and it’s waiting to be committed. So exciting!

I am curious, however, whether there’s a way to specify in the configuration file the TLS information that should be forwarded to the backup agent, since it’s started automatically by fdbmonitor. I’m guessing the laziest setup would be (when the command line options are merged):

[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent --tls_certificate_file /etc/foundationdb/fdb.pem --tls_key_file /etc/foundationdb/private.key --tls_ca_file /etc/foundationdb/cert.crt

Until then (for my and others’ reference), backups can be performed by starting a “TLS happy” backup agent and running the standard backup operations as expected.

FDB_TLS_CERTIFICATE_FILE="/etc/foundationdb/fdb.pem" FDB_TLS_KEY_FILE="/etc/foundationdb/private.key" FDB_TLS_CA_FILE="/etc/foundationdb/cert.crt" /usr/lib/foundationdb/backup_agent/backup_agent -C /etc/foundationdb/fdb.cluster
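The same launch can be sketched from Python if the agent needs to be started by a script rather than a shell one-liner. The environment variable names and paths are the ones from the command above; the helper names are hypothetical.

```python
import os
import subprocess


def tls_env(cert, key, ca, base=None):
    """Copy an environment and add the FDB TLS variables to the copy."""
    env = dict(os.environ if base is None else base)
    env.update({
        "FDB_TLS_CERTIFICATE_FILE": cert,
        "FDB_TLS_KEY_FILE": key,
        "FDB_TLS_CA_FILE": ca,
    })
    return env


def start_backup_agent(agent_path, cluster_file, env):
    """Start the backup agent with the given environment; returns the Popen handle."""
    return subprocess.Popen([agent_path, "-C", cluster_file], env=env)


# Usage, with the paths from the command above:
# env = tls_env("/etc/foundationdb/fdb.pem",
#               "/etc/foundationdb/private.key",
#               "/etc/foundationdb/cert.crt")
# start_backup_agent("/usr/lib/foundationdb/backup_agent/backup_agent",
#                    "/etc/foundationdb/fdb.cluster", env)
```

Building the environment as a copy keeps the TLS settings scoped to the agent process instead of leaking into the calling script.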

I’ll look at adding a few small bits to FoundationDB’s TLS docs to make it easier for the next person who doesn’t have an @ajbeamon to appear magically out of nowhere to help them :slight_smile:

tl;dr: I honestly can’t tell you how excited I am. As noted in the original post, I’m a strong believer in the FoundationDB layered philosophy, but I am almost equally amazed by the responsiveness of the forums and community. I’ll be contributing to it for certain.

Once the command-line options are available, you should be able to add each parameter as a line in the configuration, similar to how it is being done in the fdbserver example in the previous post. Each of those gets converted to a command-line option that is passed to the binary when run.

Glad to hear you were able to get it running! If at some point you want to pursue solving this last problem (which may be helpful for the documentation part), let us know and we can try to figure out what’s going on.

I have set up a TLS-enabled cluster following the steps outlined by @ajbeamon and have verified that it works by connecting to it with fdbcli and providing the necessary certificate and key files. However, connecting to the cluster from a Java client showed some strange behavior: setting the TLS paths in the network options gave a TLS error when the Java client ran on Windows, but interestingly the same code worked on a Unix machine. The network options are set like this:

        fdb.options().setTLSCertPath("path to cert file");
        fdb.options().setTLSCaPath("path to CA file");
        fdb.options().setTLSKeyPath("path to key file");
        fdb.options().setTLSVerifyPeers("Check.Valid=1".getBytes());

and the stack trace looks like this:

        Exception in thread "main" com.apple.foundationdb.FDBException: TLS error
            at com.apple.foundationdb.FDB.Network_setOption(Native Method)
            at com.apple.foundationdb.OptionsSet.setOption(OptionsSet.java:55)
            at com.apple.foundationdb.NetworkOptions.setTLSCertPath(NetworkOptions.java:87)
            at Transaction.main(Transaction.java:33)

Can anyone please help if I am missing something here?

Also, for the Record Layer, I couldn’t find any way to set the TLS paths in the network options from the Java client. Is there another way to specify the TLS paths in the Java client itself?