(Solved) Correct setup of TLS for FoundationDB

(Stephen Merity) #1

As a preface, I fundamentally love what has already been achieved with FoundationDB. Constructing a reliable and flexible primitive (KV) and building on top of it feels so clearly the right idea to me. I love that there are hints of BigTable history too :slight_smile:

I’ve been setting up a FoundationDB cluster on a public cloud and want to use TLS for communication. I have followed the instructions on FDB’s TLS page as best I can but have yet to succeed.

The trace reports “FDBLibTLSVerifyCert VerifyError self signed certificate” followed by “TLSConnectionHandshakeError”, so I imagine I’m simply getting the setup wrong. I also looked at the tests in the TLS plugin directory, hoping they would shed some light, but that hasn’t led me to success either.

Given that I’ve produced fdb.pem from scratch using the OpenSSL instructions on that page, can anyone tell me the minimal set of settings needed to have my servers and clients communicate over TLS? I may well be setting tls_certificate_file / tls_key_file / tls_ca_file incorrectly, though I’ve tried a few permutations so far.

If someone can help me here, even loosely, I’ll happily update the documentation so you get fewer questions from people like me :wink: Thanks! :slight_smile:

P.S. I’ve also found that if the Python client has trouble forming a connection (whether due to TLS or to a deliberately broken cluster file, for example one with a changed descriptor), Python freezes indefinitely and must be Ctrl+Z’ed. I will investigate that further and may create an issue.
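Until the hang itself is fixed, one stdlib workaround is to make the blocking call in a daemon thread and stop waiting after a deadline. This is a generic Python pattern, not part of the FoundationDB client API; `call_with_timeout` is a hypothetical helper, and it cannot cancel the stuck call, it only stops the script from waiting on it forever.

```python
import threading
import time


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and give up after timeout_s seconds.

    Hypothetical helper: it does not cancel fn, it only stops waiting for it.
    Because the worker is a daemon thread, it will not keep the interpreter alive.
    """
    result = {}

    def target():
        try:
            result["value"] = fn(*args, **kwargs)
        except Exception as exc:  # hand errors back to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} s")
    if "error" in result:
        raise result["error"]
    return result["value"]


if __name__ == "__main__":
    # Stand-in for a client call that never returns (e.g. a misconfigured cluster).
    try:
        call_with_timeout(time.sleep, 0.5, 3600)
    except TimeoutError as exc:
        print("gave up:", exc)
```

In a client script you would wrap the first operation against the database in a lambda and pass that as `fn`.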

(Seph Gentle) #2

I can’t speak to the TLS connection setup, but I’ve hit that freezing issue before too. Ctrl+Z only suspends a process; those Python processes are still around. On a few occasions I’ve needed to run killall Python to stop them (or killall python, or similar, on Linux).

I don’t know why it doesn’t respond to SIGINT, but the problem is that the cluster’s database doesn’t exist or isn’t configured yet. You can fix that by connecting with fdbcli and running configure new single memory (or one of the other variants of configure new, depending on the storage engine and redundancy mode you want; run help configure for the options). Once you do that, those Python processes will probably spring back to life and carry on as if nothing had happened.
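The same fix can be applied from a script. A minimal sketch, assuming `fdbcli` is on `PATH` and the cluster has no database yet (`configure new` is for creating one); `build_configure_argv` and `configure_new_database` are hypothetical helpers. It uses `fdbcli`’s `-C` and `--exec` options plus a `subprocess` timeout, so a cluster that never answers produces an error instead of a hang.

```python
import subprocess


def build_configure_argv(cluster_file, redundancy="single", storage="memory"):
    """Build the fdbcli command line for `configure new <redundancy> <storage>`."""
    command = f"configure new {redundancy} {storage}"
    return ["fdbcli", "-C", cluster_file, "--exec", command]


def configure_new_database(cluster_file, timeout_s=60, **kwargs):
    """Run fdbcli once; raises subprocess.TimeoutExpired rather than hanging forever."""
    argv = build_configure_argv(cluster_file, **kwargs)
    return subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout_s, check=True)


if __name__ == "__main__":
    # Only prints the command; call configure_new_database() to actually run it.
    print(build_configure_argv("/etc/foundationdb/fdb.cluster"))
```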

(A.J. Beamon) #3

Yeah, this doesn’t seem like great behavior. Feel free to raise an issue on GitHub for somebody to take a look.

(A.J. Beamon) #4

I’m not super familiar with our TLS configuration options, but I tried playing around with this myself to see what I could find. For my first attempt, I tried the following:

  1. Update the cluster file to add the :tls suffix to each coordinator
  2. Create a certificate and key file (I ran the commands listed in this section)
  3. Add tls_certificate_file, tls_key_file, and tls_verify_peers to the fdbserver section of foundationdb.conf.
  4. I connected to the cluster using fdbcli, which required me to pass --tls_certificate_file and --tls_key_file. From a client, I think you would use either the corresponding client options or the environment variables.
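Step 1 can be scripted. A minimal sketch, assuming a single-line cluster file in the `description:ID@ip:port,ip:port,...` form with no comments; `add_tls_suffix` is a hypothetical helper, and every server and client must end up using the updated file.

```python
def add_tls_suffix(cluster_string):
    """Return a cluster-file string with ':tls' appended to every coordinator.

    Expected shape: 'description:ID@ip:port,ip:port,...'.
    Coordinators that already end in ':tls' are left unchanged.
    """
    prefix, sep, coordinators = cluster_string.strip().partition("@")
    if not sep:
        raise ValueError("cluster string has no '@' before the coordinator list")
    updated = []
    for coord in coordinators.split(","):
        coord = coord.strip()
        updated.append(coord if coord.endswith(":tls") else coord + ":tls")
    return f"{prefix}@{','.join(updated)}"


if __name__ == "__main__":
    # Example values only; real descriptions and IDs come from your fdb.cluster.
    print(add_tls_suffix("mydb:a1b2c3@10.0.0.1:4500,10.0.0.2:4500"))
    # -> mydb:a1b2c3@10.0.0.1:4500:tls,10.0.0.2:4500:tls
```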

Notice that I’m turning off peer verification, which allows me to get past the “VerifyError self signed certificate”.

Once I confirmed that worked, I tried turning back on peer verification by removing the tls_verify_peers parameter. I then got it to work by modifying foundationdb.conf to add tls_ca_file=/path/to/tls/cert.crt, which is one of the files generated in step #2 above.
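Why the CA file helps: peer verification only trusts certificates that chain to a CA in the trust store, and a self-signed certificate is its own CA, so it verifies only if that same certificate is listed as trusted. The sketch below shows the two modes with Python’s standard `ssl` module. It is an analogy to FDB’s settings, not FDB code, and `make_client_context` is hypothetical.

```python
import ssl


def make_client_context(ca_file=None, verify_peers=True):
    """Build a TLS client context mirroring the two setups described above.

    verify_peers=False  ~ peer verification turned off
    ca_file=...         ~ tls_ca_file: trust the certificates in this file
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # defaults to CERT_REQUIRED
    ctx.check_hostname = False  # this sketch checks the chain only, not hostnames
    if not verify_peers:
        ctx.verify_mode = ssl.CERT_NONE  # accept any certificate, self-signed included
        return ctx
    if ca_file is not None:
        # A self-signed certificate verifies only if it is in this file.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

With verification on and no CA file loaded, every handshake against a self-signed peer fails, which is the same shape as the “self signed certificate” error above.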

One thing I noticed, though, is that it doesn’t appear that backup agents can accept these TLS parameters, so my backup agent is not able to connect. I suppose it could be made to work using the environment variables, but I’ll raise an issue to support the command line options as well.

(Seph Gentle) #5

Good shout. I’ve added an issue on GitHub:

(Stephen Merity) #6

Thank you both @ajbeamon and @josephg, brilliant replies! I apologize for not replying earlier to give you proper thanks, but I was essentially away from the keyboard for two weeks.

Perfect step-by-step, @ajbeamon. I was able to replicate your TLS setup after creating a new FDB cluster :sunny: For now I’m throwing the kitchen sink (cert file + key file + CA file + no verification) at fdbcli, because my setup still doesn’t work without the CA file, even with tls_verify_peers=Check.Valid=0. But I really don’t mind, as it works :wink:
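For reference, that kitchen-sink invocation can be assembled like this. A sketch only: `build_fdbcli_tls_argv` is a hypothetical helper, the paths match the backup agent commands later in this thread, and I am assuming fdbcli accepts `--tls_verify_peers` with the same syntax as the tls_verify_peers setting in foundationdb.conf.

```python
import shlex


def build_fdbcli_tls_argv(cluster_file, cert, key, ca=None, verify_peers=None):
    """Assemble an fdbcli command line with the TLS options discussed above."""
    argv = ["fdbcli", "-C", cluster_file,
            "--tls_certificate_file", cert,
            "--tls_key_file", key]
    if ca is not None:
        argv += ["--tls_ca_file", ca]
    if verify_peers is not None:
        argv += ["--tls_verify_peers", verify_peers]
    return argv


if __name__ == "__main__":
    argv = build_fdbcli_tls_argv(
        "/etc/foundationdb/fdb.cluster",
        "/etc/foundationdb/fdb.pem",
        "/etc/foundationdb/private.key",
        ca="/etc/foundationdb/cert.crt",
        verify_peers="Check.Valid=0",
    )
    # Print a shell-safe version; pass argv to subprocess.run() to execute it.
    print(" ".join(shlex.quote(a) for a in argv))
```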

I also saw that @bnamasivayam has already fixed “Support TLS command line options in backup”, and the fix is waiting to be merged. So exciting!

I am curious, however, whether there’s a way to specify in the configuration file the TLS settings that should be passed to the backup agent, since it’s started automatically by fdbmonitor. I’m guessing the laziest setup (once the command-line options are merged) would be:

command = /usr/lib/foundationdb/backup_agent/backup_agent --tls_certificate_file /etc/foundationdb/fdb.pem --tls_key_file /etc/foundationdb/private.key --tls_ca_file /etc/foundationdb/cert.crt

Until then (for my own and others’ reference), backups can be performed by starting a “TLS-happy” backup agent with the TLS environment variables set, then running the standard backup operations as usual:

FDB_TLS_CERTIFICATE_FILE="/etc/foundationdb/fdb.pem" FDB_TLS_KEY_FILE="/etc/foundationdb/private.key" FDB_TLS_CA_FILE="/etc/foundationdb/cert.crt" /usr/lib/foundationdb/backup_agent/backup_agent -C /etc/foundationdb/fdb.cluster
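The same environment-variable approach can be scripted, e.g. from a supervisor that starts the agent. A sketch, assuming the variable names and paths from the command above; `tls_env` and `start_backup_agent` are hypothetical helpers, and nothing is launched unless you call `start_backup_agent`.

```python
import os
import subprocess

# Paths taken from the command above; adjust for your installation.
BACKUP_AGENT = "/usr/lib/foundationdb/backup_agent/backup_agent"
CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"


def tls_env(cert, key, ca, base=None):
    """Return a copy of `base` (default: os.environ) with the FDB TLS variables set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "FDB_TLS_CERTIFICATE_FILE": cert,
        "FDB_TLS_KEY_FILE": key,
        "FDB_TLS_CA_FILE": ca,
    })
    return env


def start_backup_agent(cert, key, ca, cluster_file=CLUSTER_FILE):
    """Launch a TLS-enabled backup agent in the background; returns its Popen handle."""
    return subprocess.Popen([BACKUP_AGENT, "-C", cluster_file],
                            env=tls_env(cert, key, ca))


if __name__ == "__main__":
    env = tls_env("/etc/foundationdb/fdb.pem", "/etc/foundationdb/private.key",
                  "/etc/foundationdb/cert.crt", base={})
    print({k: v for k, v in env.items() if k.startswith("FDB_TLS_")})
```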

I’ll look at adding a few small bits to FoundationDB’s TLS docs to make it easier for the next person who doesn’t have a @ajbeamon to appear magically out of nowhere to help them :slight_smile:

tl;dr: I honestly can’t tell you how excited I am. As noted in my original post, I’m a strong believer in FoundationDB’s layered philosophy, but I’m almost equally amazed by the responsiveness of the forums and community. I’ll certainly be contributing.

(A.J. Beamon) #7

Once the command-line options are available, you should be able to add each parameter as a line in the configuration, similar to how it is being done in the fdbserver example in the previous post. Each of those gets converted to a command-line option that is passed to the binary when run.
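As a rough illustration of that conversion, here is a sketch that turns a backup_agent section into a command line. It approximates the behavior described above and is not fdbmonitor’s actual code: `section_to_argv` is hypothetical, it only merges a numbered section such as `[backup_agent.1]` with its parent `[backup_agent]`, it ignores `[general]` and any keys fdbmonitor reserves for itself, and it emits `--key value` pairs in the same form as the backup_agent command line earlier in the thread.

```python
import configparser

# Hypothetical foundationdb.conf excerpt using the paths from this thread.
CONF = """
[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
tls_certificate_file = /etc/foundationdb/fdb.pem
tls_key_file = /etc/foundationdb/private.key
tls_ca_file = /etc/foundationdb/cert.crt

[backup_agent.1]
"""


def section_to_argv(conf_text, section):
    """Merge [parent] and [parent.N], then turn every key except `command` into a flag."""
    parser = configparser.ConfigParser(interpolation=None)  # leave $ID etc. untouched
    parser.read_string(conf_text)
    parent = section.split(".", 1)[0]
    merged = {}
    if parent != section and parser.has_section(parent):
        merged.update(parser[parent])  # inherited defaults
    merged.update(parser[section])     # numbered section overrides its parent
    argv = [merged.pop("command")]
    for key, value in merged.items():
        argv += [f"--{key}", value]
    return argv


if __name__ == "__main__":
    print(section_to_argv(CONF, "backup_agent.1"))
```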

Glad to hear you were able to get it running! If at some point you want to pursue solving this last problem (which may be helpful for the documentation part), let us know and we can try to figure out what’s going on.