FoundationDB

Production deployment


(Thomas Johson) #1

Hi,

After weeks of reading the documentation and playing with test code, I think I’m ready to deploy it in production. However, I’m having a hard time figuring out the deployment scheme (I browsed through the sitemap, this forum, and also GitHub issues, and can’t find any meaningful resource on how to structure a production cluster).

So here is a design I’m thinking about:

  • 5 servers
  • 8 cores each
  • 52GB ram each
  • 375GB local SSD as front cache (bcache)
  • 3TB SSD as storage
  • double redundancy

What I’m struggling to understand is how to distribute the communication and how to properly setup different classes of servers.

As each FDB process is tied to a CPU core, each one also has a different port. So to expose all of them to each other, do you put a proxy in front of them? Let’s say I have 8 FDB processes per server starting at port 4500: do I put a proxy on, say, port 3333 in front of them and round-robin the load on the server? Or is this somehow done on the client, like it is with Aerospike and a couple of others? Additionally, as I have 5 servers, do I again load-balance the traffic evenly between all the servers through a proxy, or is this done on the client as well?

I suspect that both are done on the client through the cluster file, but I’m just guessing here.

What are the best practices when deploying and managing this kind of cluster? I haven’t found any extensive provisioning scripts (Ansible or others), so I’m trying to figure out the best way to do so, especially as the cluster is bootstrapped through the cluster file. Do you produce it manually before the deployment and then just distribute it everywhere, or is there a better way?
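For context, a cluster file is a single line of the form description:ID@host:port,host:port,... listing the coordinators. A provisioning step could generate it once and then copy it to every machine. A minimal sketch (the description, ID, and coordinator addresses below are all made-up examples):

```shell
#!/bin/sh
# Sketch: generate a one-line fdb.cluster file to distribute to all hosts.
# DESC and ID are arbitrary alphanumeric strings; the host:port pairs are
# the coordinators. All values here are invented for illustration.
DESC="mycluster"
ID="AbCd1234"
COORDINATORS="10.0.0.1:4500,10.0.0.2:4500,10.0.0.3:4500"
echo "${DESC}:${ID}@${COORDINATORS}" > fdb.cluster
cat fdb.cluster
```

From there, distributing the file with scp, Ansible, or whatever configuration management you use keeps every host agreeing on the same coordinators.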

And finally, what’s the best structure of process classes in this kind of cluster? I read the guide but am still not sure what the best, at least initial, structure is. Does it depend on the purpose of the cluster?

My primary goal right now is to use it as a cache for our microservices, but I would like to expand the use case later on, replacing most of our NoSQL deployments (including Redis).

Any suggestions are greatly appreciated.


(Viacheslav Biriukov) #2

The fdb client library is in charge of all this magic. You just need to provide a proper starting-point file, fdb.cluster, with the coordinators.

I did it manually.

Look at my topic “Segmentation fault error and broken cluster”; there is some info about process classes there.


(Thomas Johson) #3

This is very helpful, thank you.

You mentioned that you manage your cluster manually. Any good tips on best practices? Do you just spin up one server and then copy the cluster file from that server to the others, or do you do it some other way?

I read through the thread; there is a lot of useful information there.


(Alec Grieser) #4

As @brk0v mentions, all of this kind of load balancing is done through the client in a way that is transparent to the user. In particular, the way this works is that each process in the cluster uses the cluster file as a discovery mechanism. Essentially, all of the servers get registered by a central authority (discovered through the cluster file) and given roles. Then the clients talk to that same central authority (directly or indirectly) and then can do things like load balance their subsequent requests amongst the cluster.

My guess is that because of how the networking works in fdb, adding a proxy between the client and servers would probably confuse the client. For example, each client reads keys by talking directly to a storage server that has the key it needs, whose address it gets through this registry information. If the registered IP and port don’t match the reachable ones (because traffic goes through a load-balancing proxy), you won’t be able to reach that server.

Creating a cluster file manually and then distributing it out sounds reasonable to me as a cluster file distribution scheme. There’s more on the format of this file in the Administration Guide. You could spin one up and then copy it over, but I suspect that will end up being more hassle than it’s worth in the long run (though maybe it’s fine for an initial cluster).

I’m not sure I totally understand this question, but are you asking how to configure the process classes of each fdbserver process? I think the most helpful thing that may not be obvious from the docs is that the transaction log processes use way more disk I/O than the storage server processes, so for best performance, you will probably want to make sure that those processes are using a different disk (if possible, a better, faster disk) than the storage servers.

If you’re looking for the number of each role to include (e.g., logs=4, proxies=5, etc.), that’s harder to give general advice on, as it will depend on the number of machines in your cluster. Are you at 5 servers with 1 disk each? You might need to run an additional process per server of the “stateless” class (they won’t all get used). Also, it might be good if you can run at least three transaction logs and three storage servers (so that you can lose a disk from either one and still have two of each, the required number for double redundancy). So 6 disks (three large for the storage servers and three small for the transaction logs, if possible) would probably be my advice.
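To make the disk-separation advice concrete, here is a hedged sketch of what the per-process sections of /etc/foundationdb/foundationdb.conf could look like on one machine. The mount points and port numbers are invented; the [fdbserver.&lt;port&gt;] sections with class and datadir options are the standard conf mechanism for assigning process classes:

```ini
[fdbserver]
command = /usr/sbin/fdbserver
logdir = /var/log/foundationdb
datadir = /var/lib/foundationdb/data/$ID

# Transaction log on the small, fast disk (example mount point)
[fdbserver.4500]
class = transaction
datadir = /mnt/fast-ssd/foundationdb/4500

# Storage servers on the large disk
[fdbserver.4501]
class = storage
datadir = /mnt/big-ssd/foundationdb/4501

[fdbserver.4502]
class = storage
datadir = /mnt/big-ssd/foundationdb/4502

# Spare stateless process; the cluster recruits it only if needed
[fdbserver.4503]
class = stateless
```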


(Thomas Johson) #5

Thank you Alec,

This was a truly exhaustive answer. Really appreciate it.

How do you structure this, though? It almost sounds like I would have to manually assign different process classes on each server and point them to different hard drives. Sounds like a devops nightmare. I wish there were an easier way to do this (maybe get inspired by tablespaces).

Would you just run this through fdbcli or rather through the config? I understand that this fine-tuning eventually gives the performance that is so attractive, but deploying it is no easy feat.
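For what it’s worth, the split is roughly: process classes live in each server’s foundationdb.conf, while cluster-wide settings such as the redundancy mode and role counts are set once through fdbcli. A sketch of the fdbcli side (this needs a running cluster, so it is illustration only, not something to copy blindly):

```shell
# Set redundancy mode and storage engine for the whole cluster
fdbcli --exec 'configure double ssd'
# Ask the cluster to recruit three transaction logs
# (role counts are cluster-wide, not per machine)
fdbcli --exec 'configure logs=3'
# See which process was recruited for which role
fdbcli --exec 'status details'
```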

Maybe you guys could invest a bit more into creating some provisioning scripts? If you give us (me) some pointers, I’m happy to contribute. It’s hard to start when you know as little as I do, though.

Anyway, I really appreciate your answer. I’ll try my best to figure this out. I really want to give FDB a fighting chance.


(matthew zeier) #6

We build a seed file in Terraform and stuff it in an S3 bucket for a newly launched cluster. During the lifecycle of our application, we continually update the S3 bucket.

New hosts launched to expand the cluster pull the seed file down and auto-join the existing cluster.
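The bootstrap step on a new host can then be as small as a cloud-init fragment along these lines (the bucket name and paths here are invented for illustration; fdb.cluster is the seed file produced above):

```yaml
# Hypothetical cloud-init user data for a freshly launched FDB host.
runcmd:
  - aws s3 cp s3://example-fdb-seed/fdb.cluster /etc/foundationdb/fdb.cluster
  - chown foundationdb:foundationdb /etc/foundationdb/fdb.cluster
  - systemctl restart foundationdb
```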