After weeks of reading the documentation and playing with test code, I think I’m ready to deploy FDB in production. However, I’m having a hard time figuring out the deployment scheme (I browsed through the sitemap, this forum, and the GitHub issues, and can’t find any meaningful resource on how to structure a production cluster).
So here is a design I’m thinking about:
8 cores each
52GB RAM each
375GB local ssd as front cache (bcache)
3TB SSD cache as storage
What I’m struggling to understand is how to distribute the communication and how to properly set up different classes of servers.
As each FDB instance is tied to a CPU, it also has a different port. So to expose all of them to each other, do you put a proxy in front of them? Let’s say I have 8 FDB processes per server, starting at port 4500. Do I then put a proxy on, say, port 3333 in front of them and distribute the load on the server round-robin? Or is this somehow done on the client, as it is with Aerospike and a couple of others? Additionally, as I have 5 servers, do I again load balance the traffic evenly between all the servers through a proxy, or is this also done on the client?
I suspect that both are done on the client through the cluster file, but I’m just guessing here.
What are the best practices when deploying and managing this kind of cluster? I haven’t found any extensive provisioning scripts (Ansible or others), so I’m trying to figure out the best way to do it, especially as the cluster is populated through the cluster file. Do you produce the file manually before the deployment and then just distribute it everywhere, or something else?
And finally, what’s the best structure of the process classes in this kind of cluster? I read the guide but I’m still not sure what the best structure is, at least initially. Does it depend on the purpose of the cluster?
My primary goal right now is to use it as a cache for our microservices, but I would like to expand the use case later on, replacing most of our NoSQL deployments (including Redis).
You mentioned that you manage your cluster manually. Any good tips on best practices? Do you just spin up one server and then copy the cluster file from that server to the others, or do you do it some other way?
Read through the thread, a lot of useful information there.
As @brk0v mentions, all of this kind of load balancing is done through the client in a way that is transparent to the user. In particular, the way this works is that each process in the cluster uses the cluster file as a discovery mechanism. Essentially, all of the servers get registered by a central authority (discovered through the cluster file) and given roles. Then the clients talk to that same central authority (directly or indirectly) and then can do things like load balance their subsequent requests amongst the cluster.
My guess is that, because of how networking works in FDB, adding a proxy between the client and the servers would probably confuse the client. For example, each client reads a key by connecting directly to a storage server that holds it, and it gets that server’s address from this registry information. If the registered IP and port don’t match the address the client actually has to use (because the server sits behind a load-balancing proxy), the client won’t be able to reach that server.
Creating a cluster file manually and then distributing it out sounds reasonable to me as a cluster file distribution scheme. There’s more on the format of this file in the Administration Guide. You could spin one up and then copy it over, but I suspect that that will end up being more hassle than it’s worth in the long run (though maybe it’s fine for an initial cluster).
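To make the "create it manually, then distribute it" option a bit more concrete: the cluster file is a single line of the form description:ID@IP:PORT,IP:PORT,... listing the coordinators. Below is a minimal Python sketch of generating that line ahead of deployment. The helper name, the description, the ID, and the IP addresses are all made up for illustration, so treat it as a starting point rather than a finished provisioning script.

```python
import re


def make_cluster_file_line(description, cluster_id, coordinators):
    """Build the single line of an fdb.cluster file.

    Format: description:ID@IP:PORT,IP:PORT,...
    """
    # The description allows letters, digits and underscores;
    # the ID is alphanumeric only.
    if not re.fullmatch(r"[A-Za-z0-9_]+", description):
        raise ValueError("description must be alphanumeric or underscore")
    if not re.fullmatch(r"[A-Za-z0-9]+", cluster_id):
        raise ValueError("ID must be alphanumeric")
    if not coordinators:
        raise ValueError("at least one coordinator is required")
    addresses = ",".join(f"{ip}:{port}" for ip, port in coordinators)
    return f"{description}:{cluster_id}@{addresses}"


# Hypothetical example: three of the five machines act as coordinators.
line = make_cluster_file_line(
    "prod_cache",
    "a1b2c3d4",
    [("10.0.0.1", 4500), ("10.0.0.2", 4500), ("10.0.0.3", 4500)],
)
print(line)
```

The same line would then be copied to every server and client machine (on Linux packages the default location is /etc/foundationdb/fdb.cluster).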
I’m not sure I totally understand this question, but are you asking how to configure the process class of each fdbserver process? I think the most helpful thing that may not be obvious from the docs is that the transaction log processes use way more disk I/O than the storage server processes, so for best performance you will probably want those processes to use a different disk (if possible, a better, faster disk) than the storage servers. If you’re looking for the number of each role to include (e.g., logs=4, proxies=5, etc.), it’s harder for me to give general advice, because that depends on the number of machines in your cluster. Are you at 5 servers with 1 disk each? You might need to run an additional process per server of the “stateless” class (they won’t all get used). Also, it might be good if you can run at least three transaction logs and three storage servers, so that you can lose a disk from either group and still have two of each, which is the number required for double redundancy. So 6 disks (three large ones for storage and three small ones for the transaction logs, if possible) would probably be my advice.
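One way to express that split is in foundationdb.conf, where each process gets its own [fdbserver.<port>] section and can be given a class and a datadir. Here is a rough Python sketch that renders such sections for one machine. The helper name, the ports, and the mount points are assumptions for illustration; only the section naming and the class and datadir keys come from the configuration file format.

```python
def render_fdbserver_sections(layout):
    """Render [fdbserver.<port>] sections for foundationdb.conf.

    layout: list of (port, process_class, datadir) tuples.
    """
    blocks = []
    for port, process_class, datadir in layout:
        blocks.append(
            f"[fdbserver.{port}]\n"
            f"class = {process_class}\n"
            f"datadir = {datadir}\n"
        )
    return "\n".join(blocks)


# Hypothetical layout for one machine: the storage process lives on the
# large disk, the transaction log on a separate (faster) disk, plus one
# stateless process. The mount points are assumptions.
layout = [
    (4500, "storage", "/mnt/storage/foundationdb/4500"),
    (4501, "transaction", "/mnt/tlog/foundationdb/4501"),
    (4502, "stateless", "/mnt/storage/foundationdb/4502"),
]
conf = render_fdbserver_sections(layout)
print(conf)
```

Generating the sections from a small table like this keeps the per-server differences in one place, which should make it easier to template with whatever provisioning tool you end up using.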
This was a truly exhaustive answer. Really appreciate it.
How do you structure this though? It almost sounds like I would have to manually assign different process types on each server and point them to different hard drives. Sounds like a devops nightmare. I wish there were an easier way to do this (maybe get inspired by tablespaces).
Would you just run this through fdbcli, or rather through the config? I understand that this fine tuning is what eventually gives the performance that is so attractive, but deploying this is no easy feat.
Maybe you guys could invest a bit more into creating some provisioning scripts? If you could give us (me) some pointers, I’m happy to contribute. It’s hard to start when you know as little as I do, though.
Anyway, I really appreciate your answer. Will try my best to figure this out. I really want to give FDB a fighting chance.
Yeah, I also suggest we make some kind of automated setup script to simplify the deployment process and make it friendlier to FDB newbies. FDB in particular has many decoupled roles/processes, so it is hard for people to figure out how to distribute them and use machine resources efficiently to achieve the best performance.
There is a trick that many existing FDB users have used: do not shoot for the best bare performance (e.g. transaction latency) from the start. Instead: 1) benchmark your setup, both a) the bare FDB cluster and b) the application using the FDB cluster; 2) set up monitoring (AFAIU, available disk space is the top priority); 3) configure the backup system; and 4) plan and script a recovery procedure.
Make it work, make it right, then make it fast
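As a very small illustration of the monitoring step, here is a Python sketch that checks free disk space with the standard library only. The 20% threshold and the function names are assumptions, and a real setup would feed this into whatever alerting system you already use.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(fraction_free, threshold=0.2):
    """True when free space has dropped below the alert threshold."""
    return fraction_free < threshold


# The current directory is checked here; in practice this would be
# each FDB data directory.
fraction = free_fraction(".")
print(f"free: {fraction:.1%}, alert: {is_low_on_space(fraction)}")
```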
FDB offers awesome potential for growth, both in number of users AND in product features. There is a clear way to scale up, but to scale up you first need to deliver, and FDB is also great for delivering complex features even without a SQL layer. Getting a good grasp of the public interface, whatever the programming language, is very economical.