So, I have been stress testing rev 1 of deploying FDB to AWS. This subject is covered piecemeal across a load of forums, and I'm surprised there is no one-stop documentation or set of templates for spinning up a reliable cluster.
Rev 1:
3 x c5d.large servers (small, but this is just for testing).
Use the ephemeral NVMe for storage, double redundancy mode, backups to S3 (see the fdbcli sketch after this list).
Let FDB determine process types.
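In fdbcli terms that boils down to roughly the following. The bucket, credentials, and the exact blobstore URL format are placeholders here and depend on your FDB version, so treat this as a sketch rather than a recipe:

```
# Create the database with the ssd storage engine and double redundancy
# ("configure new ..." only on first creation; later changes drop the "new").
fdbcli --exec "configure new double ssd"

# Kick off a backup to S3 via FDB's blobstore URL (placeholder credentials,
# endpoint, and bucket; check the backup docs for your FDB version).
fdbbackup start -d "blobstore://KEY_ID:SECRET@s3.us-east-1.amazonaws.com/rev1?bucket=my-fdb-backups"
```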
I wrote CloudFormation scripts to do the following (a rough sketch of the per-server bootstrap follows the list):
1. launch the first server
2. mount the NVMe drive
3. install FDB, change the config to point logs and data at the NVMe drive, and run 2 processes (ports 4500 and 4501)
4. push the cluster file to S3
5. launch the remaining servers
6. repeat steps 2-3 on each
7. pull the cluster file from S3
8. configure ssd, double
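For concreteness, the per-server part of those scripts boils down to something like the sketch below. The device name, mount point, and bucket are assumptions from my setup, and the package install step is elided:

```
#!/usr/bin/env bash
set -euo pipefail

# Assumed device/paths for the instance-store NVMe and the shared S3 bucket.
DEV=/dev/nvme1n1
MNT=/mnt/fdb
BUCKET=s3://my-fdb-bucket    # placeholder bucket name

# Steps 2-3: format and mount the ephemeral drive, then point FDB at it.
mkfs.ext4 -F "$DEV"
mkdir -p "$MNT"
mount -o noatime "$DEV" "$MNT"

# (FDB client/server packages installed here; omitted.)
mkdir -p "$MNT/data" "$MNT/logs"
chown -R foundationdb:foundationdb "$MNT"
sed -i "s|^datadir = .*|datadir = $MNT/data/\$ID|" /etc/foundationdb/foundationdb.conf
sed -i "s|^logdir = .*|logdir = $MNT/logs|"        /etc/foundationdb/foundationdb.conf

# A second [fdbserver.<port>] section gives the second process on 4501.
echo '[fdbserver.4501]' >> /etc/foundationdb/foundationdb.conf
service foundationdb restart

# Step 4 on the first server (step 7 on the rest is the reverse copy).
aws s3 cp /etc/foundationdb/fdb.cluster "$BUCKET/fdb.cluster"
```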
This hummed along fine for some time. Then one server crashed. I rebooted it and the cluster returned, but it could not confirm data health. Next I got an OOM fault from the same server on startup, and the cluster limped along while I tried to find ways to get it healthy again. Then the server bricked. There was (I think) data corruption on the ephemeral drive, and AWS dumped the disk and rebooted the instance with a new one. This destroyed all the data on it and bricked the server, because the /etc/fstab entry for the NVMe drive pointed to a UUID that no longer existed.
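In hindsight, at least the fstab part of that brick is avoidable: mounting the instance-store volume with nofail, or preparing it from a boot script instead of pinning a UUID, lets the instance come back up when AWS swaps the disk. A sketch of what I'm moving to (device name assumed):

```
# /etc/fstab alternative: "nofail" keeps boot from dropping into emergency
# mode when the referenced device/UUID is gone after a disk swap.
#   /dev/nvme1n1  /mnt/fdb  ext4  defaults,noatime,nofail  0  2

# Or skip fstab entirely and (re)prepare the drive at boot, so a fresh
# replacement disk just gets re-formatted instead of failing a UUID match.
DEV=/dev/nvme1n1
if ! blkid "$DEV" >/dev/null 2>&1; then
  mkfs.ext4 -F "$DEV"   # brand-new replacement disk, no filesystem yet
fi
mkdir -p /mnt/fdb
mount -o noatime "$DEV" /mnt/fdb
```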
Is anyone else running FDB on ephemeral NVMe drives and has run into similar issues? Or is EBS the only real option for storage?
I know of a user running on ephemeral disks who saw an IO controller failure (or a similar fault) that resulted in a hung cluster, because the disk had a tlog on it. Other than that instance (which was fixed by killing the process), I have not heard of complaints similar to yours. That was also on the c5d family, I think, though on larger instances.
Your configuration of only 4GB of memory and 2 processes is not ideal; the recommendation is a minimum of 4GB per process. Instead of running 2 processes per host on that instance size, try running just one. This is probably why you OOM'd if there was an active workload stressing the cluster; if it were idle, that would be another story…
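On a 4GB host that basically means keeping foundationdb.conf down to a single [fdbserver.&lt;port&gt;] section. A minimal sketch, mirroring the packaged defaults except for the NVMe paths from your post (those paths are assumptions):

```
# Minimal single-process foundationdb.conf for a 4GB host (sketch only).
cat > /etc/foundationdb/foundationdb.conf <<'EOF'
[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
cluster-file = /etc/foundationdb/fdb.cluster
restart-delay = 60

[fdbserver]
command = /usr/sbin/fdbserver
public-address = auto:$ID
listen-address = public
datadir = /mnt/fdb/data/$ID
logdir = /mnt/fdb/logs

[fdbserver.4500]
# one fdbserver section -> one process on this host
EOF
service foundationdb restart
```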
100% agreed. The config is simply for a few stages of testing. I'm just ironing out the automation bugs and testing backup and restore under a minimal workload, so I thought I'd save some pennies with the small instances. The OOM was surprising, but it appears some processes are memory intensive (the transaction servers, it looks like). The intent WAS to move to the following configuration:
3 x i3.xlarge - storage class
3 x i3.large - transaction class
3 x c5.xlarge - stateless class
But given the input that 8GB per process is the actual requirement, I'll have to rethink this. (The role pinning itself looks like a one-line knob per host; sketch below.)
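For reference, the role pinning in that layout would have been nothing more than the class option in foundationdb.conf, set once per host type. Something like this, purely illustrative:

```
# Pin all processes on a host to one class by adding "class = ..." to the
# global [fdbserver] section (per-port sections can override it). The sed
# one-liner is just illustrative; a real template would bake it into the conf.
sed -i '/^\[fdbserver\]$/a class = storage' /etc/foundationdb/foundationdb.conf   # i3.xlarge tier
# class = transaction on the i3.large tier, class = stateless on the c5.xlarge tier
service foundationdb restart
```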
What is making this difficult to diagnose is that the instance bricked: no SSH, no logs, etc. Here's hoping the original failure that led to the disk mount error was simply the OOM.
For some background on this project: I selected FDB for the flexibility, scalability, and speed I can gain on some unique data models used for ML and BI. A lot of the big data models are write-once read-many, and using the ephemeral NVMe drives gives us a cost and speed boost that is worth it if it can be made to work. Since the development team is small, our API also uses FDB for its resources. After a series of events, I'm now having to take on the deployment of FDB in the cloud, so I need to become an FDB DBA (unless someone has made a managed FDB product in the cloud??). Right now I'm learning all I can by poring over these forums, finding bits of information, and piecing them together. Any interest in starting a "best practices in cloud deployment" guide? I'd help all I can.
For roughly the same budget I would deploy 5 i3.xlarge. The problem with specializing machines for roles in a small cluster like that is that you've actually increased the number of machines whose failure will cause a recovery, without gaining any extra machine redundancy for your data. I would also advise a triple replicated configuration instead of double, unless this data is being imported from some external source you can easily get back.
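The redundancy change itself is a single fdbcli command once the machines are there, roughly:

```
# Switch from double to triple replication; FDB re-replicates in the
# background. Watch "status details" until the data is fully replicated.
fdbcli --exec "configure triple ssd"
fdbcli --exec "status details"
```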
That configuration with (3 x 2) cores of transaction processes is probably more than you’d need to service (3 x 4) cores of storage processes.
If you must specialize machines for roles, I would put transaction and stateless together, on as few of the largest servers as you can.
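That is, on the few non-storage boxes, give one process the transaction class and one the stateless class, and leave everything else as pure storage. A sketch (ports arbitrary, and the conf surgery is illustrative rather than a polished recipe):

```
# Sketch: one host running both a transaction (tlog) process and a stateless
# process. Drop the stock empty [fdbserver.4500] section and add classed ones.
sed -i '/^\[fdbserver\.4500\]$/d' /etc/foundationdb/foundationdb.conf
cat >> /etc/foundationdb/foundationdb.conf <<'EOF'
[fdbserver.4500]
class = transaction

[fdbserver.4501]
class = stateless
EOF
service foundationdb restart
```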
Fair point. The idea of configuring so many transaction processes was to be able to scale as our customer base converts to this solution: more data? Just add storage servers. In my mind a tiered scaling approach would keep changes manageable up to the point where scaling the cluster's core infrastructure would be a great problem to have. It is possible for us to scale from single-digit TBs to hundreds of TBs of managed data within a year or two.
This scheme does seem overly complicated now; I can always configure machine roles later if needed. Thanks for the input! I've outlined where I'm getting my info, so if you have pointers to additional resources on best practices, I'd appreciate them.