Hi,
I’m working on a research project where I’m looking into using FoundationDB as a database backend for Samba, the open-source implementation of Microsoft’s SMB file server protocol.
I’m struggling to achieve scalability when going from one fdb node up to 32. When running an SMB benchmark I’m only seeing a roughly twofold increase in performance going from one fdb node to 32, maxing out at 24k fdb txn/s with 32 nodes regardless of changing the number of log or storage servers.
Some background: Windows maintains additional per-file-handle state like caching info and mandatory share modes which can’t be mapped to POSIX, so Samba has to store it in a database.
Traditionally, on a single-node server we use our homegrown TDB, which is basically an mmapped hash table with chaining. But Samba also supports clustering on top of a shared cluster filesystem like Ceph, GlusterFS, GPFS or others.
There, in order to maintain the Windows-specific state in a cluster-coherent fashion, we use our homegrown clustered database “ctdb”.
This works quite well, with exceptional performance and reasonable scalability. However, ctdb cheats: database records are not replicated, so the database will lose data on node failure. Historically these crash semantics have been “good enough” for the fileserver, but in order to follow suit with Microsoft, who specified and implemented the so-called “Transparent Failover” for the SMB protocol, Samba needs a failsafe distributed database with at least linearizable consistency.
Which brings us to FoundationDB.
I’m currently running benchmarks of a Samba prototype that connects to fdb via Python C bindings in Samba, which hook into a Python script that uses the fdb Python module.
This generally works: in the past I was able to achieve up to 2,000 SMB protocol open/close ops per second with a proof-of-concept three-node fdb cluster in a local datacenter that was not particularly well configured or tuned.
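For context, the per-handle database traffic looks roughly like the sketch below. It uses the standard fdb Python bindings, but the key/value layout, API version and function names are illustrative placeholders, not the actual Samba schema:

```python
import fdb

fdb.api_version(630)  # must match the installed client library
db = fdb.open()       # default cluster file

@fdb.transactional
def record_open(tr, handle_key, state_blob):
    # one small blind write per SMB open: ~24-byte key, <= 1 KB value
    tr[handle_key] = state_blob

@fdb.transactional
def record_close(tr, handle_key):
    # clear the per-handle record again on SMB close
    del tr[handle_key]

# illustrative usage:
# record_open(db, b"open-handle/0123456789abcdef", b"\x00" * 512)
# record_close(db, b"open-handle/0123456789abcdef")
```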
To allow testing with larger Samba and FoundationDB clusters I’m now using Terraform on Azure, where we can set the cluster size just by changing a few Terraform variables.
Unfortunately, I’m not seeing the scalability I was expecting. I’m able to drive up to 24,000 txn/s with the following config:
- 32 VMs in Azure
- each VM has two cores and two disks
- each VM configured with two fdb processes
- datadir goes to the dedicated second disk
- the data disk does roughly 8k IOPS (measured with fio)
- assuming performance would benefit from dedicated log processes, I’ve configured the first fdb process on every node with class=log and left the second one unspecified (see the foundationdb.conf sketch after this list)
- db access pattern is completely non-concurrent, 100% writes, small keys (24 bytes), small records (1 KB max)
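The relevant part of each VM’s foundationdb.conf then looks roughly like this; just a sketch, with ports and paths being illustrative:

```
[fdbserver]
command = /usr/sbin/fdbserver
## datadir on the dedicated second disk
datadir = /data/foundationdb/$ID
logdir = /var/log/foundationdb

## first process: log class
[fdbserver.4500]
class = log

## second process: class left unspecified
[fdbserver.4501]
```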
I’ve been scouring the forums for advice on how to properly configure and tune fdb, which at least led me to switch from fat VMs with many cores to two-core VMs, giving a better disk/CPU ratio.
It seems FoundationDB gets overwhelmed once more than a few thousand clients are connected. I had to tweak a few system resource limits (max open files, POSIX AIO contexts) to get past 1,000 clients.
I’m able to run my benchmark tool with 4,000 clients, which translates to 4,000 fdb connections, but beyond that operation latency skyrockets and I’m getting application failures. Is fdb supposed to handle a large number of clients, say 5k, 50k or more?
Looooong story short: am I hitting scalability limits or is my setup just not properly configured?
Thanks in advance for any pointers!