Hi there, I have been troubleshooting some strange queue buildups on the storage server (SS) processes in my FDB cluster (ssd-2 engine, double replication, version 6.0.18). So far I have not been able to spot anything obvious that points to a cause.
It appears that on one or two SS processes the queue size starts to grow gradually (sometimes on two nodes, sometimes just on one) until it reaches saturation.
One component of our data model repeatedly adds small key-value pairs under a common prefix (which would most likely land on a single shard).
This cluster has 10 machines hosting 2 fdb processes each. One of the two processes on each machine hosts an SS. The write volume is quite low: ~700 commits/sec with ~3000 key-writes/sec. The byte write rate is about 12 MB/sec across the cluster.
I have also checked that input.bytes is more or less the same for all SS processes (within ±20% of the mean).
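For reference, this is roughly how I am comparing the per-SS input rate and queue size. It is just a sketch and assumes the 6.x status json layout where each storage role exposes input_bytes and durable_bytes counters (queue size being roughly the difference of the two counters); adjust the field paths if your version differs:

fdbcli --exec 'status json' | jq -r '
  .cluster.processes | to_entries[]
  | .value.address as $addr
  | .value.roles[] | select(.role == "storage")
  | [$addr, (.input_bytes.hz | floor), (.input_bytes.counter - .durable_bytes.counter)]
  | @tsv'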
There are no severity 30/40 logs, and only very occasional severity 20 logs. I do see this kind of log with moderate frequency on the lagging nodes:
<Event Severity="10" Time="1561563010.610610" Type="SSSlowTakeLock2" ID="6adfc187592bcea7" From="0000000000000000" Duration="0.236079" Version="6531103871665" Machine="10.191.15.15:4500" LogGroup="default" Roles="SS" />
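In case it is useful, this is how I am counting those events per machine (just a sketch; the trace log path is whatever your fdbserver logdir is, /var/log/foundationdb here is an assumption):

grep -h 'Type="SSSlowTakeLock2"' /var/log/foundationdb/trace.*.xml \
  | grep -o 'Machine="[^"]*"' | sort | uniq -c | sort -rn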
A few things that may be interesting:
- Things had been running fine on this cluster for a couple of weeks (the cluster had about 2TB of replicated data, ~850GB of unreplicated KV).
- A few days back we scheduled a cleaner task that deletes old data from the store. The “old” data is spread randomly across the whole keyspace. We have been seeing this peculiar behavior ever since.
As I am running only one SS per disk, could I be saturating the disk with the load mentioned above? I see that we are doing about 1100-1300 disk write IOPS (not transactions or keys, but actual disk writes) on average per SS node.
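Back-of-the-envelope (assuming the 12 MB/sec above already includes replication): that is ~1.2 MB/sec per SS, i.e. only ~300 4K pages/sec of logical writes, so ~1200 disk write IOPS would imply roughly 4x amplification from the B-tree, which may or may not be normal. To see whether the device itself is the bottleneck I am also keeping an eye on it with iostat (watching w/s, w_await and %util for the device backing the SS data files):

iostat -dx 1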
Running the following command shows that I get about 1200 IOPS on the disk. I am using an iodepth of 1, as I assume that is the right way to simulate fdbserver's write behavior (I've read here that it keeps only one outstanding write request at a time). Could this indicate that I am saturating the disk, even at such low write rates? And could deleting randomly distributed KV rows on an ongoing basis result in less useful output per IO (due to emptier disk pages)? Just shooting in the dark here. (There is also a higher-iodepth comparison sketched after the fio output below.)
ubuntu@platform6:~$ fio --filename=/var/lib/ubuntu/fio.test --direct=1 --rw=randwrite --ioengine=libaio --bs=4K --iodepth=1 --name=bw-test --size=100M
bw-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/6093KB/0KB /s] [0/1523/0 iops] [eta 00m:00s]
bw-test: (groupid=0, jobs=1): err= 0: pid=2743: Wed Jun 26 16:13:49 2019
write: io=102400KB, bw=4611.8KB/s, iops=1152, runt= 22204msec
slat (usec): min=10, max=8326, avg=24.91, stdev=71.05
clat (usec): min=3, max=709228, avg=838.87, stdev=6296.14
lat (usec): min=313, max=709243, avg=864.18, stdev=6296.50
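For comparison, I am planning to rerun the same job with a deeper queue, just to see how much headroom the device has beyond a single outstanding write (same parameters as above, only iodepth and the job name changed):

fio --filename=/var/lib/ubuntu/fio.test --direct=1 --rw=randwrite --ioengine=libaio --bs=4K --iodepth=32 --name=bw-test-qd32 --size=100M

If that tops out around the same ~1200 IOPS, the device itself looks like the limit; if it goes much higher, the single outstanding write in fdbserver is the effective ceiling.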
I will update the thread as I understand more, or if someone wants any other specific data points. Just wanted to put out all the observations I have so far to see if someone can spot something!