Troubleshooting queue build up

alexmiller · June 27, 2019, 3:11am

To point out other things that could cause this sort of problem:

Your clients could be read hotspot’ing that storage server. Reads are prioritized higher than writes, so too many reads can cause a storage server to lag behind. Be skeptical of this if you’re at 100% CPU usage.
You could have a noisy neighbor on that one machine.
FDB will split shards based on write bandwidth, but I don’t think it then tries to distribute those write-hot shards evenly. It’s possible that your one storage server is a part of multiple teams that were assigned write-hot shards, and excluding it would force a shard re-assignment that would resolve the issue.
A cleaner job doing clear ranges could cause a lot of deferred work. I don’t think we’ve seen FDB6 have saturation issues from a large clear range, but it’s possible it’s AWS/EBS specific? (See the thread “Large clear range performance”.)
Data distribution can sometimes cause bad performance effects if it’s too aggressive. I’ve heard of other people on EBS sometimes needing to dial this down.

One storage server only doing an upper limit of ~6MB/s of writes sounds reasonable to me. (Lookin’ forward to that faster storage engine, @stevedhams )
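
To narrow down which of the causes above applies, it can help to rank storage servers by queue size and write rate and see whether one really is an outlier. Here is a rough sketch that does this from a parsed `status json` document. The field names (`input_bytes`, `durable_bytes`, `data_lag`, `cpu.usage_cores`) are my assumption about the status schema and may differ between FDB versions, so check them against your own output. The `sample` dict is made-up data for illustration, not from a real cluster.

```python
def storage_server_stats(status):
    """Return one row per storage role in a parsed `status json` dict,
    sorted so the server with the largest queue comes first."""
    rows = []
    processes = status.get("cluster", {}).get("processes", {})
    for proc_id, proc in processes.items():
        for role in proc.get("roles", []):
            if role.get("role") != "storage":
                continue
            input_bytes = role.get("input_bytes", {})
            durable_bytes = role.get("durable_bytes", {})
            rows.append({
                "process": proc_id,
                "address": proc.get("address", "?"),
                # Bytes received but not yet durable (assumed to approximate the storage queue).
                "queue_bytes": input_bytes.get("counter", 0) - durable_bytes.get("counter", 0),
                # Incoming mutation bytes per second, converted to MB/s.
                "write_mb_per_s": input_bytes.get("hz", 0.0) / 1e6,
                "data_lag_s": role.get("data_lag", {}).get("seconds", 0.0),
                "cpu_cores": proc.get("cpu", {}).get("usage_cores", 0.0),
            })
    return sorted(rows, key=lambda r: r["queue_bytes"], reverse=True)


# Hypothetical sample: server "a" is lagging, server "b" is healthy.
sample = {"cluster": {"processes": {
    "a": {"address": "10.0.0.1:4500", "cpu": {"usage_cores": 0.95},
          "roles": [{"role": "storage",
                     "input_bytes": {"counter": 9_000_000_000, "hz": 6_000_000.0},
                     "durable_bytes": {"counter": 8_100_000_000},
                     "data_lag": {"seconds": 42.0}}]},
    "b": {"address": "10.0.0.2:4500", "cpu": {"usage_cores": 0.30},
          "roles": [{"role": "storage",
                     "input_bytes": {"counter": 5_000_000_000, "hz": 2_000_000.0},
                     "durable_bytes": {"counter": 4_990_000_000},
                     "data_lag": {"seconds": 0.5}}]},
}}}

for row in storage_server_stats(sample):
    print(row)
```

On a live cluster you would feed it the JSON from `fdbcli --exec "status json"` (parsed with `json.loads`) instead of `sample`. If one server tops the list with a large queue but low CPU, a noisy neighbor or disk problem seems more likely than a read hotspot.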
