Hi @mengxu, thanks for your reply. I looked at the trace events for that lagging process; it has neither a "BgDDMountainChopper" nor a "BgDDValleyFiller" event at all. If I filter on `Severity="20"`, these are the only event types present:

- `FetchPast`
- `N2_ReadError`
- `N2_ReadProbeError`
- `ShardServerFutureVersion1000x`
- `SlowSSLoopx100`
- `StorageServerUpdateLag`
To provide more context, we implemented expiry logic mainly based on this approach, but with 4 different kinds of auxiliary keys. During performance testing, even though all keys generated on the client side are totally random, on the API side we add 4 static prefixes. Could that be why the workload tends to fall on a few storage processes, since all keys share only a few common prefixes? (Is there a trace event that shows which (key, value) pairs get routed to which process?)
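To make the key layout concrete, here is a rough sketch of what our writes look like (the prefix names, the `expiry` subspace, and the helper function are illustrative placeholders, not our actual schema):

```python
import time
import fdb

fdb.api_version(610)
db = fdb.open()

# Illustrative static prefixes added on the API side; the real ones differ.
PREFIXES = ("aux_a", "aux_b", "aux_c", "aux_d")

@fdb.transactional
def write_with_expiry(tr, prefix, random_key, value, ttl_seconds):
    # Primary record: static prefix + client-generated random key.
    tr[fdb.tuple.pack((prefix, random_key))] = value
    # Auxiliary expiry index keyed by expiration timestamp; a background
    # job scans this subspace and deletes expired records.
    expires_at = int(time.time()) + ttl_seconds
    tr[fdb.tuple.pack(("expiry", prefix, expires_at, random_key))] = b""
```

Since every key starts with one of only four static prefixes (plus the single expiry subspace), my worry is that the writes concentrate on the shards covering those prefixes even though the suffixes are random.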
iostat output (we combined 4 physical SSDs into a RAID 0 array mounted as md0):
```
ubuntu@dev-ray-trb-fdb087.us-e:/dev$ iostat --human
Linux 4.15.0-1021-aws (dev-ray-trb-fdb087)  06/11/19  _x86_64_  (32 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          24.9%   0.0%    14.6%     0.1%    0.0%  60.3%

Device       tps        kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
loop0        0.02       0.0k       0.0k       10.5M    0.0k
loop1        0.01       0.0k       0.0k       3.5M     0.0k
loop2        0.03       0.0k       0.0k       12.3M    0.0k
loop3        0.01       0.0k       0.0k       3.7M     0.0k
xvda         0.57       1.2k       12.4k      457.9M   4.8G
nvme0n1      33641.35   54.4M      161.2M     21.0T    62.3T
nvme1n1      33486.13   54.2M      156.9M     21.0T    60.7T
nvme2n1      33335.21   54.1M      150.0M     20.9T    58.0T
nvme3n1      33238.43   54.0M      150.0M     20.9T    58.0T
md0          210527.52  216.7M     619.0M     83.8T    239.4T
```
If the above is true, do you have other suggestions for implementing expiry logic in FoundationDB?