We noticed that one of our clusters has a set of storage servers running at very high CPU (>90%) while the rest are roughly normal.
When I started looking, I noticed that one of the hot ranges was in the system key space. I wrote a small script to read \xff\xff/metrics/data_distribution_stats/
and found that this range:
b'\xff\x02/blog/\x93*\x18\t\xd9\xec\xc2NF\xfc\x899*\x95\\\xf8S\x00\x00\x07f\xc5'
has a shard_bytes_per_second of 30224KB for a shard_bytes of 2860KB.
It’s the busiest shard in our database.
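For anyone who wants to reproduce this, here is a minimal sketch of that kind of script (not the exact one I ran). It assumes the FoundationDB Python bindings with an API version that exposes the special key space (710 is only a placeholder), and that each value under the metrics prefix is a JSON object. The field name shard_bytes_per_second is the one quoted above; some versions may report a differently named field, so it is a parameter. The helper names rank_shards and read_stats are made up for illustration.

```python
import json

# Special-key prefix mentioned in the post; each key is prefix + shard begin key.
PREFIX = b"\xff\xff/metrics/data_distribution_stats/"


def rank_shards(rows, field="shard_bytes_per_second", top=5):
    """Return the `top` busiest shards as (rate, shard_bytes, begin_key) tuples.

    `rows` is an iterable of (key, value) byte pairs as read from the
    special key range; `field` is assumed to be the rate field in the JSON.
    """
    shards = []
    for key, value in rows:
        stats = json.loads(value)
        # Strip the metrics prefix to recover the shard's begin key.
        begin = key[len(PREFIX):] if key.startswith(PREFIX) else key
        shards.append((stats.get(field, 0), stats.get("shard_bytes", 0), begin))
    shards.sort(key=lambda s: s[0], reverse=True)
    return shards[:top]


def read_stats(cluster_file=None, api_version=710):
    """Read every (key, value) pair under PREFIX from a live cluster."""
    import fdb  # imported lazily so rank_shards works without the bindings

    fdb.api_version(api_version)  # assumption: adjust to your installed client
    db = fdb.open(cluster_file)
    tr = db.create_transaction()
    return [(bytes(kv.key), bytes(kv.value))
            for kv in tr.get_range_startswith(PREFIX)]
```

Usage would be something like: for rate, size, begin in rank_shards(read_stats()): print(rate, size, begin).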
I looked at this post: "Sum of key-value sizes seems incorrect", but there is nothing outstanding in the output of fdbbackup status, and fdbbackup cleanup only wants to remove things that are a few minutes old.
What could be the reason for a very large blog range?