How to repair a lagging process?

(Shgxwxl) #1

I got a message like this: some processes are lagging with disk I/O at 100%, and this makes the cluster unhealthy.

Replication health - HEALING: Only one replica remains of some data

Storage server lagging by 17879 seconds ( 4% cpu; 3% machine; 0.314 Gbps; 97% disk IO; 1.3 GB / 13.7 GB RAM )
Storage server lagging by 17969 seconds.

(Justin Lowery) #2

The exact solution varies slightly depending on your setup, though in common cases if the issue is strictly drive I/O as it appears, you will want to add another SSD to a server and start an FDB process, assigned to the ‘storage’ class, that uses the SSD.

That is a very long time for the storage process to be lagging behind. What kind of hardware and setup is this? What was the workload on FDB when this occurred?

Assuming you are using a healthy SSD: Do you have control over the non-FDB process using up the available I/O? If you are on a cloud provider and a shared server, you might want to consider moving to a dedicated instance, or otherwise using in-memory storage.

(Shgxwxl) #3

When we add another SSD, will the cluster repair the old process automatically?
Or do we need to do something to make the process healthy again?
We are not using SSDs. We use 6 machines with 10 × 4 TB SATA disks,
and we are trying to write 2 TB of data to the cluster.

Redundancy mode - double
Storage engine - ssd-2
Coordinators - 5

FoundationDB processes - 54 (less 0 excluded; 1 with errors)
Machines - 6
Memory availability - 8.0 GB per process on machine with least available
Retransmissions rate - 1 Hz
Fault Tolerance - 1 machine
Server time - 05/29/18 18:48:06

Replication health - Healthy (Repartitioning.)
Moving data - 0.265 GB
Sum of key-value sizes - 25.295 GB
Disk space used - 64.441 GB

Operating space:
Storage server - 2650.4 GB free on most full server
Log server - 2650.4 GB free on most full server

Read rate - 4 Hz
Write rate - 1930 Hz
Transactions started - 4 Hz
Transactions committed - 4 Hz
Conflict rate - 0 Hz

(Justin Lowery) #4

Yes, the cluster should repair itself automatically.

I am guessing that you mean there are ten 4 TB drives in a single array in total, or are you describing a RAID configuration? Either way, if nothing else is using the disks, a configuration with spinning HDDs might not be able to reach the IOPS needed for that workload, as FDB is intended to be used with SSDs.

Also, I see that your log server processes and storage processes appear to use the same storage array. It may be difficult to do with your setup, though it is good practice performance-wise to separate those.

(Shgxwxl) #5

Thanks for your help.
If we have both SSD and SATA disks, which should get the SSD for the best result: the log server processes or the storage processes?
We currently use the disks without RAID.
Every process runs on its own standalone disk.

(Justin Lowery) #6

I’m not sure exactly. I would try both, starting with the SSD as a storage process’s device, and see what works best in your setup.

In general, HDDs aren’t recommended and I would imagine that only some types of RAID configurations with striping might allow for the needed I/O. I am unsure if multiple single-disk storage processes used together in a cluster would be fast enough for proper operation.

(Shgxwxl) #7

After a long period of batch writing, the FDB cluster seems to be broken.
Will the cluster recover automatically?
We stopped writing and waited 2 hours, but the cluster is still unavailable.

Using cluster file `fdb.cluster'.

Unable to read database configuration.

Redundancy mode - unknown
Storage engine - unknown
Coordinators - unknown

FoundationDB processes - 56
Machines - 6
Memory availability - 7.3 GB per process on machine with least available
Retransmissions rate - 3 Hz
Server time - 05/30/18 16:14:04

Replication health - unknown
Moving data - unknown
Sum of key-value sizes - unknown
Disk space used - unknown

(Justin Lowery) #8

This is caused by nonexistent configuration entries within the system subspace, \xFF, as you can see here.

I doubt that it is possible for FDB to repair itself from here based on your prior setup info and this error message.

You may want to try getting started with storing only a subset of your data, using SSDs or in-memory storage.

Otherwise, I imagine that you would want to ensure that your disks can keep up with the write load, and limit writes on the client-side based on your write speeds, as FDB is optimized for use with SSDs, which tend to write at least 4x faster than conventional HDDs.

Edit: I should add that you might be able to make something like I suggested by parsing the load counters of the status from the value of the \xff\xff/status/json key. You might then correlate disk I/O with the status metrics. Just throwing this out there in case you have to make it work with HDDs. It’s just a guess.
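As a rough illustration of that idea, here is a minimal Python sketch that pulls the data-movement counters out of the status document. It assumes the JSON has already been read (with the Python bindings, something like `db[b'\xff\xff/status/json']` should return it, but treat that as an assumption to verify for your version), and it assumes the counters sit under `cluster.data.moving_data`. The function name and the sample document are hypothetical.

```python
import json

def moving_data_bytes(status_json):
    """Return (in_queue_bytes, in_flight_bytes) from a status JSON document."""
    status = json.loads(status_json)
    # Assumed layout: cluster -> data -> moving_data. Missing keys count as 0,
    # since an unavailable cluster may return only a partial document.
    moving = status.get("cluster", {}).get("data", {}).get("moving_data", {})
    return moving.get("in_queue_bytes", 0), moving.get("in_flight_bytes", 0)

# Hypothetical sample document, trimmed to the fields used here.
sample = json.dumps({
    "cluster": {"data": {"moving_data": {
        "in_queue_bytes": 1000,
        "in_flight_bytes": 250,
        "total_written_bytes": 5000,
    }}}
})
print(moving_data_bytes(sample))  # (1000, 250)
```

Polling this every few seconds alongside the disk I/O figures would give a first idea of whether the two move together.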

(Shgxwxl) #9

Edit: \xff\xff/status/json may be very helpful, thanks

Thanks for your suggestion. I have already tested with SSDs, but something unexpected still happens.
I noticed that when the cluster's status changes to moving data, write speed slows down,
and the cluster does expensive work for repartitioning.
If we continue to write, the amount of moving data grows and cannot shrink while writes continue.
I guess that we need to limit the write speed based on the cluster's status.

Replication health - Healthy (Repartitioning.)
Moving data - 36.238 GB
Sum of key-value sizes - 98.980 GB
Disk space used - 298.678 GB
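One way the "limit the write speed by the cluster's status" idea might look on the client side is sketched below in Python. It only shows the policy: given the current amount of moving data in bytes (however it is obtained), it returns how long to pause before the next batch. The function name, the 1 GiB soft limit, and the 5 second cap are all hypothetical values for illustration, not FDB settings.

```python
def write_delay_seconds(moving_bytes, soft_limit=1024**3, max_delay=5.0):
    """Pause length before the next write batch, growing with the backlog.

    soft_limit and max_delay are made-up tuning knobs: the delay rises
    linearly until moving_bytes reaches soft_limit, then stays at max_delay.
    """
    if moving_bytes <= 0:
        return 0.0
    return min(max_delay, max_delay * moving_bytes / soft_limit)

print(write_delay_seconds(0))             # 0.0
print(write_delay_seconds(512 * 1024**2)) # 2.5
print(write_delay_seconds(36 * 1024**3))  # 5.0
```

With these example values, a backlog the size of the 36 GB shown above would hit the cap, so the writer would sleep `max_delay` seconds between batches until the cluster catches up. A real loader would need to tune both numbers against its own hardware.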

But how can I get this information? Should the FDB C bindings expose the cluster's health so that clients can adjust their work?

some code in fdbserver/

    if( averagePartitionSize >= 0 ) {
        StatusObject moving_data;
        moving_data["in_queue_bytes"] = partitionsInQueue * averagePartitionSize;
        moving_data["in_flight_bytes"] = partitionsInFlight * averagePartitionSize;
        moving_data["total_written_bytes"] = totalBytesWritten;

        // TODO: moving_data["rate_bytes"] = makeCounter(hz, c, r);
        statusObjData["moving_data"] = moving_data;

        statusObjData["average_partition_size_bytes"] = averagePartitionSize;
    }

    if (highestPriority >= PRIORITY_TEAM_0_LEFT) {
        stateSectionObj["healthy"] = false;
        stateSectionObj["name"] = "missing_data";
        stateSectionObj["description"] = "No replicas remain of some data";
        stateSectionObj["min_replicas_remaining"] = 0;
        *minReplicasRemaining = 0;
    }

some code in fdbclient/

    Future<StatusObject> StatusClient::statusFetcher( Reference<ClusterConnectionFile> clusterFile ) {
        return statusFetcherImpl(clusterFile);
    }

(Justin Lowery) #10

Parse the JSON-encoded status that I suggested in the previous post and then check if == "healthy_repartitioning".
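For example, a minimal Python sketch of that check could look like the following. It assumes the state name lives at `cluster.data.state.name` in the status document, which matches the `name` field set in the server code quoted earlier in the thread but should be verified against your version. The function name and sample document are hypothetical.

```python
import json

def is_repartitioning(status_json):
    """True if the reported data state is "healthy_repartitioning"."""
    status = json.loads(status_json)
    # Assumed layout: cluster -> data -> state -> name.
    state = status.get("cluster", {}).get("data", {}).get("state", {})
    return state.get("name") == "healthy_repartitioning"

# Hypothetical sample document, trimmed to the fields used here.
sample = json.dumps({
    "cluster": {"data": {"state": {
        "healthy": True,
        "name": "healthy_repartitioning",
    }}}
})
print(is_repartitioning(sample))  # True
```

A writer could call this between batches and back off while it returns True.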

It’s really as simple as that for detecting the state. The other counters for that are stored in