StorageServerFailed: io_timeout

This error occurred after I upgraded FDB from 6.0.15 to 6.3.23. The status shows that all processes have joined the cluster, but these processes report errors.
I suspected that the disk I/O performance might be too poor, so I restarted the CVMs with errors one by one. The cluster then returned to healthy, and “fully_recovered” appeared in the trace events. But when I restart the whole cluster, the errors appear again.
machines: 97 (16 cores, 32 GB RAM each)
processes: 194
KV size: 11.777 TB
What’s wrong with the cluster? How should I check?

6.2.30 included this change: “throw an io_timeout if it takes more than 2 minutes to commit on the storage server” (Pull Request #4243 by sfc-gh-etschannen, apple/foundationdb on GitHub).

Try bumping the timeout knob to see if that’s what you’re hitting?

Thanks. Can I set max_storage_commit_time to a larger value through the configuration parameters? Its initial value is 120 (seconds).

Yup, that’s the knob I set on my clusters.

How do I set the knob? Which parameter is it in /etc/foundationdb/foundationdb.conf?

You should be able to do something like the following in your foundationdb.conf:

[fdbserver.4500]
knob_max_storage_commit_time = 300
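
With many processes per machine, a variation that may be easier to maintain is to set the knob once in the general [fdbserver] section, whose parameters act as defaults for every [fdbserver.<ID>] section, rather than repeating it per process. A minimal sketch, assuming the stock foundationdb.conf layout; the second port (4501) and the value 300 are only illustrative:

```ini
## Defaults for all fdbserver processes on this machine.
## Keep the existing entries in this section (command, public_address, ...)
## and add only the knob line.
[fdbserver]
knob_max_storage_commit_time = 300

## Per-process sections inherit the defaults above unless they override them.
[fdbserver.4500]
[fdbserver.4501]
```

Either way, the change has to be made on every machine that runs storage servers, and the fdbserver processes need to be restarted with the new configuration before it takes effect.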

Thanks a lot. It works.