After a storage node has been down for some time and comes back up, what criteria does FDB use to decide whether to bring its existing data up to date and continue using it, OR to discard that data and repopulate it from other nodes?
- Is it the time elapsed when the node is down? Say, if it is down for 10 mins or more.
- Is it the amount of data changed during the downtime? Say, if more than N MB of mutations accumulated.
- Is it related to transaction log size, i.e., whether the node can be brought up to date from the mutations still retained in the TLogs?
- Something else.
- Or a combination of these factors?
Are there configuration parameters (knobs) we can use to tune the threshold?
We find that nodes (Kubernetes pods) in our fdb clusters have downtime quite often.
Sometimes it’s involuntary, like the underlying physical host has hardware problems.
Other times it’s voluntary, e.g., an fdb version upgrade, or the mandatory quarterly host OS update in our secured environment, where the update and reboot can take 5 to 10 minutes.

Understanding the exact criteria would help our fdb operations significantly.
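For context, here is roughly how we currently spot storage servers that have fallen behind after such downtime. It is just a sketch that parses the output of `fdbcli --exec 'status json'`; the field paths are what we observe on our clusters and may differ by version, the sample values are hypothetical, and the threshold is our own alerting choice, not an FDB knob:

```python
import json

# Hypothetical slice of `fdbcli --exec 'status json'` output, trimmed to the
# fields we look at (layout as observed on our clusters; values made up).
sample = json.loads("""
{
  "cluster": {
    "processes": {
      "abc123": {
        "address": "10.0.0.1:4500",
        "roles": [
          {"role": "storage", "data_lag": {"seconds": 312.5, "versions": 312500000}}
        ]
      },
      "def456": {
        "address": "10.0.0.2:4500",
        "roles": [
          {"role": "storage", "data_lag": {"seconds": 0.4, "versions": 400000}}
        ]
      }
    }
  }
}
""")

LAG_THRESHOLD_SECONDS = 60  # our alerting threshold, not an FDB knob

def lagging_storage_servers(status):
    """Return (address, lag_seconds) for storage roles behind the threshold."""
    lagging = []
    for proc in status["cluster"]["processes"].values():
        for role in proc.get("roles", []):
            if role.get("role") == "storage":
                lag = role.get("data_lag", {}).get("seconds", 0)
                if lag > LAG_THRESHOLD_SECONDS:
                    lagging.append((proc["address"], lag))
    return lagging

print(lagging_storage_servers(sample))  # → [('10.0.0.1:4500', 312.5)]
```

What we don’t know is how this observed lag relates to the reuse-vs-repopulate decision, which is the heart of the question above.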