ex3ndr
(Steve Korshakov)
May 18, 2021, 7:17pm
1
For some reason, just after I added locality_zoneid to each machine, status started to show a fault tolerance of 0 zones.
My setup:
triple SSD replication
5 coordinators, each in a separate zone
3 machines with 2 processes each, spread across 3 zones; all have the correct zone id set
What am I missing? Why can’t I lose any zone?
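Each machine’s foundationdb.conf has its locality set roughly like this (the zone name is a placeholder, different on each machine):

[fdbserver]
locality_zoneid = zone-1

[fdbserver.4500]
[fdbserver.4501]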
ex3ndr
(Steve Korshakov)
May 18, 2021, 8:14pm
2
I switched everything to locality_data_hall and got “Fault tolerance: 0 machines”. Then I switched to three_data_hall and it’s still the same. What could be wrong?
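For reference, what I ran from fdbcli was roughly:

configure three_data_hall
status

and the fault tolerance line I’m quoting comes from the status output.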
ex3ndr
(Steve Korshakov)
May 18, 2021, 8:21pm
3
ajbeamon
(A.J. Beamon)
May 18, 2021, 8:53pm
4
A triple replicated cluster requires three transaction logs in different zones in order to be available. If you have only three zones in your cluster, then the loss of any one zone would mean that your cluster is unavailable. As a result, your fault tolerance is zero.
If you add two additional zones, then your fault tolerance should reach the maximum value of two. You could also configure the cluster to double replication, where the maximum fault tolerance is one.
See Configuration — FoundationDB 6.3 for more information.
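For example, switching to double replication from fdbcli would look something like this (using the default cluster file path):

fdbcli -C /etc/foundationdb/fdb.cluster --exec "configure double ssd"
fdbcli -C /etc/foundationdb/fdb.cluster --exec "status"

Once data redistribution finishes, status should report a fault tolerance of one zone.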
ex3ndr
(Steve Korshakov)
May 18, 2021, 9:32pm
5
Thank you! I will try to convert one storage process to a log and check whether that works.
But the docs you linked don’t mention that triple replication requires logs in three zones to work. Also, why doesn’t fdbcli
make it clear that this is about availability rather than data integrity? Otherwise, why would losing a single log server under triple replication cause data loss? Is that really true?
UPD: Yep, I added a new log process and it’s still zero fault tolerance.
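For the new log process I just set the process class in foundationdb.conf, roughly like this (the port matches the process I repurposed):

[fdbserver.4501]
class = log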
UPD2: This is my fdbtop output:
ip port cpu% mem% iops net class roles
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.100 4500 14 35 2799 55 storage storage
4501 3 34 2795 2 storage storage
4502 11 33 2798 63 storage storage
4503 16 34 2793 88 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.101 4500 4 34 3841 8 storage storage
4501 12 34 3842 52 storage storage
4502 9 34 3841 59 storage storage
4503 8 34 3841 5 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.119 4500 8 35 1623 1 storage storage
4501 22 4 1613 8 log log
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.120 4500 22 35 6700 67 storage storage
4501 16 33 6877 92 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.121 4500 20 35 3932 70 storage storage
4501 2 14 3933 0 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.14 4500 15 34 3651 85 storage storage
4501 4 34 3651 3 storage storage
4502 5 34 3657 9 storage storage
4503 12 34 3648 47 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.48 4500 16 35 1317 54 storage storage
4501 17 4 1317 7 log log
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.50 4500 24 41 5463 117 storage storage
4501 18 6 5503 7 log log
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.88 4500 43 35 8461 238 storage storage
4501 39 35 8352 72 log log,storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.89 4500 13 34 1305 3 storage storage
4501 3 25 1305 0 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.90 4500 7 34 1032 2 storage storage
4501 2 9 1038 1 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.91 4500 25 34 5706 96 storage storage
4501 3 34 5737 1 storage storage
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.94 4500 4 5 - 5 stateless cluster_controller
4501 1 4 - 0 stateless resolver
4502 1 4 - 1 stateless resolver
4503 4 4 - 2 stateless data_distributor,master,ratekeeper
-------------- ------ ---- ---- ------ ----- ----------- ------------------------------------
10.138.0.99 4500 33 6 - 23 stateless proxy
4501 31 4 - 22 stateless proxy
4502 32 5 - 22 stateless proxy
4503 1 4 - 0 stateless resolver
ajbeamon
(A.J. Beamon)
May 18, 2021, 10:55pm
6
Can you send a copy of your status json output with the 4th log (and preferably running in triple redundancy again)?
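Something like this will capture it to a file:

fdbcli --exec "status json" > status.json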
ex3ndr
(Steve Korshakov)
May 18, 2021, 11:38pm
7
Meanwhile I upgraded the cluster to 6.3, which surfaced some extra stats about the logs, and they look OK too.
FDB 6.3.12, triple replication, 4 logs: { "client" : { "cluster_file" : { "path" : "/etc/foun - Pastebin.com
I wonder if this has something to do with the coordinators not being part of the cluster, so the fault tolerance calculation is missing their locality information?
It looks like the code assumes it can fetch the correct zone information for coordinators from the list of workers, but if a coordinator isn’t a worker, it would get a default-constructed empty string as its zone, and then all coordinators would appear to be in the same (empty) zone?
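To illustrate the pattern I suspect (this is only a sketch of the idea, not the actual FoundationDB code): if coordinator zones are looked up in a map keyed by worker address, every coordinator that isn’t a worker silently collapses into one empty zone:

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
    // zone id per process, built from the list of workers
    std::map<std::string, std::string> zoneOfWorker = {
        {"10.138.0.100:4500", "zone-a"},
        {"10.138.0.101:4500", "zone-b"},
    };

    // two of these coordinators are not workers (addresses are made up)
    std::vector<std::string> coordinators = {
        "10.138.0.100:4500", "10.138.0.200:4500", "10.138.0.201:4500"};

    std::set<std::string> coordinatorZones;
    for (const auto& addr : coordinators)
        // operator[] default-constructs an empty string for unknown keys,
        // so every non-worker coordinator lands in the same "" zone
        coordinatorZones.insert(zoneOfWorker[addr]);

    // prints 2 (zone-a and ""), even if the coordinators sit in 3 real zones
    std::cout << coordinatorZones.size() << "\n";
}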
ajbeamon
(A.J. Beamon)
May 19, 2021, 4:26pm
9
That seems very plausible, good catch. I’ve created an issue to address this problem: Coordinator fault tolerance calculation depends on coordinators being part of the cluster · Issue #4833 · apple/foundationdb · GitHub .
I’m not sure that there’s a great way to work around this problem except to have your coordinators be part of the cluster for now.
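For example, from fdbcli you could either point the coordinators at five processes that are in the cluster (ideally in five different zones) or let it pick for you:

coordinators 10.138.0.100:4500 10.138.0.101:4500 10.138.0.14:4500 10.138.0.48:4500 10.138.0.50:4500
coordinators auto

(the addresses above are just examples taken from your process list).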