Three_data_hall vs multi_dc

stefanvasilic4 · January 23, 2024, 4:13pm

Hello,

While running some performance tests on three_data_hall and multi_dc setups, I get results that are about 10 times slower in the multi_dc scenario, for both loading and reading. The storage and log counts are approximately the same in both setups. Increasing the number of logs, proxies, or log routers has no impact on performance; only reducing the number of storage servers seems to improve the numbers a little (nothing significant). Both scenarios use triple replication and reside in one k8s cluster across 3 namespaces mapped to 3 AZs (from the cloud provider). Is there a logical explanation for this type of behavior?

stefanvasilic4 · February 19, 2024, 1:55am

johscheuer · February 19, 2024, 3:05pm

Could you share the multi_dc setup? I assume that one dc is mapped to one AZ? I’m not an expert on the three_data_hall setup, but there are some subtle differences between the three_data_hall and the multi_dc setup:

In three_data_hall there will be 3 replicas of a storage team, one per data hall.
In multi_dc there will be 6 replicas of a storage team, 3 per dc (primary and remote), spread across the fault domains.
In three_data_hall a commit is replicated 4 times, having 2 replicas per data hall. A commit has to wait until the data is persisted on all 4 log servers.
In multi_dc there will be 3 replicas in the main dc, and with one_satellite_double there will be 2 replicas in the satellite. The mutations for the remote side are then fetched from the satellite (adding some additional load). A commit has to wait until all 5 log servers have persisted the data.
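
To make the commit-path difference above easier to reason about, here is a minimal sketch that treats commit latency as the slowest acknowledgement among the log servers a commit has to wait for. The log counts (4 and 5) come from the description above; the function names and the latency ranges are made-up assumptions for illustration, not measurements or FoundationDB APIs.

```python
import random


def commit_latency_ms(log_latencies_ms):
    """A commit waits for every required log server, so the slowest one dominates."""
    return max(log_latencies_ms)


def simulate(num_local_logs, num_cross_az_logs, trials=10_000, seed=0):
    """Average commit latency for a hypothetical mix of same-AZ and cross-AZ logs.

    The latency ranges below are illustrative assumptions only.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumed persist latency for log servers in the same AZ as the proxy.
        local = [rng.uniform(0.5, 1.5) for _ in range(num_local_logs)]
        # Assumed persist latency for log servers in another AZ.
        cross = [rng.uniform(1.5, 4.0) for _ in range(num_cross_az_logs)]
        total += commit_latency_ms(local + cross)
    return total / trials


if __name__ == "__main__":
    # Hypothetical placements: 2 + 2 logs for three_data_hall,
    # 3 primary + 2 satellite logs for multi_dc.
    print("three_data_hall (assumed):", simulate(2, 2))
    print("multi_dc (assumed):", simulate(3, 2))
```

Replacing the assumed ranges with latencies measured between your AZs gives a rough idea of how much of the slowdown the commit path alone could account for.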

When a client reads data from the remote side, it can happen that the storage server on the remote side has to wait until the corresponding version is available locally. That means the log routers on the remote side must first have fetched the new version/mutations from the satellite.
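
The waiting behaviour described above can be sketched roughly as follows. This is a toy model, not FoundationDB code: `wait_ticks_until_readable` and the idea of polling "ticks" are hypothetical, and the only rule taken from the text is that a storage server cannot answer a read until it has reached the requested version.

```python
def can_serve_read(read_version, storage_version):
    """A storage server can answer only once it has applied the read version."""
    return storage_version >= read_version


def wait_ticks_until_readable(read_version, applied_versions):
    """Return how many ticks pass before the read can be served.

    applied_versions is a hypothetical timeline: the storage server's applied
    version at tick 0, tick 1, and so on, as mutations arrive via the log
    routers. Returns None if the version is never reached in the timeline.
    """
    for tick, storage_version in enumerate(applied_versions):
        if can_serve_read(read_version, storage_version):
            return tick
    return None


if __name__ == "__main__":
    # A remote storage server lagging behind a read at version 105.
    print(wait_ticks_until_readable(105, [100, 103, 106]))
```

The further the remote storage servers lag behind the primary, the longer such reads wait, which is why it matters whether the test clients read locally or from the remote side.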

I simplified a few steps, but the ideas should be correct. Have you checked what the limiting factor is? For example, is the RateKeeper throttling, and if yes, what is the reason for the throttling? It would also be great to know the read/write ratio and whether reads are local or not.
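
One way to check the limiting factor is to look at the `qos` section of the machine-readable status. Below is a minimal sketch that assumes the output of `fdbcli --exec 'status json'` was saved to a file; the field names (`performance_limited_by`, `transactions_per_second_limit`, `released_transactions_per_second`) are as I remember them from the status schema, so please verify them against your FoundationDB version. The helper name `summarize_throttling` is hypothetical.

```python
import json


def summarize_throttling(status):
    """Extract the throttling-related fields from a parsed `status json` document.

    Missing fields come back as None, since the schema may differ by version.
    """
    qos = status.get("cluster", {}).get("qos", {})
    limited_by = qos.get("performance_limited_by", {})
    return {
        "limited_by": limited_by.get("name"),
        "description": limited_by.get("description"),
        "tps_limit": qos.get("transactions_per_second_limit"),
        "released_tps": qos.get("released_transactions_per_second"),
    }


def load_status(path):
    """Load a status document previously saved from fdbcli."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Assumes: fdbcli --exec 'status json' > status.json
    print(summarize_throttling(load_status("status.json")))
```

If `limited_by` reports something other than the workload itself, the description usually points at the process class to look at next (for example log or storage queues).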
