2 region 3 DC config issue

(binzhang) #1

I’m trying to configure a 3 DC, 2 region cluster.

  • Region A (write) = DC1 and DC2
  • Region B = DC3

After running a data load test, I can see that the data and log files on DC1 and DC2 are large, but DC3’s data and log files are very small, so I suspect data is not being replicated to region B.

Below is my config. Can you please suggest what’s wrong?

VM                                   datacenter_id   processes ($ID class)
-----------------------------------  -------------   --------------------------------------------------------------
pgc3lvsprojecta01-555694.stratus.    dc1             4500 stateless, 4501 storage, 4502 storage, 4505 transaction
pgc3lvsprojecta02-555705.stratus.    dc1             4500 stateless, 4501 storage, 4502 storage, 4505 transaction
pgc3lvsprojecta03-555706.stratus.    dc1             4500 stateless, 4503 transaction, 4505 stateless
pgc3lvsprojecta04-555707.stratus.    dc2             4500 stateless, 4503 transaction, 4505 stateless
c3lvspgdbbztest00-357425.stratus.    dc2             4500 stateless, 4503 transaction, 4505 stateless
c3lvspgdbbztest01-357362.stratus.    dc3             4500 stateless, 4501 storage, 4502 storage
c3lvspgdbbztest02-357373.stratus.    dc3             4500 stateless, 4501 storage, 4502 storage
c3lvspgdbbztest03-357424.stratus.    dc3             4500 stateless, 4503 transaction, 4505 stateless

Here is a sample conf file from one machine; the other machines differ only in datacenter_id and the [fdbserver.$ID] sections.

[root@pgc3lvsprojecta01-555694 foundationdb]# cat foundationdb.conf

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
## by default, restart_backoff = restart_delay_reset_interval = restart_delay
# initial_restart_delay = 0
# restart_backoff = 60
# restart_delay_reset_interval = 60
cluster_file = /etc/foundationdb/fdb.cluster
# delete_envvars =
# kill_on_configuration_change = true

## Default parameters for individual fdbserver processes
[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb
# logsize = 10MiB
# maxlogssize = 100MiB
# machine_id =
datacenter_id = dc1
# class =
# memory = 8GiB
# storage_memory = 1GiB
# metrics_cluster =
# metrics_prefix =

## An individual fdbserver process with id 4500
## Parameters set here override defaults from the [fdbserver] section
[fdbserver.4500]
class = stateless
[fdbserver.4501]
class = storage
[fdbserver.4502]
class = storage
[fdbserver.4505]
class = transaction
[backup_agent]
command = /usr/lib/foundationdb/backup_agent/backup_agent
logdir = /var/log/foundationdb

[backup_agent.1]

fdb> status

Using cluster file `fdb.cluster'.

Configuration:
  Redundancy mode        - double
  Storage engine         - ssd-2
  Coordinators           - 5
  Desired Proxies        - 3

Cluster:
  FoundationDB processes - 28
  Machines               - 8
  Memory availability    - 5.4 GB per process on machine with least available
  Fault Tolerance        - 1 machine
  Server time            - 05/03/19 15:03:40

Data:
  Replication health     - Healthy (Rebalancing)
  Moving data            - 0.000 GB
  Sum of key-value sizes - 818 MB
  Disk space used        - 2.383 GB

The region section from `status json`:
  "cluster" : {
        "clients" : {
            "count" : 0
        },
        "cluster_controller_timestamp" : 1556921091,
        "configuration" : {
            "coordinators_count" : 5,
            "excluded_servers" : [
            ],
            "proxies" : 3,
            "redundancy_mode" : "double",
            "regions" : [
                {
                    "datacenters" : [
                        {
                            "id" : "dc1",
                            "priority" : 1
                        },
                        {
                            "id" : "dc2",
                            "priority" : 0,
                            "satellite" : 1
                        }
                    ],
                    "satellite_redundancy_mode" : "one_satellite_double"
                },
                {
                    "datacenters" : [
                        {
                            "id" : "dc3",
                            "priority" : 0
                        }
                    ]
                }
            ],
            "storage_engine" : "ssd-2",
            "usable_regions" : 1
        },

Machine c3lvspgdbbztest03-357424 belongs to dc3, but its transaction process’s data dir doesn’t contain any data files.

[root@c3lvspgdbbztest03-357424 foundationdb]# ps -ef |grep class
foundat+ 15776 15774  1 10:30 ?        00:03:30 /usr/sbin/fdbserver --class stateless --cluster_file /etc/foundationdb/fdb.cluster --datacenter_id dc3 --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --public_address auto:4500
foundat+ 15777 15774  0 10:30 ?        00:02:06 /usr/sbin/fdbserver --class transaction --cluster_file /etc/foundationdb/fdb.cluster --datacenter_id dc3 --datadir /var/lib/foundationdb/data/4503 --listen_address public --logdir /var/log/foundationdb --public_address auto:4503
foundat+ 15778 15774  1 10:30 ?        00:03:08 /usr/sbin/fdbserver --class stateless --cluster_file /etc/foundationdb/fdb.cluster --datacenter_id dc3 --datadir /var/lib/foundationdb/data/4505 --listen_address public --logdir /var/log/foundationdb --public_address auto:4505
root     19772 15256  0 15:06 pts/1    00:00:00 grep --color=auto class
 
[root@c3lvspgdbbztest03-357424 foundationdb]# ls -l /var/lib/foundationdb/data/4503
total 8
-rw-r--r-- 1 foundationdb foundationdb   11 May  3 11:22 fitness
-rw---S--- 1 foundationdb foundationdb 4096 May  3 10:30 processId
[root@c3lvspgdbbztest03-357424 foundationdb]#

(Alex Miller) #2

Changing the usable_regions configuration:

The usable_regions configuration option determines the number of regions which have a replica of the database.
Increasing the usable_regions will start copying data from the active region to the remote region. Reducing the usable_regions will immediately drop the replicas in the remote region. During these changes, only one primary datacenter can have priority >= 0. This enforces exactly which region will lose its replica.

So you probably just need to configure usable_regions:=2

(binzhang) #3

fdb> configure usable_regions:=2
ERROR: When changing usable_regions, only one region can have priority >= 0
Type `configure FORCE <TOKEN>*' to configure without this check
fdb>

This is our region config JSON:

{
    "regions" : [
        {
            "datacenters" : [
                { "id" : "dc1", "priority" : 1 },
                { "id" : "dc2", "priority" : 0, "satellite" : 1 }
            ],
            "satellite_redundancy_mode" : "one_satellite_double"
        },
        {
            "datacenters" : [
                { "id" : "dc3", "priority" : 0 }
            ]
        }
    ]
}
(Alex Miller) #4

Ah, I was letting you be risky, but fdbcli won’t by default.

It wants you to set dc3 to have a priority of -1, and then configure usable_regions to 2, so that the system won’t think that if dc1 fails, it could recover to dc3.

Similarly, if you wanted to go to a one region configuration, you’d first need to change the priorities to make it explicit which region you want the primary to be, and then configure usable_regions to 1.
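
The priority change can be applied with fdbcli’s `fileconfigure` command, which takes a JSON file containing the region configuration. Below is a minimal sketch (the filename `regions.json` is my choice, not from the thread) that takes the regions JSON posted above, flips dc3’s priority to -1, and writes the file out:

```python
import json

# Region configuration as posted in the thread.
regions = {
    "regions": [
        {
            "datacenters": [
                {"id": "dc1", "priority": 1},
                {"id": "dc2", "priority": 0, "satellite": 1},
            ],
            "satellite_redundancy_mode": "one_satellite_double",
        },
        {"datacenters": [{"id": "dc3", "priority": 0}]},
    ]
}

# Flip dc3's priority to -1 so the usable_regions change passes
# fdbcli's "only one region can have priority >= 0" safety check.
for region in regions["regions"]:
    for dc in region["datacenters"]:
        if dc["id"] == "dc3":
            dc["priority"] = -1

# Write the file to apply with fdbcli.
with open("regions.json", "w") as f:
    json.dump(regions, f, indent=4)
```

Then apply it with `fdbcli --exec "fileconfigure regions.json"` followed by `configure usable_regions:=2`.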

(binzhang) #5

I changed the priority to -1 for dc3 and was able to set usable_regions=2; now I can see data being automatically streamed to region 2 (dc3).

We have 3 DCs set up as 2 regions. Region A includes dc1 & dc2; region B includes dc3. Our goal is: if either dc1 or dc2 powers down, region B (dc3) becomes the write dc/region. When dc1/dc2 power back on, the cluster automatically switches the write region back to region A (dc1 & dc2). I will do more testing on this layout.

Appreciate your prompt help.

(Alex Miller) #6

Once your data is replicated to the secondary, you’ll want to switch the priority back to a non-negative number, otherwise the cluster won’t switch over when the primary dies.
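
Concretely, switching the priority back can mean re-applying the regions JSON from post #3 with `fileconfigure` (a sketch, dc3 restored to priority 0, dc1 still preferred at priority 1):

```json
{
    "regions" : [
        {
            "datacenters" : [
                { "id" : "dc1", "priority" : 1 },
                { "id" : "dc2", "priority" : 0, "satellite" : 1 }
            ],
            "satellite_redundancy_mode" : "one_satellite_double"
        },
        {
            "datacenters" : [
                { "id" : "dc3", "priority" : 0 }
            ]
        }
    ]
}
```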

There’s a DatacenterVersionDifference trace event that you can use to see the version difference between the primary and secondary. If it’s under 5,000,000, then you’re probably caught up.
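
Trace events are written as XML files in each process’s logdir (here /var/log/foundationdb). The following is a rough sketch of checking the difference from the shell; the sample event line and its `Difference` attribute name are assumptions, so verify the exact field names against your own trace files:

```shell
# Synthetic trace line for illustration -- real events live in
# /var/log/foundationdb/trace.*.xml, and the attribute name "Difference"
# is an assumption; check your own logs for the actual field.
line='<Event Severity="10" Time="1556921091" Type="DatacenterVersionDifference" Difference="1200000" />'

# Pull out the numeric difference and compare it with the 5,000,000 threshold.
diff=$(printf '%s\n' "$line" | grep -o 'Difference="[0-9]*"' | grep -o '[0-9]*')
if [ "$diff" -lt 5000000 ]; then
    echo "caught up (difference=$diff)"
else
    echo "still catching up (difference=$diff)"
fi
```

Against a live cluster, the same grep can be pointed at the trace files, e.g. `grep 'Type="DatacenterVersionDifference"' /var/log/foundationdb/trace.*.xml | tail -1`.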