Dual stack cluster configuration

pmalamy · March 23, 2022, 11:04am

Hi!

We have been running several FoundationDB clusters flawlessly on an IPv6-only network with TLS. One of our future workloads will also involve IPv4-only clients, which means those clusters would need to support dual-stack IP.

Note: the following examples deliberately use the IPv6 documentation prefix and IPv4 RFC 1918 addresses.

Our current IPv6 only configuration looks like the following:

foundationdb.conf (needless to say that locality_machineid is different on each member of a cluster):

[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
cluster_file = /etc/foundationdb/fdb.cluster

[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID:tls
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb/fdbserver
locality_machineid = hostN.fqdn.tld
locality_dcid = gv2
trace_format = json
tls_certificate_file = /etc/foundationdb/ssl/cert.pem
tls_key_file = /etc/foundationdb/ssl/key.pem
tls_verify_peers = <redacted>
tls_password = 
tls_ca_file = /etc/foundationdb/ssl/ca.pem
knob_tls_cert_refresh_delay_seconds = 3600

[fdbserver.4600]

fdb.cluster

store_fdb_pp_gv2:TWwLTw1STKgEnix1p3jAadSZWDoKLIof@[2001:0db8::41d:18ff:fe00:d27]:4600:tls,[2001:0db8::47c:aaff:fe00:13ed]:4600:tls,[2001:0db8::4a5:38ff:fe00:1313]:4600:tls,[2001:0db8::4c4:66ff:fe00:46]:4600:tls,[2001:0db8::4cd:5eff:fe00:124d]:4600:tls
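For readers less familiar with the cluster file, here is a minimal Python sketch of how a line like the one above breaks down, assuming the `description:ID@address,address,...` layout visible in it. `parse_cluster_line` is a hypothetical helper written for this thread, not part of FoundationDB, and it does no validation.

```python
def parse_cluster_line(line):
    """Split 'description:ID@addr1,addr2,...' into its parts (illustrative only)."""
    prefix, _, addr_part = line.strip().partition("@")
    description, _, cluster_id = prefix.partition(":")
    coordinators = []
    for raw in addr_part.split(","):
        # A trailing ':tls' marks a TLS endpoint, as in the file above.
        tls = raw.endswith(":tls")
        if tls:
            raw = raw[: -len(":tls")]
        # The port is whatever follows the last colon; IPv6 hosts are bracketed.
        host, _, port = raw.rpartition(":")
        family = "ipv6" if host.startswith("[") else "ipv4"
        coordinators.append(
            {"host": host.strip("[]"), "port": int(port), "tls": tls, "family": family}
        )
    return description, cluster_id, coordinators
```

Each comma-separated entry becomes one element of the returned list, so the file above yields five IPv6 TLS entries.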

Detailed status output of such a cluster:

$ fdbcli --exec 'status details'
[...]
Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 5
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 5
  Zones                  - 5
  Machines               - 5
[...]
Data:
  Replication health     - Healthy
[...]
Process performance details: 
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls (<redacted>) 
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls (<redacted>) 
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls (<redacted>) 
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls (<redacted>) 
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls (<redacted>) 

Coordination servers:
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls  (reachable)
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls  (reachable)
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls  (reachable)
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls  (reachable)
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls  (reachable)

In an attempt to make a cluster dual stacked, we changed foundationdb.conf to specify explicit IPv6 and IPv4 addresses instead of using auto and public, deliberately leaving TLS off on IPv4:

 [fdbserver]
 command = /usr/sbin/fdbserver
-public_address = auto:$ID:tls
-listen_address = public
+public_address = [2001:0db8::41d:18ff:fe00:d27]:$ID:tls,172.20.9.17:$ID
+listen_address = [2001:0db8::41d:18ff:fe00:d27]:$ID:tls,172.20.9.17:$ID
 datadir = /var/lib/foundationdb/data/$ID
 logdir = /var/log/foundationdb/fdbserver
 locality_machineid = hostN.fqdn.tld

and we updated the fdb.cluster file accordingly:

store_fdb_pp_gv2:TsmKlq7cqUsuTkbbfFTDjeG4lBJRPHpj@172.20.8.18:4600,172.20.9.17:4600,172.20.9.183:4600,172.20.9.233:4600,172.20.10.236:4600,[2001:0db8::41d:18ff:fe00:d27]:4600:tls,[2001:0db8::47c:aaff:fe00:13ed]:4600:tls,[2001:0db8::4a5:38ff:fe00:1313]:4600:tls,[2001:0db8::4c4:66ff:fe00:46]:4600:tls,[2001:0db8::4cd:5eff:fe00:124d]:4600:tls

With those changes made on each member of the cluster, each using its own IP addresses, the cluster starts and seems to run.

Detailed status output of this cluster:

Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 10
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 5
  Zones                  - 5
  Machines               - 5
[...]
Data:
  Replication health     - Healthy
[...]
Process performance details:
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls (<redacted>)
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls (<redacted>)
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls (<redacted>)
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls (<redacted>)
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls (<redacted>)

Coordination servers:
  172.20.8.18:4600  (reachable)
  172.20.9.17:4600  (reachable)
  172.20.9.183:4600  (reachable)
  172.20.9.233:4600  (reachable)
  172.20.10.236:4600  (reachable)
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls  (reachable)
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls  (reachable)
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls  (reachable)
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls  (reachable)
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls  (reachable)

Although it seems to work, I think something is not right. First, the number of coordinators has doubled from 5 to 10. Second, the processes are still listed with IPv6 addresses only. I can paste or upload the output of status json if needed.

I would like to know whether a dual stack FDB cluster configuration is possible, and whether we are heading in the right direction or are completely lost. I welcome all recommendations related to the topic!

Thanks!

pmalamy · March 25, 2022, 2:49pm

Update on our dual stack experiments.
First, I realized that I forgot to mention that we currently run version 6.3.23 of the FoundationDB binaries.

Second, with the configuration posted above, even though the coordinator quorum is reached, all coordinators are reachable, and the database is considered healthy, I’ve noticed that the Fault Tolerance has decreased from 2 zones to 0, meaning that redundancy is no longer assured.
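To compare the two configurations without reading the whole report, the relevant numbers can be pulled out of `fdbcli --exec 'status json'`. The sketch below is only that: the field names (`cluster.processes`, `cluster.coordinators.coordinators`, `cluster.fault_tolerance.max_zone_failures_without_losing_data`) are what I understand the status document to contain and should be checked against your own output. `summarize_status` is a hypothetical helper.

```python
import json


def summarize_status(status):
    """Extract the fields discussed in this thread from a parsed status json document.

    Field names are assumptions; missing keys yield None or empty
    results instead of raising.
    """
    cluster = status.get("cluster", {})
    processes = cluster.get("processes", {})
    coordinators = cluster.get("coordinators", {}).get("coordinators", [])
    fault = cluster.get("fault_tolerance", {})
    return {
        # Addresses the processes are listed under (IPv6 only in our case).
        "process_addresses": sorted(p.get("address", "?") for p in processes.values()),
        "coordinator_count": len(coordinators),
        "unreachable_coordinators": [
            c.get("address") for c in coordinators if not c.get("reachable")
        ],
        "zone_failures_without_data_loss": fault.get(
            "max_zone_failures_without_losing_data"
        ),
        "zone_failures_without_availability_loss": fault.get(
            "max_zone_failures_without_losing_availability"
        ),
    }


# Usage, assuming the output was saved first with
#   fdbcli --exec 'status json' > status.json
# with open("status.json") as fh:
#     print(json.dumps(summarize_status(json.load(fh)), indent=2))
```

Running it before and after the configuration change makes the coordinator count and the fault tolerance drop directly comparable.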

I also tried to fiddle with the public_address and listen_address parameters (using [::]:$ID:tls, among other tests), but did not manage to reach a satisfying state. Even though the fdbserver process is bound to the two IP:PORT pairs declared in the configuration, the FDB processes are still only accessible via IPv6.

ss -lpent '( sport = :4600 )'
State   Recv-Q  Send-Q  Local Address:Port                     Peer Address:Port  Process
LISTEN  0       128     172.20.9.17:4600                       0.0.0.0:*          users:(("fdbserver",pid=144018,fd=14)) uid:1425 ino:139280706 sk:21 <->
LISTEN  0       128     [2001:0db8::41d:18ff:fe00:d27]:4600    [::]:*             users:(("fdbserver",pid=144018,fd=15)) uid:1425 ino:139280708 sk:23 v6only:1 <->
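Since both sockets are listening, a plain TCP probe from a client host can help separate a network-level problem from an FDB-level one. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and a successful connect only shows that the listener accepts connections, not that an FDB client could use that address.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection handles both IPv4 and IPv6 literals.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Usage with the two addresses from the ss output above:
# for host in ("172.20.9.17", "2001:0db8::41d:18ff:fe00:d27"):
#     print(host, can_connect(host, 4600))
```

If both probes succeed while clients still only reach the processes over IPv6, that would point at the addresses the processes advertise rather than at the listeners themselves.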

I really wonder whether there is a way to obtain a fully dual stack cluster and, if not, what changes would be needed on the source side.

Any ideas?

pyr · April 4, 2022, 1:42pm

I am running into the same issue and would be interested in any developments here.
