Dual stack cluster configuration

Hi!

We have been running several FoundationDB clusters on an IPv6-only network with TLS without any issues. One of our future workloads will also involve IPv4-only clients, which means those clusters will need to support dual-stack IP.

Note: the following examples deliberately use the IPv6 documentation prefix and IPv4 RFC 1918 addresses.

Our current IPv6 only configuration looks like the following:

  • foundationdb.conf (needless to say that locality_machineid is different on each member of a cluster):
[fdbmonitor]
user = foundationdb
group = foundationdb

[general]
restart_delay = 60
cluster_file = /etc/foundationdb/fdb.cluster

[fdbserver]
command = /usr/sbin/fdbserver
public_address = auto:$ID:tls
listen_address = public
datadir = /var/lib/foundationdb/data/$ID
logdir = /var/log/foundationdb/fdbserver
locality_machineid = hostN.fqdn.tld
locality_dcid = gv2
trace_format = json
tls_certificate_file = /etc/foundationdb/ssl/cert.pem
tls_key_file = /etc/foundationdb/ssl/key.pem
tls_verify_peers = <redacted>
tls_password = 
tls_ca_file = /etc/foundationdb/ssl/ca.pem
knob_tls_cert_refresh_delay_seconds = 3600

[fdbserver.4600]
  • fdb.cluster
store_fdb_pp_gv2:TWwLTw1STKgEnix1p3jAadSZWDoKLIof@[2001:0db8::41d:18ff:fe00:d27]:4600:tls,[2001:0db8::47c:aaff:fe00:13ed]:4600:tls,[2001:0db8::4a5:38ff:fe00:1313]:4600:tls,[2001:0db8::4c4:66ff:fe00:46]:4600:tls,[2001:0db8::4cd:5eff:fe00:124d]:4600:tls
  • Detailed status output of such cluster:
$ fdbcli --exec 'status details'
[...]
Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 5
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 5
  Zones                  - 5
  Machines               - 5
[...]
Data:
  Replication health     - Healthy
[...]
Process performance details: 
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls (<redacted>) 
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls (<redacted>) 
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls (<redacted>) 
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls (<redacted>) 
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls (<redacted>) 

Coordination servers:
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls  (reachable)
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls  (reachable)
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls  (reachable)
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls  (reachable)
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls  (reachable)

In an attempt to make a cluster dual stacked, we changed foundationdb.conf to specify explicit IPv6 and IPv4 addresses instead of using auto and public, deliberately not enabling TLS on IPv4:

 [fdbserver]
 command = /usr/sbin/fdbserver
-public_address = auto:$ID:tls
-listen_address = public
+public_address = [2001:0db8::41d:18ff:fe00:d27]:$ID:tls,172.20.9.17:$ID
+listen_address = [2001:0db8::41d:18ff:fe00:d27]:$ID:tls,172.20.9.17:$ID
 datadir = /var/lib/foundationdb/data/$ID
 logdir = /var/log/foundationdb/fdbserver
 locality_machineid = hostN.fqdn.tld

and we updated the fdb.cluster file accordingly:

store_fdb_pp_gv2:TsmKlq7cqUsuTkbbfFTDjeG4lBJRPHpj@172.20.8.18:4600,172.20.9.17:4600,172.20.9.183:4600,172.20.9.233:4600,172.20.10.236:4600,[2001:0db8::41d:18ff:fe00:d27]:4600:tls,[2001:0db8::47c:aaff:fe00:13ed]:4600:tls,[2001:0db8::4a5:38ff:fe00:1313]:4600:tls,[2001:0db8::4c4:66ff:fe00:46]:4600:tls,[2001:0db8::4cd:5eff:fe00:124d]:4600:tls
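Since the same two lines have to be written on every member with its own addresses, a small helper can render them. This is only a minimal Python sketch: `address_lines` is a hypothetical name, and it simply reproduces the format we used in the diff above (TLS on IPv6, no TLS on IPv4). It says nothing about whether fdbserver treats such a value as a true dual stack configuration.

```python
def address_lines(ipv6, ipv4, port_token="$ID"):
    """Render the public_address/listen_address lines from the diff above.

    ipv6 and ipv4 are the host's own addresses; port_token stays "$ID"
    so that fdbmonitor substitutes the port from the section name.
    """
    value = "[%s]:%s:tls,%s:%s" % (ipv6, port_token, ipv4, port_token)
    return ["public_address = " + value, "listen_address = " + value]


# Address pair taken from the first host in this post.
for line in address_lines("2001:0db8::41d:18ff:fe00:d27", "172.20.9.17"):
    print(line)
```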

With those changes made on each member of the cluster, using its own IP addresses, the cluster starts and seems to run:

  • Detailed status output of such cluster:
Configuration:
  Redundancy mode        - triple
  Storage engine         - ssd-2
  Coordinators           - 10
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 5
  Zones                  - 5
  Machines               - 5
[...]
Data:
  Replication health     - Healthy
[...]
Process performance details:
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls (<redacted>)
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls (<redacted>)
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls (<redacted>)
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls (<redacted>)
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls (<redacted>)

Coordination servers:
  172.20.8.18:4600  (reachable)
  172.20.9.17:4600  (reachable)
  172.20.9.183:4600  (reachable)
  172.20.9.233:4600  (reachable)
  172.20.10.236:4600  (reachable)
  [2001:0db8::41d:18ff:fe00:d27]:4600:tls  (reachable)
  [2001:0db8::47c:aaff:fe00:13ed]:4600:tls  (reachable)
  [2001:0db8::4a5:38ff:fe00:1313]:4600:tls  (reachable)
  [2001:0db8::4c4:66ff:fe00:46]:4600:tls  (reachable)
  [2001:0db8::4cd:5eff:fe00:124d]:4600:tls  (reachable)

Although it seems to work, I think something is not right. First, the number of coordinators has doubled from 5 to 10. Second, the processes are listed with their IPv6 addresses only. I can paste/upload the output of status json if needed.
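For what it's worth, the coordinator count reported by status matches the number of address entries in fdb.cluster. Here is a minimal Python sketch that splits a cluster file line and counts the entries per address family. The function names are mine and not part of any FoundationDB tooling, and the code assumes the `description:id@address,address,...` layout shown above.

```python
import ipaddress
from collections import Counter


def parse_cluster_string(line):
    """Split 'description:id@addr,addr,...' into (description, id, [addresses])."""
    prefix, _, addr_part = line.strip().partition("@")
    description, _, cluster_id = prefix.partition(":")
    return description, cluster_id, [a for a in addr_part.split(",") if a]


def address_family(addr):
    """Return 'IPv4' or 'IPv6' for one 'host:port' or '[host]:port:tls' entry."""
    if addr.startswith("["):
        host = addr[1:addr.index("]")]   # bracketed IPv6 literal
    else:
        host = addr.split(":", 1)[0]     # IPv4 literal before the port
    return "IPv%d" % ipaddress.ip_address(host).version


def count_by_family(line):
    """Count coordinator entries per address family."""
    _, _, addrs = parse_cluster_string(line)
    return Counter(address_family(a) for a in addrs)
```

Run against the dual stack fdb.cluster line above, this reports five IPv4 and five IPv6 entries, which is consistent with the 10 coordinators shown by status.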

I would like to know whether a dual stack FDB cluster configuration is possible, and whether we are heading in the right direction to achieve it or are completely lost :slight_smile: I welcome all recommendations related to the topic!

Thanks!


Update on our dual stack experiments.
First, I realized that I forgot to mention that we currently run version 6.3.23 of the FoundationDB binaries.

Second, with the configuration posted above, even though the coordinator quorum is reached, all coordinators are reachable and the database is considered healthy, I've noticed that the Fault Tolerance has decreased from 2 zones to 0, meaning that redundancy is no longer assured.

I also tried to fiddle with the public_address and listen_address parameters (like using [::]:$ID:tls among other tests), but did not succeed in obtaining a satisfying state. Even though the fdbserver process is bound to both IP:PORT pairs declared in the configuration, the FDB processes are still only accessible via IPv6.

ss -lpent '( sport = :4600 )'
State       Recv-Q      Send-Q                                     Local Address:Port             Peer Address:Port      Process                                                                                                                  
LISTEN      0           128                                          172.20.9.17:4600                  0.0.0.0:*          users:(("fdbserver",pid=144018,fd=14)) uid:1425 ino:139280706 sk:21 <->                                                 
LISTEN      0           128               [2001:0db8::41d:18ff:fe00:d27]:4600                     [::]:*          users:(("fdbserver",pid=144018,fd=15)) uid:1425 ino:139280708 sk:23 v6only:1 <->   
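To check from the client side what ss shows on the server side, a plain TCP probe can be run from an IPv4-only host and from an IPv6-only host. This is a minimal Python sketch using only the standard library. `can_connect` is a hypothetical helper name, and a successful connect only proves that the port accepts TCP connections, not that a TLS handshake or FoundationDB traffic works over that address.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection accepts both IPv4 and IPv6 literals (no brackets).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example calls with the addresses from this post (not executed here):
#   can_connect("172.20.9.17", 4600)
#   can_connect("2001:0db8::41d:18ff:fe00:d27", 4600)
```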

I really wonder whether there is a way to obtain a fully dual stack cluster and, if not, what changes would be needed on the source side.

Any ideas?


I am running into the same issue and would be interested in any developments here.
