FoundationDB

Proposal: Don't identify a coordinator based on its IP


(Bastien Gysler) #1

As many people have pointed out, deploying FoundationDB with a container-orchestration tool such as Kubernetes or Docker Swarm can be problematic for multiple reasons. One of the main problems is the unstable nature of IP addresses. Kubernetes provides stable network IDs with its StatefulSets, but when a Pod restarts, its IP address will most likely change.

There is a suggestion to use hostnames instead and thus rely on DNS to uniquely identify nodes. However, even though it would solve the problem on Kubernetes/Swarm, this change would introduce another potential source of failure (see wwilson's answer).

My proposal would be to uniquely identify nodes with an arbitrary string that would either be generated by FoundationDB when a node starts for the first time or be defined in the config file.
The main benefit of such a solution is that it does not rely on moving parts, such as IP addresses or DNS names, to identify a node. I think that what defines a node's identity is the data it carries, regardless of its IP address or hostname. Moreover, because it is an arbitrary string, it would be flexible enough to let people still use IP addresses or DNS names if they want to.

Now, one of the remaining problems is how to reach these nodes. Here is an example of what an fdb.cluster file could look like under this scheme:

<desc>:<id>@urn:node:<uuid>:<ip>:<port>:(<tls>),urn:node:<uuid>:<ip>:<port>:(<tls>)

Since IP, port, and TLS can change, the canonical ID would be urn:node:<uuid>.
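
For illustration only (the description, ID, UUIDs, and addresses below are made up), a cluster file following this proposed template, with the optional tls suffix included, might look like:

mydb:Ab1c2D@urn:node:9f1d2c3a-4b5e-6f70-8a9b-0c1d2e3f4a5b:192.168.1.1:4500:tls,urn:node:7e6d5c4b-3a2f-1e0d-9c8b-7a6f5e4d3c2b:192.168.1.2:4500:tls

Here the two urn:node:<uuid> prefixes are the stable identities, and everything after them is allowed to change.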

I am not familiar with the internal implementation of FoundationDB, so I cannot assess the impact of such a change, but I am aware that it would introduce new challenges. For example, what happens when node B takes over the IP address of node A, which just left the cluster? Requests between nodes would have to carry this unique identifier so that the recipient can be verified.

I would be interested to hear what the community thinks about it, whether they run FoundationDB in a container or not.

Related issues/threads:


(David Scherer) #2

I could be wrong, but I think FDB already solves the specific problem you are trying to solve with unique IDs.

  1. fdbserver roles other than coordinators only expect their IP address to be stable while the process is actually running. Every time they start up, they register the IP address needed to reach them with the coordinators, the cluster controller, and/or the system keyspace, and to the extent that their identities actually need to persist, they use various unique identifiers.

  2. #1 is tested in simulation by swapping data folders (which should be isomorphic to swapping IP addresses)

  3. Coordinators need to have stable IP addresses, because that’s the only way we have to reach them via the cluster file. If the cluster file doesn’t contain enough correct IP addresses (perhaps just one if a coordinator change has occurred to reflect the new IP addresses?), an fdbserver won’t be able to join the cluster. I don’t think this can be improved by using more unique identifiers, but see below.

  4. I think it is sound, however, for coordinators to swap or change IP addresses. A unique identifier is used to identify the cluster for which coordination information is stored, and I don’t think the consensus algorithm cares about the identity of different coordinators as long as the same coordinator isn’t reached by more than one address. As long as IP address changes occur sufficiently slowly, you could keep the cluster working indefinitely by just doing periodic coordinator changes.

  5. I’m not sure if we are adequately testing #4, however. The tricky part would be to not break availability in simulations by running into #3. @Evan?

The feature that I would like to see for integration with external service discovery is to make it so that, as a command line parameter, environment variable or (client) network option you can pass cluster file contents or just coordinator IP addresses obtained from your service discovery system to FDB servers and clients. FDB attempts to contact these coordinators in parallel with the ones in the cluster file, and then updates its cluster file (as it does today) if a more up to date configuration is found. If service discovery is down or the information is out of date it does no harm. Ideally then your FDB cluster keeps working if either your service discovery is up or enough of your coordinators have kept their IP addresses. I would still recommend setting up coordinators with stable IP addresses if your orchestration system permits this, but this should make the best of a given situation.
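
As a rough sketch of how one can approximate this today from the client side, the discovered addresses can be materialized into a local cluster file, falling back to the durable copy when discovery is unavailable. The get_coordinators_from_discovery() helper and the "mydb:Ab1c2D" prefix below are hypothetical; only fdb.api_version() and fdb.open() with its cluster_file argument are existing API:

    import fdb

    fdb.api_version(610)  # assuming bindings at API version 610

    def get_coordinators_from_discovery():
        # Hypothetical helper: ask Consul, etcd, the Kubernetes API, ... for
        # the current coordinator addresses of the cluster.
        return ["192.168.1.1:4500", "192.168.1.2:4500", "192.168.1.3:4500"]

    def open_database(fallback_cluster_file="/etc/foundationdb/fdb.cluster"):
        try:
            coordinators = get_coordinators_from_discovery()
            # Write the discovered coordinators to a throwaway cluster file.
            # "mydb:Ab1c2D" is a placeholder; it must match the cluster's
            # existing <desc>:<id>.
            discovered = "/tmp/fdb.cluster.discovered"
            with open(discovered, "w") as f:
                f.write("mydb:Ab1c2D@" + ",".join(coordinators))
            return fdb.open(cluster_file=discovered)
        except Exception:
            # If service discovery itself is unreachable, fall back to the
            # locally cached cluster file, exactly as described above.
            return fdb.open(cluster_file=fallback_cluster_file)

    db = open_database()

Note that, unlike the feature described above, this sketch cannot detect stale discovery data, because fdb.open() returns without contacting the cluster; the real feature would race the discovered coordinators against the cluster file contents inside the client/server libraries themselves.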


(Bastien Gysler) #3

First off, thanks for your prompt answer @dave!

I would like to clarify my proposal. When I mentioned nodes, I was actually referring to coordinators. I deployed a 3-node cluster, and because every node acts both as a regular node and as a coordinator, I mixed up the wording. I have changed the title.

Coordinators need to have stable IP addresses, because that’s the only way we have to reach them via the cluster file. […] I don’t think this can be improved by using more unique identifiers.

Basically I see two cases where the IP addresses should be handled differently:

  1. Contacting the cluster coordinators (when a node starts)
  2. Ongoing communication among peers

Case 1
I agree that unique identifiers won’t solve the problem in this specific context. We need to rely on stable IP addresses at some point to contact at least one coordinator to join the cluster.

That means the fdb.cluster file could probably stay as is: <desc>:<id>@192.168.1.1:4500,192.168.1.2:4500

Case 2
Here is the part where I think unique identifiers would solve a problem. But first I would like to stress that I’m fairly new to FoundationDB, so feel free to correct me if I’m wrong here.

I made the assumption that a node relies solely on the fdb.cluster file to find coordinators in the cluster.

Scenario: a cluster of 4 nodes, 3 of which are coordinators.
If two Pods (coordinators) are restarted by Kubernetes (e.g. due to an update, a crash, …), they will reappear with different IP addresses and the cluster is now blocked. I cannot update the list of coordinators anymore because there is no quorum, and a forced eviction won't work because they are coordinators. As far as I am aware, Kubernetes does not allow me to assign a static IP to fix the issue, so the cluster is basically locked forever.
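
To spell that out: updating the list of coordinators is normally done with the coordinators command in fdbcli, which itself has to commit through a majority of the current coordinators (the addresses below are placeholders), roughly:

    fdb> coordinators auto
    fdb> coordinators 192.168.1.5:4500 192.168.1.6:4500 192.168.1.7:4500

With two of the three coordinators unreachable, neither form can succeed, which is exactly the lock-out described above.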

Using an arbitrary string to uniquely identify a coordinator would solve this problem. When a node or a coordinator (re-)joins the cluster, it would share its IP/port and ID with the cluster. The cluster would then share with all peers a table that would look like this:

| ID | Type | IP | Port |
| 1 | C | 192.168.1.1 | 4500 |
| 2 | C | 192.168.1.2 | 4500 |
| 3 | C | 192.168.1.3 | 4500 |
| 4 | N | 192.168.1.4 | 4500 |

This table would be continuously shared among all peers in the cluster, so that when a coordinator leaves the cluster and joins back again with a different IP address, all peers would be informed and thus not affected by an IP/Port change.
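
To make the idea concrete, here is a minimal sketch (illustrative only, not how FoundationDB works today) of the table a peer could keep, keyed by the stable ID rather than by address:

    from dataclasses import dataclass

    @dataclass
    class PeerRecord:
        node_id: str          # stable identity, e.g. "urn:node:<uuid>"
        is_coordinator: bool  # "C" vs "N" in the table above
        ip: str
        port: int

    # Every peer keeps its own copy of the table.
    peer_table = {
        "urn:node:1": PeerRecord("urn:node:1", True,  "192.168.1.1", 4500),
        "urn:node:2": PeerRecord("urn:node:2", True,  "192.168.1.2", 4500),
        "urn:node:3": PeerRecord("urn:node:3", True,  "192.168.1.3", 4500),
        "urn:node:4": PeerRecord("urn:node:4", False, "192.168.1.4", 4500),
    }

    def on_peer_announcement(record: PeerRecord):
        # When a peer (re-)joins with a new address, only its endpoint changes;
        # its identity, and therefore the data it carries, stays the same.
        peer_table[record.node_id] = record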

The feature that I would like to see for integration with external service discovery is to make it so that, as a command line parameter, environment variable or (client) network option you can pass cluster file contents or just coordinator IP addresses obtained from your service discovery system to FDB servers and clients. FDB attempts to contact these coordinators in parallel with the ones in the cluster file, and then updates its cluster file (as it does today) if a more up to date configuration is found.

That is a good idea! The fdb.cluster file is definitely not ideal. Is it even necessary to keep it?

Using a command line parameter, environment variable, or anything else should be enough to contact the cluster. Then the cluster would continuously share a transient table as I explained above.

Ideally then your FDB cluster keeps working if either your service discovery is up or enough of your coordinators have kept their IP addresses. I would still recommend setting up coordinators with stable IP addresses if your orchestration system permits this, but this should make the best of a given situation.

Obviously the idea is to keep IP addresses as stable as possible, but when shit hits the fan, FoundationDB (deployed with a container-orchestration tool) should be able to recover from it.


(A.J. Beamon) #4

I think one of the main arguments for the cluster file is that it provides a durable local record of the current connection information for the cluster and doesn’t have any dependencies on external services.

While a process is connected to the cluster, the cluster can undergo all kinds of changes to its members. If, for example, you migrate your cluster to new hosts (i.e. replace every process in the cluster), then the connection information you used when your process started up before the migration will not be valid after the migration. The processes (both client and server) that are connected to a cluster will update their cluster files in response to these changes, ensuring they can reconnect if they die.

Without the cluster file, if your process dies and tries to reconnect with the same connection string it did originally, it would be unable to in the scenario I described above. Or, if you were relying on some external service discovery to provide you with up-to-date information, you are now dependent on that service being up and functioning in order for your processes to be able to reconnect.
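
As a concrete (made-up) example of that behaviour: a process that started with a cluster file containing

mydb:Ab1c2D@10.0.0.1:4500,10.0.0.2:4500,10.0.0.3:4500

and stayed connected through a migration and coordinator change would end up with its local file rewritten to something like

mydb:Ab1c2D@10.0.1.7:4500,10.0.1.8:4500,10.0.1.9:4500

so that even if it dies afterwards, it can reconnect through the new coordinators without any external help.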


(Bastien Gysler) #5

Thanks for your answer.

I realised after I replied to @dave that he already explained it. It makes sense to keep a fallback in case the external service you’d be relying upon is down.


(Alex Miller) #6

I don’t believe that would entirely solve the issue? While an FDB process is running, one could have e.g. Kubernetes restart all of the coordinators and assign them different IPs. It would be expected for the cluster to continue working as long as their data volumes are the same and DNS now resolves them to the new IPs. I don’t think that would be the case with this proposed feature, as there’s no way to re-resolve what was provided from service discovery?


(David Scherer) #7

I said IP addresses, but there is no obvious reason why you couldn’t allow DNS names or even an arbitrary command that outputs addresses. The cluster file would still contain IPs, and we would only need to poll resolution when not successfully connected to a quorum of coordinators.
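
For example (hypothetical names), the externally supplied list could be

fdb-coordinator-0.fdb.svc.cluster.local:4500,fdb-coordinator-1.fdb.svc.cluster.local:4500,fdb-coordinator-2.fdb.svc.cluster.local:4500

and a process would only go back to DNS when the IP addresses already cached in its cluster file stop yielding a quorum.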

You still need a mechanism for doing coordination changes automatically when needed. And honestly this probably just scratches the surface of problems with running anything reliable on Kubernetes. How do you handle changes to the cluster size safely? How do you run a cluster across multiple datacenters (and hence Kubernetes clusters)? I dunno.


(Mikael Rapp) #8

allow DNS names or […] arbitrary command that outputs addresses […] The cluster file would still contain IPs

I don’t see how this solves the issue of setting new coordinator servers when there isn’t a quorum of coordinator servers available. Or, in @basgys’s words:

I cannot update the list of coordinators anymore because there is no quorum, and a forced eviction won't work because they are coordinators. As far as I am aware, Kubernetes does not allow me to assign a static IP to fix the issue, so the cluster is basically locked forever.

If the coordinators are identified by a DNS name instead of an IP, then no quorum change is needed; nodes only need to re-query the DNS server on startup or if a coordinator is unreachable. This would allow a completely dead cluster to recover.

I suppose the issue could also be fixed if there were an option to force a new set of coordination servers (without a quorum), but this seems like a dangerous approach.

(@basgys: I did find a workaround in Kubernetes by giving each coordinator pod its own Service with a static IP; ugly, but it seems to work.)
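
For anyone wanting to reproduce that workaround, a per-coordinator Service with a pinned cluster IP might look roughly like this (the names and the address are placeholders, and it assumes the coordinators run in a StatefulSet named fdb, so that the statefulset.kubernetes.io/pod-name label selects exactly one pod):

    apiVersion: v1
    kind: Service
    metadata:
      name: fdb-coordinator-0
    spec:
      clusterIP: 10.96.100.10            # a free address from the service CIDR
      selector:
        statefulset.kubernetes.io/pod-name: fdb-0
      ports:
        - name: fdb
          port: 4500
          targetPort: 4500

The fdb.cluster file then lists 10.96.100.10:4500 (and the sibling Services) as coordinators, so pod restarts no longer change the addresses the cluster file points at.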


(Alex Miller) #9

There are a couple of pieces of trouble hidden in those steps, though:

  1. How would a coordinator know that it’s a coordinator?
  2. What happens if DNS is unavailable?

For (1), we could have each process resolve the hostname and check against its own IP. This seems like it’d run into trouble with multi-IP’d hosts or NAT’d situations. We could have each process connect to each hostname and request an ID that it compares with a locally randomly generated (or assigned) one. This starts to sound very similar to @basgys’s proposal.

For (2), a process’s knowledge that it is a coordinator would then depend on DNS being available. It would probably be wise to also cache the IP resolved from DNS for each coordinator in the cluster file, and I believe we should have the ability to store a tiny bit of state about whether we were formerly a coordinator in the pidfile where we store other preferences as well.

Is it just the case that if you’re running in Kubernetes and DNS is unavailable, then so many other things would be broken and unavailable that FDB’s unavailability wouldn’t even be a problem by comparison?


(Mikael Rapp) #10

(2) I agree that DNS availability adds another failure point. However, I think the discussion is not about whether DNS should be mandatory but whether it could be a supported alternative; if so, then everyone can decide whether DNS is an acceptable point of failure given their respective infrastructure. In a Kubernetes environment, I would argue that depending on DNS is reasonable.


(Kwasi Ohene-Adu) #11

I’d agree with @Zatte. I believe it makes sense for FDB to optionally support DNS resolution of coordinator host names. To answer your question @alexmiller: yes, DNS is extremely robust in Kubernetes. If that mechanism ceases to work, there are bigger problems to worry about.


(Chr1st0ph) #12

I believe we need another feature based on DNS: support for “seeds identified by hostnames”.

That way, you would not have to “copy” the fdb.cluster file in Kubernetes when adding new nodes. You would just define one or a few seed nodes (probably coordinators), and these seed nodes would provide the necessary topology information to new nodes and applications.

What are your thoughts?


(Tobad357) #13

Is there any progress on this? We would prefer to run FDB on K8s, but currently it becomes quite difficult if, for example, we delete all the pods and their IPs change.

Was there any news during the conference?