So you have an application that can run in a container or virtual machine. It’s stateful - maybe it has a Postgres or MySQL database embedded in it. Or both, probably. You got it from a vendor or it was built by someone who left your company years ago. It works fine, and you don’t anticipate needing to scale it beyond the resources you can give a single VM. You certainly aren’t interested in porting the application to use a different database. But you don’t want to worry about operating it and you want it to keep working despite machine or even datacenter failures.
So run it in a virtual machine which mounts a network block device (e.g. using Linux’s NBD protocol). Connect it to a layer like https://github.com/spullara/nbd that implements NBD on top of FoundationDB. And then connect it to a FoundationDB cluster with the fault tolerance properties you want.
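To make the idea concrete, here is a minimal sketch of how a block device could be mapped onto key-value pairs. It is not taken from spullara/nbd; the key layout (`vol/<name>/<8-byte big-endian block index>`), the 4 KiB block size, and the helper names are all assumptions for illustration, and a plain Python dict stands in for the database.

```python
import struct

BLOCK_SIZE = 4096  # assumed 4 KiB blocks, one key-value pair per block


def block_key(volume: bytes, index: int) -> bytes:
    """Hypothetical key layout: big-endian index so keys sort by block number."""
    return b"vol/" + volume + b"/" + struct.pack(">Q", index)


def blocks_for_request(offset: int, length: int):
    """Yield (block_index, start, stop) slices covering a byte range."""
    end = offset + length
    index = offset // BLOCK_SIZE
    while index * BLOCK_SIZE < end:
        base = index * BLOCK_SIZE
        yield index, max(offset, base) - base, min(end, base + BLOCK_SIZE) - base
        index += 1


def write(store: dict, volume: bytes, offset: int, data: bytes) -> None:
    """Read-modify-write each touched block; missing blocks read as zeros."""
    pos = 0
    for index, start, stop in blocks_for_request(offset, len(data)):
        key = block_key(volume, index)
        block = bytearray(store.get(key, bytes(BLOCK_SIZE)))
        block[start:stop] = data[pos:pos + (stop - start)]
        store[key] = bytes(block)
        pos += stop - start


def read(store: dict, volume: bytes, offset: int, length: int) -> bytes:
    """Assemble a byte range from the blocks that cover it."""
    out = bytearray()
    for index, start, stop in blocks_for_request(offset, length):
        block = store.get(block_key(volume, index), bytes(BLOCK_SIZE))
        out += block[start:stop]
    return bytes(out)
```

Because the index is encoded big-endian, adjacent blocks have adjacent keys, so in an ordered key-value store a multi-block request could be served by a single range read.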
There are a few things needed to fully realize this vision:
1. That NBD layer needs to be made rock solid. For example, it should validate its lease in every transaction.
2. I think the best leasing strategy is that the last attempt to mount a volume always succeeds (while ensuring that any subsequent write attempts by the previous owner will block forever or terminate its VM). The idea is that if the application can survive being rebooted (that is, it uses fsync() correctly where necessary), it can survive being replaced this way. The health of the application is then monitored externally by orchestration software, and if it fails, a new instance is started, mounts the volume, and proceeds.
2b. An alternative design is that an attempt to mount a volume gets in line and waits for any existing mounter either to exit or to fail to update a lease key in a timely way. In the latter case you still have to ensure that the original mounter is prevented from making writes. Orchestration software always keeps an extra instance around trying to take over, and when it detects a health problem with the existing instance it tries to shut it down. I think this is a less reliable approach.
3. FoundationDB should support satellite replication (coming in 6.0?) so that you can do practical, performant multi-region fault tolerance. Then failover of such legacy applications can be done automatically and safely even in most region failure scenarios. This is where the benefit of this strategy over existing cloud block stores fully pays off.
4. Maybe FoundationDB’s storage engine (or a storage engine) could do a more efficient job of storing lots of exactly 4KiB values.
5. This solution as a whole should be packaged up and integrated with cluster management software so that it is easy to install a legacy application and get this extremely high level of fault tolerance automatically.
6. It should be possible to extend this vision to “clusters” of legacy virtual machines as well.
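The last-mount-wins leasing strategy described above can be sketched roughly as follows. This is an in-memory model, not FoundationDB code: `Volume`, `mount`, `write_block`, and `LeaseLost` are hypothetical names, and the check-then-write in `write_block` stands in for what would be a read of the lease key and a block write inside one database transaction.

```python
import uuid


class LeaseLost(Exception):
    """Raised when a writer discovers it no longer owns the volume."""


class Volume:
    """In-memory stand-in for the keys one volume would occupy in the database."""

    def __init__(self):
        self.owner = None  # lease key: token of the most recent mounter
        self.blocks = {}   # block index -> block contents


def mount(volume: Volume) -> str:
    """Last mount always wins: overwrite the lease key unconditionally."""
    token = uuid.uuid4().hex
    volume.owner = token
    return token


def write_block(volume: Volume, token: str, index: int, data: bytes) -> None:
    """Validate the lease, then write.

    In a real layer both steps would share one transaction, so a mount that
    commits in between would cause this write to conflict instead of landing.
    """
    if volume.owner != token:
        # The previous owner must stop here (block forever or kill its VM).
        raise LeaseLost("volume was mounted by a newer instance")
    volume.blocks[index] = data


# Usage: instance A mounts, instance B takes over, A's next write is rejected.
vol = Volume()
a = mount(vol)
write_block(vol, a, 0, b"from A")
b = mount(vol)
try:
    write_block(vol, a, 0, b"stale write from A")
except LeaseLost:
    pass  # A has been fenced off
write_block(vol, b, 0, b"from B")
```

The point of checking the lease on every write, rather than once at mount time, is that the old owner needs no notification: it finds out it was replaced the next time it tries to commit.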
Thoughts?