Hello *,
First, sorry for the long post in advance. But I’d like to say that, as of yesterday, I’ve added a FoundationDB package to the NixOS Linux distribution, as well as a module for using FoundationDB in a declarative manner on NixOS.[1] I’m a software engineer and NixOS package maintainer, so I figure I’d drop by and announce it and ask some questions concerning packaging for “non-standard” Linux distributions.
I also think NixOS is perhaps the first distro to pick up a working FoundationDB build (made easier for reasons explained below), and I hope this can help other developers get it working where needed, for the time being.
Some background
NixOS is a Linux distribution built on declarative package management. Rather than running sets of commands that modify existing state to bring a system to some new state, you write a declaration specifying how you would like your machine to work, which is then realized for you. Multiple system states can co-exist, and you can switch between them. The results of this design decision are profound in practice, including atomic system upgrades, transactional system rollbacks, reproducible builds (that properly track dependencies), support for conflicting/multiple library versions, and more.
This turns out to be extremely useful for FoundationDB which has an… interesting build process, to say the least.
For example, the declarative description of the FoundationDB package for NixOS may end up as a better starting point for building it yourself[2] on foreign systems, vs something like the Docker image, once you learn to read it. In particular, this description:
- Correctly captures build requirements. If a build requirement is not listed in the declarative description (under the name buildInputs), it is not available during compilation/build time. Thus, removal of ‘gawk’ means the ‘awk’ command will not be available, and the build will fail. This means every build dependency must be explicitly listed, and there are no cases where this would fail (for example, by failing to suggest to a user to apt-get install something). Networking is removed, all builds are sandboxed with filesystem namespaces, and more.
- Handles odd version dependencies. For example, the FoundationDB build includes a copy of Boost 1.52.0, as a header package. This package is private to the FoundationDB build and never exposed to any other packages, as it is otherwise too old to support. While Nix actually supports as far back as Boost 1.55, 1.52 is too old to use with our generalized Boost build infrastructure. Creating a custom variant of Boost is extremely easy.
  Similarly, NixOS natively handles multiple GCC versions, and the FoundationDB build is explicitly overridden to use GCC 4.9 in all configurations, as that’s what it supports.
- Tells you precisely how results are installed; for example, you must build the all and fdb_c Makefile targets, and then copy the resulting files out of a few different places.
- Atomic. Any updates to any dependencies for FoundationDB (or the dependencies of its dependencies) will imply a rebuild of FoundationDB. Because every package and every dependency is described and controlled in this single git repository, this effectively means Nix provides a nearly-transparent, completely reproducible build environment for FoundationDB. This is something not even Docker can easily provide, as even a single apt-get install <foo> from an un-pinned source at any point in the creation of the image can be fatal. (This is why people typically standardize on Docker images and not Dockerfiles – because all too often Dockerfiles are not actually reproducible and are poisoned like this.)
- Tells you what needs patching. For example, Nix removes all .git directories from the source code when performing builds (it may be prohibitively large anyway), and many other build systems/CI systems for Linux distros tend to strip this as well – preferring explicitly packaged tarballs. This required a patch to the FoundationDB build system to not search for .git, as it otherwise required it for determining version revision information.
FoundationDB on NixOS
Using FoundationDB with the included NixOS module is pretty easy. With the latest master version of NixOS, you can enable it automatically on boot in just a few lines. For example, here’s a copy of my configuration, running FoundationDB on a 1950X Threadripper with 64GB of RAM (~5GB per process).
    services.foundationdb = {
      enable = true;
      dataDir = "/data/fdb/data";
      logDir = "/data/fdb/logs";
      serverProcesses = 12;
      backupProcesses = 6;
      extraReadWritePaths = [ "/data/fdb/backups/" ];
    };
That’s it, and FoundationDB will automatically be installed – binary packages will be downloaded, put into place, and system services started all at once. This simple declaration has a lot of logic behind it[2]:
- fdbmonitor and all fdbserver processes are controlled by a single systemd unit, called foundationdb.service, which is available via systemctl. It’s correctly started after networking interfaces are available and the system is booted.
- Data directories and log directories are pointed explicitly onto fast NVMe drives (but default to /var/log/foundationdb and /var/lib/foundationdb like usual).
- Auto-mounting semantics. Thanks to systemd’s RequiresMountsFor= directive, when FoundationDB is started, any filesystem mounts along the path of the data or log directories are automatically mounted upon server start, and a systemd dependency is implicitly added to ensure this mount happens first and succeeds. For my server, my main OS partition is on an NVMe boot drive, while my FoundationDB data is on a completely separate NVMe drive. This feature ensures the NVMe mount point is always available prior to server initialization.
- The FoundationDB processes are heavily sandboxed, with explicit paths to read/write to (e.g. for backups, or for /etc/foundationdb). All forms of new permission elevation are denied, /dev is restricted, and /tmp is privately mounted in a way that is isolated from all other units. This is done through the use of systemd security features to put FoundationDB in its own control group and namespace on Linux. For example, the entire Linux system is effectively marked read-only from the POV of FoundationDB; any attempt to write outside of the log directory, data directory, or backup paths results in an explicit error. Further paths (like /boot) are not even readable at all. This is a security feature (which NixOS takes more seriously now), but also a sanity one to ensure FoundationDB is “being a good citizen” in a hostile world.
- As expected, a new database (configure new single ssd) is also initialized at first startup.
The remaining options have sensible defaults, mostly following the ordinary default FoundationDB parameters. For example, you can also configure process class, storage memory, locality (zone/machine/dc/hall IDs), and more.[3]
You never need to hand-edit an fdbmonitor configuration file; it’s all controlled here. Like Nix package builds, changes to this FoundationDB description are atomic: any change either results in a restarted FoundationDB service, or, if the service fails (say, due to an invalid configuration parameter), the results are rolled back.
For NixOS users, I have also written a section in the manual about FoundationDB usage[4], which may be relevant or interesting to people here – I’d appreciate feedback! (Currently using a local copy I built myself, but this documentation will appear on https://nixos.org soon enough.)
Some questions
- Is there any timeframe on support for newer GCC/Boost libraries? Right now, while NixOS keeps copies of GCC 4.9, and the FoundationDB package provides Boost 1.52.0 (headers only), it would be nice to eliminate these hard dependencies. One reason is that GCC 4.9 is effectively on life support in NixOS; in fact there is almost nothing else that seems to explicitly require it. Had I not added FoundationDB recently, it’s possible GCC 4.9 would have been axed before too long, making adding it much harder.

  The other reason is just maintainability; lots of specific workarounds ultimately hurt long-term maintainability for us, so working with upstreams to fix these problems is often desirable. (NixOS recently switched to GCC 7.3.0 as the default compiler, although 5.x and 6.x are also available.)
- Is the above security policy for the FoundationDB server “correct”? In particular, while I prefer to enhance the security of NixOS services wherever that is possible without breaking functionality (and I have not seen any breakage with FDB), I’d like to ask if this is OK. More generally: in practice, having been an OSS maintainer of big projects, and a Linux distro maintainer, I know both sides of this: deviating from upstream policy enough is likely to create its own issues. However, these issues almost always make their way straight back upstream rather than going through the package maintainer. This creates an unfortunate series of events where primary upstream developers have to sort through downstream users, downstream packages, and downstream setup choices – and this isn’t always a good use of the developers’ time.

  So I’d like to know if I’m doing something wrong or something you consider bad. If I am, I’ll gladly change this to be more in line with what upstream expects, and it will be easier for everyone.

  In particular, FoundationDB does not seem to offer systemd unit descriptions for its packages (rather, it offers init.d-style scripts, which systemd distros such as recent RHEL, Ubuntu etc can transparently handle), so it’s possible none of this is wrong, but simply unfamiliar. (I’d be more than willing to help contribute upstream unit files, if you would like to provide them.)
- Is there any explicit support policy on 3rd party distro packages? In a prior life, I developed and worked on proprietary Linux software that was sold to customers in a variety of (bizarre, horrifying) settings, so I understand the need for things like singular, static binaries, etc. This leads into a lot of the weird stuff in the build system, such as libstdc++ frobnication (something to do with symbol versioning?) and recent plans I read elsewhere to use __asm__ to mark glibc memcpy versions to link against.

  Of course, there’s nothing wrong with that! You have to do serious QA on a piece of software like FDB. But then the question is: what about packages you did not compile in this controlled environment? Should there be a notice “this isn’t an official upstream package”? Should there be any notice? Should downstream say something? Or should you say “Only packages provided by foundationdb.org have undergone our quality assurance, on these systems:” on the homepage? I’m not sure.

  I’m afraid my prior experience never led to a proprietary project becoming OSS, so I’m not sure I can offer any guidance here on what to do. You’ve still got customers to support so you can’t exactly obliterate all the stuff you have in place for this, but it adds a bit of tension between (formerly) proprietary product users and OSS users.
- There is a lot of weird stuff going on in the build systems, relating to static libraries. I had to write a patch in order to fix this; in particular, it would seem as if the makefiles put ld flags like -lstdc++ inside of _LIBS variables for individual subprojects; but this is wrong, because _LIBS is for tracking files that will be generated by rules in the build system, not the actual flags that are needed at link time. That’s _LDFLAGS, and as you can see from the patch, it essentially just moves a bunch of xxx_LIBS += -lfoo to xxx_LDFLAGS += -lfoo in order to get things working.

  Without this patch, the net effect is that the build system will immediately fail; for example, Make will try to build fdbcli, but fdbcli needs the _LIBS dependencies to be built first. However, it has no idea how to build -ldl or whatnot because that’s not a rule, that’s a link-time flag. So you get a mysterious error "Error: cannot make dependency "-ldl", needed for bin/fdbcli" or whatnot.

  I do not know why this is needed. Of course, you all wrote the build system! So maybe my understanding of _LIBS is wrong. Perhaps I missed a subtle build dependency. Do _LIBS get transformed into _LDFLAGS by something? If so, where? If not, why does the build system list -ldl in _LIBS – when it seems to contain rule dependencies, not link flags? Should everything really be in _LDFLAGS like my patch has done? Has some subtle semantics of Make or the build tools changed? It’s unclear to me why this is needed, but it’s fishy.
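For illustration, the kind of mechanical rewrite described in the last question can be sketched with sed. This is not the actual patch: move_link_flags is a hypothetical helper name, and it only handles the simple case where everything on the right-hand side of a += is a -l flag. Lines that mix generated files and link flags would still need editing by hand.

```shell
# Hypothetical helper (not the real patch): reads makefile text on stdin and
# rewrites "<name>_LIBS += -lfoo -lbar" to "<name>_LDFLAGS += -lfoo -lbar".
# Lines whose right-hand side contains anything other than -l flags are
# passed through unchanged.
move_link_flags() {
  sed -E 's/^([A-Za-z0-9_]+)_LIBS([[:space:]]*\+=[[:space:]]*)((-l[^[:space:]]+[[:space:]]*)+)$/\1_LDFLAGS\2\3/'
}
```

For example, `move_link_flags < some.mk` (the file name is a placeholder) prints the rewritten text to stdout so it can be diffed before anything is overwritten.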
Some suggestions
This list is incomplete, just a set of thoughts.
- Please add a make install target. This should not rely on any particular packaging file, it should just run install on the files in the right places. Unless I completely missed it (I don’t think I did?), this is an oversight that makes 3rd party packaging much more annoying. In one particular annoyance, fdbrestore, fdbdr, dr_agent etc are all actual symlinks to one binary (backup_agent), and determine their operation based on the exe name. I didn’t realize this until I had dug in and read the source code! So instead I had to ‘extract’ the fdb .deb package and look at the filesystem hierarchy. Then I mostly replicated that. But make install would have done it for me…

  Please note that it is essential that a command like make install respect the traditional PREFIX= and DESTDIR= environment variables, so that maintainers can install files into arbitrary places, which is vital. These environment variables do not need to have any special meaning to the FoundationDB source code, they merely need to exist, and be respected, when finally copying files around during make install. (FDB, as far as I can see, doesn’t rely so much on fixed hardcoded paths outside of /etc/foundationdb, and even that is fixable in the fdbmonitor file.) I’m sure you’re aware of it, but I only mention it because adding make install but not adding that will just have distro maintainers right back here.
- Please make libstdc++ frobnication, link-validate.sh, etc optional during build. First, on a system like NixOS, and, indeed, almost any Linux distro based around compiling packages for users and shipping them those packages, validating symbol names is meaningless, because there is only one global glibc, and it is explicitly compiled against. There will never be older versions that mysteriously appear out of nowhere, only newer ones. Second, the particulars of frobnication aren’t the same everywhere. For example, in NixOS I just sheepishly hacked libstdc++_pic and replaced it with libstdc++ in the Makefile, because the _pic variant is an Ubuntu-specific anomaly! Then I just disabled the link-validate check.

  These are surely vital for officially sanctioned static binary packages for your customers/users, but for distro maintainers, being able to turn this off would be excellent.
- Please make the existence of .git optional at build time. Many distros and packaging systems prefer to use tarballs (with optional signatures or hashes) instead of direct git repositories, so .git is not always going to be available. This causes FoundationDB’s build to fail since it requires it. For prior projects, one way we did this was to add an sdist target to create a tarball of the source code without git, for every release. Then, put a .release-version file inside the tarball containing the actual versioning information, as part of make sdist.
- The FoundationDB Documentation site map is missing many important pages, for example the “Administration” page and the “TLS” page. The TLS page simply doesn’t seem to exist anywhere but the repo, while the Administration page can at least be found on the current documentation using the “Search” bar. But neither is linked from “Site Map”!
- Please don’t use -Werror, anywhere, under any circumstances. In fact I suggest forgetting that flag has ever existed (as a distro maintainer, I almost wish it didn’t). Ideally -Werror could be injected into the build system, for example during CI on systems you perfectly control, or on FoundationDB developer machines. It’s basically always wrong otherwise, though, and using it should require a dance on the part of the person doing it. I had to patch this out of ./Makefile for things to work.

  In particular, for NixOS, while the C++ compiler is GCC 4.9, i.e. old, glibc is far newer. That’s unsurprising: there can only be one glibc, but many compilers. This results in GCC throwing warnings (errors) due to more modern glibc headers than the one that the build system/CI system currently tests against. (For example, glibc headers over time have been augmented with things like warn_unused_result, meaning perfectly warning-free code can become warning-laden, without even touching the compiler, only glibc.)

  While the compiler version is fixed, you can pretty much never guarantee the glibc version is fixed, short of testing every possible combination of a specific compiler version against an array of glibcs. Maybe NixOS is using glibc 2.25 with GCC 4.9, while another system is only using 2.23 with 4.9.
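To make the make install suggestion from the list above a bit more concrete, here is a minimal sketch of what a staged install step could do, written as a shell function rather than a Makefile rule. The function name stage_install and the flat bin/ layout are assumptions for illustration and do not reflect the layout of the official packages; only the symlink names pointing at backup_agent come from the description above.

```shell
# Sketch of a staged install that honours DESTDIR and PREFIX.
# $1 is the directory holding the built backup_agent binary.
# Assumption: everything lands in $PREFIX/bin, which is a made-up layout.
stage_install() {
  src=$1
  prefix=${PREFIX:-/usr/local}
  bindir="${DESTDIR:-}${prefix}/bin"

  # Create the target directory and copy the one real binary.
  install -d "$bindir"
  install -m 0755 "$src/backup_agent" "$bindir/backup_agent"

  # The other tools are relative symlinks to backup_agent, which picks its
  # behaviour from the name it was invoked as.
  for name in fdbrestore fdbdr dr_agent; do
    ln -sf backup_agent "$bindir/$name"
  done
}
```

A packager could then run something like `DESTDIR=/tmp/stage PREFIX=/usr stage_install ./bin` and pick up the staged tree from /tmp/stage, without the function ever touching the live system.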
In particular, fixing these few issues would make the NixOS FoundationDB package far simpler, and it would allow me to remove several of my patches.
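The .release-version idea from the list above could be consumed by a build script roughly as follows. This is only a sketch: the function name source_version and the "unknown" fallback are assumptions, and .release-version is simply the file name suggested above, not something the FoundationDB build reads today.

```shell
# Print a version string for the source tree in $1.
# Preference order: a .release-version file shipped in a tarball, then the
# current git commit if a .git directory and git itself are present, then a
# placeholder so the build can still proceed.
source_version() {
  tree=$1
  if [ -f "$tree/.release-version" ]; then
    cat "$tree/.release-version"
  elif [ -d "$tree/.git" ] && command -v git >/dev/null 2>&1; then
    git -C "$tree" rev-parse HEAD
  else
    echo unknown
  fi
}
```

With this ordering a release tarball never needs git at all, while a developer checkout keeps reporting the exact commit.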
Testing, limitations
I’ve tested FoundationDB-on-NixOS with a 9 node cluster split 3x3 in geographically distinct data centers (Amsterdam, New York, Bangalore) with proper locality settings, in triple datacenter mode. Autocoordination automatically promoted 7 coordinators out of the 9 nodes in distinct datacenters to achieve this (odd numbers: 2 coords in 2 of these regions, 3 coords in the last region). This cluster then had inserts (fdbcli --no-status --exec 'writemode on; set fooX barX', for many X values) performed on it repeatedly while taking nodes and coordinators offline at random intervals for random periods. (Nothing fancy to stress things, more like “I hit some commands at random intervals while watching Netflix to see what would happen”).
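A write loop of this kind can be sketched in shell as follows. Using an integer counter for X and the name gen_writes are assumptions; the fdbcli invocation itself is the one quoted above. The function only prints commands, so they can be inspected before being run.

```shell
# Print one fdbcli write command per key, for X = 1..$1.
# Nothing is executed here; pipe the output to sh to actually run it.
gen_writes() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    printf "fdbcli --no-status --exec 'writemode on; set foo%s bar%s'\n" "$i" "$i"
    i=$((i + 1))
  done
}
```

For instance, `gen_writes 1000 | sh` would issue a thousand single-key writes against whatever cluster file fdbcli picks up by default.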
There are currently several limitations. The biggest one is that TLS is NOT supported in the NixOS configuration right now. This isn’t due to a bug, just lack of time over the past weekend on my part. (The above 9-node/3-DC test was in fact encrypted, but using Wireguard as an overlay VPN in order to transparently encrypt data-in-flight. As a bonus, the wg command on any node gives a handy indication of the raw network bandwidth use between any two nodes.) I expect to enable this soon after some testing of FDBLibTLS.so.
The second major limitation is that only the C bindings are installed, not the Python, Ruby, Java, or Go bindings. Each of these will likely require help from fellow NixOS maintainers to properly maintain and build, as language-specific packages and bindings often require bespoke setup. I’m a C/Haskell programmer however, so I admit I’m unlikely to get to this soon without cajoling, since my needs are met…
NixOS just recently had its latest release, version 18.03, in late March. I have no plans to backport FoundationDB packages in any form to NixOS 18.03. The next release is NixOS 18.09, due around September, and I expect NixOS’s FoundationDB support will be quite featureful and ready by this time, and there is nothing to suggest its reversion. Until then, the semantics and default configurations may change!
DR mode, PITR for active clusters, recovery etc have not been tested extensively (I did not set up a second cluster in my tests), but ideally should work fine, as backup_agent seems to work fine too on my local machines.
There are probably some other bugs and problems I’ve missed, surely.
Links
[1] https://nixos.org
[2] https://github.com/NixOS/nixpkgs/blob/617db2df96a75f7808d544b57aa97d9859377e84/pkgs/servers/foundationdb/default.nix
[3] https://github.com/NixOS/nixpkgs/blob/18f28a6413e33416576f632367f0a4816c74c188/nixos/modules/services/databases/foundationdb.nix
[4] https://inner-haven.net/~aseipp/nix/fdb-manual/index.html#module-foundationdb