FoundationDB does not run on Windows Subsystem For Linux (WSL)

I’m not sure why you’d want to do this, but if you do: installing FoundationDB 5.2.5 on Windows Subsystem for Linux with an Ubuntu userland does not work (Win10 1803).

For those wondering what the hell I’m talking about: yes, this is a real thing https://docs.microsoft.com/en-us/windows/wsl/faq

Installing the server package fails

~/fdb$ sudo dpkg -i foundationdb-server_5.2.5-1_amd64.deb
Selecting previously unselected package foundationdb-server.
Preparing to unpack foundationdb-server_5.2.5-1_amd64.deb ...
Unpacking foundationdb-server (5.2.5-1) ...
Setting up foundationdb-clients (5.2.5-1) ...
Adding group `foundationdb' (GID 115) ...
Done.
Adding system user `foundationdb' (UID 111) ...
Adding new user `foundationdb' (UID 111) with group `foundationdb' ...
Not creating home directory `/var/lib/foundationdb'.
...
Setting up foundationdb-server (5.2.5-1) ...
ERROR: Disk i/o operation failed (1510)
dpkg: error processing package foundationdb-server (--configure):
 installed foundationdb-server package post-installation script subprocess returned error exit status 1
dmesg: read kernel buffer failed: Function not implemented
                                                          
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (237-3ubuntu10.3) ...
Errors were encountered while processing:
 foundationdb-server
E: Sub-process /usr/bin/dpkg returned an error code (1)

Installing the client package does not fail, but running fdbcli fails immediately:

~/fdb$ fdbcli
ERROR: Disk i/o operation failed (1510)

The log files all look identical to this (this is the whole file! nothing more)

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1540663766.702862" Type="Binding" Machine="127.0.0.1:4500" ID="0000000000000000" PublicAddress="127.0.0.1:4500" ListenAddress="127.0.0.1:4500" logGroup="default"/>
<Event Severity="10" Time="1540663766.702862" Type="IOSetupError" Machine="127.0.0.1:4500" ID="0000000000000000" UnixErrorCode="26" UnixError="Function not implemented" logGroup="default"/>
<Event Severity="40" Time="1540663766.702862" Type="MainError" Machine="127.0.0.1:4500" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" logGroup="default" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x12abcc4 0x12aad32 0x432900 0x7fbfc1241b97"/>

I guess fdb must be calling into some syscalls that are not emulated by the Windows kernel, so it crashes, gets restarted, rinse, repeat…
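To pull the interesting events out of a trace file like the one above, something along these lines could help. This is only a sketch: error_events is a hypothetical helper name, it matches attributes with regular expressions instead of an XML parser (the file above ends without a closing </Trace> tag, so a strict parser would likely reject it), and it assumes that Severity 40 marks the fatal events, as in the MainError line above.

```python
import re

# Match each <Event .../> element and capture its attribute text.
EVENT_RE = re.compile(r"<Event\s+([^>]*)>")
# Match individual name="value" attribute pairs.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def error_events(trace_text, min_severity=40):
    """Return the attributes of every event at or above min_severity."""
    events = []
    for match in EVENT_RE.finditer(trace_text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        # Events without a Severity attribute are treated as severity 0.
        if int(attrs.get("Severity", 0)) >= min_severity:
            events.append(attrs)
    return events
```

Called on the contents of the file above, this would keep only the MainError event and drop the two Severity 10 lines.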

Running fdbserver would have been cool, but at least I can install it on the Windows host itself. What I’d really like is to use the client and fdbcli from a Linux environment, to test Linux-native apps (in my case, running .NET Core on Linux on my Windows laptop, without paying the cost of a VM).

I’m not an expert on WSL, but could this be related to the fact that FoundationDB uses O_DIRECT for I/O? That same problem has caused issues for those trying to run FDB within a Docker for Mac container with a mounted volume as the data directory: https://github.com/apple/foundationdb/issues/842

I’m suggesting this because it’s failing with an I/O error on startup, and (absent, say, a bad disk) the issue above is, I think, where I’ve seen that before.

It’s possible. The way I understand it, WSL works by translating Linux syscalls into their equivalent Windows kernel calls, but not everything is implemented, or even possible. O_DIRECT I/O may be among the things that are unsupported.

If you implement a workaround for platforms that do not support O_DIRECT, it’s possible that WSL may be fixed as well.

Now to be fair, I don’t expect the server to be fully working, but the client part would be nice. And my suspicion is that it’s only the code that deals with the fdb.cluster file that fails (log files are written to disk without issues).

IOSetupError is caused by a failure to set up Linux kernel AIO. I take it that’s not supported in WSL?

Maybe using AsyncFileEIO instead of AsyncFileKAIO in this case would work? I think this would require a code change to test.

It looks like AIO is not supported by WSL, according to this issue (that impacts mysql as well): https://github.com/Microsoft/WSL/issues/3631 or https://github.com/Microsoft/WSL/issues/2113

Are there other linux platforms that don’t support AIO? If WSL is recognized as a Linux platform by the client, then that would mean that “if (platform == LINUX) …” is not a sufficient test, and would require either a runtime check for AIO support (with fallback to something else), or a custom package built for WSL? That’s starting to look like a lot of work…
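A runtime check with fallback could look roughly like the sketch below. It is written in Python purely for illustration (FoundationDB itself is C++), the function names are hypothetical, and the syscall numbers 206 and 207 are the x86-64 Linux values for io_setup and io_destroy, so the probe deliberately reports "not available" on any other platform. The idea is simply to try io_setup once at startup and pick the I/O backend from the result instead of from the platform name.

```python
import ctypes
import errno
import platform

# x86-64 Linux syscall numbers; other architectures use different values.
SYS_IO_SETUP = 206
SYS_IO_DESTROY = 207


def probe_kernel_aio():
    """Return 0 if io_setup succeeds, otherwise the errno it failed with."""
    if platform.system() != "Linux" or platform.machine() != "x86_64":
        # Assumption of this sketch: unknown platforms count as "no kernel AIO".
        return errno.ENOSYS
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctx = ctypes.c_ulong(0)  # aio_context_t must be zero before io_setup
    rc = libc.syscall(ctypes.c_long(SYS_IO_SETUP), ctypes.c_long(1),
                      ctypes.byref(ctx))
    if rc != 0:
        return ctypes.get_errno()
    # The probe context is not needed afterwards, so release it again.
    libc.syscall(ctypes.c_long(SYS_IO_DESTROY), ctx)
    return 0


def choose_io_backend(probe=probe_kernel_aio):
    """Pick 'kaio' when the probe succeeds, else fall back to 'threadpool'."""
    return "kaio" if probe() == 0 else "threadpool"
```

The probe is passed in as a parameter so that the selection logic can be exercised without depending on the kernel it happens to run on, for example choose_io_backend(lambda: errno.ENOSYS) to simulate a platform where io_setup is not implemented.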

I haven’t really looked into it, so I’m not sure. I don’t think I’ve heard anyone else mention that they had trouble with AIO on a Linux platform before, though.

Another option is to have it be a runtime toggleable piece of behavior, perhaps using knobs. I haven’t attempted to do this, but this may be the only place where you need to modify the behavior: https://github.com/apple/foundationdb/blob/51afb29e3b23e41e55c1aec1a2865b29baf6073b/fdbrpc/Net2FileSystem.cpp#L61.

As a side note, maybe avoiding AIO in this way is a better option for those who can’t use O_DIRECT than reverting to synchronous behavior (see https://github.com/apple/foundationdb/pull/859).

I found this topic being pointed to elsewhere, so to circle back:

As of FDB 6.1 (and thanks to #1283), one can pass --knob_disable_posix_kernel_aio=1 on the command line to fdbserver, which will cause it to not try to use KAIO on linux, and instead fall back to using a threadpool which issues synchronous IO calls. This should allow FDB to work on the Windows Subsystem for Linux, and feel free to report back/open an issue if it doesn’t.

I tried it with v6.2 still on WSL v1, and I’m still getting an issue with both fdbserver and fdbcli.

Adding knob_disable_posix_kernel_aio=1 to foundationdb.conf or passing it as an argument does not change anything; the server still crashes immediately on start:

krzys@WSL:~# fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen_address public --logdir /var/log/foundationdb --public_address auto:4500 --knob_disable_posix_kernel_aio=1
Error: Disk i/o operation failed

Logs:

<?xml version="1.0"?>
<Trace>
<Event Severity="10" Time="1574407473.543700" Type="TLSConnectionLoadingPlugin" ID="0000000000000000" Plugin="fdb-libressl-plugin" Machine="127.0.0.1:4500" LogGroup="default" />
<Event Severity="10" Time="1574407473.545043" Type="Net2Starting" ID="0000000000000000" Machine="127.0.0.1:4500" LogGroup="default" />
<Event Severity="10" Time="1574407473.545056" Type="Binding" ID="0000000000000000" PublicAddress="127.0.0.1:4500" ListenAddress="127.0.0.1:4500" Machine="127.0.0.1:4500" LogGroup="default" />
<Event Severity="10" Time="1574407473.545056" Type="IOSetupError" ID="0000000000000000" UnixErrorCode="26" UnixError="Function not implemented" Machine="127.0.0.1:4500" LogGroup="default" />
<Event Severity="40" Time="1574407473.545056" Type="MainError" ID="0000000000000000" Error="io_error" ErrorDescription="Disk i/o operation failed" ErrorCode="1510" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x194a7bf 0x1949bcf 0x4ebb2e 0x7fce95641b97" Machine="127.0.0.1:4500" LogGroup="default" />
</Trace>

fdbcli also crashes on start (while reading fdb.cluster?):

krzys@WSL:~# fdbcli --version
FoundationDB CLI 6.2 (v6.2.7)
source version d3c4bd9c5ac29ef98c024317353e3c1a24a9d6b3
protocol fdb00b062010001
krzys@WSL:~# fdbcli
ERROR: Disk i/o operation failed (1510)
krzys@WSL:~#

Am I missing something?

Anyway, this is for WSL v1, which will soon be replaced by WSL v2, which runs a real Linux kernel (under a hypervisor), so this issue may go away by itself. It may still be useful to find out why the knob seems to be ignored in some places.

sigh.

A check for the knob is missing on the init call. Thanks for testing, and let me go file an issue.

After upgrading to Windows 10 2004 and WSL 2, I can say that FoundationDB does install on WSL2 and seems to be running fine.

By default it binds to 127.0.0.1, which is only accessible from inside the WSL guest, not from the Windows host. To fix that, I had to edit /etc/foundationdb/fdb.cluster and change the IP to the guest’s IP. By default the guest and host communicate via a Hyper-V Virtual Ethernet Adapter (it was using 172.x.x.x/20 or something like that in my case).
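The edit to fdb.cluster can be sketched as below. A cluster file holds a single line of the form description:ID@ip:port[,ip:port...], and only the address part needs to change. The helper name rebind_coordinators and the sample description, ID and guest IP in the usage note are made up for illustration; this is not a tool that ships with FoundationDB.

```python
def rebind_coordinators(cluster_line, new_ip, old_ip="127.0.0.1"):
    """Replace old_ip with new_ip in every coordinator of a cluster string."""
    prefix, sep, coordinators = cluster_line.strip().partition("@")
    if not sep:
        raise ValueError("expected description:ID@ip:port[,ip:port...]")
    rewritten = []
    for coord in coordinators.split(","):
        # Split on ':' so any trailing field after the port is kept intact.
        parts = coord.split(":")
        if parts[0] == old_ip:
            parts[0] = new_ip
        rewritten.append(":".join(parts))
    return prefix + "@" + ",".join(rewritten)
```

For example, rebind_coordinators("wsl:abc123@127.0.0.1:4500", "172.20.0.2") returns "wsl:abc123@172.20.0.2:4500". The resulting line would be written back to /etc/foundationdb/fdb.cluster in the guest, and the same file is then what gets copied to the Windows host.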

With that done, and by copying the fdb.cluster file over to the windows host, I was able to connect to it:

C:\DATA>fdbcli -C wsl.cluster --exec "status"
Using cluster file `wsl.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - memory-2
  Coordinators           - 1

Cluster:
  FoundationDB processes - 1
  Zones                  - 1
  Machines               - 1
  Memory availability    - 24.9 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 06/03/20 00:05:12

Data:
  Replication health     - Healthy
  Moving data            - 0.000 GB
  Sum of key-value sizes - 0 MB
  Disk space used        - 105 MB

Operating space:
  Storage server         - 1.0 GB free on most full server
  Log server             - 240.3 GB free on most full server

Workload:
  Read rate              - 21 Hz
  Write rate             - 1 Hz
  Transactions started   - 8 Hz
  Transactions committed - 1 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 06/03/20 00:05:12

Running ps aux in the guest yields the smallest process list I’ve ever seen :)

krzys@SATSUKI:~$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0    892   548 ?        Sl   Jun02   0:00 /init
root     31236  0.0  0.0    892    80 ?        Ss   Jun02   0:00 /init
root     31237  0.0  0.0    892    80 ?        R    Jun02   0:00 /init
krzys    31238  0.0  0.0   8852  5820 pts/0    Ss   Jun02   0:00 -bash
root     32058  0.0  0.0   5236  2392 ?        Ss   00:00   0:00 /usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid
foundat+ 32059  0.1  0.0 167700 14132 ?        Sl   00:00   0:00 /usr/lib/foundationdb/backup_agent/backup_agent --cluster_file /etc/foundationdb/fdb.cluster --logdir /var/log/foun
foundat+ 32060  0.5  0.1 478500 38780 ?        Sl   00:00   0:01 /usr/sbin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/data/4500 --listen
krzys    32081  0.0  0.0   9104  3540 pts/0    R+   00:05   0:00 ps aux

Since WSL2 uses a virtualized Linux kernel, it should work much better than before. Since the file system is virtualized as well, though, performance will probably be lower than running the Windows version on the Windows host (needs to be tested!).
