Proposal: Dropping MSVC

markus.pilman · February 12, 2020, 1:24am

Windows has always been an officially supported OS. However, Windows support comes at a great cost, and I am not convinced that we are currently paying that cost. Therefore I would like to propose that we drop support for the Microsoft compiler (MSVC).

To the best of my understanding, FDB on Windows is pretty much untested. Just having a badly tested binary on the official download page is not very responsible. We also never look at compiler warnings on Windows, so even the theoretical benefit of having the code verified by another compiler toolchain is small. Meanwhile, having a compiler and standard library with potentially significantly different behavior seems risky.

I think the main reason we still officially support Windows (badly) is because people are generally afraid to cut features (and sometimes for good reasons).

I understand there’s value in being able to run FDB on Windows (mostly for development - I don’t think I would trust it enough to actually run a production system on Windows).

I would like to know what others think. To be clear: I don’t think we should drop Windows support. However, I don’t think we should continue to compile with MSVC, and should instead find an alternative (the ones below are ordered by the amount of work it takes to make them happen).

Alternative One: Docker

This would be the easiest solution and probably also works: https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/set-up-environment?tabs=Windows-Server

The main drawback is that this is virtualization. However, I would not be surprised if FDB in a VM is faster than FDB running natively on Windows. In fact, I would be surprised if that weren’t the case (I don’t believe we regularly run benchmarks on Windows).

Pros:

Probably very limited amount of work - this should already be possible
Probably the most robust solution - most of the testing is done on Linux and this will probably have the lowest number of changes compared to test and production systems out there.

Cons:

High setup cost: for an application developer who wants to build an application on top of FDB, setting up Docker might be a high cost.

Alternative Two: WSL

The Windows Subsystem for Linux (WSL) can run Linux applications natively on Windows. This has more or less the same pros and cons as using Docker.

Alternative Three: Use Clang

Clang can build native Windows applications and is officially supported by Microsoft (you can even download it as part of Visual Studio).

Pros:

We would use a tested toolchain. This would at least mitigate the risk that a different compiler would introduce new bugs or that the standard library would behave drastically differently
We could still ship a normal Windows installer.
This could potentially perform well - and maybe even better than a virtualized solution.

Cons:

Everything in Platform.cpp would still be different - including the IAsyncFile implementation - so there will still be need for proper testing.
We would still need to maintain a Windows build

WolfDan · February 12, 2020, 2:28pm

Windows user here,

I’m not against the proposal at all. In fact, I’ve noticed some issues with fdb on Windows, like really large memory usage after use (even though the storage is set to ssd) and so on. I never filed an issue because I know the real production environment will be Linux; I use Windows only for development since it is my main OS.

I’m fine using any of the proposals. Clang looks like a really good option but, as mentioned, it has the same problem as MSVC: it needs to be maintained.

Also, WSL does not work right now (see the thread “FoundationDB does not run on Windows Subsystem For Linux (WSL)”). WSL2 is in development, and I’d guess it will make running fdb a native-like experience.

I do think @KrzysFR has more to say about the proposal since he maintains the C# client bindings.

wohali · February 12, 2020, 5:09pm

Hi Markus,

This is not great news for Apache CouchDB.

Part of our selection criteria for moving to a FDB backend was proper Windows support. There is a large (though not primary) user base presently using CouchDB in desktop settings on MacOS and Windows. These are small office / home office type applications where clustering isn’t useful. We also have a few (large) clients using CouchDB on Windows for actual clustered servers (though we would generally counsel them to use Linux/BSD instead).

Alternatives 1 and 2 are no-gos for these environments, I’m afraid. We’ve been down this path with them before and both have been summarily rejected for the time being. (The reasons are largely administrative, political and legal, and less technical.)

For CouchDB, should either of these alternatives be selected, we might have to drop plans to use FDB on all platforms, and build against something like SQLite on Windows instead. At the very least, that would delay our full FDB-based release by some time. It also wouldn’t help for the Windows-based clustered installations, which would no longer be possible.

Alternative 3 is probably fine for us, though our experience with CouchDB packaging and building its toolchain and dependencies (Erlang, ICU, curl, OpenSSL, Mozilla Spidermonkey) is that MSVC builds are both more performant and more broadly compatible with other native Windows applications. What will matter is that our native FDB driver in Erlang still be compile-able with MSVC, which it should be. (FYI, cross-linking of separately built MSVC and MinGW-built binaries is frequently not possible – see Erlang’s Win32 build walkthrough for a partial explanation.)

Could you provide more detail as to why Clang is preferred over MSVC? If it’s just cli options, there are wrappers extant to make cl.exe and link.exe act more like gcc/g++ and ld. If it’s MSBuild vs. GNU make, you can absolutely use make with MSVC, that’s how we build CouchDB today.

I’m happy to share my experience in comparing and contrasting both compiler chains, and possibly help find workarounds, though my time for contribution is limited.

For reference, in case you’re looking at your download numbers for Windows to inform this decision: we are right now wrapping up our CouchDB 3.0 release (the last major release before moving to FDB). We were planning to start exploring FDB on Windows with CouchDB in the next couple of months; it’s not been the focus to date.

markus.pilman · February 12, 2020, 6:08pm

Just to be clear: this is not news - this is just a discussion and nothing has been decided yet.

alexmiller · February 12, 2020, 6:24pm

Can you finish this sentence? It’s unclear to me if you’re proposing dropping officially produced builds for fdbserver on windows, or for the client as well.

Should I interpret this as that you’re proposing dropping a windows-native build, but not all ways of running FDB on Windows?

markus.pilman · February 12, 2020, 9:04pm

Sorry, yes. I meant to say that I don’t want to continue to support MSVC.

That would be one option that I was proposing. Alternative would be to use clang instead of MSVC to compile the native Windows build. From what I hear, this seems to be the better alternative.

Understood. I was hoping to get feedback like yours.

Just to be clear: MinGW and clang are very different beasts (at least that is my understanding). I would not propose to use MinGW exactly for the reasons you pointed out (and others). For context: I used to develop full time on Windows with Visual Studio for quite a long time in a previous job.

With clang for Windows you get the following things (that you don’t get with MinGW):

Official Microsoft support.
Ability to use Win32 API
Full ABI compatibility with MSVC binaries (this one I am not 100% sure).

There are several benefits. One is that if your code compiles on Linux or MacOS, you will have higher confidence that it will also compile on Windows (I broke the Windows build several times before). The same is true for build behavior. Sure, you can use make instead of msbuild, but the behavior is still wildly different. One example is that MSVC handles parallel builds very differently from clang and gcc (as in: clang and gcc don’t implement them while MSVC does - which is very confusing if you have to maintain a build system).

But the strongest argument for me is actually correctness:

The standard library is different (we have to support libstdc++, libc++, and whatever the MSVC thing is called). Some things are different between those, and we might rely on some libstdc++/libc++-specific behavior (for example when it comes to destructor call ordering within containers, for which the standard doesn’t define anything but most implementations will give you something deterministic).
Compilers generate very different code and have very different behavior (for example when it comes to things like memory alignment). gcc and clang are very close to each other but MSVC is not. So if we rely on gcc behavior somewhere, we might very well have a Windows bug (but not a Linux/MacOS bug).
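
As a hedged illustration of the destructor-ordering point above, the sketch below records the order in which a std::vector destroys its elements. The names Tracker, g_log, and destruction_order are hypothetical and not taken from the FDB code base. The C++ standard does not specify this order, so the only portable guarantee is that every element is destroyed exactly once; code that depends on a specific sequence may behave differently under another standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global log of element ids, appended to in destruction order.
// (Hypothetical helper for illustration only.)
static std::vector<int> g_log;

struct Tracker {
    int id;
    explicit Tracker(int i) : id(i) {}
    ~Tracker() { g_log.push_back(id); }
};

// Builds a vector of n Trackers with ids 0..n-1, lets it go out of scope,
// and returns the ids in the order their destructors ran.
std::vector<int> destruction_order(std::size_t n) {
    g_log.clear();
    {
        std::vector<Tracker> v;
        v.reserve(n);  // avoid reallocation, which would copy and destroy elements early
        for (std::size_t i = 0; i < n; ++i) {
            v.emplace_back(static_cast<int>(i));
        }
    }  // v is destroyed here; the element order is up to the implementation
    return g_log;
}
```

Calling destruction_order(3) returns some permutation of 0, 1, 2. Which permutation you get depends on the standard library, which is exactly the kind of difference that only shows up when a second toolchain is involved.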

So having MSVC means that our testing surface increases dramatically. Snowflake is certainly not going to make big investments into Windows (and I assume Apple isn’t either - but I don’t know).

So for 7.0 nothing will change anyways. However, I created this Ticket. I will try to find some time to see how hard it would be to compile with clang on Windows. I could then give a binary to you guys and you can evaluate this.

Also: if you’re going to rely on a Windows build, are you planning to invest anything in that area? Because if you are willing to invest some time in maintaining and testing Windows, my opinion would shift drastically.

KrzysFR · February 12, 2020, 9:20pm

We’ve been running FoundationDB clusters on mostly Windows platforms (desktop for dev, server for “production”) for about 5 years, and except a few minor bugs in the early years, the only issues we’ve had were hardware failures. If there are compiler warnings, they don’t seem to hurt that much.

Up until now, the main component we really 100% need running on Windows is the client. All the recent deployments in production were application servers running on Windows Server 2016/2019 that talked to an fdb cluster running on Linux (either Ubuntu or RHEL). That said, it is very handy to also have the server running on Windows for development, for training or proofs of concept with a single VM running on a laptop, or on CI build VMs that target the .NET Framework (which is still Windows-only).

Regarding performance… not much to say about it. It is true that the Linux implementation is a bit faster, but the performance of a Windows client talking to a Linux cluster is still orders of magnitude faster than what we need.

Regarding virtualization: when running on Hyper-V, there is an issue with Dynamic Memory, which is not understood by the fdbserver process, so we have to use a fixed amount of RAM for these VMs. Nothing too hard to solve.

The .NET/C# binding already has support for .NET Core 3.1 and I’m keeping support for .NET Framework, until .NET 5 comes along (which is supposed to “unify” the desktop-only windows world with the .NET core “runs everywhere” world).

I’ve never used the other bindings (java, python, go, …) on Windows itself, so I can’t say much about that aspect, but if they call the same C API that I do, I don’t think that would change anything.

Regarding MSVC/VS support, I’ve only occasionally had to try to build fdb on Visual Studio, so I’m not particularly attached to it. Being able to build the windows version to produce win64 binaries where I just need to run some command is fine by me.

I think if you try to use Visual Studio Code as the IDE, and any build system you want, it would be still a decent development experience on Windows.

KrzysFR · February 12, 2020, 9:38pm

We’ve been “testing” on Windows for years

If there was a significant regression on released windows builds, I’d probably be notified very quickly by my CI servers

Though, since the download page does not list all the recent builds, and there are no nightly builds for the master branch, I’m not able to test the Windows binaries until they appear on the download page and I notice they are there. Maybe if there was a way to get more up-to-date builds (including nightlies?) for all platforms (not only Windows), the time to react to breaking changes would be shorter?

For example, I’m still waiting for this fix to hit the download page, in order to test it.

markus.pilman · February 12, 2020, 9:50pm

Interesting, thanks for your input.

Just a few minor notes and questions:

Visual Studio supports clang. So it is my understanding that you could still use Visual Studio. Visual Studio Code currently doesn’t work well on Windows because cmake doesn’t support C# for non-MSBuild builds (and VS Code uses Ninja) - on other platforms we use mono so there this is less of an issue. But I think this is a solvable problem if anyone wants to develop on Windows with VS Code. Currently only Visual Studio (or msbuild) is supported and a switch to clang probably would not change this.

I guess the main thing that is missing is simulation tests. If you are willing to set up some small architecture that runs some of these simulation tests on Windows, I would be happy to help you with that. I assume you don’t run simulation tests? If you do, that would be awesome!

It also probably wouldn’t be too hard for Apple to provide you a nightly Windows build if you want to do some of this testing. The only question remaining then would be: who would fix the bugs that you find like this? Reproducing a Windows failure requires a Windows machine…

Is it ok if I ping you as soon as I have a Windows/Clang build so you could do some minor testing on that and see whether it would in theory work for you? Also to make sure we don’t break the C# bindings… The only Windows machine I have is my gaming PC at home - so the amount of time I can invest into Windows is very limited

KrzysFR · February 12, 2020, 10:54pm

Sure, we have the capacity to run VMs with heavy I/O on Windows, on Hyper-V (multiple NVMe drives, 10 Gbps between Hyper-V hosts). We also have several bare-metal clusters that were supposed to serve as fdb and Kubernetes clusters for benchmarking and as “test subjects” (and by that I mean subject to great abuse like remote PSU shutdown), but we could spare a few to run tests. I’d just need instructions on how to set things up. If you already have a VHDX or VMDK set up, it would be even easier.

Yes, that’s possible. I have a setup that runs the .NET binding test suite on Windows machines, and it will also be used a lot more than before in future products, so we will probably see any regression quicker.

I tried a few years back to set up automatic CI with AppVeyor and others, but there was no way to install third-party software on their Windows build images at the time (so basically the only .NET binding tests they run are those that don’t touch the database… not very useful). It looks like there are now solutions to that issue, so we could also run public CI tests in the cloud (not only in our internal infrastructure).

I’ve also started looking at GitHub Actions, and it looks like there’s a way to create custom actions that install custom software, though I’m not sure if they have Windows worker images. I’m only using GitHub Actions to build the .NET Core binaries currently. (.NET Core · Workflow runs · Doxense/foundationdb-dotnet-client · GitHub)

wohali · February 13, 2020, 12:07am

Awesome! I’m relieved to know that you’re up to speed on these things.

Thanks for the nudge to investigate this more closely. It looks like if you use clang in clang-cl mode, you are correct on #3. My last experience with clang on Windows was that it was itself built as MinGW, so that this has changed is good news. Looks like you might need VS2019; not that it’s directly relevant, but we’re on VS2017 ourselves so far.
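
For anyone wanting to check which toolchain actually produced a binary, here is a minimal sketch based on predefined macros. It assumes the usual behavior that clang in MSVC-compatible mode defines both __clang__ and _MSC_VER while MSVC defines only _MSC_VER; the function name compiler_name is made up for illustration and is not part of FDB.

```cpp
#include <cassert>
#include <cstring>

// Returns a short label for the compiler that built this translation unit.
// Order matters: clang targeting the MSVC ABI (typically via clang-cl)
// also defines _MSC_VER, so __clang__ has to be checked first.
inline const char* compiler_name() {
#if defined(__clang__) && defined(_MSC_VER)
    return "clang-cl";  // clang in MSVC-compatible mode
#elif defined(_MSC_VER)
    return "msvc";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}
```

Logging this label at startup could make it easier to tell an MSVC build from a clang-cl build while both are being shipped side by side.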

Thanks so much for the explanation, that’s very informative, and makes a lot of sense. I’m also glad to see that @KrzysFR is actively investing in this area extensively.

We’re actually just getting to the place where we can have Windows in our CI matrix on a regular basis for Apache CouchDB, pre-FDB. (We ran into many of the same issues that @KrzysFR mentioned with AppVeyor, and Apache’s own internal Windows Jenkins workers.) And as you know, overall work to knit FDB into CouchDB is still ongoing.

I don’t know how much longer it will be before we’re ready to start regular integration CI testing, but once we’ve got a stable prototype CouchDB-on-FDB, we should also be ready by then to start running Windows in the larger CI matrix of platforms we cover. We’d certainly share any problems we found via that process at a minimum.

Does that help?

markus.pilman · February 13, 2020, 4:20am

Yes, thanks a lot, this gives me a much better picture. Also, just in general: I have invested a lot of time into Windows, and it is helpful just to know that I didn’t do all of this only to have a checkbox for Windows support, but that I was actually doing something people can use.

I’m aware of that. I am not sure whether this is an issue - but I also expect this to be more of a long-term effort.

FDB tests actually need way less than that. Simulation tests generally run in a single process (the whole cluster is just simulated). However, if you have a cluster you could also do some performance tests which might be helpful if you find any issues there.

I will come back to this forum thread as soon as I have done more investigation. But to wrap up, in my view this seems to be the consensus:

C# and Erlang clients need to work natively on Windows
fdbserver needs to run natively on Windows.
People are generally fine with switching the compiler as long as the above works. So if we go to clang we will probably support both compilers for one or two releases to make sure we don’t run into any compatibility issues.
Folks are aware that our Windows testing-story is not great and Christophe and Joan might be willing to invest some amount of resources into this to potentially report bugs and performance issues back to the community.

@alexmiller (pinging you as you’re kind of the community manager for Apple): I am planning to look into this, probably this weekend. Does the fact that I didn’t get any opinions from Apple folks mean that you generally agree with the stuff here? Can you please check? If switching compilers is a no-no for you, I will not spend any more time on it.

alexmiller · February 13, 2020, 8:00pm

We’ve had a decent number of MSVC specific issues over time, so I would think most folk are on board with using clang on Windows if possible. Mucking with the CI build images isn’t exactly my favorite maintenance task to do, but I’d also be happier if the windows build worked more like the other platforms.
