Proof of concept filesystem layer

Hello, I’ve thrown together a proof of concept filesystem layer in C++, using FUSE’s low-level async API.

It seems to handle standard file operations correctly: git is able to operate on top of it. But, it doesn’t do proper permissions checking, and there’s no shortage of TODOs in the code for spots where error handling isn’t fleshed out. It isn’t ready for prime time. As such, I hope nobody will run off and post this around the Internet as “filesystem layer for foundationdb!”

I’m hoping it will spark some discussion on how such a layer ought to interact with and store data in FDB. My rough outline for what I’m doing (to the extent that I finished implementing it) is in kvlayout.txt.

(On a non-FDB front, this is my first real use of modern C++. I last spent any time with it circa 1999. I’d be very interested in hearing, perhaps outside of the forum, opinions on the use/abuse of C++ in my code.)


Interesting!

One question I have here is: how do you break stuff down into transactions? Filesystems are typically not transactional, but FDB is, which might introduce some interesting corner cases.

For example, if I have one process writing to a file and I do tail -f file on another shell, when will I see changes?

One question I have here is: how do you break stuff down into transactions? Filesystems are typically not transactional, but FDB is, which might introduce some interesting corner cases.

Every call FUSE makes into the filesystem creates a transaction, which is used for just that call. (There’s a caveat on that, but it is a bad idea that’ll probably go away.)

Mostly FDB just means I can put a filesystem on top of a KV store, and ignore all issues of concurrency. Where a real filesystem would need to worry about locking structures, I just stuff operations into transactions and follow the retry rules.
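To make "stuff operations into transactions and follow the retry rules" concrete, here is a minimal sketch of the pattern. It is not the layer's actual code and does not use the FDB API: `MockTransaction`, `RetryableError`, `run_in_transaction`, and the `max_attempts` cap are all hypothetical names invented for illustration. (With the real C API, the retry decision is normally delegated to `fdb_transaction_on_error` rather than a hand-rolled counter.)

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a retryable conflict; not an FDB type.
struct RetryableError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a transaction handle; not an FDB type.
struct MockTransaction {
    int attempt;  // which attempt this transaction belongs to
};

// Run one filesystem operation inside a fresh transaction per attempt.
// Retries when the body reports a retryable error, and returns the
// number of attempts it took. Throws once max_attempts is exhausted.
int run_in_transaction(const std::function<void(MockTransaction&)>& body,
                       int max_attempts = 5) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        MockTransaction tr{attempt};
        try {
            body(tr);        // reads and writes for this one FUSE call
            return attempt;  // a real layer would commit here
        } catch (const RetryableError&) {
            // Conflict: drop this transaction and start over.
        }
    }
    throw std::runtime_error("transaction retry limit reached");
}
```

The point of the shape is that the operation body must be safe to run more than once, since any attempt may be thrown away and repeated.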

For example, if I have one process writing to a file and I do tail -f file on another shell, when will I see changes?

It’s complicated by the kernel and FUSE, but I’d expect the same behavior as any other FUSE filesystem. Each write is committed separately as it makes its way to the filesystem. FDB’s transactional nature is just being used to ensure that writes which span blocks, and any inode metadata updates, are safe. (I currently divide the file up into 8k blocks, one per KV pair.)
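As a rough illustration of why a single write can touch more than one KV pair, the sketch below splits a byte range into per-block pieces. It assumes "8k" means 8192 bytes; `kBlockSize`, `BlockSpan`, and `split_write` are hypothetical names, and the actual key layout (described in kvlayout.txt) is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: "8k blocks" means 8192 bytes per block.
constexpr std::uint64_t kBlockSize = 8192;

// One piece of a write that falls entirely inside a single block.
struct BlockSpan {
    std::uint64_t block;   // block index within the file
    std::uint64_t offset;  // starting byte inside that block
    std::uint64_t length;  // number of bytes touched in that block
};

// Split the byte range [file_offset, file_offset + size) into pieces
// that each stay inside one block.
std::vector<BlockSpan> split_write(std::uint64_t file_offset, std::uint64_t size) {
    assert(size <= UINT64_MAX - file_offset);  // guard against overflow
    std::vector<BlockSpan> spans;
    std::uint64_t pos = file_offset;
    const std::uint64_t end = file_offset + size;
    while (pos < end) {
        const std::uint64_t block = pos / kBlockSize;
        const std::uint64_t in_block = pos % kBlockSize;
        const std::uint64_t len = std::min(kBlockSize - in_block, end - pos);
        spans.push_back({block, in_block, len});
        pos += len;
    }
    return spans;
}
```

For instance, `split_write(8000, 400)` yields two pieces: 192 bytes at offset 8000 of block 0 and 208 bytes at offset 0 of block 1. Updating both blocks in one transaction is what keeps such a write from being half-applied.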

To get more interesting behavior in the case you describe, it might be possible to implement a special ioctl that could let the filesystem know to open a transaction for operations on a file handle, and use it for all operations until a closing ioctl attempts to commit it.
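A very rough sketch of what that ioctl idea might look like, using an in-memory map in place of FDB. Everything here is hypothetical: `Store`, `HandleTransactions`, and the `begin`/`write`/`commit` names are invented for illustration, and the FUSE ioctl plumbing is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for the key-value store.
using Store = std::map<std::string, std::string>;

// Tracks file handles that have asked for a handle-scoped transaction.
class HandleTransactions {
public:
    explicit HandleTransactions(Store& store) : store_(store) {}

    // "Opening" ioctl: start buffering writes for this file handle.
    void begin(std::uint64_t fh) {
        assert(pending_.count(fh) == 0);  // one open transaction per handle
        pending_[fh];
    }

    // Buffered if the handle has an open transaction; otherwise the
    // write is applied immediately, like the per-call behavior.
    void write(std::uint64_t fh, const std::string& key, const std::string& value) {
        auto it = pending_.find(fh);
        if (it != pending_.end()) {
            it->second[key] = value;
        } else {
            store_[key] = value;
        }
    }

    // "Closing" ioctl: apply every buffered write in one step.
    // Returns false if the handle had no open transaction.
    bool commit(std::uint64_t fh) {
        auto it = pending_.find(fh);
        if (it == pending_.end()) return false;
        for (const auto& kv : it->second) store_[kv.first] = kv.second;
        pending_.erase(it);
        return true;
    }

private:
    Store& store_;
    std::unordered_map<std::uint64_t, Store> pending_;
};
```

In this model a reader such as tail -f would see nothing from the writing handle until the closing ioctl commits, and then all of its writes at once.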

Update:

The filesystem now passes a number of tests:

  • https://github.com/billziss-gh/secfs.test
    • fstest whole suite (I think FUSE is patching over some of the unimplemented functions, so, caveat emptor.)
    • fstools/fsx
    • haven’t tried the rest, yet
  • sqlite can build, and pass its entire test suite, on top of the filesystem
    • so, hey, there you go everybody: SQL on FDB. :crazy_face:

Plenty of other work to do, but this seemed like a good time to mention it again. :tada:
