Experience combining FoundationDB with an MPP data warehouse

Hi,

First, thank you for open-sourcing FoundationDB.

Deepgreen is our MPP (massively parallel processing) data warehouse. We have a flexible plugin system for accessing other databases and storage systems. Combining the two, FoundationDB can be used to absorb short read/write transactions while Deepgreen brings full SQL capability for real-time analytics.

Deepgreen can also be used as a bulk-load tool for fdb – on a VirtualBox VM running on my laptop, I can insert 10K records per second (TPCH lineitem; each record is about a 20-byte key and a 100-byte value). Reading from fdb runs at about 300K records per second. Fdb is configured to use SSD. These numbers are in line with the results published by other people.

We also do in-database analytics. We implemented BSP (bulk synchronous parallel, i.e., Google Pregel) on top of fdb. As an example, with some reasonable effort spent combining/reducing messages, we can run Bellman-Ford smoothly. The BSP implementation is open source, available from https://github.com/vitesse-ftian/dggo/tree/master/vitessedata/bsp
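For readers unfamiliar with the Pregel style, the idea can be sketched in a few lines of plain Go. This is only a toy, in-memory sketch of BSP Bellman-Ford with a min-combiner (the real implementation in the dggo repo above runs on top of fdb and is more involved; all names here are made up for illustration):

```go
package main

import "fmt"

// Edge is a weighted out-edge of a vertex.
type Edge struct {
	To     int
	Weight int
}

const inf = int(^uint(0) >> 1) // max int stands in for "infinity"

// bellmanFordBSP runs Pregel-style supersteps: each active vertex sends
// (dist + weight) messages along its out-edges; incoming messages are
// combined with min before delivery, which is the message combining/
// reducing mentioned above. The loop ends when no vertex is active.
func bellmanFordBSP(graph map[int][]Edge, src, n int) []int {
	dist := make([]int, n)
	for i := range dist {
		dist[i] = inf
	}
	dist[src] = 0
	active := map[int]bool{src: true}

	for len(active) > 0 { // one loop iteration == one superstep
		inbox := map[int]int{} // combined (min) message per target vertex
		for v := range active {
			for _, e := range graph[v] {
				m := dist[v] + e.Weight
				if cur, ok := inbox[e.To]; !ok || m < cur {
					inbox[e.To] = m // min-combiner
				}
			}
		}
		active = map[int]bool{}
		for v, m := range inbox {
			if m < dist[v] {
				dist[v] = m
				active[v] = true // improved, so stay active next superstep
			}
		}
	}
	return dist
}

func main() {
	g := map[int][]Edge{
		0: {{1, 4}, {2, 1}},
		2: {{1, 2}},
		1: {{3, 1}},
	}
	fmt.Println(bellmanFordBSP(g, 0, 4)) // [0 3 1 4]
}
```

The min-combiner is what keeps the message volume manageable: each vertex receives at most one message per superstep regardless of in-degree.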

The problems we ran into during the exercise were all minor compared to the benefits, but here they are. The Go binding is not go-get-able. We had to work around the 5-second transaction limit, so the transaction model as viewed from Deepgreen is somewhat strange – roughly, it is read committed. The master branch of the Go binding does not work with the latest release, 5.1, so we had to hack around it. Also, the documentation for deploying a cluster is not so easy to follow. :slight_smile:
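For anyone curious what working around the 5-second limit looks like: the usual pattern is to split a long scan into many short transactions, each resuming strictly after the last key the previous one returned, which is also why the combined view is only roughly read committed (different chunks may see different versions). A minimal in-memory sketch, where `scanChunk` stands in for one short fdb range-read transaction and all names are invented:

```go
package main

import (
	"fmt"
	"sort"
)

// chunkedScan reads all keys from store in fixed-size chunks, restarting
// the "transaction" for each chunk so no single one outlives the time
// limit. The resume cursor is the last key already returned.
func chunkedScan(store map[string]string, chunk int) []string {
	var out []string
	after := "" // resume strictly after this key
	for {
		batch := scanChunk(store, after, chunk) // one short transaction's worth
		if len(batch) == 0 {
			return out
		}
		out = append(out, batch...)
		after = batch[len(batch)-1]
	}
}

// scanChunk returns up to limit keys strictly greater than after, in
// sorted order, mimicking a bounded fdb range read.
func scanChunk(store map[string]string, after string, limit int) []string {
	var keys []string
	for k := range store {
		if k > after {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if len(keys) > limit {
		keys = keys[:limit]
	}
	return keys
}

func main() {
	store := map[string]string{"a": "1", "b": "2", "c": "3", "d": "4", "e": "5"}
	fmt.Println(chunkedScan(store, 2)) // [a b c d e]
}
```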

Overall, great product, impressive work!

Thanks,
Feng


That’s great to hear! Thanks for letting us know how things went, and I’d also be happy to read any further updates you have about how running FoundationDB has gone for you. Is any of the resulting work something that is open and we can point people to?

@alloc ?

I don’t believe it’s a thing we’ve had a lot of people run through recently, so I’m really not surprised that there are rough edges. If you have any concrete descriptions of what you found hard to follow or what could be improved, I’d appreciate it. :slight_smile:

You rang?

So, this is (essentially) intentional. Each binding has what’s known as a “header version”, and it will refuse to load versions of the FDB client library with an older header version, because it can’t guarantee that the older library contains every method the binding expects. (This is to support adding new methods to fdb_c.) I believe that without this check you would run into a “symbol doesn’t exist” error when loading, for every newly added function, though it’s also possible that could be gotten around with some fancy footwork that was aware of which functions were added with each header version. Nevertheless, it is easier to just throw an error when someone gives you a library that is too old, and that’s what is done.

(Note: older bindings are compatible with newer clients; they only reject an fdb_c whose header version is too old, not one that is too new. This means that we have to keep old functions around essentially as long as we support bindings with that version, but it also means that the client can freely update only their fdb_c version and leave everything else untouched.)
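The rule in that note is a deliberately one-sided check, which a small sketch makes concrete. This is hypothetical illustrative Go, not the actual fdb_c loading code:

```go
package main

import "fmt"

// checkHeaderVersion mirrors the compatibility rule described above: a
// binding built against bindingVersion refuses a client library whose
// header version is older (it might lack functions the binding calls),
// but accepts a newer-or-equal one, since old fdb_c functions are kept
// around for as long as bindings with that version are supported.
func checkHeaderVersion(bindingVersion, clientVersion int) error {
	if clientVersion < bindingVersion {
		return fmt.Errorf("fdb_c header version %d is older than the %d the bindings require",
			clientVersion, bindingVersion)
	}
	return nil // newer or equal client: fine
}

func main() {
	// Bindings on master (520) with a 5.1 client (510): rejected.
	fmt.Println(checkHeaderVersion(520, 510))
	// 5.1 bindings (510) with a newer client (520): accepted.
	fmt.Println(checkHeaderVersion(510, 520)) // <nil>
}
```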

To get back to the Go bindings: it means that we need to have a header version somewhere; in particular, it’s here: fdb.go, line 119. This is on master and, as you can see, it uses header version 520, but the max supported version on release-5.1 is 510 – hence the error. Now, it could be changed to 510, but what that would really mean is that the Go bindings on master would be the 5.1 version of the bindings, and any other changes to (for example) expose new functionality in versions of FDB newer than 5.1 couldn’t land there without running into errors. (For example, you might try calling tr.AppendIfFits() on a version of the Go bindings on master that has been made to work with the current 5.1 release and see what happens.) But that would mean we keep the 5.1 code for FDB on the release-5.1 branch – except for the Go bindings, where we use the master branch – until some arbitrary point in the future where we switch master over to 5.2 and people’s code stops building correctly because they are still on 5.1 but we’ve decided they should really be on 5.2 by now. Ugh.

For that reason, the Go bindings for release 5.1 can be found on the release-5.1 branch. (In general, the bindings as of release x.y.z should be on the release-x.y.z tag.) The fdb-go-install.sh script is supposed to make some of that easier: it downloads the FDB source, generates some needed header and Go files, and switches to a configurable tag. I definitely think it could be improved – for example, to do a better job of not regenerating those header files if they already exist – and it could also probably be improved as Go develops a more mature package versioning system.

By the way, what was your hack to get it to work? Was it just to update the header version in that function?

Thank you for explaining. If it is intentional/by design, that is fine.

What I did is just go get the binding, then cd to the source in ~/go (GOPATH) and git checkout the release-5.1 branch. Then the client builds. For the other machines in the cluster, I just scp -r ~/go and it works.

Interesting. Well, that makes sense. Did you have the needed include files (foundationdb/fdb_c_options.g.h, for example) in the default include location, too? That’s another build-time dependency of the go bindings (well, of the bindings).

No, I did not touch the include path; it seems go build can find the default header installation location. I am on Ubuntu; fdb is installed from the .deb.
