FoundationDB

CouchDB considering rearchitecting as an FDB layer?


(Will Wilson) #1

CouchDB is considering using FDB for all storage and replication in their next major version:

https://mail-archives.apache.org/mod_mbox/couchdb-dev/201901.mbox/<8EBBB4D1-C555-49DB-9A0A-A65D2E89E481@apache.org>

If you read the rest of the discussion in that thread, it sounds like they may actually be doing this.


(Dave Lester) #2

I imagine there are folks from the Apache CouchDB community that may read this. If so, hello! Welcome to the project. :wave:

It looks like there has been a good deal of discussion on the CouchDB project mailing lists, but I’m sure folks here would be happy to answer any questions re: the project and community if we can be of help.


(Adam Kocoloski) #3

:wave:

Thanks! As one of the original authors of CouchDB’s current clustering system I hope that we do pull this off :slight_smile: We’ve got a lot of respect for how FoundationDB has tackled a number of really difficult distributed systems problems in an extensible fashion.

I know folks have already started reaching out with some questions on the forums, thanks for the warm welcome!


(Robert Newson) #4

Hi!

Thanks for the warm welcome! Expect more of the CouchDB folks on these forums in the coming days, in fact I’ll send out a note encouraging them to say hi in this thread.

The CouchDB PMC has a few questions about FoundationDB around governance, so we’ll be picking that up in another thread soon.

B.


(Jan Lehnardt) #5

Hi everyone,

thanks for the extremely welcome…uh…welcome here :slight_smile: — We are still somewhere between the duking-out-whether-we-can-do-it and do-we-like-what-that-would-look-like phases, and I foresee a lot of questions coming up, so it is great to have your support on our journey.

I’m personally impressed by what I’ve learned about FoundationDB in the past month or so, so this is all rather exciting :slight_smile:

If you’re at all curious, you can follow in real time how this is going on our end: https://lists.apache.org/list.html?dev@couchdb.apache.org — most of the [DISCUSS] threads are relevant.

If you’re interested in participating in the discussions, you can subscribe by sending an email to dev-subscribe@couchdb.apache.org

Best
Jan

Vice President Apache CouchDB and CouchDB PMC Chair


(Robert Newson) #6

oops, I guess I didn’t really introduce myself. I’m also on the CouchDB PMC, as is Adam above.

B.


(Dave Cottlehuber) #7

Ditto for the welcome :smiley: - another CouchDB PMCer here. I’m particularly interested in the operational side of things, especially so on FreeBSD, so I’ll be muddling my way through that over the coming months. I see @wolfspider’s PR for v6.0 support and that’s pretty exciting.

A+
Dave


(Garren Smith) #8

Hi Everyone,

Another CouchDB PMC member here. I’ve really enjoyed learning about FDB. Looking forward to getting more involved in the community.

Cheers
Garren


(Joan Touzet) #9

Hi FoundationDB people,

Yet another PMC member - hi there! We’re a tightly knit team, and I join the rest of everyone above in thanking you for such a warm welcome!

I’ve been very busy as of late and haven’t had as much time to catch up on FDB stuff, but a couple of short calls have been very illuminating.

My CouchDB focus tends to be release engineering (packaging, containers, desktop installers, etc.), documentation, operationalisation (like Dave above) and testing. Very interested in hearing about plans and improvements in all of these spaces to meet our current needs.


(Seph Gentle) #10

FWIW that little collaborative editing demo I posted about in another thread is running on freebsd using @wolfspider’s fork. Operationally it seemed identical to running fdb on other platforms, although obv it’ll be better when fdb is in ports rather than needing to be built manually from source.


(David Scherer) #11

Welcome!

After quickly skimming the CouchDB mailing list, it seems that a big open question is how to efficiently materialize and query custom reductions. That is (I guess), Couch lets you specify an arbitrary map and associative reduce function in JS, and then keeps a materialized view of the results in a btree. The btree is augmented with partial reductions so that you can then query a reduction over some range of (mapped) keys efficiently, and can also be updated (lazily?) efficiently. But with an external storage layer like fdb you don’t have a straightforward way to augment the btree. Is that right?

I think you could do this on FDB using a data structure analogous to a skip list (rather than a tree). That is, you have an index where (say) a random 1/100 of documents’ mapped keys are stored, together with the reduction of all documents up to the next index entry. Then another with 1/100 of those documents’ keys (so 1/10000 of all documents), etc. You can do basically the same things in terms of queries and updates as with an augmented tree.
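A minimal sketch of that sampled-index idea, in Python, with a sorted in-memory list standing in for FDB's ordered key space (class and method names here are illustrative, not from any existing layer). One summary level is shown; the real structure would repeat the sampling recursively:

```python
import bisect

class SampledSumIndex:
    """Two-level sampled index over sorted (key, value) rows.

    Every `sample_every`-th base key is promoted to a summary level
    that stores the sum of base values from that key up to (but not
    including) the next promoted key, so a range query can consume
    whole buckets instead of scanning every row.
    """

    def __init__(self, items, sample_every=4):
        # items: sorted list of (mapped_key, value) pairs
        self.keys = [k for k, _ in items]
        self.vals = [v for _, v in items]
        self.sample_every = sample_every
        # Precompute the partial reduction for each summary bucket.
        self.bucket_sums = [sum(self.vals[i:i + sample_every])
                            for i in range(0, len(self.vals), sample_every)]

    def range_sum(self, lo, hi):
        """Sum of values for mapped keys in [lo, hi)."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        total = 0
        # Scan individual rows until aligned with a promoted key.
        while i < j and i % self.sample_every != 0:
            total += self.vals[i]
            i += 1
        # Consume whole summary buckets from the sampled level.
        while j - i >= self.sample_every:
            total += self.bucket_sums[i // self.sample_every]
            i += self.sample_every
        # Scan the remaining tail rows individually.
        total += sum(self.vals[i:j])
        return total
```

With `sample_every = 100` as in the description above, a range query touches at most ~200 base entries plus the summary buckets, rather than every row in the range.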

The FDB record layer uses a variation of this approach to implement the RankIndexMaintainer (a rank index allows you to quickly find the Nth highest value of a field, or determine the number of records with field values in a range). I looked to see if the record layer has a more closely analogous index maintainer for arbitrary aggregations, but if it does I missed it.

When you do this with simple reduction functions like sum, you can use FDB atomic mutations to update the index entries in the common case that the document being updated isn’t sampled into the index, so there should be very few conflicts even under high concurrency. With custom reduction functions this is a little trickier, but I think it can still be made to work OK. For example: Keep an index of what parts of this index are out of date. When doing a query, compute reductions of the parts that are both relevant and out of date in the query transaction. Then try to write the reductions back to the index in a separate transaction, checking that they haven’t been invalidated by concurrent updates. Under very high contention you will essentially fall back to reducing the top parts of the tree on each query.
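For the sum case, the common-case update path can be sketched like this. FDB's ADD atomic mutation treats the value as a little-endian integer and increments it server-side without a prior read, so concurrent writers to the same bucket don't conflict; the function below simulates that semantics against a plain dict (key names are made up for illustration):

```python
import struct

def atomic_add(store, key, delta):
    """Simulates FoundationDB's ADD atomic mutation: the value is a
    little-endian signed 64-bit integer that is incremented in place,
    with a missing key treated as zero. In a real FDB transaction this
    is a blind write, so it generates no read conflict range."""
    old = struct.unpack('<q', store.get(key, struct.pack('<q', 0)))[0]
    store[key] = struct.pack('<q', old + delta)

# Common case: the updated document's mapped key is NOT one of the
# sampled keys, so we only bump the partial sum of its bucket.
store = {}
atomic_add(store, b'sumidx/bucket/0017', 5)   # doc created, value 5
atomic_add(store, b'sumidx/bucket/0017', -2)  # doc updated, value drops by 2
```

In the rarer case where the document's key is itself sampled into the index, the surrounding entries have to be rewritten transactionally, which is where the invalidation bookkeeping described above comes in.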

(Sorry for posting this here rather than on the CouchDB mailing list where it would probably be more appropriate, but I’m on vacation writing this on my phone, so figuring out how to subscribe seems daunting)


(Iilyak) #12

Hi!

Thanks for the warm welcome! One of the Apache CouchDB committers here. Looking forward to learning more about FDB and getting involved in the community.

Best regards,
iilyak


(Robert Newson) #13

Hi Dave,

Yes, it sounds like you understand the problem correctly. I summarised it in a recent post on couchdb-dev as “The current approach to reduce only works because we control the writing of the b+tree nodes, including when they’re split, etc, so we’re able to maintain intermediate values on the inner nodes as the data changes over time.”.

The skip list idea is intriguing, thanks for expounding on it. In CouchDB, while it is possible to write a Javascript reduce function, this is almost always done incorrectly. CouchDB’s reduce is only really suited to aggregations (sum, count, min, max), which reduces (sorry) the size of the problem we really need to solve. You are right to focus on the invalidation aspect, that’s where it gets tricky to a) get it right b) make it efficient.

If you (or anyone) wants to join the mailing list, just email dev-subscribe@couchdb.apache.org and you’re in.

B.


(Kyle Snavely) #14

Hello all! Long-time CouchDB community member here (no fancy PMC status, sorry). My day to day focus is on build/release/performance engineering for IBM Cloudant.

I look forward to contributing and wanted to take the opportunity to introduce myself. Thanks for the great database, documentation, and community!

Kyle


(Jesse Bennett) #15

I checked out the site, that’s really cool! I was looking around trying to figure out how things were done and was very impressed. I had been scrawling down ideas in a notebook about a week ago, and there it is, already implemented: the very thing I’d been wondering was even possible. Right now I’m testing the Java client, which seems to be working, and next is the Go client.