CouchDB considering rearchitecting as an FDB layer?

Welcome!

After quickly skimming the CouchDB mailing list, it seems that a big open question is how to efficiently materialize and query custom reductions. That is (I guess), Couch lets you specify an arbitrary map function and an associative reduce function in JS, and then keeps a materialized view of the results in a btree. The btree is augmented with partial reductions, so you can query a reduction over some range of (mapped) keys efficiently, and can also update it (lazily?) efficiently. But with an external storage layer like FDB you don’t have a straightforward way to augment the btree. Is that right?

I think you could do this on FDB using a data structure analogous to a skip list (rather than a tree). That is, you have an index where (say) a random 1/100 of documents’ mapped keys are stored, together with the reduction of all documents up to the next index entry. Then another with 1/100 of those documents’ keys (so 1/10000 of all documents), etc. You can do basically the same things in terms of queries and updates as with an augmented tree.
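To make the idea concrete, here’s a minimal in-memory sketch with one sampled level and sum as the reduce function — plain Python, not FDB code, and `build_index`, `range_sum`, and the sampling probability are all illustrative choices, not anything from CouchDB or FDB:

```python
import bisect
import random

def build_index(items, p=0.25, seed=42):
    """items: dict of mapped key -> value. Sample keys with probability p
    (always including the first key, so every key is covered), and
    precompute, for each sampled key, the sum of all values from that
    key up to (but not including) the next sampled key."""
    rng = random.Random(seed)
    keys = sorted(items)
    skeys = [k for i, k in enumerate(keys) if i == 0 or rng.random() < p]
    partials = []
    for i, s in enumerate(skeys):
        end = skeys[i + 1] if i + 1 < len(skeys) else None
        lo = bisect.bisect_left(keys, s)
        hi = len(keys) if end is None else bisect.bisect_left(keys, end)
        partials.append(sum(items[k] for k in keys[lo:hi]))
    return keys, items, skeys, partials

def range_sum(index, lo, hi):
    """Sum of values for keys in [lo, hi): consume whole sampled runs
    where they fit, and scan individual keys only at the two edges."""
    keys, items, skeys, partials = index
    j = bisect.bisect_left(skeys, lo)       # first sampled key >= lo
    i, total = j, 0
    while i < len(skeys) - 1 and skeys[i + 1] <= hi:
        total += partials[i]                # whole run [skeys[i], skeys[i+1])
        i += 1
    if i == j:                              # no whole run fit; scan directly
        a, b = bisect.bisect_left(keys, lo), bisect.bisect_left(keys, hi)
        return sum(items[k] for k in keys[a:b])
    # Left edge covers [lo, skeys[j]); right edge covers [skeys[i], hi).
    for a, b in ((lo, skeys[j]), (skeys[i], hi)):
        x, y = bisect.bisect_left(keys, a), bisect.bisect_left(keys, b)
        total += sum(items[k] for k in keys[x:y])
    return total
```

A real multi-level version repeats the same sampling over the sampled keys themselves, so a query over an arbitrarily large range still only touches the edges at each level plus a handful of entries at the top.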

The FDB record layer uses a variation of this approach to implement the RankIndexMaintainer (a rank index allows you to quickly find the Nth highest value of a field, or determine the number of records with field values in a range). I looked to see if the record layer has a more closely analogous index maintainer for arbitrary aggregations, but if it does I missed it.

When you do this with simple reduction functions like sum, you can use FDB atomic mutations to update the index entries in the common case that the document being updated isn’t sampled into the index, so there should be very few conflicts even under high concurrency. With custom reduction functions this is a little trickier, but I think it can still be made to work OK. For example: keep a record of which parts of the index are out of date. When doing a query, compute reductions of the parts that are both relevant and out of date within the query transaction. Then try to write those reductions back to the index in a separate transaction, checking that they haven’t been invalidated by concurrent updates. Under very high contention you essentially fall back to reducing the top parts of the tree on each query.
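Here’s an equally hypothetical sketch of the custom-reduction case (again plain Python rather than FDB transactions; `LazyReduceView` and all its method names are made up for illustration). The writer does only blind writes — the value plus a “this run is stale” marker — while the reader repairs stale runs lazily; in FDB the write-back would happen in a separate transaction that checks the run wasn’t invalidated concurrently:

```python
import bisect

class LazyReduceView:
    """Illustrative sketch (not FDB/CouchDB code): cache per-run
    reductions for a custom reduce function, mark runs stale on write,
    and repair them lazily at query time."""

    def __init__(self, items, sampled, reduce_fn=max):
        self.items = dict(items)
        self.keys = sorted(items)
        self.sampled = sampled              # sorted; sampled[0] == min key
        self.reduce_fn = reduce_fn
        self.cache = [None] * len(sampled)  # cached reduction per run
        self.dirty = set(range(len(sampled)))

    def _run_keys(self, i):
        lo = bisect.bisect_left(self.keys, self.sampled[i])
        hi = (bisect.bisect_left(self.keys, self.sampled[i + 1])
              if i + 1 < len(self.sampled) else len(self.keys))
        return self.keys[lo:hi]

    def update(self, key, value):
        """Writer path (assumes key already exists in the view): a blind
        write of the value plus a stale marker for the covering run.
        In FDB both are conflict-free writes."""
        self.items[key] = value
        i = bisect.bisect_right(self.sampled, key) - 1
        self.dirty.add(i)

    def query_all(self):
        """Reader path: recompute any stale runs, then combine the
        cached partial reductions. In FDB the repaired partials would
        be written back in a separate transaction, checked against
        concurrent invalidation."""
        for i in sorted(self.dirty):
            ks = self._run_keys(i)
            self.cache[i] = self.reduce_fn(self.items[k] for k in ks)
        self.dirty.clear()
        return self.reduce_fn(c for c in self.cache if c is not None)
```

If reads greatly outnumber writes, the dirty set is usually empty and queries only combine cached partials; under heavy write contention, each query pays to re-reduce whatever runs changed since the last read, which is the fallback behavior described above.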

(Sorry for posting this here rather than on the CouchDB mailing list, where it would probably be more appropriate, but I’m on vacation writing this on my phone, so figuring out how to subscribe seems daunting.)