To get good performance, you need many concurrent reads in flight at the same time. Cardinality statistics help you decide how many reads to issue for each range of keys that makes up the index: a high-cardinality range needs lots of reads, a low-cardinality range just one. Split the range into non-overlapping chunks as appropriate. This is pretty easy because, within each index prefix, the primary keys at the end of the index key are already sorted for you.
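A rough sketch of that fan-out, assuming a hypothetical `readRange(begin, end)` helper that wraps whatever range-read API you're using and a list of chunk boundaries derived from the cardinality stats (both names are illustrative):

```ts
// Fan out one index range into non-overlapping chunks and read them concurrently.
type KeyValue = { key: Buffer; value: Buffer };

async function readIndexConcurrently(
  chunkBoundaries: Buffer[],                                    // sorted boundary keys from cardinality stats
  readRange: (begin: Buffer, end: Buffer) => Promise<KeyValue[]>, // stand-in for the store's range read
): Promise<KeyValue[]> {
  const reads: Promise<KeyValue[]>[] = [];
  for (let i = 0; i < chunkBoundaries.length - 1; i++) {
    // Half-open [begin, end) chunks, so they never overlap.
    reads.push(readRange(chunkBoundaries[i], chunkBoundaries[i + 1]));
  }
  // All reads are in flight at once; chunks come back in key order.
  const chunks = await Promise.all(reads);
  return chunks.flat();
}
```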
I wrote a basic version just now in Node that does the actual index-intersection calculation by assigning each primary key an offset in a bitmap, using a counter and a hash table (an actual hash table, not a JavaScript object). Then I intersected the bitmaps with Roaring Bitmap and wrote the result into a Uint32Array. The data itself was two large buffers of 16-byte keys, which you can imagine as keys in FoundationDB.
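Something like this sketch captures the shape of it (not the exact code: it assumes the `roaring` npm package's `RoaringBitmap32`, and uses a `Map` keyed on the hex form of each 16-byte key as a stand-in for the hash table):

```ts
import { RoaringBitmap32 } from 'roaring';

// Assign each distinct 16-byte primary key a dense integer offset, then
// intersect the two indexes as bitmaps. `indexA` / `indexB` are Buffers of
// concatenated 16-byte keys, as described above.
function intersectIndexes(indexA: Buffer, indexB: Buffer): Uint32Array {
  const ids = new Map<string, number>(); // key (hex) -> dense offset
  let nextId = 0;

  const toBitmap = (index: Buffer): RoaringBitmap32 => {
    const bitmap = new RoaringBitmap32();
    for (let off = 0; off < index.length; off += 16) {
      const key = index.toString('hex', off, off + 16);
      let id = ids.get(key);
      if (id === undefined) {            // first time this primary key is seen
        id = nextId++;
        ids.set(key, id);
      }
      bitmap.add(id);
    }
    return bitmap;
  };

  const a = toBitmap(indexA);
  const b = toBitmap(indexB);
  // Offsets present in both indexes, returned as a Uint32Array.
  return RoaringBitmap32.and(a, b).toUint32Array();
}
```

Mapping the resulting offsets back to the original keys just needs an array that records each key at the offset it was assigned.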
With two indexes, one with 1 million entries and the other with 500k, the entire process took ~500ms, or about 333ns per processed index entry. A C++ implementation with codegen could do better, but an interpreted model would probably do worse due to cache misses. The majority of the time in this example goes to the hash table lookups (~100ns to check a key, ~200ns to write one).
Reading 1.5m keys from FoundationDB would take quite a while, and you'd eventually start running up against the transaction size limit. The keys alone, without any overhead, come to 24 MB for this specific example query.
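The back-of-the-envelope arithmetic, with 10 MB taken as FoundationDB's default transaction size limit:

```ts
// Raw key bytes for the example query: 1.5M index entries, 16 bytes each.
const entries = 1_000_000 + 500_000;
const keyBytes = 16;
const totalBytes = entries * keyBytes;   // 24,000,000 bytes = 24 MB
const fdbTxnLimit = 10_000_000;          // FoundationDB's default transaction size limit
console.log(totalBytes / 1e6, 'MB of keys vs a', fdbTxnLimit / 1e6, 'MB transaction limit');
```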