Torture test gRPC facade w/ Record Layer

I had been trying to find a good torture test for the Lionrock gRPC facade for FoundationDB and decided to go down the path of building a gRPC Record Layer client (it currently depends on a snapshot build of the Record Layer from my PR):

  1. Needed to refactor RL to support a non-native FoundationDB Java client: "Refactor FDBDatabase and FDBDatabaseFactory" by panghy (Pull Request #1344, FoundationDB/fdb-record-layer, currently open)
  2. On my private branch, swapped the real FDBDatabaseFactory to the gRPC version - Hard-wire to test the record-layer with a locallly running lionrock s… · panghy/fdb-record-layer@ccc6d16 · GitHub
  3. Ran the tests, fixed some bugs, repeated =p
  4. Only the following tests fail (as expected):
    • FDBDatabaseImplTest.performNoOpAgainstFakeCluster()
    • FDBRecordContextTest.timeoutTalkingToFakeCluster()

Apart from the odd transaction_too_old exception from a couple of large tests (they are flaky because the database responds more slowly than the real thing), all of the tests pass!

Some flaky ones include:

  • TextIndexTest.[4]Index{‘Simple$text_suffixes’, text}#4
  • OnlineIndexerBuildRankIndexTest$Unsafe.addWhileBuildingParallelRank

With that, one can connect the RL library to an FDB cluster over gRPC without native binaries. The tests normally pass in about 6-7 minutes but take about 14-15 minutes via gRPC. That’s mostly because there is no client-side caching (RYW or otherwise), so every get/getRange call needs to hit the network.

Hi Clement!

Yes, thanks for contributing the refactoring of FDBDatabaseFactory and FDBDatabase! I saw that @MMcM made some comments, and I’m doing a pass as well (I’ll try to get you some feedback over the next day or two). I’m not too surprised about the performance difference. As you noted, without a RYW cache sitting on the client, there is a lot of back-and-forth that wouldn’t otherwise have been necessary. Each of these low-level calls also now goes through a proxy via the RPC service, so when you get to large chained pipelines of work (for example, in complex queries), the latencies will be magnified.

There had previously been some discussion about a formal RPC interface for FDB (FoundationDB RPC Layer Requirements · apple/foundationdb Wiki · GitHub) and I had had similar concerns over the resulting performance without a RYW cache in the client. Even then, I think that for really complex operations, it would require something akin to a Record Layer Service to allow the complexity to be pushed closer to the database.

Thanks! Yeah, I was mulling over actually writing the cache in the client itself, but that means keeping track of read ranges and layering that with mutations (and tracking keys that are invalid if read), given that one can turn RYW on and off. That seemed a bit too dangerous (but we now have quite a bit of stress testing that can be applied against it at least =p).
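
To make the bookkeeping concrete, here is a rough sketch of what such a client-side overlay might look like. Everything in it is hypothetical: `RywOverlaySketch` and its methods are made-up names, not anything in Lionrock, the Record Layer, or the FDB Java client. It also simplifies heavily: keys and values are Strings rather than byte arrays, there is no getRange, no atomic mutations, no conflict-range tracking, no unreadable-key tracking, and no switch for turning RYW off. It only shows point reads being served from pending mutations and previously read values before falling back to the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch only: not part of Lionrock, the Record Layer, or the FDB client.
final class RywOverlaySketch {
    // Stand-in for the remote get call (assumed to return null for a missing key).
    private final Function<String, String> remoteGet;
    // Pending point mutations: Optional.of(v) is a set, Optional.empty() is a clear.
    private final TreeMap<String, Optional<String>> writes = new TreeMap<>();
    // Cleared ranges stored as {begin, end}, with end exclusive.
    private final List<String[]> clearedRanges = new ArrayList<>();
    // Values already fetched from the remote side in this transaction.
    private final Map<String, Optional<String>> readCache = new HashMap<>();
    private int remoteCalls = 0;

    RywOverlaySketch(Function<String, String> remoteGet) {
        this.remoteGet = remoteGet;
    }

    void set(String key, String value) {
        writes.put(key, Optional.of(value));
    }

    void clear(String key) {
        writes.put(key, Optional.empty());
    }

    void clearRange(String begin, String end) {
        if (begin.compareTo(end) >= 0) {
            return; // empty range, nothing to do
        }
        // Earlier point mutations inside the range are superseded by the clear.
        writes.subMap(begin, true, end, false).clear();
        clearedRanges.add(new String[] {begin, end});
    }

    String get(String key) {
        // 1. A pending set or clear on this exact key wins (it postdates any range clear).
        Optional<String> pending = writes.get(key);
        if (pending != null) {
            return pending.orElse(null);
        }
        // 2. A key inside a cleared range reads as absent without touching the network.
        for (String[] range : clearedRanges) {
            if (key.compareTo(range[0]) >= 0 && key.compareTo(range[1]) < 0) {
                return null;
            }
        }
        // 3. Otherwise use the read cache, going remote only on the first read.
        Optional<String> cached = readCache.get(key);
        if (cached == null) {
            remoteCalls++;
            cached = Optional.ofNullable(remoteGet.apply(key));
            readCache.put(key, cached);
        }
        return cached.orElse(null);
    }

    int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        Map<String, String> backing = new HashMap<>();
        backing.put("a", "1");
        RywOverlaySketch tx = new RywOverlaySketch(backing::get);
        tx.get("a");
        tx.get("a"); // served from the read cache
        tx.set("a", "2");
        System.out.println(tx.get("a") + " after " + tx.remoteCalls() + " remote call(s)");
    }
}
```

Even this toy version has to order its checks carefully (point mutations, then cleared ranges, then cached reads), which gives a feel for why doing it properly with ranges and a RYW toggle looked risky.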