Correct Usage of the FDBRecordStoreStateCache

I have been trying to use the FDBRecordStoreStateCache to increase the write throughput of our processes writing into the cluster.
My initial attempt doesn’t seem to be working, as I see that creating the record store from a context performs in a very non-deterministic way.
This is how I initialize the database:

try {

            FDBDatabase fdb = FDBDatabaseFactory.instance().getDatabase(config.getClusterFilePath());
            fdb.setTrackLastSeenVersion(true);
            fdb.setStoreStateCache(fdb.getStoreStateCache());

            return fdb;
        } catch (FDBException e) {
            throw new ConnectionException(String.format("Unable to connect to FDB database due to %s", e.getMessage()), e);
        }

Correspondingly, every time I open a transaction against the database, I use the following to get a record store for a context:

FDBRecordStore store =  FDBRecordStore.newBuilder()
                .setContext(context)
                .setKeySpacePath(getKeyspacePath(
                        "Some String identifier"
                        , "Some Store Name"
                        , "Some Env Value"))
                .setMetaDataProvider(cachedRecordStoreMetadata) // "we use local metadata, which doesn't evolve" 
                .setStoreStateCache(getDatabase().getStoreStateCache())
                .setIndexMaintenanceFilter(indexMaintainer.NO_NULLS_FILTERED)
                .createOrOpen();

I’m serially opening transactions and writing a batch of 1000 records, which corresponds to ~1-2 MB of data per transaction.
I profiled the total time it took to create a record store, and it varies between ~50 ms and > 2000 ms.

A few questions:

  1. I don’t think this is the correct way to use the RecordStoreStateCache. Are there any examples of how to use this correctly?
  2. In what cases should I expect to see a performance improvement with this cache? I read in the API docs that we need to enable WeakReadSemantics, but I couldn’t find a way to enable that through any API.
    Does opening multiple parallel transactions help here?

It would be great if anyone could help with the right approach here.

Hm, it does seem like we could do with some better documentation around this feature, though tuning for write throughput can be a difficult task that may or may not improve with store state caching. I’ve tried to go over both some of the questions you had about write throughput and also some questions about store state caching, but feel free to ask about details that are unclear, etc.

It’s possibly worth noting that FDB performance degrades with larger transactions, which could be partially the problem. It may be worth experimenting with different transaction sizes, but I think we generally encourage transactions to stay under 1 MB.

See: https://apple.github.io/foundationdb/known-limitations.html#large-transactions
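If you want to experiment with transaction sizes, one option is to split each batch by a byte budget before opening transactions. The following is only a minimal sketch: TransactionBatcher and split are hypothetical names (not part of the Record Layer), and it only counts the serialized record bytes. The real transaction size also includes keys and index entries, so the budget should probably sit well below the limit you are aiming for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Record Layer: groups serialized records
// into batches whose summed payload size stays within a byte budget.
final class TransactionBatcher {
    static List<List<byte[]>> split(List<byte[]> records, int maxBytesPerBatch) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] record : records) {
            // Close the current batch if adding this record would exceed the budget.
            // A single record larger than the budget still gets a batch of its own.
            if (!current.isEmpty() && currentBytes + record.length > maxBytesPerBatch) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each resulting batch would then be written in its own transaction, which makes it easy to compare, say, a 250 KB budget against a 1 MB budget.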

Hmm, interesting. There are a couple of things that happen when a store is opened, but the primary things that happen are (1) the transaction determines the current “read version” (essentially “read timestamp”) from the database and (2) it reads a few keys from the database and potentially writes some to initialize the store. In that first step, the database can sometimes elect to inject latency as a rate limiting tactic. I believe there are also a few issues that can sometimes give that request weird tail latencies prior to FDB 6.3 (which is not yet fully released), so that might be part of it. Caching the store state allows you to skip the second step, so it’s possible that caching the store state won’t help with the spread that much (though that second step can cause hot spotting in the database that can cause the cluster to decide to rate limit, which may be part of the problem).

The main thing to do is to provide the database with a store state cache instance. Something like:

FDBDatabase fdb = FDBDatabaseFactory.instance().getDatabase(config.getClusterFilePath());
MetaDataVersionStampStoreStateCacheFactory factory = MetaDataVersionStampStoreStateCacheFactory.newInstance();
fdb.setStoreStateCache(factory.getCache(fdb));

You can also set the cache on an individual record store rather than the database, if you prefer. If you don’t set it on a record store, it will inherit it from the database. Depending on which implementation of the FDBRecordStoreStateCache you use, there may be additional steps to get the state to actually be cached. In particular:

  1. The MetaDataVersionStampStoreStateCache requires you to enable caching on each individual record store by calling recordStore.setStateCacheability(true) (as an administrative action outside of a normal request flow). There are a few more details in the linked Javadoc. Once this has been enabled, all subsequent openings of that store will be able to use the cached store state. This cache does not require using WeakReadSemantics.
  2. The ReadVersionRecordStoreStateCache is really only useful if you’re using WeakReadSemantics. It doesn’t require any other configuration, but it will only be able to use a cached store state if the same record store is opened multiple times with the same read version (e.g., a store is opened in transaction1, its state is cached, and then the cached read version from the first transaction is used with transaction2 and the store is opened again). As a result, it’s a lot less versatile, but the fact that it just works for all stores without additional administrative transactions is kind of nice. I’ll discuss WeakReadSemantics below.
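To make the second point more concrete, here is a toy model of a cache keyed by (store, read version). It is not the Record Layer’s implementation; ReadVersionKeyedCacheModel and its methods are made-up names, and the “database read” is just a counter. It only illustrates why such a cache can hit only when two transactions share a read version.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; this is NOT how the Record Layer implements its cache.
final class ReadVersionKeyedCacheModel {
    private final Map<String, String> cache = new HashMap<>();
    private int databaseReads = 0;

    // Returns the store state for (storePath, readVersion). The simulated
    // database is only consulted when that exact pair has not been seen before.
    String loadStoreState(String storePath, long readVersion) {
        String key = storePath + "@" + readVersion;
        return cache.computeIfAbsent(key, k -> {
            databaseReads++; // stands in for reading the store header keys
            return "state-of-" + storePath;
        });
    }

    int getDatabaseReads() {
        return databaseReads;
    }
}
```

In this model, opening the same store twice at read version 100 costs one simulated database read, while opening it again at version 101 costs another. Without something that lets transactions reuse a read version, nearly every opening is a miss.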

By default, the store is configured with an implementation of the store state cache that always forwards along to the database (i.e., it does no caching), which I think is what’s going on in the code samples you’ve included in your question.

You can also provide an FDBRecordStoreStateCacheFactory to the FDBDatabaseFactory instead of providing an FDBRecordStoreStateCache to the FDBDatabase:

FDBDatabaseFactory.instance().setStoreStateCacheFactory(MetaDataVersionStampStoreStateCacheFactory.newInstance());

If you do this before the first FDBDatabase is created, then every FDBDatabase will be initialized with an instance of the store state cache built from the factory provided.

See: MetaDataVersionStampStoreStateCacheFactory
See: ReadVersionRecordStoreStateCacheFactory

Well, primarily, this will only help if you open the same store many times. For a bit of context, as is discussed in the Record Layer paper, one thing the Record Layer lets you do is create many, many record stores with a shared schema, all on the same cluster, which might have some benefits for certain datasets (e.g., separating individual users’ data into different homogeneous record stores). So if you’re doing something like that, and every operation opens a new record store, then the cache will have a very high miss rate, and it won’t be particularly useful.

However, if you’re using the Record Layer more like a traditional RDBMS and have one (or a very small number of) record stores that are opened with every operation, then caching this information can be very beneficial. In particular, there are roughly two benefits, from what I can tell:

  1. Decreased latency at store opening time, as the operation can read from local memory instead of a database call over the network. This can probably save a few milliseconds from the request.
  2. Decreased hot spotting, as without caching, every operation will read from the same handful of keys (storing the store state in the database). These requests will probably end up all on the same few FDB storage servers (even if the cluster is large), and so this becomes a bottleneck. By caching the information, the hot spot can be sidestepped.

So I’d only expect an improvement if (1) you care about shaving off the latency of each request or (2) the store state read is becoming a bottleneck that is influencing your total throughput. AFAIK, the easiest way to know if you’re being limited in the second case is to look at cluster metrics (e.g., see if there are heightened “storage queues” on certain servers), though I’m not exactly sure (there may be easier ways to tell). Note that if you’re not limited, then I don’t think this will cause problems with throughput, just latency.

For more on this problem and how caching the store state can help, see this video from the 2019 FDB summit: https://youtu.be/_mDIhQ1HLcs

As I discussed above, WeakReadSemantics is only needed if one is using the ReadVersionRecordStoreStateCache. What that will do is allow multiple transactions to use a cached “read version” (essentially “read timestamp”) without having to get an up-to-date one from the database, with the tradeoff being that one might get stale (but consistent) reads. Perhaps less obviously, this can also increase the “conflict” rates on transactions where writes are performed (as it increases the “window” of time in which other transactions can be committed that cause the transaction to fail).

The main way to use this API is to first enable tracking read and commit versions on the database:

fdb.setTrackLastSeenVersion(true);

And then, you can set a WeakReadSemantics when creating a transaction. For example:

FDBRecordContextConfig contextConfig = FDBRecordContextConfig.newBuilder()
    .setWeakReadSemantics(new FDBDatabase.WeakReadSemantics(0L, TimeUnit.SECONDS.toMillis(1)))
    .build();
FDBRecordContext context = fdb.openContext(contextConfig);

This will produce a transaction that will use a cached read version (as long as one that is at most 1 second old is available).

If you can tolerate stale reads, this can help decrease the latency of your operations, though not necessarily the throughput (in most circumstances; if the processes in the cluster that hand out read versions are overwhelmed, then this might help with throughput).
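As a rough mental model of the two parameters (a minimum version and a staleness bound in milliseconds), the decision to reuse a cached read version looks something like the sketch below. CachedReadVersionModel and canReuse are hypothetical names, and this is my simplified reading of the semantics rather than the library’s actual code.

```java
// Toy model only; names and logic are illustrative assumptions.
final class CachedReadVersionModel {
    // Returns true if a read version cached at cachedAtMillis may be reused at
    // nowMillis, given a minimum acceptable version and a staleness bound.
    static boolean canReuse(long cachedVersion, long cachedAtMillis, long nowMillis,
                            long minVersion, long stalenessBoundMillis) {
        boolean freshEnough = nowMillis - cachedAtMillis <= stalenessBoundMillis;
        boolean newEnough = cachedVersion >= minVersion;
        return freshEnough && newEnough;
    }
}
```

With a minimum version of 0 and a bound of 1000 ms, a version cached 800 ms ago would be reused, while one cached 1500 ms ago would not, and the transaction would have to fetch a fresh read version from the cluster.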

See: FDBRecordContextConfig

You can definitely get better throughput by opening up more transactions in parallel (until your client is saturated, at which point you might need to load balance across multiple clients to improve throughput). In terms of latency, I don’t think this will get either better or worse with cached store states, though it may be more likely that a read version is cached if you’re using WeakReadSemantics.
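For the parallel case, a minimal sketch using only the JDK might look like the following. ParallelBatchWriter and writeAll are hypothetical names, and the writeBatch callback is a placeholder: in real use it would open a context, open the record store, save the batch, and commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper: runs one writeBatch call per batch on a fixed-size
// thread pool and blocks until every call has finished.
final class ParallelBatchWriter {
    static <T> void writeAll(List<T> batches, int parallelism, Consumer<T> writeBatch) {
        ExecutorService executor = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (T batch : batches) {
                // Each batch stands in for one independent transaction.
                futures.add(CompletableFuture.runAsync(() -> writeBatch.accept(batch), executor));
            }
            // join() rethrows (wrapped) if any batch failed.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            executor.shutdown();
        }
    }
}
```

The parallelism value is the knob to experiment with: raise it until throughput stops improving, which usually means the client (or the cluster) is saturated.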