Hm, it does seem like we could do with some better documentation around this feature, though tuning for write throughput can be a difficult task that may or may not improve with store state caching. I’ve tried to go over both some of the questions you had about write throughput and also some questions about store state caching, but feel free to ask about details that are unclear, etc.
It’s possibly worth noting that FDB performance degrades with larger transactions, which could be part of the problem. It may be worth experimenting with different transaction sizes, but I think we generally encourage transactions to stay under 1 MB.
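As a sketch of one way to stay under that kind of budget (this is not a Record Layer API, just a hypothetical illustrative helper), you could group writes into batches by serialized size and give each batch its own transaction:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split a stream of serialized values into batches whose
// total size stays under a per-transaction budget, so each batch can be
// committed in its own FDB transaction.
class TransactionBatcher {
    // The text above suggests targeting roughly 1 MB per transaction.
    static final int BATCH_BUDGET_BYTES = 1_000_000;

    // Group values into batches whose combined size stays under budgetBytes.
    // A single value larger than the budget still gets its own batch.
    static List<List<byte[]>> batch(List<byte[]> values, int budgetBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentSize = 0;
        for (byte[] value : values) {
            if (!current.isEmpty() && currentSize + value.length > budgetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(value);
            currentSize += value.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each resulting batch would then be written in its own transaction, keeping every commit comfortably under the budget.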
Hmm, interesting. There are a couple of things that happen when a store is opened, but the primary ones are (1) the transaction determines the current “read version” (essentially “read timestamp”) from the database and (2) it reads a few keys from the database and potentially writes some to initialize the store. In that first step, the database can sometimes elect to inject latency as a rate limiting tactic. I believe there are also a few issues that can sometimes give that request weird tail latencies prior to FDB 6.3 (which is not yet fully released), so that might be part of it. Caching the store state allows you to skip the second step, so it’s possible that caching the store state won’t help with the spread that much (though that second step can cause hot spotting in the database that can cause the cluster to decide to rate limit, which may be part of the problem).
The main thing to do is to provide the database with a store state cache instance. Something like:
FDBDatabase fdb = FDBDatabaseFactory.instance().getDatabase(config.getClusterFilePath());
MetaDataVersionStampStoreStateCacheFactory factory = MetaDataVersionStampStoreStateCacheFactory.newInstance();
fdb.setStoreStateCache(factory.getCache(fdb)); // wire the cache into the database
You can also set the cache on an individual record store rather than the database, if you prefer. If you don’t set it on a record store, it will inherit it from the database. Depending on which implementation of the FDBRecordStoreStateCache you use, there may be additional steps to get the state to actually be cached. In particular:
- The MetaDataVersionStampStoreStateCache requires you to configure each individual record store by enabling caching on that store with a call to recordStore.setStateCacheable(true) (as an administrative action outside of a normal request flow). There are a few more details in the linked Javadoc. Once this has been enabled, all subsequent openings of that store will be able to use the cached store state. This cache does not require using WeakReadSemantics.
- The ReadVersionRecordStoreStateCache is really only useful if you’re using WeakReadSemantics. It doesn’t require any other configuration, but it will only be able to use a cached store state if the same record store is opened multiple times with the same read version (e.g., a store is opened in transaction1, its state is cached, and then the cached read version from the first transaction is reused by transaction2, which opens the store again). As a result, it’s a lot less versatile, but the fact that it just works for all stores without additional administrative transactions is kind of nice. I’ll discuss WeakReadSemantics more below.
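To illustrate why the ReadVersionRecordStoreStateCache only hits when both the store and the read version match, here’s a toy sketch (not the actual implementation, just the keying idea) of a cache keyed on that pair:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a store state cache keyed by (store path, read version),
// mimicking how a read-version-based cache can only serve a hit when the
// same store is opened again at the same read version.
class ReadVersionKeyedCache {
    private final Map<String, String> cache = new HashMap<>();

    private static String key(String storePath, long readVersion) {
        return storePath + "@" + readVersion;
    }

    // Returns the cached store state, or simulates loading it from the
    // database and caches it under (store, read version) on a miss.
    String open(String storePath, long readVersion) {
        return cache.computeIfAbsent(key(storePath, readVersion), k -> "store-state");
    }

    boolean isCached(String storePath, long readVersion) {
        return cache.containsKey(key(storePath, readVersion));
    }
}
```

Opening the same store at a newer read version (which is what happens whenever a transaction fetches a fresh read version) is a miss, which is why this cache mostly pays off in combination with WeakReadSemantics.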
By default, the store is configured with an implementation of the store state cache that always forwards along to the database (i.e., it does no caching), which I think is what’s going on in the code samples you’ve included in your question.
You can also provide an FDBRecordStoreStateCacheFactory to the FDBDatabaseFactory instead of providing an FDBRecordStoreStateCache to the FDBDatabase. If you do this before the first FDBDatabase is created, then every FDBDatabase will be initialized with an instance of the store state cache built from the provided factory.
Well, primarily, this will only help if you open the same store many times. For a bit of context, as is discussed in the Record Layer paper, one thing the Record Layer lets you do is create many, many record stores with a shared schema, all on the same cluster, which might have some benefits for certain datasets (e.g., separating individual users’ data into different homogeneous record stores). So if you’re doing something like that, and every operation opens a new record store, then the cache will have a very high miss rate, and it won’t be particularly useful.
However, if you’re using the Record Layer more like a traditional RDBMS and have one (or a very small number) of record stores that are opened with every operation, then caching this information can be very beneficial. In particular, there are roughly two benefits, from what I can tell:
- Decreased latency at store opening time, as the operation can read from local memory instead of a database call over the network. This can probably save a few milliseconds from the request.
- Decreased hot spotting, as without caching, every operation will read from the same handful of keys (storing the store state in the database). These requests will probably end up all on the same few FDB storage servers (even if the cluster is large), and so this becomes a bottleneck. By caching the information, the hot spot can be sidestepped.
So I’d only expect an improvement if (1) you care about shaving off the latency of each request or (2) the store state read is becoming a bottleneck that is influencing your total throughput. AFAIK, the easiest way to know if you’re being limited in the second case is to look at cluster metrics (e.g., see if there are heightened “storage queues” on certain servers), though I’m not exactly sure (there may be easier ways to tell). Note that if you’re not limited, then I don’t think this will cause problems with throughput, just latency.
For more on this problem and how caching the store state can help, see this video from the 2019 FDB summit: https://youtu.be/_mDIhQ1HLcs
As I discussed above, that’s only true if one is using the ReadVersionRecordStoreStateCache. What that will do is allow multiple transactions to use a cached “read version” (essentially “read timestamp”) without having to get an up-to-date one from the database, with the tradeoff being that one might get stale (but consistent) reads. Perhaps less obviously, this can also increase the “conflict” rate on transactions where writes are performed (as it increases the “window” of time in which other transactions can be committed that cause the transaction to fail).
The main way to use this API is to first enable tracking read and commit versions on the database:
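The snippet for this step seems to have been dropped; assuming the FDBDatabase API, it would look something like the following (the method name here is my recollection and worth verifying against your Record Layer version):

```java
// Track the read/commit versions the database has seen so that later
// transactions can reuse a recently observed read version.
// (Assumed method name; check FDBDatabase in your Record Layer version.)
fdb.setTrackLastSeenVersion(true);
```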
And then, you can set a WeakReadSemantics when creating a transaction. For example:
FDBRecordContextConfig contextConfig = FDBRecordContextConfig.newBuilder()
        .setWeakReadSemantics(new WeakReadSemantics(0L, TimeUnit.SECONDS.toMillis(1)))
        .build();
FDBRecordContext context = fdb.openContext(contextConfig);
This will produce a transaction that will use a cached read version (as long as there’s one at most 1 second old).
If you can tolerate stale reads, this can help decrease the latency of your operations, though not necessarily the throughput (in most circumstances; if the processes in the cluster that hand out read versions are overwhelmed, then this might help with throughput).
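To make the staleness bound concrete, here’s a tiny runnable sketch (not Record Layer internals, just the decision rule) of when a cached read version may be reused under a given bound:

```java
// Sketch of the staleness check behind WeakReadSemantics: a cached read
// version is only reused if it was fetched within the staleness bound;
// otherwise a fresh one must be fetched from the database.
class ReadVersionFreshness {
    // Returns true if a read version fetched at fetchTimeMillis may still
    // be reused at nowMillis under the given staleness bound.
    static boolean canReuse(long fetchTimeMillis, long nowMillis, long stalenessBoundMillis) {
        return nowMillis - fetchTimeMillis <= stalenessBoundMillis;
    }
}
```

With the 1-second bound from the example above, a transaction created half a second after the version was fetched reuses it; one created a second and a half later fetches a fresh version (and pays the latency of that database round trip).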
You can definitely get better throughput by opening more transactions in parallel (until your client is saturated, at which point you might need to load balance across multiple clients to improve throughput). In terms of latency, I don’t think this will get either better or worse with cached store states, though maybe it will be more likely that a read version is cached if you’re running more transactions at once.