I haven’t benchmarked this exact scenario, but my understanding is that a write to a key – whether the key/value pair is identical to an existing one (an effective no-op), different (an overwrite), or new (an insert) – will always conflict with any outstanding concurrent transaction that has *read* that key, because FoundationDB detects read-write conflicts at commit time. If you are keeping this data so you can e.g. iterate or list the set of all categories, this also means any range scans over the Categories_Dir will be invalidated and restarted on every write (possibly leading to livelock, essentially). Therefore: yes, if you write this empty category value on every insert of an item, you will cause significant contention among concurrent readers/writers of that key, and probably a significant number of transaction restarts.
You are better off issuing a read for that key first to see whether the category exists, and writing it only if it does not, when you insert an item. This incurs an extra read in the category keyspace (and an occasional write), but reads do not conflict with other reads, and as a result are far easier to scale than writes in almost any system – FoundationDB included. Since categories are unlikely to be unique per item – many items will share the same category – most of the workload on this keyspace then becomes read-only.
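To make the check-then-insert pattern concrete, here is a minimal sketch. A plain Python dict stands in for the database, and all names are hypothetical; in real FDB code this body would run inside a single transaction (e.g. via the Python binding's `@fdb.transactional` decorator), so the read and the conditional write are atomic:

```python
# Sketch of the check-then-insert pattern. A plain dict stands in for the
# key-value store; in real FoundationDB code this whole body would execute
# inside one transaction so the read and conditional write are atomic.

store = {}  # hypothetical stand-in for the database

def insert_item(item_key, item_value, category_key):
    # Read first: in FDB this adds only a read-conflict range on the
    # category key, which does not conflict with other concurrent readers.
    if category_key not in store:
        # Write only when the category is genuinely new, so the common
        # case (category already exists) stays read-only on this keyspace.
        store[category_key] = b""
    store[item_key] = item_value

insert_item(b"item/1", b"widget", b"category/tools")
insert_item(b"item/2", b"hammer", b"category/tools")  # no category write this time
```

The second insert above never touches the category key with a write, which is exactly why the workload shifts toward reads once most categories already exist.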
Note that once you have this in place, another issue can occur: hot keys. If the distribution of categories among items is roughly Zipfian, the most popular category will be popular by a large margin, and that single category key will receive a disproportionately large share of read requests. FDB automatically shards the keyspace among storage servers for you, but any single key is always served by the one server responsible for it – meaning that server receives an outsized number of requests relative to all others. This can result in significant contention and an asymmetric waste of compute resources.
There are schemes you can use to keep this workload read-mostly without concentrating so much load on a single key, but they depend a bit more on the data model. Of course, the thing is you already had this problem in your original design! A hot write-key is simply a much worse problem than a hot read-key, is all. And you may never hit the point (in scale or data size) where hot read-keys cause real problems, so you can probably leave this until you start looking at performance numbers.
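One such scheme, sketched below under assumed names with a dict standing in for the store, is to replicate a hot read-key under R suffixed copies so that reads spread across R different shards (and likely different servers). Writers must update every copy, which is acceptable precisely because the workload is read-mostly:

```python
import random

# Sketch: replicate a hot read-key under R suffixed copies. Each copy is a
# distinct key, so it (likely) lands on a different shard; readers pick a
# copy at random, spreading load across servers. All names are hypothetical.

R = 8
store = {}  # stand-in for the key-value store

def write_replicated(key, value):
    # Writers pay R writes so that readers can fan out.
    for i in range(R):
        store[key + b"/" + str(i).encode()] = value

def read_replicated(key):
    # Each reader hits one randomly chosen copy.
    i = random.randrange(R)
    return store[key + b"/" + str(i).encode()]

write_replicated(b"category/tools", b"")
```

Whether R copies is worth the write amplification depends on the data model, which is why this is a judgment call rather than a default.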