I have a use case where I need to optimize a single large transaction that adds/updates/deletes more than 10K keys. Values are about 1 KB with an upper bound of 32 KB, and keys are typically 64 bytes. The biggest problem is the number of keys that are part of the transaction.
The bottleneck is mostly in the business logic involved, but I am trying to figure out the best way to optimize it.
Business logic - For each key, look up the previous instance from the database (if it exists) and compare the update count between the old and new objects. The typical flow is that the client does plenty of business processing first, so the old reads could be from a different read version. That is why we need a second-level check before writing to the database.
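To make the checker concrete, this is roughly what it does per key (a simplified sketch with illustrative names, not our real code; it assumes the update count is stored in the first 8 bytes of the metadata value):

```java
import com.apple.foundationdb.Transaction;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

class Checker {
    // Assumption: the update count is the first 8 bytes of the metadata value.
    static long storedUpdateCount(byte[] meta) {
        return meta == null ? -1L : ByteBuffer.wrap(meta).getLong();
    }

    // Read the previous metadata for this key (if any) and verify that the update
    // count the client saw earlier still matches what is currently in the database.
    static CompletableFuture<Boolean> check(Transaction tr, byte[] metaKey, long expectedCount) {
        return tr.get(metaKey)
                 .thenApply(meta -> storedUpdateCount(meta) == expectedCount);
    }
}
```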
Transaction mutex - For certain types of transactions we need to serialize the writes, so we use a global mutex key that gets updated on every write. This is how we serialize across multiple clients.
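In case it helps, the mutex is used roughly like this (illustrative sketch; the key name and value format are made up):

```java
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.tuple.Tuple;

class TxnMutex {
    // Illustrative key name; the real key is different.
    static final byte[] MUTEX_KEY = Tuple.from("global", "txn_mutex").pack();

    // Every serialized transaction reads and rewrites the same key, so any two
    // concurrent writers get a read-write conflict and only one of them commits.
    static void touch(Transaction tr) {
        tr.get(MUTEX_KEY).join();                                 // puts the key in the read-conflict set
        tr.set(MUTEX_KEY, Tuple.from(System.nanoTime()).pack());  // bump the value on every write
    }
}
```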
The whole transaction is run by a single thread in a serial fashion.
Each key is stored twice in the database: once with a very small piece of metadata and once with the actual value. The metadata, which includes the update count, is what the checker pulls.
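The layout is roughly like this (sketch with made-up prefixes; the checker only ever touches the metadata key):

```java
import com.apple.foundationdb.tuple.Tuple;

class KeyLayout {
    // Illustrative prefixes: each logical key maps to two physical keys, a small
    // metadata record (holding the update count) and a record with the full value.
    static byte[] metaKey(byte[] key)  { return Tuple.from("meta", key).pack(); }
    static byte[] valueKey(byte[] key) { return Tuple.from("value", key).pack(); }
}
```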
The keys are not ordered, so I can't do a range read to fetch them.
It's common to have 100 million keys in a production database.
Measured latency - The end-to-end latency observed is about 2.5 seconds for 10K keys. Of that, about 1.5 seconds is spent in the checker function, which looks up the previous value of each new key.
Questions -
- In the Java binding, is the Transaction object thread-safe? My idea is to spin up multiple threads sharing the transaction object, each working on a subset of the keys in parallel. If it's not thread-safe, the fallback is to create multiple transactions, set the same read version on them for the lookups, and run the writes in the main transaction (see the sketch after this list).
- 99% of the keys will be inserts, meaning the key shouldn't already exist in the database. But I still have to look up each key to be 100% certain nobody has created it as part of a different transaction.
Question - Is looking up a non-existent key more expensive than looking up a recently written key? Is there any technique I can use to distinguish between create and update more optimally? I suspect that since we read a very large key set as part of the same transaction, the resolver has to validate a large set of keys for read conflicts. Is my understanding correct? Is there any way to optimize it?
- Will disabling the RYW (read-your-writes) cache improve write performance?
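To clarify the fallback I mention in the first question: the idea is to create extra transactions pinned to the main transaction's read version, do the metadata lookups there, and keep all the writes on the main transaction. A rough sketch (key names and API version are placeholders; a real version would use one reader transaction per worker thread):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.Transaction;

public class ParallelLookup {
    public static void main(String[] args) {
        try (Database db = FDB.selectAPIVersion(710).open();
             Transaction mainTr = db.createTransaction()) {

            long readVersion = mainTr.getReadVersion().join();

            // One reader transaction per worker thread in the real code, all pinned to
            // the main transaction's read version and used only for metadata lookups.
            try (Transaction reader = db.createTransaction()) {
                reader.setReadVersion(readVersion);
                byte[] meta = reader.get("meta-key".getBytes()).join();  // hypothetical key
                if (meta == null) {
                    // Key does not exist yet -> treat as an insert and stage the write.
                    mainTr.set("key".getBytes(), "value".getBytes());    // hypothetical key/value
                }
            }

            // All writes are committed through the single main transaction.
            mainTr.commit().join();
        }
    }
}
```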
Thanks for your help.