FoundationDB

How to minimize transaction conflicts on atomic operations?


(Wolf Dan) #1

Hello! I hope you’re having a great day ^^

First let me explain the use I’m giving to the atomic operations in my project (exactly the ‘add’ operation).

I’m working on a graph layer. I have a global counter that gives every node a unique integer id; because of that, transaction conflicts while getting a new id during node creation are frequent (which makes bulk inserts slower than expected), so I’d like to know if there is any way to minimize or completely avoid this issue.

The implementation details are described here in case you want to know more about it (it is Elixir code, so I’ve written these implementation notes to make it easier to understand).

A solution that I’m already working on is a “node based” counter instead of a global one, so every “node_name” has its own counter. That minimizes key conflicts, but I’d like to know if there is a better solution.

I’ve read here in the forums that a versionstamp could work as a unique identifier, but it is quite big (always 12 bytes) compared to an ‘integer’ implementation. In my “chunk” implementation that could be a problem, since every unique id is part of every single property key, the index key and the relationship node key, so I wonder if it is a good idea to switch the atomic operation to versionstamps.

I’m open to changing the whole implementation in order to make it efficient ^^, so please let me know if you have a better solution!

Thank you!


(Ryan Worl) #2

Check out the high-contention allocator used by the directory layer. Your counter will not be sequential, but it will be an integer that is reasonably small.

Here is an Elixir implementation: https://www.activesphere.com/blog/2018/08/05/high-contention-allocator
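Roughly, the idea is the following. This is a deliberately simplified, in-memory Python sketch of the windowing scheme only; the real directory-layer allocator keeps the window start and the candidate set in FDB keys and uses snapshot reads plus atomic adds so that concurrent transactions rarely conflict. The class name and window sizes below are made up for illustration:

```python
import random

class ToyHighContentionAllocator:
    """In-memory sketch of the directory layer's high-contention
    allocator. It hands out unique integers that are small but not
    sequential: candidates are probed at random inside a window, and
    the window advances once it is about half full."""

    def __init__(self):
        self.start = 0          # beginning of the current window
        self.window_size = 64   # candidates per window (the real impl grows this)
        self.taken = set()      # ids already handed out in this window

    def allocate(self):
        while True:
            # When the window is about half full, advance it so that
            # random probing keeps succeeding quickly.
            if len(self.taken) >= self.window_size // 2:
                self.start += self.window_size
                self.taken.clear()
            # Probe a random candidate; in FDB, a conflict here would
            # only collide with transactions probing the same candidate.
            candidate = self.start + random.randrange(self.window_size)
            if candidate not in self.taken:
                self.taken.add(candidate)
                return candidate

alloc = ToyHighContentionAllocator()
ids = [alloc.allocate() for _ in range(1000)]
assert len(set(ids)) == 1000   # all unique, and all still small integers
```

The allocated ids stay compact (here, 1000 allocations never exceed a few thousand), which is why the technique works well when the id ends up embedded in many keys.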


(Christophe Chevalier) #3

Technically, a versionstamp is 80 bits (10 bytes) followed by an optional 16-bit per-transaction counter. Some bindings only expose the 96-bit combined stamp, but you could potentially reduce the size a bit, though it would still be larger than a conventional integer (3-5 bytes if using the tuple layer).
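For reference, the byte layout can be sketched in Python like this: the 10-byte core is the 8-byte big-endian commit version plus a 2-byte batch order, and the optional 2-byte per-transaction counter is appended to form the 12-byte combined stamp most bindings expose. The helper name is made up for illustration:

```python
import struct

def make_versionstamp(commit_version, batch_order, user_counter=None):
    # 10-byte core: 8-byte commit version + 2-byte batch order,
    # both big-endian so that stamps sort correctly as byte strings.
    stamp = struct.pack(">QH", commit_version, batch_order)
    if user_counter is not None:
        # Optional 2-byte per-transaction counter -> 12 bytes total,
        # i.e. the combined stamp exposed by most bindings.
        stamp += struct.pack(">H", user_counter)
    return stamp

assert len(make_versionstamp(123456789, 0)) == 10
assert len(make_versionstamp(123456789, 0, 7)) == 12
# Later commit versions sort after earlier ones byte-wise:
assert make_versionstamp(2, 0) > make_versionstamp(1, 65535)
```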

If you are already reading the counter key in the same transaction that increments it, then it is not really an atomic increment: it will be automatically downgraded to a regular write by the client when you commit (since it already knows the value, it will compute the incremented value locally and update the counter to that value).

This means that “atomically” incrementing ID counters does not give you any performance boost over the naive implementation, and of course you get the same conflict rate.
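To make the distinction concrete, here is a rough Python sketch of what a blind ADD mutation computes on the storage side: little-endian addition, with any carry beyond the operand length discarded (wraparound). `atomic_add` is an illustrative helper, not an actual API; the point is that the cluster can apply it without the client ever reading the key, which is exactly what is lost once you read the counter in the same transaction:

```python
def atomic_add(existing: bytes, param: bytes) -> bytes:
    """Sketch of FDB's ADD mutation semantics: operands are treated
    as little-endian integers (the shorter one zero-extended), added,
    and the carry beyond the result length is discarded."""
    n = max(len(existing), len(param))
    a = int.from_bytes(existing, "little")
    b = int.from_bytes(param, "little")
    total = (a + b) % (1 << (8 * n))   # wraparound on overflow
    return total.to_bytes(n, "little")

# Blind increment: the client never needs the current value.
assert atomic_add(b"\x05\x00", b"\x01\x00") == b"\x06\x00"
# Overflow wraps instead of widening the value:
assert atomic_add(b"\xff", b"\x01") == b"\x00"
```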

If you don’t want to pay the cost of having 10-12 bytes UIDs, then you have to use high contention allocators, like the one @ryanworl linked to, or maybe find a way to pre-allocate ranges of UIDs at the app level (each server allocates a range of let’s say 1000 ids and distributes ids locally), which can be tricky to get right.


(Wolf Dan) #4

Oh! Thank you so much, I remember having seen it, but I never realized that I could use it for my use case. I just tried it and it works without problems; it is exactly what I was looking for! (It cut the time of my bulk insert operation in half! It’s amazing!)

Thanks for the clarification!

So in order to keep it atomic I just need to perform the operation but not read or expect the result inside the same transaction, right? Can I still do reads or writes on other keys within the same transaction?

Will doing it that way minimize key conflicts? I want to implement a counter to track the number of edges that a node has for a given property name, and I don’t need to read the counter back.


(Christophe Chevalier) #5

Yes. If you were updating a total counter of items in a table, that would work fine, but if you need the updated value from the same transaction (for uids or primary keys) then this will not be fast because of read conflicts.

Under the hood, if you don’t read the key and only atomically increment it, the client sends “increment that key” to the cluster, which will perform the operation later in the pipeline. If you read the key and increment it, then you would still conflict if the value changes anyway, so the client can convert the operation into a regular read/write. Note that if in the same transaction you atomically increment the key twice, you’ll see that it only sends one increment of +2 (or +N; basically it merges the operations into one).
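A toy model of that client-side merging, just to illustrate the point (illustrative code, not the actual client logic):

```python
def coalesce_adds(mutations):
    """Toy model of the merge described above: multiple atomic ADDs
    on the same key inside one transaction are folded into a single
    ADD whose parameter is the sum of the deltas."""
    merged = {}
    for key, delta in mutations:
        merged[key] = merged.get(key, 0) + delta
    return merged

# Two +1 increments on "counter" become one +2 mutation:
assert coalesce_adds([("counter", 1), ("counter", 1)]) == {"counter": 2}
```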

Yes, that’s one of the scenarios that atomic operations were designed for: counters that are read by other transactions (before atomic operations existed, back in I think v2.x, you had to also use high contention counters).

Versionstamps are designed to help solve (among others) the problem of uid counters when you need something that increments (with gaps), but at the cost of larger keys. This helps a lot in solving the perf issues with queues (like message queues, ordering of events, …).

In your repo, it looks like the uid will always be the first part of the key, so you may be saved by the future storage engine being worked on (talk about this: https://www.youtube.com/watch?v=nlus1Z7TVTI&list=PLbzoR-pLrL6q7uYN-94-p_-Q3hyAmpI7o&index=10), which will introduce key prefix compression: versionstamps are currently implemented as the cluster’s commit version (a large number that increments by ~1,000,000 per second), so the first 4-6 bytes will not change frequently (every day? week?). With prefix compression, this means that the actual cost (on disk) of versionstamps will come back down to 4-6 bytes (so the same as integers). I’m not sure if the prefix compression scheme will be used for network transmission, but at least you will pay less for storage.

This won’t help much if uids are not the first part of the key or in the value (so basically indexes, or internal pointers to other entities).

Another point: if you have a location that is updated by each mutating transaction (a counter), then you will create a “hot spot” for that particular storage process. You could try to break the counter into multiple keys far apart in the key space so that they end up on multiple storage nodes, but that could be somewhat difficult to achieve. This impacts global counters as well as voting tallies like an index (..., 'best_girl_votes', <character_uid>) = [32 bit counter]: the most popular characters will be mutated more frequently than some others, and may create hotspots. I think there is also a talk in the playlist that talks about hotspots and how to avoid them.
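One common way to break up such a hot counter is sharding. Here is a sketch, with a plain dict standing in for the database and the shard-key scheme purely illustrative; in a real layer each shard would be its own FDB key (placed far apart in the keyspace), incremented with an atomic add, and a read would sum the shards in one range read:

```python
import random

class ShardedCounter:
    """Sketch of hot-spot mitigation: spread one logical counter over
    N keys so that increments land on different storage servers;
    reading sums all the shards."""

    def __init__(self, db, name, nshards=16):
        self.db = db            # dict standing in for the database
        self.name = name
        self.nshards = nshards

    def increment(self, delta=1):
        # Pick a random shard so concurrent writers rarely touch the
        # same key; a real layer would use an atomic add here.
        shard = (self.name, random.randrange(self.nshards))
        self.db[shard] = self.db.get(shard, 0) + delta

    def read(self):
        # Reads are more expensive: they must sum every shard.
        return sum(self.db.get((self.name, i), 0)
                   for i in range(self.nshards))

db = {}
c = ShardedCounter(db, "edges")
for _ in range(100):
    c.increment()
assert c.read() == 100
```

The trade-off is that reads touch N keys instead of one, which is usually fine for counters that are written far more often than they are read.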

Sooo… maybe versionstamps + hot spot mitigation could still be a solution, at least long term? You’d need to do some calculations and a cost breakdown for this. If you go that route, take a look at VersionStamp uniqueness and monotonicity for some caveats.


(Ryan Worl) #6

There is an issue now for the important caveat mentioned in that thread: https://github.com/apple/foundationdb/issues/1073

Also, the record layer paper mentions how Apple worked around the non-uniqueness across clusters. Search for the word “incarnation” in the paper for details. It isn’t directly related to that caveat (although it may have been?), but rather to moving data between clusters.


(Wolf Dan) #7

Thanks for the explanation! Makes way more sense that way :slight_smile:

I’ll stick to the high-contention allocator for the moment. I’ve noticed that the way my layer works already consumes a lot of storage; adding more bytes would make it even worse…

This is quite interesting, though it is easier if the votes only ever increment; if someone retracts their vote, things become much more complex.

hahahaha nice reference :smiley:

Which paper are you referring to? ^^

Anyway, again thank you so much for all the help!


(Ryan Worl) #8