What is the most efficient way to generate version stamps in FDB

I want to have a service that generates unique and monotonically increasing numbers (no other constraints).

The most straightforward way of doing this is to write a value into FDB and use transactions to increment it. However, this will cause contention, and running a full transaction for this seems a bit too heavyweight.
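For illustration, here is a minimal Java sketch of that naive approach (the key name `counter` and the API version are arbitrary choices for the sketch, and retry/error handling is left to `db.run`):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import java.nio.ByteBuffer;

public class NaiveCounter {
    // "counter" is a hypothetical key name chosen for this sketch.
    private static final byte[] COUNTER_KEY = "counter".getBytes();

    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(600); // any API version your cluster supports
        try (Database db = fdb.open()) {
            long next = db.run(tr -> {
                byte[] raw = tr.get(COUNTER_KEY).join();
                long current = (raw == null) ? 0L : ByteBuffer.wrap(raw).getLong();
                // Reading and writing the same key gives every concurrent
                // caller a conflict on COUNTER_KEY, so commits serialize
                // and retries pile up under load.
                tr.set(COUNTER_KEY, ByteBuffer.allocate(8).putLong(current + 1).array());
                return current + 1;
            });
            System.out.println("next = " + next);
        }
    }
}
```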

The next idea I have is that one could create a dummy transaction that wouldn’t write anything and commit that (we could achieve this by adding a non-existing key to the write conflict set). This would give me a versionstamp, but it would still require two round trips to FDB (one to get the read version and one for the commit), and the commit is still somewhat expensive.
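Following that idea, a sketch of what such a dummy transaction could look like in Java (the dummy key is a made-up name, and retries are omitted):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.Transaction;
import java.util.concurrent.CompletableFuture;

public class DummyCommitStamp {
    // Hypothetical key; it is never written, only added to the conflict set.
    private static final byte[] DUMMY_KEY = "\u0001stamp-dummy".getBytes();

    static byte[] nextStamp(Database db) {
        try (Transaction tr = db.createTransaction()) {
            // Adding a write conflict key keeps the commit from being a no-op
            // without actually writing anything to storage.
            tr.addWriteConflictKey(DUMMY_KEY);
            CompletableFuture<byte[]> stamp = tr.getVersionstamp();
            tr.commit().join();   // round trip 2 (round trip 1 fetched the read version)
            return stamp.join();  // 10-byte versionstamp assigned at commit
        }
    }
}
```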

Is there something I am missing? What would be the cheapest way of doing this?

Hi Markus,

You can use the FDB versionstamp feature (as represented in Java: https://apple.github.io/foundationdb/javadoc/com/apple/foundationdb/tuple/Versionstamp.html). This is a placeholder in a key or value that is filled in during commit using the commit version assigned at the server (the details are described in the javadocs).
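For readers unfamiliar with the feature, a minimal Java sketch of writing a versionstamped key (the key layout here is just an example):

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.MutationType;
import com.apple.foundationdb.tuple.Tuple;
import com.apple.foundationdb.tuple.Versionstamp;

public class StampedWrite {
    static void writeStampedKey(Database db, byte[] value) {
        db.run(tr -> {
            // The incomplete versionstamp is a placeholder; it is filled in
            // with the 10-byte commit version + batch order during commit.
            byte[] key = Tuple.from(Versionstamp.incomplete()).packWithVersionstamp();
            tr.mutate(MutationType.SET_VERSIONSTAMPED_KEY, key, value);
            return null;
        });
    }
}
```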

Thanks. Yes, I am aware of this feature. However, this is only useful if the reason you generate the versionstamp is so you can write it into FDB.

For our use-case we don’t need that - so the additional write to storage is slower (as we would need to follow it with a read) and would put additional pressure on the storage servers. We just want to order operations externally to FDB.

I was considering implementing a no-op commit. This would just be an empty commit for which we would generate a versionstamp. But this requires code changes, I think.

Yeah, because even blind writes get a read version (one of the two round trips you mentioned) to avoid a failed write “resurfacing” after some time, though that shouldn’t matter if your operation is actually a no-op. In theory, you could get a monotonic number just from the read version, but those aren’t unique.
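For example, fetching just a read version is a single round trip, but nothing prevents two clients from receiving the same value:

```java
import com.apple.foundationdb.Database;

public class ReadVersionOnly {
    // One round trip: read versions are monotonic across transactions,
    // but two concurrent callers can receive the same value.
    static long currentReadVersion(Database db) {
        return db.run(tr -> tr.getReadVersion().join());
    }
}
```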

You also might be able to get away with batching requests, so that you generate one versionstamp from the (relatively expensive) two round trips to FDB, and then have the sequencing server order multiple requests by (versionstamp, server-provided-sequence-number). This could increase throughput, I think, though not latency.
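A hypothetical sketch of that sequencing-server idea: one versionstamp (fetched elsewhere, via the two expensive round trips) is shared by a whole batch of requests, and a local counter orders the requests within the batch:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// One versionstamp per batch plus a server-provided sequence number.
final class BatchSequencer {
    private final byte[] versionstamp;
    private final AtomicInteger seq = new AtomicInteger();

    BatchSequencer(byte[] versionstamp) { this.versionstamp = versionstamp; }

    byte[] next() {
        ByteBuffer buf = ByteBuffer.allocate(versionstamp.length + 4);
        buf.put(versionstamp);
        buf.putInt(seq.getAndIncrement()); // big-endian, so byte order = numeric order
        return buf.array();
    }
}
```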

If you’re fine with doing code changes for this, then the cheapest way to do this is an RPC to the master for its current timestamp, without acquiring it as a commit version.

Otherwise, I think empty commit is your best bet. We’ve talked about adding a “commit cannot be a no-op” flag before, but I don’t think anyone ever got around to it.

Wouldn’t you need to acquire it as a commit version, though, to avoid someone else getting the same one?

I suppose this could be a special API or, well, a special \xff\xff key or something.

Thanks for these suggestions. If I make any code changes I want to do the most efficient thing. I tend to agree that using the master would be the cheapest option, but:

The uniqueness issue seems to be a problem. However, the master could also keep a 4-byte number around that it increments as well (and resets whenever the version changes). The other thing I am unsure about is whether it is generally a good idea to allow clients to talk directly to the master.

Another alternative I could think of would be a special call to the proxy - something like getVersionStamp. The proxy would queue all those requests and whenever it finishes a commit it would reply to all of them. This would have the following benefits:

  • Clients don’t need to talk to the master for this.
  • We save the round trip to get a read version.
  • It should be much faster than a commit, as we don’t need to wait for batching. Instead, the latency should roughly be the interval at which the proxy finishes commits.

Obviously we would then need to handle the corner case where there are no commits for a long time. But in that case we could still commit a dummy transaction.
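None of this exists in FDB today, but as a rough illustration of the proxy-side behaviour being proposed (in Java for consistency with the other snippets):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the proposed getVersionStamp call (this API does not exist):
// requests are queued and all answered with the version of the next commit
// the proxy finishes. Note that everyone in one batch gets the same version,
// so uniqueness still needs a counter of some kind.
final class StampQueue {
    private final List<CompletableFuture<Long>> waiters = new ArrayList<>();

    synchronized CompletableFuture<Long> getVersionStamp() {
        CompletableFuture<Long> reply = new CompletableFuture<>();
        waiters.add(reply);
        return reply;
    }

    // Hooked into the commit path: called with each freshly committed version.
    synchronized void onCommit(long commitVersion) {
        waiters.forEach(w -> w.complete(commitVersion));
        waiters.clear();
    }
}
```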

Any thoughts on this design?

I mean, generally not, and you’d have to put some pretty strict QPS limits on this. You also don’t need to acquire it as a read version, but you do need to either have the master increment the version, or use your extra-counter-bytes-at-the-end trick.

You could also have the proxy batch them until the next commit it does, and then hand out all versions between the previous commit version and this commit version, as the master already hands out version ranges to proxies. If you’re fine with adding a 4-byte counter to the end of that, then that’s an effectively unlimited supply of unique, increasing integers, without burdening the master.
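As a purely illustrative encoding of that proposal (this is not real proxy code), each handed-out value could be a version the proxy already owns plus the 4-byte counter:

```java
import java.nio.ByteBuffer;

// Combine a commit version with a 4-byte counter into a 12-byte value;
// big-endian keeps byte-wise comparison consistent with numeric order.
final class RangeHandout {
    private final long commitVersion;
    private int counter = 0;

    RangeHandout(long commitVersion) { this.commitVersion = commitVersion; }

    synchronized byte[] next() {
        return ByteBuffer.allocate(12)
                .putLong(commitVersion)
                .putInt(counter++)
                .array();
    }
}
```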

If there are no commits for a while, then there’s still the idle transactions that happen every… 250ms? 500ms? I’m forgetting.

Also, I vote for calling this “sequence” and not “versionstamp” to avoid overloading terms.

On the other hand, there may be some appeal in actually implementing this by giving out real versionstamps, so that they can interoperate with existing versionstamp operations.

I’m fine with it returning data of type Version (or Versionstamp, or whatever clients normally get). I just don’t want someone to post to the forums saying “I’m having trouble with versionstamp operations”, and I then have to ask “which?”.

Also, see Scott Gray’s confusion above.

Well, there are already three different things you can do related to versionstamps – set a key, set a value, and get the versionstamp for a transaction. To me, it sounds like this is just an optimized version of the third one for a “transaction” that doesn’t need to do anything else. I think keeping the naming consistent makes it clear how it can be used to sequence against other versionstamps.

I agree with AJ here. It should be versionstamp because it is a versionstamp. In addition to AJ’s points: if it is called sequence it will be confusing to people, because it is not a dense, increment-by-one sequence of numbers (and people who have used databases like PostgreSQL before will probably expect that it is).

@markus.pilman Did you specifically want to generate versionstamps more efficiently (or have this feature built into the FDB core), or would an independent service backed by FDB that generates monotonically increasing longs be enough? I understand that one would have to build failover/HA etc. if standing up their own service instance.

Having an independent service would work but is not something we would consider, for multiple reasons:

  • We already run FDB everywhere. Spinning up a new service requires a lot of operational work.
  • I don’t think we can build something that will be significantly faster or scale significantly better - mostly because we will need some form of durability.
  • Providing versionstamps is a feature FDB already has - so having an independent way of getting a versionstamp would not really be a new feature, just a performance improvement over the existing functionality.

Sure, makes sense.

I was thinking that if standing up an independent service was not an issue, then the service could do something like the Percolator timestamp oracle and be made very fast (with almost no FDB overhead):

  • Have a single key that records the last consumed “id”.
  • Have the service read this id and then reserve the next K ids (a million or so) by writing existing + K back to this key.
  • The service just hands out incremental ids from the reserved id-space.
  • A background thread ensures that we never run out of reserved in-memory id space.
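A rough Java sketch of that oracle, under assumed names (an `id-counter` key and a block size of one million); a real version would refill the reservation from a background thread before it runs dry:

```java
import com.apple.foundationdb.Database;
import java.nio.ByteBuffer;

// Percolator-style oracle: one FDB transaction reserves a block of ids,
// then ids are handed out from memory with no FDB work at all.
final class IdOracle {
    private static final byte[] ID_KEY = "id-counter".getBytes(); // assumed key
    private static final long BLOCK = 1_000_000L;                 // assumed block size

    private final Database db;
    private long next;   // next id to hand out
    private long limit;  // first id beyond the current reservation

    IdOracle(Database db) { this.db = db; }

    synchronized long nextId() {
        if (next >= limit) reserve(); // a background thread would do this earlier
        return next++;
    }

    // Bump the counter key by BLOCK in a single transaction to reserve a block.
    private void reserve() {
        long start = db.run(tr -> {
            byte[] raw = tr.get(ID_KEY).join();
            long current = (raw == null) ? 0L : ByteBuffer.wrap(raw).getLong();
            tr.set(ID_KEY, ByteBuffer.allocate(8).putLong(current + BLOCK).array());
            return current;
        });
        next = start;
        limit = start + BLOCK;
    }
}
```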

You’re right, doing something like that would be efficient. For now I don’t think our performance requirements warrant that effort, but we would definitely consider something like that if FDB couldn’t provide this feature in a scalable enough way.

I had forgotten that Version and Versionstamp are different, as a Versionstamp is (Version, index in batch). It wouldn’t be that large of a change to GetReadVersion to have the proxy add to each GetReadVersionReply the index of where that read version request was in the GRV batch, to restore the uniqueness of the returned value.

Naively though, this would then be broken by read version batching on the client. However, there were previous discussions of making GetReadVersionRequest also communicate the number of batched requests it represents, so that the Proxy/Ratekeeper can throttle more accurately. (Currently a client batching 20 GRVs into 1 request is treated the same by Ratekeeper as a client batching 1 GRV into 1 request.) If this were implemented, then it’d be an easy change to have proxies maintain a running sum of the number of batched requests as they iterate through and send replies, so that each batched request then has a monotonic chunk of batch numbers it can use when constructing versionstamps.
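Illustratively (and only if the proxy returned such an index), a client could then assemble a versionstamp-shaped value like this:

```java
import java.nio.ByteBuffer;

public class StampFromGrv {
    // Hypothetical: rebuild a 10-byte, versionstamp-shaped value from a read
    // version plus the (proposed) index of the request within the GRV batch.
    static byte[] stampFrom(long readVersion, int batchIndex) {
        return ByteBuffer.allocate(10)
                .putLong(readVersion)
                .putShort((short) batchIndex) // versionstamps end in a 2-byte order field
                .array();
    }
}
```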

I like this idea, although I think a few details would need to be ironed out. For example:

You can have 6 proxies, all of them replying to clients with the same read version. But I think this problem is solvable (the last few bits could be the index of the proxy you’re talking to).

There is a transaction option, COMMIT_ON_FIRST_PROXY, that makes you talk only to a special “first” proxy, so the read version + index would be enough to be unique. The downside is that this may become a performance bottleneck, since only one proxy is used.
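Assuming the Java bindings expose the generated setter for this option, using it would look roughly like:

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.Transaction;

public class FirstProxyTxn {
    // Assumes the generated Java setter for the COMMIT_ON_FIRST_PROXY option.
    static Transaction firstProxyTransaction(Database db) {
        Transaction tr = db.createTransaction();
        tr.options().setCommitOnFirstProxy();
        return tr;
    }
}
```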