[Java] why is setReadVersion not part of ReadTransaction/ReadSnapshot

Hi,

Why is the setReadVersion() method not part of ReadTransaction/ReadSnapshot? Isn't it logically correct (and useful) to use this functionality for read-only or snapshot operations?


thanks,
gaurav

I think the idea behind “ReadTransaction” and “ReadSnapshot” was that they were supposed to be read-only views into the transaction’s state. In particular, one should really only be used as, say, the argument to another function that uses the transaction as a context in which to perform reads, but it shouldn’t have too many side effects, if that makes sense.

In other words, the “correct” way to set the read version of a ReadTransaction is to create the transaction, set the read version, and then pass it to some other function that just uses the transaction as a way to perform reads.
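As a minimal sketch of that pattern (the readBalance helper, the key, and the API version are made up for illustration): create a full Transaction, pin its read version, then hand it to code that only needs the ReadTransaction view.

import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.ReadTransaction;
import com.apple.foundationdb.Transaction;
import java.util.concurrent.CompletableFuture;

public class ReadAtVersionExample {
    // Hypothetical read-only helper: it only needs the ReadTransaction view.
    static CompletableFuture<byte[]> readBalance(ReadTransaction rtr) {
        return rtr.get("balance".getBytes());
    }

    public static void main(String[] args) {
        Database db = FDB.selectAPIVersion(600).open();
        long pinnedVersion = 12_345_678L; // e.g. a version captured from an earlier transaction
        try (Transaction tr = db.createTransaction()) {
            tr.setReadVersion(pinnedVersion);      // only exposed on Transaction today
            byte[] value = readBalance(tr).join(); // passed along as a ReadTransaction
        }
    }
}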

That being said, I could kind of see adding this function to this class (like, it wouldn’t be too out of place). I might want a good motivating example before I could say I’m definitely for it, though.


Yeah, that was the original intent.

I think one reason we could argue for adding this is that currently it’s not possible to use this function with the Database.read functions, which provide you with a ReadTransaction rather than a full Transaction. For what it’s worth, we do expose options on the ReadTransaction now, which I think may have been done for the same reason.

At first I found it a bit odd that the Database.read callbacks did not provide a way to call setReadVersion() on the read transaction, so I duplicated Database.read with my own read..() methods.
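A hedged sketch of that kind of wrapper (readAtVersion is a made-up name, not part of the bindings). Unlike Database.read it omits the retry loop, but it shows the shape of the workaround:

import com.apple.foundationdb.Database;
import com.apple.foundationdb.ReadTransaction;
import com.apple.foundationdb.Transaction;
import java.util.function.Function;

public class ReadWrappers {
    // Mimics Database.read, but lets the caller pin the read version first.
    // Note: Database.read also retries on retryable errors; this sketch does not.
    static <T> T readAtVersion(Database db, long readVersion,
                               Function<? super ReadTransaction, T> retryable) {
        try (Transaction tr = db.createTransaction()) {
            tr.setReadVersion(readVersion);
            return retryable.apply(tr); // the callback only sees the ReadTransaction interface
        }
    }
}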

Another related thing I found missing was that I could not call snapshot() on the ReadTransaction provided by the Database.read methods.

Not a big issue, and I could easily work around it; but having these methods on ReadTransaction would probably make it play nicer with the Database.read methods.

Calling snapshot() on a read-only transaction should have no effect, except possibly to make reads a bit cheaper. If that’s the case, though, the right thing to do is probably to call snapshot() automatically rather than making it callable on a ReadTransaction.

So do you suggest that we do that using the Database.run methods? Or should another set of methods, something like Database.readSnapshot, be exposed?

(my intent is to make read-only transactions on snapshots - if there is any possibility of this being cheaper)

For now, I would personally just use the read functions unless you need to set the read version manually. If you really want the possible optimization of calling snapshot(), you certainly could do that, though I’d likely try to measure whether it really makes any appreciable difference first.

My suggestion above for changing the implementation in the bindings was to have the existing read functions call this automatically. I don’t think there’s any need for having 2 versions of the read functions.


Understood - and it makes sense. Thanks for the suggestions!

While this is true, I know I’ve wanted in the past to do something like take a read-only view of a transaction when performing some operation that only reads, but where I also want to control which conflict ranges get added. I could have sworn there was a GitHub issue I filed a while ago about improving the ReadTransaction API to include things like snapshot(), isSnapshot(), and something like addReadConflictRangeIfNotSnapshot(), but I can’t find it now.

I guess in theory, this could break code that does something like:

db.read((ReadTransaction rtr) -> {
    // Downcasts the read-only view back to the full Transaction to reach its mutating API.
    Transaction tr = (Transaction) rtr;
    return null;
});

In some sense, this is a total misuse of the read API, so maybe we’re okay with that breaking.

To provide a bit more context for the potential optimization here, I’ll give a bit more detail about what’s happening. Using snapshot reads prevents conflict ranges from being added for reads, which slightly reduces the amount of work the client has to do to handle each read. The read issued to the cluster is unchanged, and if this is a read-only transaction, you never have to send the conflict ranges to the cluster anyway.
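To make the distinction concrete, here is a small sketch (the keys and API version are placeholders): the plain read records a read conflict range on the client, while the snapshot read does not.

import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.Transaction;
import com.apple.foundationdb.tuple.Tuple;

public class SnapshotReadExample {
    public static void main(String[] args) {
        Database db = FDB.selectAPIVersion(600).open();
        try (Transaction tr = db.createTransaction()) {
            byte[] a = tr.get(Tuple.from("a").pack()).join();            // client records a read conflict range
            byte[] b = tr.snapshot().get(Tuple.from("b").pack()).join(); // no conflict range recorded
            // In a read-only transaction the conflict ranges are never sent to the
            // cluster anyway, so the saving is purely client-side bookkeeping.
        }
    }
}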

Based on the above, what you’re likely to see is a negligible change to read latency (except maybe when reading from cache; i.e. reading keys that were already read or written in the same transaction) and a possibly more noticeable change to client read throughput.

For latency, typically the bulk of this cost is in sending the request and waiting for the storage server to answer it. The amount of work done on the client is typically relatively small in comparison, so optimizing that cost will have a relatively small effect. If you are reading from cache, though, then you won’t need to send a request to the cluster and this optimization might help depending on how much it actually reduces the client-side work being performed.

For per-client throughput, usually the bottleneck here is the fact that there is a single thread (which we call the network thread) doing all of the client-side work for each request. You are therefore limited by how much you can get done on a single core. If by skipping the conflict ranges you can noticeably reduce the amount of work done on the network thread, then you can increase the number of reads you do before that thread saturates.

So just to throw out some fake numbers and a simplified example, let’s say each read takes 10 microseconds of work on the network thread and 500 microseconds to make the request to the cluster and get a response. If we can shave 1 microsecond off the network thread work, then our total latency decreases by ~0.2%. For a cached read, if we can shave off the same 1 microsecond, we would save 10% of our (already relatively small) latency.

In terms of throughput, our client could handle at most 100,000 requests per second (1 second / 10 microseconds). If we reduced this time to 9 microseconds, we could handle at most 111,111 requests per second, or an 11% increase in single-client throughput.
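Spelling out the arithmetic behind those fake numbers (the inputs are the same made-up figures as above):

public class BackOfEnvelope {
    public static void main(String[] args) {
        double networkThreadUs = 10.0;  // client-side work per read (fake number)
        double clusterUs = 500.0;       // round trip to the cluster (fake number)
        double savedUs = 1.0;           // hypothetical saving from skipping conflict ranges

        double uncachedGain = savedUs / (networkThreadUs + clusterUs); // ~0.002, i.e. ~0.2% of total latency
        double cachedGain = savedUs / networkThreadUs;                 // 0.10, i.e. 10% of a cached read

        double baseRate = 1_000_000.0 / networkThreadUs;                // 100,000 reads/sec per client
        double fasterRate = 1_000_000.0 / (networkThreadUs - savedUs);  // ~111,111 reads/sec, ~11% more
        System.out.printf("latency gains: %.4f / %.2f; throughput: %.0f -> %.0f%n",
                uncachedGain, cachedGain, baseRate, fasterRate);
    }
}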

All this is to give you an idea of when this optimization might matter, depending of course on what the actual numbers are. If you find that you are saturating your network thread and it’s costly for you to spin up more clients, then it might be worth it to you. Or, if you do lots of cached reads, then you could potentially see a benefit as well. If you do run a test of this, I’d love to hear the details.

All that said, if there is an appreciable benefit to using snapshot reads, then we’re likely to implement the described change directly in the bindings, which would mean that the optimization will only be relevant until that change lands.


I do agree that’s a misuse, but if we wanted to support it then this change should be easy enough to gate behind an API version.

Yeah, that’s interesting. Such a function would only be useful if it is sometimes called with a full Transaction, but where this really makes a difference is if you also want it to be callable from a read-only context. It does of course also limit the API that you can use in a full Transaction scenario, but that’s probably not as big of a win (especially since to change that we’d have to expand the limited API).

I suppose there’s an increased potential for confusion with this change in that it may appear that some reads add conflict ranges when they in fact do not (a problem you alluded to by suggesting things like addReadConflictRangeIfNotSnapshot()).

Yeah, that’s right. The idea is that this would allow the author of a library to do something like create a function whose signature takes a ReadTransaction or ReadTransactionContext. Then if it is given a full transaction, it adds the conflict ranges, but if given a snapshot transaction, it won’t. This is particularly important if one wanted to do something like use snapshot reads within the function. (Something like a “choose one” function that picks a key from a key range by doing a snapshot read and then returning a random element. Then you want your contract to be that only the key returned gets a read conflict range, and only if it is passed a full transaction.)
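A hedged sketch of that “choose one” pattern (chooseOne is a made-up name). As things stand it needs a full Transaction, since only Transaction exposes snapshot() and addReadConflictKey(); with the proposed ReadTransaction additions, the same body could also accept a read-only view and simply skip the conflict key.

import com.apple.foundationdb.KeyValue;
import com.apple.foundationdb.Transaction;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class ChooseOne {
    // Picks a random key from [begin, end) via a snapshot read, then adds a read
    // conflict range for only the key that is actually returned.
    static byte[] chooseOne(Transaction tr, byte[] begin, byte[] end) {
        List<KeyValue> candidates = tr.snapshot().getRange(begin, end).asList().join();
        if (candidates.isEmpty()) {
            return null;
        }
        KeyValue chosen = candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
        tr.addReadConflictKey(chosen.getKey()); // conflict only on the key we return
        return chosen.getKey();
    }
}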

Thank you for the details A.J. I concur that the optimization savings are paltry here, and a robustly designed API is more important in this context.