API for Transaction Size

(Alec Grieser) #1

An API for transaction size seems like it would be generally useful: FDB runs into a few pathologies with larger transactions, so clients often need to know how large their transactions are. (See Issue #1466: Better handling of large transactions.)

Here are a few proposals for work that could help clients with that. They are primarily aimed at jobs that split a large amount of work across a series of transactions (index building, for example):

  1. A generic “get transaction size” API that can be called at any time. Ideally it would use the same metric that transactions use at commit time, so that the user could write something like while (tr.getSize() < some_threshold) add_more_work(tr); and have the loop stop once it has done as much work as the user is willing to put in a single transaction (say, 1 MB or 750 kB). The difficulty (as I understand it) is that such an API would need to update its notion of the transaction size every time a clear range or even a get range happens. It could instead recalculate the value from the transaction’s data structures on demand, but that could be expensive if it is called after, say, every significant chunk of data is added.
  2. A “get committed transaction size” API that can only be called after a transaction has been committed. This would let someone who is doing a bunch of work inspect how big their already committed transaction was. If that size exceeds some value, they can apply their own limit and do less work in the next transaction. This is obviously worse than knowing before one commits (or before one adds more work), but it might be easier to implement: the transaction already has to calculate its size before it commits, so it would be as simple as “remembering” that value.
  3. An option for artificially limiting the size of a transaction to something less than the official limit. (This was proposed by @ryanworl in Issue #1466.) This is probably easier to implement than the other two options insofar as it takes an existing constant and makes it something the user can configure. (Possibly at the database level, the transaction level, or both?) Oversized transactions would then show up as client-side errors, which would keep them from hurting the server.
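
To make proposals 1 and 3 concrete, here is a minimal Python sketch. It does not use the FDB client at all: SketchTransaction, get_size, size_limit, and TransactionTooLarge are hypothetical names invented for illustration, and the size metric (key length plus value length) is an assumption, not the accounting FDB actually uses at commit.

```python
class TransactionTooLarge(Exception):
    """Hypothetical client-side error raised before anything reaches the server."""


class SketchTransaction:
    """Hypothetical in-memory stand-in for a transaction; NOT the FDB client."""

    def __init__(self, size_limit=10_000_000):
        self.size_limit = size_limit  # proposal 3: user-configurable cap (assumed option)
        self.writes = {}
        self._size = 0  # running estimate, updated on every write

    def set(self, key: bytes, value: bytes) -> None:
        # Assumed metric: raw key and value bytes. Overwriting a key is counted
        # again, so this is an overestimate rather than an exact size.
        added = len(key) + len(value)
        if self._size + added > self.size_limit:
            raise TransactionTooLarge(f"{self._size + added} > {self.size_limit}")
        self.writes[key] = value
        self._size += added

    def get_size(self) -> int:
        # Proposal 1: cheap to call at any time because the estimate is kept
        # up to date incrementally instead of being recomputed.
        return self._size


def fill_batch(tr, items, threshold):
    """Add items until the size estimate reaches threshold; return how many were added."""
    done = 0
    while done < len(items) and tr.get_size() < threshold:
        key, value = items[done]
        tr.set(key, value)
        done += 1
    return done


# Each item is 7 key bytes + 93 value bytes = 100 bytes.
items = [(b"key%04d" % i, b"v" * 93) for i in range(50)]
tr = SketchTransaction()
added = fill_batch(tr, items, threshold=1000)
```

The loop stops as soon as the estimate reaches the threshold, so the caller can commit and continue with the remaining items in a fresh transaction. With a small size_limit, the same set call fails on the client instead.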

I’d be interested to hear people’s thoughts on these proposals. I think the first two would require an update to the libfdb_c API and therefore to all of the bindings as well. The third can be accomplished solely by adding more options in vexillographer (from an API point of view).

(A.J. Beamon) #2

It seems that the option described in item 3 already exists internally, and I think I would be in favor of exposing it as an option in our API, regardless of what is done with respect to the other two.

For item 1, is there any potential value in a naive function that gives an upper bound for your transaction size without considering the potential coalescing that can take place between conflict ranges or mutations?
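
As a rough illustration of what such a naive bound could look like, the Python sketch below sums raw byte lengths without coalescing anything and compares the result against a version that merges overlapping ranges first. The function names and the size metric are assumptions made for this example; the real client’s accounting may include overheads not modeled here.

```python
def naive_upper_bound(mutations, conflict_ranges):
    """Sum raw byte lengths; overlapping ranges are counted once per occurrence."""
    total = 0
    for param1, param2 in mutations:  # e.g. (key, value) or (begin, end)
        total += len(param1) + len(param2)
    for begin, end in conflict_ranges:
        total += len(begin) + len(end)
    return total


def coalesce(ranges):
    """Merge overlapping or touching [begin, end) ranges."""
    merged = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


mutations = [(b"key", b"value")]
ranges = [(b"a", b"c"), (b"b", b"d"), (b"x", b"z")]

naive = naive_upper_bound(mutations, ranges)
exact = naive_upper_bound(mutations, coalesce(ranges))
```

Because coalescing can only remove or shrink entries under this metric, the naive figure is never smaller than the coalesced one, which is the property that matters for a “stop before I go too far” check.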

(Alec Grieser) #3

I think so. The use cases I had in mind would all use this as a rough “let me know if I’m doing too much” signal, a finger-in-the-wind kind of thing, so it wouldn’t need to be exact, and overestimating is probably better than underestimating (to a point).

(Meng Xu) #4

A basic question:
Is the transaction handle destroyed immediately after commit?
If yes, how do we get the transaction info after commit?
If no, when is the transaction handle destroyed?

(Alec Grieser) #5

AFAIK, the transaction isn’t destroyed after commit (not automatically, at least). There’s an explicit destroy method on transactions that the bindings call when they are done with the transaction.

Note that the API proposed in the quoted section (the “get committed transaction size” call from item 2) is essentially how we already expose the committed version to the user (though not the commit versionstamp).
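
For what it’s worth, here is a minimal Python sketch of how a client might use a post-commit size to tune later batches. CommittedSizeTransaction and get_committed_size are hypothetical names, the class is an in-memory stand-in rather than the FDB client, and the size metric (key plus value bytes) is an assumption.

```python
class CommittedSizeTransaction:
    """Hypothetical stand-in; a real client would talk to a cluster."""

    def __init__(self):
        self._size = 0
        self._committed_size = None

    def set(self, key: bytes, value: bytes) -> None:
        self._size += len(key) + len(value)  # assumed size metric

    def commit(self) -> None:
        # The size is computed for the commit anyway; just remember it.
        self._committed_size = self._size

    def get_committed_size(self) -> int:
        # Only meaningful after commit, like reading a committed version.
        if self._committed_size is None:
            raise RuntimeError("transaction has not been committed")
        return self._committed_size


def run_batches(items, target_size, initial_batch):
    """Commit items in batches, halving the batch size after an oversized commit."""
    batch, pos, history = initial_batch, 0, []
    while pos < len(items):
        tr = CommittedSizeTransaction()
        chunk = items[pos:pos + batch]
        for key, value in chunk:
            tr.set(key, value)
        tr.commit()
        size = tr.get_committed_size()
        history.append((len(chunk), size))
        pos += len(chunk)
        if size > target_size:
            batch = max(1, batch // 2)  # do less work in the next transaction
    return history


# 100 items of 100 bytes each (7 key bytes + 93 value bytes).
items = [(b"key%04d" % i, b"v" * 93) for i in range(100)]
history = run_batches(items, target_size=1000, initial_batch=40)
```

The first couple of transactions overshoot the target, which is the cost of only learning the size after the fact; later batches settle at a size that fits.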