API for Transaction Size

It seems like an API for transaction size would be generally useful: FDB runs into a few pathologies with larger transactions, so it is often helpful for clients to know how large their transactions are. (See Issue #1466: Better handling of large transactions.)

Here are a few proposals for work that could help clients with that. These are primarily aimed at improving the common pattern of spreading a bunch of work across a series of transactions (like, say, index building):

  1. A generic “get transaction size” API that can be called at any time. It would hopefully use the same metric as transactions do upon commit, so that the user could do something like while(tr.getSize() < some_threshold) add_more_work(tr); (see the sketch after this list). Their loop would then stop once they have done as much as they are willing to do in a single transaction. (Say, 1 MB or 750 kB.) The tricky part (as I understand it) is that such an API would need to update its understanding of the transaction size every time a clear range or even a get range happens. It could also recalculate the value from the transaction’s data structures, but that could be expensive if it is done after, say, each time a significant data segment is added.
  2. A “get committed transaction size” API that can only be called after a transaction has been committed. This would allow someone doing a bunch of work to inspect how big their already-committed transaction was. If it exceeded some value, they could apply a limit themselves and do less work in the next transaction. This is obviously less good than knowing before one commits (or before one adds more work), but it might be easier to implement: the transaction already has to calculate its size before it commits, so it would be as simple as “remembering” what that value was.
  3. An option for artificially limiting the size of a transaction to something less than the official limit. (This was proposed by @ryanworl in Issue #1466.) This is probably easier to implement than the other two options insofar as it takes an existing constant and makes it user-configurable. (Possibly at the database level, the transaction level, or both?) Violations would then show up as client-side errors, which would keep oversized transactions from hurting the server.
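To make proposal 1 concrete, here is a rough sketch of the batching loop it would enable. Everything here is hypothetical: fdb_transaction_get_size, have_more_work, and add_more_work stand in for the proposed API and the application’s own work, and the size call is assumed to be synchronous.

    /* Hypothetical sketch of proposal 1's batching loop. None of these
     * functions exist today; fdb_transaction_get_size is the proposed call. */
    #define SIZE_THRESHOLD (750 * 1000) /* e.g. 750 kB, well under the 10 MB limit */

    while (have_more_work() && fdb_transaction_get_size(tr) < SIZE_THRESHOLD) {
        add_more_work(tr); /* issue more sets/clears on this transaction */
    }
    /* commit here, then begin a new transaction for any remaining work */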

If people have thoughts on these proposals, that would be good to know. I think the first two would require an update to the libfdb_c API and therefore also to all of the bindings. The third can be accomplished solely by adding more options in vexillographer (from an API point of view).

It seems that the option described in item 3 already exists internally, and I think I would be in favor of exposing it as an option in our API, regardless of what is done with respect to the other 2.
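As a sketch of what exposing it could look like (the option name FDB_TR_OPTION_SIZE_LIMIT is my assumption for illustration, not something confirmed here), a client might cap a single transaction like this:

    /* Assumes: #define FDB_API_VERSION 620 and #include <foundationdb/fdb_c.h>.
     * FDB_TR_OPTION_SIZE_LIMIT is assumed for illustration. Int options are
     * passed as an 8-byte little-endian value. */
    int64_t limit = 1000000; /* cap this transaction at ~1 MB */
    fdb_error_t err = fdb_transaction_set_option(
        tr, FDB_TR_OPTION_SIZE_LIMIT, (const uint8_t*)&limit, sizeof(limit));
    /* Exceeding the cap would surface as a client-side error at commit time,
     * before the oversized transaction ever reaches the cluster. */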

For item 1, is there any potential value in a naive function that gives an upper bound for your transaction size without considering the potential coalescing that can take place between conflict ranges or mutations?

I think so. The use cases I had in mind would all use this as a rough, thumb-in-the-wind “let me know if I’m doing too much” check, so it wouldn’t need to be exact, and overestimating is probably better than underestimating (to a point).
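For illustration, a naive upper-bound counter (all names here are hypothetical) could just sum the bytes of keys and values as mutations are issued, never accounting for coalescing:

    /* Hypothetical sketch of a naive upper-bound size counter. It can only
     * overestimate, since it never merges overlapping ranges or mutations. */
    typedef struct { int64_t bytes; } tx_size_estimate;

    void note_set(tx_size_estimate* e, int key_len, int value_len) {
        e->bytes += key_len + value_len; /* ignores per-mutation overhead */
    }

    void note_clear_range(tx_size_estimate* e, int begin_len, int end_len) {
        e->bytes += begin_len + end_len; /* counts only the range endpoints */
    }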

A dummy question:
Is the transaction handle destroyed immediately after commit?
If yes, how do we get the transaction info after commit?
If no, when is a transaction handle destroyed?

AFAIK, the transaction isn’t destroyed after commit (automatically, at least). There’s an explicit destroy method on transactions that the bindings call when they are done with it.

Note that the API proposed in the quoted section is essentially how we already expose the committed version to the user (though not the commit versionstamp).
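For comparison, this is roughly what that existing pattern looks like in libfdb_c: the committed version is read back synchronously from the transaction once the commit future succeeds (a minimal sketch, error handling elided).

    /* Assumes: #define FDB_API_VERSION 620 and #include <foundationdb/fdb_c.h> */
    FDBFuture* cf = fdb_transaction_commit(tr);
    fdb_error_t err = fdb_future_block_until_ready(cf);
    if (!err) err = fdb_future_get_error(cf);
    fdb_future_destroy(cf);

    int64_t committed_version = -1;
    if (!err)
        err = fdb_transaction_get_committed_version(tr, &committed_version);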

The new 620 API adds fdb_transaction_get_approximate_size, which returns a future:

    // This function intentionally returns an FDBFuture instead of an integer directly,
    // so that calling this API can see the effect of previous mutations on the transaction.
    // Specifically, mutations are applied asynchronously by the main thread. In order to
    // see them, this call has to be serviced by the main thread too.
    DLLEXPORT WARN_UNUSED_RESULT FDBFuture*
    fdb_transaction_get_approximate_size(FDBTransaction* tr);
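A minimal blocking sketch of calling it from C (a real program would typically use fdb_future_set_callback rather than blocking the calling thread):

    /* Assumes: #define FDB_API_VERSION 620 and #include <foundationdb/fdb_c.h> */
    FDBFuture* f = fdb_transaction_get_approximate_size(tr);
    fdb_error_t err = fdb_future_block_until_ready(f);

    int64_t approx_size = 0;
    if (!err) err = fdb_future_get_int64(f, &approx_size);
    fdb_future_destroy(f);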

I already had something similar to option 1: a property on transaction objects that tries to keep a running count of the mutation size so far. This property is used in a lot of while(…) loops that fill a transaction up to some threshold before committing and starting a new one (bulk inserts, things like that).

The new method is async, which makes it a bit awkward to drop into older code that was using a property before, but the comments seem to indicate that the future will only touch the client’s thread, and not the network, so the latency may be small enough to ignore…?

Will the future always complete immediately, like some existing futures such as fdb_database_create_transaction, or is it possible to see significant latency in obtaining the value? Does this depend on whether there are concurrent writes on the same transaction handle?

I was wondering about exposing the result as a ValueTask&lt;long&gt; in .NET, a task-like type optimized for results that are almost always already completed (like cached values, or reading pages of a result set already received from the network). If the value is not immediately available, it would be better to expose it as a Task&lt;long&gt; (optimized for results that are not yet available), though that comes with significant memory and CPU overhead…

Should I maybe keep an even more approximate property that is not async (for algorithms that want to fill up a transaction) and only use fdb_transaction_get_approximate_size when I want a more precise value?

The function needs to serialize onto the network thread behind other previously run operations in order to count their effects, but it won’t communicate at all with the cluster. It should return fairly quickly unless you are keeping the network thread very busy, in which case you’ll be experiencing extra latency on all future-returning operations.
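One way to keep the old property-style ergonomics while limiting trips through the network thread would be to refresh the estimate only every N mutations; a hedged sketch (CHECK_EVERY and after_mutation are illustrative):

    /* Sketch: refresh the size estimate only every CHECK_EVERY mutations so
     * most operations never have to serialize onto the network thread. */
    enum { CHECK_EVERY = 100 };
    static int ops_since_check = 0;
    static int64_t last_known_size = 0;

    void after_mutation(FDBTransaction* tr) {
        if (++ops_since_check < CHECK_EVERY) return;
        ops_since_check = 0;

        FDBFuture* f = fdb_transaction_get_approximate_size(tr);
        if (!fdb_future_block_until_ready(f))
            fdb_future_get_int64(f, &last_known_size);
        fdb_future_destroy(f);
    }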