How to do a status check using the Java API

Dear FoundationDB community,

I am wondering how to do a status check using the Java API. I looked and grepped through the Javadoc and couldn’t find anything.

My issue is this: with a cluster file that points to a machine with no FDB server running, a call to fdb.openDatabase() succeeds, but other operations hang. More specifically, the process hangs on the call to join() here [1].

Here is a simple demo which hangs forever [2].

So,

  1. Should I be able to do a status check with the Java API?
  2. Are there best practices to avoid the mentioned issue?

Thank you very much,
Marc

[1] https://github.com/apple/foundationdb/blob/master/bindings/java/src/main/com/apple/foundationdb/FDBDatabase.java#L53
[2] https://gist.github.com/marctrem/2cea4118914b28b26fc6fe29b503622a

We typically just fetch a read version (tx.getReadVersion().get()) to confirm that the cluster is alive (it also tells you if it’s busy).
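A minimal sketch of that check, assuming Java 8+. To keep it compilable without the FDB jar, the read-version request is passed in as a `Supplier<CompletableFuture<Long>>`; with the real bindings you would pass something like `tr::getReadVersion`. The `ReadVersionProbe.latencyMillis` name is hypothetical. It waits with a bounded `get(timeout, unit)`, so an unreachable cluster shows up as an empty result instead of a hang, and the measured latency is a rough signal of how busy the cluster is.

```java
import java.time.Duration;
import java.util.OptionalLong;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class ReadVersionProbe {
    private ReadVersionProbe() {}

    /**
     * Requests a read version and returns how long it took, in milliseconds.
     * Returns OptionalLong.empty() if the request failed or did not finish within `timeout`.
     * `getReadVersion` stands in for () -> tr.getReadVersion() on an FDB Transaction.
     */
    static OptionalLong latencyMillis(Supplier<CompletableFuture<Long>> getReadVersion, Duration timeout) {
        long start = System.nanoTime();
        CompletableFuture<Long> future = getReadVersion.get();
        try {
            // Block, but never forever.
            future.get(timeout.toNanos(), TimeUnit.NANOSECONDS);
            return OptionalLong.of(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        } catch (TimeoutException | ExecutionException e) {
            // Abandon the pending request (a no-op if the future already failed).
            future.cancel(true);
            return OptionalLong.empty();
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller.
            Thread.currentThread().interrupt();
            return OptionalLong.empty();
        }
    }
}
```

With the real bindings the call might look like `ReadVersionProbe.latencyMillis(tr::getReadVersion, Duration.ofSeconds(5))` inside a `try (Transaction tr = db.createTransaction()) { ... }` block, so the transaction is closed whatever the outcome.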

You can simply query the following key: \xff\xff/status/json

This is a special key that returns the same JSON string as fdbcli --exec "status json" on the command line.

A small caveat: reading this key does not execute a normal get operation; it triggers an internal API call, and that call isn’t cheap. If many clients execute it very often, your ClusterController’s CPU usage will be significantly higher. I would therefore recommend also setting the STATUS_MIN_TIME_BETWEEN_REQUESTS knob for all your fdbserver processes to something > 0 (0 is the default). This will increase the response time of your call by that value, but it will reduce CPU usage because FDB then batches requests together.
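A sketch of building that key in Java, assuming the standard bindings’ `tr.get(byte[])`, which returns a `CompletableFuture<byte[]>`. Java string literals have no `\x` escape, and `"\u00ff".getBytes(StandardCharsets.UTF_8)` yields two bytes (0xC3 0xBF) rather than 0xFF, so the sketch writes the prefix bytes explicitly. `StatusJsonKey` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

final class StatusJsonKey {
    private StatusJsonKey() {}

    /** The key \xff\xff/status/json as raw bytes: two 0xFF bytes followed by ASCII "/status/json". */
    static byte[] key() {
        byte[] suffix = "/status/json".getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[2 + suffix.length];
        key[0] = (byte) 0xFF;
        key[1] = (byte) 0xFF;
        System.arraycopy(suffix, 0, key, 2, suffix.length);
        return key;
    }

    /** Decodes the value read from that key as UTF-8 JSON text; null means no value was returned. */
    static String decode(byte[] value) {
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }
}
```

A call might look like `StatusJsonKey.decode(tr.get(StatusJsonKey.key()).get(5, TimeUnit.SECONDS))`. Bounding the wait still matters, since this read can also hang when no server is reachable, and the CPU cost described above argues for polling it sparingly.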


Still hangs when there is no server to connect to. :confused:

Works! Thank you for mentioning the caveat.

On another note, is it just me, or would it be nice in the future for this to return exceptionally when the database is not available? Do I at least make sense? For now, I guess I’m just going to use orTimeout(timeout, unit) on the returned CompletableFuture and assume that if the future didn’t complete before the deadline, there must have been a problem with the DB.

Thank you very much!
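The `orTimeout` approach can be sketched without blocking a thread, assuming Java 9+ (where `CompletableFuture.orTimeout` was added). `AsyncHealthCheck` and its `Health` enum are hypothetical names. Note that `orTimeout` completes the future it is called on, so after a timeout the original read-version future is failed too; with the real bindings you would still close the transaction afterwards.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AsyncHealthCheck {
    enum Health { AVAILABLE, TIMED_OUT, FAILED }

    private AsyncHealthCheck() {}

    /** Maps a pending read-version future to a Health value without blocking the caller. */
    static CompletableFuture<Health> check(CompletableFuture<Long> readVersion, long timeout, TimeUnit unit) {
        return readVersion
                // Fails `readVersion` itself with a TimeoutException if it is still pending at the deadline.
                .orTimeout(timeout, unit)
                .handle((version, error) -> {
                    if (error == null) {
                        return Health.AVAILABLE;
                    }
                    // Unwrap in case the failure arrived wrapped in a CompletionException.
                    Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                            ? error.getCause()
                            : error;
                    return (cause instanceof TimeoutException) ? Health.TIMED_OUT : Health.FAILED;
                });
    }
}
```

`TIMED_OUT` only means the deadline passed; it cannot tell a down server from a slow network.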

Yeah, we typically set timeouts on getting the read version.


I can see the argument here. The issue is that it’s not really possible to tell whether the database is unavailable because the server is down or because the network is just taking a long time to respond. So the best we could do is time out and throw an error like “unable to connect to the database in a certain amount of time” (which is what the status json call does).

That being said, you generally should be able to specify timeouts for transactions using the “setTimeout” transaction option. I could be wrong, but I think it’s possible that the “getReadVersion” command doesn’t honor that timeout if one is specified; if that’s true, it’s a bug. By default, though, our timeout is “unlimited”, which I suppose one could disagree with. One issue with using our timeouts is that the error that gets returned is “transaction_cancelled” (rather than something like “transaction_timed_out”), and that error can also happen if you cancel the transaction yourself. So, your mileage may vary.
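One way to live with that ambiguity is to remember whether your own code called `cancel()` on the transaction. The sketch below is a hypothetical helper (`CancellationTracker`), assuming the timeout is set with `tr.options().setTimeout(milliseconds)` and that the error code comes from `FDBException.getCode()`. It uses FoundationDB’s codes 1025 (`transaction_cancelled`) and 1031 (`transaction_timed_out`) and treats a 1025 you did not cause as a likely timeout.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-transaction helper: attributes a cancellation error to our own cancel() or to a timeout. */
final class CancellationTracker {
    // FoundationDB error codes.
    static final int TRANSACTION_CANCELLED = 1025;
    static final int TRANSACTION_TIMED_OUT = 1031;

    enum Cause { CANCELLED_BY_US, TIMED_OUT, OTHER }

    private final AtomicBoolean cancelledByUs = new AtomicBoolean(false);

    /** Call this right before calling tr.cancel() yourself. */
    void markCancelledByUs() {
        cancelledByUs.set(true);
    }

    /** Classifies an error code, e.g. the value of FDBException.getCode(). */
    Cause classify(int errorCode) {
        if (errorCode == TRANSACTION_TIMED_OUT) {
            return Cause.TIMED_OUT;
        }
        if (errorCode == TRANSACTION_CANCELLED) {
            // If we never called cancel(), the timeout option is the likely culprit.
            return cancelledByUs.get() ? Cause.CANCELLED_BY_US : Cause.TIMED_OUT;
        }
        return Cause.OTHER;
    }
}
```

Because the flag is per transaction, create one tracker alongside each transaction rather than sharing it.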
