How to do a status check using the Java API

marctrem · June 29, 2018, 1:27am

Dear FoundationDB community,

I am wondering how to do a status check using the Java API. I looked and grepped through the Javadoc and couldn’t find anything.

My issue is this: with a cluster file that points to a machine with no FDB server running, a call to fdb.openDatabase() succeeds, but other operations hang. More specifically, the process hangs on the call to join here [1].

Here is a simple demo which hangs forever [2].

So,

Should I be able to do a status check with the Java API?
Are there best practices to avoid the mentioned issue?

Thank you very much,
Marc

[1] https://github.com/apple/foundationdb/blob/master/bindings/java/src/main/com/apple/foundationdb/FDBDatabase.java#L53
[2] https://gist.github.com/marctrem/2cea4118914b28b26fc6fe29b503622a

panghy · June 29, 2018, 4:21am

We typically just fetch a read version (tx.getReadVersion().get()) to confirm that the cluster is alive (it also tells you if it's busy).

markus.pilman · June 29, 2018, 2:38pm

You can simply query the following key: \xff\xff/status/json

This is a special key that returns the same JSON string as fdbcli --exec "status json" on the command line.

A small caveat: this does not execute a normal get operation but makes an internal API call, and that call isn’t very cheap. If many clients execute it very often, your ClusterController will see significantly higher CPU usage. I would therefore recommend also setting the STATUS_MIN_TIME_BETWEEN_REQUESTS knob on all your fdbserver processes to something >0 (0 is the default). This will increase the response time of your call by that value, but it will reduce CPU usage because fdb will then batch requests together.
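
One practical detail when using this key from Java: the \xff\xff prefix is two raw 0xFF bytes, which cannot be produced by encoding a Java String as UTF-8. A minimal sketch of building the key as a byte array follows; the class and method names are hypothetical and only the JDK is used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the FoundationDB binding.
class StatusJsonKey {
    /** Builds the raw key \xff\xff/status/json: two 0xFF bytes, then ASCII text. */
    static byte[] build() {
        byte[] suffix = "/status/json".getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[2 + suffix.length];
        key[0] = (byte) 0xFF; // first \xff
        key[1] = (byte) 0xFF; // second \xff
        System.arraycopy(suffix, 0, key, 2, suffix.length);
        return key;
    }
}
```

Assuming the binding's usual read path, the resulting bytes would be passed to a transaction read, for example db.run(tr -> tr.get(StatusJsonKey.build()).join()), and the returned bytes decoded as a JSON string.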

marctrem · June 29, 2018, 9:32pm

@panghy: Still hangs when there is no server to connect to.

@markus.pilman: Works! Thank you for mentioning the caveat.

On another note, is it just me, or would it be nice for the future to complete exceptionally when the database is not available? Does that make sense? For now I guess I’m just going to go with an orTimeout(timeout, unit) on the returned CompletableFuture and assume that if the future didn’t complete before the deadline, there must have been a problem with the DB.

Thank you very much!
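
The orTimeout approach can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, a plain CompletableFuture stands in for the future returned by the FDB binding, and orTimeout requires Java 9 or later.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the FoundationDB binding.
class TimeoutHealthCheck {
    /** Returns true only if the future completes normally before the deadline. */
    static boolean completesWithin(CompletableFuture<?> future, long timeout, TimeUnit unit) {
        try {
            // orTimeout fails the future with a TimeoutException once the deadline passes.
            future.orTimeout(timeout, unit).join();
            return true;
        } catch (CompletionException | CancellationException e) {
            // Timed out, failed, or was cancelled: treat the database as unavailable.
            return false;
        }
    }
}
```

With a real database, the future passed in would be the one returned by the operation that otherwise hangs, such as the read-version or status key read discussed above.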

panghy · June 29, 2018, 9:46pm

Yeah, we typically set timeouts on getting the read version.

alloc · June 30, 2018, 7:28pm

I could see the argument here. The issue is that it’s not really possible to tell whether the database is unavailable because the server is down or because the network is just taking a long time to respond. So the best that we could do is time out and throw an error like “unable to connect to the database in a certain amount of time” (which is what the status json call does).

That being said, you generally should be able to specify timeouts for transactions using the “setTimeout” transaction option. I could be wrong, but I think it’s possible that the “getReadVersion” command doesn’t honor that timeout if specified; if that’s true, then that’s a bug. By default, our timeout is “unlimited”, which I suppose one could disagree with. One issue with using our timeouts, though, is that the error that gets returned is “transaction_cancelled” (rather than something like “transaction_timed_out”), which can also happen if you cancel the transaction yourself. So, your mileage may vary.
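
If the ambiguous error code is a concern, one option is to enforce the deadline on the client side, where the JDK reports a timeout and a deliberate cancel as different exception types. A rough sketch under that assumption follows; the names are hypothetical, a plain CompletableFuture stands in for the FDB future, and orTimeout requires Java 9 or later.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the FoundationDB binding.
class OutcomeClassifier {
    enum Outcome { OK, TIMED_OUT, CANCELLED, FAILED }

    /** Waits up to the deadline and reports how the future ended. */
    static Outcome await(CompletableFuture<?> future, long timeout, TimeUnit unit) {
        try {
            future.orTimeout(timeout, unit).join();
            return Outcome.OK;
        } catch (CancellationException e) {
            // The caller (or someone else) cancelled the future explicitly.
            return Outcome.CANCELLED;
        } catch (CompletionException e) {
            // orTimeout completes the future with a TimeoutException as the cause.
            return (e.getCause() instanceof TimeoutException) ? Outcome.TIMED_OUT : Outcome.FAILED;
        }
    }
}
```

Errors raised by the database itself would land in the FAILED branch here, so the original cause would still need to be inspected for anything beyond a liveness check.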
