ClusterController.actor.cpp#statusServer is somewhat inefficient

panghy · February 4, 2019, 3:26pm

It seems like the CC spends quite a bit of time serializing the same reply to each requester: it does JSON serialization on every call to requests_batch.back().reply.send(result.get()); instead of serializing once and sending the same JSON response to every requester. With STATUS_MIN_TIME_BETWEEN_REQUESTS set this could be worse, since more requests can be collected and even more time might be spent on serialization. A new change by Trevor adds a limit on the number of requests that we would collect, which should help, but I think the fundamental issue is that the JSON serialization is just too painfully slow (especially for large clusters).
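To illustrate the "serialize once, send to everyone" idea, here is a minimal standalone sketch. It is not the actual ClusterController code: `PendingRequest`, `replyToBatch`, and the `serialize` callback are hypothetical names made up for the example, and a real fix would have to go through Flow's reply and serialization machinery.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued status request (not a FoundationDB type).
struct PendingRequest {
    // Receives a shared, immutable view of the already-serialized reply.
    std::function<void(std::shared_ptr<const std::string>)> reply;
};

// Serializes the status document once and hands the same buffer to every
// requester in the batch. Returns how many times serialization ran.
std::size_t replyToBatch(const std::vector<PendingRequest>& batch,
                         const std::function<std::string()>& serialize) {
    if (batch.empty()) {
        return 0;  // nothing to send, so skip the expensive serialization
    }
    // The expensive step happens exactly once per batch.
    auto payload = std::make_shared<const std::string>(serialize());
    for (const auto& req : batch) {
        req.reply(payload);  // copies a pointer, not the JSON text
    }
    return 1;
}
```

With this shape the serialization cost stays constant as the batch grows; only the per-requester send remains.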

panghy · February 4, 2019, 3:59pm

Something like https://github.com/panghy/foundationdb/commit/0a1d6133190ab83d260cb9268a9318cabe1ec20b seems to help (yield during the loop that serializes).
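The general shape of yielding inside the send loop looks roughly like the sketch below. This is a plain C++ illustration, not the code from the commit: `sendAllWithYield`, `yieldEvery`, and `yieldFn` are hypothetical, and in an actual Flow actor the yield point would be a `wait(yield())` rather than a callback.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Runs every send in order and calls yieldFn after each group of yieldEvery
// sends, so other work gets a chance to run between groups.
// Returns the number of times it yielded.
std::size_t sendAllWithYield(const std::vector<std::function<void()>>& sends,
                             std::size_t yieldEvery,
                             const std::function<void()>& yieldFn) {
    std::size_t yields = 0;
    for (std::size_t i = 0; i < sends.size(); ++i) {
        sends[i]();  // the potentially slow serialize-and-send step
        const bool groupDone = yieldEvery > 0 && (i + 1) % yieldEvery == 0;
        const bool moreToDo = i + 1 < sends.size();
        if (groupDone && moreToDo) {
            yieldFn();  // give up control before starting the next group
            ++yields;
        }
    }
    return yields;
}
```

Yielding does not reduce the total work; it only keeps a single long batch from showing up as one slow task.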

Evan · February 4, 2019, 8:00pm

In 6.0 we made status serialization much better. https://github.com/apple/foundationdb/pull/758/files

It is possible that the yield might still be necessary if a lot of clients are asking for status, but I believe Trevor’s change should handle that.

SteavedHams · February 4, 2019, 8:32pm

To summarize the recent status rewrite: The previous implementation involved a ton of unnecessary std::string copying and building up a fairly deep map of maps, which then had to be serialized with a traversal of all of the nested maps. In the new version, the functions generating status entries write to a JSON object representation that accumulates writes efficiently into an array of buffers and allows insertion of one document into another without copying data.

The result is that status documents are much faster to generate and much faster to serialize for sending to the client. Serialization involves iterating over a single array containing buffers of various sizes.
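A toy version of the "array of buffers" idea, to make the description concrete. This is only a sketch under simplifying assumptions, not the actual FoundationDB JSON builder: `ChunkedJson` is a made-up class, values are assumed to need no escaping, and only string values and nested documents are supported.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical JSON object builder that accumulates writes as a flat list of
// string chunks instead of a nested map of maps.
class ChunkedJson {
public:
    // Appends "key":"value". Assumes key and value need no JSON escaping.
    void add(const std::string& key, const std::string& value) {
        chunks_.push_back(keyPrefix(key) + "\"" + value + "\"");
    }

    // Inserts a whole sub-document by moving its chunks into this one;
    // the child's character data is not copied.
    void add(const std::string& key, ChunkedJson&& child) {
        chunks_.push_back(keyPrefix(key) + "{");
        chunks_.insert(chunks_.end(),
                       std::make_move_iterator(child.chunks_.begin()),
                       std::make_move_iterator(child.chunks_.end()));
        child.chunks_.clear();
        child.empty_ = true;
        chunks_.push_back("}");
    }

    std::size_t chunkCount() const { return chunks_.size(); }

    // Serialization is one pass over the flat chunk array.
    std::string str() const {
        std::size_t total = 2;  // opening and closing braces
        for (const auto& c : chunks_) total += c.size();
        std::string out;
        out.reserve(total);
        out += '{';
        for (const auto& c : chunks_) out += c;
        out += '}';
        return out;
    }

private:
    // Builds the "key": prefix, adding a leading comma after the first entry.
    std::string keyPrefix(const std::string& key) {
        std::string p = empty_ ? "\"" : ",\"";
        empty_ = false;
        return p + key + "\":";
    }

    std::vector<std::string> chunks_;
    bool empty_ = true;
};
```

Nesting a document costs a handful of pointer moves regardless of how large the child is, and no tree traversal is needed at serialization time.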

panghy · February 4, 2019, 8:33pm

Cool, we might need to pull that into our own internal builds then. Thanks for the info. Will see if slow tasks still come up with the fix.
