FoundationDB

ClusterController.actor.cpp#statusServer is somewhat inefficient


(Clement Pang) #1

It seems like the CC spends quite a bit of time serializing the same reply to each requester: the JSON serialization runs on every call to requests_batch.back().reply.send(result.get()); instead of running once and sending the same serialized response back to every requester. With STATUS_MIN_TIME_BETWEEN_REQUESTS set this could be worse, since more requests can be collected and even more time might be spent on serialization. A new change by Trevor adds a limit on the number of requests that we collect, which should help, but I think the fundamental issue here is that the JSON serialization is just painfully slow (especially for large clusters).
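To make the "serialize once" idea concrete, here is a minimal sketch in plain C++ rather than Flow. PendingRequest, StatusDoc, CountingSerializer and replyToBatch are hypothetical names for illustration, not FoundationDB types, and the toy serializer does no string escaping.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batched status request (not a FoundationDB type).
struct PendingRequest {
    std::function<void(const std::string&)> reply;
};

// Hypothetical flat status document: a list of key/value string pairs.
using StatusDoc = std::vector<std::pair<std::string, std::string>>;

// Toy serializer that counts how often it runs (no escaping, for brevity).
struct CountingSerializer {
    std::size_t calls = 0;
    std::string operator()(const StatusDoc& doc) {
        ++calls;
        std::string out = "{";
        for (std::size_t i = 0; i < doc.size(); ++i) {
            if (i != 0) out += ",";
            out += "\"" + doc[i].first + "\":\"" + doc[i].second + "\"";
        }
        out += "}";
        return out;
    }
};

// Serialize the document once, then hand the same bytes to every requester.
template <class Serializer>
void replyToBatch(std::vector<PendingRequest>& batch, const StatusDoc& doc,
                  Serializer& serialize) {
    if (batch.empty()) return;
    const std::string payload = serialize(doc);  // the only serialization
    for (auto& req : batch) req.reply(payload);
    batch.clear();
}
```

With this shape the serialization cost no longer grows with the number of batched requests; only the cost of sending does.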


(Clement Pang) #2

Something like https://github.com/panghy/foundationdb/commit/0a1d6133190ab83d260cb9268a9318cabe1ec20b seems to help: it yields during the loop that serializes.
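As a rough illustration of yielding inside the reply loop (this is not the code in the linked commit), the sketch below calls a caller-supplied yield hook after every few replies. sendWithYields, yieldFn and every are hypothetical names; in Flow the hook would presumably be a wait on yield() inside the actor, which plain C++ cannot express.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: sends one reply per entry and invokes yieldFn after
// every `every` replies so that other work can run in between.
// Returns how many times it yielded.
std::size_t sendWithYields(
    const std::vector<std::function<void(const std::string&)>>& replies,
    const std::string& payload,
    const std::function<void()>& yieldFn,
    std::size_t every) {
    std::size_t yields = 0;
    for (std::size_t i = 0; i < replies.size(); ++i) {
        replies[i](payload);
        // Skip the yield after the final reply; nothing is left to interleave.
        if (every != 0 && (i + 1) % every == 0 && i + 1 < replies.size()) {
            yieldFn();
            ++yields;
        }
    }
    return yields;
}
```

Yielding does not reduce the total work; it only bounds how long the loop runs without giving other tasks a turn.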


(Evan Tschannen) #3

In 6.0 we made status serialization much better. https://github.com/apple/foundationdb/pull/758/files

It is possible that the yield might still be necessary if a lot of clients are asking for status, but I believe Trevor’s change should handle that.


(Steve Atherton) #4

To summarize the recent status rewrite: The previous implementation involved a ton of unnecessary std::string copying and building up a fairly deep map of maps, which then had to be serialized with a traversal of all of the nested maps. In the new version, the functions generating status entries write to a JSON object representation that accumulates writes efficiently into an array of buffers and allows insertion of one document into another without copying data.

The result is that status documents are much faster to generate and much faster to serialize for sending to the client. Serialization involves iterating over a single array containing buffers of various sizes.
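For readers who want a picture of the buffer-array approach, here is a simplified sketch. ChunkedJson is a hypothetical class, not FoundationDB's actual JSON builder: it only handles string members and nested documents, does no escaping, and uses std::string chunks where the real implementation presumably manages its own buffers.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder that accumulates JSON text as a flat array of chunks.
class ChunkedJson {
public:
    // Append a "key":"value" member (no escaping, for brevity).
    void addString(const std::string& key, const std::string& value) {
        startMember(key);
        chunks_.push_back("\"" + value + "\"");
    }

    // Insert a child document under `key` by moving its chunks into this
    // array; the child is not traversed or re-serialized.
    void addDocument(const std::string& key, ChunkedJson&& child) {
        startMember(key);
        chunks_.push_back("{");
        for (auto& c : child.chunks_) chunks_.push_back(std::move(c));
        chunks_.push_back("}");
        child.chunks_.clear();
        child.members_ = 0;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

    // Serialization is a single pass over the chunk array.
    std::string serialize() const {
        std::size_t total = 2;  // outer braces
        for (const auto& c : chunks_) total += c.size();
        std::string out;
        out.reserve(total);
        out += "{";
        for (const auto& c : chunks_) out += c;
        out += "}";
        return out;
    }

private:
    // Writes the separating comma (if needed) and the quoted key.
    void startMember(const std::string& key) {
        chunks_.push_back((members_++ ? ",\"" : "\"") + key + "\":");
    }

    std::vector<std::string> chunks_;
    std::size_t members_ = 0;
};
```

The point of the design is that nesting costs nothing at serialization time: however deep the document was when it was built, the output step is one linear walk over a flat array.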


(Clement Pang) #5

Cool, we might need to pull that into our own internal builds then. Thanks for the info. Will see if slow tasks still come up with the fix.