What does the FetchKeysTooLong trace event denote?

I am seeing a few trace events like the one below in a deployment:

<Event Severity="30" Time="1585765420.463122" Type="FetchKeysTooLong" ID="0000000000000000" Duration="32400" Phase="0" Begin="\x151\x15\x09\x15&amp;3\x00\x00\x07l6?\xf5}\x00\x00\x00\x00\x15\x14" End="\x151\x15\x09\x15&amp;3\x00\x00\x07l7\xb8\xd1\xb7\x00\x00\x00\x00\x15S" Machine="a.b.c.d:xxxx" LogGroup="default" Roles="SS,TL" />

Can someone let me know what this event denotes?

I have read Understanding Log Files. Is there any other documentation I can refer to for more details about the different types of events?

FoundationDB version: 6.2.11.

FetchKeys is the process of one storage server fetching a key range from another storage server as part of a data distribution action. FetchKeysTooLong is letting you know that reading the key range from some other server took longer than expected.

There are a variety of reasons this could happen. Your storage servers could have been saturated with client reads for some time. Storage servers also try to fetch as of their current version, so if this particular storage server was behind for some reason (which seems suspicious, because it’s both a transaction log and a storage server), then that could be why fetch keys took a long time.

Alone, I wouldn’t worry about seeing this once. It’s more useful in context when trying to dig into higher level issues. For instance, it would help explain why a shard movement took a long time, why restoring fault tolerance in a cluster took a while, etc.
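To see these events in context rather than one at a time, something like the following could help. It is only a minimal sketch, assuming the XML trace format with one `<Event ... />` element per line, as in the event quoted above. The function names are made up for illustration, and the sample line omits the Begin and End attributes for brevity. It makes no assumption about the unit of Duration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_events(lines):
    """Yield the attributes of each <Event .../> line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue  # e.g. the enclosing <Trace> tags or blank lines
        try:
            yield dict(ET.fromstring(line).attrib)
        except ET.ParseError:
            continue  # e.g. a partially written final line


def fetch_keys_too_long_by_machine(lines):
    """Group the Duration values of FetchKeysTooLong events by Machine."""
    by_machine = defaultdict(list)
    for event in parse_events(lines):
        if event.get("Type") == "FetchKeysTooLong":
            by_machine[event.get("Machine", "?")].append(float(event["Duration"]))
    return dict(by_machine)


# Sample line based on the event in the question (Begin/End omitted).
sample = [
    '<Event Severity="30" Time="1585765420.463122" Type="FetchKeysTooLong" '
    'ID="0000000000000000" Duration="32400" Phase="0" '
    'Machine="a.b.c.d:xxxx" LogGroup="default" Roles="SS,TL" />',
]
print(fetch_keys_too_long_by_machine(sample))
```

In practice you would pass an open trace file instead of `sample`. Many events on a single machine would point at that server, while a lone event is probably not worth chasing.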

Understanding Log Files is the (very) work-in-progress page that was being used to collect write-ups of various trace events. That is the current best (and not very good) documentation that we have on this subject.

Thanks @alexmiller for replying. Can you let me know what Phase denotes in the FetchKeysTooLong event?

Can you also let me know how to read the SlowTask events? For example, in an event like the one below, how can I tell what the slow task relates to? I see a TaskID but I'm not sure how to make sense of it.

<Event Severity="30" Time="1585810867.802134" Type="SlowTask" ID="0000000000000000" TaskID="10000" MClocks="576.392" Duration="0.261998" SampleRate="1" NumYields="0" Machine="a.b.c.d:xxxx" LogGroup="default" Roles="CD,SS" />

Does anyone have any pointers on my questions above about Phase in the FetchKeysTooLong event and how to read the SlowTask events?

(Sorry, a lot of us are focusing on trying to finish off work for FDB 6.3, so response time might be a bit worse.)

Phase is defined as enum Phase { WaitPrevious, Fetching, Waiting }; so Phase="0" in your event corresponds to WaitPrevious, the first enumerator.
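If you are post-processing trace files, the numeric Phase value can be turned back into a name. This is a small sketch that assumes the enumerators are numbered from 0 in declaration order (the C++ default when no explicit values are given) and that the trace event logs the enum as a plain integer, as the Phase="0" in the event above suggests. The `phase_name` helper is hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Mirrors `enum Phase { WaitPrevious, Fetching, Waiting };`
    # Values are assumed to follow the default C++ numbering from 0.
    WaitPrevious = 0
    Fetching = 1
    Waiting = 2


def phase_name(raw):
    """Translate the Phase attribute of a trace event into its enumerator name."""
    try:
        return Phase(int(raw)).name
    except ValueError:
        # Non-numeric text or a value outside the known enumerators.
        return f"Unknown({raw})"


print(phase_name("0"))
```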

TaskID maps to TaskPriority, which can give a rough idea of which task is taking a while. I’d suggest instead looking for Net2SlowTaskTrace events, which will give you a backtrace.
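Before going through backtraces, it may help to see which TaskID values account for most of the slow time. The sketch below is an illustration only: it assumes one `<Event ... />` element per line and uses just the Type, TaskID and Duration attributes visible in the SlowTask event above. The `slow_task_summary` function is a made-up helper, and it deliberately does not translate TaskID into a TaskPriority name, since that mapping should be read from the source of the version you run.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def slow_task_summary(lines):
    """Return {TaskID: (count, total_duration, max_duration)} for SlowTask events."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in lines:
        line = line.strip()
        if not line.startswith("<Event"):
            continue  # skip the enclosing <Trace> tags and blank lines
        try:
            event = ET.fromstring(line).attrib
        except ET.ParseError:
            continue  # skip truncated or malformed lines
        if event.get("Type") != "SlowTask":
            continue
        duration = float(event["Duration"])
        entry = stats[event.get("TaskID", "?")]
        entry[0] += 1                         # number of slow tasks seen
        entry[1] += duration                  # total time spent in them
        entry[2] = max(entry[2], duration)    # worst single occurrence
    return {task_id: tuple(values) for task_id, values in stats.items()}


# Sample line based on the SlowTask event in the question.
sample = [
    '<Event Severity="30" Time="1585810867.802134" Type="SlowTask" '
    'ID="0000000000000000" TaskID="10000" MClocks="576.392" '
    'Duration="0.261998" SampleRate="1" NumYields="0" '
    'Machine="a.b.c.d:xxxx" LogGroup="default" Roles="CD,SS" />',
]
print(slow_task_summary(sample))
```

The TaskID values that dominate the totals are the ones worth matching against Net2SlowTaskTrace events from the same process and time window.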
