We plan to release OpenTelemetry tracing in 7.1, and it will replace the OpenTracing implementation. For the user, this change should be seamless and the existing knobs will remain the same. FWIW, we are not aware of anyone using the current OpenTracing implementation at the moment. As a follow-up, we’ve also built an OpenTelemetry receiver specifically for processing these distributed traces from FDB; see the comparison open-telemetry:main...sfc-gh-rjenkins:foundationdb-trace-receiver in open-telemetry/opentelemetry-collector-contrib on GitHub. We plan to open a PR for this new tracing receiver and get it merged upstream into the OpenTelemetry project so all FoundationDB users can use it with the OpenTelemetry Collector. If you have any questions, please let me know!
It’s good to know someone outside of Snowflake is looking into tracing. Hopefully porting your PoC to the new OpenTelemetry stuff will be relatively simple.
The reasons not to use the gRPC stuff are pretty much the ones you outlined. Back when we implemented the OpenTracing stuff, we decided to use msgpack and UDP for two main reasons:
It works well with fluentd, and the plan was to have fluentd do some of the heavy lifting.
We wanted to reduce the amount of CPU spent in FDB on tracing to a minimum. Storage servers in particular have the problem that they don’t scale up (we can only scale out), so CPU is a very precious resource.
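To make the msgpack-over-UDP trade-off concrete, here is a minimal Python sketch of what the sending side could look like. It is only an illustration: the span field layout, the function names, and the host/port arguments are all hypothetical and are not FDB's actual wire format or code. The encoder is hand-rolled and covers only a small subset of msgpack (non-negative integers, floats, short strings, short lists); a real setup would use a full msgpack library.

```python
import socket
import struct


def pack(value):
    """Encode a small subset of msgpack: non-negative ints, floats, str, list."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("negative ints are not handled in this sketch")
        if value < 0x80:
            return struct.pack("B", value)            # positive fixint
        if value < 1 << 8:
            return b"\xcc" + struct.pack(">B", value)  # uint 8
        if value < 1 << 16:
            return b"\xcd" + struct.pack(">H", value)  # uint 16
        if value < 1 << 32:
            return b"\xce" + struct.pack(">I", value)  # uint 32
        if value < 1 << 64:
            return b"\xcf" + struct.pack(">Q", value)  # uint 64
        raise ValueError("int too large for msgpack")
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)      # float 64
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) < 32:
            return bytes([0xA0 | len(raw)]) + raw      # fixstr
        if len(raw) < 256:
            return b"\xd9" + bytes([len(raw)]) + raw   # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(value, list):
        if len(value) < 16:
            body = b"".join(pack(item) for item in value)
            return bytes([0x90 | len(value)]) + body   # fixarray
        raise ValueError("list too long for this sketch")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_span(trace_id, span_id, parent_id, operation, start, end):
    """Pack one span as a flat array. The field order here is hypothetical."""
    return pack([trace_id, span_id, parent_id, operation, start, end])


def send_span(payload, host, port):
    """Fire-and-forget: one datagram, no connection setup, no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

The sender does no handshake, no retries, and no buffering for slow consumers, which is what keeps the CPU cost in the database process low. The price is that spans can be silently dropped, so the collector side (fluentd or the receiver) has to tolerate gaps.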
+1 to @markus.pilman’s comments. Also I had already written the OpenTelemetry collector receiver to handle msgpack encoding, so it should be quick to add the OpenTelemetry support. The receiver will likely keep support for decoding the OpenTracing and OpenTelemetry formats so anyone using 7.0 can still use the receiver.
One note: it’s looking like the 7.1 announcement might be premature. We’ll probably make the switch to the OpenTelemetry tracing format in a release after 7.1. The implementation did land in 7.1, but it’s just not being called anywhere. We’re working on the migration plan now. Also, the release notes should state that OpenTracing still works in 7.1 but is now deprecated and will be migrated to OpenTelemetry in a coming release.
As we are waiting for 7.0 to deploy our first cluster, we had some time to think about the observability of the cluster. We are operating a lot of distributed systems, so adding another one to the list is not that easy. As a result, we care a lot about the platform’s observability.
I’m not worried at all. That’s the good thing about a PoC: it’s not in production, so it can easily be rewritten.