Paper Reviews: Distributed Tracing

This post continues my reviews of distributed tracing papers. The first two are classics that predate Dapper. The last one is a recent paper that uses eBPF for lower-layer tracing.


X-Trace: A Pervasive Network Tracing Framework

R. Fonseca, G. Porter, R. H. Katz, S. Shenker, and I. Stoica, “X-trace: A pervasive network tracing framework,” in Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation, ser. NSDI'07. USA: USENIX Association, 2007, p. 20.

Overview

Proposes comprehensive tracing by propagating unified metadata across applications, network layers, and administrative domains, then constructing a tree that shows the flow of each request
The authors were networking researchers at UC Berkeley at the time, and among them are people known for influential work in other areas (OpenFlow, RAID, Mesos, etc.)

Thoughts

As one of the earliest papers on annotation-based tracing, its problem setting is simpler and easier to understand than that of recent large-scale, complex systems
Rather than implementing tracing within a specific service infrastructure under a single administrator, the authors aim to support it across the entire internet and all protocols, crossing administrative boundaries – quite an ambitious vision
Tracing across multiple applications (pushNext) is still mainstream and makes sense, but is there really a need for tracing across multiple layers (pushDown)?
- Embedding metadata separately in application-layer protocols, TCP, and IP duplicates data and adds significant overhead without much practical benefit
Tracing is useful for diagnosing performance under normal conditions, but the anomaly diagnosis shown in the Usage Scenarios (particularly the DNS story) seems achievable through other means
- Anomalies in individual components should be discovered and reported by their respective administrators; there shouldn’t be a need for another administrator to learn about them first through traces
While the desire to trace across administrative boundaries is understandable, in practice, disclosing even partial trace data to others seems difficult (from a security risk and confidential information leakage perspective)
- If each party collects and analyzes its own trace data independently, comprehensive tracing loses its purpose
- I wonder where this stands today. OpenTelemetry exists, but I haven’t heard of administrators sharing trace data with each other, so it’s probably still difficult
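
To make the pushNext / pushDown distinction above concrete, here is a minimal sketch of how I understand the two primitives. It is a simplification under my own assumptions, not the paper's implementation: the `Metadata` fields, the in-memory `reports` list (standing in for the out-of-band report collection), and the labels are all hypothetical.

```python
import itertools
from dataclasses import dataclass

_op_ids = itertools.count(1)  # hypothetical generator of operation IDs


@dataclass(frozen=True)
class Metadata:
    task_id: str  # shared by every operation that belongs to one request
    op_id: int    # identifies the operation that carries this metadata


# Each report is (task_id, parent_op, child_op, edge_type, label).
# In the paper reports are collected out of band; a list stands in here.
reports = []


def _derive(md, edge, label):
    """Create metadata for a new operation and record the causal edge."""
    child = Metadata(md.task_id, next(_op_ids))
    reports.append((md.task_id, md.op_id, child.op_id, edge, label))
    return child


def push_next(md, label):
    """Propagate to the next operation in the same layer."""
    return _derive(md, "next", label)


def push_down(md, label):
    """Propagate to the layer below (e.g. HTTP to TCP)."""
    return _derive(md, "down", label)


def build_tree(task_id):
    """Rebuild the task tree offline as parent_op -> [(child_op, edge, label)]."""
    children = {}
    for tid, parent, child, edge, label in reports:
        if tid == task_id:
            children.setdefault(parent, []).append((child, edge, label))
    return children


# Operation 0 is the HTTP client that starts the task.
client = Metadata("task-1", 0)
proxy = push_next(client, "HTTP proxy")
tcp = push_down(client, "TCP client->proxy")
ip = push_down(tcp, "IP hop")
server = push_next(proxy, "HTTP server")

tree = build_tree("task-1")
```

Every layer ends up carrying its own copy of the task ID, which is exactly the duplication I am questioning for pushDown.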

Causeway: Operating System Support For Controlling And Analyzing The Execution Of Distributed Programs

A. Chanda, K. Elmeleegy, A. L. Cox, and W. Zwaenepoel, “Causeway: Operating system support for controlling and analyzing the execution of distributed programs,” in Proceedings of the 10th Conference on Hot Topics in Operating Systems - Volume 10, ser. HOTOS'05. USA: USENIX Association, 2005, p. 18.

Overview

Causeway supports metadata propagation at the kernel level in distributed programs, making it easier to develop meta-applications that depend on such propagation

Thoughts

Can this really eliminate application-level instrumentation?
- The metadata injection and access interface seems designed for meta-applications (actors) to call, but wouldn’t it be necessary to pass that metadata to the application?
Was the communication overhead of propagating metadata considered?
The demand for protocol-independent metadata propagation with less application instrumentation has existed since at least 2005, yet there still doesn’t seem to be a fundamental solution, suggesting it’s an extremely difficult problem
The idea of propagation at lower layers could achieve more with today’s technology
- Kernel extensions could potentially be done with eBPF
The paper is so old that it’s difficult to understand the assumed terminology, and it’s hard to gauge how novel it was at the time
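
The split between application code and meta-applications that I am asking about above can be sketched in user space as follows. This is only an analogy under my own assumptions: `inject`, `access`, and `Channel` are hypothetical names rather than Causeway's actual interface, and a thread-local plus a queue stand in for kernel-managed per-thread metadata and an instrumented socket or pipe.

```python
import queue
import threading

# Stands in for per-thread metadata that the kernel would keep.
_current = threading.local()


def inject(metadata):
    """Actor-facing call: attach metadata to the calling thread."""
    _current.metadata = metadata


def access():
    """Actor-facing call: read the metadata of the calling thread."""
    return getattr(_current, "metadata", None)


class Channel:
    """Stands in for a socket or pipe whose send/receive path is instrumented."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, payload):
        # The sender's metadata travels with the payload automatically.
        self._q.put((access(), payload))

    def recv(self):
        metadata, payload = self._q.get()
        # The receiving thread inherits the sender's metadata.
        inject(metadata)
        return payload


seen = []


def backend(chan):
    # Application code: it only handles the payload.
    request = chan.recv()
    # A meta-application hook (a tracer or scheduler, say) reads the metadata.
    seen.append((request, access()))


chan = Channel()
worker = threading.Thread(target=backend, args=(chan,))
worker.start()

inject({"priority": "high"})  # done by the actor, not the application
chan.send("GET /index")
worker.join()
```

In this sketch the application never touches the metadata, but anything inside the application that wants to act on it would still need to call something like `access()`, which is the source of my doubt.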

Enhancing Packet Tracing of Microservices in Container Overlay Networks using eBPF

C. Lee, R. Yoshitani, and T. Hirotsu, “Enhancing packet tracing of microservices in container overlay networks using ebpf,” in Proceedings of the 17th Asian Internet Engineering Conference, ser. AINTEC'22. New York, NY, USA: Association for Computing Machinery, 2022, p. 53–61. [Online]. Available: https://doi.org/10.1145/3570748.3570756

Overview

Proposes using eBPF to extend annotation-based tracing with latency measurement in container overlay networks
The authors are a group at Hosei University researching distributed systems and tracing. The first author is from Toyota’s research lab and works on distributed systems, SDN, and NFV.

Thoughts

When it comes to container overlay networks and eBPF, Cilium comes to mind, but it’s curious that there’s no mention of it at all
- It’s good that they support Flannel and Calico with VXLAN and IPIP analysis, but wouldn’t it be easier to extend Cilium?
The problem that annotation-based tracing only considers the application layer is valid, and the idea of extending it to the infrastructure layer is interesting
- Since we’re already incurring the cost of propagating tracing context, it would be nice to find more uses for it
I didn’t know there was a group in Japan doing this kind of research
I’d like to read or re-implement their code as a way to study eBPF
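
As a first step toward that, here is a rough user-space sketch of what "VXLAN analysis" involves: walking past the outer headers to reach the inner packet that belongs to the container. This is not the authors' eBPF code; the function names, addresses, and VNI are made up for illustration, checksums are left at zero, and 4789 is the IANA-assigned VXLAN port (a deployment may use a different one, so it is a parameter).

```python
import socket
import struct

VXLAN_PORT = 4789  # IANA-assigned; pass another port if the overlay differs
ETH_LEN = 14
UDP_LEN = 8
VXLAN_LEN = 8
VXLAN_I_FLAG = 0x08 << 24  # "VNI present" bit in the first 32-bit word


def ethernet_header(ethertype=0x0800):
    # Zeroed MAC addresses are enough for this illustration.
    return b"\x00" * 12 + struct.pack("!H", ethertype)


def ipv4_header(src, dst, proto, payload_len):
    # Version 4, IHL 5 (20 bytes), checksum left at 0 for the sketch.
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                       64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


def encapsulate(inner_frame, vni, src, dst, port=VXLAN_PORT):
    """Wrap an inner Ethernet frame in outer Ethernet/IPv4/UDP/VXLAN headers."""
    vxlan = struct.pack("!II", VXLAN_I_FLAG, vni << 8)
    body = vxlan + inner_frame
    udp = struct.pack("!HHHH", 49152, port, UDP_LEN + len(body), 0)
    ip = ipv4_header(src, dst, 17, len(udp) + len(body))
    return ethernet_header() + ip + udp + body


def parse_ipv4(buf, off):
    """Return (src, dst, protocol, offset of the next header)."""
    ihl = (buf[off] & 0x0F) * 4
    proto = buf[off + 9]
    src = socket.inet_ntoa(buf[off + 12:off + 16])
    dst = socket.inet_ntoa(buf[off + 16:off + 20])
    return src, dst, proto, off + ihl


def inner_flow(frame, port=VXLAN_PORT):
    """Return (vni, inner_src, inner_dst) for a VXLAN frame, else None."""
    _, _, proto, off = parse_ipv4(frame, ETH_LEN)
    if proto != 17:  # outer packet is not UDP
        return None
    _sport, dport = struct.unpack_from("!HH", frame, off)
    if dport != port:
        return None
    off += UDP_LEN
    flags_word, vni_word = struct.unpack_from("!II", frame, off)
    if not flags_word & VXLAN_I_FLAG:
        return None
    off += VXLAN_LEN + ETH_LEN  # skip VXLAN header and inner Ethernet
    src, dst, _, _ = parse_ipv4(frame, off)
    return vni_word >> 8, src, dst


inner = ethernet_header() + ipv4_header("10.244.1.2", "10.244.2.3", 6, 0)
frame = encapsulate(inner, 1, "192.168.0.10", "192.168.0.11")
plain = ethernet_header() + ipv4_header("192.168.0.10", "192.168.0.11", 6, 0)
```

An eBPF program attached on the host would have to do the same header walk with bounds checks before it could match a packet to a container flow.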

Hiroya Onoe
