Inspired by https://joisino.hatenablog.com/, I decided to try to read papers every day, as much as possible. To help maintain the habit, I’m publishing excerpts from my paper notes along with my thoughts. To start, I’ll introduce some well-known papers in distributed tracing, a field related to my own research, along with papers from the recently held NSDI'23 and the Borg paper from Google, on which Kubernetes is based.

The Japanese version of this article is available here.

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

B. H. Sigelman, L. A. Barroso, M. Burrows, P. Stephenson, M. Plakal, D. Beaver, S. Jaspan, and C. Shanbhag, “Dapper, a large-scale distributed systems tracing infrastructure,” Google, Inc., Tech. Rep., 2010. [Online]. Available: https://research.google.com/archive/papers/dapper-2010-1.pdf

Overview
  • Proposes an annotation-based distributed tracing tool and introduces use cases at Google
  • Achieves low overhead, application-level transparency, scalability, and online analysis
Thoughts
  • Is it still the case today that all services communicate through a common RPC mechanism? Nowadays, wouldn’t instrumentation be needed for each communication protocol?
  • It’s impressive that a large-scale system like Google’s has a unified application development framework
  • The mainstream approach in current distributed tracing/APM papers and tools is to propagate a Trace ID, and I wonder whether Dapper was the first to do this at such a large scale
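To make the Trace-ID propagation idea above concrete, here is a minimal sketch in Python. This is not Dapper's actual implementation (Dapper injects these IDs into Google's common RPC framework transparently); the function and field names here are hypothetical, chosen to illustrate the invariant that every span of one request shares a single trace ID while span/parent IDs form the causal tree.

```python
import uuid

def new_trace_context():
    """Start a new trace at the entry point (e.g. a frontend request)."""
    return {"trace_id": uuid.uuid4().hex, "span_id": uuid.uuid4().hex}

def child_context(parent):
    """Derive the context injected into an outgoing RPC: the trace ID is
    reused unchanged, so all spans of one request share it; only the
    span/parent IDs change, forming the causal tree."""
    return {
        "trace_id": parent["trace_id"],       # propagated unchanged
        "parent_span_id": parent["span_id"],  # records the causal edge
        "span_id": uuid.uuid4().hex,
    }

root = new_trace_context()
downstream = child_context(root)
assert downstream["trace_id"] == root["trace_id"]
assert downstream["parent_span_id"] == root["span_id"]
```

The backend can then reassemble the tree for one request by grouping spans on `trace_id` and linking them via `parent_span_id`, which is essentially the model OpenTelemetry and most APM tools later standardized.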

Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

J. Mace, R. Roelke, and R. Fonseca, “Pivot tracing: Dynamic causal monitoring for distributed systems,” in Proceedings of the 25th Symposium on Operating Systems Principles, ser. SOSP ’15. New York, NY, USA: Association for Computing Machinery, 2015, pp. 378–393. [Online]. Available: https://doi.org/10.1145/2815400.2815415

Overview
  • A tracing system that can dynamically determine which metrics to record and capture causal relationships between events across system boundaries
  • The authors have published many well-known distributed tracing papers, primarily around Canopy and X-Trace
Thoughts
  • Is the advantage of Dynamic Instrumentation really that significant? The effort of defining tracepoints vs. direct instrumentation doesn’t seem that different
  • Query-based aggregation and causal relationship extraction seems useful
  • Dynamic Instrumentation seems useful enough that it could be applied more effectively to tracing, but the fact that it isn’t widely used suggests there might be some drawbacks
  • It would be interesting to read through all the distributed tracing papers by these authors

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems

L. Zhang, Z. Xie, V. Anand, Y. Vigfusson, and J. Mace, “The benefit of hindsight: Tracing edge-cases in distributed systems,” in 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). Boston, MA: USENIX Association, Apr. 2023, pp. 321–339. [Online]. Available: https://www.usenix.org/conference/nsdi23/presentation/zhang-lei

Overview
  • A distributed tracing method that performs tail sampling with low overhead by retroactively collecting trace data only when triggered, enabling collection of edge-case traces
  • The authors are primarily from MPI-SWS and work on distributed systems and cloud-related topics; the last author is the researcher behind Pivot Tracing, Canopy, and Tracing Plane
Thoughts
  • As a cutting-edge top-conference paper from a group that has published many well-known distributed tracing papers, it explains the classification and history of distributed tracing clearly; I found it very educational
  • They seem to have put effort into the data structures for managing trace data before collection, but couldn’t this alone improve conventional tail sampling to some extent?
  • If you add a trigger that samples randomly, could you also do non-edge-case tracing simultaneously?
  • Looking at the results, it seems that collecting all trace data has a greater impact on overhead than tracing all requests. If that’s the case, head sampling might be better than tail sampling for non-edge-case tracing.
  • By integrating well with OpenTelemetry instrumentation, existing systems could be traced, increasing the number of evaluation targets – something I’d like to reference
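The retroactive, trigger-driven collection discussed above can be sketched as follows. This is only an illustration of the idea, not Hindsight's actual data structures (the paper's per-agent buffer management is considerably more elaborate): every request is traced into a bounded in-memory buffer, but data leaves the machine only when a trigger, such as an error or an unusually slow response, names a trace ID after the fact.

```python
from collections import deque

class LocalTraceBuffer:
    """Hypothetical sketch of retroactive collection: tracing is always on,
    but reporting is deferred until a trigger fires for a trace ID."""

    def __init__(self, capacity=1000):
        # A ring buffer: untriggered (common-case) data is cheaply
        # overwritten instead of being shipped to a backend.
        self.buffer = deque(maxlen=capacity)

    def record(self, trace_id, span):
        self.buffer.append((trace_id, span))  # always-on, low overhead

    def collect(self, trace_id):
        # Called retroactively when a trigger names this trace; only the
        # matching, still-buffered spans are actually reported.
        return [span for tid, span in self.buffer if tid == trace_id]

buf = LocalTraceBuffer()
buf.record("t1", "frontend: 5ms")      # ordinary request, never reported
buf.record("t2", "frontend: 900ms")    # slow request...
buf.record("t2", "db: 880ms")
assert buf.collect("t2") == ["frontend: 900ms", "db: 880ms"]
```

A random trigger, as speculated in the notes above, would slot into this design naturally: it would simply call `collect` on an arbitrary trace ID, yielding a head-sampling-like baseline alongside the edge-case traces.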

Canopy: An End-to-End Performance Tracing And Analysis System

J. Kaldor, J. Mace, M. Bejda, E. Gao, W. Kuropatwa, J. O’Neill, K. W. Ong, B. Schaller, P. Shan, B. Viscomi, V. Venkataraman, K. Veeraraghavan, and Y. J. Song, “Canopy: An end-to-end performance tracing and analysis system,” in Proceedings of the 26th Symposium on Operating Systems Principles, ser. SOSP ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 34–50. [Online]. Available: https://doi.org/10.1145/3132747.3132749

Overview
  • An annotation-based tracing system that separates instrumentation from analysis, making each customizable, enabling low-level tracing across applications with different characteristics while allowing high-level modeling through aggregation
  • Research by J. Mace (of Pivot Tracing) and a group at Facebook
Thoughts
  • The separation of instrumentation and analysis, and tracing across applications with different characteristics, which they emphasize, seem like things others have done as well
  • Beyond that, it felt more like an introduction of Facebook’s system, and I couldn’t quite identify the novelty
  • Is the contribution that they consolidated the individual techniques that others have also done?
  • Is the novelty in collecting low-level log data in various formats and modeling it into a unified format that’s easy to aggregate?

Large-Scale Cluster Management at Google with Borg

A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, “Large-scale cluster management at Google with Borg,” in Proceedings of the Tenth European Conference on Computer Systems, ser. EuroSys ’15. New York, NY, USA: Association for Computing Machinery, 2015. [Online]. Available: https://doi.org/10.1145/2741948.2741964

Overview
  • Proposes Borg, a cluster manager that runs hundreds of thousands of jobs from thousands of applications across multiple clusters spanning tens of thousands of machines, and discusses lessons learned from 10 years of operation at Google and how they were applied to Kubernetes
Thoughts
  • Learning about Borg’s design philosophy helped me understand the reasoning behind Kubernetes’ design. The discussion of the complex problem setting and how it influenced Kubernetes was particularly interesting.
  • Borg gives the impression of being inferior to Kubernetes, which makes sense since Kubernetes was built with Borg as a reference, but I wonder what the current version of Borg used at Google looks like.