Inspired by https://joisino.hatenablog.com/, I decided to try to read papers every day, as much as possible. To help maintain the habit, I’m publishing excerpts from my paper notes along with my thoughts. To start, I’ll introduce some well-known papers in distributed tracing, a field related to my own research, along with papers from the recently held NSDI'23 and the Borg paper from Google, on which Kubernetes is based.

The Japanese version of this article is available here.

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

B. H. Sigelman, L. A. Barroso, M. Burrows, P. Stephenson, M. Plakal, D. Beaver, S. Jaspan, and C. Shanbhag, “Dapper, a large-scale distributed systems tracing infrastructure,” Google, Inc., Tech. Rep., 2010. [Online]. Available: https://research.google.com/archive/papers/dapper-2010-1.pdf

Overview
  • Proposes an annotation-based distributed tracing tool and introduces use cases at Google
  • Achieves low overhead, application-level transparency, scalability, and online analysis
Thoughts
  • Is it still the case today that all services communicate through a common RPC mechanism? Nowadays, wouldn’t instrumentation be needed for each communication protocol?
  • It’s impressive that a large-scale system like Google’s has a unified application development framework
  • The mainstream approach in current distributed tracing/APM papers and tools is to propagate a Trace ID, and I wonder whether Dapper was the first to do this at such a large scale
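To make the Trace-ID propagation idea above concrete, here is a minimal sketch in Python. This is not Dapper's actual implementation (Dapper injects these IDs into Google's common RPC framework transparently); the function and field names here are hypothetical, chosen to illustrate the invariant that every span of one request shares a single trace ID while span/parent IDs form the causal tree.

```python
import uuid

def new_trace_context():
    """Start a new trace at the entry point (e.g. a frontend request)."""
    return {"trace_id": uuid.uuid4().hex, "span_id": uuid.uuid4().hex}

def child_context(parent):
    """Derive the context injected into an outgoing RPC: the trace ID is
    reused unchanged, so all spans of one request share it; only the
    span/parent IDs change, forming the causal tree."""
    return {
        "trace_id": parent["trace_id"],       # propagated unchanged
        "parent_span_id": parent["span_id"],  # records the causal edge
        "span_id": uuid.uuid4().hex,
    }

root = new_trace_context()
downstream = child_context(root)
assert downstream["trace_id"] == root["trace_id"]
assert downstream["parent_span_id"] == root["span_id"]
```

The backend can then reassemble the tree for one request by grouping spans on `trace_id` and linking them via `parent_span_id`, which is essentially the model OpenTelemetry and most APM tools later standardized.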

Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

J. Mace, R. Roelke, and R. Fonseca, “Pivot tracing: Dynamic causal monitoring for distributed systems,” in Proceedings of the 25th Symposium on Operating Systems Principles, ser. SOSP ’15. New York, NY, USA: Association for Computing Machinery, 2015, pp. 378–393. [Online]. Available: https://doi.org/10.1145/2815400.2815415

Overview
  • A tracing system that can dynamically determine which metrics to record and capture causal relationships between events across system boundaries
  • The authors have published many well-known distributed tracing papers, primarily around Canopy and X-Trace
Thoughts
  • Is the advantage of Dynamic Instrumentation really that significant? The effort of defining tracepoints vs. direct instrumentation doesn’t seem that different
  • Query-based aggregation and causal relationship extraction seems useful
  • Dynamic Instrumentation seems useful enough that it could be applied more effectively to tracing, but the fact that it isn’t widely used suggests there might be some drawbacks
  • It would be interesting to read through all the distributed tracing papers by these authors

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems

L. Zhang, Z. Xie, V. Anand, Y. Vigfusson, and J. Mace, “The benefit of hindsight: Tracing edge-cases in distributed systems,” in 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). Boston, MA: USENIX Association, Apr. 2023, pp. 321–339. [Online]. Available: https://www.usenix.org/conference/nsdi23/presentation/zhang-lei

Overview
  • A distributed tracing method that performs tail sampling with low overhead by retroactively collecting trace data only when triggered, enabling collection of edge-case traces
  • The authors are primarily from MPI-SWS and work on distributed systems and cloud-related topics; the last author is the researcher behind Pivot Tracing, Canopy, and Tracing Plane
Thoughts
  • As a cutting-edge top-conference paper from a group that has published many well-known distributed tracing papers, it explains the classification and history of distributed tracing clearly; I found it very educational
  • They seem to have put effort into the data structures for managing trace data before collection, but couldn’t this alone improve conventional tail sampling to some extent?
  • If you add a trigger that samples randomly, could you also do non-edge-case tracing simultaneously?
  • Looking at the results, it seems that collecting all trace data has a greater impact on overhead than tracing all requests. If that’s the case, head sampling might be better than tail sampling for non-edge-case tracing.
  • By integrating well with OpenTelemetry instrumentation, existing systems could be traced, increasing the number of evaluation targets – something I’d like to reference
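The retroactive, trigger-driven collection discussed above can be sketched as follows. This is only an illustration of the idea, not Hindsight's actual data structures (the paper's per-agent buffer management is considerably more elaborate): every request is traced into a bounded in-memory buffer, but data leaves the machine only when a trigger, such as an error or an unusually slow response, names a trace ID after the fact.

```python
from collections import deque

class LocalTraceBuffer:
    """Hypothetical sketch of retroactive collection: tracing is always on,
    but reporting is deferred until a trigger fires for a trace ID."""

    def __init__(self, capacity=1000):
        # A ring buffer: untriggered (common-case) data is cheaply
        # overwritten instead of being shipped to a backend.
        self.buffer = deque(maxlen=capacity)

    def record(self, trace_id, span):
        self.buffer.append((trace_id, span))  # always-on, low overhead

    def collect(self, trace_id):
        # Called retroactively when a trigger names this trace; only the
        # matching, still-buffered spans are actually reported.
        return [span for tid, span in self.buffer if tid == trace_id]

buf = LocalTraceBuffer()
buf.record("t1", "frontend: 5ms")      # ordinary request, never reported
buf.record("t2", "frontend: 900ms")    # slow request...
buf.record("t2", "db: 880ms")
assert buf.collect("t2") == ["frontend: 900ms", "db: 880ms"]
```

A random trigger, as speculated in the notes above, would slot into this design naturally: it would simply call `collect` on an arbitrary trace ID, yielding a head-sampling-like baseline alongside the edge-case traces.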

Canopy: An End-to-End Performance Tracing And Analysis System

J. Kaldor, J. Mace, M. Bejda, E. Gao, W. Kuropatwa, J. O’Neill, K. W. Ong, B. Schaller, P. Shan, B. Viscomi, V. Venkataraman, K. Veeraraghavan, and Y. J. Song, “Canopy: An end-to-end performance tracing and analysis system,” in Proceedings of the 26th Symposium on Operating Systems Principles, ser. SOSP ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 34–50. [Online]. Available: https://doi.org/10.1145/3132747.3132749

Overview
  • An annotation-based tracing system that separates instrumentation from analysis, making each customizable, enabling low-level tracing across applications with different characteristics while allowing high-level modeling through aggregation
  • Research by J. Mace (of Pivot Tracing) and a group at Facebook
Thoughts
  • The separation of instrumentation and analysis, and tracing across applications with different characteristics, which they emphasize, seem like things others have done as well
  • Beyond that, it felt more like an introduction of Facebook’s system, and I couldn’t quite identify the novelty
  • Is the contribution that they consolidated the individual techniques that others have also done?
  • Is the novelty in collecting low-level log data in various formats and modeling it into a unified format that’s easy to aggregate?

Large-Scale Cluster Management at Google with Borg

A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, “Large-scale cluster management at Google with Borg,” in Proceedings of the Tenth European Conference on Computer Systems, ser. EuroSys ’15. New York, NY, USA: Association for Computing Machinery, 2015. [Online]. Available: https://doi.org/10.1145/2741948.2741964

Overview
  • Proposes Borg, a cluster manager that runs hundreds of thousands of jobs from thousands of applications across multiple clusters spanning tens of thousands of machines, and discusses lessons learned from 10 years of operation at Google and how they were applied to Kubernetes
Thoughts
  • Learning about Borg’s design philosophy helped me understand the reasoning behind Kubernetes’ design. The discussion of the complex problem setting and how it influenced Kubernetes was particularly interesting.
  • Borg gives the impression of being inferior to Kubernetes, which makes sense since Kubernetes was built with Borg as a reference, but I wonder what the current version of Borg used at Google looks like.