• Scalability! But at what COST?

    After a long break, we read our first paper, Scalability! But at what COST? by Frank McSherry, Michael Isard, and Derek Murray. It is a fairly short paper, published at HotOS ’15, that raises some interesting questions about distributed systems research and its focus on scalability as the holy grail of performance.

  • EPaxos

    EPaxos is a leaderless Paxos variant that tries to reduce latency for a geo-distributed replica group by letting the client use the replica with the lowest round-trip latency as the operation’s leader, and by optimistically skipping a round of replica communication when inter-operation conflict detection shows that commands do not interfere.
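
    To make those two ideas concrete, here is a minimal Python sketch (not the actual EPaxos message flow; the replica names and helper functions are made up): a client picks the lowest-latency replica as the command leader, and a command commits on the fast path only if it does not interfere with any in-flight command.

        # Toy sketch of the two ideas above, not the real EPaxos protocol.

        def nearest_replica(replicas, rtt_ms):
            """Pick the replica with the lowest measured round-trip time."""
            return min(replicas, key=lambda r: rtt_ms[r])

        def conflicts(a, b):
            """Two commands interfere if they touch the same key and one writes."""
            return a["key"] == b["key"] and "w" in (a["op"], b["op"])

        def try_commit(cmd, in_flight):
            """Fast path when nothing in flight interferes; otherwise an extra
            round of communication is needed to agree on ordering."""
            if any(conflicts(cmd, other) for other in in_flight):
                return "slow path: extra round to agree on ordering"
            return "fast path: committed without the extra round"

        rtt_ms = {"us-east": 12, "eu-west": 85, "ap-south": 190}
        leader = nearest_replica(rtt_ms, rtt_ms)                  # -> "us-east"
        print(leader, "|", try_commit({"key": "x", "op": "w"}, []))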

  • Sinfonia

    Sinfonia is a service that allows hosts to share application data in a fault-tolerant, scalable, and consistent manner using a novel mini-transaction primitive. We read this paper because it provides an interesting alternative to message passing for building distributed systems.
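
    As a rough illustration of the primitive, here is a hedged Python sketch of a mini-transaction: it bundles compare items, read items, and write items, and commits only if every compare matches (Sinfonia piggybacks this check on the two-phase commit round itself). The class and field names are illustrative, not Sinfonia’s API.

        # Illustrative mini-transaction: compare, read, and write items applied
        # atomically; commit happens only if all compare items match.

        class Minitransaction:
            def __init__(self):
                self.compares, self.reads, self.writes = [], [], []

            def cmp(self, addr, expected):
                self.compares.append((addr, expected))

            def read(self, addr):
                self.reads.append(addr)

            def write(self, addr, value):
                self.writes.append((addr, value))

        def execute(mt, memory):
            """Abort if any compare item mismatches; otherwise apply reads/writes."""
            if any(memory.get(addr) != expected for addr, expected in mt.compares):
                return None, False                      # aborted
            result = {addr: memory.get(addr) for addr in mt.reads}
            for addr, value in mt.writes:
                memory[addr] = value
            return result, True                         # committed

        memory = {"node1/0x10": b"old"}
        mt = Minitransaction()
        mt.cmp("node1/0x10", b"old")                    # commit only if unchanged
        mt.write("node1/0x10", b"new")
        print(execute(mt, memory), memory)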

  • Discretized Streams

    Discretized Streams: Fault-Tolerant Streaming Computation at Scale describes additions to the Spark system to handle streaming data. Compared to other streaming systems, Spark Streaming offers more robust fault recovery and straggler-handling strategies by building on the Resilient Distributed Dataset (RDD) memory abstraction. In addition to allowing parallel recovery, Spark Streaming is one of the first systems to combine streaming computation with batch and interactive queries in the same system.
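
    A toy Python sketch of the micro-batch idea (not the Spark API): the stream is cut into small batches, each processed by a deterministic job, so a lost result can simply be recomputed from its inputs, and in parallel if many partitions are lost.

        # The stream becomes a sequence of small, deterministic batch jobs;
        # recovery is just recomputation of the lost batch from its inputs.

        from collections import Counter

        def micro_batches(events, batch_size):
            for i in range(0, len(events), batch_size):
                yield events[i:i + batch_size]

        def process(batch):
            """A deterministic per-batch job (here, a word count)."""
            return Counter(word for line in batch for word in line.split())

        events = ["a b a", "b c", "a a", "c c b"]
        results = [process(b) for b in micro_batches(events, 2)]

        # "Recovery": recompute the second batch's result from the same input.
        recomputed = process(list(micro_batches(events, 2))[1])
        assert recomputed == results[1]
        print(results)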

  • SPANStore

    Several cloud providers offer storage in many data centers around the globe, and customers can use simple PUTs and GETs to store and retrieve data without dealing with the complexities of the storage infrastructure. In practice, however, these services leave replication across data centers to the application, and although replicating to every data center yields low latency, it is expensive.
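
    The following simplified Python sketch illustrates that trade-off: pick the cheapest set of data centers to replicate to such that every client data center still meets a GET-latency goal. The latencies, prices, and SLO below are made-up numbers, and the real system’s placement problem is considerably richer.

        # Choose the lowest-cost replica placement that keeps every client
        # data center within the latency SLO (all numbers are illustrative).

        from itertools import combinations

        latency_ms = {                       # client DC -> {storage DC: RTT}
            "us": {"us": 10, "eu": 90, "asia": 180},
            "eu": {"us": 90, "eu": 10, "asia": 160},
        }
        cost_per_gb = {"us": 1.0, "eu": 1.2, "asia": 0.8}
        SLO_MS = 100

        def cheapest_placement():
            best = None
            sites = list(cost_per_gb)
            for k in range(1, len(sites) + 1):
                for replicas in combinations(sites, k):
                    meets_slo = all(
                        min(latency_ms[client][r] for r in replicas) <= SLO_MS
                        for client in latency_ms)
                    cost = sum(cost_per_gb[r] for r in replicas)
                    if meets_slo and (best is None or cost < best[1]):
                        best = (replicas, cost)
            return best

        print(cheapest_placement())          # (('us',), 1.0): one replica suffices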

  • Chain Replication

    Chain Replication is a paper from ’04 by van Renesse and Schneider. The system is interesting because it is a primary-backup system with an unconventional architecture that aims to achieve high throughput and availability while maintaining strong consistency.
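
    A minimal Python sketch of the chain structure (ignoring failures and reconfiguration): updates enter at the head and are forwarded down the chain, while reads are served by the tail, which only holds updates that every replica has already applied.

        # Updates flow head -> middle -> tail; reads go to the tail only.

        class Replica:
            def __init__(self, name):
                self.name, self.store, self.next = name, {}, None

            def write(self, key, value):
                self.store[key] = value
                if self.next is not None:       # forward down the chain
                    self.next.write(key, value)

        head, middle, tail = Replica("head"), Replica("middle"), Replica("tail")
        head.next, middle.next = middle, tail

        head.write("x", 1)                      # clients send updates to the head
        print(tail.store["x"])                  # clients read from the tail -> 1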

  • MDCC

    Consistently replicating data across multiple data centers involves a write-performance trade-off, because messages between data centers incur high latency (often > 100 ms). Existing protocols use forms of two-phase commit that incur three blocking round trips between data centers.
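
    Some back-of-the-envelope arithmetic (a Python snippet with an assumed 100 ms inter-data-center round trip) shows why this matters and what MDCC aims for: committing in roughly one wide-area round trip in the common, conflict-free case instead of three.

        # Assumed 100 ms cross-data-center round trip; numbers are illustrative.

        rtt_ms = 100
        classic_commit_ms = 3 * rtt_ms      # three blocking wide-area round trips
        mdcc_fast_path_ms = 1 * rtt_ms      # one round trip in the common case

        print(classic_commit_ms, "ms vs", mdcc_fast_path_ms, "ms")   # 300 vs 100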

  • CoralCDN

    CoralCDN is a system that gives small websites, or those with limited resources, a way to remain available in the face of flash crowds. The system is interesting for a number of reasons: it is a live, running system that the public can use; it is peer-to-peer and self-organizing; and it has a follow-up paper that analyzes the system with five years of hindsight.

  • Zookeeper

    Zookeeper is a practical replicated storage system used by Yahoo!. We are interested in understanding what replication protocol Zookeeper uses, why its designers needed a new replication protocol, what applications and services people build on top of it, and what features are required to make such a replicated storage system practical.

  • Thialfi

    Several of the papers we’ve read recently have focused on sophisticated, generic fault tolerance abstractions built on complex protocols. Thialfi offers a contrast: its approach to fault tolerance is intentionally simple, yet it remains resilient to arbitrary (halting) failures, including the loss of entire data centers. Unlike Raft and VR, which provide general-purpose state machine replication, Thialfi’s approach to fault tolerance permeates the design of its abstraction.
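
    To give a flavor of that simplicity, here is a hedged Python sketch of a registration/notification interface in the style the paper describes (the class and method names are illustrative, not Thialfi’s actual API): notifications carry only object versions, and because registrations can always be replayed, a client that loses its connection or state simply re-registers and reconciles.

        # Clients register interest in objects; the service delivers version-only
        # notifications; recovery after a halting failure is "re-register".

        class Client:
            def __init__(self):
                self.registrations = set()        # replayable after a failure
                self.known_versions = {}

            def register(self, object_id):
                self.registrations.add(object_id)

            def on_notify(self, object_id, version):
                # Only "object X is at version V"; data is fetched elsewhere.
                current = self.known_versions.get(object_id, 0)
                self.known_versions[object_id] = max(current, version)

            def on_reconnect(self, service):
                for object_id in self.registrations:
                    service.register(self, object_id)

        class Service:                            # stand-in notification service
            def __init__(self):
                self.subscribers = {}

            def register(self, client, object_id):
                self.subscribers.setdefault(object_id, set()).add(client)

            def publish(self, object_id, version):
                for client in self.subscribers.get(object_id, ()):
                    client.on_notify(object_id, version)

        service, client = Service(), Client()
        client.register("bookmarks")
        client.on_reconnect(service)              # (re)register after connecting
        service.publish("bookmarks", 7)
        print(client.known_versions)              # {'bookmarks': 7}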

  • VR Revisited

    Viewstamped Replication provides replication through a primary/backup scheme. This paper gives a distilled view of the technique along with several optimizations that can be applied. In particular, it focuses solely on the Viewstamped Replication protocol itself, without tying it to any specific implementation or use.
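
    A compressed Python sketch of VR’s normal case (no view changes or recovery), assuming 2f+1 replicas: the primary assigns the next op-number, sends PREPARE to the backups, and considers the operation committed once f of them reply PREPAREOK.

        # Normal-case Viewstamped Replication with f = 1 (three replicas).

        F = 1

        class Primary:
            def __init__(self):
                self.log, self.commit_number, self.acks = [], 0, {}

            def request(self, op, backups):
                self.log.append(op)
                op_number = len(self.log)
                self.acks[op_number] = 0
                for backup in backups:            # PREPARE(op, op_number)
                    backup.prepare(op, op_number, self)
                return op_number

            def prepare_ok(self, op_number):
                self.acks[op_number] += 1
                if self.acks[op_number] >= F:     # f backups + the primary = quorum
                    self.commit_number = max(self.commit_number, op_number)

        class Backup:
            def __init__(self):
                self.log = []

            def prepare(self, op, op_number, primary):
                self.log.append(op)               # append in op-number order
                primary.prepare_ok(op_number)     # PREPAREOK

        primary, backups = Primary(), [Backup(), Backup()]
        primary.request("x = 1", backups)
        print(primary.commit_number)              # 1: the operation is committed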

  • Cheriton and Skeen

    Though it dates from 1993, this paper sparked considerable controversy in its time, provoking an impassioned response. We wanted to understand the debate over whether ordering guarantees should be provided as part of the network.

  • PacificA

    In the PacificA paper, the authors give a clear description of how to implement a strongly consistent primary/backup replicated storage layer.
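
    A small Python sketch of the normal-case data path (ignoring reconfiguration, where most of the subtlety lives): the primary orders each update with a serial number, sends it to every replica in the current configuration, and commits only once all of them have acknowledged.

        # Primary/backup with serial-numbered updates; commit requires acks
        # from every replica in the configuration.

        class Backup:
            def __init__(self):
                self.prepared = {}

            def prepare(self, serial, key, value):
                self.prepared[serial] = (key, value)
                return True                            # acknowledge

        class Primary:
            def __init__(self, backups):
                self.backups, self.serial, self.committed = backups, 0, {}

            def update(self, key, value):
                self.serial += 1
                acks = sum(b.prepare(self.serial, key, value) for b in self.backups)
                if acks == len(self.backups):          # all replicas acknowledged
                    self.committed[key] = value
                    return True
                return False                           # would trigger reconfiguration

        primary = Primary([Backup(), Backup()])
        print(primary.update("x", 42))                 # True: committed everywhere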

  • Spanner

    Spanner is a highly distributed, externally consistent database developed by Google. It provides replication and transactions over a geographically distributed set of servers. Spanner uses TrueTime’s bounded clock uncertainty, Paxos, and two-phase commit to ensure external consistency.
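
    The Python sketch below illustrates the commit-wait idea behind external consistency, with a made-up uncertainty bound standing in for TrueTime: choose a commit timestamp at or after the upper bound of “now”, then wait until that timestamp is guaranteed to be in the past before making the write visible.

        # Commit wait with a fixed, assumed clock-uncertainty bound.

        import time

        EPSILON = 0.007                          # assumed uncertainty in seconds

        def now_interval():
            t = time.time()
            return t - EPSILON, t + EPSILON      # [earliest, latest] possible "now"

        def commit(apply_write):
            _, latest = now_interval()
            commit_ts = latest                   # no real clock is past this yet
            while now_interval()[0] < commit_ts: # commit wait: until earliest > ts
                time.sleep(EPSILON / 2)
            apply_write(commit_ts)               # now safe to make the write visible
            return commit_ts

        print(commit(lambda ts: None))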

  • Raft

    Raft is a new consensus algorithm that is optimized for “ease of implementation”. Its main purpose is to present a protocol that is more understandable than Paxos, which many practitioners find difficult to implement correctly. Viewstamped Replication is more similar to Raft, but because it is far less popular than Paxos it unfortunately receives little attention in the paper.
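
    As a taste of the kind of mechanism the paper argues is easy to reason about, here is a small Python sketch of the AppendEntries consistency check (simplified: real Raft also keeps matching entries beyond the new ones and tracks commit indices): a follower accepts new entries only if its log matches the leader’s at the preceding index and term.

        # Log entries are (term, command); indices are 1-based as in the paper.

        def append_entries(log, current_term, leader_term,
                           prev_index, prev_term, entries):
            """Return (success, new_log, new_current_term)."""
            if leader_term < current_term:
                return False, log, current_term            # reject a stale leader
            current_term = max(current_term, leader_term)
            if prev_index > 0 and (len(log) < prev_index
                                   or log[prev_index - 1][0] != prev_term):
                return False, log, current_term            # logs diverge; retry earlier
            new_log = log[:prev_index] + entries           # overwrite any conflict
            return True, new_log, current_term

        log = [(1, "x = 1")]
        ok, log, term = append_entries(log, 1, 2, 1, 1, [(2, "y = 2")])
        print(ok, log, term)        # True [(1, 'x = 1'), (2, 'y = 2')] 2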