Paxos FAQ

Q: How does Paxos prevent split-brain?

A: For Paxos, split brain would mean that different servers "agree" on different values. The points in Paxos where a proposer must wait for a majority of acceptors to respond are enough to prevent this. If there are two active proposers, the algorithm requires each of them to get prepare replies from a majority of acceptors, and accept replies from a majority of acceptors. Any two majorities must overlap in at least one server; the reply from that server will tell the losing proposer (the one with the smaller n) that it has lost, or (if the lower-numbered proposer has already reached agreement) will tell the higher-numbered proposer what the agreed value was.

If a network failure has partitioned the servers, then at most one of the partitions can contain a majority of the servers. So Paxos will only be able to reach agreement in that partition; servers in the other partitions won't be able to agree on anything. Thus split brain is avoided even under network partition.

Q: What are some ways to ensure unique proposal numbers?

A: A proposer can put an ID that's different for each proposer in the low bits of the proposal number. For example, a proposer could put its IP address in the low bits. Then different proposers will never use the same proposal number. Each proposer can avoid re-using its own proposal numbers by keeping a variable holding the highest proposal number it has used so far, and, when it starts a new proposal, making sure the new proposal number is higher. If you are feeling lucky, you could instead use numbers that have the current time in the high bits and a random number in the low bits; you'd want lots of bits for the random part to keep the probability of collision very low.

Q: In 2.4, what is the distinguished proposer about?

A: The idea is that the implementation should try to make it likely that there's at most one active proposal at a time.
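Returning to the earlier question on unique proposal numbers, the low-bits-ID scheme plus a highest-used counter can be sketched as follows. This is a minimal illustration, not from the FAQ: ID_BITS, the class name, and the `seen` parameter are all assumptions, and a real implementation would persist `highest_used` to disk.

```python
# Sketch of the unique-proposal-number scheme described above:
# a per-proposer ID in the low bits, a monotonically increasing
# counter in the high bits. ID_BITS is an illustrative choice.

ID_BITS = 16  # low bits reserved for the proposer's unique ID

class ProposalNumbers:
    def __init__(self, proposer_id):
        assert 0 <= proposer_id < (1 << ID_BITS)
        self.proposer_id = proposer_id
        self.highest_used = 0  # a real system would persist this to disk

    def next_number(self, seen=0):
        """Return a fresh number: higher than anything this proposer has
        used, and higher than `seen` (the largest number observed from
        other proposers, e.g. in a prepare rejection)."""
        counter = max(self.highest_used >> ID_BITS, seen >> ID_BITS) + 1
        n = (counter << ID_BITS) | self.proposer_id
        self.highest_used = n
        return n
```

Because the low bits always equal the proposer's ID, two different proposers can never emit the same number, and the counter in the high bits keeps each proposer's numbers strictly increasing.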
Having a distinguished proposer is not a requirement for correctness; it is advice about how to reach agreement faster. One way to do this is to have each proposer wait a random amount of time before sending prepare messages. Then the proposer that chose the smallest random delay will go first, and will likely complete agreement before the other proposers finish sleeping. There can still be more than one active proposer, but this randomization breaks the symmetry among the proposers and makes it likely that one of them wins.

Q: How does the algorithm terminate?

A: A Paxos agreement doesn't have a well-defined termination point. However, if there are no failures, a proposer can observe that agreement has been reached (because it gets accept replies from everyone), and it can safely stop proposing at that point.

Q: What are some use cases for Paxos?

A: Paxos helps you build replicated services, which can increase fault tolerance. For example, one can build a Paxos-replicated database with three servers, such that if one of the servers fails, the other two can continue to execute database requests. You can see various uses if you search the web for papers about Chubby, Spanner, Megastore, Spinnaker, and Zookeeper.

Q: How fast is Paxos?

A: There are two main limits on performance: message exchanges, and writing the updated n and v information to disk (so they can be recovered if the server crashes and restarts). Paxos requires at least two message exchanges in order to agree; if all hosts are within the same datacenter, this might take a few hundred microseconds. Paxos also requires two writes to disk (one for prepare, one for accept); depending on the disk technology, this could take anywhere from a few hundred microseconds to a few tens of milliseconds. So without clever optimizations, Paxos will be relatively slow.
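A back-of-the-envelope estimate of per-agreement latency, based on the costs above. The specific numbers (a 200-microsecond round trip, a 500-microsecond fsync) are illustrative assumptions; actual figures depend on the network and disk in question.

```python
# Rough per-agreement latency for unoptimized Paxos, using the cost
# model above: two message exchanges plus two durable disk writes.
# The concrete numbers are assumptions, not measurements.

rtt = 200e-6         # assumed same-datacenter round trip (seconds)
disk_write = 500e-6  # assumed fsync latency for a fast disk (seconds)

message_exchanges = 2  # prepare/promise + accept/accepted
disk_writes = 2        # persist state at prepare and at accept

latency = message_exchanges * rtt + disk_writes * disk_write
throughput = 1 / latency  # agreements per second, fully serialized

print(f"~{latency * 1e3:.1f} ms per agreement")  # ~1.4 ms with these numbers
```

With a slow spinning disk (tens of milliseconds per write), the disk term dominates completely, which is why batching and other optimizations matter in practice.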
This paper describes some of what's needed for good performance: http://research.google.com/archive/paxos_made_live.html

Q: What does the paper mean by a leader on page 9? Does Paxos have a leader?

A: Section 3 of the paper is not about Paxos itself, but about one way to implement and use Paxos. Paxos itself doesn't require a leader. I do not know whether any real-world Paxos implementations use the ideas in Section 3. It may be that when the paper speaks of electing a leader, it means using Paxos to agree on a leader. Or one could use any of a number of schemes that usually (but not always) result in agreement on a leader (since Paxos will be correct regardless), such as letting the participant with the lowest IP address propose with no delay, but requiring other participants to sleep for a random amount of time before proposing. Much of the point of a leader is to avoid simultaneous different proposals, which require multiple Paxos rounds to sort out.

The best paper I know about real-life implementation and use of Paxos is Paxos Made Live: http://research.google.com/archive/paxos_made_live.html

Section 5.2, on master leases, may be the most relevant.
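The lowest-address-goes-first scheme mentioned above can be sketched as follows. This is a minimal illustration under assumed names and delay bounds; it only biases who proposes first, and Paxos remains correct no matter what delays are chosen.

```python
# Sketch of the symmetry-breaking scheme described above: the participant
# with the lowest address proposes immediately; everyone else sleeps a
# random amount first. Addresses and the delay range are illustrative.
import random

def proposal_delay(my_addr, all_addrs, max_delay=0.5):
    """Seconds to wait before sending prepare messages."""
    if my_addr == min(all_addrs):
        return 0.0  # presumptive leader proposes with no delay
    # other participants back off by a random amount
    return random.uniform(0.05, max_delay)

addrs = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
```

Here `proposal_delay("10.0.0.1", addrs)` is 0.0, so that node usually completes a round before anyone else wakes up; if it has failed, the others eventually propose after their random delays, and any resulting collisions are sorted out by extra Paxos rounds rather than by a safety violation.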