View Changes
  Purpose of a view change is to force agreement on:
    Who is the new primary.
    The contents of the database.
    The status of uncommitted or partially committed operations.
  Interesting cases:
    Primary fails.
    Slave recovers.
    Multiple view changes started at the same time.

Every server has a current view_number in stable storage.
  view_number = < sequence_number, initiating_server_id >
  The second part helps break ties.

A server can initiate a view change at any time.
  But probably should only do so if it thinks active server set has changed.
  I.e. if it can't reach a server that used to be up.
  Or if it just rebooted.

First, initiating server chooses a new view number.
  new view_number = < old_sequence_number + 1, initiating_server_id >
  (Using its own server ID).
  Sends out a prepare view change message with new view_number.
  Waits until it gets a quorum of positive replies.

Other servers might reject:
  If they don't respond (implicit).
  If they know about a pending view change with higher view_number.
    Might be same seq # but higher initiating_server_id.
  In the latter rejection case, the other view change will win.
    So we can abort our view change and adopt the new one.
    So we're OK -- we're making progress.
    If there's a quorum, one of the pending view changes will win.
  We want as many servers as possible to be in new set.
    Even if there are competing view changes.
    So higher view_number takes precedence.
    Rather than, for example, collecting locks.

Now everybody knows there's a view change in progress.

How do we carry over state from the previous view?
  I.e. operations that hadn't fully committed.
  How to reconstruct primary's state?
  We care about partially committed operations.
  Phase 1 replies from slaves include info about pending operations.
  Primary will re-issue any such operation after view change completes.

Is this correct for ops that previous primary hadn't acked to client?
  Primary will re-send prepares to slaves.
  Collect quorum of replies.
  Send ACK to client.
  Send commit to slaves.
  This may cause operation to occur twice on some slaves. Oops.

Is this correct for ops that previous primary acked to client?
  May cause the client to get a second ACK for the operation.

How to rebuild a failed server's state when it re-joins?
  New server initiates a view change.
    This prevents new operations from starting.
  New server asks a member of old view for complete state.
  Then proceeds as usual.
  Expensive!
    Could transfer data before view change,
    then transfer just new ops during view change.

Winning initiator sends out a view change commit message.
  After which normal request processing proceeds.
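
The view_number ordering and the prepare phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the protocol's actual implementation: the names ViewNumber, Server, on_prepare_view_change and propose_view_change are hypothetical, a quorum is assumed to be a simple majority, and a server is assumed to also reject any proposal that is not newer than its current view. State transfer, pending operations and the commit message are left out.

```python
from typing import List, NamedTuple, Optional, Tuple


class ViewNumber(NamedTuple):
    # Tuples compare field by field, so sequence_number is compared first
    # and initiating_server_id only breaks ties between equal sequence numbers.
    sequence_number: int
    initiating_server_id: int


class Server:
    def __init__(self, server_id: int, view_number: ViewNumber):
        self.server_id = server_id
        self.view_number = view_number  # would live in stable storage
        self.pending: Optional[ViewNumber] = None  # highest view change seen so far
        self.up = True

    def next_view_number(self) -> ViewNumber:
        # < old_sequence_number + 1, own server ID >
        return ViewNumber(self.view_number.sequence_number + 1, self.server_id)

    def on_prepare_view_change(self, proposed: ViewNumber) -> Optional[bool]:
        if not self.up:
            return None  # no response: implicit rejection
        if proposed <= self.view_number:
            return False  # assumption: stale proposals are rejected
        if self.pending is not None and self.pending > proposed:
            return False  # a pending view change with higher view_number wins
        self.pending = proposed
        return True


def propose_view_change(initiator: Server,
                        servers: List[Server]) -> Tuple[ViewNumber, bool]:
    """Run the prepare phase; return the proposed view_number and whether
    a quorum (assumed here: a majority of all servers) replied positively."""
    proposed = initiator.next_view_number()
    replies = [s.on_prepare_view_change(proposed) for s in servers]
    positive = sum(1 for r in replies if r is True)
    quorum = len(servers) // 2 + 1
    return proposed, positive >= quorum
```

For example, with three servers all in view < 4, 1 >, a proposal from server 2 and then one from server 3 both collect a quorum, because < 5, 3 > is higher than < 5, 2 >. A later proposal from server 1 would carry < 5, 1 > and be rejected everywhere, so server 1 aborts and adopts the higher pending view change.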