In this lab you will replicate your lock server using the replicated state machine approach (see Schneider's RSM paper for a good, but non-required, reference). In the replicated state machine approach, one machine is the master; the master receives requests from clients and executes them on all replicas. To ensure that all replicas have identical state, the replicas must execute all requests in the same order and all requests must produce the same result on all replicas (i.e., the handlers must be deterministic). When the master fails, any of the replicas can take over its job, because they all should have the same state as the failed master. (We'll be discussing an example replicated state machine in Lecture 13.)
The primary task in the lab is building a replicated state machine library (RSM) on top of the existing RPC library for YFS. RSM allows you to plug in any RPC program you have written so far and replicate it as a replicated state machine. To ensure the appropriate behavior, however, there are certain constraints on the RPC handlers. Most importantly, the RPC handlers must run to completion without blocking, and be deterministic. These constraints will ensure that all replicas execute all requests in the same order and with the same result.
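For instance (an illustrative sketch only, not code from the lab; the acquire signature here just mimics a Lab 5-style handler), a handler that consults the local clock would compute a different result on each replica and so break replication:

// Illustrative only: a non-deterministic handler. Each replica's clock
// differs, so the replicas' lock state would diverge.
#include <ctime>
#include <string>
#include "lock_protocol.h"
#include "lock_server_cache.h"

lock_protocol::status
lock_server_cache::acquire(lock_protocol::lockid_t lid, std::string id, int &r)
{
  r = time(NULL) + 30;        // BAD: result depends on the local clock
  return lock_protocol::OK;
}

A deterministic handler computes its result only from its arguments and the replicated state, and returns without blocking on other RPCs or condition variables.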
Once you have built the replicated state machine library, you will plug in the caching lock server you built in Lab 5. This lab should make clear why we asked you to write the caching lock server so that the server has no blocking handlers.
To test your implementation you will eventually need to re-run the tests from the previous labs, but start with lock_tester alone: first run your replicated lock server under lock_tester with no failing lock servers. Once that works, test by killing some of the lock servers (using the kill command or Control-C) while the test is running; we provide a testing script, rsm_tester.pl, to make this easier. Even when servers fail, lock_tester should complete successfully. Note that there is no requirement to run with RPC_LOSSY in this lab, though Lab 8 may require it. You also need to ensure that the file system tests for Labs 2, 3, and 4 still complete successfully when the lock_server runs as a replicated state machine.
In this lab you will not start new lock servers after a failure or after the system has been running for a while. Handling these cases requires more sophisticated protocols---for example, we need to ensure that all slaves agree on a single master despite network failures, and we need to transfer state to bring new members up to speed. These more sophisticated protocols will be the topic of the final lab.
The RSM library has a client object and a server object.
The RSM client takes in its constructor the address of the master, and immediately contacts the master to learn the addresses of all the other replicas. A program can invoke a call method on the client object (just as if it were an RPC client), which marshals the request. The RSM client sends this request as an argument in another RPC (invoke) to the master. If this RPC fails or times out, the client contacts the replicas one by one until it finds one that accepts the request as master. If a replica receives an invoke RPC and it isn't the master, it refuses to execute the client request; the client then automatically fails over to the next replica until it reaches the master.
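For example, a lock client might issue an acquire through the RSM client as in the sketch below. This is a minimal sketch: the rsm_client constructor argument and its call method are assumed to mirror rpcc from the earlier labs, and the acquire argument list is the Lab 5 one; check rsm_client.h for the real signatures.

// Hedged sketch of client-side usage (names and signatures are assumptions).
#include <string>
#include "rsm_client.h"
#include "lock_protocol.h"

void example_acquire(std::string master_addr, lock_protocol::lockid_t lid,
                     std::string myid)
{
  rsm_client *cl = new rsm_client(master_addr);  // contacts the master for the member list
  int r;
  lock_protocol::status ret = cl->call(lock_protocol::acquire, lid, myid, r);
  if (ret == lock_protocol::RETRY) {
    // wait (with a timeout; see the discussion of pthread_cond_timedwait
    // below) and then resend the acquire
  }
  delete cl;
}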
A server can create an instance of the library by creating an RSM object. The constructor takes as arguments the address of the current master and the address of this server. The first RSM object created typically is the master (i.e., the address of the master is the same as the address of this server). Each RSM object has a view object that tracks the current set of replicas, and the current sequence number for the group. For this lab, the view constructor simply reads in a static list of replicas from a file called "config"; we will improve this in the final lab. Once the RSM object is created, a server will typically register some RPC handlers with the RSM (instead of registering them with an rpcs object).
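Concretely, server startup might look something like the sketch below. The rsm constructor arguments follow the description above, but the exact reg template and the lock_server_cache constructor are assumptions to check against rsm.h and your own Lab 5 code.

// Hedged sketch of server-side setup; reg() is assumed to mirror rpcs::reg.
#include <unistd.h>
#include "rsm.h"
#include "lock_protocol.h"
#include "lock_server_cache.h"

int main(int argc, char *argv[])
{
  // argv[1]: address of the current master, argv[2]: this server's address.
  // If they are equal, this server starts out as the master.
  rsm rsm_srv(argv[1], argv[2]);
  lock_server_cache ls(&rsm_srv);   // assumes the lock server keeps an rsm pointer

  // Register handlers with the rsm object instead of with an rpcs object.
  rsm_srv.reg(lock_protocol::acquire, &ls, &lock_server_cache::acquire);
  rsm_srv.reg(lock_protocol::release, &ls, &lock_server_cache::release);

  while (1)
    sleep(1000);
  return 0;
}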
When the master gets a new request from an RSM client, it assigns the request a sequence number, and issues an invoke RPC on each replica. The replicas unmarshall the request, and invoke the handler registered for the RPC number, supplying an argument and results structure.
Begin by initializing your Lab 7 directory with your implementation from Lab 6 and then updating it with some new infrastructure code in http://pdos.csail.mit.edu/6.824/labs/yfs-lab7.tgz for Lab 7.
% wget -nc http://pdos.csail.mit.edu/6.824/labs/yfs-lab7.tgz
% rsync -av l6/* l7
% tar xzvf yfs-lab7.tgz
This will add these new files to your l7/ directory:
rsm_protocol.h: This file defines two different RPC protocols. rsm_client_protocol is used between the rsm_client and rsm objects, while rsm_protocol is used between the rsm master object and its slaves.
rsm.{cc,h}: These files implement the RSM server object. This should be used in place of an rpcs object for the server class you'd like to replicate (lock_server, in this lab). It handles registering RPC handlers as usual with the reg method, and so should require minimal changes to the lock_server. You are responsible for filling in the following methods:
client_invoke: This is called by a client to invoke a new RPC on the master. If this RSM object is not the master, it should reply with the NOTMASTER status. If it is the master, it should assign the RPC a new sequence number, and send an invoke RPC to all its slaves. Note that serial execution on the slaves is acceptable. Once all live slaves reply successfully, the master can execute the invoked RPC locally, and then reply to the client. You should supply a timeout to the RPC in case any of the slaves have died (see the heartbeater method for an example use of a timeout -- a reasonable timeout value might be 500 or 1000 milliseconds). If a slave times out (i.e., the invoke call returns a -1), the slave should be removed from the group using the view object. A rough sketch of this flow appears after the rsm_mutex entry below.
invoke: This is invoked on slaves by the master RSM object. Each slave should ensure this is an RPC with the expected sequence number, and if so, execute it locally and reply back to the master. It should also ensure that it is not currently the master; if it is, it should reply with an error.
We provide you with some infrastructure in this class that you can use to simplify your implementation:
execute: This method unmarshalls the RSM representation of a client's RPC and executes it using the registered RPC handler.
heartbeater: This is a thread that continually sends RPCs to the master to ensure it is alive. If the master doesn't respond to a heartbeat, the slave removes it and picks a new master using the view object.
amimaster: This method returns true if this object is currently the master. You can call this method from the lock_server object to decide whether or not to send RPCs to the client -- only the master should be interacting with the lock_client_cache.
client_members: This is an RPC handler that returns a list of members to an rsm_client. This is called by the rsm_client constructor.
rsm_mutex: You should lock this for the duration of every RPC handler. It is ok to keep this locked across RPCs -- in fact, it is necessary to ensure that the slaves are executing the RPCs in lock-step.
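Putting the rsm.{cc,h} pieces together, below is a rough, hedged sketch of the master-side client_invoke and the slave-side invoke. The method names (amimaster, execute, members, remove_member, seqno, incseqno, rsm_mutex) come from the descriptions above, but the argument types, the RPC procedure numbers and status codes, the view member name (vid here), the me field, and the way a timeout is passed to rpcc are assumptions you will need to adapt to the real headers.

// Hedged sketch only; adapt names, types, and error codes to rsm.h,
// rsm_protocol.h, and rpc.h. 'vid' stands for this object's view and
// 'me' for this server's own address.

rsm_client_protocol::status
rsm::client_invoke(int procno, std::string req, std::string &rep)
{
  pthread_mutex_lock(&rsm_mutex);              // held for the whole request
  if (!amimaster()) {
    pthread_mutex_unlock(&rsm_mutex);
    return rsm_client_protocol::NOTMASTER;
  }

  unsigned seq = vid.seqno();                  // number this request
  std::vector<std::string> m = vid.members();
  for (unsigned i = 0; i < m.size(); i++) {
    if (m[i] == me)
      continue;                                // don't invoke on ourselves
    sockaddr_in dst;
    make_sockaddr(m[i].c_str(), &dst);         // hostname -> destination, as in Lab 5
    rpcc cl(dst);
    int r;
    // Give the slave a bounded time to answer (say, 1000 ms) so a dead
    // slave cannot hang the master; the exact timeout mechanism is in rpc.h.
    int ret = cl.call(rsm_protocol::invoke, procno, seq, req, r /*, timeout */);
    if (ret < 0)
      vid.remove_member(m[i]);                 // slave is dead: drop it from the view
  }

  execute(procno, req, rep);                   // run the handler locally, last
  vid.incseqno();
  pthread_mutex_unlock(&rsm_mutex);
  return rsm_client_protocol::OK;
}

// Slave side: execute only if the sequence number is the expected one,
// and refuse if this object believes it is the master.
rsm_protocol::status
rsm::invoke(int procno, unsigned seq, std::string req, int &r)
{
  pthread_mutex_lock(&rsm_mutex);
  if (amimaster() || seq != vid.seqno()) {
    pthread_mutex_unlock(&rsm_mutex);
    return rsm_protocol::ERR;
  }
  std::string rep;
  execute(procno, req, rep);                   // same handler, same order, same result
  vid.incseqno();
  pthread_mutex_unlock(&rsm_mutex);
  return rsm_protocol::OK;
}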
rsm_client.{cc,h}: These files implement the RSM client object. This should be used by client classes that communicate with a replicated server, in place of an rpcc object (in this lab, you'll use it in the lock_client_cache class). It can send RPCs as usual with the call method. You are responsible for filling in the following method:
invoke: We provide the basic invoke call to the RSM server, but you'll need to handle the case where the master fails -- in that case, the RPC call will return a -1, or if you call the wrong destination, a NOTMASTER status code. If this happens, you should try the next replica in the dsts list.
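The failover loop might look roughly like this. The dsts list and the NOTMASTER code are described above, while one_invoke is a hypothetical stand-in for the provided code that sends the invoke RPC to a single destination and returns a negative value on RPC failure; adapt the types to rsm_client.h.

// Hedged sketch of failover in rsm_client::invoke ('one_invoke' is hypothetical).
rsm_client_protocol::status
rsm_client::invoke(int proc, std::string req, std::string &rep)
{
  for (unsigned i = 0; i < dsts.size(); i++) {
    int ret = one_invoke(dsts[i], proc, req, rep);
    if (ret >= 0 && ret != rsm_client_protocol::NOTMASTER)
      return ret;                 // this replica handled the request
    // Otherwise the RPC failed (the master died) or this replica is not
    // the master: fall through and try the next destination in dsts.
  }
  return rsm_client_protocol::NOTMASTER;   // no replica would take the request
}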
view.{cc,h}: The view class helps the RSM object manage the current group of replicas, the master of the group, and the current sequence number for the group. As mentioned earlier, this class's constructor reads the initial list of replicas from a file called "config". Useful methods:
set_master: Sets the master, choosing the replica with the alphabetically smallest hostname.
master: Returns the hostname of the current master.
members: Returns a vector of the hostnames of all the members (including the master). You can turn these hostnames into destinations using the make_sockaddr method, as you did in Lab 5.
remove_member: Removes a replica from the group. Useful when a replica notices another has failed.
seqno: Returns the current sequence number.
incseqno: Increments the sequence number.
rsm_tester.pl: This script runs through a few scenarios of killing lock servers, and ensures that the lock_tester still completes successfully. More on this below.
The new code will overwrite the Makefile and lock_demo.cc files from your previous labs, so you may have to redo any changes in those files.
Our measure of success is surviving failed replicas, including failures of the master. For this lab, you'll need to pass tests 1, 2, and 3 of rsm_tester.pl (as well as making sure all the file system tests from previous labs work). rsm_tester.pl runs four different tests:
% ./rsm_tester.pl 1 2 3
test1: start 3-process replicated state machine, run lock_tester
spawned: 2530 2543 2544
./lock_tester: passed all tests successfully
test2: start 3-process rsm, kill slave while tester is running
spawned: 3244 3257 3261
kill: 3257
./lock_tester: passed all tests successfully
test3: start 3-process rsm, kill master while tester is running
spawned: 3899 3912 3916
kill: 3899
./lock_tester: passed all tests successfully
tests done
%

This runs the first three tests (you can also run the tests individually by specifying a single test number). We will get to the fourth test (and possibly others) in the final lab.
Remember that only the master lock_server process should be communicating directly with the client. Therefore, you should protect any communication from lock_server_cache to lock_client_cache with a call to rsm::amimaster().
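For example, guarded sends might look like the sketch below. The rsm pointer name, the client connection, and the helper are placeholders, and the revoke RPC mirrors your Lab 5 protocol (it may be named differently in your code).

// Hedged sketch: only the master contacts lock clients. Slaves still run
// the handler and update their lock state, but skip the outgoing RPC.
void lock_server_cache::send_revoke(rpcc *cl, lock_protocol::lockid_t lid)
{
  if (!rsmobj->amimaster())      // rsmobj: however your lock server holds its rsm*
    return;
  int r;
  cl->call(rlock_protocol::revoke, lid, r);    // as in Lab 5
}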
Once this works, you should be able to pass './rsm_tester.pl 1'.
Next, handle the case where the master tries to invoke an RPC on a slave, but the slave has failed. This should cause the master to remove the replica from its list of replicas.
Once this works, you should be able to pass './rsm_tester.pl 1 2'.
We've supplied you with the heartbeat code that allows the slaves to pick a new master if the current master dies. All that's left to do in rsm_client is to fail over to a new master if an RPC invocation times out.
Important: Unfortunately, because the slaves and the master do not execute in perfect lock-step, there may be moments when a slave's state reflects that a revoke or grant RPC has been sent, even though the master has not actually sent it yet. If the master dies before sending that RPC, and one of the slaves becomes the master, the message will never reach the client. The easiest way to fix this problem is to ensure that your lock_client_cache objects retry their acquire RPCs after some timeout whenever an acquire call returns a RETRY status code.
More specifically, in Lab 5, when the lock_client_cache received a RETRY, it was enough to wait on a condition variable forever until it received a grant for that lock. Now, in Lab 7, when the master fails, that grant may never come. So your lock_client_cache should take the RETRY message literally: instead of calling pthread_cond_wait, it should call pthread_cond_timedwait with some small timeout value (say 2 seconds). This call returns either when the condition variable is signalled (as usual) or when the timeout expires, whichever happens first. The return value is ETIMEDOUT if the timeout occurred, and zero if the condition variable was signalled. If a timeout did occur, your code should retry the acquire. Check rpc.cc for an example use of pthread_cond_timedwait.
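Here is one way to structure that wait (a sketch; the mutex, condition variable, and 'granted' flag are placeholders for whatever state your lock_client_cache already keeps):

// Hedged sketch: after receiving RETRY, wait up to ~2 seconds for the grant
// and retry the acquire if the wait times out instead of being signalled.
#include <pthread.h>
#include <time.h>
#include <errno.h>

// Returns true if the grant arrived, false if the caller should resend acquire.
bool wait_for_grant(pthread_mutex_t *m, pthread_cond_t *cv, bool &granted)
{
  struct timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);   // pthread_cond_timedwait takes an absolute time
  deadline.tv_sec += 2;

  pthread_mutex_lock(m);
  while (!granted) {
    int r = pthread_cond_timedwait(cv, m, &deadline);
    if (r == ETIMEDOUT)
      break;                                  // the grant may have died with the old master
  }
  bool ok = granted;
  pthread_mutex_unlock(m);
  return ok;
}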
Once this works, you should be able to pass './rsm_tester.pl 1 2 3' many times in a row.
% cd ~/yfs/l7
% make clean
% rm core*
% rm *log
% cd ..
% tar czvf `whoami`-lab7.tgz l7/

That should produce a file called [your_user_name]-lab7.tgz in your yfs/ directory. Attach that file to an email and send it to the 6.824 submit address.