In this lab you will replicate your lock server using the replicated state machine approach (see Schneider's RSM paper for a good, but non-required, reference). In the replicated state machine approach, one machine is the master; the master receives requests from clients and executes them on all replicas. To ensure that all replicas have identical state, the replicas must execute all requests in the same order and all requests must produce the same result on all replicas (i.e., the handlers must be deterministic). When the master fails, any of the replicas can take over its job, because they all should have the same state as the failed master. (We'll be discussing an example replicated state machine in Lecture 13.)
The primary task in the lab is building a replicated state machine library (RSM) on top of the existing RPC library for YFS. RSM allows you to plug in any RPC program you have written so far and replicate it as a replicated state machine. To ensure the appropriate behavior, however, there are certain constraints on the RPC handlers. Most importantly, the RPC handlers must run to completion without blocking, and be deterministic. These constraints will ensure that all replicas execute all requests in the same order and with the same result.
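For instance (an illustrative sketch only, not code from the lab; the acquire signature here just mimics a Lab 5-style handler), a handler that consults the local clock would compute a different result on each replica and so break replication:

// Illustrative only: a non-deterministic handler. Each replica's clock
// differs, so the replicas' lock state would diverge.
#include <ctime>
#include <string>
#include "lock_protocol.h"
#include "lock_server_cache.h"

lock_protocol::status
lock_server_cache::acquire(lock_protocol::lockid_t lid, std::string id, int &r)
{
  r = time(NULL) + 30;        // BAD: result depends on the local clock
  return lock_protocol::OK;
}

A deterministic handler computes its result only from its arguments and the replicated state, and returns without blocking on other RPCs or condition variables.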
Once you have built the replicated state machine library, you will plug in the caching lock server you built in Lab 5. This lab should make clear why we asked you to write the caching lock server so that the server has no blocking handlers.
To test your implementation you will eventually need to re-run the tests from the previous labs, but start with lock_tester alone: first run your replicated lock server under lock_tester with no failing lock servers. Once that works, test by killing some of the lock servers (using the kill command or Control-C) while the test is running; we provide a testing script, rsm_tester.pl, to make this easier. Even when servers fail, lock_tester should complete successfully. Note that there is no requirement to run with RPC_LOSSY in this lab, though Lab 8 may require it. You also need to ensure that the file system tests for Labs 2, 3, and 4 still complete successfully when the lock_server runs as a replicated state machine.
In this lab you will not start new lock servers after a failure or after the system has been running for a while. Handling these cases requires more sophisticated protocols---for example, we need to ensure that all slaves agree on a single master despite network failures, and we need to transfer state to bring new members up to speed. These more sophisticated protocols will be the topic of the final lab.
The RSM library has a client object and a server object.
The RSM client takes in its constructor the address of the master, and immediately contacts the master to learn the addresses of all the other replicas. A program can invoke a call method on the client object (just as if it were an RPC client), which marshals the request. The RSM client sends this request as an argument in another RPC (invoke) to the master. If this RPC fails or times out, the client contacts the replicas one by one until it finds one that accepts the request as master. If a replica receives an invoke RPC and it isn't the master, it refuses to execute the client request; the client then automatically fails over to the next replica until it reaches the master.
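For example, a lock client might issue an acquire through the RSM client as in the sketch below. This is a minimal sketch: the rsm_client constructor argument and its call method are assumed to mirror rpcc from the earlier labs, and the acquire argument list is the Lab 5 one; check rsm_client.h for the real signatures.

// Hedged sketch of client-side usage (names and signatures are assumptions).
#include <string>
#include "rsm_client.h"
#include "lock_protocol.h"

void example_acquire(std::string master_addr, lock_protocol::lockid_t lid,
                     std::string myid)
{
  rsm_client *cl = new rsm_client(master_addr);  // contacts the master for the member list
  int r;
  lock_protocol::status ret = cl->call(lock_protocol::acquire, lid, myid, r);
  if (ret == lock_protocol::RETRY) {
    // wait (with a timeout; see the discussion of pthread_cond_timedwait
    // below) and then resend the acquire
  }
  delete cl;
}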
A server can create an instance of the library by creating an RSM object. The constructor takes as arguments the address of the current master and the address of this server. The first RSM object created typically is the master (i.e., the address of the master is the same as the address of this server). Each RSM object has a view object that tracks the current set of replicas, and the current sequence number for the group. For this lab, the view constructor simply reads in a static list of replicas from a file called "config"; we will improve this in the final lab. Once the RSM object is created, a server will typically register some RPC handlers with the RSM (instead of registering them with an rpcs object).
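Concretely, server startup might look something like the sketch below. The rsm constructor arguments follow the description above, but the exact reg template and the lock_server_cache constructor are assumptions to check against rsm.h and your own Lab 5 code.

// Hedged sketch of server-side setup; reg() is assumed to mirror rpcs::reg.
#include <unistd.h>
#include "rsm.h"
#include "lock_protocol.h"
#include "lock_server_cache.h"

int main(int argc, char *argv[])
{
  // argv[1]: address of the current master, argv[2]: this server's address.
  // If they are equal, this server starts out as the master.
  rsm rsm_srv(argv[1], argv[2]);
  lock_server_cache ls(&rsm_srv);   // assumes the lock server keeps an rsm pointer

  // Register handlers with the rsm object instead of with an rpcs object.
  rsm_srv.reg(lock_protocol::acquire, &ls, &lock_server_cache::acquire);
  rsm_srv.reg(lock_protocol::release, &ls, &lock_server_cache::release);

  while (1)
    sleep(1000);
  return 0;
}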
When the master gets a new request from an RSM client, it assigns the request a sequence number, and issues an invoke RPC on each replica. The replicas unmarshall the request, and invoke the handler registered for the RPC number, supplying an argument and results structure.
Begin by initializing your Lab 7 directory with your implementation from Lab 6 and then updating it with some new infrastructure code in http://pdos.csail.mit.edu/6.824/labs/yfs-lab7.tgz for Lab 7.
% wget -nc http://pdos.csail.mit.edu/6.824/labs/yfs-lab7.tgz
% rsync -av l6/* l7
% tar xzvf yfs-lab7.tgz
This will add these new files to your l7/ directory:
rsm_protocol.h: This file defines two different RPC protocols. rsm_client_protocol is used between the rsm_client and rsm objects, while rsm_protocol is used between the rsm master object and its slaves.
rsm.{cc,h}: These files implement the RSM server object. This should be used in place of an rpcs object for the server class you'd like to replicate (lock_server, in this lab). It handles registering RPC handlers as usual with the reg method, and so should require minimal changes to the lock_server. You are responsible for filling in the following methods:
client_invoke: This is called by a client to invoke a new RPC on the master. If this RSM object is not the master, it should reply with the NOTMASTER status. If it is the master, it should assign the RPC a new sequence number, and send an invoke RPC to all its slaves. Note that serial execution on the slaves is acceptable. Once all live slaves reply successfully, the master can execute the invoked RPC locally, and then reply to the client. You should supply a timeout to the RPC in case any of the slaves have died (see the heartbeater method for an example use of a timeout -- a reasonable timeout value might be 500 or 1000 milliseconds). If a slave times out (i.e., the invoke call returns a -1), the slave should be removed from the group using the view object. A rough sketch of this flow appears after the rsm_mutex entry below.
invoke: This is invoked on slaves by the master RSM object. Each slave should ensure this is an RPC with the expected sequence number, and if so, execute it locally and reply back to the master. It should also ensure that it is not currently the master; if it is, it should reply with an error.
We provide you with some infrastructure in this class that you can use to simplify your implementation:
execute: This method unmarshalls the RSM representation of a client's RPC and executes it using the registered RPC handler.
heartbeater: This is a thread that continually sends RPCs to the master to ensure it is alive. If the master doesn't respond to a heartbeat, the slave removes it and picks a new master using the view object.
amimaster: This method returns true if this object is currently the master. You can call this method from the lock_server object to decide whether or not to send RPCs to the client -- only the master should be interacting with the lock_client_cache.
client_members: This is an RPC handler that returns a list of members to an rsm_client. This is called by the rsm_client constructor.
rsm_mutex: You should lock this for the duration of every RPC handler. It is ok to keep this locked across RPCs -- in fact, it is necessary to ensure that the slaves are executing the RPCs in lock-step.
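Putting the rsm.{cc,h} pieces together, below is a rough, hedged sketch of the master-side client_invoke and the slave-side invoke. The method names (amimaster, execute, members, remove_member, seqno, incseqno, rsm_mutex) come from the descriptions above, but the argument types, the RPC procedure numbers and status codes, the view member name (vid here), the me field, and the way a timeout is passed to rpcc are assumptions you will need to adapt to the real headers.

// Hedged sketch only; adapt names, types, and error codes to rsm.h,
// rsm_protocol.h, and rpc.h. 'vid' stands for this object's view and
// 'me' for this server's own address.

rsm_client_protocol::status
rsm::client_invoke(int procno, std::string req, std::string &rep)
{
  pthread_mutex_lock(&rsm_mutex);              // held for the whole request
  if (!amimaster()) {
    pthread_mutex_unlock(&rsm_mutex);
    return rsm_client_protocol::NOTMASTER;
  }

  unsigned seq = vid.seqno();                  // number this request
  std::vector<std::string> m = vid.members();
  for (unsigned i = 0; i < m.size(); i++) {
    if (m[i] == me)
      continue;                                // don't invoke on ourselves
    sockaddr_in dst;
    make_sockaddr(m[i].c_str(), &dst);         // hostname -> destination, as in Lab 5
    rpcc cl(dst);
    int r;
    // Give the slave a bounded time to answer (say, 1000 ms) so a dead
    // slave cannot hang the master; the exact timeout mechanism is in rpc.h.
    int ret = cl.call(rsm_protocol::invoke, procno, seq, req, r /*, timeout */);
    if (ret < 0)
      vid.remove_member(m[i]);                 // slave is dead: drop it from the view
  }

  execute(procno, req, rep);                   // run the handler locally, last
  vid.incseqno();
  pthread_mutex_unlock(&rsm_mutex);
  return rsm_client_protocol::OK;
}

// Slave side: execute only if the sequence number is the expected one,
// and refuse if this object believes it is the master.
rsm_protocol::status
rsm::invoke(int procno, unsigned seq, std::string req, int &r)
{
  pthread_mutex_lock(&rsm_mutex);
  if (amimaster() || seq != vid.seqno()) {
    pthread_mutex_unlock(&rsm_mutex);
    return rsm_protocol::ERR;
  }
  std::string rep;
  execute(procno, req, rep);                   // same handler, same order, same result
  vid.incseqno();
  pthread_mutex_unlock(&rsm_mutex);
  return rsm_protocol::OK;
}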
rsm_client.{cc,h}: These files implement the RSM client object. This should be used by client classes that communicate with a replicated server, in place of an rpcc object (in this lab, you'll use it in the lock_client_cache class). It can send RPCs as usual with the call method. You are responsible for filling in the following method:
invoke: We provide the basic invoke call to the RSM server, but you'll need to handle the case where the master fails -- in that case, the RPC call will return a -1, or if you call the wrong destination, a NOTMASTER status code. If this happens, you should try the next replica in the dsts list.
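The failover loop might look roughly like this. The dsts list and the NOTMASTER code are described above, while one_invoke is a hypothetical stand-in for the provided code that sends the invoke RPC to a single destination and returns a negative value on RPC failure; adapt the types to rsm_client.h.

// Hedged sketch of failover in rsm_client::invoke ('one_invoke' is hypothetical).
rsm_client_protocol::status
rsm_client::invoke(int proc, std::string req, std::string &rep)
{
  for (unsigned i = 0; i < dsts.size(); i++) {
    int ret = one_invoke(dsts[i], proc, req, rep);
    if (ret >= 0 && ret != rsm_client_protocol::NOTMASTER)
      return ret;                 // this replica handled the request
    // Otherwise the RPC failed (the master died) or this replica is not
    // the master: fall through and try the next destination in dsts.
  }
  return rsm_client_protocol::NOTMASTER;   // no replica would take the request
}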
view.{cc,h}: The view class helps the RSM object manage the current group of replicas, the master of the group, and the current sequence number for the group. As mentioned earlier, this class's constructor reads the initial list of replicas from a file called "config". Useful methods:
set_master: Sets the master, choosing the replica with the alphabetically smallest hostname.
master: Returns the hostname of the current master.
members: Returns a vector of the hostnames of all the members (including the master). You can turn these hostnames into destinations using the make_sockaddr method, as you did in Lab 5.
remove_member: Removes a replica from the group. Useful when a replica notices another has failed.
seqno: Returns the current sequence number.
incseqno: Increments the sequence number.
rsm_tester.pl: This script runs through a few scenarios of killing lock servers, and ensures that the lock_tester still completes successfully. More on this below.
The new code will overwrite the Makefile and lock_demo.cc files from your previous labs, so you may have to redo any changes in those files.
Our measure of success is surviving failed replicas, including failures of the master. For this lab, you'll need to pass tests 1, 2, and 3 of rsm_tester.pl (as well as making sure all the file system tests from previous labs work). rsm_tester.pl runs four different tests:
% ./rsm_tester.pl 1 2 3
test1: start 3-process replicated state machine, run lock_tester
spawned: 2530 2543 2544
./lock_tester: passed all tests successfully
test2: start 3-process rsm, kill slave while tester is running
spawned: 3244 3257 3261
kill: 3257
./lock_tester: passed all tests successfully
test3: start 3-process rsm, kill master while tester is running
spawned: 3899 3912 3916
kill: 3899
./lock_tester: passed all tests successfully
tests done
%

This runs the first three tests (you can also run the tests individually by specifying a single test number). We will get to the fourth test (and possibly others) in the final lab.
Remember that only the master lock_server process should be communicating directly with the client. Therefore, you should protect any communication from lock_server_cache to lock_client_cache with a call to rsm::amimaster().
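For example, guarded sends might look like the sketch below. The rsm pointer name, the client connection, and the helper are placeholders, and the revoke RPC mirrors your Lab 5 protocol (it may be named differently in your code).

// Hedged sketch: only the master contacts lock clients. Slaves still run
// the handler and update their lock state, but skip the outgoing RPC.
void lock_server_cache::send_revoke(rpcc *cl, lock_protocol::lockid_t lid)
{
  if (!rsmobj->amimaster())      // rsmobj: however your lock server holds its rsm*
    return;
  int r;
  cl->call(rlock_protocol::revoke, lid, r);    // as in Lab 5
}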
Once this works, you should be able to pass './rsm_tester.pl 1'.
Next, handle the case where the master tries to invoke an RPC on a slave, but the slave has failed. This should cause the master to remove the replica from its list of replicas.
Once this works, you should be able to pass './rsm_tester.pl 1 2'.
We've supplied you with the heartbeat code that allows the slaves to pick a new master if the current master dies. All that's left to do in rsm_client is to fail over to a new master if an RPC invocation times out.
Important: Unfortunately, because the slaves and the master do not execute in perfect lock-step, there may be moments when a slave's state reflects that a revoke or grant RPC has been sent, even though the master has not actually sent it yet. If the master dies before sending that RPC, and one of the slaves becomes the master, the message will never reach the client. The easiest way to fix this problem is to ensure that your lock_client_cache objects retry their acquire RPCs after some timeout whenever an acquire call returns a RETRY status code.
More specifically, in Lab 5, when the lock_client_cache received a RETRY, it was enough to wait on a condition variable forever until it received a grant for that lock. Now, in Lab 7, when the master fails, that grant may never come. So your lock_client_cache should take the RETRY message literally: instead of calling pthread_cond_wait, it should call pthread_cond_timedwait with some small timeout value (say 2 seconds). This call returns either when the condition variable is signalled (as usual) or when the timeout expires, whichever happens first. The return value is ETIMEDOUT if the timeout occurred, and zero if the condition variable was signalled. If a timeout did occur, your code should retry the acquire. Check rpc.cc for an example use of pthread_cond_timedwait.
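Here is one way to structure that wait (a sketch; the mutex, condition variable, and 'granted' flag are placeholders for whatever state your lock_client_cache already keeps):

// Hedged sketch: after receiving RETRY, wait up to ~2 seconds for the grant
// and retry the acquire if the wait times out instead of being signalled.
#include <pthread.h>
#include <time.h>
#include <errno.h>

// Returns true if the grant arrived, false if the caller should resend acquire.
bool wait_for_grant(pthread_mutex_t *m, pthread_cond_t *cv, bool &granted)
{
  struct timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);   // pthread_cond_timedwait takes an absolute time
  deadline.tv_sec += 2;

  pthread_mutex_lock(m);
  while (!granted) {
    int r = pthread_cond_timedwait(cv, m, &deadline);
    if (r == ETIMEDOUT)
      break;                                  // the grant may have died with the old master
  }
  bool ok = granted;
  pthread_mutex_unlock(m);
  return ok;
}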
Once this works, you should be able to pass './rsm_tester.pl 1 2 3' many times in a row.
% cd ~/yfs/l7
% make clean
% rm core*
% rm *log
% cd ..
% tar czvf `whoami`-lab7.tgz l7/

That should produce a file called [your_user_name]-lab7.tgz in your yfs/ directory. Attach that file to an email and send it to the 6.824 submit address.