6.824 - Fall 2007

6.824 Lab 1: Lock Server

Last change: $Date: 2007/09/05 23:04:44 $

Due: Friday, September 14, 2007 at 5:00 pm


Introduction

This is the first in a sequence of labs in which you'll build a multi-server file system called Yet-Another File System (yfs) in the spirit of Frangipani. In this lab you'll supply the logic for a lock server that you'll use in subsequent labs to keep multiple file servers consistent.

At the end of all the labs, your file server architecture will look like this:

----------------     -------------
|              |     |           |
|   App   yfs--|-----|extent srvr|----- yfs on other hosts
|    |    |    |  |  |           |  |
|--------------|  |  -------------  |
|    |    |    |  |  -------------  |
|    Kernel    |  |  |           |  |
|  FUSE module |  ---| lock srvr |---
|              |     |           |
----------------     -------------
You'll write a file server process, labeled yfs above, using the FUSE toolkit. Each client host will run a copy of yfs. yfs will appear to local applications on the same machine by registering via FUSE to receive file system events from the operating system. yfs will store all the file system data on an extent server on the network, instead of on a local disk. yfs servers on multiple client hosts can share the file system by sharing a single extent server.

This architecture is appealing because (in principle) it shouldn't slow down very much as you add client hosts. Most of the complexity is in the per-client yfs program, so new clients make use of their own CPUs rather than competing with existing clients for the server's CPU. The extent server is shared, but hopefully it's simple and fast enough to handle a large number of clients. In contrast, a conventional NFS server is pretty complex (it has a complete file system implementation) so it's more likely to be a bottleneck when shared by many NFS clients. The only fly in the ointment is that the yfs servers need a locking protocol to avoid inconsistent updates. In this lab you'll implement the core logic of a lock server.

Getting Started

For this first lab, you should be able to use any SunOS or Linux Athena machine, as well as your own Linux/BSD/MacOS laptops or desktops. First, create a directory for your 6.824 labs (in our example, we'll call it "yfs"), and download and un-tar the lab RPC code from http://pdos.csail.mit.edu/6.824/labs/yfs-rpc.tgz using wget.

% mkdir yfs
% cd yfs
% wget -nc http://pdos.csail.mit.edu/6.824/labs/yfs-rpc.tgz
% tar xzvf yfs-rpc.tgz

We provide you with a skeleton RPC-based lock server, a lock client interface, a sample application that uses the lock client interface, and a tester. You should fetch this code from http://pdos.csail.mit.edu/6.824/labs/yfs-lab1.tgz to your machine using wget.

% wget -nc http://pdos.csail.mit.edu/6.824/labs/yfs-lab1.tgz
% tar xzvf yfs-lab1.tgz
% cd l1
% make

Now start up the lock server, giving it a port number on which to listen for RPC requests. You'll need to choose a UDP port number that other programs aren't using. For example:

% ./lock_server 3772

Now open a second terminal on the same machine and run lock_demo, giving it the port number on which the server is listening:

% cd ~/yfs/l1
% ./lock_demo 3772
stat request
stat returned 0
% 

lock_demo asks the server for the number of times a given lock has been acquired, using the stat RPC that we have provided. In the skeleton code, this will always return 0.

The skeleton code for the lock client does not do anything yet for the acquire and release operations; similarly, the lock server does not implement any form of lock granting or releasing. It will be up to you not only to fill in the client and server functionality, but also to implement the RPC protocol between the two processes. You should study the given stat protocol to see a working example of an RPC within our system.
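To see what this protocol layer amounts to, here is a minimal sketch of the kind of definitions lock_protocol.h contains. The particular procedure numbers and status names below are illustrative assumptions; defer to the file you were given:

// Sketch in the style of lock_protocol.h (illustrative, not authoritative).
class lock_protocol {
 public:
  // Status codes that RPC handlers can return.
  enum xxstatus { OK, RETRY, RPCERR, NOENT, IOERR };
  typedef int status;
  // Each RPC gets a unique procedure number; the RPC system uses it
  // to dispatch an incoming request to the right handler.
  enum rpc_numbers {
    acquire = 0x7001,
    release,
    stat
  };
};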

We have supplied you with a program lock_tester that tests whether the server grants each lock just once at any given time, under a variety of conditions. You run lock_tester with the same arguments as lock_demo. A successful run of lock_tester (with a correct lock server) will look like this:

% ./lock_tester 3772
simple lock client
acquire a release a acquire a release a
acquire a acquire b release b release a
test2: client 0 acquire a release a
test2: client 2 acquire a release a
. . .
./lock_tester: passed all tests successfully
If your lock server isn't correct, lock_tester will print an error message. For example, if lock_tester complains "error: server granted 61 twice!", the problem is probably that lock_tester sent two simultaneous requests for the same lock, and the server granted the lock twice (once for each request). A correct server would have sent one grant, waited for a release, and only then sent a second grant. The 61 is the lock name "a" in hex.

Your lock client and server must also be able to handle lossy networks, where both RPC requests and replies can be lost or duplicated. We will test your program using lock_tester over channels with a loss rate of up to 5%. To test your code against such loss rates, you can set the RPC_LOSSY environment variable to the percentage of packets that will be dropped or duplicated. For example, to test your code using Test 2 of lock_tester under a 5% loss rate, do the following under the bash shell:

% export RPC_LOSSY=5
% ./lock_server 3772 &
% ./lock_tester 3772 2
. . .
./lock_tester: passed all tests successfully
% fg
[Use Ctrl-C to kill the lock_server]

If you are using the tcsh shell, use 'setenv RPC_LOSSY 5' instead of the export command above. Note that it will take a bit longer to run through the tests using lossy channels -- this is because the RPC library must time out and retransmit the RPCs.

Requirements

Your job is to modify lock_smain.cc, lock_server.cc, lock_server.h, and lock_client.cc so that they implement a correct lock server. Intuitively, correctness means that the lock server grants a given lock to at most one requester at a time. More precisely, consider the sequence of grant replies leaving the server and release RPCs arriving at the server for a particular lock: grants and releases must strictly alternate in that sequence, so there should never be two grants for the same lock that are not separated by a release.

Your lock server must pass the lock_tester tests; run the tester several times in a row to make sure you aren't missing any rare (for example, timing-dependent) bugs. We will test your code from your l1/ directory with our own copy of the rpc directory and our own lock_tester.cc. You are free to add new files to the directory as long as the Makefile compiles them appropriately, but you should not need to.

About the Lock Server

The lock server's RPC messages are defined in lock_protocol.h. We have defined RPC numbers for acquire and release operations; you are not required to use them, but doing so might make your life easier. Either way, you must register handlers for these RPCs in lock_smain.cc, which tells the RPC system to listen for particular RPCs and forward them to the lock server. Any additional RPCs you wish to define must be added both to lock_protocol.h and to lock_smain.cc.
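For illustration, registration might look like the following sketch; the reg method and its argument order are assumptions modeled on how the skeleton registers stat, so copy whatever pattern you actually see in lock_smain.cc:

// In lock_smain.cc, after creating the RPC server and the lock_server:
rpcs server(atoi(argv[1]));   // listen for RPCs on the given UDP port
lock_server ls;

// Dispatch each procedure number to the matching lock_server method.
server.reg(lock_protocol::stat,    &ls, &lock_server::stat);
server.reg(lock_protocol::acquire, &ls, &lock_server::acquire);
server.reg(lock_protocol::release, &ls, &lock_server::release);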

The lock server can manage many distinct locks. Each lock has a name; a name is a string of bytes. Names need not be printable ASCII, which is why the lock_tester prints lock names in hex rather than ASCII. The set of locks is open-ended: if a client asks for a lock that the server has never seen before, the server should create the lock and grant it to the client.

lock_client.cc and lock_client.h implement a client-side interface to the lock server. The interface provides acquire() and release() functions, and must take care of sending and receiving RPCs. See lock_demo.cc for an example of how an application uses the interface.

Different modules in a single application might ask for the same lock at roughly the same time. It is fine for lock_client.cc to handle this by simply sending two separate acquire RPCs to the server; when one module is done with the lock, lock_client.cc releases it back to the server, which can then grant the second outstanding request.

Important note: The RPC service used by the lock server makes no guarantee that every RPC will be delivered exactly once. In particular, some RPCs might be delivered to the server more than once. For example, if a client tries to acquire a lock, but the server blocks or is too busy to service the request right away, the RPC system might re-send the same RPC, causing the lock server to service it twice (though only a single reply will be delivered to the client). Your lock client and server must be able to deal with this situation reliably; the lock_tester will fail if they do not.
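One possible way to cope, sketched below, is to make the bookkeeping idempotent: record which client holds each lock, so that a re-delivered acquire from the current holder simply gets the same OK answer again, and a stale or duplicated release is ignored. This sketch assumes the acquire and release RPCs carry a nonzero client id, which is a design choice on your part, not something the skeleton provides; mutexes and blocking are omitted here (see the Tips section):

#include <map>
#include <string>

// holder maps each lock name to the id of the client holding it,
// or 0 if the lock is free.
std::map<std::string, int> holder;

// Try to grant the lock to client id; returns true on success.
bool grant(const std::string &name, int id)
{
  if (holder[name] == id)   // re-delivered acquire from the holder:
    return true;            // repeat the original OK reply
  if (holder[name] != 0)    // held by someone else: caller must wait
    return false;
  holder[name] = id;        // free: grant it
  return true;
}

// Release the lock, ignoring duplicated or stale release requests.
void release(const std::string &name, int id)
{
  if (holder[name] == id)
    holder[name] = 0;
}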

For this lab, you will not have to worry about complete network failures, or security concerns such as malicious clients releasing locks that they don't hold.

Tips

Coding plan

You'll need to modify class lock_server in lock_server.h and lock_server.cc to accept additional RPCs from the lock client, and to keep track of which locks are currently granted and which clients are waiting for locks. There is just one instance of class lock_server in the lock server. You'll also need to register RPCs in lock_smain.cc.
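Here is a minimal sketch of what that state and a blocking acquire handler could look like, assuming one global mutex and condition variable; the member names, the use of std::map, and the handler signatures (modeled on the provided stat handler) are our assumptions, and duplicate-RPC tolerance is omitted. Blocking inside the handler is fine because, as noted below, the RPC system runs each incoming RPC in its own thread:

#include <map>
#include <string>
#include <pthread.h>

class lock_server {
  std::map<std::string, bool> held_;   // lock name -> currently granted?
  pthread_mutex_t m_;                  // protects held_
  pthread_cond_t  free_;               // signaled whenever a lock is released
 public:
  lock_server() {
    pthread_mutex_init(&m_, NULL);
    pthread_cond_init(&free_, NULL);
  }
  int acquire(std::string name, int &r) {
    pthread_mutex_lock(&m_);
    while (held_[name])                 // someone else holds it: wait
      pthread_cond_wait(&free_, &m_);   // atomically releases m_ while asleep
    held_[name] = true;                 // grant the lock
    pthread_mutex_unlock(&m_);
    r = 0;
    return 0;                           // e.g., lock_protocol::OK
  }
  int release(std::string name, int &r) {
    pthread_mutex_lock(&m_);
    held_[name] = false;
    pthread_cond_broadcast(&free_);     // wake all waiters; they re-check held_
    pthread_mutex_unlock(&m_);
    r = 0;
    return 0;
  }
};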

You'll also need to modify class lock_client by filling in the interface implementation found in lock_client.cc. You'll need to modify the lock_client::acquire() function in lock_client.cc to ask the server for a lock, and not return to the caller until that lock is granted (or perhaps until the lock server or RPC system returns an error). Similarly, you'll need to modify the lock_client::release() function in lock_client.cc to tell the server that the lock should be released.
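If the server does the blocking, very little client-side logic is needed: acquire just sends the RPC and does not return until the server replies. A hedged sketch, assuming cl is the skeleton's RPC client object and that cl->call takes the procedure number, the arguments, and a reply slot the way the provided stat code does:

lock_protocol::status
lock_client::acquire(std::string name)
{
  int r;
  // The server holds this RPC until the lock is free, so this call
  // blocks until the lock has been granted to us.
  return cl->call(lock_protocol::acquire, name, r);
}

lock_protocol::status
lock_client::release(std::string name)
{
  int r;
  return cl->call(lock_protocol::release, name, r);
}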

To give you a feel for how much work should be involved, our solution adds about 120 total lines of code to lock_server.cc, lock_client.cc and lock_smain.cc.

pthreads

This series of labs is built around the POSIX threads (pthreads) programming model. The pthreads package allows you to run different tasks concurrently within a process, lock access to shared data during critical sections, and communicate between threads using shared data. A comprehensive guide to programming with pthreads can be found here: http://www.llnl.gov/computing/tutorials/pthreads/.

The RPC system in the labs launches a new thread for each RPC received by the server. Similarly, the lock_tester code launches several concurrent threads during the course of its testing. Thus, if you have data in either the lock_server or lock_client classes that might be accessed by multiple threads, you need to protect access to that data using mutexes. pthreads provides a type, pthread_mutex_t, that you can use to lock these variables during access. The important functions you'll need are pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock. If a thread tries to lock a locked mutex, the lock call will not return until the original locker unlocks the mutex (note that a pthread mutex must be unlocked by the same thread that locked it). See the pthreads tutorial linked above for more details, and look at the example use of count_mutex in lock_tester.cc.
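For example, a minimal standalone sketch (not taken from the lab code):

#include <pthread.h>

int counter = 0;                        // shared by several threads
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

void increment()
{
  pthread_mutex_lock(&counter_mutex);   // enter the critical section
  counter++;                            // safe: no other thread can be here
  pthread_mutex_unlock(&counter_mutex); // leave the critical section
}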

You might also find that you need to have one or more threads wait for something to happen in a different thread. For this purpose you can use the type pthread_cond_t and the functions pthread_cond_init, pthread_cond_wait, and pthread_cond_signal. Condition variables go hand-in-hand with mutexes; see the pthreads tutorial above for more details.
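The standard pattern looks like this (a sketch; the names are ours):

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
bool ready = false;

// Waiting thread: sleep until another thread sets ready.
void wait_for_ready()
{
  pthread_mutex_lock(&m);
  while (!ready)                 // re-check in a loop: wakeups can be spurious
    pthread_cond_wait(&c, &m);   // releases m while asleep, re-acquires on wakeup
  pthread_mutex_unlock(&m);
}

// Signaling thread: change the shared state, then wake the waiter.
void set_ready()
{
  pthread_mutex_lock(&m);
  ready = true;
  pthread_cond_signal(&c);
  pthread_mutex_unlock(&m);
}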

STL

For data management issues, we recommend that you use the Standard Template Library, a collection of C++ classes designed to accomplish many common programming tasks. You will probably find the classes std::string, std::map and std::vector particularly useful during these labs.
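For instance, a std::map keyed by lock name is a natural fit for the server's lock table; this fragment is a sketch, and the value type is whatever state you decide to track:

#include <map>
#include <string>

std::map<std::string, bool> locks;   // lock name -> currently held?

// operator[] inserts a default entry (false, i.e., free) the first time
// a name is seen, which matches the open-ended set of locks.
bool is_held(const std::string &name) { return locks[name]; }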

Debugging

printf statements are always your friend when debugging any kind of problem in your programs. However, there are more advanced debugging techniques available. If your program is crashing, you should consider using gdb to figure out where in the program the crash occurs, and what the memory looked like at the time of the crash. You can use gdb on core files as explained in the Lab Aids section, or you can use gdb on a live program. For example, you can run your lock server under gdb as follows:

% gdb ./lock_server
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...

(gdb) run 3772
Starting program: yfs/l1/lock_server 3772

Then you can set breakpoints, examine memory at any time, etc.

Collaboration Policy

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution (and you're not allowed to look at code from previous years). You may discuss the assignments with other students, but you may not look at or copy each others' code.

Handin procedure

You will need to email your completed code as a gzipped tar file to 6.824-staff@pdos.csail.mit.edu by the deadline at the top of the page. To do this, execute these commands:

% cd ~/yfs
% tar czvf `whoami`-lab1.tgz l1/

That should produce a file called [your_user_name]-lab1.tgz in your yfs/ directory. Attach that file to an email and send it to the 6.824 staff address.
For questions or comments, email 6.824-staff@pdos.csail.mit.edu.