6.824 - Spring 2004

6.824 Lab 5: File Server Part Two

Due: Tuesday March 16th, 1:00pm.


Note

The code for db->remove() in the blockdbc.C that we give you does not interact correctly with caching: it always sends the BLOCK_REMOVE RPC to the block server, even in the case of a newly created block that has never been sent to the server. If you feel like fixing this, the best solution is to modify blockdbc.C to remember whether each block is newly created and has never been sent to the server, and to suppress the BLOCK_REMOVE RPC in that case. It's also fine to simply ignore the status bool in the callback called by remove_fh().
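If you do attempt the fix, the sketch below illustrates the idea; the cache_entry layout and helper names are assumptions for illustration, not the actual blockdbc.C types.

    // Sketch of the fix, not the real blockdbc.C: track whether a block
    // was created locally and never written back; if so, remove() needs
    // no BLOCK_REMOVE RPC.
    #include <map>
    #include <string>

    struct cache_entry {
      std::string data;
      bool dirty;   // modified since last write-back
      bool fresh;   // created locally, never sent to the block server
    };

    void
    cache_remove (std::map<std::string, cache_entry> &cache,
                  const std::string &key)
    {
      std::map<std::string, cache_entry>::iterator it = cache.find (key);
      bool never_at_server = (it != cache.end () && it->second.fresh);
      if (it != cache.end ())
        cache.erase (it);
      if (never_at_server) {
        // The server never saw this block: suppress the RPC and just
        // invoke the remove callback with success.
      } else {
        // The server may hold a copy: send BLOCK_REMOVE as before.
      }
    }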

Introduction

Now it's time to turn ccfs into a distributed NFS server, so that you can mount the same file system on multiple hosts. There are three main parts to this task. First, each ccfs should keep a local cache of blocks to increase its own efficiency and to reduce the load it places on the shared block server. Second, the ccfs servers need to achieve cache consistency: they need to make sure that when they use a block, they use its latest contents. Third, the ccfs servers need to make sure that multi-step operations such as NFS3_CREATE are atomic with respect to concurrent operations on the same directories or files.

We supply you a simple write-back block cache, a shared lock server, and a simple client interface to the lock server. Your job is to modify fs.C to use locks, to modify the lock client to cache released locks and give them back to the server only when other hosts request them, and to achieve cache consistency by ensuring that ccfs only caches data as long as it holds the lock that protects the data.

Your server will be a success if it manages to operate out of its local block and lock cache when reading/writing files and directories that other hosts aren't looking at, but maintains correctness when the same files and directories are concurrently read and updated on multiple hosts.

The Software

Download the Lab 5 starter files from http://pdos.lcs.mit.edu/6.824/labs/fs-lab-2.tgz.
% wget http://pdos.lcs.mit.edu/6.824/labs/fs-lab-2.tgz
% tar xzvf fs-lab-2.tgz
% cd fs-lab-2
% ./setup
% ./configure --with-dmalloc --with-classfs=/u/6.824/classfs-0.0 i386-freebsd
% gmake
At this point you should add your code for NFS3_REMOVE from the previous lab.

To illustrate the problem you're trying to solve, try this:

% ./blockdbd 5567 &
% ./ccfs d1 localhost 5567 &
root file handle: 69f83479b7517cab
% ./ccfs d2 localhost 5567 69f83479b7517cab &
% touch /classfs/d1/fff
% ls -lt /classfs/d2/.
%
Even though /classfs/d1 and /classfs/d2 should be the same underlying file system, a file created in one did not appear in the other. The reason is that this new version of the ccfs software has write-back block caching: when you created fff, the first ccfs did not write the modified blocks back to the block server, but modified them only in its local cache.

Since your file server from the previous lab doesn't have a block cache, it would have shown fff in the d2 directory listing.

The Block Cache

blockdbc.C contains ccfs's client interface to the block server. The block client implements a write-back cache. blockdbc::get() first checks for the desired block in the cache; it only contacts the block server if the block isn't in the cache. blockdbc::put() only writes the new value to the cache; it does not send the new value to the block server.

blockdbc provides two new functions to control the cache: flush() and flush_prefix(). flush(key, cb) writes the indicated block to the server if it is dirty (modified) in the local cache, discards the block from the cache, and calls the call-back when the server replies. flush_prefix(prefix, cb) does the same thing for every cached block whose key starts with the indicated prefix. flush_prefix() could be used to flush a file handle and all associated data blocks in one call.
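The following sketch shows the write-back behavior just described; the types are illustrative and send_put_to_server() is a hypothetical helper, since the real blockdbc.C uses libasync callbacks rather than std::function.

    // Sketch of the write-back policy described above.
    #include <functional>
    #include <map>
    #include <string>

    struct entry { std::string data; bool dirty; };
    static std::map<std::string, entry> cache;

    void send_put_to_server (const std::string &k, const std::string &v,
                             std::function<void ()> cb);   // hypothetical

    void
    put (const std::string &k, const std::string &v)
    {
      cache[k] = entry { v, true };   // dirty in the cache; no RPC sent
    }

    void
    flush (const std::string &k, std::function<void ()> cb)
    {
      std::map<std::string, entry>::iterator it = cache.find (k);
      if (it == cache.end ()) { cb (); return; }   // nothing cached
      if (it->second.dirty) {
        // Write back first; discard and continue when the server replies.
        send_put_to_server (k, it->second.data, [k, cb] () {
          cache.erase (k);
          cb ();
        });
      } else {
        cache.erase (it);   // clean copy: just discard it
        cb ();
      }
    }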

The Lock Server

The block server program provides a logically separate lock server, in lock_server.C. You don't need to modify the lock server, though you will modify the lock client code. The lock server maintains a set of named locks. The lock server does not know what the locks mean or what they are protecting. It just remembers which client (ccfs process) holds each lock, and which client(s) are waiting for each held lock.

A client asks for a lock by sending the lock server an ACQUIRE RPC with the name of the lock as argument. The lock server replies immediately, but the reply means nothing more than that the server received the request. The server adds the client to the list of clients waiting for that lock. When the lock is free, the server sends that client a GRANT RPC with the name of the lock.

When a client is done with a lock, it sends the server a RELEASE RPC. The server may then grant the lock to some other waiting client, or (if no-one is waiting) simply forget about the lock.

To help support client caching of locks, the lock server sends a client a REVOKE RPC if the client is holding a lock that some other client has requested.
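The sketch below captures the server-side bookkeeping described above; it is an illustration of the protocol, not the real lock_server.C, and send_grant()/send_revoke() are hypothetical RPC helpers.

    // Sketch of the lock server's per-lock state ("client" identifies
    // a ccfs process's connection).
    #include <deque>
    #include <map>
    #include <string>

    void send_grant (int client, const std::string &name);    // hypothetical
    void send_revoke (int client, const std::string &name);   // hypothetical

    struct lock_state {
      int holder;                // client currently holding the lock
      std::deque<int> waiters;   // clients waiting, in arrival order
    };

    static std::map<std::string, lock_state> locks;

    // ACQUIRE: the reply is only an acknowledgment; the grant arrives
    // later as a separate GRANT RPC.
    void
    acquire (const std::string &name, int client)
    {
      std::map<std::string, lock_state>::iterator it = locks.find (name);
      if (it == locks.end ()) {
        locks[name].holder = client;
        send_grant (client, name);
      } else {
        it->second.waiters.push_back (client);
        send_revoke (it->second.holder, name);   // ask for it back
      }
    }

    // RELEASE: pass the lock on, or forget it if no one is waiting.
    void
    release (const std::string &name)
    {
      lock_state &l = locks[name];
      if (!l.waiters.empty ()) {
        l.holder = l.waiters.front ();
        l.waiters.pop_front ();
        send_grant (l.holder, name);
      } else {
        locks.erase (name);   // no one waiting: forget the lock entirely
      }
    }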

The Lock Client

When ccfs.C sets up the server, it creates an instance of a lock_client (source in lock_client.C) and gives it to the fs. You can get at the lock_client using the lc element of fs.

fs.C can ask for a lock by calling the lock client's acquire(name, cb) method; the lock client then asks the lock server for the named lock, and calls cb when the server grants the lock. fs.C should call release(name) when it's done with the lock.

As a convenience for cache management, you can supply the lock client with a callback which it calls just before sending each RELEASE RPC to the lock server. This gives you the chance to flush blocks out of the block cache when you give a lock to some other ccfs process. fs.C uses this feature to arrange for fs::flush() to be called whenever a lock is released. The lock client calls fs::flush() with a callback which flush() must call when it's done flushing the block cache; it's this callback that actually sends the RELEASE RPC back to the lock server.
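Put together, an RPC handler might bracket its work roughly as follows. This is a sketch: do_remove() is a hypothetical helper, the handler signature is simplified, and std::function stands in for the lab's libasync callbacks.

    // Sketch: bracketing an NFS RPC with the lock named by the
    // directory's file handle. fs::fh2s() converts a handle to its
    // string form (see the Hints section); do_remove() does the real
    // work, then calls back.
    void do_remove (nfs_fh3 dir, std::string name,
                    std::function<void (bool)> cb);   // hypothetical

    void
    handle_remove (lock_client *lc, nfs_fh3 dir, std::string name)
    {
      std::string lname = fs::fh2s (dir);
      lc->acquire (lname, [=] () {
        // Lock granted: blocks protected by this lock are safe to use.
        do_remove (dir, name, [=] (bool ok) {
          lc->release (lname);   // release before sending the NFS reply
          // ... then marshal and send the NFS reply ...
        });
      });
    }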

Correct Operation

Your goal is to produce a ccfs server that's correct as well as high performance. Correctness has two aspects: atomicity and cache consistency.

Concurrent NFS RPCs should be atomic with respect to each other. Suppose two ccfs processes are serving the same underlying file system (i.e. they share a block server and a root file handle). If you send each of them an NFS RPC at the same time, the final result should be a result that could have occurred if you had sent the two RPCs one at a time.

For example, suppose there's a directory containing the file "f". If you send concurrent NFS3_REMOVE("f") RPCs to the two ccfs servers, then one REMOVE should delete the file from the directory and return a successful reply, and the other REMOVE should return an NFS3ERR_NOENT error and do nothing to the directory. They should not both erase the directory entry and both return a successful result.

A consistent cache is one that always provides the latest value of each block. If ccfs #1 modifies a block, and then ccfs #2 reads the block, then ccfs #2 should see the modifications made by ccfs #1.

Your Job

The ccfs we've provided you does not implement NFS RPCs atomically, and it is not cache-consistent. It is pretty fast, though, since each ccfs operates entirely out of its own local cache without ever talking to the block server. Your job is to add correctness while still operating out of the local cache when that's compatible with cache consistency.

First, use locks to build yourself a server that executes NFS RPCs atomically. You probably want to have one lock per file and directory, using the file handle as the lock's name. Modify each RPC handler to acquire the relevant lock (or locks) at the beginning, and release them just before the handler sends the reply. You might find it convenient to modify get_fh() and put_fh() to acquire() and release() the relevant lock.
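When a handler does need two locks at once, acquiring them in a fixed global order (say, sorted by name) prevents two servers that need the same pair from deadlocking. Here is a sketch, with the callback style again illustrative rather than the lab's libasync interface.

    // Sketch: acquire two locks in a fixed (sorted) order so that two
    // servers needing the same pair can never deadlock; the second
    // acquire starts only after the first is granted.
    #include <algorithm>
    #include <functional>
    #include <string>

    void
    acquire_both (lock_client *lc, std::string a, std::string b,
                  std::function<void ()> cb)
    {
      if (b < a)
        std::swap (a, b);   // fixed global order by lock name
      lc->acquire (a, [=] () {
        lc->acquire (b, cb);
      });
    }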

Second, add cache consistency by flushing blocks when they are no longer protected by locks. You can do this by calling the block client's flush_prefix() method from fs::flush(). You want to maintain some invariants: every block is protected by exactly one lock, and the block is only in a ccfs's cache when the ccfs holds the lock. If you get() blocks only after acquiring the relevant lock, and flush blocks from your cache before the lock client sends the lock back to the lock server, you should be OK.
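A sketch of the resulting fs::flush() follows; the signature is an assumption, but the key point is that the callback handed to flush_prefix() is the one that eventually sends the RELEASE RPC. This assumes, as suggested above, that all of a file's block keys start with its handle's string form, which is also the lock's name.

    // Sketch: called by the lock client just before it returns a lock
    // to the server. Writing back and discarding every block whose key
    // starts with the lock's name preserves the invariant that cached
    // blocks are always covered by a held lock.
    void
    fs::flush (std::string lockname, std::function<void ()> done)
    {
      db->flush_prefix (lockname, done);   // done() sends the RELEASE RPC
    }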

Third, add lock caching (or "lazy release" of locks). You want to cache locks because at this point the block cache isn't doing you much good. You can only cache blocks while you hold the corresponding lock, and you only hold locks for the duration of NFS RPCs. The solution is to modify the lock client to retain each lock after fs.C calls release(), and only send the lock back to the server when the server sends a REVOKE RPC. This means that you can also leave blocks in the client cache until the corresponding lock is REVOKEd. If no other ccfs's are using the same blocks as you, you can then operate entirely out of the block and lock caches, and never talk to the block server.
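One plausible way to structure lazy release inside the lock client is a small per-lock state machine. The states and helpers below are illustrative, not the given lock_client.C; give_back() stands for "call fs::flush(), then send the RELEASE RPC".

    // Sketch of per-lock state for lazy release.
    #include <functional>

    enum lstate { FREE, ACQUIRING, LOCKED, CACHED, RELEASING };

    struct llock {
      lstate state;
      bool revoked;   // the server has asked for this lock back
    };

    // fs.C calls acquire(): a cached lock needs no server traffic.
    void
    local_acquire (llock &l, std::function<void ()> granted)
    {
      if (l.state == CACHED) {
        l.state = LOCKED;
        granted ();
        return;
      }
      l.state = ACQUIRING;
      // otherwise send ACQUIRE; call granted() when the GRANT arrives
    }

    // fs.C calls release(): keep the lock unless the server wants it.
    void
    local_release (llock &l, std::function<void ()> give_back)
    {
      if (l.revoked) {
        l.state = RELEASING;
        give_back ();
      } else {
        l.state = CACHED;   // the next acquire() will need no RPCs
      }
    }

    // REVOKE from the server: return the lock once no handler holds it.
    void
    on_revoke (llock &l, std::function<void ()> give_back)
    {
      l.revoked = true;
      if (l.state == CACHED) {
        l.state = RELEASING;
        give_back ();
      }
      // if LOCKED, local_release() hands it back when fs.C is done
    }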

You're not responsible for anything beyond these requirements, though some of the problems we've left out could make good 6.824 projects.

Testing

We've provided you with two testing programs, one for correctness and one for performance.

test2a tests cache consistency and atomicity. Run two copies of ccfs on the same host, using the same block server and root handle, but different directory names under /classfs. Then run test2a, specifying the two directory names. For example:

% ./blockdbd 5567 &
% ./ccfs dir1 localhost 5567 &
root file handle: 4d1def4be444163c
% ./ccfs dir2 localhost 5567 4d1def4be444163c &
% ./test2a /classfs/dir1 /classfs/dir2
Create then read: OK
Unlink: OK
Append: OK
Readdir: OK
Many sequential creates: OK
Concurrent creates: OK
Concurrent creates of the same file: OK
Concurrent create/delete: OK
Concurrent creates, same file, same server: OK
test2a: Passed all tests.
If test2a says "Passed all tests", then it thinks your server provides both atomicity and cache consistency. test2a works by creating and deleting files in the same directory via the two ccfs's.

Use test2b to test performance. It creates two sub-directories and then creates/deletes files in only one of the two directories through each of the two ccfs servers. If your ccfs caches both blocks and locks, this should be pretty fast. In order to measure performance, we need the block server to print out how many put and get operations the ccfs's perform; you can make it do this with the -v flag. So do this in one window:

% ./blockdbd -v 5567
And this in another window:
% ./ccfs dir1 localhost 5567 &
root file handle: 4d1def4be444163c
% ./ccfs dir2 localhost 5567 4d1def4be444163c &
% ./test2b /classfs/dir1 /classfs/dir2
Create/delete in separate directories: OK
%
The blockdbd should print out lines of numbers, one line for every 100 put or get operations. Each line specifies the total number of puts and gets so far. You're interested in the last line it prints, when test2b finishes. If you've implemented caching of locks and blocks well, the last line should show about 10 puts and about 800 gets. If your ccfs didn't cache anything, blockdbd would print about 1400 puts and 64000 gets; this is roughly the performance your ccfs from Lab 4 would achieve. Good caching can dramatically reduce the load on the shared block server.

Hints

You can test atomicity alone by un-defining BLOCKCACHE at the beginning of blockdbc.C. This will turn off caching, so that all put()s and get()s go to the block server. If your server has implemented atomicity correctly, then you should pass all the test2a tests when you turn off BLOCKCACHE.

Don't use the nfscall or the RPC arguments or reply areas after you send the RPC reply; those objects are deallocated as part of replying. In practice this means you should release locks before replying, since typically the lock will be named by a file handle that's contained in the RPC arguments.

The lock client interface uses str to name locks, but you'll probably be using file handles as lock names. You can convert between the two representations with fs::fh2s() and fs::s2fh().

You can turn on dmalloc debugging to help you find malloc()/free() errors and use of deallocated memory:

% setenv DMALLOC_OPTIONS debug=0x24f41d03,inter=100
% ccfs ...

Collaboration policy

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution (and you're not allowed to look at solutions from previous years). You may discuss the assignments with other students, but you may not look at or copy each other's code.

Handin procedure

You should hand in a gzipped tarball fs-lab-2-handin.tgz produced by gmake handin-2. Copy this file to ~/handin/fs-lab-2-handin.tgz. We will use the first copy of the file that we can find after the deadline.