6.824 Final Project Assignment
Due date for team list: October 11.
Due date for project proposal: October 18.
Due date for progress report: November 13.
Due date for completed project and paper: December 4.
Introduction
In this lab you will define your own project, execute it, and write a
paper about it. The final project is structured in three parts:
- Project proposal. The proposal is a short (at most two pages)
description of what your project will be. It should state what problem
you are solving, why you are solving it, what software you will write,
and what the expected results will be. You won't be judged on your
proposal; it is there to help you to get started.
- Progress report. This should include a draft of your paper's
abstract, introduction, related work, and design sections; and
(separately) a short status report on your software. We expect the
draft paper sections to be in good shape at this point.
- Project paper. Your paper should be patterned after the research
papers we have read in class. It should contain a problem description
and motivation, a review of related work,
a description of the design of your solution, a
description of your implementation, and an evaluation of how well your
system solved the original problem. The paper should be no longer
than ten pages (see below for formatting details). Your project
grade will be based on the quality of your paper.
In addition, on the last day of class we will run a mock program
committee meeting, in which you will evaluate each other's papers and
choose the ones most likely to be accepted at a good conference.
Doing a good project is a daunting task. In general, it is best
to tackle a well-defined small problem and do a good job evaluating it. To
help you to define a project we will offer you some suggestions (see
below). We also expect to be involved in all stages of your project.
Please come talk to us about your project ideas, how you should
execute the project, what you should write about in your final paper,
etc.
The project is to be executed in teams of 3 or 4 students. Find
team-mates and send their names by e-mail to the TA. The email is due
by Tuesday October 11 (that's pretty soon).
Suggestions for projects
You should feel free to propose any project you like, as long as it is
related to operating systems or distributed systems and has a
substantial system-building and evaluation component.
If you are in the PhD program, we expect your proposal to involve
some new idea (i.e. look like research).
We suggest that you base your implementation on the asynchronous
programming libraries you used for the labs. In past years students
have found sfsusrv and their web proxies to be particularly useful
starting points for projects. If your project needs to run on the
client side of a network file system, you may want to use the software
described in Section 6 of the user-level file system toolkit paper;
you can find the software in /home/to2/labs/classfs
and ccfs.tar.gz.
Here is a list of project suggestions. Some of them are more or
less complete ideas; others are starting points from which
to think about project ideas.
- Make a distributed shared memory (DSM) system, so that processes
running on different machines can share an address space. The Shared
virtual memory section of Appel and
Li's paper outlines how to do this, and includes references to
more detailed descriptions. You would need a plan to allow caching
but maintain consistency. You would also want to find at least one
program that could take good advantage of DSM, to help you evaluate
your system.
- Design and implement a disk scheduler that enforces priority. The
point would be to give disk operations that interactive processes are
waiting for priority over (for example) background page-outs,
read-ahead, delayed writes, etc. Perhaps the disk scheduler should also
pay attention to process priorities. You would need to demonstrate
that the scheduler actually improved some aspect of system
performance. The danger is that there is probably a tradeoff between
enforcing priority and scheduling the disk efficiently.
- Add some of the explicit event queuing ideas from SEDA to libasync.
SEDA should allow one to build servers that have not just high
performance, but also the ability to handle overload in a stable
way. You might build a SEDA-like toolkit using libasync, implement a
high performance server (perhaps a web server) using the toolkit, and
demonstrate that it had better behavior under overload than a more
naively designed server.
- Build a service that maintains consistent replicated data. You
could build a general-purpose service (like DDS)
or an application that replicates in a way tailored to that
application's needs (like the Porcupine
mailbox service).
- Make a fast distributed index generator. The input would be a set
of text documents; the output would be, for every word, a sorted list
of all the places the word occurs (i.e. document # / offset pairs).
The input might be hundreds of gigabytes in size. This is one of the
basic operations required to run a Web search engine, and takes on the
order of a week to compute. One place you can learn about index
generation is the book Managing
Gigabytes. You'll also want to check out existing work in this
area, for example NowSort.
- Build a version of Freenet in which each node has an identifier,
and a node preferentially stores/caches files whose IDs are close to
its node ID. This might help concentrate documents on predictable
nodes, and thus make lookups more efficient. It might also make it
less likely that unpopular documents are discarded, since there might
be a more natural notion of a node that ought to store a document. A
significant challenge would be to demonstrate that this idea genuinely
improves Freenet's performance or data availability.
- Implement a system like Network Objects in C++.
- Build a more full-featured version of your Semantic File System
lab. You needn't preserve any of the specifics of the original
semantic file system proposal, just the spirit.
- Design and build a proxy that allows access via SFS to resources
other than files on the server's disk. For example, build an SFS front
end to a database. This would be a useful tool for making Athena
resources such as Hesiod and Moira accessible with a file-system
interface. Access to FTP servers via SFS may also be an interesting
project. In all cases the challenge is to figure out how to provide a
sensible interface to objects that don't act like standard UNIX files.
You may be able to learn from the Plan 9 9P protocol.
- Implement flexible access control lists for SFS. Right now SFS
servers apply UNIX-style access control, and thus every user needs
to have an account (or something much like one) on the SFS server.
Instead, you could imagine controlling access to your files by placing
users' public keys in access control lists. You would get precise
control this way, and you could give access to your files to users who
did not have accounts on the SFS server. In practice, you would not
want to put public keys in the ACLs; instead you would want to be able
to name users and groups of users. You might want to use SDSI/SPKI
to help you map names to public keys.
- Design and build an on-disk file system representation consisting
of just a B-tree. You probably want to modify sfsusrv to make calls to
a B-tree package such as Berkeley DB rather
than (as currently) to the UNIX file system. Your challenges are to
figure out (1) how to make the NFS operations efficient using the
B-tree and (2) how to make crash recovery work well. You can view
this as an elegant simplification of the SGI XFS file system.
- Choose some aspect of NFS that is slow, and re-design the
protocol, client algorithms, and/or server algorithms and disk layout
to make it faster. Implement your design and show that it's a good
idea. For example, any RPC that causes a conventional NFS server to
write the disk tends to be slow, because servers make sure the writes
are on the disk before they send the RPC reply.
NFS3 (here's the spec) relaxes this
restriction for data writes -- the client continues to buffer written
data after the write RPC, allowing the server flexibility to batch and
schedule writes; the client only insists that the server flush the
writes to disk when the client wants to re-use the buffers. Perhaps
similar changes to the protocol could make operations like file create
and delete faster, by having the client log such operations and letting
the server complete them at its convenience. The client would replay
some of the log to the server after a server crash and reboot. The
challenge here is to achieve higher performance while retaining
reasonable behavior after failures.
- Design a disk layout for a file system and implement it in an SFS
server. Make sure your layout and update algorithms have good crash
recovery properties. You may want to look at this somewhat antique
6.033 lab assignment; the goal is the same though the SFS tools are
now different.
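To make the expected output of the index-generator suggestion above concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the output format, not a suggested design: the function name build_index, the choice of character offsets, and the lower-casing of words are all assumptions made for this example, and a real project would have to partition the work across machines and handle inputs far larger than memory.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to a list of (document number, character offset) pairs.

    `documents` is a list of strings; the document number is simply the
    position of the document in that list (an assumption of this sketch).
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        # Treat each run of word characters as a word, ignoring case.
        for match in re.finditer(r"\w+", text.lower()):
            index[match.group()].append((doc_id, match.start()))
    # Documents are scanned in order and offsets increase within a
    # document, so every posting list is already sorted.
    return dict(index)


docs = ["the cat sat", "the dog saw the cat"]
index = build_index(docs)
print(index["the"])  # [(0, 0), (1, 0), (1, 12)]
print(index["cat"])  # [(0, 4), (1, 16)]
```

The interesting part of the project is everything this sketch leaves out: splitting hundreds of gigabytes of input across machines, merging the per-machine posting lists, and doing so quickly.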
Don't worry if some other group plans to work on the same suggestion
as you do -- we can probably find a way for multiple groups to share a
general project area without significant overlap.
You can look here to see the projects from this course last year.
Your Paper
This section provides some suggestions and guidelines on writing style
and some of the things we will look for in your final paper.
Suggestions on Writing Style
Your paper should be as long as is necessary to explain the problem,
your solution, the reasons for your choices, and your analysis of your
solution. It should be no longer than that. The body of your paper must not
exceed ten 11-point, single-spaced pages in length. Please use
1-inch margins. In general, your paper's style and arrangement should
be similar to the papers we've read in class.
A good paper begins with an abstract. The abstract should summarize
what a reader will learn by reading the paper. It should not
be an outline of the organization of the paper. It should describe the
problem to be addressed, the essential points of your solution, and
any conclusions you have drawn. It should be about 150 words long.
The body of your paper should expand the points made in the abstract.
Here you should:
- Introduce the problem and the externally imposed constraints,
and explain why the problem is worth solving.
- State the goals of your solution clearly.
- Describe the design of your solution.
You may wish to divide the description into a high level
architecture and a set of lower-level implementation decisions.
This would be a good place for pictures and diagrams.
- Analyze how well the system you built fulfils your goals.
Depending on your system, the analysis might deal with
performance in the sense of throughput or running time;
but keep in mind that factors such as reliability,
functionality, and usability may be as important as, or more
important than, performance for some systems.
- Briefly review related work in the area of your project.
The goal is to show either how you extended existing work
or how you improved on it.
- Conclude with a review of lessons learned from your work.
- Cite your sources as you mention them in the text of your
paper, and list all references at the end of the paper;
the format and style should be similar to the technical
papers we read in class. When in doubt, cite the source;
use "personal communication" citations if you have to (e.g.
for ideas given to you by fellow students).
Write for an audience that understands basic O/S and network concepts
and has a fair amount of experience applying them in various
situations, but has not thought carefully about the particular problem
you are dealing with.
How do we evaluate your paper?
When evaluating your paper, we will look at both content
and writing.
Some content considerations:
- Do you provide motivation for why the problem you chose is
worthwhile or interesting?
- Does your solution address the goals you stated?
- Do you explain your decisions and the trade-offs?
- How complex is your solution? Simple is better, yet sometimes simple won't
do the job. But unnecessary complexity is bad.
- Does your solution fit well with the rest of the system? If your solution
requires modifying every piece of hardware, software, and data in sight,
it won't be credible, unless you can come up with a very good story why
everything needs to be changed.
- Is your analysis clear?
Some writing considerations:
- Is the report easy to understand?
- Is it well organized and coherent?
- Does it use diagrams where appropriate?
- Is there a good abstract and bibliography?
You can find other helpful suggestions on writing this kind of report in
the M.I.T. Writing Program's on-line guide to writing Design and Feasibility
Reports. You may also want to look at the Mayfield
Handbook's explanation of IEEE documentation style. A very good
book on writing style is: "The Elements of Style," by William Strunk Jr.
and E. B. White, Third Ed., MacMillan Publishing Co., New York, NY, 1979.
What to Hand In
You should e-mail your team list to 6.824-staff@pdos.lcs.mit.edu
by October 11.
Your team should e-mail its proposal to 6.824-staff@pdos.lcs.mit.edu
by October 18. It should be no more than two pages. It should
be ordinary text, not an attachment.
Put a copy of the PostScript file containing your final paper
in ~/handin/final/paper.ps, and a tar file containing
your project source in ~/handin/final/source.tar.
Do this by December 4th. Your
project grade will be based on the paper, not on the source.
Make sure you save enough time to write a good paper, since that's
what will determine your grade!