6.824 Spring 2005 Paper Questions

Please remember to print your answer for each day's paper question on a sheet of paper and hand it in at the beginning of class.

Thu Apr 28: Frangipani: A Scalable Distributed File System, SOSP 1997. Suppose a server modifies an i-node, appends the modification to its log, then another server modifies the same i-node, then the first server crashes. The recovery system will see the i-node modification in the crashed server's log, but should not apply that log entry to the i-node, because that would un-do the second server's change. How does Frangipani avoid or cope with this situation?

Tue Apr 26: Recovery Management in QuickSilver, 1988. (You can skip Section 4.3 and Section 5.) Section 4.1 says that the one-phase commit protocol is used by servers that maintain only volatile state. What additional action(s) do servers with non-volatile (persistent) state do that requires them to participate in two-phase commit?

Thu Apr 21: FAB: building distributed enterprise disk arrays from commodity components. (Skip Sections 4.2 and 6.) To perform reads and writes of replicated blocks, FAB uses two types of timestamps, ordTs and valTs, as described in Section 4.1. Can FAB still guarantee strict linearizability without using ordTs? If yes, explain why; otherwise, describe a scenario where this consistency model breaks if FAB only uses valTs during reads and writes. According to strict linearizability, a read of a block must return the latest write to the block.

Wed Apr 13: Optimistic Replication Using Vector Time Pairs. The paper's Section 2.1.4 says that replicas need not keep deletion records. Suppose that replica R1 creates file F, then synchronizes with R2, then R1 deletes its copy of the file and discards the deletion notice, then R2 modifies its copy of the file, and then R1 and R2 synchronize again. What reasoning will R1 use to conclude that there is a conflict?

Thu Apr 8: TreadMarks: Distributed Shared Memory on Standard Workstations and Operating systems, USENIX 1994. Suppose that a simplified version of Treadmarks, called Dreadmarks, simply sent all modifications of variables between an acquire and a release to the next processor to acquire the same lock. Deadmarks doesn't otherwise communicate any changes to memory. Outline a specific simple situation in which Treadmarks would provide more useful or intuitive memory behavior than Dreadmarks.

Thu Apr 6: Ivy, PODC 1986. The write fault handler in Section 3.1 ends by sending a confirmation to the manager, and that the "Write server" code on the manager waits for this confirmation. Suppose you eliminated this confirmation (both the send and the wait) from Ivy. Describe a scenario in which lack of the confirmation would cause Ivy to behave incorrectly. You can assume that the network delivers all messages, and that none of the computers fail.

Thu Mar 31: Hypervisor-based Fault Tolerance, SOSP 1995. Suppose the primary fails due to a power failure, the backup takes over, and then power to the primary is restored. You would like the primary to resume replicated execution so that the system could tolerate a subsequent failure of the backup. Is this likely to be easy with hypervisor-based fault-tolerance? Why or why not?

Tue Mar 15: XOM, 2000. Suppose that every PC had a XOM CPU, and that a vendor of expensive CAD software wanted to use XOM for copy protection. The idea would be that a customer would provide his or her XOM public key to the vendor, which would encrypt the CAD program so that only a XOM CPU with the corresponding private key could read the program. A malicious customer might compute a new public/private key pair and send the public key to the vendor. If the vendor sent back a copy of the CAD program encrypted with this public key, the customer could easily decrypt it, thus breaking the copy protection scheme. How do you suppose the XOM designers intended that the CAD vendor prevent this attack?

Thu Mar 10: No question on OKWS.

Thu Mar 3: Backtracking Intrusions, 2003. What specific purpose does the BackTracker system serve? For example: if an intruder breaks in, modifies a bunch of files and leaves unnoticed, has this system been defeated? Why or why not?

Tue Mar 1: Manageability, availability, and performance in Porcupine: a highly scalable, cluster-based mail service, 1999. Suppose a Porcupine node crashes, then a mail message M1 arrives that should be stored in a fragment on the crashed node, then the user reads and deletes M1, then message M2 arrives that should be stored in the fragment, then the node reboots. The fragment is replicated (which is why the read and delete work). Ideally the node would end up with a copy of M2 but not M1 after the reboot. What specific Porcupine mechanism(s) ensure that the node gets a copy of M2? And what mechanism(s) ensure that it does not end up with a copy of M1?

Thu Feb 26: A Coherent Distributed File Cache with Directory Write-Behind. Read just Sections 1, 2, 3, and 4. Echo's guarantee B on page 6 says that writes to a given object become stable in the same order that they are performed. Does FSD provide ordering semantics at least as strict as this rule? That is, in every situation in which Echo enforces an update order because of this rule, would FSD also write the updates to disk in the same order? If yes, briefly say why; if not, give a counter-example.

Thu Feb 17: Reimplementing the Cedar File System Using Logging and Group Commit, 1987. At the end of Section 4, the paper says that during a one-byte file create FSD writes the leader+data page synchronously to the disk, but records the update to the file name table in memory and only writes it back to disk later. Why do you suppose the FSD designers decided to write the data page synchronously? What (if anything) might go wrong if FSD instead wrote the file's data in the in-memory disk cache, and only wrote it to disk later?

Tue Feb 15: Flash: An efficient and portable Web server, 1999. How would you expect the relative performance of MP, MT and AMPED servers improve if the Flash experiments were run on a multiprocessor machine? (with no modifications to the code)

Thu Feb 10: The Interaction of Architecture and Operating System Design, 1991. We're reading this paper in order to dig into some details of important O/S abstractions, and to talk about how they affect performance. You can skip Sections 3 and 4, but do read Sections 5 and 6. The Question: Table 1 shows that the MIPS R2000 runs applications 4.2 times faster than the CVAX, and that the newer R3000 runs applications 6.7 times faster than the CVAX. What do you suppose is the main reason why the R3000 is faster than the R2000?

Tue Feb 8: A toolkit for user-level file systems , 2001. Refer to the description of the automounter in Section 4.3. Suppose a user level file system has asked the automounter to mount it on /sfs/newfs. The paper says that when an application first accesses /sfs/newfs, the automounter returns a sequence of two symbolic links. The automounter delays the response to the second READLINK, and eventually returns "/sfs/newfs". What would go wrong if the automounter returned only one symbolic link, delayed the response to the first and only READLINK, and then returned "/sfs/newfs"?

Thu Feb 3: Design and Implementation of the Sun Network Filesystem, 1995. NFS file handles are intended be be file identifiers. Would it be reasonable for an NFS server implementation to use a file's name (e.g. "/usr/rtm/webproxy1. C") in the file handle, instead of the inode number? If not, give an example of a serious problem that would result. Ignore the question of whether file handles are big enough to hold long file names.