Hive: Fault Containment for Shared-Memory Multiprocessors
Chapin, Rosenblum, Devine, Lahiri, Teodosiu, Gupta
SOSP 1995

why are we reading this paper?
  FLASH is a big distributed computation system
    in some ways just a bunch of machines on a network
  trying to handle failures in big systems
  relaxed failure model
    strict enough to be useful
    relaxed enough to be reasonable to implement
    i.e. more practical than the Hypervisor scheme
  happens to involve a sequentially consistent memory system

background
  what are the apps like? shared mem, threaded, scientific
  FLASH hardware: grid, per-node memory, cache coherence, directory(?)
  stress that every node's h/w can write all memory (modulo firewall)
  ordinary SMP kernel vs. one kernel per node vs. one kernel per cell
  single system image across cells
    sharing seems to be file-system based
    every node has a disk?
    when I read, the node w/ the disk reads into its local mem,
      and I use that remote buffer
    anonymous memory is copy-on-write, not shared r/w? (5.3)

what are the top-level goals of the entire system?
  a huge shared-memory multiprocessor
  only justified if there's good support for
    shared memory
    and flexible allocation of CPUs
  so they're going to spread computations out over nodes
    a computation's memory
    a computation's threads

what are the key problems?
  nodes fail, making their memory inaccessible
  nodes return bad values for memory reads
  s/w issues wild writes that corrupt other nodes' memory

what properties are they looking for? what is "fault containment"?
  not true fault tolerance / masking
  if a node fails, they are willing to lose the programs/data on that node
  they don't want the problem to spread
  and they'd like policies that make a 1% failure affect only 1% of apps

what are they willing to give up?
  the SMP-style single kernel
  though they hack the cell kernels to present a single system image?

what mechanisms do they propose?
  careful reads (see the careful-reference sketch below)
    point: detect kernel data mangled by a node's failure
    not really protecting against arbitrary failures
    is the point really a crash while updating some kernel data structure?
  firewall hardware (see the firewall sketch below)
    helps protect against wild writes
    where does the firewall hardware sit?
      it guards a memory module against remote writes
    what's in the firewall hardware?
      64 bits per phys mem page, one bit per node
    when does the system set the firewall to allow writes?
      when any CPU has mapped that page
      so it really just protects against app wild writes
  OK, the firewall protects against some wild writes,
    but what about pages a failed node was allowed to write?
    they might have been corrupted before the crash!
  what does Hive do after a cell has crashed to deal w/ wild writes
      to allowed pages? (see the recovery sketch below)
    it detects damaged files
      all files with user-level pages writeable by the failed node?
    and gives I/O errors to processes that had those files open
      and try to use them
      presumably including LD/ST as well as read()/write()
    looks like shared memory only occurs via shared mmap()ed files
  why is the firewall better than VM protection?
    VM is enforced by the writer's potentially faulty h/w and o/s
    the firewall is enforced by the memory's owner
  how do they detect failed cells?

policies in 5.6 (see the placement sketch below):
  try to place a process's pages on few cells
    to minimize the number of nodes whose crash could kill the process
  try to place a file's pages on few cells
    since an entire file is marked bad if a bad cell could write
      even one of its pages

they talk about a memory fault model
  what is the model?

how could you test a system like this? can it contain faults?
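
A minimal sketch of a careful reference, assuming hypothetical
primitives remote_read() (nonzero return on bus error) and
now_usec(); none of these names come from the paper. The idea is
that a kernel access to another cell's memory must bound its wait
(so a dead cell can't hang the reader) and sanity-check what it
gets (so a half-updated structure isn't trusted blindly):

    #include <stdint.h>

    enum cr_status { CR_OK, CR_TIMEOUT, CR_BAD_VALUE };

    /* assumed primitives, not a real API */
    extern int remote_read(uint64_t paddr, uint64_t *out);
    extern uint64_t now_usec(void);

    #define CR_TIMEOUT_USEC 1000

    /* Read one word from a remote cell, giving up after a timeout
     * and rejecting values outside the expected range [lo, hi]. */
    static enum cr_status
    careful_read(uint64_t paddr, uint64_t lo, uint64_t hi, uint64_t *out)
    {
        uint64_t start = now_usec();
        uint64_t v;

        do {
            if (remote_read(paddr, &v) == 0) {
                if (v < lo || v > hi)
                    return CR_BAD_VALUE;   /* mangled kernel data */
                *out = v;
                return CR_OK;
            }
        } while (now_usec() - start < CR_TIMEOUT_USEC);

        return CR_TIMEOUT;                 /* treat the cell as failed */
    }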
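
A minimal sketch of the firewall bookkeeping, assuming a 64-node
machine so each page's state fits in one 64-bit word (the paper's
"64 bits per phys mem page, one bit per node"). In FLASH this check
sits in the memory owner's controller hardware; the C here only
shows the logic, and all names are invented:

    #include <stdint.h>
    #include <stdbool.h>

    #define NPAGES (1 << 20)          /* example physical page count */

    static uint64_t fw_allow[NPAGES]; /* one write-permission bit per node */

    /* The owning cell grants write access when a CPU on `node`
     * maps the page. */
    static void fw_grant(uint32_t page, uint32_t node)
    {
        fw_allow[page] |= (uint64_t)1 << node;
    }

    /* Check applied to every incoming remote write; a rejected write
     * never reaches memory, however broken the writer's VM or o/s. */
    static bool fw_write_ok(uint32_t page, uint32_t writer_node)
    {
        return (fw_allow[page] >> writer_node) & 1;
    }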
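
A minimal sketch of the post-crash cleanup, with invented data
structures: any page the dead cell's firewall bit allowed it to
write is suspect, and the whole backing file is poisoned (file
granularity, not page granularity), so later uses get I/O errors:

    #include <stdint.h>
    #include <stdbool.h>

    struct file {
        bool damaged;          /* set => subsequent access returns EIO */
    };

    struct page {
        uint64_t fw_allow;     /* firewall bits: which nodes may write */
        struct file *backing;  /* file this page caches, if any */
    };

    /* After cell `dead` crashes, any page it could have written may
     * have been corrupted before the crash; mark the containing file. */
    static void recover_from_cell_crash(struct page *pages,
                                        uint32_t npages, uint32_t dead)
    {
        for (uint32_t i = 0; i < npages; i++)
            if (((pages[i].fw_allow >> dead) & 1) && pages[i].backing)
                pages[i].backing->damaged = true;
    }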
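
A minimal sketch of one plausible reading of the 5.6 placement
policy (the greedy heuristic is my invention, not the paper's
algorithm): when a process needs another page, prefer a cell it
already uses heavily, so the set of cells whose failure can kill
it stays small:

    #include <stdint.h>

    #define NCELLS 16

    struct proc {
        uint32_t pages_on_cell[NCELLS]; /* this proc's pages per cell */
    };

    extern int cell_has_free_page(uint32_t cell); /* assumed allocator query */

    /* Pick the cell this process already uses most that still has
     * room; returns -1 if no cell has a free page. */
    static int place_page(struct proc *p)
    {
        int best = -1;
        for (uint32_t c = 0; c < NCELLS; c++) {
            if (!cell_has_free_page(c))
                continue;
            if (best < 0 ||
                p->pages_on_cell[c] > p->pages_on_cell[(uint32_t)best])
                best = (int)c;
        }
        if (best >= 0)
            p->pages_on_cell[best]++;
        return best;
    }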