Required reading: KeyKOS, Confused deputy.
Reference reading: KeyKOS details.
The microkernel vision
  implement main OS abstractions as user-space servers
    files, TCP, process mgmt, paging
  kernel provides the minimum needed to support servers
    address spaces
    threads
    inter-process communication (IPC)
      some IPC handle which can send/receive messages
  why attractive?  (compared to monolithic UNIX-style design)
    isolation (vs bugs)
    fault-tolerant (restart individual services)
    modular (understandable, replaceable services)
    easier to distribute (IPC allows services on other hosts)
    easier to run on a multiprocessor (since many servers)
    easier to provide scheduling, priority (at least at the kernel API)
    easier to provide security (at least at the kernel API)
  history
    individual ideas around since the beginning
    lots of research projects starting in the early 1980s
    hit the big time w/ CMU's Mach in 1986
    thought to be too slow in the early 1990s
    now slowly returning (OSX, QNX in embedded systems/routers, etc)
    ideas very influential on non-microkernels

Microkernels in practice
  huge issue: Unix compatibility
    critical to widespread adoption
    difficult: Unix was not designed in a modular fashion
  Mach, L4: one big Unix server
    not a huge practical difference from a single Linux kernel
    Mach in particular was quite slow; many decided microkernels were a bad idea
  KeyKOS: more interesting structure
    split up Unix into many entities
    first, some background on what problems KeyKOS was trying to solve
Traditional access control model:
  process has some privileges (in the Unix model: a UID)
  for each syscall, the kernel checks that the process's privileges allow it
    (e.g. opening a file)
  what problem was Norm Hardy facing on such a system?
    wanted to give the Fortran compiler access to a special file (/sysx/stat)
      to collect statistics
    gave the Fortran compiler (/sysx/fort) a "home files license"
      kind of like setuid on Unix; allowed the process to write to /sysx
    user executed "/sysx/fort code.f -o /sysx/bill"
      wiped out an unrelated billing file
  whose problem is this?
    paper argues all components were working fine in isolation
    the problem was giving extra privileges to the compiler
      -> every file access made by the compiler was now more privileged,
         even though only one of them (opening /sysx/stat) should have been!
    doing this correctly would have required looking at every place
    where the compiler opens a file
  proposal: explicitly specify the privileges to use for every operation
    why might this be a good idea?
      easier to write secure, privileged programs
      program will not grant its privileges to things done on the user's behalf
    what are the limitations?
      somewhat invasive design implications
      have to pass around capabilities instead of user-meaningful names
      the notion of a user identity is at odds with a pure-capability design

Capabilities used in many settings
  hardware: Cambridge CAP system, x86 segments (in a way)
  OS kernel: Hydra, KeyKOS (and its successors)
  distributed systems: Amoeba
    self-authenticating capabilities don't need to be tracked by the OS
    kernel the same way we'd track FDs
  capability-like things appear in many places
    Unix FDs
    Java object references (and type-safe languages in general)
    URLs (that include authority info, rather than using HTTP auth)
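To make the deputy concrete, here is a minimal Python sketch (all names and the dict-based "filesystem" are invented for this model): under ambient authority, every open the compiler performs carries its home-files license; under explicit capabilities, the compiler's extra privilege is bound to the one operation that needs it.

```python
# Toy model of the confused deputy.  "Files" are dict entries; the
# "home files license" is ambient authority to write anywhere under /sysx.

files = {"/sysx/stat": "", "/sysx/bill": "important billing data"}

def open_for_write(path, privileged):
    # Traditional model: the kernel checks the *process's* privileges.
    if path.startswith("/sysx/") and not privileged:
        raise PermissionError(path)
    return path  # stand-in for a writable handle

def compile_ambient(src, out_path):
    # The compiler runs with the home-files license (privileged=True), so
    # the user-supplied output path is opened with the compiler's authority.
    stat = open_for_write("/sysx/stat", privileged=True)
    out = open_for_write(out_path, privileged=True)      # the bug
    files[out] = "object code"
    files[stat] += "compiled\n"

compile_ambient("code.f", "/sysx/bill")
assert files["/sysx/bill"] == "object code"   # billing file wiped out

# Capability model: the caller passes a capability for the output file;
# the compiler's extra privilege lives only in its own /sysx/stat
# capability, so its authority cannot leak to the output operation.
class Cap:
    def __init__(self, path):
        self.path = path

STAT_CAP = Cap("/sysx/stat")   # held by the compiler, not the caller

def compile_caps(src, out_cap):
    files[out_cap.path] = "object code"
    files[STAT_CAP.path] += "compiled\n"

# The user holds no capability for /sysx/bill, so it can only pass
# capabilities for files it may legitimately write:
compile_caps("code.f", Cap("/home/user/a.out"))
```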
KeyKOS: capabilities as the base access control mechanism
  more structure than Mach
  very small kernel, provides a few kinds of objects to applications:
    devices - access to underlying hardware, given to device-driver processes
    pages - 4KB of data, basically a page of memory
    nodes - a block of 16 keys (key is their term for a capability)
    segments - a virtual address space
      like a page table or page directory
      implemented using nodes, mapping to page keys at the leaves
      can construct segments out of other segments
    meters - CPU time allocation (even CPU time is explicitly allocated!)
    domains - something like a Unix process
  domains are the most interesting object:
    16 general-purpose key slots (kind of like capability registers)
      effectively an implicit node object
    address slot: key to the entire virtual memory of the process
    meter slot: key to the CPU meter for the process
    keeper slot: key to another domain that will handle exceptions
  objects are named by keys
    a key is actually a 12-byte blob, but applications cannot handle
    its bytes directly
    instead, keys are manipulated through explicit operations,
    kind of like FDs (open, close, dup)
    the KeyBits service returns the actual 12 bytes behind any key given to it
      unknown: can you supply a virtualized KeyBits to hide the fact
      that other keys are being virtualized as well?
  what API does the kernel provide?
    at a low level, just 3 system calls:
      void FORK(key k, msg m): send m to k's domain,
        continue running afterwards
      msg *CALL(key k, msg m): send m to k's domain
        (+ a newly-formed resume key for the sender) and suspend
      msg *RETURN(key k, msg m): send m to k (will be returned from
        its CALL) and wait for the next message
    why does the kernel require the receiver to be available?
      what's the alternative?
    why does RETURN wait for a new message?  what if we don't want to wait?
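A toy model of the three calls (Python; Domain and ResumeKey are invented stand-ins -- real keys are opaque 12-byte kernel objects, and a real CALL suspends the caller rather than running the receiver inline):

```python
# Sketch of KeyKOS's FORK/CALL/RETURN over toy "domains".

class Domain:
    def __init__(self, handler):
        self.handler = handler   # fn(msg, resume_key) -> None

class ResumeKey:
    # One-shot key minted by CALL; RETURN through it wakes the caller.
    def __init__(self):
        self.reply = None
        self.used = False

def FORK(key, msg):
    # Send and keep running: no resume key, no reply expected.
    key.handler(msg, None)

def CALL(key, msg):
    # Send msg plus a fresh resume key, then suspend until RETURN.
    rk = ResumeKey()
    key.handler(msg, rk)   # "suspend": here, just run the receiver inline
    assert rk.used, "receiver never RETURNed"
    return rk.reply

def RETURN(resume_key, msg):
    # Reply to the suspended caller; real KeyKOS then waits for the
    # next incoming message.
    assert not resume_key.used   # resume keys are single-use
    resume_key.used = True
    resume_key.reply = msg

adder = Domain(lambda msg, rk: RETURN(rk, msg[0] + msg[1]))
assert CALL(adder, (2, 3)) == 5
```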
Process/domain has 3 states: available, running, waiting
  [process state transition diagram]

Kernel design suggests an object-oriented structure for applications
  other things you'd expect to be system calls are messages to
  special kernel-managed objects
    CALL to a device to access it
    how would you map a page of memory?
  what are the benefits of doing it this way?
    easy to override built-in kernel objects

How could one give keys to others?
  can clone keys (like dup on a Unix FD)
  keys include restriction flags -- e.g. can make a read-only key
  the IPC primitive allowed passing 4 keys
    (which could themselves be nodes holding more keys)
  can tack a "data byte" onto a key on creation
    (delivered to the key's domain on key invocation)
    useful for keeping track of where the key came from
      distinguish multiple callers of the same service domain
    we'll see how it might be used later on

Object keepers (for meters, segments, domains) handle exceptions:
  segment keeper: domain to handle page faults
    conceptual model: want program loads and stores to translate
    into IPC messages
    hardware doesn't quite do this for us
      so treat the address space as a cache, and insert/remove
      mappings as needed
      a load/store on an invalid mapping now generates an IPC message
  meter keeper: domain to call when CPU time expires,
  controls CPU time allocation
    really more about billing than fine-grained scheduling policy
  domain keeper: domain to call for other CPU exceptions
    didn't vector FORK/CALL/RETURN system calls to the keeper;
    should it have?
  kernel makes faults look like a CALL from the faulting domain to the keeper
  the invoked keeper gets a "service key" to the faulting object,
  to fix it up as needed

Where do objects come from?
  a "bank", implemented as a domain of its own
    (synthesized the first time the system boots up)
  the top-level bank has all nodes and pages in the system
    other objects (domains, meters, segments) are really just
    special forms of a node
  how many nodes/pages did a bank have?
    total disk space in the system (has nothing to do with the
    machine's DRAM size)
  would that be a good idea today?
    probably not in general-purpose systems: want over-allocation
    for performance
    maybe OK for embedded or special-purpose systems

Invoke a bank to allocate a new object; it returns a "service key"
for that object
  for a domain, can populate the domain's slots and make it runnable
    very much like JOS exofork
  how to free up space?
    child space banks can be deallocated completely,
    releasing their resources to the parent

What was the idea of persistence?
  applications have no notion of a disk, just memory
  kernel periodically saves complete system state to disk
  no other kernel state of interest
    kernel stacks, scheduler queues, buffers -> ephemeral
    open files, processes, messages waiting to be sent -> stored
    in kernel objects
  how did applications use the disk, then?
    just store data in memory; the kernel saves it to disk eventually
    to read from disk, just access memory; the kernel will
    demand-page if necessary
  why does KeyKOS need persistence?
    kernel provides no way of recovering lost capabilities
      would have needed some special mechanism to "synthesize"
      capabilities on bootup
    with persistence, can use capabilities for controlling access
    to data on disk
  problems with persistence
    could be saving a lot of unnecessary stuff to disk
    could be hard to recover if "init" dies
    could be hard to kill runaway processes
      how do meters and space banks help?
    hard to deal with bad blocks

KeyKOS (and its predecessor, GNOSIS) was a very different system from Unix
  no file system root
  every user has their own "home directory", which just maps names to keys
    keys included both files and processes that you could invoke
    (using CALL or FORK)
    one user's home directory is not nameable by other users
    (unless granted)
  every user has a persistent shell domain that keeps a key to the
  user's home dir
  a login process ("receptionist") keeps a password and a start key
  for each user's shell domain
    user types in a password; login does an RPC call into the user's
    shell, passing in a key for the user's terminal
  when invoking a command in the shell, must say whether each argument
  is a string or a capability for the file of that name
    must specify all capabilities at invocation time
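The login flow above might be sketched like this (a dict-based model; the Receptionist class and the start-key-as-function representation are assumptions of the sketch, not the paper's interface):

```python
# Sketch of the "receptionist": it holds, per user, a password and a
# start key for that user's persistent shell domain, and passes the
# shell a key for the user's terminal on a successful login.

class Receptionist:
    def __init__(self):
        self.users = {}   # name -> (password, shell_start_key)

    def register(self, name, password, shell_start):
        self.users[name] = (password, shell_start)

    def login(self, name, password, terminal_key):
        stored_pw, shell_start = self.users[name]
        if password != stored_pw:
            return None
        # RPC into the user's shell, passing the terminal key; the shell
        # already holds the key to the user's home directory.
        return shell_start(terminal_key)

def alice_shell(terminal_key):
    # Stand-in for invoking the shell domain's start key.
    return ("shell-for-alice", terminal_key)

r = Receptionist()
r.register("alice", "sekrit", alice_shell)
assert r.login("alice", "sekrit", "tty0") == ("shell-for-alice", "tty0")
assert r.login("alice", "wrong", "tty0") is None
```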
How did the paper implement Unix compatibility?
  special Unix keeper domain to emulate Unix
    a separate Unix keeper for each Unix process
    shared state stored in an explicitly-shared segment
  how did the filesystem work?
    separate domain for each inode
    file inodes stored data in segment objects
      to read/write, insert the file's segment into your
      address-space segment
      because a file inode is isolated from the rest of the FS code,
      it's easy to implement local optimizations
        paper suggests storing data in the inode itself, for small files
    directory inodes supported lookup, insert, unlink messages
    who maintains consistency?  what happens on a crash?
    how does fsync work?  two kinds of fsync (in KeyKOS at least)
  file system access control
    Unix model
      inodes have an owner user ID and group ID
      access permissions for the user, group, and everyone else
      unfortunately this is fundamentally at odds with the capability
      model (users and groups are ambient authority)
    paper doesn't describe what happened
    simple version: kernel domain keeps track of an application's
    user/group IDs, checks against the inode's fields
    better version: embed who's accessing the file in the inode key's
    data byte (user, group, or everyone else)
      inode returns an appropriate segment key based on the file's
      perms for the entity specified in the data byte
        e.g. a read-only segment for read-only access,
        or nothing if access is prohibited
      directory inodes can propagate the data byte to any keys they
      return for files and sub-directories
      how would we get the user an appropriate key for his home
      directory in the first place?
        maybe directory inodes keep a user or group password, and
        allow a "dirlookup-with-password" op
        or KeyKOS-style: user home dir key kept by login service,
        shell keeps homedir key along w/ FS root
  what benefits do we get from implementing Unix with capabilities (KeyNIX)?
    perhaps a more reliable kernel (fault isolation)
    what happened if you crashed your kernel?
  maybe locked entries in the shared table -- not so great
  could corrupt other process-table entries or file descriptors
  can we design the system a little better?
    what about shared file descriptors?
    what about the process table?  how would we name entries?
  can run apps in different Unix universes (i.e. different process &
  FD table segments) on the same machine
    could also have multiple file systems

Performance evaluation
  overall, seems to perform OK for such a radical design
  hard to draw any fundamental conclusions about KeyKOS --
  little is said about why Mach performs poorly
  1st fundamental problem: namei is slow
    what were their proposed alternatives?
      1) one namei domain stores all directories,
         but keep one domain per file inode
      2) one domain for the entire filesystem, like Mach/L4
  2nd fundamental problem: implementing asynchronous mechanisms
  (KeyKOS IPC is synchronous)
    how did they implement pipes?
      a queue domain at each end, to avoid blocking the main
      kernel domain
        wanted to handle signals (^C) even if a process blocked
        writing to a pipe
      figure 2 in paper
      [perhaps pseudo-code for the kernel and queue domains]

How did KeyKOS integrate the Unix model with capabilities?
  still susceptible to the confused deputy problem
  capabilities have to be implemented at all levels, and Unix lacks them
  avoiding confused deputies in Unix would require changing
  programming conventions
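The queue-domain idea can be sketched as a bounded buffer that answers immediately instead of suspending the caller (Python; the QueueDomain class and the WOULD_BLOCK convention are invented for this sketch -- figure 2 in the paper has the real protocol):

```python
# Since KeyKOS IPC is synchronous, a CALL into a full pipe would wedge
# the calling domain.  A queue domain at each end of the pipe buffers
# data so the Unix-keeper domain always gets an immediate reply and can
# still field signals (^C) for a "blocked" process.

from collections import deque

class QueueDomain:
    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity

    def write(self, data):
        # Replies immediately; the keeper can retry on WOULD_BLOCK
        # without suspending in someone else's domain.
        if len(self.buf) >= self.capacity:
            return "WOULD_BLOCK"
        self.buf.append(data)
        return "OK"

    def read(self):
        return self.buf.popleft() if self.buf else "WOULD_BLOCK"

pipe = QueueDomain(capacity=2)
assert pipe.write(b"a") == "OK"
assert pipe.write(b"b") == "OK"
assert pipe.write(b"c") == "WOULD_BLOCK"   # full; caller not suspended
assert pipe.read() == b"a"
```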
Compare to other microkernels?
  KeyKOS has somewhat more structure than Mach
  synchronous IPC in KeyKOS, async in Mach
    async complicates the Mach kernel; hard to say whether it is
    a performance win or not
  Mach has lots of system calls for operating on ports
    KeyKOS implements the same thing with messages
    superficial difference: KeyKOS has fewer system calls (but so what..)
    more interesting difference: KeyKOS allows programs to provide
    their own implementation of objects that might traditionally be
    serviced by the kernel; Mach always handles system calls in the
    kernel, and would need some special system-call redirection mechanism

Compare to JOS, exokernel?  what ideas could we use from these papers?
  capability naming
    env IDs: cannot pass rights to others right now
  full persistence
    conflicts with the exokernel philosophy, but could work
  per-file protection domains
    might incur a high overhead in JOS, at least with a naive
    implementation
    lab 5 has a single file server, though we could easily run
    many of them
  allowing applications to handle faults
    JOS kind of does this with page fault upcalls
    might be nice to vector these to another environment
    or vector everything (including syscalls) to another environment

Compare to xv6, Linux?
  can use file descriptor passing as capabilities
    would have been nice to pass other things as capabilities too
      child PIDs (so processes other than the parent can wait)
  a lot of process structure is exposed to applications
    easier to write a debugger
    easier to do some lower-level optimizations
  still have ambient authority -- a process has a UID
    must use setuid to switch privileges before/after operations
    leads to programs that try to emulate security checks in userspace
    part of why TOCTTOU problems are abundant

Why didn't pure-capability systems catch on?
  tightly couples naming and access control; limited notion of
  user identity
  requires structuring your data around your access control policy
    e.g.
    can give someone a capability for an entire directory, but it's
    harder to specify permissions for each file
  biggest win was probably in distributed systems
    capabilities appear in various forms (Java objects)

What about microkernels?
  running Unix in a single server process is not very interesting
  doing interesting things like KeyKOS requires more design effort
  harder to evolve the system
    low-level primitives can be specific to the original requirements;
    changes could require a re-design
  optimizations or features that affect multiple subsystems are hard
    sendfile affects the FS + network stack
    fixing priority inversion might require a re-design
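As an aside on the xv6/Linux comparison above: "file descriptor passing as capabilities" is real Unix machinery (SCM_RIGHTS over a Unix-domain socket). A sketch, assuming Linux and Python 3.9+ for socket.send_fds; both ends run in one process here for brevity, where normally sender and receiver would be separate processes:

```python
# Pass an open FD (the capability) to a peer over a Unix socket,
# granting access to the open file without sharing any UID.

import os
import socket

left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Something to hand over: the read end of a pipe with data in it.
r, w = os.pipe()
os.write(w, b"capability payload")
os.close(w)

# Send the FD as SCM_RIGHTS ancillary data (Python 3.9+ helper).
socket.send_fds(left, [b"here"], [r])
msg, fds, flags, addr = socket.recv_fds(right, 1024, 1)
os.close(r)   # sender's copy; receiver's descriptor stays valid

# The receiver now holds its own descriptor for the same open file.
assert os.read(fds[0], 1024) == b"capability payload"
os.close(fds[0])
left.close()
right.close()
```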