6.S081/6.828 2019 L16: Operating System Organization

Topic: What should a kernel do?
  What should its abstractions / system calls look like?
  Three examples today: UNIX, microkernel, exokernel.
  Answers depend on the application, and on programmer taste!
    There is no single best answer
    But plenty of ideas, opinions, and debates
    We'll see some in the papers over the rest of the term
  This topic is more about ideas and less about specific mechanisms

The traditional approach
  1) big abstractions, and 2) a "monolithic" kernel implementation
  UNIX, Linux, xv6

Example: traditional treatment of the CPU
  each process has its own "virtual" CPU; the kernel hides the multiplexing
  implications:
    interrupts must save/restore all registers, for transparency
    timer interrupts force transparent context switches
  maybe good: simple; many irritating details abstracted away
  maybe bad:
    applications may want to control scheduling, or know about switches
    might be slow

Example: traditional treatment of memory
  each process has its own address space
  the kernel hides other processes' memory, physical addresses, the MMU, page faults
  maybe good:
    no need to worry about other processes' use of memory
    no need to worry about security, since memory is private
    the kernel has great freedom to play clever virtual memory tricks
  maybe bad: limits application-level VM tricks, as in Appel+Li

The philosophy behind traditional kernels is abstraction
  portable interfaces
    files, not disk controller registers
    address spaces, not MMU access
  simple interfaces, hidden complexity
    all I/O via FDs and read/write, not specialized for each device &c
    address spaces with transparent disk paging
  abstractions help the kernel manage resources
    the process abstraction lets the kernel be in charge of scheduling
    the file/directory abstraction lets the kernel be in charge of disk layout
  abstractions help the kernel enforce security
    file permissions
    processes with private address spaces
  lots of indirection
    e.g. FDs, virtual addresses, file names, PIDs
    helps the kernel virtualize, hide, revoke, schedule, &c

Abstractions are a win for app developer convenience
  app developers want to spend their time building new application features
  they want the O/S to deal with everything else
  so they want power, portability, and reasonable speed

What's wrong with traditional kernels?
  big => complex => maybe buggy
  abstractions may be over-general, and thus slow
    maybe I don't need all my registers saved on a context switch
  abstractions are sometimes awkward
    maybe I want to wait for a process that's not my child
  abstractions can hinder app-level optimizations
    a DB may be better at laying out B-Tree files on disk than the kernel FS

Microkernels -- an alternate approach
  big idea: move most O/S functionality into user-space service processes
    the kernel itself can then be small: mostly IPC
  [diagram: h/w, kernel, services (FS VM net), apps]
  the hope:
    a simple kernel can be fast and reliable
    services are easier to replace and customize
  Examples: Mach 3, L3
  (a sketch of a user-level service follows this section)

Microkernel strengths:
  IPC can be made very fast
  fast IPC is great for new user-level services, e.g. the X server
  separate service processes force O/S modularity

Microkernel weaknesses:
  the kernel can't be tiny: it still needs to know about processes and memory
  you may need lots of IPCs, which are slow in aggregate
  it's painful to split the kernel into lots of service processes!
    splitting makes cross-service optimization hard
    so server processes tend to be big -- not a huge win

Microkernels have seen some success
  the IPC/service idea is widely used, e.g. in macOS
    but not much for traditional kernel services
    mostly for (lots of) new services, designed to be client/server
  some embedded O/Ses have a strong microkernel flavor
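Sketch: a microkernel-style user-level service
  to make the service-process idea concrete, a minimal C sketch of a
    user-level file server; ipc_recv()/ipc_reply() and the message layout
    are invented for illustration, not taken from Mach or L3:

    /* a hypothetical user-space file service: the kernel only moves messages */
    enum { OP_READ, OP_WRITE };

    struct msg {
      int op;            /* which operation the client wants */
      int fd;
      int nbytes;
      char buf[512];
    };

    int ipc_recv(struct msg *m);                /* hypothetical: block for a request */
    void ipc_reply(int client, struct msg *m);  /* hypothetical: send the reply back */
    int fs_read(int fd, char *buf, int n);      /* the FS logic itself, in user space */

    void
    fs_server(void)
    {
      struct msg m;
      for(;;){
        int client = ipc_recv(&m);  /* a client's read() arrives as a message */
        if(m.op == OP_READ)
          m.nbytes = fs_read(m.fd, m.buf, m.nbytes);
        ipc_reply(client, &m);      /* the client's read() returns on reply */
      }
    }

  the kernel's only job here is delivering messages; the file system itself
    is an ordinary process that can be replaced or customized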
Exokernel (1995)
  the paper: full of interesting ideas
  the O/S community paid lots of attention
  it describes an early research prototype
  a later SOSP 1997 paper realizes more of the vision

Exokernel overview
  philosophy: eliminate all abstractions
    expose the h/w, let the app do what it wants
  [diagram: h/w, kernel, environments, libOSes]
  an exokernel would not provide address spaces, pipes, a file system, TCP
    instead, it lets apps use the MMU, phys mem, NIC, and timer interrupts
    not portable -- but lots of app control
  a per-app libOS implements the abstractions
    perhaps POSIX: address spaces, fork, file system, TCP, &c
    each app can have its own custom libOS, and its own abstractions
  why?
    the kernel may be faster due to streamlining and simplicity
    apps may be faster due to a customized libOS and access to the h/w

Exokernel challenges:
  what resources should the kernel expose?
    what does a libOS need in order to implement fork, pipes, &c?
  can low-level access to hardware resources be made secure?
  can libOSes share? securely?
    e.g. pipes, files, wait/exit, &c can't be too customized!
    can a libOS enforce security on shared abstractions?
  will many apps be able to benefit from custom libOSes?

Exokernel memory interface
  what are the resources?
    the kernel exposes physical pages and VA->PA MMU mappings
  what's the app->kernel system call interface?
    pa = AllocPage() -- user code sees physical addresses!
    TLBwr(va, pa, PTE flags) -- can set up any mapping! e.g. map a page twice
    Grant(env, pa) -- to share memory with other processes
    DeallocPage(pa)
  and these kernel->app upcalls (like in the alarm lab):
    PageFault(va)
    ReleaseMemory()
  what does the exokernel itself need to do?
    track which env owns which phys pages
    ensure an app only creates mappings to phys pages it owns
    decide which app to ask to give up a phys page when the system runs out
      that app gets to decide which of its pages to give up
      perhaps cached data; perhaps it first saves the page to disk

typical use of the VM calls
  lazy page allocation for some range of addresses, as in the lab
  the exokernel allows a user-level implementation!
    PageFault(va):
      if va in range:
        pa = AllocPage()
        TLBwr(va, pa)
        jump to faulting PC
      else:
        oops
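  the same handler, rendered as C -- a sketch only: AllocPage() and TLBwr()
    are the exokernel calls named above, but their C signatures, the PTE
    flag encoding, and resume_at() are guesses for illustration:

    typedef unsigned long uint64;

    enum { PTE_V = 1, PTE_W = 2 };   /* invented PTE flag encoding */

    uint64 AllocPage(void);          /* kernel: allocate me a phys page */
    void TLBwr(uint64 va, uint64 pa, int flags);  /* kernel: install va->pa */
    void resume_at(uint64 pc);       /* hypothetical: jump back to the faulting PC */
    void panic(char *msg);

    uint64 lazy_lo = 0x100000, lazy_hi = 0x200000;  /* the lazily-backed range */

    void
    PageFault(uint64 va, uint64 pc)  /* the kernel upcalls here on a fault */
    {
      if(lazy_lo <= va && va < lazy_hi){
        uint64 pa = AllocPage();     /* we now own a phys page... */
        TLBwr(va & ~0xfffUL, pa, PTE_V|PTE_W);  /* ...and map it ourselves */
        resume_at(pc);               /* retry the faulting instruction */
      } else
        panic("unexpected page fault");  /* "oops" */
    }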
A cool thing you could do w/ exokernel-style memory
  databases like to keep a cache of disk pages in memory
  [diagram: DB process w/ cache, DB on disk, swap area on disk]
  the problem on a traditional OS:
    assume an OS with demand-paging to/from disk
    if the DB caches some data, and the OS needs a phys page,
      the OS may page-out a DB page holding a cached disk block
    but that's a waste of time: if the DB knew, it could release the phys
      page w/o writing, and later read the block back from the DB file
      (not the paging area)
  on an exokernel:
    1. the exokernel needs phys mem for some other app
    2. the exokernel sends the DB a ReleaseMemory() upcall
    3. the DB picks a clean page and calls DeallocPage(pa)
    4. OR the DB picks a dirty page, saves it to the DB file, then DeallocPage(pa)

Exokernel page fault upcalls are very fast
  (remember: slow page faults were a complaint in Appel+Li)
  the kernel trap handler is simple
    when dispatching back to user space, it:
      saves the PC and a few registers in user-accessible memory
      decides what kind of trap it was
      returns to the user-space handler
    the user handler can continue without re-entering the kernel
      (it just jumps to the saved PC)
  things the exokernel trap handler does *not* do that xv6 does:
    on entry: save all 32 registers, switch page tables, set up a kernel stack
    on return: restore all 32 registers, switch page tables
  why exokernel trap dispatch can be so simple:
    the kernel handler doesn't use most of the registers, so it preserves user values
    the kernel runs with physical addressing, no MMU
  trap dispatch speed is a big factor in the paper's measurement results

Exokernel CPU interface
  *not* transparent process switching; instead:
    a kernel upcall to the app when the kernel gives it the CPU
    a kernel upcall to the app when the kernel is taking the CPU away
    (these are upcalls to fixed app locations, not transparent resumes)
  so if the app is running and a timer interrupt ends its slice: [diagram]
    the CPU interrupts from the app into the kernel
    the kernel jumps back into the app at its "please yield" upcall
    the app saves its registers
    the app calls Yield()
  when the kernel decides to resume the app:
    the kernel jumps into the app at its "resume" upcall
    the app restores its saved registers and jumps to its saved PC

A cool thing an app can do with exokernel CPU management
  suppose a timer interrupt occurs in the middle of
    acquire(lock);
    ...
    release(lock);
  you don't want the app to hold the lock despite not running!
    then maybe other apps can't make forward progress
  so the "please yield" upcall can first complete the critical section
    (see the second sketch after the IPC discussion below)

Fast IPC
  process P1 wants to send a msg to P2, and get a reply
  IPC on a traditional kernel:
    pipes (or sockets): a message / communication abstraction
    picture: two buffers in the kernel
    slow: write+read + read+write -- 8 user/kernel crossings
      8 crossings, w/ register save/restore each time
      two blocking calls (sleep + scheduler + wakeup)
  IPC on the Aegis exokernel:
    Yield() can take a target process argument
    the kernel upcalls into the target
      almost a direct jump to an instruction in the target process
      the entry address is specified by the target, not the sender
    the kernel leaves the registers alone, so they can carry arguments
      and the return value
    the target app uses Yield() to return
    fast: 4 crossings, little save/restore, no blocking read(), no scheduler
    note that the IPC execution just appears in the target!
      *not* a return from read() or ipc_recv()
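Sketch: Aegis-style IPC from the libOS's point of view
  a hedged illustration: Yield() taking a target env is from the notes, but
    put_arg()/get_arg() (the "registers carry the message" convention),
    P2_ENV, CALLER, and the entry-point registration are invented:

    void Yield(int env);      /* donate CPU; kernel jumps to env's entry point */
    void put_arg(long v);     /* hypothetical: put v in an argument register */
    long get_arg(void);       /* hypothetical: read the argument register */
    long handle(long req);    /* P2's actual service logic */

    enum { P2_ENV = 2, CALLER = -1 };   /* invented env identifiers */

    /* P2 registered this address with the kernel as its IPC entry point;
       execution "just appears" here -- no return from read()/ipc_recv() */
    void
    ipc_entry(void)
    {
      long req = get_arg();   /* the argument arrived in a register */
      put_arg(handle(req));   /* leave the reply in a register */
      Yield(CALLER);          /* jump straight back into P1 */
    }

    /* P1's side: 4 crossings total, no kernel buffers, no blocking */
    long
    call_p2(long req)
    {
      put_arg(req);
      Yield(P2_ENV);          /* kernel jumps into P2 at ipc_entry() */
      return get_arg();       /* on return, the reply is in a register */
    }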
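Sketch: a "please yield" upcall that finishes its critical section
  another hedged illustration, this time of the CPU-management trick above;
    in_critical/yield_pending, save_regs(), and resume() are invented
    bookkeeping, not part of the exokernel interface:

    struct regs { long x[32]; };

    int in_critical;     /* set by acquire(), cleared by release() */
    int yield_pending;   /* we owe the kernel a Yield() */
    struct regs saved;

    void save_regs(struct regs *r);  /* the app saves its own registers */
    void resume(void);               /* jump back to the interrupted code; does not return */
    void Yield(void);                /* give the CPU back to the kernel */

    void
    please_yield(void)   /* the kernel upcalls here to take the CPU away */
    {
      if(in_critical){
        yield_pending = 1;  /* finish the critical section first */
        resume();           /* only a few instructions remain until release() */
      }
      save_regs(&saved);    /* the app, not the kernel, saves registers */
      Yield();
    }

    void
    critical_done(void)  /* called at the end of release() */
    {
      in_critical = 0;
      if(yield_pending){
        yield_pending = 0;
        please_yield();  /* keep our promise to the kernel */
      }
    }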
summary of low-level performance ideas
  pick apart big abstractions into small ops that apps can combine in new ways
  mostly about fast system calls, traps, and upcalls
    system call speed can be very important!
      slowness encourages complex system calls, and discourages frequent calls
    the trap path doesn't save most registers
    fast upcalls to user space (no need for the kernel to restore registers)
    protected call for IPC (just jump to a known address; no pipe or send/recv)
    map some kernel structures into user space (page table, register save area, ...)

bigger ideas -- mostly about abstractions
  custom abstractions are a win for performance
    apps need low-level operations for this to work
  much of the kernel can be implemented at user level
    while preserving sharing and security
  very surprising: protection does *not* require the kernel to implement big abstractions
    e.g. the kernel can protect process pages w/o managing address spaces
    the 1997 paper develops this fully for file systems
  the address space abstraction can be decomposed into
    phys page allocation and va->pa mappings

what happened to the exokernel?
  people don't use exokernels -- but first, a word about expectations
    the exokernel was a research project
    research success takes the form of influence:
      change how people think
      help them see new possibilities
      perhaps they'll borrow specific ideas
    research success is not the same as having lots of users!
      it's rare for research to turn into products, even when the research is good
  lasting influence from the exokernel:
    UNIX gives much more low-level control than it did in 1995
      very important for some applications
    people think a lot about kernel extensibility now, e.g. kernel modules
    library operating systems are often used, e.g. in unikernels