6.828 2017 L16: Operating System Organization

Topic: what should a kernel do?
  What kinds of system calls should it support?
  What abstractions should it provide?
  These depend on the application, and on programmer taste!
  There is no single best answer
    But plenty of ideas, opinions, and debates
    We'll see some in the papers over the rest of the term
  This topic is more about ideas and less about specific mechanisms

The traditional approach
  1) big abstractions, and 2) a "monolithic" kernel implementation
  UNIX, Linux, xv6

Example: traditional treatment of CPU
  kernel gives each process its own "virtual" CPU -- not shared
  implications:
    interrupts must save/restore all registers for transparency
    timer interrupts force transparent context switches
  maybe good: simple, many irritating details abstracted away
  maybe bad: much is hidden, e.g. scheduling; might be slow

Example: traditional treatment of memory
  each process has its own private address space, looks like RAM
  maybe good:
    no need to worry about other processes' use of memory
    no need to worry about security, since private
    kernel has great freedom to play clever virtual memory tricks
  maybe bad:
    limited scope for application-level VM tricks, as in Appel+Li

Clever virtual memory tricks played by traditional kernels:
  lazy page table fill -- fast start-up for big allocations
    (see the sketch after this list)
  copy-on-write fork() (like Lab 4, but hidden in the kernel)
  demand paging: process bigger than available physical memory?
    "page-out" (write) pages to disk, mark their PTEs invalid
    if the process tries to use one of those pages, the MMU causes a page fault
    kernel finds phys mem, pages-in from disk, marks the PTE valid
    then returns to the process -- transparent
    this works because apps use only a fraction of their memory at a given time
  shared physical memory for executables and libraries
  zero-copy I/O by sharing block buffers between user and kernel
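A concrete user-level view of lazy page table fill (a sketch, not from the
original notes; POSIX mmap() on e.g. Linux): the kernel hands back a huge
region immediately, and allocates physical pages only as they are first touched.

  // Sketch: observing lazy page table fill from user space.
  // mmap() of a large anonymous region returns at once; the kernel
  // installs a physical page only when a page is first touched.
  #include <stdio.h>
  #include <sys/mman.h>

  int main(void) {
      size_t len = 100 * 1024 * 1024;   // ask for 100MB; no phys mem used yet
      char *p = mmap(0, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED) {
          perror("mmap");
          return 1;
      }
      // each first write to a page faults into the kernel, which allocates
      // a physical page and fills in the PTE -- invisible to this program
      for (size_t i = 0; i < len; i += 4096)
          p[i] = 1;
      printf("touched %zu pages\n", len / 4096);
      return 0;
  }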
The philosophy behind traditional kernels is abstraction:
  portable interfaces
    files, not disk controller registers
    address spaces, not MMU access
  simple interfaces, hidden complexity
    all I/O via FDs and read/write, not specialized for each device &c
    address spaces with transparent disk paging
  abstractions help the kernel manage resources
    process abstraction lets the kernel be in charge of scheduling
    file/directory abstraction lets the kernel be in charge of disk layout
  abstractions help the kernel enforce security
    file permissions
    processes with private address spaces
  lots of indirection
    e.g. FDs, virtual addresses, file names, PIDs
    helps the kernel virtualize, hide, revoke, schedule, &c

Abstractions are a win for app developer convenience
  app developers want to spend time building new application features
  they want the O/S to deal with everything else
  so they want power and portability and reasonable speed

Traditional kernels are "monolithic"
  kernel is one big program, like xv6
  easy for sub-systems to cooperate -- no irritating boundaries
    e.g. integrated paging and file system cache
  all code runs with high privilege -- no internal security restrictions

What's wrong with traditional kernels?
  big => complex, buggy, unreliable (in principle; not so much in practice)
  abstractions may be over-general and thus slow
    maybe I don't need all my registers saved on a context switch
  abstractions are sometimes not quite right
    maybe I want to wait for a process that's not my child
  abstractions can hinder app-level optimizations
    a DB may be better at laying out B-Tree files on disk than the kernel FS

Microkernels -- an alternate approach
  big idea: move most O/S functionality to user-space service processes
  kernel can be small, mostly IPC
  [diagram: h/w, kernel, services (FS VM net), apps]
  the hope:
    simple kernel can be fast and reliable
    services are easier to replace and customize
  Examples: Mach 3, L3
  JOS is a mix of micro-kernel and exokernel

Microkernel wins:
  you really can make IPC fast
  separate services force kernel developers to think about modularity
  good IPC is great for new user-level services, e.g. the X server

Microkernel losses:
  kernel can't be tiny: it needs to know about processes and memory
  you may need lots of IPCs, slow in aggregate
  it's hard to split the kernel into lots of service processes!
    and it makes cross-service optimization harder
    so server processes tend to be big; not a huge win

Microkernels have seen some success
  IPC/service idea widely used, e.g. in OSX
    but not much for traditional kernel services
    mostly for (lots of) new services, designed to be client/server
  some embedded O/Ses have a strong microkernel flavor
  More next lecture

Exokernel (1995)
  the paper:
    O/S community paid lots of attention
    full of interesting ideas
    describes an early research prototype
    a later SOSP 1997 paper realizes more of the vision

Exokernel overview
  philosophy: eliminate all abstractions
    expose the h/w, let the app do what it wants
  [diagram: h/w, kernel, environments, libOSes]
  an exokernel would not provide address spaces, pipes, file systems, TCP
    instead, it lets apps use the MMU, phys mem, NIC, timer interrupts
    not portable -- but lots of app control
  per-app libOS implements abstractions
    perhaps POSIX: address spaces, fork, file system, TCP, &c
    each app can have its own custom libOS, with its own abstractions
  why?
    kernel may be faster due to streamlining and simplicity
    apps may be faster b/c they can customize the libOS

Exokernel challenges:
  what resources to expose to libOSes?
  what kernel API is needed to implement copy-on-write fork at user level?
  can libOSes share? securely? e.g. a compiler reading the editor's files
    can we have sharing+security without big kernel abstractions?
  will enough apps benefit from custom libOSes?

Exokernel memory interface
  what are the resources?
    kernel exposes physical pages and VA->PA MMU mappings
  what's the app->kernel API? (sketched in C below)
    pa = AllocPage()
    TLBwr(va, pa)
    Grant(env, pa) -- to share memory
    DeallocPage(pa)
  and these kernel->app upcalls:
    PageFault(va)
    PleaseReleaseMemory()
  what does the exokernel need to do?
    track which env owns which phys pages
    ensure an app creates mappings only to phys pages it owns
    decide which app to ask to give up a phys page when the system runs out
      that app gets to decide which of its pages to give up
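A minimal C rendering of this interface, using the call names from the notes;
the types and the upcall-registration mechanism are assumptions, and the real
Aegis/ExOS interface differs in detail:

  // Sketch of the exokernel memory interface. Call names are from the
  // notes; typedefs and comments are hypothetical glue for illustration.
  typedef unsigned long pa_t;    // physical page address
  typedef unsigned long va_t;    // virtual address
  typedef int env_t;             // environment (process) id

  // app -> kernel calls
  pa_t AllocPage(void);            // allocate a physical page; the app now owns it
  void TLBwr(va_t va, pa_t pa);    // map va -> pa; kernel checks the app owns pa
  void Grant(env_t env, pa_t pa);  // share memory: let env map pa too
  void DeallocPage(pa_t pa);       // give the page back to the kernel

  // kernel -> app upcalls; the libOS supplies these handlers, presumably
  // registering their addresses with the kernel at startup
  void PageFault(va_t va);         // app's own fault handler: fill mapping, retry
  void PleaseReleaseMemory(void);  // kernel is short of phys mem; free a page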
typical use of the VM calls
  application wants memory for a 100MB sparse array, lazily allocated
    not unlike the mmap() homework
  PageFault(va):
    if va in range:
      if va in table:
        TLBwr(va, table[va])
      else:
        pa = AllocPage()
        table[va] = pa
        TLBwr(va, pa)
    jump to faulting PC

A cool thing you could do w/ exokernel-style memory
  databases like to keep a cache of disk pages in memory
  the problem on a traditional OS:
    assume an OS with demand-paging to/from disk
    if the DB caches some data, and the OS needs a phys page,
      the OS may page-out a DB page holding a cached disk block
    but that's a waste of time: if the DB knew, it could release the
      phys page w/o writing, and later read it back from the DB file
      (not from the paging area)
  with an exokernel (see the handler sketch below):
    1. exokernel needs phys mem for some other app
    2. exokernel sends the DB a PleaseReleaseMemory() upcall
    3. DB picks a clean page, calls DeallocPage(pa)
    4. OR DB picks a dirty page, saves it to the DB file, then DeallocPage(pa)
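A sketch of what the DB's revocation handler might look like, reusing pa_t and
DeallocPage() from the interface sketch above; pick_victim(), page_is_dirty(),
and write_back_to_db_file() are hypothetical DB-side helpers, not part of any
real exokernel API:

  // Hypothetical DB-side cache bookkeeping
  struct cache_page { pa_t pa; /* disk block id, dirty bit, ... */ };
  struct cache_page *pick_victim(void);            // DB's own policy
  int page_is_dirty(struct cache_page *);
  void write_back_to_db_file(struct cache_page *);

  void PleaseReleaseMemory(void) {
      // the DB, not the kernel, chooses which page to give up
      struct cache_page *victim = pick_victim();   // prefer a clean page
      if (page_is_dirty(victim))
          write_back_to_db_file(victim);  // to the DB file, not a paging area
      // a clean page needs no write: re-read it from the DB file later
      DeallocPage(victim->pa);            // return the physical page
      victim->pa = 0;                     // mark the cache slot empty
  }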
Exokernel CPU interface
  *not* transparent process switching; instead:
    kernel upcall to the app when it gives the CPU to the app
    kernel upcall to the app when it is taking the CPU away
    (these are upcalls to fixed app locations, not transparent)
  so if the app is running and a timer interrupt ends its slice:
    [diagram]
    CPU interrupts from app into kernel
    kernel jumps back into the app at the "please yield" upcall
    app saves its registers
    app calls Yield()
  when the kernel decides to resume the app:
    kernel jumps into the app at the "resume" upcall
    app restores its saved registers
  exokernel does not need to save/restore user registers (except the PC)
    this makes syscalls, traps, and context switches fast

A cool thing an app can do with exokernel CPU management
  suppose the time slice ends in the middle of
    acquire(lock);
    ...
    release(lock);
  you don't want the app to hold the lock despite not running!
    then maybe other apps can't make forward progress
  so the "please yield" upcall can first complete the critical section

Fast IPC
  IPC on a traditional kernel:
    pipes (or sockets)
    a message / communication abstraction
    picture: two buffers in the kernel
    slow: write+read + read+write
      8 crossings (w/ register save/restore)
      two blocking calls
  IPC on the Aegis exokernel:
    Yield() can take a target process argument
    kernel up-calls into the target
      almost a direct jump to an instruction in the target process
      kernel allows entry only at approved locations in the target
    kernel leaves the registers alone, so they can carry arguments + return value
    target app uses Yield() to return
    fast: only 4 crossings, much less save/restore, no blocking read()
    note: the IPC execution just appears in the target!
      *not* a return from read() or ipc_recv()

summary of low-level performance ideas
  mostly about fast system calls, traps, and upcalls
    system call speed can be very important!
    slowness encourages complex system calls, and discourages frequent calls
  trap path doesn't save most registers
  fast upcalls to user space (no need for the kernel to restore registers)
  protected call for IPC (just jump to a known address; no pipe or send/recv)
  map some kernel structures into user space (page table, register save area, ...)

bigger ideas -- mostly about abstractions
  custom abstractions are a win for performance
    apps need low-level operations for this to work
  much of the kernel can be implemented at user level
    while preserving sharing and security
  very surprising: protection does *not* require the kernel to implement
    big abstractions
    e.g. can protect process pages w/o the kernel managing address spaces
    the 1997 paper develops this fully for file systems
  the address space abstraction can be decomposed into
    phys page allocation and va->pa mappings

what happened to the exokernel?
  people don't use exokernels
  but... first, a word about expectations:
    the exokernel was a research project
    research success takes the form of influence
      change how people think
      help them see new possibilities
      perhaps they'll borrow a few ideas
    research success is not the same as having lots of users!
      it's rare for research to turn into products, even if the research is good
  lasting influence from the exokernel:
    UNIX gives much more low-level control than it did in 1995
      very important for some applications
    people think a lot about kernel extensibility now, e.g. kernel modules
    library operating systems are often used, e.g. in unikernels