6.828 2012 Lecture 3: O/S Organization

plan:
  O/S organization
  processes
  isolation

topic: overall o/s design
  what should the main components be?
  what should the interfaces look like?

why have an o/s at all? why not just a library?
  then apps are free to use it, or not -- flexible
  some tiny O/Ss for embedded processors work this way
  key requirement: support multiple activities
    multiplexing
    isolation
    interaction

helpful approach: abstract services rather than raw hardware
  file system, not raw disk
  TCP, not raw ethernet
  processes, not raw CPU/memory
  abstractions often ease multiplexing and interaction
  and are more convenient and portable

note: i'm going to focus on mainstream designs (xv6, Linux, &c)
  for *every* aspect, someone has done it a different way!
  example: exokernels and VMMs do *not* abstract anything!

xv6 has only a few abstractions / services
  processes (cpu, mem)
  I/O (file descriptors)
  file system
  i'm going to focus on xv6 processes today

a process is a running program
  it has its own memory, share of CPU, FDs, parent, children, &c
  it uses system calls to interact outside itself
    to get at kernel services

xv6's basic design here is very traditional (UNIX/Linux/&c)

xv6 user/kernel organization
  h/w, kernel, user
  kernel is a big program
    services: process, FS, net
    low-level: devices, VM
    all of the kernel runs w/ full hardware privilege (very convenient)
  system calls switch between user and kernel
  good: easy for sub-systems to cooperate (e.g. paging and file system)
  bad: interactions => complex, bugs are easy, no isolation within o/s
  called "monolithic"; traditional and successful

worth thinking about what *has* to be in the kernel
  Q: could FS be a user-level library? why / why not?

note: you could have a small kernel, most functionality at user-level
  microkernel, exokernel

isolation is the most constraining consideration!
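
The claim that each process "has its own memory, ... parent, children" can be observed from user space. Below is a minimal sketch in C, assuming a POSIX system such as Linux rather than xv6 itself (xv6's own wait() and exit() pass no exit status); child_sets_counter and parent_counter are hypothetical names. The child overwrites a global variable and reports the new value through its exit status, while the parent's copy stays at 1, because after fork() each process has its own address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One global variable. After fork(), parent and child each have their
   own copy, because each process gets its own address space. */
static int counter = 1;

/* Hypothetical helper: fork a child that overwrites its copy of
   `counter` with child_value and reports it via its exit status.
   Returns the child's exit status, or -1 on error. */
int child_sets_counter(int child_value) {
    assert(child_value >= 0 && child_value <= 255); /* exit status is 8 bits */
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {                 /* child */
        counter = child_value;      /* modifies only the child's copy */
        _exit(counter);             /* _exit() skips flushing stdio buffers copied from the parent */
    }
    int status;                     /* parent */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical accessor: the parent's view of `counter`. */
int parent_counter(void) {
    return counter;
}
```

For example, child_sets_counter(42) should return 42, and parent_counter() should still return 1 afterwards. The exit status is the only channel back to the parent here, which is why the value is limited to 0..255.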

isolation determines much of the basic design
  it's much of the reason why we need the notion of process at all
  isolation will come up again and again

what is isolation?
  the process is the unit of isolation
  prevent process X from wrecking or spying on process Y
    memory, cpu, FDs, resource exhaustion
  prevent a process from wrecking the operating system itself
    i.e. from preventing the kernel from enforcing isolation
  in the face of bugs or malice
    e.g. a bad process may try to trick the h/w or kernel

what are all the mechanisms that keep processes isolated?
  user/kernel mode flag
  address spaces
  timeslicing
  system call interface

the foundation of xv6's isolation: user/kernel mode flag
  controls whether instructions can access privileged h/w
  called CPL on the x86, bottom two bits of %cs
    CPL=0 -- kernel mode -- privileged
    CPL=3 -- user mode -- no privilege
  x86 CPL protects everything relevant to isolation
    writes to %cs (to defend CPL)
    every memory read/write
    I/O port accesses
    control register accesses (%cr0, %cr3, %cr4, ...)
    privileged %eflags bits (e.g. IF, the interrupt-enable flag)
  every serious microprocessor has something similar

user/kernel mode flag is not enough
  protects only against direct attacks on the hardware
  kernel must configure control regs, page tables, &c to protect other stuff
    e.g. kernel memory

how to do a system call -- switching CPL
  Q: would this be an OK design for user programs to make a system call?
      set CPL=0
      jmp sys_open
    bad: user-specified instructions with CPL=0
  Q: how about a combined instruction that sets CPL=0,
     but *requires* an immediate jump to someplace in the kernel?
    bad: user might jump somewhere awkward in the kernel
  the x86 answer: there are only a few permissible kernel entry points
    INT instruction sets CPL=0 and jumps to an entry point
      (the entry points are set up by the kernel in the IDT)
    but user code can't otherwise modify CPL or jump anywhere else in kernel
  system call return sets CPL=3 before returning to user code
    also a combined instruction (iret on x86; can't separately set CPL and jmp)
    but kernel is allowed to jump anywhere in user code

the result: well-defined notion of user vs kernel
  either CPL=3 and executing user code
  or CPL=0 and executing from an entry point in kernel code
  not: CPL=0 and executing user code
  not: CPL=0 and executing anywhere in the kernel the user pleases

Q: could one have process isolation WITHOUT h/w-supported kernel/user mode?
  yes! see Singularity O/S, later in semester
  but h/w user/kernel mode is the most popular plan

how to isolate process memory?
  idea: "address space"
    give each process some memory it can access
      for its code, variables, heap, stack
    prevent it from accessing other memory (kernel or other processes)

how to create isolated address spaces?
  xv6 uses x86 "paging hardware"
  MMU translates (or "maps") every address issued by program
    VA -> PA
    instruction fetch, data load/store
    for kernel and user
    there's no way for any instruction to directly use a PA
  page table: array w/ an entry for each 4KB "page" of virtual address space
    each entry refers to the physical address of that page
    (on 32-bit x86, really a two-level tree: page directory -> page tables)
  o/s tells h/w to switch page table when switching process (loads %cr3)
  why isolated?
    each page table entry (PTE) has a bit (PTE_U in xv6) saying whether
      user-mode instructions can use that page
    kernel sets that bit only for the current process's own memory
  paging h/w used in many ways, not just isolation
    e.g. copy-on-write fork(), see Lab 4
  note: you don't need paging to isolate memory
    type safety, JVM, Singularity
    but paging is the most popular plan

how to isolate CPU?
  prevent a process from hogging the CPU, e.g. a buggy infinite loop
  Q: how to force an uncooperative process to yield?
  h/w provides a periodic "clock interrupt"
    forcefully suspends current process
    jumps into kernel
    which can switch to a different process
  kernel must save/restore process state (registers)
  totally transparent, even to cooperative processes
  called "pre-emptive context switch"
  note: traditional, but maybe not perfect; see exokernel paper

back to system calls
  i've talked a lot about how the o/s isolates processes
  but user and kernel need to cooperate! user needs kernel services.
  what should user/kernel interaction look like?
  can't let user r/w kernel mem (well, you can, later...)
  kernel can r/w user mem
    but don't want to do this too much!
  so the style of the system call interface is pretty simple
    integers, strings (copying only), user-allocated buffers
    no objects, data structures, &c
    never any doubt about who owns memory

let's illustrate by tracing sys calls in xv6
  on-screen: xterm -fn 10x20
  illustrate sh.c
  exercise: draw parent/child diagram
    echo hi
    echo hi > x
    echo hi | wc
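
The three exercise commands differ mainly in the file descriptor plumbing the shell sets up between fork() and exec(). Here is a minimal sketch of that plumbing in C, assuming a POSIX system; it is not xv6's sh.c, and move_fd, stdout_to_file, wait_status, run_redirect, run_pipe and read_file are hypothetical helpers. One deliberate difference: xv6's sh.c rewires a descriptor with close() followed by dup(), which works because dup() returns the lowest-numbered free descriptor; this version uses POSIX dup2(). To make the result checkable, the right-hand side of the pipe writes to a file instead of the terminal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child-only: make `target` (0 = stdin, 1 = stdout) refer to what `fd`
   refers to, then drop `fd`. xv6's sh.c does close(target) + dup(fd);
   POSIX dup2() does both in one call. */
static void move_fd(int fd, int target) {
    if (fd != target) {
        if (dup2(fd, target) < 0)
            _exit(126);
        close(fd);
    }
}

/* Child-only: what the shell does for `> path`. */
static void stdout_to_file(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        _exit(127);
    move_fd(fd, 1);
}

/* Parent: wait for one child; return its exit status or -1. */
static int wait_status(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Roughly `argv... > path`. Only the child's fd 1 changes. */
int run_redirect(char *const argv[], const char *path) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        stdout_to_file(path);
        execvp(argv[0], argv);          /* returns only on failure */
        _exit(127);
    }
    return wait_status(pid);
}

/* Roughly `left | right > path`: two children joined by a pipe. */
int run_pipe(char *const left[], char *const right[], const char *path) {
    assert(left != NULL && left[0] != NULL && right != NULL && right[0] != NULL);
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t lp = fork();
    if (lp == 0) {                      /* left: stdout -> pipe write end */
        close(p[0]);
        move_fd(p[1], 1);
        execvp(left[0], left);
        _exit(127);
    }
    pid_t rp = (lp < 0) ? -1 : fork();
    if (rp == 0) {                      /* right: stdin <- pipe read end */
        close(p[1]);
        move_fd(p[0], 0);
        stdout_to_file(path);
        execvp(right[0], right);
        _exit(127);
    }
    /* The parent must close its copies (the write end in particular),
       or `right` never sees EOF on its stdin. */
    close(p[0]);
    close(p[1]);
    int ls = (lp > 0) ? wait_status(lp) : -1;
    int rs = (rp > 0) ? wait_status(rp) : -1;
    return (ls == 0 && rs == 0) ? 0 : -1;
}

/* Caller supplies the buffer, like read(2). Returns bytes read or -1. */
ssize_t read_file(const char *path, char *buf, size_t n) {
    assert(buf != NULL && n > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t total = 0, r;
    while ((size_t)total < n - 1 &&
           (r = read(fd, buf + total, n - 1 - (size_t)total)) > 0)
        total += r;
    close(fd);
    buf[total] = '\0';                  /* always NUL-terminate */
    return total;
}
```

For instance, with char *echo_hi[] = {"echo", "hi", NULL}, run_redirect(echo_hi, "x") acts like `echo hi > x`, and run_pipe(echo_hi, wc_c, "y") with wc_c = {"wc", "-c", NULL} leaves wc's byte count (3, for "hi\n") in y. read_file follows the same convention as the read() system call: the caller allocates the buffer, so there is never any doubt about who owns the memory.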