6.824 2004 Lecture 1

Opening
  distributed systems from a systems-building perspective
    robustness, high performance, flexible construction
  we'll study design
  you'll build real systems
  understand and synthesize many O/S and dist sys ideas

Example: how to build HotMail?
  mail arrives from outside world
  store it until...
  user reads/deletes/saves it

Solution 1: One server w/ disk to store mail-boxes
  [picture: MS, sending "clients", reading clients]
  Lots to talk about even w/ simple client/server system
    which is why we'll spend some time studying single-server systems
  Problem: performance
  Problem: consistency w.r.t. client-side copies
  Problem: concurrent mail arrival, deletion
  Problem: crash recovery (crash while updating mail-box)
  Problem: availability

Stable performance under high load
  Example: Starbucks.
    5 seconds to write down incoming request.
    10 seconds to make it.
    [graph: x=requests, y=output]
    max thruput at 4 drinks/minute.
    thruput goes to zero at 12 requests/minute.
    Efficiency *decreases* with load -- bad.
  Careful system design to avoid this -- flat line at 4 drinks.
    Peets, for example.
  Better: build systems whose efficiency *increases* w/ load
    w/ e.g. batching, disk scheduling

Issue: scalable performance
  What if more clients than one server can handle?
  How to make use of more servers?
  Will dividing the load be easy?
    Easy case: add another espresso machine and operator (stateless!)
    Medium: split mailboxes across servers (separate state, redirects)
    Hard: distributed index / search engine. update all or query all?
  Problem: efficient split
  Problem: load balance
  Problem: finding the right server

Issue: high availability
  Can I get at my HotMail mailbox if some servers / networks are down?
  Yes: replicate the data.
  Problem: replica consistency. delete mail, re-appears. airline reservations.
  Problem: physical independence vs communication latency
  Problem: partition vs availability
  Tempting problem: 2 servers, 2x availability, 2x performance?
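
Aside, going back to the Starbucks numbers under stable performance: the two figures quoted there (peak of 4 drinks/minute, zero output at 12 requests/minute) fall out of simple arithmetic. The sketch below is one way to reproduce them. It assumes a single worker who always writes down every incoming request first and makes drinks only in the time left over; that priority rule is an assumption made here to match the graph, and the function name and parameters are illustrative only.

```python
def drinks_per_minute(requests_per_minute: float,
                      take_order_s: float = 5.0,
                      make_drink_s: float = 10.0) -> float:
    """Output rate of one worker under a given request rate.

    Assumption (not stated in the notes): taking orders has priority,
    so order-taking time is spent even if no drink can then be made.
    """
    # Seconds per minute consumed just writing down requests.
    order_time = requests_per_minute * take_order_s
    # Whatever remains of the minute can be spent making drinks.
    time_left = max(0.0, 60.0 - order_time)
    can_make = time_left / make_drink_s
    # Can't make more drinks than were requested.
    return min(requests_per_minute, can_make)


if __name__ == "__main__":
    for rate in (2, 4, 8, 12):
        print(rate, "requests/min ->", drinks_per_minute(rate), "drinks/min")
```

At 4 requests/minute the worker spends 20 s on orders and 40 s on drinks, which is exactly saturated; past that point every extra request steals 5 s from drink-making, so output falls until all 60 s go to order-taking at 12 requests/minute.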
Issue: security
  old view: secrecy via encryption (msg to Moscow embassy)
    user authentication via passwords &c
    all parties know each other!
  Internet and other big public systems have changed focus.
    sit at Internet cafe, give credit card number to Amazon
    was that really Amazon? who is "Robert Morris"?
    but I don't know "who" Amazon is.
    no purely technical approach is likely to solve this problem

We want to understand the individual techniques, and how to assemble them.

--------------

Course structure
  meetings: 1/2 lectures on fundamentals, 1/2 reading discussions
  research papers on working systems
    must read papers before class
    otherwise boring, and you can't pick it up by listening
    we will post paper questions 48 hours in advance (one waiting now)
    hand in answer on paper in class, one or two paragraphs
  two exams
  Labs: build real servers.
  Project. Programming, demo, paper (in style of readings).
  Sanjit is TA, office hours TBA
  URL

Don't forget:
  sign up for course machine accounts
  look at the first lab, due in a week
  read paper for tomorrow

---------------

O/S kernel overview
  context in which you build distributed systems
  o/s has big impact on design, robustness, performance
    sometimes because of o/s quirks
    mostly because o/s solves some hard problems

This should be review for most of you
  Want to tell what I think is important
  Give you a chance to ask questions
  Prepare you for the first reading...

What problems does o/s solve?
  sharing hardware resources
  protection
  communication
  hardware independence
  (everyone faces these problems)

Approach to solutions?
  o/s designers think like programmers, abstractions + interfaces

UNIX abstractions
  (we'll be programming UNIX in labs, my favorite O/S)
  process
    address space
    thread of control
    user ID
  file system
  file descriptor
    on-disk file
    pipe
    network connection
    device

All this is implemented by a privileged "kernel".
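
To make the file descriptor abstraction concrete: the same read loop works whether the descriptor names a pipe or a file on disk. The sketch below is illustrative only; it uses Python's `os` module, which wraps the UNIX system calls, and the helper names (`read_all`, `via_pipe`, `via_file`) are made up for this example. It assumes a UNIX-like system and small payloads (a large write to a pipe with no concurrent reader would block).

```python
import os
import tempfile


def read_all(fd: int) -> bytes:
    """Read until EOF from any file descriptor (file, pipe, ...)."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:          # empty read means end-of-file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def via_pipe(data: bytes) -> bytes:
    """Push data through a pipe and read it back."""
    r, w = os.pipe()
    os.write(w, data)          # small write: fits in the pipe buffer
    os.close(w)                # closing the write end lets the reader see EOF
    try:
        return read_all(r)
    finally:
        os.close(r)


def via_file(data: bytes) -> bytes:
    """Write data to a temporary file and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)   # rewind before reading
        return read_all(fd)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(via_pipe(b"hello"), via_file(b"hello"))
```

`read_all` never needs to know which kind of object sits behind the descriptor; the kernel hides that difference behind one interface.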
Note we're partially virtualizing
  o/s multiplexes physical resource among multiple processes
    CPU, memory
  simple model for apps
  easy for o/s to control, protect, share

Can't completely virtualize
  the file system is not a virtual disk
  abstractions interact, must form a coherent set
    if o/s can start programs, it must know how to read files

System call interface to kernel abstractions
  looks like function call, but special
  fork, exec
  open, read, creat

Standard picture
  app (show two of them, mark addresses from zero)
  libraries
  -----
  FS
  disk driver
  (mention address spaces, protection boundaries)
  (mention h/w runs kernel address space w/ special permissions)

Why Big Kernels have been successful.
  easy for kernel subsystems to cooperate
    disk buffer shares phys mem with virtual mem system
  all kernel code is 100% privileged
    very simple security model
  easy to implement sophisticated and efficient services

Why UNIX abstractions are not perfect
  kernel is big
    kernel has room for lots of bugs; it's all privileged
  kernel limits flexibility
    multiple threads per process?
    single thread crossing into a different address space?
    control disk layout of files for performance?
    don't like the kernel's TCP implementation?
  we'll discuss a number of improved abstractions

Alternate set of abstractions: micro-kernel
  Move complex abstractions to server processes
  Talk to FS server, rather than FS module in kernel
  Kernel mostly handles IPC
    also grants h/w access to privileged servers
    e.g. FS server can read/write disk h/w
  Looks like a miniature distributed system!
    Move FS server to a different machine, via network?
  Lots of overlap with our concerns in this class.

Let's review some basics which will come up a lot:
  process / kernel communication
  how processes and kernel wait for events (disk and network i/o)

Life-cycle of a simple UNIX system call
  See the handout...
Interesting points:
  protected transfer
    h/w allows process to get kernel permissions
    but only by jumping to *known* entry point in kernel
  process suspended until system call finishes

What if the system call needs to wait, e.g. for the disk?
  sys_open(path)
    for each pathname component
      start read of directory from disk
      sleep waiting for the disk read
      process the directory contents
  sleep()
    save *kernel* registers to PCB1 (including SP)
    find runnable PCB2
    restore PCB2 kernel registers (SP...)
    return

Note: each user process has its own kernel stack [draw in diagram]
  kernel stack contains state of partially executed system call
  "kernel half"
  trap handler must execute on the right stack
  "blocking system call"

What happens when disk completion interrupt occurs?
  CPU sees device wants to interrupt.
  Saves current process state much like system call.
  Enters kernel handler.
  Kernel points SP to special interrupt stack.
  Device interrupt routine sees a process was waiting for that I/O.
  Marks process as runnable.
  Returns from interrupt.
  Someday process scheduler will switch to the waiting process.

Now let's look at how services use this kernel structure.

--------------

Topic: basic server software structure

Explain code in handout

***************** ran out of time while starting to talk about
which parts of the server might block, and for how long.

Problem [draw this]
  Time-lines for CPU, disk, network
  We're wasting CPU, disk, network.
  We may have lots of work AND an idle system. Not good.
  s/w structure forces one-at-a-time processing
  How can we use the system's resources more efficiently?
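
To put rough numbers on the waste: with one-at-a-time processing, each request uses the CPU, then the disk, then the network in turn, so each resource is busy only for its own share of the request and idle the rest of the time. The sketch below works that out. The per-request costs (1 ms CPU, 10 ms disk, 9 ms network) are invented purely for illustration, and the function name is hypothetical.

```python
def serial_utilization(cpu_ms: float, disk_ms: float, net_ms: float) -> dict:
    """Utilization of each resource when requests are handled strictly
    one at a time, each using CPU, then disk, then network in sequence.

    The millisecond costs are caller-supplied assumptions, not measurements.
    """
    total_ms = cpu_ms + disk_ms + net_ms      # time per request, start to finish
    return {
        "cpu": cpu_ms / total_ms,             # fraction of time CPU is busy
        "disk": disk_ms / total_ms,
        "network": net_ms / total_ms,
        "requests_per_sec": 1000.0 / total_ms,
    }


if __name__ == "__main__":
    for name, value in serial_utilization(1.0, 10.0, 9.0).items():
        print(f"{name}: {value:.2f}")
```

Under these made-up costs no resource is busy more than half the time, even with an unlimited backlog of requests: lots of work AND a mostly idle system.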