6.824 2004 Lecture 3: Event-Driven Programming Intro Why do system calls block, and what can we do about it? How can we structure event-driven programs? What do we want? System calls that don't block read, open, stat, ... Unified method to wait for completion So we can wait for multiple completions in a concurrent server. Why are non-blocking system calls hard in general? Typical system call implementation, inside the kernel: [Example 1] Can we just return to user program instead of wait_for_disk? No: how will kernel know where to continue? Lesson: not easy to make most system calls non-blocking and continuable. Options: New operating system with totally different syscall interface. One system call per non-blocking sub-operation. Kernel-supported threads, fork a new thread before each blocking call. Flash/AMPED. Microkernel: no system calls, only messages to servers. and non-blocking communication What can we do in UNIX? Unix system calls mostly like sys_read() example. Non-blocking didn't make sense 30 years ago anyway: Disk I/O relatively much faster than CPU. Communication I/O is special in UNIX. Sockets, pipes, terminals. Network support in early 80s required more sophisticated interfaces. telnet &c must wait for two input sources: net and terminal. So support for events was mandatory. Would be nice to have a general "wait for next event" system call. select() approximates that for socket I/O. Why non-blocking I/O is easy for communication. Socket reads are essentially atomic -- one step. Data has already arrived, or not; read() doesn't have to pull data. [Example 2] Don't need to block before decision about whether data is waiting. I.e. nothing like asking the disk to start a read. So select() can tell ahead of time if data is waiting. This is true for all communcation-like read()s: network sockets, pipes, terminals, mouse select() [write prototype on the board: nfds, reads, writes, excepts, time] Condition is actually "read() would not block"; might be EOF. select() blocks (if timeout > 0) to avoid wasteful polling. this is important; you *do* want to block in select(). Why select() for writing? All communcation fd's have buffered output. To keep devices busy while application does something else. Output buffers have finite size. write() ordinarily blocks if buffer is full; waits for drain. Easy for select() to tell if there is space. So it makes sense to select() for writing. But how much buffer space is there? Suppose select() sees 1 byte available in the output buffer. And you write 2 bytes? Ordinarily, write() will block until 2 bytes have been written. select() said it was OK to write, but write() blocks anyway! Solution: ahead of time, set O_NONBLOCK flag w/ set_nonblock() select() for writing call write() with as much data as you like write() returns how many it did write if any left over, select again, try to write the rest How to use organize even-driven program around select()? Bad (but simple) plan: At each place your code knows it wants to wait for two things, use select(). That is, look for read()s &c, add select() before each one. This is how ssh, telnet, &c work. But your program may be modular! One part wants to wait for input from fd1 An unrelated part wants to wait for input from fd2 How to do both without modules knowing too much about each other? Option 1: user-level threads, with fake read(). [Example 3] Option 2: event-driven programming library. A lot like threads, but upside-down: threads: threads call scheduler loop (thread_schedule). event-driven: scheduler loop (amain) calls callbacks. [Example 4] Simpler than user threads. Flash paper shows event-driven is more efficient, too. But there are some costs. How do event-driven programs maintain state? Ordinary programs, and threads, use variables on the stack. Thread's stack persists across function calls, and blocking operations. Event-Driven programs can't keep state on the stack. Since each callback must return immediately. We want to associate data/state with each callback. Could associate functionPointer+arg with each fd... wrap(fn, a, b) generates a closure. That is, a function pointer plus an environment of saved values. Transformation of ordinary code: [Example 5] Into event-driven-style code: Need to split f2, since read() blocks. Thus need to split f1. How to tell second half of f2 to call second half of f1? How to remember value of x across two halves of f1? Don't want to specialize f2 to know to call f1's second half. [Example 6] The cb being passed to fn() is a continuation! Why do we supply reference counted garbage collection in libasync? In the ordinary code, z and x are saved across calls on the stack. Space for those values is freed automatically on function return. In the event-driven code, stack not available for that kind of use. Values are stored in the closures created by wrap(). How do we know when we've used a callback the last time? That's why they're reference counted.