6.824 2002 Lecture 2: I/O Concurrency

Lecture overview:
  About I/O concurrency
  List of solution techniques
  Details in subsequent lectures
  Point is really s/w structure, not optimization

Review problem we ended with:
  Time-lines for CPU, disk, network
  We're wasting CPU, disk, network.
  Network delays can be VERY long.
  We may have lots of work AND an idle system. Not good.
  s/w structure forces one-at-a-time processing
  How can we use the system's resources more efficiently?

What we want is *I/O concurrency* (note *not* CPU concurrency)
  Ability to overlap I/O wait with other useful work.
  In web server case, I/O wait is mostly network latency.
  Could be disk I/O: compile 1st part of file while fetching 2nd part.
  Could be user interaction: emacs GC while waiting for you to type.
  The 2nd lab assignment is all about I/O concurrency.
    Another web proxy, but handle multiple connections at once.

We may also want *CPU concurrency*
  Make use of multiple CPUs on a shared memory machine.
  Often I/O concurrency tools can be used to get CPU concurrency.
  But often harder: concurrent computations can easily interfere.
    But I/O system usually well isolated from s/w.
  CPU concurrency much less important than I/O concurrency: 2x, not 100x.

Typical ways to get concurrency. This is about s/w structure.
There are any number of potential structures:
  0. (One process)
  1. Multiple processes
  2. One process, many threads
  3. Event-driven
Which is best depends on O/S facilities, type of application,
and degree of interaction among different sub-tasks.

Concurrency with multiple processes
  Start a new UNIX process for each client connection / request
  Master process hands out connections.
  Now plenty of work available to keep system busy
  Still simple: fork() after accept() -- written out in the sketch below
  Preserves original s/w structure.
  Isolated: bug for one client does not crash the whole server
  Most interaction hidden by O/S. E.g. lock the disk queue.
  Gets CPU concurrency as a side effect!

Multiple process problems
  Cost of starting a new process (fork()) may be high.
    New address space &c. 300 microseconds *min* on my computer.
  Processes are fairly isolated by default
    E.g. they do not share memory
    What if you want a web cache? Must be shared among processes.
    Or even just keep statistics?

Concurrency with threads
  Looks a bit like multiple processes
  But thread_fork() leaves address space alone
    So all threads share memory
    One stack per thread, inside process
  [picture: thread boxes inside process boxes]
  Seems simple -- still preserves single-process structure.
  Potentially easier to have e.g. shared web cache
    But programmer needs to know about some kind of locking.
  Also easier for one thread to corrupt another
  (Both the process and thread versions are sketched below.)
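A minimal sketch of the fork()-after-accept() server loop. Error
handling is skimped, and serve_client() is a hypothetical stand-in
for reading the request and sending the reply:

  #include <signal.h>
  #include <unistd.h>
  #include <sys/socket.h>

  void serve_client(int fd);   /* hypothetical: read request, send reply */

  void
  server_loop(int listen_fd)
  {
    signal(SIGCHLD, SIG_IGN);  /* don't accumulate zombie children */
    for (;;) {
      int fd = accept(listen_fd, 0, 0);
      if (fd < 0)
        continue;
      if (fork() == 0) {
        /* child: serve exactly one client, then exit */
        close(listen_fd);
        serve_client(fd);
        close(fd);
        _exit(0);
      }
      close(fd);               /* parent: child has its own copy of fd */
    }
  }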
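And the same accept loop with one thread per connection. This sketch
assumes POSIX threads (the thread_fork() above becomes
pthread_create()); serve_client() is hypothetical as before, and must
now lock any data it shares with other threads:

  #include <sys/socket.h>
  #include <unistd.h>
  #include <pthread.h>

  void serve_client(int fd);   /* hypothetical; must lock shared data */

  void *
  client_thread(void *arg)
  {
    int fd = (int)(long)arg;
    serve_client(fd);
    close(fd);
    return 0;
  }

  void
  server_loop(int listen_fd)
  {
    for (;;) {
      int fd = accept(listen_fd, 0, 0);
      if (fd < 0)
        continue;
      pthread_t tid;
      if (pthread_create(&tid, 0, client_thread, (void *)(long)fd) == 0)
        pthread_detach(tid);   /* we never join, so let it clean up itself */
      else
        close(fd);
    }
  }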
Thread details matter
  Where is the thread scheduler?
  Does scheduler know about blocking operations (e.g. read())?
    So it can switch to another thread.
    If you don't do this, you don't get I/O concurrency...

Kernel-supported threads
  O/S kernel knows about each thread
  It knows a thread was just blocked, e.g. in disk read wait
    Can schedule another thread from that same process
  [picture: thread boxes dip down into the kernel]
  What does kernel need for this?
    Per-thread kernel stack.
    Per-thread tables (e.g. saved registers).
  Semantics:
    per-process resources: addr space, file descriptors
    per-thread resources: user stack, kernel stack, kernel state
  Kernel could schedule threads on different CPUs
  This sounds like just what we want for our server
  BUT kernel threads are usually expensive, just like processes
    Kernel has to help create each thread
    Kernel has to help with each context switch?
      So it knows which thread took a fault...
    Paper suggests getting into/out of kernel is very slow.
  Many O/S do not support kernel threads

User-level threads
  Implemented purely inside program, kernel does not know
  User scheduler for threads inside the program
    In addition to kernel process scheduler
  [picture]
  How does user scheduler know a thread blocked?
    How does user scheduler gain control again to run another thread?
  Answer: thread library has fake read(), write(), accept(), &c system calls
    Scheduler remembers that a thread is waiting for some event
    Arranges with O/S to be notified of the event
    Events:
      new connection
      data arrived on socket
      disk read completed
      client/socket ready to receive new data
    Scheduler checks for new events at each context switch
    Resumes a thread when its event arrives
  Like a miniature O/S inside the process

Problem: O/S usually doesn't support this event model well
  Works for socket I/O
  Does *not* typically work for file operations
    They block the whole process
    Since kernel does not know about user-level threads

Threads are hard to program
  The point is to share data structures
    Hard for programmer to use locks correctly
  Threads produce efficiency by making it look to the programmer
    as if many things are happening at once
    Needlessly confusing
  In fact, events occur one at a time
    Potentially could simplify the programming model

Event-driven programming
  Suggested by user threads implementation
  Organize the s/w around the arrival of events
  Write s/w in state-machine style
    When this event occurs, execute this function
  Library support to register interest in events
  The point: this preserves the serial nature of the events
    Programmer sees events/functions occurring one at a time
  This is the style in which you'll implement the 2nd lab assignment.
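A minimal sketch of such an event loop, built on select(). The conns
table and the handle_readable()/handle_writable() handlers are
hypothetical stand-ins for per-connection state machines like the
ones you'll write for the lab:

  #include <sys/select.h>
  #include <sys/socket.h>

  #define MAXFD 1024           /* assume all fds fit in one fd_set */

  struct fd_state {
    int active;                /* is this descriptor in use? */
    int want_read;             /* state machine wants more input */
    int want_write;            /* state machine has output pending */
  };
  struct fd_state conns[MAXFD];

  void handle_readable(int fd);  /* hypothetical: advance state machine */
  void handle_writable(int fd);  /* hypothetical: send buffered output */

  void
  event_loop(int listen_fd)
  {
    conns[listen_fd].active = conns[listen_fd].want_read = 1;
    for (;;) {
      fd_set rfds, wfds;
      FD_ZERO(&rfds);
      FD_ZERO(&wfds);
      for (int fd = 0; fd < MAXFD; fd++) {
        if (!conns[fd].active)
          continue;
        if (conns[fd].want_read)
          FD_SET(fd, &rfds);
        if (conns[fd].want_write)
          FD_SET(fd, &wfds);
      }
      /* block until some registered event occurs */
      if (select(MAXFD, &rfds, &wfds, 0, 0) <= 0)
        continue;
      /* dispatch the events one at a time, serially */
      for (int fd = 0; fd < MAXFD; fd++) {
        if (FD_ISSET(fd, &rfds))
          handle_readable(fd); /* on listen_fd: accept(), register new conn */
        if (FD_ISSET(fd, &wfds))
          handle_writable(fd);
      }
    }
  }

Because the handlers run to completion one at a time, no locking is
needed; the price is that a handler must never block, or the whole
server stalls.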