6.824 2001 lecture 1

Opening
  design and construction of robust, high-performance server systems
  not just study -- also actually build real systems
  understand and synthesize many O/S and distributed systems ideas

Picture
  examples: airline reservation system, hotmail
  clients, lots of them
  abstract service box
  what's inside the service?
    hardware and s/w modules
    DB back end, web formatter front ends
  easy for small systems

Challenges (blueprint for course content)

Performance on one machine
  s/w structure: modules vs interaction complexity
  flow of control among all the activities
  example: serve client2 while waiting for client1's disk data

Performance on clusters
  again, modules and interaction; s/w structure
  example: bank xfer
  danger of perf being dominated by comms costs, not CPU time
    more servers -> less performance

Performance under high load
  Example: Starbucks
    5 seconds to write down an incoming request, 10 seconds to make the drink
    max thruput of 4 drinks/minute
    thruput goes to zero at 12 requests/minute
      (each request costs 5 seconds just to take, so at 12 requests/minute
      all 60 seconds go to writing down orders, none to making drinks)
    [graph: x=requests, y=output]
  Efficiency *decreases* with load -- bad.
  Careful system design can avoid this -- a flat line at 4 drinks/minute
    (Peets, for example).
  Better: build systems whose efficiency *increases* with load,
    e.g. via batching, disk scheduling

Fault tolerance
  hardest problem is partial failure
  client failures must not affect the service
  hardware or s/w modules fail: 1. give up, 2. wait for a fix, 3. tolerate
  tolerance itself leads to complexity, bugs, potential failure

Consistency
  fault tolerance and performance are often addressed with replication
  what if replicas diverge? (airline seat reservations)
    prevent or repair
  what if a multi-step operation is interrupted by a crash?
    (bank xfer between machines)

Security
  systems with good security are often *harder* to use
  want your service to be accessible by everybody,
    yet hardened against outside attack while staying open to legitimate users
    e.g. accept deposits vs view balance
  who are you? authentication
    hard: what does identity mean?
    example: pub-key signed e-mail; verifying the sig is not enough
  what are you allowed to do? authorization
  privacy and anonymity
    send me complaints/comments -- I may want to reply!
    a curious mixture of authentication and anonymity

We want to understand the individual techniques, and how to assemble them.

Course structure
  meetings: 1/2 lectures on fundamentals, 1/2 reading discussions
    research papers on ideas and working systems
    must read papers before class
  two exams
  Labs: build real servers, 5 of them, on our machines
  Frank's office hours
  Project: programming and a paper (in the style of the readings),
    with a program committee
  Don't forget to sign up, read the paper, do the assignment
  Enrollment may be limited
    depends on how many people end up taking it
    we'll see after the first assignment comes back

Topic: basic server software structure

Life-cycle of a typical request, in the server [draw this]
  accept connection from client
  read request from client
  find data on disk (or talk to back-end DB)
  read data from disk
  write data over network to client
Could just wrap this in a big while() loop: a single program, very simple!
  Look at the code in the hand-out.
Consequences?

Problem [draw this]
  Draw a time-line of when the CPU is busy
    actually CPU, disk, and network interface
  Explain "blocking" system calls
  We're wasting CPU! We may have lots of work AND an idle system. Not good.
  s/w structure forces one-at-a-time processing

How can we use the system's resources more efficiently?

Typical concurrency solutions:
  This is about s/w structure; there are any number of potential structures.
    0. one process
    1. multiple processes
    2. one process, many threads
    3. asynchronous
  The choice depends on O/S facilities.

Concurrency with multiple processes
  Start a new UNIX process for each client connection / request
  O/S scheduler knows if one process blocks, and runs another one
    This is important...
  [draw picture of kernel, process addr spaces, each w/ doit() code]
  Master process hands out connections.
  Now plenty of work is available to keep the system busy
  Still simple: see handout. Preserves the original s/w structure.
  Isolated: a bug for one client does not crash the whole server
  Most interaction is hidden by the O/S, e.g. locking the disk queue
  Probably easy to take advantage of multiprocessors

Multiple-process problems
  Cost of starting a new process (fork()) may be high
    new address space &c.; 300 microseconds *minimum* on my computer
    perhaps use a static pool of processes:
      the master hands new connections to existing idle processes
  How many concurrent requests to serve?
    matters if the existence of a process is expensive,
    i.e. consumes memory, table space, &c.
  Processes are fairly isolated by default
    e.g. they do not share memory
    what if you want a web cache? it must be shared among processes
    or even just want to keep statistics?

Concurrency with threads
  Looks a bit like multiple processes,
    but thread_fork() leaves the address space alone,
    so all threads share memory
  [picture: thread boxes inside process boxes]
  Seems simple -- still preserves the single-process structure
  Potentially easier to have e.g. a shared web cache,
    but the programmer needs to know about some kind of locking,
    and it's easier for one thread to corrupt another

Thread details matter
  Where is the thread scheduler?
  Does it actually see all blocking operations?

Kernel threads
  The O/S kernel knows about each thread
  It knows a thread just blocked, e.g. waiting for a disk read,
    and can schedule another thread from that same process
  [picture: thread boxes dip down into the kernel]
  The kernel knows enough to schedule threads on different CPUs
  This sounds like just what we want for our server,
    BUT kernel threads are usually expensive, just like processes,
    and many O/Ses do not support kernel threads

User threads
  Implemented purely inside the program; the kernel does not know about them
  A user scheduler for threads inside the program,
    in addition to the kernel process scheduler
  [picture]
  How does the user scheduler know a thread blocked?
  How does the user scheduler gain control again to run another thread?
  Answer: the thread library has fake read(), write(), accept(), &c. system calls
    the scheduler remembers that a thread is waiting for some event,
    arranges with the O/S to be notified of the event,
    and resumes the thread when its event arrives
  Problem: the O/S usually doesn't support this event model well
    works for socket I/O, not for file operations

Threads are hard to program
  The point is to share data structures,
    and it's hard for programmers to use locks correctly
  Threads produce efficiency by making it look to the programmer
    as if many things are happening at once -- needlessly confusing,
    since in fact events occur one at a time

Alternate view
  Suggested by the user-threads implementation
  Organize the s/w around the arrival of events [list these]:
    new client connections
    client data
    disk read completions
    packet departures (i.e. buffer space is free)
  Library support to register interest in events
  Write s/w in state-machine style:
    when this event occurs, execute this function
  The point: this preserves the serial nature of the events
    the programmer sees events/functions occurring one at a time
    s/w never blocks, just registers for an event
  We get efficient use of the system w/o the confusing illusion of concurrency

Example:
  A very simple server: accepts connections, waits for data, ignores it.
  Note that many of each kind of event can be waiting,
    so we can handle multiple connections/clients at a time.
  Note that we could have different read callbacks, or whatever,
    for different stages of processing.

Why async programming is attractive
  Can get high utilization w/o concurrency
  The callback functions are called one at a time,
    thus no locking is required

Why async programming isn't perfect
  No story for multiprocessors
  You have to chop up your program at the points that might block,
    for example, assembling complete input
  The O/S cannot turn all blocking sys calls into events,
    particularly file system I/O
  Most libraries are not written for async I/O;
    if they block, they block the whole program

Don't forget to sign up at this URL.
Assignment due next Thurs, on the web site.
Reading for discussion on Tuesday.