6.824 2012 Lecture 3: Threads

Introduction
  threads are a fundamental server structuring tool
  you'll use them a lot in the labs
  they can be tricky

Thread = "thread of control"
  threads allow one program to (logically) do many things at once
  a thread includes some per-thread state:
    program counter
    stack pointer
    register set
    stack
    maybe state in the o/s (e.g. what it is waiting for)
  other state is shared by all threads in a program:
    memory
    file descriptors

Example:
  int x = 0;
  f(){
    int y = 0;
    y = y + 1;
    print x + y
  }
  main(){
    create_thread(f);
    create_thread(f);
    ...;
  }
  Q: will the threads print 1 or 2? Why?
  Q: is there any way for one thread to r/w the other's y?

Why use threads?
  exploit several processors to run an application faster
  interleave different activities in a convenient way
    one thread computes while another waits for disk read
    interleave background spell-check w/ foreground editing
    let one lock_server thread wait for a lock while processing
      other client requests in other threads
  you can view the last three as merely a structuring convenience
    so the programmer doesn't have to write the interleaved code
    but it's a big convenience!

Pitfalls of multithreaded programming
  races; example?
    may be difficult to reproduce
  deadlock; example?
    better bug to have than a race; your program stops when it happens
  livelock; example?
  bad performance; example?

At a minimum, a thread interface must support:
  creating threads
  mutual exclusion to avoid races
  coordination to let threads wait for + signal events

Pthread interface
  standard interface, for C / UNIX
  not unlike the one described in the paper
  we use it in the labs

Interface
  shortened names; real names are e.g.
    pthread_create, pthread_mutex_lock
  threads
    tid = create()
    join(tid)
  mutex
    lock(m)
    unlock(m)
  condition variables
    wait(cv, m)
      caller must hold m
      wait() releases m, re-acquires before return
    signal(cv)
      caller should hold m
      wakes up one thread (or none if none waiting)
    broadcast(cv)
      caller should hold m
      wakes up all waiting threads
  other stuff, which we don't use

Paper does a nice job teaching how to program with threads,
  in particular for servers (like the labs).
  Worth rereading the paper as you get more experience.

Example: let's look at fifo.h
  rpc system uses it for work queues
  first walk through the code

Scoped locks
  just a thin wrapper around pthread mutex lock/unlock
  helps you not have to remember to unlock
  saves a bunch of typing
  if(...){
    ScopedLock sl(&m);
    if(...)
      return ...; // sl.~ScopedLock releases m
    ...
  } // sl.~ScopedLock releases m

What if we deleted the ScopedLock at start of enq? at start of deq?

Condition variables:
  what if wait() was inside if(), not while()?
  what if enq() called just before deq()'s wait?
  what if we deleted signal at end of enq? at end of deq?
  what if broadcast instead of signal?
  what if the while loop in enq just spun, no call to wait?

What is the fifo's mutex protecting?
  it makes code sequences atomic, specifically:
    list<> operations (vs other operations, e.g. e->nxt=head; head=e)
    enq's not-full check and push (vs another enq)
    deq's not-empty check and pop (vs another deq)
    deq's *e= and pop (vs another deq)
    enq's full-check and wait (vs deq's signal)
    deq's empty-check and wait (vs enq's signal)

What if a thread acquires a lock, and acquires it again?
  should the nested acquire succeed?
    after all, it's this thread that has the lock
  why / why not?

Deadlock example:
  rpc thread receiving a msg:
    rpc library locks
    app locks
  app thread sending a msg:
    app locks
    rpc library locks
  how to fix?
    release one lock before acquiring the other
    or always acquire locks in the same order (e.g.
    for bank transfer)
      hard if locks are in different modules
      modules shouldn't have to know about each other's internals

Killing a thread can be a mess
  e.g. rpcs creates a thread to wait for incoming requests
    what happens to that thread when you're done with an rpcs?
  it might be holding a lock
  it might need to clean up, e.g. free memory
    a whole stack of calling functions might need to clean up
  it might be hard to get its attention at all
    waiting for a lock
    sitting in wait() on a condition variable
  best plan: set a flag asking it to clean itself up
    problem: what if it's in wait() -- how do we know what to signal()?
    answer: make sure you know where your threads wait
      e.g. what if my thread is waiting in fifo.deq()?

Locking granularity
  one mutex for the whole lock_server?
  suppose we found handlers were often waiting for that one mutex
    what are reasonable options?
      one mutex per client?
      one mutex per lock?
  if one mutex per lock
    still need one mutex to protect the table of locks
  danger of many locks---deadlock and races

Why not use a pthread mutex directly for each lock_server lock?
  acquire(lid){
    pthread_mutex_lock(locks[lid]);
  }

pthread mutexes themselves are usually pretty cheap if you don't need to wait
  50 ns if this core was the last to hold it
  200 ns if two cores are constantly trading a lock
  these seem to be spin-locks
  blocking locks would be much slower -- they suspend the thread
    but free up the CPU, good if the lock is held for a long time