6.824 Lecture 3: Threads

Thread is short for thread of control: a running program with its own
program counter, stack pointer, etc. A process is one or more threads
executing in a single address space.

The primary reasons to use concurrent programming with threads:
  exploit several processors to run an application faster
  hide long delays (e.g., while waiting for a disk, do something else on the processor)
  run long-running ops concurrently with short ones in user interfaces
  network servers and RPC
For example, in lab 1, if one client is waiting for lock a, the server may
want to process requests from other clients, in particular ones for
different locks.

Pitfalls of multithreaded programming
  race condition; can you give me an example?
    may be difficult to reproduce
  deadlock; can you give me an example?
    better bug to have than a race; your program stops when it happens
  livelock; can you give me an example?
  starvation
  wrong lock granularity; can you give me an example?

At a minimum, a thread interface must support:
  creating and managing threads
  ways of avoiding race conditions for updates to shared variables
    assume each thread runs on its own processor, sharing memory
    instructions that appear to be atomic might not be (e.g., x = x + 1)
  ways of coordinating different threads

Pthread interface
  standard interface, for C / UNIX
  not unlike the one described in the paper
  we use it in the labs
  Interface (these are all shortened names, see documentation)
    threads
      tid = create()
      join(tid)
    mutex
      lock(m)
      unlock(m)
    condition variables
      wait(cv, m)
      signal(cv)
        wakes up one thread (or none if none waiting)
      broadcast(cv)
        wakes up all threads waiting
    other stuff which we don't use

The paper does a nice job teaching how to program with threads, in
particular for programmers who write multithreaded servers---that is you.
Worth rereading the paper as you get more experience.

How to use mutexes and cond vars: let's look at fifo.cc
  first walk through the code
  what if we deleted the lock at the start of enq? at the start of deq?
  what if wait() was inside if(), not while()?
  what if enq() is called just before deq()'s wait?
  what if we deleted the signal at the end of enq? at the end of deq?
  what if the while loop in enq just spun, with no call to wait?

Scoped locks
  just a thin wrapper around pthread mutex lock/unlock
  help you not have to remember to unlock
  saves a bunch of typing
  int fn() {
    if(...){
      ScopedLock sl(&m);
      if(...)
        return ...;  // sl.~ScopedLock releases m
    }                // sl.~ScopedLock releases m
  }

What is the fifo lock protecting?
  probably protects list<> internals
  helps avoid push on full list (race between check and push)
  helps avoid pop from empty list (race between check and pop)
  helps avoid missed wakeups

What if a thread acquires a lock, and acquires it again?
  should the nested acquire succeed?
    after all, it's this thread that has the lock
  why / why not?

Deadlock
  avoid by always acquiring locks in the same order
  can be hard:
    the RPC layer calls down into the connection class
    the connection class makes up-calls to the RPC layer
    avoiding deadlock requires a little violation of module boundaries

Killing a thread is a mess
  e.g. if some class creates a thread for every object, and now we're done
    with that object
  it might be holding a lock
  it might need to clean up, e.g. free memory
  a whole stack of calling functions might need to clean up
  it might be hard to get its attention at all
    waiting for a lock
    sitting in wait() on a condition variable
  best plan: set a flag asking it to clean itself up
  problem: what if it's in wait() -- how do we know what to signal()?
  answer: make sure you know where your threads wait
    e.g. what if my thread is waiting in fifo.deq()?
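To make the fifo.cc questions and the cleanup answer concrete, here is a
minimal sketch of the pattern, not the lab's actual fifo.cc: the names
(SimpleFifo, stop(), done_) are invented, and the queue is unbounded for
brevity. deq() waits in a while loop, and a shutdown flag plus broadcast
lets us get the attention of a thread sleeping in deq().

  #include <pthread.h>
  #include <list>

  // Sketch only: a queue whose consumers wait in a while loop and also
  // watch a shutdown flag, so a thread blocked in deq() can be asked to
  // clean itself up.
  class SimpleFifo {
   public:
    SimpleFifo() : done_(false) {
      pthread_mutex_init(&m_, 0);
      pthread_cond_init(&nonempty_, 0);
    }
    void enq(int x) {
      pthread_mutex_lock(&m_);
      q_.push_back(x);
      pthread_cond_signal(&nonempty_);      // wake one waiting deq()
      pthread_mutex_unlock(&m_);
    }
    // Returns false if the fifo was shut down instead of producing an item.
    bool deq(int *out) {
      pthread_mutex_lock(&m_);
      while (q_.empty() && !done_)          // while, not if: re-check after every wakeup
        pthread_cond_wait(&nonempty_, &m_); // atomically releases m_ while asleep
      if (q_.empty()) {                     // woke up because of stop()
        pthread_mutex_unlock(&m_);
        return false;
      }
      *out = q_.front();
      q_.pop_front();
      pthread_mutex_unlock(&m_);
      return true;
    }
    // "Set a flag asking it to clean itself up": we know waiters sleep on
    // nonempty_, so broadcast there to get their attention.
    void stop() {
      pthread_mutex_lock(&m_);
      done_ = true;
      pthread_cond_broadcast(&nonempty_);
      pthread_mutex_unlock(&m_);
    }
   private:
    pthread_mutex_t m_;
    pthread_cond_t nonempty_;
    std::list<int> q_;
    bool done_;
  };

The while (not if) matters because a thread returning from
pthread_cond_wait() holds the mutex but has no guarantee the condition
still holds: another consumer may have already taken the item, or the
wakeup may have been spurious, so the condition must be re-checked.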
Locking granularity
  one mutex for the whole lock_server?
  suppose we found handlers were often waiting for that one mutex
  what are reasonable options?
    one mutex per client?
    one mutex per lock?
  if one mutex per lock
    still need one mutex to protect the table of locks
  danger of many locks---deadlock and races

client classes:
  rpcc (one per server connection)
    call1
    got_pdu
  connection (one per rpcc, may come and go)
  PollMgr (one per process, shared by all connections)

client threads:
  application threads
    wait on a per-call cond var in rpcc::call1
    retransmission happens here
  PollMgr thread
    up-call to connection when readable
    connection up-calls to rpcc when a whole msg has arrived
    rpcc::got_pdu wakes up the sleeping client thread
  why not have the app thread directly read the reply from the TCP conn?
    i.e. why the up-calls?

server classes:
  rpcs (one per service)
  tcpsconn
  connection ...
  ThrPool
  PollMgr

server threads:
  tcpsconn, for accept(), automatically produces connections
  PollMgr thread
    up-calls to connection for incoming request
    up-calls to rpcs::got_pdu
    shoves msg into ThrPool's fifo
  ThrPool's 10 workers wait() on the fifo
    call rpcs::dispatch with the msg
  why ThrPool? why not fire up a new thread per RPC request?

Let's run through the abbreviated RPC code in the handout

RPC and mutexes may produce distributed deadlock
  suppose server s1's handler does this:
    acquire lk
    call s2
    release lk
  and server s2's handler does this:
    call s1
  ThrPool makes nested RPCs dangerous even w/o mutexes
    imagine if the pool had only one worker
  you will run into this in Lab 4
    the lock server sends revoke RPCs back to clients
  don't call RPCs from handlers!
    have the handler queue work or change state, then return
    a background thread should send the RPC
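For the Lab 4 point, here is a hedged sketch of that last rule, not the
lab's actual lock_server: the names (lock_server_sketch, to_revoke_,
revoker) are invented, and the grant/held bookkeeping is omitted. The
handler only records that a revoke is needed and returns; a background
revoker thread sends the RPC with no mutex held, so a ThrPool worker is
never tied up waiting on a client.

  #include <pthread.h>
  #include <list>
  #include <string>

  class lock_server_sketch {
   public:
    lock_server_sketch() {
      pthread_mutex_init(&m_, 0);
      pthread_cond_init(&work_, 0);
      pthread_create(&revoker_tid_, 0, revoker_entry, this);
    }

    // RPC handler: runs in a ThrPool worker; must not call RPCs itself.
    int acquire(const std::string &lid) {
      pthread_mutex_lock(&m_);
      // ... grant-or-wait bookkeeping omitted; if some client must give
      // the lock back, queue the revoke instead of sending it here:
      to_revoke_.push_back(lid);
      pthread_cond_signal(&work_);
      pthread_mutex_unlock(&m_);
      return 0;
    }

   private:
    static void *revoker_entry(void *arg) {
      ((lock_server_sketch *)arg)->revoker();
      return 0;
    }
    void revoker() {
      for (;;) {
        pthread_mutex_lock(&m_);
        while (to_revoke_.empty())
          pthread_cond_wait(&work_, &m_);
        std::string lid = to_revoke_.front();
        to_revoke_.pop_front();
        pthread_mutex_unlock(&m_);
        // send the revoke RPC to the holder here, with no locks held
        // (an rpcc call; omitted in this sketch)
      }
    }

    pthread_mutex_t m_;
    pthread_cond_t work_;
    pthread_t revoker_tid_;
    std::list<std::string> to_revoke_;
  };

Because the revoke RPC goes out from a thread that holds no mutex and
occupies no ThrPool worker, neither the s1/s2 mutex deadlock nor the
exhausted-thread-pool deadlock above can arise from this path.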