6.824 Lecture 3: Threads

Thread is short for thread of control: a running program with its own
program counter, stack pointer, etc. A process is one or more threads
executing in a single address space.

The primary reasons to use concurrent programming with threads:
  exploit several processors to run an application faster
  hide long delays (e.g., while waiting for a disk, do something else on the processor)
  run long-running ops concurrently with short ones in user interfaces
  network servers and RPC
For example, in lab 1, if one client is waiting for lock a, the server may
want to process requests from other clients, in particular ones for
different locks.

Pitfalls of multithreaded programming
  race condition; can you give me an example?
    may be difficult to reproduce
  deadlock; can you give me an example?
    better bug to have than a race; your program stops when it happens
  livelock; can you give me an example?
  starvation
  wrong lock granularity; can you give me an example?

At a minimum, a thread interface must support:
  creating and managing threads
  ways of avoiding race conditions for updates to shared variables
    assume each thread runs on its own processor, sharing memory
    instructions that appear to be atomic might not be (e.g., x = x + 1)
  ways of coordinating different threads

Pthread interface
  standard interface, for C / UNIX
  not unlike the one described in the paper
  we use it in the labs
  Interface (these are all shortened names, see documentation)
    threads
      tid = create()
      join(tid)
    mutex
      lock(m)
      unlock(m)
    condition variables
      wait(cv, m)
      signal(cv)
        wakes up one thread (or none if none waiting)
      broadcast(cv)
        wakes up all threads waiting
    other stuff which we don't use

The paper does a nice job teaching how to program with threads, in
particular for programmers who write multithreaded servers---that is you.
Worth rereading the paper as you get more experience.

How to use mutexes and cond vars: let's look at fifo.cc
  first walk through the code
  what if we deleted the lock at the start of enq? at the start of deq?
  what if wait() was inside if(), not while()?
  what if enq() is called just before deq()'s wait?
  what if we deleted the signal at the end of enq? at the end of deq?
  what if the while loop in enq just spun, with no call to wait?

Scoped locks
  just a thin wrapper around pthread mutex lock/unlock
  help you not have to remember to unlock
  saves a bunch of typing
  int fn() {
    if(...){
      ScopedLock sl(&m);
      if(...)
        return ...;  // sl.~ScopedLock releases m
    }                // sl.~ScopedLock releases m
  }

What is the fifo lock protecting?
  probably protects list<> internals
  helps avoid push on full list (race between check and push)
  helps avoid pop from empty list (race between check and pop)
  helps avoid missed wakeups

What if a thread acquires a lock, and acquires it again?
  should the nested acquire succeed?
    after all, it's this thread that has the lock
  why / why not?

Deadlock
  avoid by always acquiring locks in the same order
  can be hard:
    the RPC layer calls down into the connection class
    the connection class makes up-calls to the RPC layer
    avoiding deadlock requires a little violation of module boundaries

Killing a thread is a mess
  e.g. if some class creates a thread for every object, and now we're done
    with that object
  it might be holding a lock
  it might need to clean up, e.g. free memory
  a whole stack of calling functions might need to clean up
  it might be hard to get its attention at all
    waiting for a lock
    sitting in wait() on a condition variable
  best plan: set a flag asking it to clean itself up
  problem: what if it's in wait() -- how do we know what to signal()?
  answer: make sure you know where your threads wait
    e.g. what if my thread is waiting in fifo.deq()?
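To make the fifo.cc questions and the cleanup answer concrete, here is a
minimal sketch of the pattern, not the lab's actual fifo.cc: the names
(SimpleFifo, stop(), done_) are invented, and the queue is unbounded for
brevity. deq() waits in a while loop, and a shutdown flag plus broadcast
lets us get the attention of a thread sleeping in deq().

  #include <pthread.h>
  #include <list>

  // Sketch only: a queue whose consumers wait in a while loop and also
  // watch a shutdown flag, so a thread blocked in deq() can be asked to
  // clean itself up.
  class SimpleFifo {
   public:
    SimpleFifo() : done_(false) {
      pthread_mutex_init(&m_, 0);
      pthread_cond_init(&nonempty_, 0);
    }
    void enq(int x) {
      pthread_mutex_lock(&m_);
      q_.push_back(x);
      pthread_cond_signal(&nonempty_);      // wake one waiting deq()
      pthread_mutex_unlock(&m_);
    }
    // Returns false if the fifo was shut down instead of producing an item.
    bool deq(int *out) {
      pthread_mutex_lock(&m_);
      while (q_.empty() && !done_)          // while, not if: re-check after every wakeup
        pthread_cond_wait(&nonempty_, &m_); // atomically releases m_ while asleep
      if (q_.empty()) {                     // woke up because of stop()
        pthread_mutex_unlock(&m_);
        return false;
      }
      *out = q_.front();
      q_.pop_front();
      pthread_mutex_unlock(&m_);
      return true;
    }
    // "Set a flag asking it to clean itself up": we know waiters sleep on
    // nonempty_, so broadcast there to get their attention.
    void stop() {
      pthread_mutex_lock(&m_);
      done_ = true;
      pthread_cond_broadcast(&nonempty_);
      pthread_mutex_unlock(&m_);
    }
   private:
    pthread_mutex_t m_;
    pthread_cond_t nonempty_;
    std::list<int> q_;
    bool done_;
  };

The while (not if) matters because a thread returning from
pthread_cond_wait() holds the mutex but has no guarantee the condition
still holds: another consumer may have already taken the item, or the
wakeup may have been spurious, so the condition must be re-checked.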
Locking granularity
  one mutex for the whole lock_server?
  suppose we found handlers were often waiting for that one mutex
  what are reasonable options?
    one mutex per client?
    one mutex per lock?
  if one mutex per lock
    still need one mutex to protect the table of locks
  danger of many locks---deadlock and races

client classes:
  rpcc (one per server connection)
    call1
    got_pdu
  connection (one per rpcc, may come and go)
  PollMgr (one per process, shared by all connections)

client threads:
  application threads
    wait on a per-call cond var in rpcc::call1
    retransmission happens here
  PollMgr thread
    up-call to connection when readable
    connection up-calls to rpcc when a whole msg has arrived
    rpcc::got_pdu wakes up the sleeping client thread
  why not have the app thread directly read the reply from the TCP conn?
    i.e. why the up-calls?

server classes:
  rpcs (one per service)
  tcpsconn
  connection ...
  ThrPool
  PollMgr

server threads:
  tcpsconn, for accept(), automatically produces connections
  PollMgr thread
    up-calls to connection for incoming request
    up-calls to rpcs::got_pdu
    shoves msg into ThrPool's fifo
  ThrPool's 10 workers wait() on the fifo
    call rpcs::dispatch with the msg
  why ThrPool? why not fire up a new thread per RPC request?

Let's run through the abbreviated RPC code in the handout

RPC and mutexes may produce distributed deadlock
  suppose server s1's handler does this:
    acquire lk
    call s2
    release lk
  and server s2's handler does this:
    call s1
  ThrPool makes nested RPCs dangerous even w/o mutexes
    imagine if the pool had only one worker
  you will run into this in Lab 4
    the lock server sends revoke RPCs back to clients
  don't call RPCs from handlers!
    have the handler queue work or change state, then return
    a background thread should send the RPC
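For the Lab 4 point, here is a hedged sketch of that last rule, not the
lab's actual lock_server: the names (lock_server_sketch, to_revoke_,
revoker) are invented, and the grant/held bookkeeping is omitted. The
handler only records that a revoke is needed and returns; a background
revoker thread sends the RPC with no mutex held, so a ThrPool worker is
never tied up waiting on a client.

  #include <pthread.h>
  #include <list>
  #include <string>

  class lock_server_sketch {
   public:
    lock_server_sketch() {
      pthread_mutex_init(&m_, 0);
      pthread_cond_init(&work_, 0);
      pthread_create(&revoker_tid_, 0, revoker_entry, this);
    }

    // RPC handler: runs in a ThrPool worker; must not call RPCs itself.
    int acquire(const std::string &lid) {
      pthread_mutex_lock(&m_);
      // ... grant-or-wait bookkeeping omitted; if some client must give
      // the lock back, queue the revoke instead of sending it here:
      to_revoke_.push_back(lid);
      pthread_cond_signal(&work_);
      pthread_mutex_unlock(&m_);
      return 0;
    }

   private:
    static void *revoker_entry(void *arg) {
      ((lock_server_sketch *)arg)->revoker();
      return 0;
    }
    void revoker() {
      for (;;) {
        pthread_mutex_lock(&m_);
        while (to_revoke_.empty())
          pthread_cond_wait(&work_, &m_);
        std::string lid = to_revoke_.front();
        to_revoke_.pop_front();
        pthread_mutex_unlock(&m_);
        // send the revoke RPC to the holder here, with no locks held
        // (an rpcc call; omitted in this sketch)
      }
    }

    pthread_mutex_t m_;
    pthread_cond_t work_;
    pthread_t revoker_tid_;
    std::list<std::string> to_revoke_;
  };

Because the revoke RPC goes out from a thread that holds no mutex and
occupies no ThrPool worker, neither the s1/s2 mutex deadlock nor the
exhausted-thread-pool deadlock above can arise from this path.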