6.824 2013 Lecture 2: Infrastructure: RPC and threads Remote Procedure Call (RPC) a key piece of distrib sys machinery; you'll see in lab 1 goal: easy-to-program network communication hides most details of client/server communication client call is much like ordinary procedure call server handlers are much like ordinary procedures RPC is widely used! RPC ideally makes net communication look just like fn call: Client: z = fn(x, y) Server: fn(x, y) { compute return z } RPC aims for this level of transparency RPC message diagram: Client Server request---> <---response Software structure client app handlers stubs dispatcher RPC lib RPC lib net ------------ net Look at lab-1.go handout (from last lecture): contains some fragments from the lab-1 code we give you main() shows application view: setup, then fn calls "Clerk" is client-side code that knows how to talk to specific server thus not quite like an ordinary fn call clerk.Lock() is a stub: turns fn call into RPC junk you'll modify Lock to try the backup if the primary fails LockServer holds the application-level state (lock table) LockServer.Lock() is a handler contains app-level server code RPC library calls it you'll modify Lock() to forward to the backup A few details: Marshalling: format data into packets Tricky for arrays, pointers, objects, &c Go's RPC library is pretty powerful! Binding: how does client know who to talk to? Might be a name service -- e.g. DNS Threads: Client often has many threads, so > 1 call outstanding, match up replies Handlers may be slow, so server often runs each in a thread RPC problem: what to do about failures? e.g. lost packets, broken network, crashed servers What does a failure look like to the RPC library? It never sees a response from the server It does *not* know if the server saw the request! Maybe server/net failed just before sending reply Simplest scheme: "at least once" behavior RPC library waits for response for a while If none arrives, re-send the request Do this a few times Still no response -- return an error to the application Q: is "at least once" easy for applications to cope with? Simple problem (non-replicated lock server): client sends Lock(a) server gets request, but network drops reply client sends Lock(a) again should server respond "yes"? or "no"? Harder problem: Lock(a) Unlock(a) -- but network delays the packet Unlock(a) re-send, response arrives Lock(a) now network delivers the delayed Unlock(a) !!! Is at-least-once ever OK? yes: if no side effects -- read-only operations yes: if application has its own plan for detecting duplicates which you will need for Lab 1 Better RPC behavior: "at most once" idea: server RPC code detects duplicate requests returns previous reply instead of re-running handler client includes unique ID (UID) with each request uses same UID for re-send server: if seen[uid]: r = old[uid] else r = handler() old[uid] = r seen[uid] = true some at-most-once complexities how to ensure UID is unique? big random number? combine unique client ID (ip address?) with sequence #? server must eventually discard info about old RPCs when is discard safe? idea: unique client IDs per-client RPC sequence numbers client includes "seen all replies <= X" with every RPC much like TCP sequence #s and acks or only allow client one outstanding RPC at a time arrival of seq+1 allows server to discard all <= seq or client agrees to keep retrying for < 5 minutes server discards after 5+ minutes how to handle dup req while original is still executing? server doesn't know reply yet; don't want to run twice idea: "pending" flag per executing RPC; wait or ignore What if an at-most-once server crashes? if at-most-once duplicate info in memory, server will forget and accept duplicate requests maybe it should write the duplicate info to disk? maybe replica server should also replicate duplicate info? What about "exactly once"? at-most-once plus unbounded retries Go RPC is "at-most-once" open TCP connection write request to TCP connection TCP may retransmit, but server's TCP will filter out duplicates no retry in Go code (i.e. will NOT create 2nd TCP connection) Go RPC code returns an error if it doesn't get a reply perhaps after a timeout (from TCP) perhaps server didn't see request perhaps server processed request but server/net failed before reply came back Go RPC's at-most-once isn't enough for Lab 1 it only applies to a single RPC call if primary fails, your client will have to re-send to backup backup may have already seen forwarded request from primary Go RPC can't detect this kind of duplicate *** Now: threads threads are a fundamental server structuring tool you'll use them a lot in the labs they can be tricky Thread = "thread of control" threads allow one program to (logically) do many things at once the threads share memory each thread includes some per-thread state: program counter, registers, stack Example: suppose you want to send RPCs to many servers: var reply1 int var reply2 int go func() { reply1 = rpc to server A }() go func() { reply2 = rpc to server B }() Note: go ... starts a new thread go's argument is a function call anonymous functions when the function returns, the thread disappears reply1 and reply2 are accessible in threads Why use threads? parallelism: performance by running code on many processors concurrency: interleave multiple activities, as above Problem: races started := false func start() { if started == false { ... started = true } } go start() go start() What is a race? The result depends on the detailed timing/interleaving of threads Sometimes OK, often a problem (as in above example) How to avoid races? We want only one thread at a time to execute some code A "critical section" Or we want only one thread at a time to use some piece of data Or we want a multi-step operation to be atomic w.r.t. other threads Go mutexes (MUTual EXclusion) (often called locks) mu sync.Mutex mu.Lock() -- waits for the lock mu.Unlock() defer mu.Unlock() You need a clear idea of what each lock protects And what lock protects each piece of shared data Often not enough to simply wrap Lock/Unlock around every access example: bank transfer A = A - amt B = B + amt audit consistency requires both locks to be held for the whole xaction Problem: threads often need to wait for each other Example: how to wait for those two RPCs to both finish? Go channels c := make(chan int) c <- 1 x := <- c <- c receiving waits until somebody sends sending waits until somebody receives thus no harm if reader is slower than writer Channel example: c1 := make(chan bool) c2 := make(chan bool) go func() { ...; c1 <- true } () go func() { ...; c2 <- true } () <- c1 <- c2 Beware: this code is not correct: var x int done := false go func() { x = f(...); done = true } while done == false { } it's very tempting to write, but Go spec says it's undefined use a channel instead Problem: Deadlock go func() { mu1.Lock(); mu2.Lock(); ... go func() { mu2.Lock(); mu1.Lock(); ... Example: transfer(a, b) { a.lock() b.lock() ... } go func() { transfer("fred", "ivy") } go func() { transfer("ivy", "fred") } Distributed deadlock too: server A handler(): a.Lock() send RPC to server B server B handler(): b.Lock() send RPC to server A you will face this in the labs! how to fix? don't hold more than one lock or always acquire locks in the same order (e.g. for bank transfer) hard if locks are in different modules requires modules to understand each others' internals Locking granularity one mutex for whole lock_server? suppose we found handlers were often waiting for that one mutex what are reasonable options? one mutex per client? one mutex per lock? if one mutex per lock still need one mutex to protect table of locks danger of many locks---deadlock and races mutexes themselves are usually pretty cheap much less than a microsecond if you don't have to wait *** let's look at today's handout -- rpc-handout.go it's a toy RPC system illustrates threads, mutexes, channels it's a toy assumes connection already open only supports an integer arg, integer reply doesn't deal with errors struct ToyClient client RPC state mutex per ToyClient connection to server (e.g. TCP socket) xid -- unique ID per call, to match reply to caller pending[] -- multiple threads may call, need to find them chan on which caller is waiting Call application calls reply := client.Call(procNum, arg) procNum indicates what function to run on server WriteRequest knows the format of an RPC msg basically just the arguments turned into bits in a packet Q: could we move "xid := tc.xid" outside the mutex? after all, we are not changing anything Q: do we need to write inside the mutex? note: Go says you are responsible for preventing concurrent map ops that's one reason the update to pending is locked Listener runs as a background thread not quite right that it may need to wait on chan for caller Q: what if reply comes back very quickly? could Listener() see reply before pending[xid] entry exists? or before caller is waiting for channel? Dispatcher note that the Dispatcher echos the xid back to the client so that Listener knows which Call to wake up Q: why run the handler in a separate thread? main() note registering handler in handlers[] what will the program print?