6.824 Lecture 3: RPC

Outline
  RPC
  RPC in yfs
  RPC semantics
  RPC and threads
  Lab 2

RPC
  a stylized version of client/server communication that attempts to
  make remote procedure calls look like ordinary procedure calls.
    draw picture with stub, request, stub on server, reply
  key properties:
    easy to write programs with
    model programmers are familiar with
    good match for many distributed applications (client/server)
    hides details (e.g., marshaling/unmarshaling)
  alternatives?
    directly programming with sockets
    distributed-shared memory (later in the class)
    map/reduce
    dryad
    MPI
    ...
  RPC seems to have won (or lost against sockets?)
    XML RPC
    Java RMI
    Sun RPC
    map/reduce + dryad implemented using RPC?
  Key challenges: the semantics in the face of:
    communication failures (messages may be delayed, variable round trip, never arrive)
    machine failures
      did the server fail just before processing the request or just after?
    sometimes it is impossible to tell the difference between communication
    failures and machine failures

YFS RPC library
  rpc.h: the interface to the RPC system; let's look at it briefly.
    rpcc (+caller) and rpcs
    marshaling
  rpc.cc: the implementation.
    rpcc
      creates two threads
        216: clock_loop: retransmissions
        204: chan_loop: waiting for replies on a channel
    call1: an RPC!
      is line 103 (xid++) thread safe? (thanks evan!)
      115: unlock(&m); why?
      132: lock(&ca.m); why?
      170: lock(_timeout_lock); why?
      177: what is "pthread_cond_signal"?
      182: what is "pthread_cond_wait(&ca.c, &ca.m)"?
    254: gotreply: a reply
      pthread_cond_broadcast(&ca->c)? how many threads?
      what is the documented race about?
    216: clock_loop
      what is going on here?
    301: rpcs
      another thread, loop, which gets messages
    416: loop
      a new thread for each request
        can we call pthread methods in RPC handlers? (yes, they are threads)
    346: dispatch: why can it unlock before the end? is it safe?
      h->fn(args, fn) invokes the requested procedure
      are duplicates detected?
        what are the RPC semantics?
        what are the implications for the lock server?
      can a handler not send a reply?
      what does lossy do? drop? delay?
      why a try statement?
  chan.cc
    uses both udp and tcp; why?
    what is the protocol?
      keep sending until client receives some response.

RPC paper
  what semantics does RPC provide?
  let's see what the protocol is.
    when the call returns to the user, the server has run the procedure once
    otherwise, an exception: the server has run the procedure once or not at all
  let's take an example: acquiring a lock
    if client and server stay up, client receives lock
    if client fails, it may have the lock or not (server needs a plan!)
    if server fails, client may have lock or not
      if server immediately recovers, client will receive an exception
    what does a client do in the case of an exception?
      need to implement some application-specific protocol
        ask server, do i have the lock?
        server needs to have a plan for remembering state across reboots
          e.g., store locks on disk.

YFS's RPC is quite different: at-least-once (if we never give up)
  clients keep trying. server may run procedure several times
  server must use application state to handle duplicates
    if requests are not idempotent
    but difficult to make all requests idempotent
  e.g., server could store on disk who has lock and req id
    check table for each request
    even if server fails and reboots, we get correct semantics

What is right?
  depends where RPC is used.
    simple applications: at-most-once is cool (more like procedure calls)
    more sophisticated applications: need an application-level plan in both cases
      not clear at-most-once gives you a leg up
  => Handling machine failures makes RPC different than procedure calls

Interactions between threads and RPC
  can one hold locks across an RPC call?
    should one?
  can one make an RPC in a handler?
  what is the risk of this kind of code in a handler?
    for (iterator i = beginlist(); !eol(); i++) {
      unlock(&l);
      call(client[i], ...);
      lock(&l);
    }

Lab 2
  draw picture with loop back + mount
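
Sketch: handling duplicates under at-least-once
  The "check table for each request" idea from the at-least-once discussion
  above can be sketched roughly as follows. This is only an illustration, not
  the YFS code: the class LockServer, its methods, and the (client id, xid)
  key are hypothetical names chosen for the example. The table lives in
  memory here, so unlike the on-disk plan in the notes it would not survive
  a server reboot.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical lock server that tolerates retransmitted acquire requests.
// Assumption: each client tags every request with an id (xid) and reuses
// the same xid when it retransmits that request.
class LockServer {
public:
    enum Status { GRANTED, BUSY };

    Status acquire(int client_id, unsigned xid, const std::string &lock_name) {
        // Handlers run in their own threads, so guard the shared tables.
        std::lock_guard<std::mutex> guard(m_);

        // Duplicate? Return the remembered reply without re-executing.
        auto key = std::make_pair(client_id, xid);
        auto seen = replies_.find(key);
        if (seen != replies_.end())
            return seen->second;

        // First time this request is seen: run the procedure.
        executions_++;
        Status s;
        if (holders_.find(lock_name) == holders_.end()) {
            holders_[lock_name] = client_id;
            s = GRANTED;
        } else {
            s = BUSY;
        }

        // Remember the reply so a retransmission gets the same answer.
        replies_[key] = s;
        return s;
    }

    // Number of times the acquire logic actually ran (for illustration).
    int executions() const { return executions_; }

private:
    std::mutex m_;
    std::map<std::string, int> holders_;                 // lock name -> holder
    std::map<std::pair<int, unsigned>, Status> replies_; // (client, xid) -> reply
    int executions_ = 0;
};
```

  Without the replies_ table, a retransmitted acquire from the client that
  already holds the lock would be answered BUSY, which is the kind of
  non-idempotent behavior the notes warn about. With the table, calling
  acquire(1, 7, "a") twice returns GRANTED both times and runs the lock
  logic once. The sketch never prunes old entries; a real server would need
  some way to forget replies the client no longer needs.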