6.824 Lecture 2: RPC and threads

RPC goal: easy-to-program network communication
  hides most details of client/server communication
  makes a call look much like an ordinary procedure call
  server handlers also look much like ordinary procedures

alternatives?
  directly programming with sockets
  distributed shared memory (later in the class)
  map/reduce
  MPI
  ...

RPC is widely used
  XML RPC
  Java RMI
  Sun RPC

RPC structure
  client application
  client stubs
  RPC library
  network
  server RPC library
  dispatch
  server application handlers

Example: lab1 lock server
  want client app code to look like
    acquire(lid)
    release(lid)
  much like a local call, very convenient
  actually lc->acquire(lid), lc->release(lid)
    lc indicates which lock server we want to talk to
  s/w structure:
    app (lock_demo or _tester), lock_client, RPC, ..., RPC, lock_server
  lock_server handler pseudo-code:
    acquire(lid):
      while(held[lid] == true)
        wait
      held[lid] = true
    release(lid):
      held[lid] = false
      wakeup
  you will have to think about threads/mutex/condvar
    (see the sketch just below)
  C++ STL map for held[]
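  a minimal C++ sketch of these handlers, assuming C++11 std::mutex /
  std::condition_variable rather than the lab's own thread wrappers
  (lockid_t is a made-up stand-in for the lab's lock id type):

    #include <map>
    #include <mutex>
    #include <condition_variable>

    typedef unsigned long long lockid_t;  // hypothetical lock id type

    class lock_server {
      std::mutex m;                   // protects held[]
      std::condition_variable cv;     // signaled on every release
      std::map<lockid_t, bool> held;  // lid -> currently held?

    public:
      void acquire(lockid_t lid) {
        std::unique_lock<std::mutex> lk(m);
        // wait() atomically releases m while sleeping and re-acquires
        // it before returning, so the loop can safely re-check held[lid]
        while (held[lid])
          cv.wait(lk);
        held[lid] = true;
      }

      void release(lockid_t lid) {
        std::unique_lock<std::mutex> lk(m);
        held[lid] = false;
        cv.notify_all();  // wake waiters so they re-check held[lid]
      }
    };

  note the while (not if) around wait(): another thread may grab the
  lock between the wakeup and the return from wait()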
Easy challenges:
  how the client indicates the server and procedure
  automatic marshaling/unmarshaling of arguments/return value

Hard challenge: failures
  network may drop, delay, duplicate, re-order messages
  network might break altogether, and maybe recover
  server might crash, and maybe re-start
  how to provide easy-to-use behavior to clients?

Birrell RPC paper
  we will see many Birrell papers
  from Xerox PARC, which invented LANs and workstations in the 1970s
  paper's main concerns:
    naming
    minimizing # of packets (slow CPUs -> slow pkt handling)
    failures

Naming RPC servers
  used Grapevine, a name service (a little like DNS)
  Export(service name, server host)
  Import(service name) -> server host
  the level of indirection means:
    clients need not hard-code server names
    multiple servers (use closest)
    replacement of servers

Let's talk about how a client can handle failure
  client sends a request
  suppose the network discards the request packet
    what will the client observe?
    what should the client do?
    how long should the client wait before rxmt?
  now suppose the network delivered the request, but discarded the response
    what will the client observe?
    what should the client do?

Simple retransmission leads to "at-least-once" behavior

Would our lock_server work under at-least-once?
  no:
    send acquire
    send release; network delays it
    re-send release; it is received
    acquire again
    now the first release is delivered, incorrectly releasing the lock

Are there any situations where at-least-once is OK?
  yes: if operations have no side effects -- read-only operations

How can an RPC system provide better behavior?
  remember the RPC requests it has seen, to detect duplicates
    requests need unique IDs; the ID is repeated on rxmt
  what to do if the server sees a duplicate?
    the client still needs the reply
    so the server remembers replies to previously executed RPCs
  this yields "exactly-once" behavior

Turns out exactly-once is difficult
  any guesses why?
  the hard case: the server crashes just as it receives a request
    did it execute, and crash before sending the reply?
    or crash before executing?
    that is, should the server re-execute it after restart?

Birrell RPC protocol provides "at-most-once"
  server says "ok" -> executed once
  server says "???" -> zero or one times, unknown which
    (happens if the server restarts, forgetting its replies[] table of completed RPCs)

Key remaining problem w/ at-most-once
  client sends request
  server crashes before sending reply
  server restarts
  client re-sends request
  how does the server realize it is a duplicate?

What exact situation do we need to detect?
  a retransmitted request whose earlier transmission the server
    might have seen before the crash

How to detect cross-crash retransmission?
  server has a number that uniquely identifies restarts
    Birrell calls it the ID; we call it the server nonce
  client obtains the server's ID when it first connects, during "bind"
  client sends the server ID in every RPC request
  server checks whether the ID in the request == its current ID
    if equal, any previous transmission will be in the server's replies[] table
    if not equal, there is a problem

What to do when the server detects a cross-crash retransmission?
  the request might have been executed already, or might not have been
  send an error back to the client and hope it knows how to deal
  this situation is pretty rare
  we will have more to say about server crashes later in the course

How to ensure the server never reuses an ID?
  server could store the ID on disk (if it has a disk)
  or use the boot time (if it has access to a clock)
  or use a big random number (if it has a source of randomness)

When can the server discard old saved return values?
  after e.g. five seconds? no!
  server can discard a reply once the client will never retransmit that request
  so have the client tell the server which replies it has received
  streamlined version:
    client gives requests ascending numbers, called xids
      includes the xid in every request
    server includes the xid in its reply
    client tells server the highest xid for which it has all previous replies
      includes this in every request

Example 1: ordinary calls
  (using notation from our RPC system)
  bind req:   xid=1 sn=? cn=33 xid_rep=5 proc=1
  bind reply: xid=1 sn=22
  ...
  req:   xid=6 sn=22 cn=33 xid_rep=5 proc=2 args...
    server deletes replies w/ xid<=5 from replies[] table
    replies[xid=6] = r1
  reply: xid=6 r1
  req:   xid=7 sn=22 cn=33 xid_rep=6 proc=2 args...
  reply: xid=7
  xid_rep is explicit, rather than implied by xid,
    to allow multiple outstanding RPCs from one client
    e.g. xid=7 sent before the reply for xid=6 arrives
  client has to remember the set of received replies and find the max
    xid for which all previous replies have arrived,
    to cover the case in which the reply for xid=7 arrives before xid=6's

Example 2: slow server
  req:  xid=8 ...
  rxmt: xid=8 ...
  if the handler has finished when the 2nd req arrives:
    reply with replies[xid=8]
    so the server will send two replies
    client will ignore the 2nd reply; no-one is waiting for that xid
  if the handler has not finished:
    ignore the request

Example 3: server reboot
  req:  xid=9 sn=22 cn=33 ...
  server crashes, reboots, new sn=23
  rxmt: xid=9 sn=22 cn=33 ...
  server sees 22 != 23, replies FORGOTTEN

YFS RPC library structure
  app (many threads)
  x_client  y_client
  rpcc      rpcc
  conn      conn ... conn
  rpcs
  x_server (many threads)

RPC major state
  rpcc:
    current connection to server
    table of outstanding RPCs
    which xids have been replied to (for at-most-once)
  rpcs:
    handler table
    active RPCs
    remembered replies

l02.cc hand-out has simplified RPC code from lab1
  error checks, locks, &c have been omitted
  some C++ notation has been simplified
  check the real code!
  lock_demo.cc
    creates a lock_client, tells it where to connect
  lock_client::lock_client
    creates an rpcc, tells it where to connect
    calls bind() to get the server_nonce
  lock_client::stat
    calls call(), passing the proc #, the arguments, and &return
  rpcc::call1
  rpcc::got_pdu
  rpcs::rpcs
  rpcs::dispatch
  rpcs::checkduplicate_and_update
  rpcs::add_reply
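
To make the at-most-once bookkeeping concrete, here is a simplified
C++ sketch of what checkduplicate_and_update and add_reply have to do.
This is illustrative, not the real rpcs code: the per-client table
layout, the field types, and the rpcs_sketch name are assumptions.

    #include <map>
    #include <set>
    #include <string>
    #include <mutex>

    enum rpcstate_t { NEW, INPROGRESS, DONE, FORGOTTEN };

    class rpcs_sketch {
      std::mutex m;
      const unsigned int server_nonce;  // fresh random value per restart
      struct client_state {
        std::map<unsigned int, std::string> done;  // xid -> remembered reply
        std::set<unsigned int> inprogress;         // xids still executing
      };
      std::map<unsigned int, client_state> clients;  // keyed by client nonce cn

    public:
      explicit rpcs_sketch(unsigned int nonce) : server_nonce(nonce) {}

      // called on every incoming request, before running the handler
      rpcstate_t checkduplicate_and_update(unsigned int sn, unsigned int cn,
                                           unsigned int xid,
                                           unsigned int xid_rep,
                                           std::string &rep) {
        std::lock_guard<std::mutex> lk(m);
        if (sn != server_nonce)
          return FORGOTTEN;      // cross-crash rxmt: reply with an error
        client_state &cs = clients[cn];
        // client has all replies with xid <= xid_rep: safe to discard them
        cs.done.erase(cs.done.begin(), cs.done.upper_bound(xid_rep));
        if (cs.done.count(xid)) {  // dup of a finished RPC:
          rep = cs.done[xid];      //   re-send the remembered reply
          return DONE;
        }
        if (cs.inprogress.count(xid))
          return INPROGRESS;     // dup of a running RPC: ignore it
        cs.inprogress.insert(xid);
        return NEW;              // first sighting: run the handler
      }

      // called when a handler finishes, to remember its reply
      void add_reply(unsigned int cn, unsigned int xid,
                     const std::string &rep) {
        std::lock_guard<std::mutex> lk(m);
        clients[cn].inprogress.erase(xid);
        clients[cn].done[xid] = rep;
      }
    };

  this matches the three cases in the examples above: example 1's
  discard of replies w/ xid<=5, example 2's re-send vs. ignore, and
  example 3's FORGOTTEN when the server nonce has changed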