6.824 2012 Lecture 2: RPC

Remote Procedure Call (RPC)
  a key piece of distrib sys machinery -- and lab 1
  goal: easy-to-program network communication
    hides most details of client/server communication
    client call is much like ordinary procedure call
    server handlers are much like ordinary procedures
  alternatives?
    format/send network packets directly
    distributed shared memory
    map/reduce
    ...
  RPC is widely used!

RPC ideally makes net communication look just like fn call:
  Client:
    z = fn(x, y)
  Server:
    fn(x, y) {
      compute
      return z
    }
  RPC aims for this level of transparency
  Hope: even novice programmers can use function call!

RPC message diagram:
  Client             Server
    request--->
          <---response

RPC software structure:

  Client    Client RPC   Net             RPC Server       Server
  app       stub lib                     lib stub         handler
  call      marshall     send request    recv, dispatch,
                                         unmarshall       fn(args)
  wait      wait                                          work...
  return    unmarshall   recv response   marshall, send   return

A few details:
  Marshalling is tricky for arrays, pointers, objects, &c
  Client needs to find server's network address ("binding")
  Client may have multiple threads sending to server
    must match response to waiting thread
  Server may create thread per request

Hard challenge: failures
  network may drop, delay, duplicate, re-order messages
  network might break altogether, and maybe recover
  server might crash, and maybe re-start
  can we hide this from client applications and server handlers?

Birrell RPC paper
  we will see many Birrell papers
  from Xerox PARC, which invented LANs and workstations in the 1970s
  paper's main concerns:
    naming
    minimizing # of packets (slow CPUs -> slow pkt handling) (we use TCP)
    failures

Naming RPC servers
  used Grapevine, a name service (a little like DNS)
    Export(service name, server host)
    Import(service name) -> server host
  level of indirection buys:
    clients need not hard-code server names
    multiple servers (use closest)
    replacement of servers

Let's talk about how a client can handle failure
  client sends a request
  suppose network discards the request packet
    what will client observe?
    what should the client do?
    how long should client wait before rxmt?
  Would it be enough to layer RPC on top of TCP?
    After all, TCP is a reliable transport protocol
  Now suppose the network delivered the request, but discarded the response
    what will client observe?
    what should the client do?

Simple retransmission leads to "at-least-once" behavior

Would our lock_server work under at-least-once?
  no: [arrow diagram]
    send acquire, response lost
    timeout, re-send acquire, server will wait forever!
  you could argue that server shouldn't double-grant
  but there are other bad scenarios:
    send acquire
    send release, network delays it
    re-send release, received
    acquire again
    now first release delivered, incorrectly releases lock

Is at-least-once ever OK?
  yes: if no side effects -- read-only operations
  yes: if application has its own plan for detecting duplicates
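To make at-least-once concrete, here is a minimal sketch of a retransmitting
client in C++. This is not the lab's code: send_request() and
recv_reply_with_timeout() are hypothetical transport helpers, and the fixed
one-second timeout is arbitrary (real clients back off):

  #include <string>

  struct reply_t { int err; std::string ret; };

  // Hypothetical transport helpers, standing in for the lab's connection code.
  void send_request(int xid, int proc, const std::string &args);
  bool recv_reply_with_timeout(int xid, int timeout_ms, reply_t *r);

  // At-least-once: retransmit until *some* reply arrives. The server may
  // execute the handler more than once; the client cannot tell.
  reply_t call_at_least_once(int xid, int proc, const std::string &args) {
    reply_t r;
    for (;;) {
      send_request(xid, proc, args);              // request may be lost
      if (recv_reply_with_timeout(xid, 1000, &r))
        return r;                                 // response matched by xid
      // timeout: request or response was lost -- can't tell which -- re-send
    }
  }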
How can the RPC system provide better behavior?
  server remembers the requests it has seen, detects duplicates
  requests need unique IDs (XIDs), ID repeated on rxmt
  what to do if server sees a duplicate?
    client may still need the reply
    so server remembers replies to previously executed RPCs
  this yields "exactly-once" behavior
    assumes failures are eventually repaired
    and that client re-tries forever

Exactly-once is possible but difficult
  Server must remember table across reboots (e.g. on disk)
  Why doesn't the following server code provide exactly-once?
    if s[xid] != DONE:
      x = handler(args)
      s[xid] = DONE
    else:
      return previous value
  We will return to this problem later in the course

Birrell RPC protocol provides "at-most-once"
  server says OK -> executed once
  server says CRASH -> zero or one times, unknown which
  why? much easier than exactly-once, more useful than at-least-once

at-most-once is easy if server does not crash:
  server maintains s[xid] (state), r[xid] (return value)
  (this code assumes just one server thread, thus no locks or PENDING)
    if s[xid] == DONE:
      return r[xid]
    x = handler(args)
    s[xid] = DONE
    r[xid] = x
    return x

What if the server crashes?
  Just before or just after call to handler()
  After server restart:
    Can't tell whether handler() was called
    Client will re-send request (no response due to crash)
    Server *must* reply with CRASH

How to decide if a request is a re-sent copy of one sent before the crash?
  Answer: server has a number that uniquely identifies restarts
    Birrell calls it the ID; we call it the server nonce
  client obtains server's ID when it first connects, during "bind"
  client sends server ID in every RPC request
  server checks whether ID in request == current ID
    if equal, then any previous transmission will be in server's replies[] table
    if not equal, return CRASH

What should client do if server says CRASH?
  might have been executed already, might not have been
  thus you know: maybe you transferred the money, maybe you didn't
  but you know it didn't transfer twice
  and you know you have to track down what actually happened
    e.g. ask the bank for a transaction record
  this situation is pretty rare
  more on this later in the course

How to ensure server never reuses an ID?
  server could store ID on disk (if it has a disk)
  or use boot time (if it has access to a clock)
  or use a big random number (if it has a source of randomness)

When can server discard old saved return values?
  after e.g. five seconds? no!
  server can discard if client will never retransmit
    == if client has received the response
  have client tell server which replies it has received
  or: client tells server it has received all replies up through some XID
    lab RPC's xid_rep, in every request message
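Putting the nonce check, the duplicate table, and xid_rep trimming together:
a single-threaded sketch in C++, in the spirit of the pseudocode above.
handler() and the "CRASH" string are stand-ins, not the lab's actual
interface:

  #include <map>
  #include <string>

  int server_nonce;                      // chosen at boot: disk, clock, or random
  std::map<int, std::string> replies;    // xid -> saved return value (== DONE)

  std::string handler(const std::string &args);  // the real RPC handler (assumed)

  std::string dispatch(int sn, int xid, int xid_rep, const std::string &args) {
    if (sn != server_nonce)
      return "CRASH";                    // request predates a server restart
    // client has received every reply through xid_rep: safe to forget them
    replies.erase(replies.begin(), replies.upper_bound(xid_rep));
    std::map<int, std::string>::iterator it = replies.find(xid);
    if (it != replies.end())
      return it->second;                 // duplicate: re-send the saved reply
    std::string x = handler(args);       // executed at most once per xid
    replies[xid] = x;
    return x;
  }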
Lab RPC request message format:
  (TCP/IP headers...)
  xid
  proc#
  client nonce
  server nonce
  xid_rep
  arguments...

Lab RPC response message format:
  (TCP/IP headers...)
  xid
  error code
  return value

[draw these on the board as arrow diagrams]

Example 1: ordinary calls
  bind req:   xid=1 sn=? proc=1
  bind reply: xid=1 sn=22
  ...
  req:   xid=6 sn=22 xid_rep=5 proc=2 args...
    r[6] = r1
    server deletes xid<=5 from s[]/r[]
  reply: xid=6 r1
  req:   xid=7 sn=22 xid_rep=6 proc=2 args...
  reply: xid=7

Why not let xid_rep be implicitly xid - 1?
  to allow multiple outstanding RPCs from one client
  example: xid=7 sent before reply for xid=6 arrives

Client has to compute xid_rep
  it is the highest xid through which all replies have been received
  our code keeps the set of replied-to xids, looks for a contiguous prefix

Example 2: duplicate requests (client resends too soon, or response lost)
  req: xid=8 ...
  req: xid=8 ... (again)
  if handler has finished when 2nd req arrives:
    reply with r[8]
    thus two replies with xid=8
    client will ignore 2nd reply, no thread waiting for that xid
  if handler has not finished:
    ignore request
    (thus need more than just DONE -- INPROGRESS)

Example 3: server reboot
  req: xid=9 sn=22 ...
  server crash, reboot, new sn=23
  req: xid=9 sn=22 ... (retransmission)
  server sees 22 != 23, replies FORGOTTEN

l02.cc hand-out has simplified RPC code from lab 1
  I've omitted error checks, locks, &c
  Simplified some C++ notation
  Check the real code!

lock_demo.cc
  creates a lock_client, tells it where to connect

lock_client::lock_client
  creates rpcc, tells it where to connect
  calls bind() to get server_nonce

lock_client::stat
  calls call(), proc #, arguments, &return

rpcc::call1
  msg: xid proc cn sn xid_rep args...
  xid_rep_window_.front()?
  update_xid_rep(xid)?

rpcc::got_pdu

rpcs::rpcs

rpcs::dispatch
  why the nonce check? what prob does this solve?
  when could INPROGRESS occur?
  is FORGOTTEN possible? how?
    long-delayed request (rebooted server is dealt w/ separately)
  what is the client nonce for?

checkduplicate_and_update(cn, xid, xid_rep, &b, &sz)
  must keep, for each cn, state about each xid: done? b+sz if done
  if s[cn/xid] == INPROGRESS
    return INPROGRESS
  if s[cn/xid] == DONE
    return DONE, plus the saved b and sz
  if xid <= previous xid_rep
    return FORGOTTEN
  else
    s[cn/xid] = INPROGRESS
    return NEW
  must also trim state:
    discard any cn/xid if xid <= xid_rep
    must also free buf

What must add_reply(cn, xid, b, sz) do?
  checkduplicate_and_update already set s[cn/xid] = INPROGRESS
  set s[cn/xid] = DONE
  remember b and sz
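To tie the last two functions together, here is a compact C++ sketch of that
state machine. It is an illustration under assumptions, not the lab's real
code: reply_t, per_client, and the container layout are invented, buffers are
assumed malloc()'d, and real rpcs code needs a lock around all of this:

  #include <cstdlib>
  #include <map>

  enum rpcstate { NEW, INPROGRESS, DONE, FORGOTTEN };

  struct reply_t { char *buf; int sz; bool done; };  // done == reply saved

  struct per_client {
    int xid_rep;                       // highest xid this client has acked
    std::map<int, reply_t> xids;       // state for each still-remembered xid
  };

  std::map<unsigned int, per_client> clients;  // keyed by client nonce

  rpcstate checkduplicate_and_update(unsigned int cn, int xid, int xid_rep,
                                     char **b, int *sz) {
    per_client &c = clients[cn];
    // trim state: discard (and free) any xid <= xid_rep
    while (!c.xids.empty() && c.xids.begin()->first <= xid_rep) {
      free(c.xids.begin()->second.buf);
      c.xids.erase(c.xids.begin());
    }
    if (xid_rep > c.xid_rep)
      c.xid_rep = xid_rep;

    std::map<int, reply_t>::iterator it = c.xids.find(xid);
    if (it != c.xids.end()) {
      if (!it->second.done)
        return INPROGRESS;             // handler still running: drop request
      *b = it->second.buf;             // duplicate of a finished RPC:
      *sz = it->second.sz;             // hand back the saved reply
      return DONE;
    }
    if (xid <= c.xid_rep)
      return FORGOTTEN;                // long-delayed request; reply discarded
    reply_t fresh = { 0, 0, false };
    c.xids[xid] = fresh;               // record as INPROGRESS
    return NEW;                        // caller should run the handler
  }

  void add_reply(unsigned int cn, int xid, char *b, int sz) {
    reply_t &r = clients[cn].xids[xid];  // entry made by checkduplicate_and_update
    r.buf = b;
    r.sz = sz;
    r.done = true;
  }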