6.824 2001 Lecture 5: RPC

Goal: tools to help us divide up programs onto multiple machines
  Could be client / server
  Could be a cooperating cluster of machines, front end vs file server
  Could be a large distributed system, like DNS

Your program could send and receive network messages directly
  Use socket calls, read, write
  Agree on format of data
  Program would format and parse

Network I/O is awkward
  It isn't the usual abstraction we use for inter-module interfaces
  We use function call instead
    Well supported by languages
  Can we extend this idea to cross-network inter-module interfaces?

Remote Procedure Call: The Idea
  Ordinary local code:
    seat = reserve(int flight, str name);
    reserve(...){ read/write DB; return ...; }
  Now we're going to split into client and server halves.
  Client:
    reserve(...){
      send request msg to server;
      recv reply msg from server;
      return result;
    }
  Server:
    main(){
      while(1){
        wait for request msg;
        call reserve(...);   (the real implementation)
        send reply msg back to client;
      }
    }
  Now the programmer can make the same client reserve() call,
  and implement reserve() on the server the same way,
  and the RPC machinery takes care of the communication.

What are the potential benefits of RPC?
  Transparent distributed computing
    Existing programs don't need to be modified
    Can write s/w that's location-independent
  Enforces well-defined interfaces
  Allows portable interfaces
  Plug together separately written programs at RPC boundaries
    e.g. NFS and X clients and servers

What does an RPC system consist of?
  Will base explanation on SUN RPC, which you read about.
  1. Standards for wire format of RPC msgs and data types. XDR and RPC.
  2. Library of routines to marshal / unmarshal data.
  3. Stub generator, or RPC compiler, to produce "stubs".
     For client: marshal arguments, call, wait, unmarshal reply.
     For server: unmarshal arguments, call real fn, marshal reply.
  4. Server framework: dispatch each call message to the correct server stub.
  5. Client framework: give each reply to the correct waiting thread / callback.
  6. Binding: how does the client find the right server?

What needs to be in an RPC request message?
  (all fields 32 bits; this is the wire format)
  xid
  call/reply
  rpc version
  program #
  program version
  procedure #
  auth stuff
  arguments
  Of main interest: xid, prog#, proc#
    Server dispatch uses prog#, proc#
    Client reply dispatch uses xid
      Client remembers the xid of each outstanding call

Authentication fields
  An attempt to do cryptographic security at the RPC level
    Transparent to application code
  Turns out not to work well
    What "security" means is too app-dependent
  Typically just holds your numeric UNIX user id, not verification at all

The arguments
  Also encoded with XDR
  Arguments may be scattered all over memory; linearize them
  Easy for e.g. int -- same representation, though portable byte order...
  Arrays and strings? Prepend a length.
  Complex data structures? e.g. hash table?
    Unlikely XDR could deal with this automatically.
    Could benefit from better language support, e.g. run-time type tags.

What needs to be in an RPC reply?
  xid
  call/reply
  accepted? (vs bad rpc version, or auth failure)
  auth stuff
  success? (vs bad prog/proc #)
  results

How does the stub generator work?
  You give it a description of the procedure calls and arg/res data types.
    Sun defines a C-like standard, described in the XDR RFC.
  It produces:
    Routines to marshal / unmarshal.
    Routines to read/write calls on the wire.
    Maybe client / server stubs.
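  Aside: a minimal C sketch of the marshaling a generated stub might do for
  reserve(flight, name).  This is illustrative, not Sun rpcgen output; the
  program number and the helper names are made up.  It just linearizes the
  header fields and arguments XDR-style: 4-byte big-endian words, and a
  string as a length prefix plus bytes, padded to a word boundary.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    static unsigned char buf[512];        /* the linearized call message */
    static size_t off;

    static void put_uint32(uint32_t x) {  /* XDR int: 4 bytes, big-endian */
      buf[off++] = x >> 24; buf[off++] = x >> 16;
      buf[off++] = x >> 8;  buf[off++] = x;
    }

    static void put_string(const char *s) {  /* XDR string: length, bytes, pad */
      size_t n = strlen(s);
      put_uint32(n);
      memcpy(buf + off, s, n);
      off += n;
      while (off % 4) buf[off++] = 0;        /* pad to a 4-byte boundary */
    }

    int main(void) {
      put_uint32(7);        /* xid (the framework would pick a fresh one) */
      put_uint32(0);        /* 0 = call, 1 = reply */
      put_uint32(2);        /* RPC version */
      put_uint32(300212);   /* program # (made up) */
      put_uint32(1);        /* program version */
      put_uint32(1);        /* procedure #: reserve */
      put_uint32(0); put_uint32(0);   /* null auth credentials */
      put_uint32(0); put_uint32(0);   /* null auth verifier */
      put_uint32(250);      /* argument: flight */
      put_string("smith");  /* argument: name */
      printf("marshaled %zu bytes\n", off);
      return 0;
    }

  A real stub would hand this buffer to the client framework, which sends it
  and matches the eventual reply by xid.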
What does the client framework do?
  Keeps track of outstanding requests.
    For each, the xid and the caller's thread / callback.
  Matches replies to the caller.
  Might be multiple callers using one socket. NFS client in kernel.
  Usually handles timing out and retransmitting.

What does the server framework do?
  Need a context in which to execute the procedure.
  In a threaded system:
    Create a new thread per request. Master thread reads socket[s].
    Or a fixed pool of threads, and a queue if too many requests. NFS srvrs.
    Or just one thread, serial execution. Simplifies concurrency. X srvr.
  Key feature: support for concurrent RPCs
    If one RPC takes multiple blocking steps to compute,
      can I serve another one in the meantime?
    For example, DNS. Service routine is an RPC client.
    May also avoid deadlock if I send an RPC to ... to myself
  In an async programming system:
    Callback registered per prog#/proc#.
    (Rather than per socket. fdcb() calls the un-marshaling function.)

What about binding?
  Client needs to be able to talk to the correct server
  It needs an IP address
    Use DNS.
  Client knows the RPC prog #, needs to know the server's TCP/UDP port #
    Could use a well-known port: NFS uses port 2049
    Could use a "port mapper" per server
      server programs register prog#/port with the port mapper
      clients can ask for the port, given a prog#
      avoids dedicating scarce port #s to services

Example in the handout
  rx.x is an XDR description
    declares one program
      can potentially have many procedures
      just one procedure declared
    data types described
    program/procedure numbers specified
  Server code
    Set up connection
    Wrap it with transport (packetizer) and RPC parser
    asrv will call our callback for each RPC
    we're responsible for dispatch
  Client code
    sets up connection to server just once
    gets an "aclnt" handle
    can use it for multiple RPCs
    reserve_call stub uses aclnt, specifies proc#
    aclnt takes care of marshaling, retransmission, waiting
    multiple client calls may be outstanding
      callback registered for the xid of each one

Did we achieve the transparency we want?
  Hides marshal / unmarshal.
  Hides details of send / recv. And TCP vs UDP &c.
  Hides who the client is.
  Why does this look ugly?
    It does *not* hide remote access from the programmer.
    Async style. Cannot hide request/reply inside a stub fn.
    Which server are we making the call to?
    Some details of argument passing are different.
  But in general most of the network I/O machinery is hidden.

But does it have the same semantics as a purely local program?
  [Picture: just req/reply arrows]
  Two modules on the same machine, function call
    vs. different machines, RPC
  Does it behave in the same way?
  I.e. does our use of RPC make remoteness semantically transparent?

Suppose the RPC system gets no reply from the server?
  [time diagrams]
  RPC machinery re-sends the request, transparent to the client
  Maybe the first *reply* was lost -- now two reservations!
  Or maybe the first request got the last seat, so the 2nd request is denied.
  Can we fix this transparently?

Partial failures -- the bigger picture
  Local computing: it works, or the whole thing crashes
  Dist computing:
    Failures of just the server, or the client, or the network
    Usually can't tell what went wrong
      Maybe the server is up, but very slow?
  How does the remaining part of the system continue / recover?

Can the RPC system recover transparently?
  Client can re-send to get at-least-once.
  Can the server implement at-most-once? (then we'd get exactly-once...)
  Server code (see the sketch below):
    if seen xid
      return previous answer
    else
      do_reserve()
      record xid, answer
      return answer
  What if the server crashes just after do_reserve?
    Then retransmission will call do_reserve() again!
  We need an atomic transaction that does do_reserve() and records the xid.
  But now server application code has to cooperate closely w/ the RPC impl.
    What if the server didn't already use transactions, a DB?
    Or has an incompatible plan?
  Solvable, but not in a way that's at all transparent.
  Usually better for RPC to not bother w/ at-most-once; the app handles it alone.
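  An in-memory version of that duplicate check might look like the C sketch
  below.  The names (handle_reserve, do_reserve, the fixed-size table) are
  invented for illustration.  Note the table lives only in memory: it says
  nothing about when it is safe to forget an xid, and it does not survive
  the crash described above, which is exactly the problem.

    #include <stdint.h>

    #define MAXCALLS 1024

    struct seen {
      uint32_t xid;      /* xid of a call we've already executed */
      int answer;        /* the reply we sent for it */
      int valid;
    };
    static struct seen table[MAXCALLS];

    static int do_reserve(int flight, const char *name) {  /* placeholder for the real work */
      (void)flight; (void)name;
      return 42;
    }

    int handle_reserve(uint32_t xid, int flight, const char *name) {
      struct seen *s = &table[xid % MAXCALLS];

      if (s->valid && s->xid == xid)
        return s->answer;        /* duplicate: re-send the old answer, don't re-execute */

      int answer = do_reserve(flight, name);

      /* A crash right here loses the record, so a retransmission would run
         do_reserve() again.  Real at-most-once needs do_reserve() and this
         update to be a single atomic, durable step. */
      s->xid = xid;
      s->answer = answer;
      s->valid = 1;
      return answer;
    }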
Areas of RPC non-transparency
  1. Partial failure
  2. Latency
  3. Memory access. Pointers, complex data structures. Write-sharing.
  4. Concurrency.
  5. Synchrony. You might not want to wait for the reply.
  6. Security. You can rarely deal with it transparently.
  Solutions generally involve exposing RPC properties to the application.
    Not a good idea to try to hide them.
  Apps may have to be dramatically re-designed for distribution.

Example: NFS writes
  NFS was one of the earliest users of RPC in common use. We still use it.
  RPC is simple: write(file, offset, data, length)
    More or less mimics what the disk file system does internally.
      So more or less transparent.
    8k at a time, so it will fit in a packet
  Clear what the intended meaning of this is, right?

What are write's actual semantics?
  What can the client count on happening when the RPC reply arrives?
  Presumably the write is done, and the client can forget about it.
  What if the server crashes and reboots just after sending the reply?
    The client won't know anything about this.
    But the data had better be on the disk.
    Not like a local crash, in which the client program dies too.
      Then, in a sense, the client *does* know about the crash.
  Consequence:
    Data must be on disk before the server sends the write() reply.
    Cannot just write to the server's disk cache and return.
    That means the i-node must be updated with the new length.
    And the indirect block must be updated with the new block #.
    So three sync disk writes, in different places on the disk.
  Assume 10 ms per seek, so 30 ms per write RPC.
    33 writes per second, or about 260 kbytes per second.
    But disk and net hardware can move data at 10 megabytes per second!
    So under 3% efficiency. Not good.

Solutions to the NFS write problem?
  Live with slow writes. That's largely what happens.
  Change file system semantics:
    Written data may unexpectedly disappear if the server reboots!
    Preserves low-level transparency (i.e. the write() interface)
      but very non-transparent to users.
  Or change the interface (break transparency completely):
    This is how NFS v3 works (see the sketch at the end of these notes).
    write() just puts data in the server's cache, then sends the reply.
    Server later batches disk writes from its cache for efficiency.
    Client *keeps* data in its cache after write returns.
    When the client calls close() or wants to reclaim cache space:
      New RPC (COMMIT) to force the server to write all the data.
      Also checks that the server hasn't rebooted in the meantime.
      If it has, re-send from the client's cache.

Conclusion
  Automatic marshaling has been a big success.
  Mimicking the procedure call interface is not that useful.
  Attempt at full transparency has been mostly a failure.
    But people have tried hard and built neat systems -- Network Objects
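Sketch referenced from the NFS v3 discussion above: the client side of the
write/commit scheme.  This is a toy, not the real NFS client; nfs_write_rpc,
nfs_commit_rpc, and the little simulated server are made up, and the cache is
a fixed array.  The pattern is what matters: keep a copy of written data until
a commit succeeds, and use a verifier that changes when the server reboots to
decide whether the data must be re-sent.

  #include <stdint.h>
  #include <string.h>

  #define MAXDIRTY 128
  #define BLKSZ    8192

  struct dirty {          /* a write we must be able to re-send */
    uint64_t offset;
    char     data[BLKSZ];
    int      len;
    uint64_t verf;        /* server boot verifier returned by the WRITE */
  };
  static struct dirty cache[MAXDIRTY];
  static int ndirty;

  /* Stand-ins for the real RPCs so the sketch is self-contained; each
     returns the server's current boot verifier. */
  static uint64_t boot_verf = 1;
  static uint64_t nfs_write_rpc(uint64_t off, const char *data, int len) {
    (void)off; (void)data; (void)len;
    return boot_verf;     /* pretend the data reached the server's cache */
  }
  static uint64_t nfs_commit_rpc(void) {
    return boot_verf;     /* pretend the server flushed its cache to disk */
  }

  void client_write(uint64_t off, const char *data, int len) {
    struct dirty *d = &cache[ndirty++];
    d->offset = off;
    memcpy(d->data, data, len);
    d->len = len;
    d->verf = nfs_write_rpc(off, data, len);  /* "unstable" write: server may only cache it */
  }

  void client_close(void) {
    for (;;) {
      uint64_t verf = nfs_commit_rpc();       /* force the server to write everything */
      int resent = 0;
      for (int i = 0; i < ndirty; i++) {
        if (cache[i].verf != verf) {          /* server rebooted since this write, */
          cache[i].verf =                     /* so its cached copy was lost: re-send */
            nfs_write_rpc(cache[i].offset, cache[i].data, cache[i].len);
          resent = 1;
        }
      }
      if (!resent)
        break;                                /* all our writes are on disk */
    }
    ndirty = 0;                               /* now safe to drop the client's copies */
  }

Notice the interface change: correctness now depends on the client holding on
to data and re-sending it after a reboot, which is the opposite of hiding
remoteness from the application.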