6.824 2001 Lecture 5: RPC

Goal: tools to help us divide up programs onto multiple machines
  Could be client / server
  Could be a cooperating cluster of machines, front end vs file server
  Could be a large distributed system, like DNS

Your program could send and receive network messages directly
  Use socket calls, read, write
  Agree on format of data
  Program would format and parse

Network I/O is awkward
  It isn't the usual abstraction we use for inter-module interfaces
  We use function call instead
    Well supported by languages
  Can we extend this idea to cross-network inter-module interfaces?

Remote Procedure Call: The Idea
  Ordinary local code:
    seat = reserve(int flight, str name);
    reserve(...){ read/write DB; return ...; }
  Now we're going to split into client and server halves.
  Client:
    reserve(...){
      send request msg to server;
      recv reply msg from server;
      return result;
    }
  Server:
    main(){
      while(1){
        wait for request msg;
        call reserve(...);   (the real implementation)
        send reply msg back to client;
      }
    }
  Now the programmer can make the same client reserve() call,
  and implement reserve() on the server the same way,
  and the RPC machinery takes care of the communication.

What are the potential benefits of RPC?
  Transparent distributed computing
    Existing programs don't need to be modified
    Can write s/w that's location-independent
  Enforces well-defined interfaces
  Allows portable interfaces
  Plug together separately written programs at RPC boundaries
    e.g. NFS and X clients and servers

What does an RPC system consist of?
  Will base explanation on SUN RPC, which you read about.
  1. Standards for wire format of RPC msgs and data types. XDR and RPC.
  2. Library of routines to marshal / unmarshal data.
  3. Stub generator, or RPC compiler, to produce "stubs".
     For client: marshal arguments, call, wait, unmarshal reply.
     For server: unmarshal arguments, call real fn, marshal reply.
  4. Server framework: dispatch each call message to the correct server stub.
  5. Client framework: give each reply to the correct waiting thread / callback.
  6. Binding: how does the client find the right server?

What needs to be in an RPC request message?
  (all fields 32 bits; this is the wire format)
  xid
  call/reply
  rpc version
  program #
  program version
  procedure #
  auth stuff
  arguments
  Of main interest: xid, prog#, proc#
    Server dispatch uses prog#, proc#
    Client reply dispatch uses xid
      Client remembers the xid of each outstanding call

Authentication fields
  An attempt to do cryptographic security at the RPC level
    Transparent to application code
  Turns out not to work well
    What "security" means is too app-dependent
  Typically just holds your numeric UNIX user id, not verification at all

The arguments
  Also encoded with XDR
  Arguments may be scattered all over memory; linearize them
  Easy for e.g. int -- same representation, though portable byte order...
  Arrays and strings? Prepend a length.
  Complex data structures? e.g. hash table?
    Unlikely XDR could deal with this automatically.
    Could benefit from better language support, e.g. run-time type tags.

What needs to be in an RPC reply?
  xid
  call/reply
  accepted? (vs bad rpc version, or auth failure)
  auth stuff
  success? (vs bad prog/proc #)
  results

How does the stub generator work?
  You give it a description of the procedure calls and arg/res data types.
    Sun defines a C-like standard, described in the XDR RFC.
  It produces:
    Routines to marshal / unmarshal.
    Routines to read/write calls on the wire.
    Maybe client / server stubs.
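  Aside: a minimal C sketch of the marshaling a generated stub might do for
  reserve(flight, name).  This is illustrative, not Sun rpcgen output; the
  program number and the helper names are made up.  It just linearizes the
  header fields and arguments XDR-style: 4-byte big-endian words, and a
  string as a length prefix plus bytes, padded to a word boundary.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    static unsigned char buf[512];        /* the linearized call message */
    static size_t off;

    static void put_uint32(uint32_t x) {  /* XDR int: 4 bytes, big-endian */
      buf[off++] = x >> 24; buf[off++] = x >> 16;
      buf[off++] = x >> 8;  buf[off++] = x;
    }

    static void put_string(const char *s) {  /* XDR string: length, bytes, pad */
      size_t n = strlen(s);
      put_uint32(n);
      memcpy(buf + off, s, n);
      off += n;
      while (off % 4) buf[off++] = 0;        /* pad to a 4-byte boundary */
    }

    int main(void) {
      put_uint32(7);        /* xid (the framework would pick a fresh one) */
      put_uint32(0);        /* 0 = call, 1 = reply */
      put_uint32(2);        /* RPC version */
      put_uint32(300212);   /* program # (made up) */
      put_uint32(1);        /* program version */
      put_uint32(1);        /* procedure #: reserve */
      put_uint32(0); put_uint32(0);   /* null auth credentials */
      put_uint32(0); put_uint32(0);   /* null auth verifier */
      put_uint32(250);      /* argument: flight */
      put_string("smith");  /* argument: name */
      printf("marshaled %zu bytes\n", off);
      return 0;
    }

  A real stub would hand this buffer to the client framework, which sends it
  and matches the eventual reply by xid.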
What does the client framework do?
  Keeps track of outstanding requests.
    For each, the xid and the caller's thread / callback.
  Matches replies to the caller.
  Might be multiple callers using one socket. NFS client in kernel.
  Usually handles timing out and retransmitting.

What does the server framework do?
  Need a context in which to execute the procedure.
  In a threaded system:
    Create a new thread per request. Master thread reads socket[s].
    Or a fixed pool of threads, and a queue if too many requests. NFS srvrs.
    Or just one thread, serial execution. Simplifies concurrency. X srvr.
  Key feature: support for concurrent RPCs
    If one RPC takes multiple blocking steps to compute,
      can I serve another one in the meantime?
    For example, DNS. Service routine is an RPC client.
    May also avoid deadlock if I send an RPC to ... to myself
  In an async programming system:
    Callback registered per prog#/proc#.
    (Rather than per socket. fdcb() calls the un-marshaling function.)

What about binding?
  Client needs to be able to talk to the correct server
  It needs an IP address
    Use DNS.
  Client knows the RPC prog #, needs to know the server's TCP/UDP port #
    Could use a well-known port: NFS uses port 2049
    Could use a "port mapper" per server
      server programs register prog#/port with the port mapper
      clients can ask for the port, given a prog#
      avoids dedicating scarce port #s to services

Example in the handout
  rx.x is an XDR description
    declares one program
      can potentially have many procedures
      just one procedure declared
    data types described
    program/procedure numbers specified
  Server code
    Set up connection
    Wrap it with transport (packetizer) and RPC parser
    asrv will call our callback for each RPC
    we're responsible for dispatch
  Client code
    sets up connection to server just once
    gets an "aclnt" handle
    can use it for multiple RPCs
    reserve_call stub uses aclnt, specifies proc#
    aclnt takes care of marshaling, retransmission, waiting
    multiple client calls may be outstanding
      callback registered for the xid of each one

Did we achieve the transparency we want?
  Hides marshal / unmarshal.
  Hides details of send / recv. And TCP vs UDP &c.
  Hides who the client is.
  Why does this look ugly?
    It does *not* hide remote access from the programmer.
    Async style. Cannot hide request/reply inside a stub fn.
    Which server are we making the call to?
    Some details of argument passing are different.
  But in general most of the network I/O machinery is hidden.

But does it have the same semantics as a purely local program?
  [Picture: just req/reply arrows]
  Two modules on the same machine, function call
    vs. different machines, RPC
  Does it behave in the same way?
  I.e. does our use of RPC make remoteness semantically transparent?

Suppose the RPC system gets no reply from the server?
  [time diagrams]
  RPC machinery re-sends the request, transparent to the client
  Maybe the first *reply* was lost -- now two reservations!
  Or maybe the first request got the last seat, so the 2nd request is denied.
  Can we fix this transparently?

Partial failures -- the bigger picture
  Local computing: it works, or the whole thing crashes
  Dist computing:
    Failures of just the server, or the client, or the network
    Usually can't tell what went wrong
      Maybe the server is up, but very slow?
  How does the remaining part of the system continue / recover?

Can the RPC system recover transparently?
  Client can re-send to get at-least-once.
  Can the server implement at-most-once? (then we'd get exactly-once...)
  Server code (see the sketch below):
    if seen xid
      return previous answer
    else
      do_reserve()
      record xid, answer
      return answer
  What if the server crashes just after do_reserve?
    Then retransmission will call do_reserve() again!
  We need an atomic transaction that does do_reserve() and records the xid.
  But now server application code has to cooperate closely w/ the RPC impl.
    What if the server didn't already use transactions, a DB?
    Or has an incompatible plan?
  Solvable, but not in a way that's at all transparent.
  Usually better for RPC to not bother w/ at-most-once; the app handles it alone.
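  An in-memory version of that duplicate check might look like the C sketch
  below.  The names (handle_reserve, do_reserve, the fixed-size table) are
  invented for illustration.  Note the table lives only in memory: it says
  nothing about when it is safe to forget an xid, and it does not survive
  the crash described above, which is exactly the problem.

    #include <stdint.h>

    #define MAXCALLS 1024

    struct seen {
      uint32_t xid;      /* xid of a call we've already executed */
      int answer;        /* the reply we sent for it */
      int valid;
    };
    static struct seen table[MAXCALLS];

    static int do_reserve(int flight, const char *name) {  /* placeholder for the real work */
      (void)flight; (void)name;
      return 42;
    }

    int handle_reserve(uint32_t xid, int flight, const char *name) {
      struct seen *s = &table[xid % MAXCALLS];

      if (s->valid && s->xid == xid)
        return s->answer;        /* duplicate: re-send the old answer, don't re-execute */

      int answer = do_reserve(flight, name);

      /* A crash right here loses the record, so a retransmission would run
         do_reserve() again.  Real at-most-once needs do_reserve() and this
         update to be a single atomic, durable step. */
      s->xid = xid;
      s->answer = answer;
      s->valid = 1;
      return answer;
    }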
Areas of RPC non-transparency
  1. Partial failure
  2. Latency
  3. Memory access. Pointers, complex data structures. Write-sharing.
  4. Concurrency.
  5. Synchrony. You might not want to wait for the reply.
  6. Security. You can rarely deal with it transparently.
  Solutions generally involve exposing RPC properties to the application.
    Not a good idea to try to hide them.
  Apps may have to be dramatically re-designed for distribution.

Example: NFS writes
  NFS was one of the earliest users of RPC in common use. We still use it.
  RPC is simple: write(file, offset, data, length)
    More or less mimics what the disk file system does internally.
      So more or less transparent.
    8k at a time, so it will fit in a packet
  Clear what the intended meaning of this is, right?

What are write's actual semantics?
  What can the client count on happening when the RPC reply arrives?
  Presumably the write is done, and the client can forget about it.
  What if the server crashes and reboots just after sending the reply?
    The client won't know anything about this.
    But the data had better be on the disk.
    Not like a local crash, in which the client program dies too.
      Then, in a sense, the client *does* know about the crash.
  Consequence:
    Data must be on disk before the server sends the write() reply.
    Cannot just write to the server's disk cache and return.
    That means the i-node must be updated with the new length.
    And the indirect block must be updated with the new block #.
    So three sync disk writes, in different places on the disk.
  Assume 10 ms per seek, so 30 ms per write RPC.
    33 writes per second, or about 260 kbytes per second.
    But disk and net hardware can move data at 10 megabytes per second!
    So under 3% efficiency. Not good.

Solutions to the NFS write problem?
  Live with slow writes. That's largely what happens.
  Change file system semantics:
    Written data may unexpectedly disappear if the server reboots!
    Preserves low-level transparency (i.e. the write() interface)
      but very non-transparent to users.
  Or change the interface (break transparency completely):
    This is how NFS v3 works (see the sketch at the end of these notes).
    write() just puts data in the server's cache, then sends the reply.
    Server later batches disk writes from its cache for efficiency.
    Client *keeps* data in its cache after write returns.
    When the client calls close() or wants to reclaim cache space:
      New RPC (COMMIT) to force the server to write all the data.
      Also checks that the server hasn't rebooted in the meantime.
      If it has, re-send from the client's cache.

Conclusion
  Automatic marshaling has been a big success.
  Mimicking the procedure call interface is not that useful.
  Attempt at full transparency has been mostly a failure.
    But people have tried hard and built neat systems -- Network Objects
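Sketch referenced from the NFS v3 discussion above: the client side of the
write/commit scheme.  This is a toy, not the real NFS client; nfs_write_rpc,
nfs_commit_rpc, and the little simulated server are made up, and the cache is
a fixed array.  The pattern is what matters: keep a copy of written data until
a commit succeeds, and use a verifier that changes when the server reboots to
decide whether the data must be re-sent.

  #include <stdint.h>
  #include <string.h>

  #define MAXDIRTY 128
  #define BLKSZ    8192

  struct dirty {          /* a write we must be able to re-send */
    uint64_t offset;
    char     data[BLKSZ];
    int      len;
    uint64_t verf;        /* server boot verifier returned by the WRITE */
  };
  static struct dirty cache[MAXDIRTY];
  static int ndirty;

  /* Stand-ins for the real RPCs so the sketch is self-contained; each
     returns the server's current boot verifier. */
  static uint64_t boot_verf = 1;
  static uint64_t nfs_write_rpc(uint64_t off, const char *data, int len) {
    (void)off; (void)data; (void)len;
    return boot_verf;     /* pretend the data reached the server's cache */
  }
  static uint64_t nfs_commit_rpc(void) {
    return boot_verf;     /* pretend the server flushed its cache to disk */
  }

  void client_write(uint64_t off, const char *data, int len) {
    struct dirty *d = &cache[ndirty++];
    d->offset = off;
    memcpy(d->data, data, len);
    d->len = len;
    d->verf = nfs_write_rpc(off, data, len);  /* "unstable" write: server may only cache it */
  }

  void client_close(void) {
    for (;;) {
      uint64_t verf = nfs_commit_rpc();       /* force the server to write everything */
      int resent = 0;
      for (int i = 0; i < ndirty; i++) {
        if (cache[i].verf != verf) {          /* server rebooted since this write, */
          cache[i].verf =                     /* so its cached copy was lost: re-send */
            nfs_write_rpc(cache[i].offset, cache[i].data, cache[i].len);
          resent = 1;
        }
      }
      if (!resent)
        break;                                /* all our writes are on disk */
    }
    ndirty = 0;                               /* now safe to drop the client's copies */
  }

Notice the interface change: correctness now depends on the client holding on
to data and re-sending it after a reboot, which is the opposite of hiding
remoteness from the application.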