6.824 Lecture 2: RPC and threads

Outline
  RPC
  Two RPC implementations:
    - Birrell RPC
    - RPC in yfs
  RPC semantics
  RPC differences

RPC Structure
   client program
   client stub  --- marshal
   RPC comm package --- deliver packets over network
   server stub  --- unmarshal
   server program
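
The five layers above can be sketched in a few lines. This is only an illustration, not any particular RPC system: the procedure `add`, the JSON wire format, and the in-process `transport` function (standing in for the RPC comm package) are all assumptions made for the example.

```python
import json

# Server program: an ordinary local procedure (hypothetical example).
def add(a, b):
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    """Unmarshal the request, call the local procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    """Stand-in for the RPC comm package; a real one sends packets over a network."""
    return server_stub(request_bytes)

def client_stub_add(a, b):
    """Marshal the arguments, send the request, unmarshal the reply."""
    request_bytes = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
    reply_bytes = transport(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]

# Client program: the call looks like an ordinary procedure call.
print(client_stub_add(2, 3))  # prints 5
```

Only bytes cross the `transport` boundary, which is why the stubs must marshal and unmarshal on each side.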

RPC
  a stylized version of client/server communication that attempts to
  make remote procedure calls look like ordinary procedure calls.
    draw picture with stub, request, stub on server, reply
  key properties:
    easy to write programs with
      model programmers are familiar with
      good match for many distributed applications (client/server)
      hides details (e.g., marshaling/unmarshaling)
  alternatives?
    directly programming with sockets
    distributed-shared memory (later in the class)
    map/reduce
    dryad
    MPI
    ...
  RPC is widely used in practice:
    XML RPC
    Java RMI
    Sun RPC
    map/reduce + dryad implemented using RPC?

Key challenges:
   the semantics in the face of:
     communication failures (messages may be delayed, variable round
       trip, never arrive)
     machine failures
       did server fail just before processing the request or just after?
   sometimes impossible to tell the difference between communication failures
     and machine failures
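
A toy simulation of why the client cannot tell these cases apart. Everything here is hypothetical (the `Server` class, the `call` function, and its failure flags); the point is only that three different failures produce the same observation at the client while leaving the server in different states.

```python
class Server:
    def __init__(self):
        self.executed = 0  # how many times the procedure actually ran

    def handle(self, request):
        self.executed += 1
        return "ok"

TIMEOUT = None  # what the client observes when no reply arrives in time

def call(server, request, drop_request=False, crash_before=False, drop_reply=False):
    """Simulate one RPC attempt with an injected failure."""
    if drop_request or crash_before:
        return TIMEOUT            # request never processed
    reply = server.handle(request)
    if drop_reply:
        return TIMEOUT            # request processed, reply lost
    return reply

s1, s2, s3 = Server(), Server(), Server()
outcomes = [
    call(s1, "req", drop_request=True),   # network lost the request
    call(s2, "req", crash_before=True),   # server died before processing
    call(s3, "req", drop_reply=True),     # network lost the reply
]
print(outcomes)                                  # client sees a timeout every time
print(s1.executed, s2.executed, s3.executed)     # server state differs
```

In the third case the procedure ran once, so blindly resending would run it a second time.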

Birrell RPC
   we will see many Birrell papers
   3 Mbit/s Ethernet hardware is quite slow
   goals:
     make rpc == procedure call
       e.g., no-timeout on calls
       but server failures break RPC == procedure call
     highly efficient

Possible RPC semantics:
  At-least-once  (2008 6.824 lab)
  At-most-once (2009 and 2010 6.824 lab and RPC paper)
    how does the paper achieve at-most-once?  (bind is important!)
      UID on export
  Exactly-once?
    See Argus paper later in semester


Naming servers
   What is Grapevine? (precursor to DNS)
   Export
   Import
   Returns unique ID

Birrell RPC protocol for at-most-once semantics
  Simple calls
    two packets
      call identifier: machine ID, process, and seqno
      unique ID returned by bind/export
    one outstanding request
    timeout: resend request
    long-running procedures on server
      cause timeout
    no tear down
  Separate protocol for complicated calls
    see figure 4
  What semantics:
    comm failure?
    machine failure?
    could a restarted server export the same unique ID?
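
A minimal sketch of the server side of such a protocol, assuming one outstanding request per caller so that sequence numbers only increase. The class and method names are hypothetical, and drawing export IDs from a process-wide counter is a stand-in chosen for this example, not a claim about how the paper generates them.

```python
import itertools

class AtMostOnceServer:
    # Stand-in source of fresh export IDs (an assumption of this sketch).
    _next_export_id = itertools.count(1)

    def __init__(self, procedure):
        self.procedure = procedure
        self.export_id = next(AtMostOnceServer._next_export_id)
        self.last_seqno = {}   # (machine, process) -> highest seqno executed
        self.last_reply = {}   # (machine, process) -> reply for that seqno

    def bind(self):
        """Return the unique ID the client must present on every call."""
        return self.export_id

    def handle(self, export_id, machine, process, seqno, args):
        if export_id != self.export_id:
            # Binding predates a restart; duplicate-detection state is gone.
            raise RuntimeError("stale binding: server restarted")
        caller = (machine, process)
        last = self.last_seqno.get(caller, 0)
        if seqno < last:
            raise RuntimeError("old duplicate: discard")
        if seqno == last:
            return self.last_reply[caller]   # retransmission: do not re-execute
        result = self.procedure(*args)       # new call: execute exactly once
        self.last_seqno[caller] = seqno
        self.last_reply[caller] = result
        return result
```

A restart is modeled by constructing a new `AtMostOnceServer`: it gets a different export ID, so calls carrying the old ID are rejected instead of being re-executed against an empty duplicate table.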

At-least-once versus at-most-once?
  let's take an example: acquiring a lock
    if client and server stay up, client receives lock
    if client fails, it may have the lock or not (server needs a plan!)
    if server fails, client may have lock or not
      at-least-once: client keeps trying
      at-most-once:  client will receive an exception
    what does a client do in the case of an exception?
      need to implement some application-specific protocol
        ask server: do I have the lock?
        server needs a plan for remembering state across reboots,
          e.g., store locks on disk
    at-least-once (if we never give up)
      clients keep trying.  server may run procedure several times
      server must use application state to handle duplicates
        if requests are not idempotent
        (but it is difficult to make all requests idempotent)
      e.g., server could store on disk who holds the lock and the req id
        check the table for each request
        even if server fails and reboots, we get correct semantics
  What is right?
    depends where RPC is used.
       simple applications: 
         at-most-once is cool (more like procedure calls)
       more sophisticated applications: 
         need an application-level plan in both cases
         not clear at-most-once gives you a leg up
  => Handling machine failures makes RPC different than procedure calls
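
The "store on disk who has the lock and the req id" idea can be sketched as follows. This is a hedged illustration with hypothetical names (`LockServer`, `acquire`, string request IDs such as "c1-1") and a JSON file standing in for the on-disk table; it is not the lab's implementation.

```python
import json
import os
import tempfile

class LockServer:
    def __init__(self, path):
        self.path = path
        if os.path.exists(path):            # reboot: reload state from disk
            with open(path) as f:
                state = json.load(f)
        else:
            state = {"holder": None, "replies": {}}
        self.holder = state["holder"]
        self.replies = state["replies"]     # req id -> reply already decided

    def _save(self):
        # Write to a temp file and rename so a crash never leaves a torn table.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"holder": self.holder, "replies": self.replies}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def acquire(self, client, req_id):
        if req_id in self.replies:          # duplicate: repeat the old answer
            return self.replies[req_id]
        granted = self.holder is None
        if granted:
            self.holder = client
        self.replies[req_id] = granted
        self._save()                        # persist before replying
        return granted

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "locks.json")
    server = LockServer(path)
    first = server.acquire("c1", "c1-1")    # c1 gets the lock
    retry = server.acquire("c1", "c1-1")    # retransmission, same answer
    other = server.acquire("c2", "c2-1")    # c2 is refused
    rebooted = LockServer(path)             # simulate crash and reboot
    after = rebooted.acquire("c1", "c1-1")  # duplicate still recognized
    print(first, retry, other, after)       # prints True True False True
```

Because the table is saved before the reply is sent, a client that retries after a server reboot gets the same answer it would have received the first time.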


YFS RPC versus RPC in paper
  Both at-most-once
    Using the same technique (bind and exchange a nonce)
  Protocols differ
    YFS runs on reliable transport
    YFS has no cross-layer optimizations (e.g., piggybacking ACK)
    YFS allows multiple outstanding requests
  YFS RPC is not integrated with kernel and language (unlike the paper's)
    Mesa is GC-ed
    Makes RPC implementation challenging
      Thread/processes
      Blocking kernel ops
    Thread cleanup difficult  --- more next lec