6.824 2012 Lecture 2: RPC

Remote Procedure Call (RPC)
  a key piece of distrib sys machinery -- and lab 1
  goal: easy-to-program network communication
    hides most details of client/server communication
    client call is much like ordinary procedure call
    server handlers are much like ordinary procedures
  alternatives?
    format/send network packets directly
    distributed shared memory
    map/reduce
    ...
  RPC is widely used!

RPC ideally makes net communication look just like fn call:
  Client:
    z = fn(x, y)
  Server:
    fn(x, y) {
      compute
      return z
    }
  RPC aims for this level of transparency
  Hope: even novice programmers can use function call!

RPC message diagram:
  Client             Server
    request--->
          <---response

RPC software structure:

  Client    Client RPC   Net             RPC Server       Server
  app       stub lib                     lib stub         handler
  call      marshall     send request    recv, dispatch,
                                         unmarshall       fn(args)
  wait      wait                                          work...
  return    unmarshall   recv response   marshall, send   return

A few details:
  Marshalling is tricky for arrays, pointers, objects, &c
  Client needs to find server's network address ("binding")
  Client may have multiple threads sending to server
    must match response to waiting thread
  Server may create thread per request

Hard challenge: failures
  network may drop, delay, duplicate, re-order messages
  network might break altogether, and maybe recover
  server might crash, and maybe re-start
  can we hide this from client applications and server handlers?

Birrell RPC paper
  we will see many Birrell papers
  from Xerox PARC, which invented LANs and workstations in the 1970s
  paper's main concerns:
    naming
    minimizing # of packets (slow CPUs -> slow pkt handling) (we use TCP)
    failures

Naming RPC servers
  used Grapevine, a name service (a little like DNS)
    Export(service name, server host)
    Import(service name) -> server host
  level of indirection buys:
    clients need not hard-code server names
    multiple servers (use closest)
    replacement of servers

Let's talk about how a client can handle failure
  client sends a request
  suppose network discards the request packet
    what will client observe?
    what should the client do?
    how long should client wait before rxmt?
  Would it be enough to layer RPC on top of TCP?
    After all, TCP is a reliable transport protocol
  Now suppose the network delivered the request, but discarded the response
    what will client observe?
    what should the client do?

Simple retransmission leads to "at-least-once" behavior

Would our lock_server work under at-least-once?
  no: [arrow diagram]
    send acquire, response lost
    timeout, re-send acquire, server will wait forever!
  you could argue that server shouldn't double-grant
  but there are other bad scenarios:
    send acquire
    send release, network delays it
    re-send release, received
    acquire again
    now first release delivered, incorrectly releases lock

Is at-least-once ever OK?
  yes: if no side effects -- read-only operations
  yes: if application has its own plan for detecting duplicates
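To make at-least-once concrete, here is a minimal sketch of a retransmitting
client in C++. This is not the lab's code: send_request() and
recv_reply_with_timeout() are hypothetical transport helpers, and the fixed
one-second timeout is arbitrary (real clients back off):

  #include <string>

  struct reply_t { int err; std::string ret; };

  // Hypothetical transport helpers, standing in for the lab's connection code.
  void send_request(int xid, int proc, const std::string &args);
  bool recv_reply_with_timeout(int xid, int timeout_ms, reply_t *r);

  // At-least-once: retransmit until *some* reply arrives. The server may
  // execute the handler more than once; the client cannot tell.
  reply_t call_at_least_once(int xid, int proc, const std::string &args) {
    reply_t r;
    for (;;) {
      send_request(xid, proc, args);              // request may be lost
      if (recv_reply_with_timeout(xid, 1000, &r))
        return r;                                 // response matched by xid
      // timeout: request or response was lost -- can't tell which -- re-send
    }
  }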
How can the RPC system provide better behavior?
  server remembers the requests it has seen, detects duplicates
  requests need unique IDs (XIDs), ID repeated on rxmt
  what to do if server sees a duplicate?
    client may still need the reply
    so server remembers replies to previously executed RPCs
  this yields "exactly-once" behavior
    assumes failures are eventually repaired
    and that client re-tries forever

Exactly-once is possible but difficult
  Server must remember table across reboots (e.g. on disk)
  Why doesn't the following server code provide exactly-once?
    if s[xid] != DONE:
      x = handler(args)
      s[xid] = DONE
    else:
      return previous value
  We will return to this problem later in the course

Birrell RPC protocol provides "at-most-once"
  server says OK -> executed once
  server says CRASH -> zero or one times, unknown which
  why? much easier than exactly-once, more useful than at-least-once

at-most-once is easy if server does not crash:
  server maintains s[xid] (state), r[xid] (return value)
  (this code assumes just one server thread, thus no locks or PENDING)
    if s[xid] == DONE:
      return r[xid]
    x = handler(args)
    s[xid] = DONE
    r[xid] = x
    return x

What if the server crashes?
  Just before or just after call to handler()
  After server restart:
    Can't tell whether handler() was called
    Client will re-send request (no response due to crash)
    Server *must* reply with CRASH

How to decide if a request is a re-sent copy of one sent before the crash?
  Answer: server has a number that uniquely identifies restarts
    Birrell calls it the ID; we call it the server nonce
  client obtains server's ID when it first connects, during "bind"
  client sends server ID in every RPC request
  server checks whether ID in request == current ID
    if equal, then any previous transmission will be in server's replies[] table
    if not equal, return CRASH

What should client do if server says CRASH?
  might have been executed already, might not have been
  thus you know: maybe you transferred the money, maybe you didn't
  but you know it didn't transfer twice
  and you know you have to track down what actually happened
    e.g. ask the bank for a transaction record
  this situation is pretty rare
  more on this later in the course

How to ensure server never reuses an ID?
  server could store ID on disk (if it has a disk)
  or use boot time (if it has access to a clock)
  or use a big random number (if it has a source of randomness)

When can server discard old saved return values?
  after e.g. five seconds? no!
  server can discard if client will never retransmit
    == if client has received the response
  have client tell server which replies it has received
  or: client tells server it has received all replies up through some XID
    lab RPC's xid_rep, in every request message
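Putting the nonce check, the duplicate table, and xid_rep trimming together:
a single-threaded sketch in C++, in the spirit of the pseudocode above.
handler() and the "CRASH" string are stand-ins, not the lab's actual
interface:

  #include <map>
  #include <string>

  int server_nonce;                      // chosen at boot: disk, clock, or random
  std::map<int, std::string> replies;    // xid -> saved return value (== DONE)

  std::string handler(const std::string &args);  // the real RPC handler (assumed)

  std::string dispatch(int sn, int xid, int xid_rep, const std::string &args) {
    if (sn != server_nonce)
      return "CRASH";                    // request predates a server restart
    // client has received every reply through xid_rep: safe to forget them
    replies.erase(replies.begin(), replies.upper_bound(xid_rep));
    std::map<int, std::string>::iterator it = replies.find(xid);
    if (it != replies.end())
      return it->second;                 // duplicate: re-send the saved reply
    std::string x = handler(args);       // executed at most once per xid
    replies[xid] = x;
    return x;
  }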
Lab RPC request message format:
  (TCP/IP headers...)
  xid
  proc#
  client nonce
  server nonce
  xid_rep
  arguments...

Lab RPC response message format:
  (TCP/IP headers...)
  xid
  error code
  return value

[draw these on the board as arrow diagrams]

Example 1: ordinary calls
  bind req:   xid=1 sn=? proc=1
  bind reply: xid=1 sn=22
  ...
  req:   xid=6 sn=22 xid_rep=5 proc=2 args...
    r[6] = r1
    server deletes xid<=5 from s[]/r[]
  reply: xid=6 r1
  req:   xid=7 sn=22 xid_rep=6 proc=2 args...
  reply: xid=7

Why not let xid_rep be implicitly xid - 1?
  to allow multiple outstanding RPCs from one client
  example: xid=7 sent before reply for xid=6 arrives

Client has to compute xid_rep
  it is the highest xid through which all replies have been received
  our code keeps the set of replied-to xids, looks for a contiguous prefix

Example 2: duplicate requests (client resends too soon, or response lost)
  req: xid=8 ...
  req: xid=8 ... (again)
  if handler has finished when 2nd req arrives:
    reply with r[8]
    thus two replies with xid=8
    client will ignore 2nd reply, no thread waiting for that xid
  if handler has not finished:
    ignore request
    (thus need more than just DONE -- INPROGRESS)

Example 3: server reboot
  req: xid=9 sn=22 ...
  server crash, reboot, new sn=23
  req: xid=9 sn=22 ... (retransmission)
  server sees 22 != 23, replies FORGOTTEN

l02.cc hand-out has simplified RPC code from lab 1
  I've omitted error checks, locks, &c
  Simplified some C++ notation
  Check the real code!

lock_demo.cc
  creates a lock_client, tells it where to connect

lock_client::lock_client
  creates rpcc, tells it where to connect
  calls bind() to get server_nonce

lock_client::stat
  calls call(), proc #, arguments, &return

rpcc::call1
  msg: xid proc cn sn xid_rep args...
  xid_rep_window_.front()?
  update_xid_rep(xid)?

rpcc::got_pdu

rpcs::rpcs

rpcs::dispatch
  why the nonce check? what prob does this solve?
  when could INPROGRESS occur?
  is FORGOTTEN possible? how?
    long-delayed request (rebooted server is dealt w/ separately)
  what is the client nonce for?

checkduplicate_and_update(cn, xid, xid_rep, &b, &sz)
  must keep, for each cn, state about each xid: done? b+sz if done
  if s[cn/xid] == INPROGRESS
    return INPROGRESS
  if s[cn/xid] == DONE
    return DONE, plus the saved b and sz
  if xid <= previous xid_rep
    return FORGOTTEN
  else
    s[cn/xid] = INPROGRESS
    return NEW
  must also trim state:
    discard any cn/xid if xid <= xid_rep
    must also free buf

What must add_reply(cn, xid, b, sz) do?
  checkduplicate_and_update already set s[cn/xid] = INPROGRESS
  set s[cn/xid] = DONE
  remember b and sz
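To tie the last two functions together, here is a compact C++ sketch of that
state machine. It is an illustration under assumptions, not the lab's real
code: reply_t, per_client, and the container layout are invented, buffers are
assumed malloc()'d, and real rpcs code needs a lock around all of this:

  #include <cstdlib>
  #include <map>

  enum rpcstate { NEW, INPROGRESS, DONE, FORGOTTEN };

  struct reply_t { char *buf; int sz; bool done; };  // done == reply saved

  struct per_client {
    int xid_rep;                       // highest xid this client has acked
    std::map<int, reply_t> xids;       // state for each still-remembered xid
  };

  std::map<unsigned int, per_client> clients;  // keyed by client nonce

  rpcstate checkduplicate_and_update(unsigned int cn, int xid, int xid_rep,
                                     char **b, int *sz) {
    per_client &c = clients[cn];
    // trim state: discard (and free) any xid <= xid_rep
    while (!c.xids.empty() && c.xids.begin()->first <= xid_rep) {
      free(c.xids.begin()->second.buf);
      c.xids.erase(c.xids.begin());
    }
    if (xid_rep > c.xid_rep)
      c.xid_rep = xid_rep;

    std::map<int, reply_t>::iterator it = c.xids.find(xid);
    if (it != c.xids.end()) {
      if (!it->second.done)
        return INPROGRESS;             // handler still running: drop request
      *b = it->second.buf;             // duplicate of a finished RPC:
      *sz = it->second.sz;             // hand back the saved reply
      return DONE;
    }
    if (xid <= c.xid_rep)
      return FORGOTTEN;                // long-delayed request; reply discarded
    reply_t fresh = { 0, 0, false };
    c.xids[xid] = fresh;               // record as INPROGRESS
    return NEW;                        // caller should run the handler
  }

  void add_reply(unsigned int cn, int xid, char *b, int sz) {
    reply_t &r = clients[cn].xids[xid];  // entry made by checkduplicate_and_update
    r.buf = b;
    r.sz = sz;
    r.done = true;
  }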