6.824 Lecture 2: RPC and threads

RPC goal: easy-to-program network communication
  hides most details of client/server communication
  makes a call look much like an ordinary procedure call
  server handlers also look much like ordinary procedures

alternatives?
  directly programming with sockets
  distributed shared memory (later in the class)
  map/reduce
  MPI
  ...

RPC is widely used
  XML RPC
  Java RMI
  Sun RPC

RPC structure
  client application
  client stubs
  RPC library
  network
  server RPC library
  dispatch
  server application handlers

Example: lab1 lock server
  want client app code to look like
    acquire(lid)
    release(lid)
  much like a local call, very convenient
  actually lc->acquire(lid), lc->release(lid)
    lc indicates which lock server we want to talk to
  s/w structure:
    app (lock_demo or _tester), lock_client, RPC, ..., RPC, lock_server
  lock_server handler pseudo-code:
    acquire(lid):
      while(held[lid] == true)
        wait
      held[lid] = true
    release(lid):
      held[lid] = false
      wakeup
  you will have to think about threads/mutex/condvar
    (see the sketch just below)
  C++ STL map for held[]
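  a minimal C++ sketch of these handlers, assuming C++11 std::mutex /
  std::condition_variable rather than the lab's own thread wrappers
  (lockid_t is a made-up stand-in for the lab's lock id type):

    #include <map>
    #include <mutex>
    #include <condition_variable>

    typedef unsigned long long lockid_t;  // hypothetical lock id type

    class lock_server {
      std::mutex m;                   // protects held[]
      std::condition_variable cv;     // signaled on every release
      std::map<lockid_t, bool> held;  // lid -> currently held?

    public:
      void acquire(lockid_t lid) {
        std::unique_lock<std::mutex> lk(m);
        // wait() atomically releases m while sleeping and re-acquires
        // it before returning, so the loop can safely re-check held[lid]
        while (held[lid])
          cv.wait(lk);
        held[lid] = true;
      }

      void release(lockid_t lid) {
        std::unique_lock<std::mutex> lk(m);
        held[lid] = false;
        cv.notify_all();  // wake waiters so they re-check held[lid]
      }
    };

  note the while (not if) around wait(): another thread may grab the
  lock between the wakeup and the return from wait()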
Easy challenges:
  how the client indicates the server and procedure
  automatic marshaling/unmarshaling of arguments/return value

Hard challenge: failures
  network may drop, delay, duplicate, re-order messages
  network might break altogether, and maybe recover
  server might crash, and maybe re-start
  how to provide easy-to-use behavior to clients?

Birrell RPC paper
  we will see many Birrell papers
  from Xerox PARC, which invented LANs and workstations in the 1970s
  paper's main concerns:
    naming
    minimizing # of packets (slow CPUs -> slow pkt handling)
    failures

Naming RPC servers
  used Grapevine, a name service (a little like DNS)
  Export(service name, server host)
  Import(service name) -> server host
  the level of indirection means:
    clients need not hard-code server names
    multiple servers (use closest)
    replacement of servers

Let's talk about how a client can handle failure
  client sends a request
  suppose the network discards the request packet
    what will the client observe?
    what should the client do?
    how long should the client wait before rxmt?
  now suppose the network delivered the request, but discarded the response
    what will the client observe?
    what should the client do?

Simple retransmission leads to "at-least-once" behavior

Would our lock_server work under at-least-once?
  no:
    send acquire
    send release; network delays it
    re-send release; it is received
    acquire again
    now the first release is delivered, incorrectly releasing the lock

Are there any situations where at-least-once is OK?
  yes: if operations have no side effects -- read-only operations

How can an RPC system provide better behavior?
  remember the RPC requests it has seen, to detect duplicates
    requests need unique IDs; the ID is repeated on rxmt
  what to do if the server sees a duplicate?
    the client still needs the reply
    so the server remembers replies to previously executed RPCs
  this yields "exactly-once" behavior

Turns out exactly-once is difficult
  any guesses why?
  the hard case: the server crashes just as it receives a request
    did it execute, and crash before sending the reply?
    or crash before executing?
    that is, should the server re-execute it after restart?

Birrell RPC protocol provides "at-most-once"
  server says "ok" -> executed once
  server says "???" -> zero or one times, unknown which
    (happens if the server restarts, forgetting its replies[] table of completed RPCs)

Key remaining problem w/ at-most-once
  client sends request
  server crashes before sending reply
  server restarts
  client re-sends request
  how does the server realize it is a duplicate?

What exact situation do we need to detect?
  a retransmitted request whose earlier transmission the server
    might have seen before the crash

How to detect cross-crash retransmission?
  server has a number that uniquely identifies restarts
    Birrell calls it the ID; we call it the server nonce
  client obtains the server's ID when it first connects, during "bind"
  client sends the server ID in every RPC request
  server checks whether the ID in the request == its current ID
    if equal, any previous transmission will be in the server's replies[] table
    if not equal, there is a problem

What to do when the server detects a cross-crash retransmission?
  the request might have been executed already, or might not have been
  send an error back to the client and hope it knows how to deal
  this situation is pretty rare
  we will have more to say about server crashes later in the course

How to ensure the server never reuses an ID?
  server could store the ID on disk (if it has a disk)
  or use the boot time (if it has access to a clock)
  or use a big random number (if it has a source of randomness)

When can the server discard old saved return values?
  after e.g. five seconds? no!
  server can discard a reply once the client will never retransmit that request
  so have the client tell the server which replies it has received
  streamlined version:
    client gives requests ascending numbers, called xids
      includes the xid in every request
    server includes the xid in its reply
    client tells server the highest xid for which it has all previous replies
      includes this in every request

Example 1: ordinary calls
  (using notation from our RPC system)
  bind req:   xid=1 sn=? cn=33 xid_rep=5 proc=1
  bind reply: xid=1 sn=22
  ...
  req:   xid=6 sn=22 cn=33 xid_rep=5 proc=2 args...
    server deletes replies w/ xid<=5 from replies[] table
    replies[xid=6] = r1
  reply: xid=6 r1
  req:   xid=7 sn=22 cn=33 xid_rep=6 proc=2 args...
  reply: xid=7
  xid_rep is explicit, rather than implied by xid,
    to allow multiple outstanding RPCs from one client
    e.g. xid=7 sent before the reply for xid=6 arrives
  client has to remember the set of received replies and find the max
    xid for which all previous replies have arrived,
    to cover the case in which the reply for xid=7 arrives before xid=6's

Example 2: slow server
  req:  xid=8 ...
  rxmt: xid=8 ...
  if the handler has finished when the 2nd req arrives:
    reply with replies[xid=8]
    so the server will send two replies
    client will ignore the 2nd reply; no-one is waiting for that xid
  if the handler has not finished:
    ignore the request

Example 3: server reboot
  req:  xid=9 sn=22 cn=33 ...
  server crashes, reboots, new sn=23
  rxmt: xid=9 sn=22 cn=33 ...
  server sees 22 != 23, replies FORGOTTEN

YFS RPC library structure
  app (many threads)
  x_client  y_client
  rpcc      rpcc
  conn      conn ... conn
  rpcs
  x_server (many threads)

RPC major state
  rpcc:
    current connection to server
    table of outstanding RPCs
    which xids have been replied to (for at-most-once)
  rpcs:
    handler table
    active RPCs
    remembered replies

l02.cc hand-out has simplified RPC code from lab1
  error checks, locks, &c have been omitted
  some C++ notation has been simplified
  check the real code!
  lock_demo.cc
    creates a lock_client, tells it where to connect
  lock_client::lock_client
    creates an rpcc, tells it where to connect
    calls bind() to get the server_nonce
  lock_client::stat
    calls call(), passing the proc #, the arguments, and &return
  rpcc::call1
  rpcc::got_pdu
  rpcs::rpcs
  rpcs::dispatch
  rpcs::checkduplicate_and_update
  rpcs::add_reply
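
To make the at-most-once bookkeeping concrete, here is a simplified
C++ sketch of what checkduplicate_and_update and add_reply have to do.
This is illustrative, not the real rpcs code: the per-client table
layout, the field types, and the rpcs_sketch name are assumptions.

    #include <map>
    #include <set>
    #include <string>
    #include <mutex>

    enum rpcstate_t { NEW, INPROGRESS, DONE, FORGOTTEN };

    class rpcs_sketch {
      std::mutex m;
      const unsigned int server_nonce;  // fresh random value per restart
      struct client_state {
        std::map<unsigned int, std::string> done;  // xid -> remembered reply
        std::set<unsigned int> inprogress;         // xids still executing
      };
      std::map<unsigned int, client_state> clients;  // keyed by client nonce cn

    public:
      explicit rpcs_sketch(unsigned int nonce) : server_nonce(nonce) {}

      // called on every incoming request, before running the handler
      rpcstate_t checkduplicate_and_update(unsigned int sn, unsigned int cn,
                                           unsigned int xid,
                                           unsigned int xid_rep,
                                           std::string &rep) {
        std::lock_guard<std::mutex> lk(m);
        if (sn != server_nonce)
          return FORGOTTEN;      // cross-crash rxmt: reply with an error
        client_state &cs = clients[cn];
        // client has all replies with xid <= xid_rep: safe to discard them
        cs.done.erase(cs.done.begin(), cs.done.upper_bound(xid_rep));
        if (cs.done.count(xid)) {  // dup of a finished RPC:
          rep = cs.done[xid];      //   re-send the remembered reply
          return DONE;
        }
        if (cs.inprogress.count(xid))
          return INPROGRESS;     // dup of a running RPC: ignore it
        cs.inprogress.insert(xid);
        return NEW;              // first sighting: run the handler
      }

      // called when a handler finishes, to remember its reply
      void add_reply(unsigned int cn, unsigned int xid,
                     const std::string &rep) {
        std::lock_guard<std::mutex> lk(m);
        clients[cn].inprogress.erase(xid);
        clients[cn].done[xid] = rep;
      }
    };

  this matches the three cases in the examples above: example 1's
  discard of replies w/ xid<=5, example 2's re-send vs. ignore, and
  example 3's FORGOTTEN when the server nonce has changed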