6.824 Lecture 2: RPC and threads

Outline
  RPC
  Threads
  RPC in yfs
  RPC semantics

RPC
  a stylized version of client/server communication that attempts to
  make remote procedure calls look like ordinary procedure calls.
    draw picture with stub, request, stub on server, reply
  key properties:
    easy to write programs with
      model programmers are familiar with
      good match for many distributed applications (client/server)
      hides details (e.g., marshaling/unmarshaling)
  alternatives?
    directly programming with sockets
    distributed-shared memory (later in the class)
    map/reduce
    dryad
    MPI
    ...
  RPC has found many uses
    XML RPC
    Java RMI
    Sun RPC
    map/reduce + dryad implemented using RPC?
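
To make the stub picture concrete, here is a minimal sketch of what the
stubs hide. It assumes a toy procedure add(a, b); the names marshal_u32,
unmarshal_u32, add_client_stub, and add_server_stub are hypothetical and
do not come from any RPC library named above. The network is replaced by
a direct function call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append a 32-bit value in network (big-endian) byte order.
void marshal_u32(std::string &buf, uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8)
    buf.push_back(static_cast<char>((v >> shift) & 0xff));
}

// Read a 32-bit big-endian value starting at offset pos; advances pos.
uint32_t unmarshal_u32(const std::string &buf, size_t &pos) {
  assert(pos + 4 <= buf.size());
  uint32_t v = 0;
  for (int i = 0; i < 4; i++)
    v = (v << 8) | static_cast<unsigned char>(buf[pos++]);
  return v;
}

// The "real" procedure, which lives on the server.
uint32_t add(uint32_t a, uint32_t b) { return a + b; }

// Server stub: unmarshal arguments, call the procedure, marshal the result.
std::string add_server_stub(const std::string &request) {
  size_t pos = 0;
  uint32_t a = unmarshal_u32(request, pos);
  uint32_t b = unmarshal_u32(request, pos);
  std::string reply;
  marshal_u32(reply, add(a, b));
  return reply;
}

// Client stub: looks like an ordinary function to the caller.
uint32_t add_client_stub(uint32_t a, uint32_t b) {
  std::string request;
  marshal_u32(request, a);
  marshal_u32(request, b);
  // A real stub would send request over the network and wait for the
  // reply; here the server stub is called directly (an assumption).
  std::string reply = add_server_stub(request);
  size_t pos = 0;
  return unmarshal_u32(reply, pos);
}
```

A caller writes add_client_stub(2, 3) and never sees the byte buffers.
The failure cases only appear once the direct call is replaced by a real
network.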

Key challenges:
   the semantics in the face of:
     communication failures (messages may be delayed, have variable
       round-trip times, or never arrive)
     machine failures
       did the server fail just before processing the request or just after?
   sometimes it is impossible to tell the difference between communication
     failures and machine failures

Thread is short for thread of control, a running program with its own
program counter, stack pointer, etc.  (For this class a process is
one or more threads executing in a single address space.)

Primary purpose: a way of running code concurrently within a single
process.  For example, in lab 1, if one client is waiting for lock a,
the server may want to process requests from other clients, in
particular ones for different locks.

The primary reasons to use concurrent programming with threads:
  exploit several processors to run an application faster
  hide long delays (e.g., while waiting for a disk, do something else
    on the processor)
  run long-running ops concurrently with short ones in user interfaces
  network servers and RPC

At a minimum, a thread interface must support:
  creating and managing threads
  ways of avoiding race conditions for updates to shared variables
    assume each thread runs on its own processor, sharing a memory
    statements that appear to be atomic might not be (e.g., x = x + 1)
  ways of coordinating different threads
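
A minimal sketch of the x = x + 1 problem, using pthreads. The statement
compiles to a load, an add, and a store, so two threads can read the
same old value and one update is lost. The names Counter, worker, and
run_counter are hypothetical; the mutex around the update is what makes
the final count predictable.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

struct Counter {
  pthread_mutex_t m;
  long x;
};

struct WorkerArg {
  Counter *c;
  int iters;
};

// Each thread adds 1 to the shared counter iters times.
void *worker(void *p) {
  WorkerArg *a = static_cast<WorkerArg *>(p);
  for (int i = 0; i < a->iters; i++) {
    pthread_mutex_lock(&a->c->m);    // without the lock, updates can be lost
    a->c->x = a->c->x + 1;           // load, add, store: not atomic by itself
    pthread_mutex_unlock(&a->c->m);
  }
  return nullptr;
}

// Run nthreads workers to completion and return the final counter value.
long run_counter(int nthreads, int iters) {
  Counter c;
  pthread_mutex_init(&c.m, nullptr);
  c.x = 0;
  WorkerArg arg = {&c, iters};
  std::vector<pthread_t> tids(nthreads);
  for (int i = 0; i < nthreads; i++) {
    int r = pthread_create(&tids[i], nullptr, worker, &arg);
    assert(r == 0);
    (void)r;
  }
  for (int i = 0; i < nthreads; i++)
    pthread_join(tids[i], nullptr);
  pthread_mutex_destroy(&c.m);
  return c.x;
}
```

With the lock in place, run_counter(4, 10000) returns 40000. Removing
the lock/unlock pair typically yields a smaller, varying result on a
multiprocessor.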

Pthread interface
  standard interface
  we use it in the labs
  Interface
    threads
      create
      join
    mutex
    condition variables
   More in next lecture.
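
The four pieces of the interface fit together roughly as in this sketch:
a one-slot mailbox where one thread waits for a value that another
thread deposits. Mailbox, mailbox_put, mailbox_get, and demo_mailbox are
hypothetical names chosen for illustration; only the pthread_* calls are
part of the real interface.

```cpp
#include <cassert>
#include <pthread.h>

// One-slot mailbox: a waiter blocks until another thread deposits a value.
struct Mailbox {
  pthread_mutex_t m;
  pthread_cond_t c;
  bool full;
  int value;
};

void mailbox_init(Mailbox *mb) {
  pthread_mutex_init(&mb->m, nullptr);
  pthread_cond_init(&mb->c, nullptr);
  mb->full = false;
  mb->value = 0;
}

void mailbox_put(Mailbox *mb, int v) {
  pthread_mutex_lock(&mb->m);
  mb->value = v;
  mb->full = true;
  pthread_cond_signal(&mb->c);          // wake a thread blocked in mailbox_get
  pthread_mutex_unlock(&mb->m);
}

int mailbox_get(Mailbox *mb) {
  pthread_mutex_lock(&mb->m);
  while (!mb->full)                     // loop: wakeups can be spurious
    pthread_cond_wait(&mb->c, &mb->m);  // atomically releases m and sleeps
  int v = mb->value;
  mb->full = false;
  pthread_mutex_unlock(&mb->m);
  return v;
}

void *producer(void *p) {
  mailbox_put(static_cast<Mailbox *>(p), 42);
  return nullptr;
}

// Create a producer thread, wait for its value, then join it.
int demo_mailbox() {
  Mailbox mb;
  mailbox_init(&mb);
  pthread_t tid;
  int r = pthread_create(&tid, nullptr, producer, &mb);
  assert(r == 0);
  (void)r;
  int v = mailbox_get(&mb);
  pthread_join(tid, nullptr);
  pthread_cond_destroy(&mb.c);
  pthread_mutex_destroy(&mb.m);
  return v;
}
```

The same wait-in-a-loop pattern applies whenever a thread must block
until some shared state changes, for example a caller waiting for a
reply to arrive.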

Pitfalls of multithreaded programming
  race condition; can you give me an example?
    may be difficult to reproduce
  deadlock; can you give me an example?
    better bug to have than a race; your program stops when it happens
  wrong lock granularity; can you give me an example?
  starvation; can you give me an example?
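
One possible answer to the deadlock question, as a sketch: two threads
transfer between the same two accounts in opposite directions. If each
locked "from" and then "to", each could hold one lock while waiting for
the other. Acquiring locks in a fixed global order (here, by address)
avoids that. Account, transfer, and demo_transfers are hypothetical
names.

```cpp
#include <cassert>
#include <functional>
#include <pthread.h>

struct Account {
  pthread_mutex_t m;
  long balance;
};

// Lock both accounts in a fixed order (by address) so that two transfers
// in opposite directions cannot each hold one lock and wait for the other.
void transfer(Account *from, Account *to, long amount) {
  assert(from != to);
  bool from_first = std::less<Account *>()(from, to);
  Account *first = from_first ? from : to;
  Account *second = from_first ? to : from;
  pthread_mutex_lock(&first->m);
  pthread_mutex_lock(&second->m);
  from->balance -= amount;
  to->balance += amount;
  pthread_mutex_unlock(&second->m);
  pthread_mutex_unlock(&first->m);
}

struct TransferArg {
  Account *from;
  Account *to;
  int n;
};

void *transfer_loop(void *p) {
  TransferArg *a = static_cast<TransferArg *>(p);
  for (int i = 0; i < a->n; i++)
    transfer(a->from, a->to, 1);
  return nullptr;
}

// Run n transfers a->b and n transfers b->a concurrently; return the total.
long demo_transfers(int n) {
  Account a, b;
  pthread_mutex_init(&a.m, nullptr);
  pthread_mutex_init(&b.m, nullptr);
  a.balance = 100;
  b.balance = 100;
  TransferArg ab = {&a, &b, n};
  TransferArg ba = {&b, &a, n};
  pthread_t t1, t2;
  pthread_create(&t1, nullptr, transfer_loop, &ab);
  pthread_create(&t2, nullptr, transfer_loop, &ba);
  pthread_join(t1, nullptr);
  pthread_join(t2, nullptr);
  long total = a.balance + b.balance;
  pthread_mutex_destroy(&a.m);
  pthread_mutex_destroy(&b.m);
  return total;
}
```

The per-account locks are also a granularity choice: a single global
lock would be simpler and deadlock-free, but would serialize transfers
between unrelated accounts.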

YFS RPC library
  how to use: lock_demo.cc, lock_client.cc, lock_smain.cc, and lock_server.cc
     lock_client.cc calls bind() first---why?  (we will see)
  rpc.h: the interface to the RPC system; let's look at it briefly.
     rpcc (+caller) and rpcs
     marshaling
  rpc.cc: the implementation.
    rpcc creates two threads
       clock_loop: retransmissions  (you will have to do this)
       chan_loop: waiting for replies on a channel
    rpcc::bind
       a remote procedure call to get a unique ID from the server
    rpcc::call1: an RPC!
       must bind first
       why lock m?
       why unlock m before call1 has completed?
       lock(&ca.m); why?
       lock(_timeout_lock); why?
       what is "pthread_cond_signal(&_time_cond)"?
       what is "pthread_cond_wait(&ca.c, &ca.m)"?
    rpcc::got_reply: a reply
       pthread_cond_broadcast(&ca->c)?  how many threads?
       what is the documented race about?
    rpcc::clock_loop
       what is going on here?
    rpcs::rpcs
      another thread, loop, which gets messages
    rpcs::loop
      a new thread for each request
      can we call pthread methods in RPC handlers? (yes, they are threads)
    rpcs::dispatch:
      why can it unlock before the end? is it safe?
      h->fn(args, fn) invokes the requested procedure
      checkduplicate_and_update: you must implement this
      switch statement: how can each case happen?
        what should checkduplicate_and_update do?
  chan.cc
    uses tcp; why?
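
As a starting point for thinking about duplicate detection on the
server, here is a minimal sketch of a reply cache keyed by (client ID,
request ID). It is not the YFS code: ReplyCache, check, record, and
dispatch_increment are hypothetical names, and the real
checkduplicate_and_update may have a different signature and more
states. The sketch also never forgets old entries, which a real server
must do.

```cpp
#include <cassert>
#include <map>
#include <pthread.h>
#include <string>
#include <utility>

enum DupState { NEW_REQUEST, IN_PROGRESS, DONE };

class ReplyCache {
  typedef std::pair<unsigned, unsigned> Key;  // (client id, request id)
  struct Entry {
    bool done;
    std::string reply;
    Entry() : done(false) {}
  };
  pthread_mutex_t m_;
  std::map<Key, Entry> table_;

 public:
  ReplyCache() { pthread_mutex_init(&m_, nullptr); }
  ~ReplyCache() { pthread_mutex_destroy(&m_); }

  // Classify a request. NEW_REQUEST marks it as in progress; DONE copies
  // the saved reply into *reply so it can be resent without re-executing.
  DupState check(unsigned client, unsigned xid, std::string *reply) {
    pthread_mutex_lock(&m_);
    DupState s;
    std::map<Key, Entry>::iterator it = table_.find(Key(client, xid));
    if (it == table_.end()) {
      table_[Key(client, xid)] = Entry();
      s = NEW_REQUEST;
    } else if (!it->second.done) {
      s = IN_PROGRESS;
    } else {
      *reply = it->second.reply;
      s = DONE;
    }
    pthread_mutex_unlock(&m_);
    return s;
  }

  // Remember the reply for a request that has finished executing.
  void record(unsigned client, unsigned xid, const std::string &reply) {
    pthread_mutex_lock(&m_);
    Entry &e = table_[Key(client, xid)];
    e.done = true;
    e.reply = reply;
    pthread_mutex_unlock(&m_);
  }
};

// Dispatch for a non-idempotent "increment" procedure (illustrative only).
std::string dispatch_increment(ReplyCache &cache, int &counter,
                               unsigned client, unsigned xid) {
  std::string reply;
  switch (cache.check(client, xid, &reply)) {
    case NEW_REQUEST:
      counter++;  // execute the procedure exactly once
      reply = std::to_string(counter);
      cache.record(client, xid, reply);
      return reply;
    case DONE:
      return reply;  // retransmission: resend the saved reply
    case IN_PROGRESS:
      return "";     // still executing elsewhere: drop the duplicate
  }
  return "";
}
```

Calling dispatch_increment twice with the same (client, xid) increments
the counter once and returns the same reply both times.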

Possible RPC semantics:
  At-least-once  (last year's 6.824 lab)
  At-most-once (this year in 6.824 and RPC paper)
    how does the paper achieve at-most-once?

At-least-once versus at-most-once?
  let's take an example: acquiring a lock
    if client and server stay up, client receives lock
    if client fails, it may have the lock or not (server needs a plan!)
    if server fails, client may have lock or not
      at-least-once: client keeps trying
      at-most-once:  client will receive an exception
    what does a client do in the case of an exception?
      need to implement some application-specific protocol
        ask server, do I have the lock?
        server needs to have a plan for remembering state across reboots
        e.g., store locks on disk.
    at-least-once (if we never give up)
      clients keep trying.  server may run procedure several times
      server must use application state to handle duplicates
        if requests are not idempotent
        but difficult to make all requests idempotent
      e.g., server could store on disk who has lock and req id
        check table for each request
        even if server fails and reboots, we get correct semantics
  What is right?
    depends where RPC is used.
       simple applications: 
         at-most-once is cool (more like procedure calls)
       more sophisticated applications: 
         need an application-level plan in both cases
         not clear at-most-once gives you a leg up
  => Handling machine failures makes RPC different from procedure calls
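
The at-least-once case for acquiring a lock can be sketched as follows.
The client resends the same request until a reply gets through, so the
server may run acquire several times; the server uses application state
(who holds the lock, and under which request ID) to recognize the
duplicates. LockTable, acquire, and acquire_with_retries are
hypothetical names, the lossy network is simulated by a counter, and the
table is kept in memory rather than on disk. The sketch also ignores
delayed duplicates that arrive after a release.

```cpp
#include <cassert>
#include <map>
#include <string>

enum Status { GRANTED, BUSY };

class LockTable {
  struct Holder {
    int client;
    int req_id;
  };
  std::map<std::string, Holder> held_;  // a real server might keep this on disk

 public:
  // Safe to run more than once for the same (client, req_id).
  Status acquire(const std::string &lock, int client, int req_id) {
    std::map<std::string, Holder>::iterator it = held_.find(lock);
    if (it == held_.end()) {
      Holder h = {client, req_id};
      held_[lock] = h;
      return GRANTED;
    }
    if (it->second.client == client && it->second.req_id == req_id)
      return GRANTED;  // duplicate of the request that was already granted
    return BUSY;
  }
};

// Simulated at-least-once client: the first lost_replies replies are
// dropped by the "network", so the client times out and resends.
// Returns the number of times the request was sent.
int acquire_with_retries(LockTable &server, const std::string &lock,
                         int client, int req_id, int lost_replies,
                         Status *out) {
  int sends = 0;
  for (;;) {
    sends++;
    Status s = server.acquire(lock, client, req_id);  // request arrives
    if (sends > lost_replies) {                       // reply gets through
      *out = s;
      return sends;
    }
    // Otherwise the reply was lost; resend the same request.
  }
}
```

With two lost replies the server executes acquire three times, yet the
client ends up holding the lock exactly once and a second client still
sees BUSY.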

YFS RPC versus RPC in paper
  Both at-most-once
    Using the same technique (bind and exchange a nonce)
  Protocols differ
    YFS runs on reliable transport