6.824 Lecture 4: Distributed Programming: Distributed objects

long-delayed rpc code walk-through
  think about questions you might have about lab1
  call1:
    msg: xid proc cn sn xid_rep args...
    xid_rep_window_.front()?
    update_xid_rep(xid)?
  got_pdu:
    lock order, deadlock...
  dispatch:
    why the nonce check? what prob does this solve?
  when could INPROGRESS occur?
  is FORGOTTEN possible? how?
    long delayed request
    (rebooted server is dealt w/ separately)
  what is the client nonce for?
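
A rough sketch of the server-side duplicate check behind the questions above.
It is written in Java for illustration and is not the lab's C++ code; the class
and method names (ReplyWindow, check, saveReply) are hypothetical. It assumes
xid_rep means "the client has received replies for every xid <= xid_rep", so
the server may drop those saved replies.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: hypothetical names, not the lab's actual data structures.
class ReplyWindow {
    enum State { NEW, INPROGRESS, DONE, FORGOTTEN }

    // One of these per client nonce, so a restarted client (new nonce)
    // never collides with xids used by its previous incarnation.
    private static final class ClientState {
        long forgottenUpTo = 0;  // replies for all xids <= this were dropped
        final Map<Long, String> replies = new HashMap<>();  // null value = in progress
    }

    private final Map<Long, ClientState> clients = new HashMap<>();

    // Classify an incoming request, then slide the window using xidRep.
    synchronized State check(long clientNonce, long xid, long xidRep) {
        ClientState cs = clients.computeIfAbsent(clientNonce, n -> new ClientState());
        State result;
        if (xid <= cs.forgottenUpTo) {
            result = State.FORGOTTEN;      // e.g. a long-delayed duplicate
        } else if (!cs.replies.containsKey(xid)) {
            cs.replies.put(xid, null);     // remember that the handler started
            result = State.NEW;
        } else if (cs.replies.get(xid) == null) {
            result = State.INPROGRESS;     // retransmission while handler runs
        } else {
            result = State.DONE;           // caller should resend the saved reply
        }
        if (xidRep > cs.forgottenUpTo) {   // client acked everything <= xidRep
            cs.replies.keySet().removeIf(x -> x <= xidRep);
            cs.forgottenUpTo = xidRep;
        }
        return result;
    }

    // Called when the handler finishes, so later duplicates see DONE.
    synchronized void saveReply(long clientNonce, long xid, String reply) {
        ClientState cs = clients.get(clientNonce);
        if (cs != null && cs.replies.containsKey(xid)) cs.replies.put(xid, reply);
    }
}
```

In this sketch FORGOTTEN can only arise when a request arrives after the client
has already acknowledged its reply, which is the long-delayed-request case.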

any questions about the lab due tomorrow?

Since writing a distributed application poses a number of additional
challenges over sequential programming, it would be nice if there were
ways to simplify it. Today we explore one such design for distributed
programming: distributed objects, which target client/server
applications.

Outline:
  Why is RPC not sufficient?
  Distributed objects goals
  Java RMI

Why is RPC not sufficient? Let's look at YFS RPC (admittedly a bit
primitive, but nevertheless):
  programmer has to write stubs
  few data structures can be passed to the client or server
    for example, can you pass a C++ object?  
    a pointer and dereference it remotely?
  programmer must design a scheme for naming remote objects
  server must map names to objects
    locks: lockid_t
    extents: extentid_t

how to get better object support?
  two basic approaches:
  1. pull object to caller
  2. push call to object
  which is the best plan?

distributed object systems
  e.g. Network Objects, CORBA, Java RMI
  "remote object references"
    real object is on server
    clients send method calls to server
  object refs as RPC return values
    c = cartserver.create()
  RPC for object methods
    c.add(item)
  pass remote object to any server
    warehouse.ship(c)
  automatic location of object's server
    warehouse can do c.list()
  distributed GC

first a simple call/return
  o = ???;
  o.fn("hello");
  which server to send to?
  what object on server?
  what about "hello"?
  what does RPC message contain?
  how does RMI s/w on server gain control? thread...
  how does server find the real object?
  where does server-side dispatch fn come from?
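
One possible answer to these questions, as a toy in-process sketch. All names
(Greeter, CallMsg, ObjServer, Stubs) are hypothetical. The "network" is a direct
method call, the message holds (object ID, method name, args), the server finds
the real object in a table, and the dispatch function is plain reflection. The
stub is built with java.lang.reflect.Proxy. This is not how any particular RMI
implementation is written.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

interface Greeter { String fn(String s); }   // hypothetical remote interface

// What the RPC message contains in this sketch.
final class CallMsg {
    final long objId; final String method; final Object[] args;
    CallMsg(long objId, String method, Object[] args) {
        this.objId = objId; this.method = method; this.args = args;
    }
}

class ObjServer {
    private static final class Entry {
        final Class<?> iface; final Object target;
        Entry(Class<?> iface, Object target) { this.iface = iface; this.target = target; }
    }
    private final Map<Long, Entry> objects = new HashMap<>();  // objID -> real object
    private long nextId = 1;

    synchronized long export(Class<?> iface, Object target) {
        long id = nextId++;
        objects.put(id, new Entry(iface, target));
        return id;
    }

    // Server-side dispatch: look up the object, then the method, then call it.
    Object dispatch(CallMsg m) throws Exception {
        Entry e;
        synchronized (this) { e = objects.get(m.objId); }
        if (e == null) throw new IllegalStateException("no such object: " + m.objId);
        for (Method meth : e.iface.getMethods()) {
            if (meth.getName().equals(m.method) && meth.getParameterCount() == m.args.length) {
                return meth.invoke(e.target, m.args);
            }
        }
        throw new NoSuchMethodException(m.method);
    }
}

class Stubs {
    // The stub has the interface's type and contains only (server, objId).
    static <T> T make(Class<T> iface, ObjServer server, long objId) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface },
            (p, method, args) -> {
                if (method.getDeclaringClass() == Object.class) {
                    // equals/hashCode/toString are answered on the local stub.
                    switch (method.getName()) {
                        case "equals": return p == args[0];
                        case "hashCode": return System.identityHashCode(p);
                        default: return iface.getSimpleName() + "Stub#" + objId;
                    }
                }
                Object[] a = (args == null) ? new Object[0] : args;
                return server.dispatch(new CallMsg(objId, method.getName(), a));
            });
        return iface.cast(proxy);
    }
}
```

Usage, under the same assumptions: `long id = server.export(Greeter.class, impl);`
then `Greeter o = Stubs.make(Greeter.class, server, id); o.fn("hello");`. Here
"hello" travels by value inside the message, not as a remote reference.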

what does a stub object look like?
  type?
  contents?
  where did it come from?

is "hello" sent as a remote object ref?

how about passing an object as an argument?
  o1 = ???;
  o2 = ???;
  o1.fn(o2);
  what must o2 look like in the RPC message?
    server host, object ID
  what if o1's server already knows about o2?
    how do we know if two objects are the same?
    must have a table mapping object ID to ptr to o2
  what if o1's server does not know about o2?
    where does it get stub type, implementation?
    can stub stuff be generated purely by client?

there are probably type IDs, so client can re-use stub code
  an object ID must contain type ID, or an RPC to fetch it
  clients and servers must have tables mapping type ID to stub code
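
A sketch of how a host might turn a reference arriving in an RPC message into
a local pointer. The names (WireRef, RefTable) and the use of strings for host
and type IDs are assumptions for illustration. The point is the three lookups:
is the object mine, do I already have a stub for it, and do I have stub code
for its type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical wire form of a remote reference: server host, object ID, type ID.
final class WireRef {
    final String host; final long objId; final String typeId;
    WireRef(String host, long objId, String typeId) {
        this.host = host; this.objId = objId; this.typeId = typeId;
    }
    // Two refs name the same object iff host and object ID match.
    @Override public boolean equals(Object o) {
        if (!(o instanceof WireRef)) return false;
        WireRef r = (WireRef) o;
        return host.equals(r.host) && objId == r.objId;
    }
    @Override public int hashCode() { return Objects.hash(host, objId); }
}

class RefTable {
    private final String myHost;
    private final Map<Long, Object> localObjects = new HashMap<>();   // objects I serve
    private final Map<WireRef, Object> stubs = new HashMap<>();       // stubs I already hold
    private final Map<String, Function<WireRef, Object>> stubCode = new HashMap<>();

    RefTable(String myHost) { this.myHost = myHost; }

    synchronized WireRef export(long objId, String typeId, Object real) {
        localObjects.put(objId, real);
        return new WireRef(myHost, objId, typeId);
    }

    synchronized void registerType(String typeId, Function<WireRef, Object> factory) {
        stubCode.put(typeId, factory);
    }

    synchronized Object unmarshal(WireRef ref) {
        if (ref.host.equals(myHost)) {             // the real object lives here
            Object real = localObjects.get(ref.objId);
            if (real == null) throw new IllegalStateException("stale reference: " + ref.objId);
            return real;
        }
        Object stub = stubs.get(ref);              // already known: reuse the same stub
        if (stub == null) {
            Function<WireRef, Object> f = stubCode.get(ref.typeId);
            if (f == null) throw new IllegalStateException("no stub code for type " + ref.typeId);
            stub = f.apply(ref);
            stubs.put(ref, stub);
        }
        return stub;
    }
}
```

Returning the same stub for the same (host, object ID) is what lets the
receiver compare two references with plain pointer equality.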

client table?
  hostID/objID, local ref, type, ref count (really in stub obj)

host table?
  objID, local ref, client count

when can a server free an object?
  only when no client has a live reference
  server must somehow learn when new client gets a reference
  and when client local ref count drops to zero
  so clients must send RPCs to server to note first/last knowledge
  what if C1 passes to C2, C1 sends de-ref RPC before C2 sends ref?
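
The race in the last question, shown with a toy server-side table. RefServer and
its methods are hypothetical; the ref/deref messages correspond loosely to what
RMI's distributed GC calls dirty and clean. The second half of the demo shows
one possible fix, assumed here rather than taken from any specific system: C1
keeps its reference until C2 has registered.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RefServer {
    // objID -> clients known to hold a reference (client count = set size)
    private final Map<Long, Set<String>> holders = new HashMap<>();

    synchronized void create(long objId, String firstClient) {
        Set<String> s = new HashSet<>();
        s.add(firstClient);
        holders.put(objId, s);
    }

    // "I now hold a reference." Returns false if the object is already gone.
    synchronized boolean ref(long objId, String client) {
        Set<String> s = holders.get(objId);
        if (s == null) return false;
        s.add(client);
        return true;
    }

    // "My local ref count dropped to zero." Frees the object when nobody is left.
    synchronized void deref(long objId, String client) {
        Set<String> s = holders.get(objId);
        if (s == null) return;
        s.remove(client);
        if (s.isEmpty()) holders.remove(objId);
    }

    synchronized boolean isLive(long objId) { return holders.containsKey(objId); }
}

class RefRaceDemo {
    public static void main(String[] args) {
        RefServer bad = new RefServer();
        bad.create(1, "C1");
        // C1 hands the reference to C2, then drops its own copy.
        bad.deref(1, "C1");                      // C1's de-ref arrives first
        System.out.println(bad.ref(1, "C2"));    // C2's ref arrives after the free

        RefServer ok = new RefServer();
        ok.create(1, "C1");
        ok.ref(1, "C2");                         // C2 registers first
        ok.deref(1, "C1");                       // only then does C1 let go
        System.out.println(ok.isLive(1));
    }
}
```

The fix costs an extra acknowledgment from C2 to C1 before C1 may drop the
reference.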

what if a client crashes?
  will server ever be able to free the object?
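
One common answer is leases: a client's reference counts only while the client
keeps renewing it, so a crashed client eventually stops counting. Java RMI's
distributed GC is lease-based in this sense. The sketch below is a toy version
with hypothetical names (LeaseTable, renew, canFree) and an explicit clock
argument so it can be tested; it is not RMI's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Lease table for ONE remote object; a real server would keep one per object.
class LeaseTable {
    private final long leaseMillis;
    private final Map<String, Long> expiry = new HashMap<>();  // client -> expiry time

    LeaseTable(long leaseMillis) { this.leaseMillis = leaseMillis; }

    // Client says "I still hold a reference"; must be repeated before expiry.
    synchronized void renew(String client, long nowMillis) {
        expiry.put(client, nowMillis + leaseMillis);
    }

    // Client explicitly drops its reference.
    synchronized void release(String client) {
        expiry.remove(client);
    }

    // Drop expired leases; the object may be freed when no lease remains.
    synchronized boolean canFree(long nowMillis) {
        expiry.values().removeIf(t -> t <= nowMillis);
        return expiry.isEmpty();
    }
}
```

The cost: a client that is alive but partitioned for longer than the lease
can come back to find its reference no longer valid.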

what if a server crashes?
  will client object refs work after restart?
  this is something we'll want for YFS!

RMI restrictions
  equals, hash, locks defined on local ref, not obj content
  no fields
  all remote, no way to do some local
  hard choice between copying whole object graph vs remote

what can client do with RMI's remote method exception?
  when will RMI throw the exception?
    network down for a while, timeout
    server down for a while, timeout
    obj ref not valid (server GC'd or rebooted?)
    let's assume timeout
  r/o queries:
    display error to user
    retry at different server, e.g. DNS
  write queries:
    problem: we don't know if server executed
    amazon order: display error to user
      user can go back and check list of orders
      or look at credit card statement
      log! audit!
  what if no user?
    what exactly is the problem in this case?
    usually multi-step operation
      e.g. debit+credit
    each step might be an RPC
      an RPC fails in the middle
    use distributed transactions (two-phase commit)
    or some kind of compensation
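
A sketch of the compensation option for the debit+credit example. Bank,
MemBank, RpcFailed and Transfer are hypothetical names, and the "RPC failure"
is simulated by a flag. Note the optimistic assumption baked in: a failed call
is treated as "did not execute", which is exactly what a timeout cannot tell
us.

```java
import java.util.HashMap;
import java.util.Map;

// Stands in for RMI's remote exception in this sketch.
class RpcFailed extends Exception {
    RpcFailed(String msg) { super(msg); }
}

interface Bank {
    void debit(String acct, int amt) throws RpcFailed;
    void credit(String acct, int amt) throws RpcFailed;
}

// In-memory bank; "down" simulates an unreachable server.
class MemBank implements Bank {
    final Map<String, Integer> bal = new HashMap<>();
    boolean down = false;

    public void debit(String acct, int amt) throws RpcFailed {
        if (down) throw new RpcFailed("timeout");
        bal.merge(acct, -amt, Integer::sum);
    }

    public void credit(String acct, int amt) throws RpcFailed {
        if (down) throw new RpcFailed("timeout");
        bal.merge(acct, amt, Integer::sum);
    }
}

class Transfer {
    // Returns true if both steps ran, false if step 1 was undone.
    static boolean transfer(Bank from, Bank to, String a, String b, int amt) throws RpcFailed {
        from.debit(a, amt);            // step 1: if this throws, we assume nothing happened
        try {
            to.credit(b, amt);         // step 2: a separate RPC that may fail
            return true;
        } catch (RpcFailed e) {
            // Compensate step 1. If the credit actually executed and only the
            // reply was lost, this refund duplicates money; if the refund RPC
            // itself fails, the exception propagates and money is missing.
            from.credit(a, amt);
            return false;
        }
    }
}
```

Those two comment cases are why compensation alone is fragile, and why the
notes list distributed transactions as the other option.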

are network objects useful?
  could YFS use them?
  automatic stub generation would be good
    it's easy for client and server to disagree in our labs
  the o.method() syntax is nice but not amazing
    little better than method(oid)
  passing refs among hosts might be very cool
    automatically track server location
    but often you know what the relevant server is
    e.g. you know lid is a lock, easy enough to pass to lock srvr
  hard to guess about GC
    local GC is certainly hugely convenient
    but remote objects are often persistent
    files, shopping carts -- probably really live in a DB

retrospective on RPC
  more convenient than direct socket programming
  marshaling, stubs, appearance of function call is nice
    but not life-changing
  but *not* like local call
    failure
    no shared memory
    slow
  big vision basically failed