6.824 2012 Lecture 4: RPC wrap-up; Plan 9

How does the lab RPC library use threads?
  client classes:
    rpcc (one per server connection)
    connection (one per rpcc, may come and go)
    PollMgr (one per process, shared by all connections)
  client threads:
    application threads wait for reply in call1
    PollMgr thread
      up-call to connection when readable
      up-call to rpcc::got_pdu when whole msg
      rpcc::got_pdu wakes up sleeping app thread
    note: no per-rpcc or per-connection threads
  why not have app thread directly read the reply from the TCP conn?
    i.e. why the up-calls?
  server classes:
    rpcs (one per service)
    tcpsconn (one per rpcs)
    connection (many per rpcs)
    ThrPool (one per rpcs)
    PollMgr (one per process)
  server threads:
    tcpsconn, accept()s + makes connections
    PollMgr thread
      up-call to connection for incoming request
      up-call to rpcs::got_pdu
      enq() into ThrPool's fifo
    ThrPool's 10 workers
      deq() from fifo
      call rpcs::dispatch with msg
  why ThrPool? why not fire up a new thread per RPC request?

Let's run through abbreviated RPC code in handout
  call1:
    why locks m_ at start/end, ca.m in middle?
    wait(ca.c)
  got_pdu:
    broadcast(ca.c)
    lock order, deadlock...
  dispatch:
    why the nonce check? what prob does this solve?
    when could INPROGRESS occur?
    when could FORGOTTEN occur? long delayed request
    what is the client nonce for?

RPC and mutexes may produce distributed deadlock
  suppose server s1's handler does this:
    lock m
    rpc call s2
    unlock m
  and server s2's handler does this:
    rpc call s1
  ThrPool makes nested RPCs dangerous even w/o mutexes
    imagine if pool was only one worker
  you will run into this in Lab 4
    lock server sends revoke RPCs back to clients
  lesson: don't even call RPCs from handlers!
    have handler queue work or change state, then return
    a background thread should send the RPC
    (see the second sketch below, after the at-most-once pseudocode)

checkduplicate_and_update(cn, xid, xid_rep, &b, &sz)
  must keep s[cn/xid] -- nil, INPROGRESS, DONE
    if DONE, also b+sz
  if s[cn/xid] == INPROGRESS
    return INPROGRESS
  if s[cn/xid] == DONE
    return DONE, return b and sz
  if xid <= previous xid_rep
    return FORGOTTEN
  else
    s[cn/xid] = INPROGRESS
    return NEW
  must also trim s[]
    discard s[cn/xid] if xid <= xid_rep and free buf

what must add_reply(cn, xid, b, sz) do?
  checkduplicate_and_update already set s[cn/xid] = INPROGRESS
  s[cn/xid] = DONE
  remember b and sz
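A minimal C++ sketch of this at-most-once bookkeeping. All names here
(rpcs_sketch, reply_t, s_, max_rep_) are hypothetical, not the lab's exact
code, and the rpcs mutex that must protect the table is elided:

  #include <map>
  #include <cstdlib>

  enum rpcstate_t { NEW, INPROGRESS, DONE, FORGOTTEN };

  struct reply_t {
      rpcstate_t state;  // INPROGRESS or DONE
      char *buf;         // marshaled reply, kept only once DONE
      int sz;
  };

  class rpcs_sketch {
      // s_[cn][xid] -- per-client-nonce window of request states
      std::map<unsigned int, std::map<unsigned int, reply_t> > s_;
      std::map<unsigned int, unsigned int> max_rep_; // highest xid_rep per cn
  public:
      rpcstate_t checkduplicate_and_update(unsigned int cn, unsigned int xid,
                                           unsigned int xid_rep,
                                           char **b, int *sz) {
          std::map<unsigned int, reply_t> &w = s_[cn];
          std::map<unsigned int, reply_t>::iterator it = w.find(xid);
          if (it != w.end()) {
              if (it->second.state == DONE) { // retransmission of a done RPC:
                  *b = it->second.buf;        // hand back the remembered reply
                  *sz = it->second.sz;
                  return DONE;
              }
              return INPROGRESS;              // still executing; drop request
          }
          if (xid <= max_rep_[cn])            // reply already trimmed away --
              return FORGOTTEN;               // a too-long-delayed request
          reply_t r = { INPROGRESS, 0, 0 };
          w[xid] = r;
          // trim: xid_rep says the client has replies for all xid <= xid_rep
          if (xid_rep > max_rep_[cn]) {
              max_rep_[cn] = xid_rep;
              for (it = w.begin(); it != w.end(); ) {
                  if (it->first <= xid_rep && it->second.state == DONE) {
                      free(it->second.buf);
                      w.erase(it++);
                  } else {
                      ++it;
                  }
              }
          }
          return NEW;
      }

      void add_reply(unsigned int cn, unsigned int xid, char *b, int sz) {
          reply_t &r = s_[cn][xid]; // INPROGRESS per checkduplicate_and_update
          r.state = DONE;
          r.buf = b;                // remember b and sz for retransmissions
          r.sz = sz;
      }
  };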
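And, returning to the distributed-deadlock lesson above, a sketch of the
queue-plus-background-thread pattern for Lab 4's revokes; revoker,
revoke_req, and send_revoke_rpc are made-up names for illustration:

  #include <pthread.h>
  #include <list>

  struct revoke_req { int clt; unsigned long long lid; };

  class revoker {
      pthread_mutex_t m_;
      pthread_cond_t c_;
      std::list<revoke_req> q_;
  public:
      revoker() {
          pthread_mutex_init(&m_, 0);
          pthread_cond_init(&c_, 0);
      }
      // called from the RPC handler: record the work and return at once
      void enqueue(int clt, unsigned long long lid) {
          pthread_mutex_lock(&m_);
          revoke_req r = { clt, lid };
          q_.push_back(r);
          pthread_cond_signal(&c_);
          pthread_mutex_unlock(&m_);
      }
      // runs forever in its own thread; the only place that sends RPCs
      void loop() {
          while (1) {
              pthread_mutex_lock(&m_);
              while (q_.empty())
                  pthread_cond_wait(&c_, &m_);
              revoke_req r = q_.front();
              q_.pop_front();
              pthread_mutex_unlock(&m_);  // drop the mutex *before* the RPC
              // send_revoke_rpc(r.clt, r.lid); // may block; holds no locks,
              //                                // occupies no ThrPool worker
          }
      }
  };

With this shape the handler never blocks on the network, so a full ThrPool
or a mutex held by a handler can't deadlock against the remote server.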
any final questions about tomorrow's lab?

***

Since writing a distributed application presents a number of additional
challenges over sequential programming, it would be nice if there were ways
to simplify it. First, better RPC (distributed objects); then a distributed
O/S (Plan 9); next time, MapReduce.

Why is RPC not sufficient?
  Let's look at YFS RPC (admittedly a bit primitive, but nevertheless):
  programmer has to write stubs
  few data structures can be passed to the client or server
    for example, can you pass a C++ object?
    a pointer, and dereference it remotely?
  programmer must design a scheme for naming remote objects
    server must map names to objects
    locks: lockid_t
    extents: extentid_t

better object support: distributed object systems
  e.g. Network Objects, CORBA, Java RMI
  "remote object references"
    real object is on server
    client code written as if referencing local object
    method calls are actually sent to server
  object refs as RPC return values
    c = cartserver.create()
  RPC for object methods
    c.add(item)
  pass remote object to any server
    warehouse.ship(c)
  automatic location of object's server
    warehouse can do c.list()
  distributed GC

first a simple call/return (loosely based on Java RMI)
  server has some ordinary object o1
    sends it to the client
  client: o.fn("hello")
    what is o on the client?
    which server to send to? what object on server?
    what about "hello"?
    what does RPC message contain?
    how does server find the real object?
    (the sketch at the end of this section makes these concrete)

how about passing an object as an argument? o1.fn(o2)
  what must o2 look like in the RPC message?
    server host, object ID
  o1's server needs to cook up a stub object for o2
    where does it get stub type, implementation?

when can a server free an object?
  only when no client has a live reference
  server must somehow learn when a new client gets a reference
    and when a client's local ref count drops to zero
  so clients must send RPCs to server to note first/last knowledge

are network objects useful? could YFS use them?
  automatic stub generation would be good
    it's easy for client and server to disagree in our labs
  passing refs among hosts might be very cool
    automatically track server location
    useful if it's not obvious what the right server is
    e.g. if > 1 lock_server
  hard to guess about GC
    local GC certainly hugely convenient
    but remote objects are often persistent
      files, shopping carts -- probably really live in a DB
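A rough C++ sketch of what a remote reference and a client-side stub might
look like, including the first-reference/last-reference GC calls. Every name
here is invented; Network Objects and Java RMI differ in detail:

  #include <string>

  // what an object reference looks like inside an RPC message:
  // enough to locate the server, and the object on that server
  struct remote_ref {
      std::string host;          // where the real object lives
      int port;
      unsigned long long obj_id; // server's table maps obj_id -> real object
  };

  class cart_stub {
      remote_ref ref_;
  public:
      cart_stub(const remote_ref &r) : ref_(r) {
          // distributed GC: tell the server this client now holds a live
          // reference, so it won't free the real object
          // rpc(ref_, GC_FIRST_REF, my_client_id);
      }
      ~cart_stub() {
          // local ref count dropped to zero: tell the server, so it can
          // free the object once no client holds a reference
          // rpc(ref_, GC_LAST_REF, my_client_id);
      }
      // looks like a local method call, but marshals the argument and
      // sends an RPC naming ref_.obj_id and the method "add"
      void add(const std::string &item) {
          // rpc(ref_, CART_ADD, item);
      }
  };

Passing o2 as an argument means marshaling a remote_ref for it; the
receiving server then wraps that ref in a stub like this, which is why it
needs the stub's type and implementation from somewhere.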
retrospective on RPC
  more convenient than direct socket programming
    marshaling, stubs, appearance of function call is nice
    but not life-changing
  but *not* like local call -- not transparent
    failure
    no shared memory
    slow
    introduces concurrency
  most of the value is in avoiding network programming mess
    *not* in the analogy to procedure calls

****

Why are we reading the Plan 9 paper?
  we're interested in infrastructure for distributed computing
    just talked about RPC -- pretty low level
    today and next few lectures will be about higher-level infrastructures
  this is about architecture and research style, not techniques
  idea: distributed operating system
    analogy: single-machine o/s is a great platform for apps
      takes care of scheduling, storage, mem mgt, security, &c
      universal platform for phone/laptop/server/supercomputer
    why not a distributed o/s as infrastructure for distributed systems?
  many projects in 80s/90s: Plan 9, Amoeba, LOCUS, V, ...
    common approach: pick a unifying abstraction
    use it to unify remote and local interaction -- transparency
    examples: IPC, DSM, RMI, files (Plan 9)

Who are the authors?
  same Bell Labs group that invented UNIX in the 1970s
  values:
    simplicity
    tools that work together (pipes, grep, ASCII files)
    file-centric (/dev, stdin)
    use what you make -- but don't solve probs you don't have
  they liked the single-machine time-sharing environment
    fostered cooperation, community
  unhappy with 80s isolated PC/workstation model

Big goals?
  computing environment for programmers, researchers
  use modern workstation/server/network hardware
  regain collaborative feel of single time-shared machine
  avoid per-workstation maintenance / config

Sacrifices?
  willing to take years, little commercial/publishing pressure
  willing to tear up existing s/w if needed to get the *right* design
    this is a big deal in practice -- POSIX compatibility is a bummer
  willing to pool money to buy shared infrastructure
  willing to all play the same game (not e.g. everyone chooses own O/S)

What did the Plan 9 system look like? [diagram]
  lots of cheap "terminals"
    cpu/mem/keyboard/display/net
    maybe no disk
    standard Plan 9 software
    only for interactive s/w (editors), not e.g. compiler
    sit down at any -- log in -- looks the same!
  LAN
  expensive compute servers
  file server
  (not much new at diagram level)

The new part is the O/S design
  Unifying design principles:
    Everything is a file
    One protocol
    Private, malleable name spaces

Everything is a file
  devices (just like UNIX)
  network (write "connect 18.26.4.9!23" to /net/tcp/0/ctl -- see sketch below)
  graphics windows
  /dev/cons, /dev/mouse
  /proc/123/mem (ps, debuggers)
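A hedged sketch of what dialing TCP looks like when the network is a file
tree, based on Plan 9's clone/ctl/data layout; written with POSIX-style
calls for readability, error handling elided:

  #include <fcntl.h>
  #include <unistd.h>
  #include <cstdio>
  #include <cstring>

  int dial() {
      // opening the clone file allocates a fresh connection directory
      // (/net/tcp/0, /net/tcp/1, ...) and yields its ctl file
      int ctl = open("/net/tcp/clone", O_RDWR);

      char num[16];
      int n = read(ctl, num, sizeof(num) - 1); // reads the directory number
      num[n] = '\0';
      for (int i = 0; i < n; i++)              // keep just the digits
          if (num[i] < '0' || num[i] > '9') { num[i] = '\0'; break; }

      // ask the TCP stack to connect, by writing a string to ctl
      const char *cmd = "connect 18.26.4.9!23";
      write(ctl, cmd, strlen(cmd));

      // the connection's bytes then flow through the data file
      char path[64];
      snprintf(path, sizeof(path), "/net/tcp/%s/data", num);
      return open(path, O_RDWR);  // read()/write() like any other file
  }

Note there is no socket() anywhere: any tool that can open and write files
can use the network, and mounting another machine's /net imports its
network stack.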
Why is "everything a file" a good idea?
  one set of utilities (ls, cat, mount) manages lots of resources
    vs per-subsystem system calls, protocols, formats, &c
  less duplication of effort
    each kind of thing doesn't need its own naming, protection, &c
  analogous to UNIX idea of ASCII formats, stdin/out
    allows general tools like grep, awk, and combinations (e.g. pipes)
  easy to replace / interpose

Why might "everything a file" be a *bad* idea?

Only one protocol -- 9P
  protocol needed to access network file servers &c
    system call -> 9P -> network -> user-level server
    (next lab's use of fuse is a bit like this)
  9P is file-oriented: open, read, write, &c
  can mount a 9P server anywhere in local name space, e.g. /foo
  all servers speak 9P -- files, windows, names, network, ftp

Why is "only 9P" a good idea?
  need some protocol to make "everything a file" work across machines
  wasteful for file server, graphics server, &c to each have
    different protocols
  9P replaces a host of specialized protocols
    since all services appear as files, all can be accessed remotely via 9P
    no need for special per-service protocols
    example: no need for a special ssh/telnet protocol
      instead of ssh from h1 to h2, mount h1's /dev/cons on h2's
      /dev/cons, fire up a shell on h2

Why might "only 9P" be a *bad* idea?

Private, malleable name spaces
  easy for processes to "mount" directories, files
  intention is that users customize to make it easy to find their resources
  conventions prevent chaos
    /dev/cons (my terminal)
    /dev/mouse
    /bin/date (executable for my architecture)

Why are customizable namespaces a good idea?
  remote exec on compute server can reproduce entire environment
    all resources via files + mimic local names
  re-create someone else's environment, for debugging
  different s/w versions, perhaps from backup snapshots

Why might a customizable per-user namespace be a *bad* idea?
  i.e. why not do it like UNIX -- all users see the same file namespace?

The three principles work together
  Everything is a file + share with 9P => can share everything
    e.g. mount cpu server's /proc locally, debug remote program
  Remote execution can duplicate local environment
    cpu command
    mounted files, devices, windows, &c
    contrast to ssh, where remote machine may be very different

Other Plan 9 ideas (some of which other systems now have)
  /proc (really from UNIX 8th edition)
  UTF-8
  backups via snapshot to WORM
  rfork

What's missing?
  as user computing environment?
    good support for disconnected laptops? iPads?
    would we want it in preference to Athena?
  as an infrastructure for building distributed systems?
    i.e. if you are Google or Facebook
    no story for fault tolerance?
    no story for big computation?
    no story for scalable storage, services?

(based on notes by Russ Cox)