6.824 2013 Lecture 16: Transparency, Network Objects, Plan 9

today:
  transparency
  remote objects
  Plan 9

we've seen many results of "the hard part of distrib computing is Z,
  let's design a technique/system to simplify Z"
examples:
  RPC -- communication
  FDS/Spanner/&c -- storage
  Paxos -- replication, fault tolerance
  Argus -- atomic transactions
  DSM -- application programming
common thread: transparency
  make remote look the same as local
  e.g. FDS makes data available on any client, regardless of location
some systems target transparency directly
  often via naming -- can hide local/remote inside names
  example: network objects
  example: distributed operating systems, Plan 9

Network Objects

Why is standard RPC not very transparent?
  arguments are passed by copying
    so it's not the same object!
  example:
    p = printers.find(printname)
    f = fs.open(filename)
    q = spooler.find(prio)
    q.add(p, f)
    if p.idle():
      p.go(f)
    else:
      ...
  makes sense locally, but not with e.g. Go RPC
    e.g. if a client and three servers (printer, FS, spooler)
    passing p and f won't work correctly
  work-around: pass names
    but then all modules have to explicitly know how to find named objects
    e.g. can't cook up a new Printer object that prints to a local buffer

better object support: network objects
  e.g. DEC Network Objects, CORBA, Java RMI
  "remote object references"
  each object is on some home server
  home gives object references to other machines
  any machine can use an object reference in the usual way
    call method, pass as RPC argument, &c
  language runtime transparently forwards method calls to the home server

  object refs as RPC return values
    c = cartserver.create()
  RPC for object methods
    c.add(item)
  pass remote object to any server
    warehouse.ship(c)
  references directly usable on any server
    warehouse.ship(c):
      c.list()

how do network objects work?
  object reference is serverID+objID
  if a server sends a local object to another machine
    (in an RPC reply value, including if inside e.g. a returned list/hashtable/&c)
    home server keeps a table mapping objID to object pointer
      creates a new objID for the object
    actual reply value carries serverID+objID, not a pointer
  when a machine sees a remote object in an RPC reply
    create a local "stub" object
    stub has all the right methods and a slot containing serverID+objID
    RPC library returns a pointer to the stub object
  when a machine calls a method of a remote object
    really calling a method on the stub
    stub method implementations know the serverID, forward the method call
  when a machine passes a remote object reference in an RPC
    even if it's not the home, must include serverID+objID
  note: all RPC calls are object method calls in this style
  (a Go sketch of this machinery appears at the end of this section)

this is all about naming
  a local pointer is a name, but not useful remotely
  so introduce a level of indirection
    map serverID/objID to server and local pointer

when can a server free (garbage collect) an object?
  only when no client has a live reference
  server must learn when a new client gets a reference
    and when a client's local ref count drops to zero
  so clients must send RPCs to the server noting when they first learn of
    a reference and when they drop their last copy of it

are network objects useful?
  if you have lots of servers that interact
    they can eliminate lots of complexity
    program can use obj refs in the usual way, rather than names
  but:
    performance -- can't directly use obj data, always remote methods
    fault tolerance -- not clear how to cope w/ crashed server, dangling refs
    persistence -- can't write an object ref to disk
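To make the reference/stub indirection concrete, here is a minimal sketch in Go
over the standard net/rpc package. The names (ObjRef, CartStub, CartServer) and
the hand-written stub are illustrative assumptions, not the paper's API; a real
network-objects runtime generates the stubs, rewrites references as they cross
RPC boundaries, and does the reference-counting GC described above.

  // Minimal sketch of network-object references layered over Go's net/rpc.
  package main

  import (
      "fmt"
      "log"
      "net"
      "net/rpc"
      "sync"
  )

  // ObjRef is what actually crosses the wire: the home server's address plus
  // an object ID that is meaningful only in that server's table.
  type ObjRef struct {
      ServerAddr string
      ObjID      int
  }

  // AddArgs names the target object by ID, never by pointer.
  type AddArgs struct {
      ObjID int
      Item  string
  }

  // CartStub is the client-side stand-in: it holds an ObjRef and forwards
  // each method call to the home server (re-dialing per call for brevity).
  type CartStub struct{ Ref ObjRef }

  func (s *CartStub) Add(item string) error {
      c, err := rpc.Dial("tcp", s.Ref.ServerAddr)
      if err != nil {
          return err
      }
      defer c.Close()
      var ok bool
      return c.Call("CartServer.Add", AddArgs{s.Ref.ObjID, item}, &ok)
  }

  // CartServer is the home server: it owns the real objects and the table
  // mapping objID -> object.
  type CartServer struct {
      mu    sync.Mutex
      addr  string
      next  int
      carts map[int][]string
  }

  // Create allocates a cart and returns a reference to it, not the cart itself.
  func (cs *CartServer) Create(_ int, ref *ObjRef) error {
      cs.mu.Lock()
      defer cs.mu.Unlock()
      cs.next++
      cs.carts[cs.next] = []string{}
      *ref = ObjRef{ServerAddr: cs.addr, ObjID: cs.next}
      return nil
  }

  // Add resolves the ID back to the real object and mutates it in place.
  func (cs *CartServer) Add(a AddArgs, ok *bool) error {
      cs.mu.Lock()
      defer cs.mu.Unlock()
      cs.carts[a.ObjID] = append(cs.carts[a.ObjID], a.Item)
      *ok = true
      return nil
  }

  func main() {
      srv := &CartServer{addr: "127.0.0.1:9999", carts: map[int][]string{}}
      rpc.Register(srv)
      l, err := net.Listen("tcp", srv.addr)
      if err != nil {
          log.Fatal(err)
      }
      go rpc.Accept(l)

      // Client side: obtain a reference, wrap it in a stub, and use it like
      // a local object. The cart's data never leaves the home server.
      c, err := rpc.Dial("tcp", srv.addr)
      if err != nil {
          log.Fatal(err)
      }
      var ref ObjRef
      if err := c.Call("CartServer.Create", 0, &ref); err != nil {
          log.Fatal(err)
      }
      cart := &CartStub{Ref: ref}
      if err := cart.Add("toner"); err != nil {
          log.Fatal(err)
      }
      fmt.Printf("added to cart %d on %s\n", ref.ObjID, ref.ServerAddr)
  }

The key point is the serverID+objID pair plus the table on the home server:
any machine can hold and pass the reference, but only the home server ever
touches the object's data -- which is also why every access costs a round trip.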
****

Why are we reading the Plan 9 paper?
  it's higher-level infrastructure for distributed computing
    RPC, DSM, storage are pretty low level
  this is about architecture and research style, not techniques
  and it's a story about using naming to gain transparency

idea: distributed operating system
  single-machine O/S very successful
    takes care of scheduling, storage, mem mgt, security, &c
    universal platform for workstation/server/supercomputer
  why not a distributed o/s as infrastructure for distributed systems?
  many projects in 80s/90s: Plan 9, Amoeba, LOCUS, V, ...
  common approach: pick a unifying abstraction
    use it to unify remote and local interaction -- transparency
  Plan 9: make everything look like a file system

Who are the authors?
  same Bell Labs group that invented UNIX in the 1970s
  values:
    simplicity
    tools that work together (pipes, grep, ascii files)
    file-centric (/dev, stdin)
    use what you make -- but don't solve probs you don't have
  they liked the single-machine time-sharing environment
    easy to share, easy to administer
    fostered cooperation, community
  unhappy with 80s isolated PC/workstation model

Big goals?
  computing environment for programmers, researchers
  use modern workstation/server/network hardware
  regain collaborative feel of a single time-shared machine
  avoid per-workstation maintenance / config

Sacrifices?
  willing to take years, little commercial/publishing pressure
  willing to tear up existing s/w if needed to get the *right* design
    this is a big deal in practice -- POSIX compatibility is a bummer
  willing to pool money to buy shared infrastructure
  willing to all play the same game (not e.g. everyone chooses own O/S)

What did the Plan 9 system look like?
  [diagram]
  lots of cheap "terminals"
    cpu/mem/keyboard/display/net, maybe no disk
    standard Plan 9 software
    only for interactive s/w (editors), not e.g. the compiler
    sit down at any -- log in -- looks the same!
  LAN
  expensive compute servers
  file server
  (not much new at the diagram level)

The new part is the O/S design
Unifying design principles:
  Everything is a file
  One protocol
  Private, malleable name spaces

Everything is a file
  devices: mouse, audio, kbd, tape drive
  network: write "connect 18.26.4.9!23" to /net/tcp/0/ctl
    (see the Go sketch at the end of this section)
  graphics windows: /dev/cons, /dev/mouse, /dev/bitblt
  process control: /proc/123/mem (ps, debuggers)
  backups
  ftp client
  /dev/time
  cs -- their DNS server

Why is "everything a file" a good idea?
  one set of utilities (ls, cat, mount) manages lots of resources
    vs per-subsystem system calls, protocols, formats, &c
  less duplication of effort
    each kind of thing doesn't need its own naming, protection, &c
  potential for tools that work together
    like UNIX shell pipes
    grep emacs /proc/*/cmdline
  files/directories are nice for organizing and naming
  you can implement remote file access

Why might "everything a file" be a *bad* idea?
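Here is a small Go sketch of the network-as-files idiom from the list above:
dialing TCP purely with file I/O, no socket system calls. It assumes it runs
with a Plan 9-style /net tree in its namespace (local, or another machine's
/net mounted over 9P); the note's "/net/tcp/0/ctl" abbreviates the clone step
shown here, and none of this works on an ordinary UNIX /dev.

  // Dial a TCP connection by reading and writing /net files, Plan 9 style.
  package main

  import (
      "fmt"
      "log"
      "os"
      "strings"
  )

  func main() {
      // Opening /net/tcp/clone allocates a fresh connection directory and
      // acts as its ctl file; reading it back tells us which directory,
      // e.g. "0" -> /net/tcp/0/.
      ctl, err := os.OpenFile("/net/tcp/clone", os.O_RDWR, 0)
      if err != nil {
          log.Fatal(err)
      }
      defer ctl.Close()

      buf := make([]byte, 32)
      n, err := ctl.Read(buf)
      if err != nil {
          log.Fatal(err)
      }
      conn := strings.TrimSpace(string(buf[:n]))

      // "Dial" by writing a textual command to the ctl file. No connect()
      // system call anywhere: if /net belongs to a remote machine, 9P
      // carries these reads and writes there transparently.
      if _, err := fmt.Fprintf(ctl, "connect 18.26.4.9!23"); err != nil {
          log.Fatal(err)
      }

      // The bytes of the TCP stream then flow through the data file.
      data, err := os.OpenFile("/net/tcp/"+conn+"/data", os.O_RDWR, 0)
      if err != nil {
          log.Fatal(err)
      }
      defer data.Close()
      if _, err := data.Write([]byte("hello\n")); err != nil {
          log.Fatal(err)
      }
  }

Because these are just opens, reads, and writes, the same program works whether
/net names the local kernel's TCP stack or a remote machine's -- exactly the
transparency the mount table and 9P provide.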
Only one protocol -- 9P
  (as opposed to every service having a different RPC interface)
  a protocol is needed to access network file servers &c
  system call -> kernel -> 9P -> network -> kernel -> user-level server
    (fuse lets you do this)
  RPCs: open, read, write, close, walk; names and FIDs
    an FID is like a file descriptor
    why FIDs rather than i-numbers?
      FIDs imply server state -- fault tolerance, crash recovery
  can mount a 9P server anywhere in the local name space, e.g. /foo
    kernel maintains a mount table: local name -> network connection
  all services speak 9P -- files, windows, names, network, ftp

Why is "only 9P" a good idea?
  need some protocol to make "everything a file" work across machines
  9P replaces a host of specialized protocols
    since all services appear as files, all can be accessed remotely via 9P
    no need for special per-service protocols
  example: no need for the X protocol
    mount workstation's /dev/mouse, /dev/bitblt on remote server
    graphics apps just r/w those files, 9P takes care of remoteness

Why might "only 9P" be a *bad* idea?

Private, malleable name spaces
  most machines have a single namespace; all processes see the same one
  Plan 9 does not -- each process creates its own name space
  easy for processes to "mount" directories, files
  intention is that users customize to make it easy to find their resources
  conventions prevent chaos
    /dev/cons (my terminal)
    /dev/mouse
    /bin/date (executable for my architecture)

Why are customizable namespaces a good idea?
  remote exec on a compute server can reproduce the entire environment
    all resources via files + mimic local names
    mouse, display, audio, home directory, private files on "local" disk
  re-create someone else's environment, for debugging
    different s/w versions, perhaps from backup snapshots

Why might customizable per-user namespaces be a *bad* idea?
  i.e. why not do it like UNIX -- all users see the same file namespace?

The three principles work together
  Everything is a file + share with 9P => can share everything
  e.g. mount cpu server's /proc locally, debug remote program

Remote execution can duplicate local environment
  sit in front of a Plan 9 terminal
  cpu server command starts a local exportfs
    a 9P server, turns open &c requests into local system calls
  on the server:
    starts a process
    mounts the exportfs as the process's /
    so *everything* is the same as on the terminal:
      devices, local disk, windows, &c
    special case for /bin
    special case for the main file server
  contrast to ssh, where the remote machine may be very different
  other users of the same compute server may see a totally different name space

Other Plan 9 ideas (some of which other systems now have)
  /proc (really from UNIX 8th edition)
  union mounts
  utf-8
  backups via snapshot to worm
  rfork

Is Plan 9 the right thing for end-user computing?
  attractive for time-sharing enthusiasts
  high-powered PCs made the shared file/compute server less compelling
  laptops made the reliance on servers unattractive
  the Web totally changed what people used computers for
    collab programming &c -> access to Web services

For distributed systems infrastructure?
  i.e. if you are Google or Facebook
  fault tolerance?
  scalable storage, services?
  big data computation?

(based on notes by Russ Cox)