6.824 2001 Lecture 12: Distributed Operating Systems

The operating system is one of the great success stories of CS.
  Everybody uses one.
  Provide useful abstractions for programming single-machine systems.
  Why not extend the idea to multi-machine systems?

What's the distributed O/S vision?
  All the convenience and unity of a single timesharing machine.
    E.g. you type "ps" and see your processes on all workstations
  But implemented on a LAN of workstations.
  So higher performance:
    Example: my processes are automatically run on idle machines.
    But I can list, kill, debug them.
    And my processes work the same as they would on a single machine.
      See the same files, pipes, devices, processes, &c

The specific goal here is transparency.
  A distributed operating system should make a collection of computers
  behave like a single computer with better performance and
  reliability.

Transparency can be classified in different categories:
  1. Location transparency---Users cannot tell from the name where
     resources are located.
  2. Migration transparency---Resources can move at will without
     changing their names.
  3. Replication transparency---The users cannot tell how many copies exist.
  4. Concurrency transparency---Multiple users can share resources
     automatically.
  5. Parallelism transparency---Activities might happen in parallel
     without users knowing it.

Naming turns out to be particularly important.
  We often do the location binding during name lookup
    And return a special handle bound to actual location
  Also we often do access checks during name lookup
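
A minimal sketch of lookup-time binding, to make the two points above
concrete. Everything here is hypothetical (the NameServer and Handle
classes, the host name "fs1", the user names); it is not taken from any
real system. The lookup does the access check once and returns a handle
that already names the machine holding the object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Opaque handle returned by lookup; already bound to a location."""
    server: str      # machine that holds the object (hypothetical host name)
    object_id: int   # identifier that is only meaningful on that server


class NameServer:
    def __init__(self):
        # name -> (server, object_id, users allowed to look the name up)
        self._table = {}

    def bind(self, name, server, object_id, allowed_users):
        self._table[name] = (server, object_id, frozenset(allowed_users))

    def lookup(self, name, user):
        """Resolve a name: check access, then return a location-bound handle."""
        if name not in self._table:
            raise KeyError(name)
        server, object_id, allowed = self._table[name]
        if user not in allowed:
            raise PermissionError(f"{user} may not open {name}")
        return Handle(server, object_id)


ns = NameServer()
ns.bind("/home/alice/notes.txt", "fs1", 42, {"alice"})
handle = ns.lookup("/home/alice/notes.txt", "alice")
print(handle)
```

Later operations would present the handle rather than the name, so the
name itself never has to say where the object lives.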

Which kinds of names need to be resolved transparently?
  File names
  Device names (printer, X display)
  File descriptors
  Process IDs
  User/group IDs
  Memory addresses -- particularly for shared memory
  TCP/UDP port numbers

Which object types do we need to be able to use transparently?
  Files - create, delete
  File blocks - read, write
  Directories - get a listing
  Devices - rewind tape, scroll region of graphics display
  Processes - create, kill, debug
  Pipes - read, write
  Sockets - create, connect, accept
  Memory pages - allocate, load, store, map

If we got all this right, would process migration be relatively easy?

We already know about transparency for some names/objects.
  Network file systems.
  X window protocol.
  Are they fully transparent?
  Is it easy for me to move a file from one server to another?

How would we implement, say, location-transparent PIDs?
  Process is running on a machine, so embed machine's IP addr in PID?
  What if the process moves to a different machine?
  Maybe we need a process location database?
    PID just an index into that database.
    Update the database when the process moves.
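
The database idea can be sketched as follows. This is an illustration
under stated assumptions, not a real design: ProcessLocationDB and the
host names "ws1" and "ws7" are made up, and a real system would also
have to replicate the database and handle lookups that race with a
migration.

```python
import itertools


class ProcessLocationDB:
    """Maps a location-independent PID to the host currently running it."""

    def __init__(self):
        self._next_pid = itertools.count(1)
        self._location = {}   # pid -> host

    def register(self, host):
        """Allocate a PID for a new process started on `host`."""
        pid = next(self._next_pid)
        self._location[pid] = host
        return pid

    def migrate(self, pid, new_host):
        """Record that the process moved; the PID itself does not change."""
        if pid not in self._location:
            raise KeyError(pid)
        self._location[pid] = new_host

    def locate(self, pid):
        return self._location[pid]

    def remove(self, pid):
        """Forget a process that has exited."""
        del self._location[pid]


db = ProcessLocationDB()
pid = db.register("ws1")
db.migrate(pid, "ws7")
print(pid, db.locate(pid))   # same PID as before the move, new host
```

A "kill" or "ps" would first call locate() and then talk to whichever
host it returns, which is what keeps the PID location-transparent.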

How would we implement location-transparent shared memory?
  I.e. distributed shared memory.
  Need consistency.

Could we make a universal name binding system?
  To avoid re-implementing one for each type of name?
  Would probably require separation of naming from object access.
    In contrast, NFS ties them together. LOOKUP is an NFS op.
  One query would serve every type: where is PID/page/socket/display XXX?
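
A rough sketch of what such a separation might look like. The names
here (UniversalBinder, where_is, the ACCESS table, hosts "ws3" and
"ws9") are hypothetical. One table answers "where is it?" for every
kind of name; how to talk to the object once found is registered
separately, per kind.

```python
class UniversalBinder:
    """One binding table shared by every kind of name."""

    def __init__(self):
        self._bindings = {}   # (kind, name) -> host

    def bind(self, kind, name, host):
        self._bindings[(kind, name)] = host

    def where_is(self, kind, name):
        return self._bindings[(kind, name)]


# Object access is kept out of the binder: one (stand-in) protocol per kind.
ACCESS = {
    "pid": lambda host, name: f"signal process {name} on {host}",
    "display": lambda host, name: f"open X connection to {host} for {name}",
}

binder = UniversalBinder()
binder.bind("pid", 1234, "ws3")
binder.bind("display", ":0", "ws9")

host = binder.where_is("pid", 1234)
print(ACCESS["pid"](host, 1234))
```

Adding a new kind of name means adding bindings and one access
protocol, without writing another lookup service.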

Could we make a universal object access protocol?
  Perhaps modeled on file descriptor I/O?
  Might give us UNIX stdio style composability.
  And would save us from having a huge number of slightly different protocols.
  Maybe, but there are subtly different I/O models.
    Assume we want to agree on one particular network protocol.
    Files: read and write send RPC to single server.
    Pipes: does the write do the RPC (to reader)?
      Or does the read do an RPC to the writer?
      What if multiple readers? (on different machines)
        cat script | ( sleep 10; /bin/tcsh )
      What if multiple writers?
        ( date ; ls ) | tee ls.out

How transparent is Athena?
  Transparent: files, users, e-mail, X windows, printers.
  Not transparent: sound card, process IDs, TCP port numbers.