6.824 2005 Lecture 1: Introduction and RPC (1)

Opening
  building distributed systems
    flexible construction, robustness, high performance
  lectures on design, paper reading for case studies
  you'll build real systems
  why take the course?
    synthesize many different areas in order to build working systems
    Internet has made the area much more attractive and timely
    hard/unsolved: not many deployed sophisticated distrib systems

Example: how to build HotMail?
  mail arrives from outside world
  store it until...
  user's Outlook/Eudora reads/deletes/saves it
  Simple solution: one server w/ disk to store mailboxes
    [picture: MS, sending "clients", reading clients]
  Lots to talk about even w/ simple client/server system

Issue: stable performance under high load
  Example: Starbucks. 5 seconds to write down incoming request.
    10 seconds to make it.
    [graph: x=requests, y=output]
    max thruput at 4 drinks/minute.
    thruput goes to zero at 12 requests/minute.
  Efficiency *decreases* with load -- bad.
  Careful system design to avoid this -- flat line at 4 drinks.
    Peets, for example.
  Better: build systems whose efficiency *increases* w/ load
    w/ e.g. batching, disk scheduling

Issue: scalable performance
  What if more clients than one server can handle?
    How to make use of more servers?
  Load balance must be even for good parallel speedup
  Will dividing the load be easy?
    Easy case: another espresso machine and operator
    Medium: split mailboxes across servers (separate state, redirects)
    Hard: detect spam by looking for duplicate messages

Issue: high availability
  Can I get at my HotMail mailbox if some servers / networks are down?
  Yes: replicate the data.
    Problem: replica consistency. delete mail, it re-appears.
    Problem: physical independence vs communication latency
    Problem: partition vs availability. airline reservations.
  Tempting problem: 2 servers, 2x availability, 2x performance?

Issue: security
  old view: secrecy via encryption (msg to Moscow embassy)
    user authentication via passwords &c
    all parties know each other!
  Internet has changed the focus.
    sit at an Internet cafe, give credit card number to Amazon
    was that really Amazon?
    Amazon can ask who "Robert Morris" is, but I don't know "who" Amazon is.
    no purely technical approach is likely to solve this problem

We want to understand the individual techniques,
  and how to assemble them.

--------------

Course structure
  URL
  meetings: 1/2 lectures on fundamentals, 1/2 reading discussions
    research papers on working systems
    must read papers before class
      otherwise boring, and you can't pick it up by listening
    we will post paper questions 24 hours in advance (one waiting now)
      hand in answers on paper in class, one or two paragraphs
  two in-class quizzes (no final)
  Labs: build real servers.
  Project.
    look at the project information page!
    teams, proposal conferences, build the system, demo, report
  Athicha is TA, office hours TBA
  Don't forget:
    sign up for course machine accounts
    look at the first lab, due in a week
    read paper for Thursday, answer question

--------------

First few lectures: crash course in server design
  partially to explain lab infrastructure
  Lab goal: scalable distributed file server
    you have lots of clients, as in Athena
    each mostly working in home directory
    but some shared files as well
  standard design: single sophisticated server
    [draw picture of the eventual lab s/w: app, NFS, FS, lockd, blockd]
    [multiple client hosts]
    block server a lot like a disk, just read/write
  First lab: logic of lock server
    FS must lock a file while updating it
    what if two people try to create the same file at the same time?
      one must fail...
    really a warm-up, we give you the framework
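To make the lock logic concrete, here is a minimal sketch in C. The
interface (lock_acquire / lock_release) and the fixed-size in-memory
table are made up for illustration -- the lab supplies its own RPC
framework and interface.

    /* Sketch of lock-server logic: grant a lock only if no one
       holds it, so of two simultaneous creates, one must fail.
       Names and sizes here are hypothetical, not the lab's API. */
    #include <string.h>

    #define MAXLOCKS 128

    struct lock {
        char name[64];   /* which file/resource this lock protects */
        int  held;       /* 1 if some client currently holds it */
    };

    static struct lock locks[MAXLOCKS];
    static int nlocks;

    /* find the named lock, creating it on first use; NULL if table full */
    static struct lock *
    lookup(const char *name)
    {
        int i;
        for (i = 0; i < nlocks; i++)
            if (strcmp(locks[i].name, name) == 0)
                return &locks[i];
        if (nlocks == MAXLOCKS)
            return NULL;
        strncpy(locks[nlocks].name, name, sizeof(locks[nlocks].name) - 1);
        return &locks[nlocks++];
    }

    /* returns 0 if granted, -1 if someone else already holds the lock */
    int
    lock_acquire(const char *name)
    {
        struct lock *l = lookup(name);
        if (l == NULL || l->held)
            return -1;
        l->held = 1;
        return 0;
    }

    int
    lock_release(const char *name)
    {
        struct lock *l = lookup(name);
        if (l == NULL || !l->held)
            return -1;
        l->held = 0;
        return 0;
    }

The sketch handles one request at a time; execution contexts for
concurrent requests come up under server frameworks below.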
First few lectures and papers: remote procedure call, NFS server structure

Topic: Remote Procedure Call (RPC)
  Tools to help us divide up programs onto multiple machines
  Big goal: transparency!

Suppose you start with a simple program
  Example 1
  Two modules: application, and database (reserve() and change())
  OK if only one ticket agent in the whole world...

Now you want to split this up:
  a single server running the database,
  lots of client hosts talking to it over the network.

Could just change the s/w:
  get rid of function calls,
  read and write network connections directly.
  Example 2.
  Network I/O is awkward! Nowhere near as convenient as function calls.

Compiler automates function calls:
  type checking, placement of args on stack.
  Can we extend that automation to cross-network inter-module interfaces?

Remote Procedure Call: The Idea
  Look at Example 3.
  Two pieces: "client stubs" and "server stubs".
  Client stubs pretend to be the original server functions.
  Server stubs pretend to be the client functions.
  Thus the Example 1 code can be used unchanged!
  Most of the work: "marshaling" and "unmarshaling."
  Note that the request message format includes a "procedure number",
    and that the server stub code "dispatches" on the proc num.
  Now the programmer can make the same client reserve() call,
    and implement reserve() on the server the same way,
    and the RPC machinery takes care of the communication.

What are the potential benefits of RPC?
  Transparent distributed computing
    Existing programs don't need to be modified
    Can write s/w that's location-independent
  Enforces well-defined interfaces
  Allows portable interfaces
  Plug together separately written programs at RPC boundaries
    e.g. NFS and X clients and servers

What does an RPC system consist of?
  Will base the explanation on Sun RPC, which the NFS paper mentions.
  1. Standards for wire format of RPC msgs and data types. XDR and RPC.
  2. Library of routines to marshal / unmarshal data.
  3. Stub generator, or RPC compiler, to produce "stubs".
     For client: marshal arguments, call, wait, unmarshal reply.
     For server: unmarshal arguments, call real fn, marshal reply.
  4. Server framework:
     dispatch each call message to the correct server stub.
  5. Client framework:
     give each reply to the correct waiting thread / callback.
  6. Binding: how does the client find the right server?

What does a Sun RPC request contain?
  all 32 bits. this is wire format.
  UDP header...
  xid
  call/reply
  rpc version
  program #
  program version
  procedure #
  auth stuff
  arguments
  Of main interest: xid, prog#, proc#
    Server dispatch uses prog#, proc#
    Client reply dispatch uses xid
      Client remembers the xid of each outstanding call

Authentication fields
  An attempt to do cryptographic security at the RPC level
    Transparent to application code
  Turns out not to work well
    What "security" means is too app-dependent
      Authenticate user? host? data?
  Typically just holds your numeric UNIX user id -- no verification at all

Marshaling arguments
  "Linearize" data scattered in memory into a byte stream
  "Externalize" the data representation so it is portable
  Formats defined by the XDR standard
  Easy for e.g. int -- same representation, though portable byte order...
  Arrays and strings? Prepend a length.
  Pointers? Follow them? How much data does a char * point to?
  May be unclear how to efficiently linearize e.g. a hash table.
    What if circular pointers? I.e. representing a graph structure?
    Need programmer or language support for this.

What needs to be in an RPC reply?
  xid
  call/reply
  accepted? (vs bad rpc version, or auth failure)
  auth stuff
  success? (vs bad prog/proc #)
  results
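To make the wire format concrete, here is a sketch of the call and
reply headers as C structs. The struct and field names are mine --
real Sun RPC specifies these via XDR rather than raw C structs -- and
each field travels as a big-endian 32-bit word.

    /* Illustrative layout of Sun RPC headers; names are hypothetical. */
    #include <stdint.h>

    struct rpc_call_hdr {
        uint32_t xid;          /* client uses this to match the reply */
        uint32_t msg_type;     /* 0 = call */
        uint32_t rpc_version;  /* 2 for Sun RPC */
        uint32_t prog;         /* program #, e.g. 100003 is NFS */
        uint32_t vers;         /* program version */
        uint32_t proc;         /* procedure # within the program */
        /* auth stuff, then marshaled arguments, follow */
    };

    struct rpc_reply_hdr {
        uint32_t xid;          /* copied from the matching call */
        uint32_t msg_type;     /* 1 = reply */
        uint32_t reply_stat;   /* accepted? vs bad rpc version, auth failure */
        /* auth stuff */
        uint32_t accept_stat;  /* success? vs bad prog/proc #; results follow */
    };

Server dispatch looks at prog/vers/proc; the client framework looks
only at the xid to find the waiting caller.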
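And a sketch of the XDR-style marshaling rules from above: integers
as big-endian 32-bit words, strings as a length prefix plus bytes
padded to a 4-byte boundary. The xdr_buf type and the function names
are invented for illustration; bounds checks are omitted for brevity.

    /* XDR-style linearization into a byte stream; illustrative only. */
    #include <stdint.h>
    #include <string.h>

    struct xdr_buf {
        unsigned char data[1024];
        int off;
    };

    void
    xdr_put_u32(struct xdr_buf *b, uint32_t v)
    {
        b->data[b->off++] = (v >> 24) & 0xff;  /* most significant byte first */
        b->data[b->off++] = (v >> 16) & 0xff;
        b->data[b->off++] = (v >> 8) & 0xff;
        b->data[b->off++] = v & 0xff;
    }

    void
    xdr_put_string(struct xdr_buf *b, const char *s)
    {
        uint32_t len = strlen(s);
        xdr_put_u32(b, len);                   /* prepend a length */
        memcpy(b->data + b->off, s, len);
        b->off += len;
        while (b->off % 4)                     /* pad to 32-bit boundary */
            b->data[b->off++] = 0;
    }

Pointers, hash tables, and cyclic graphs have no equally obvious
byte-stream form -- that is the case that needs programmer or
language support.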
How does the stub generator work?
  You give it a description of the procedure calls and arg/res data types.
    Sun defines a C-like standard. See 2nd page of handout.
  It produces:
    routines to marshal / unmarshal,
    routines to read/write calls on the wire,
    maybe client / server stubs.

What does the client framework do?
  Keeps track of outstanding requests:
    for each, the xid and the caller's thread / callback.
  Matches replies to callers.
    Might be multiple callers using one socket. NFS client in kernel.
  Usually handles timing out and retransmitting.

What does the server framework do?
  Need a context in which to execute the procedure.
  In a threaded system:
    create a new thread per request;
      master thread reads socket[s].
    Or a fixed pool of threads, and a queue if too many requests. NFS srvrs.
    Or just one thread, serial execution. Simplifies concurrency. X srvr.
  Key feature: support for concurrent RPCs
    If one RPC takes multiple blocking steps to compute,
      can I serve another one in the meantime?
    For example, DNS. The service routine is itself an RPC client.
    May also avoid deadlock if I send an RPC to ... to myself
  In an async programming system:
    callback registered per prog#/proc#
    (rather than per socket; fdcb() calls the unmarshaling function).

What about binding?
  Client needs to be able to talk to the correct server.
  It needs an IP address:
    use DNS.
  Client knows the RPC prog #, needs to know the server's TCP/UDP port #:
    could use a well-known port: NFS uses port 2049.
    Could use a "port mapper" per server:
      server programs register prog#/port with the port mapper,
      clients can ask for the port, given a prog#.
      Avoids dedicating scarce port #s to services.

Conclusion
  Automatic marshaling has been a big success
  Mimicking the procedure call interface is not that useful
  Attempt at full transparency has been mostly a failure
    But you can push this pretty hard: Network Objects, Java RMI