6.824 Lecture 1: Introduction and lab overview

Introduction
  engineering distributed systems
    abstractions, implementation techniques, etc.
  lectures on general ideas, paper reading for case studies
  you'll build real systems (yfs, inspired by Frangipani)
  why take the course?
    synthesize many different areas in order to build working systems
    Internet has made area much more attractive and timely
    hard/unsolved: not many deployed sophisticated distrib systems
  background: 6.033 ideas and readings
    much more depth: build a real distributed system

What is a distributed system?
  multiple computers
  interconnections
  cooperate to provide some service
  [Example: Athena, Akamai CDN, Google file system, TinyDB, ...]
  [Counterexample: shared UNIX time-sharing system]

Why care about distributed computing?
  conquer geographic separation
    Web
  build reliable systems out of unreliable components
    Argus, BFT, Frangipani
  aggregate many computers for high capacity
    aggregate cycles+memory: (TreadMarks, Dryad)
    aggregate bw (Coral, Shark)
    aggregate disks (Frangipani)
  isolate computers for specific tasks
    authentication server
    backup server

Challenges
  system design
    what does the client do, what does the server do? which servers?
    what are the right protocols?
    for example, today: frangipani versus NFS
  consistency
    shared data with multiple readers and writers
  failures
    communication and hardware
    how do you tell the difference?
    which node is the last one to fail?
  security
    adversary may compromise machines or manipulate messages
  network properties
    sensor nets
    clusters
    wide-area
  implementation
    concurrency with servers and clients
    network is often scarce resource
    avoid disk writes       
  
  easy to make a distributed system less scalable, less reliable,
  etc. than a centralized system.  (Lamport's definition: a
  distributed system is one where the failure of a computer you
  didn't know existed renders your own unusable.)

6.824: distributed *systems*
  infrastructure for building servers and clients: threads+RPC
  distributed programming
  consistency
  fault tolerance
  peer-to-peer/decentralized
  security
  case studies
  [continuous thread: system design]

Example:
  how to build a distributed file system?  say AFS.
  users should have same view of the file system and be able to share files

Simple System Design
  One server w/ disk to store directories and files
  [picture: file server, "clients", read file, write file, create,
  etc.]

Topic: implementation
  Server serves multiple clients concurrently
    To get good performance out of disk
  Concurrency challenges:
    Server threads may modify shared state (e.g., file cache) concurrently.
      Avoid race conditions!
    One thread's request may depend on another thread's.
      Avoid deadlock and livelock!
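
The race-condition point can be made concrete with a small sketch. It
assumes a hypothetical in-memory FileCache shared by all server threads
and uses the C++11 standard library (std::thread, std::mutex) as a
stand-in for the pthreads calls used in the labs. All names here are
illustrative, not taken from the lab code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory file cache shared by all server threads.
class FileCache {
public:
    // Append data to a cached file. The mutex makes the
    // read-modify-write of the map entry atomic.
    void append(const std::string& name, const std::string& data) {
        std::lock_guard<std::mutex> guard(mu_);
        files_[name] += data;
    }

    // Return the cached size of a file (0 if it is not cached).
    std::size_t size(const std::string& name) {
        std::lock_guard<std::mutex> guard(mu_);
        auto it = files_.find(name);
        return it == files_.end() ? 0 : it->second.size();
    }

private:
    std::mutex mu_;                             // protects files_
    std::map<std::string, std::string> files_;  // file name -> contents
};

// Simulate nthreads server threads, each handling `requests` one-byte
// appends to the same file, and return the resulting file size.
std::size_t run_server_threads(int nthreads, int requests) {
    assert(nthreads > 0 && requests >= 0);
    FileCache cache;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&cache, requests] {
            for (int i = 0; i < requests; ++i) cache.append("f", "x");
        });
    }
    for (auto& w : workers) w.join();
    return cache.size("f");
}
```

With the lock in place every append is counted. Removing the
lock_guard from append makes the concurrent map updates a data race,
so the result (or even a clean exit) is no longer guaranteed.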

Topic: consistency & protocol design
  What if user operations require multiple file system operations?
    If I move a file from one directory to another, does another user
       see intermediate states?
    What if two users move a file to the same destination directory?
  To offload the network and server, probably cache files at clients
    When does a write on a client become visible to other clients?
    Do we want it to behave like a single-machine file system?
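
One way to hide the intermediate state of a move is to hold the locks
of both directories for the whole operation. The sketch below assumes a
hypothetical Directory type with a unique id and always locks the
lower id first, so two concurrent moves in opposite directions cannot
deadlock. It illustrates the idea only and is not the protocol of any
particular file system.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical directory: a set of names guarded by its own lock.
struct Directory {
    explicit Directory(int i) : id(i) {}
    int id;                        // unique id, used to order lock acquisition
    std::mutex mu;                 // protects names
    std::set<std::string> names;   // entries in this directory
};

// Move `name` from src to dst as one atomic step.
// Returns false if src has no such entry or dst already has one.
bool move_file(Directory& src, Directory& dst, const std::string& name) {
    if (&src == &dst) {
        // Moving within the same directory is a no-op; lock only once.
        std::lock_guard<std::mutex> g(src.mu);
        return src.names.count(name) == 1;
    }
    // Fixed global order (lower id first) avoids deadlock.
    Directory& first = (src.id < dst.id) ? src : dst;
    Directory& second = (src.id < dst.id) ? dst : src;
    std::lock_guard<std::mutex> g1(first.mu);
    std::lock_guard<std::mutex> g2(second.mu);
    if (src.names.count(name) == 0 || dst.names.count(name) == 1) return false;
    src.names.erase(name);
    dst.names.insert(name);
    return true;
}
```

Because both locks are held across the erase and the insert, a reader
that takes either directory lock never sees the file in neither or both
directories. If two users race to move the same file, exactly one call
returns true.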

Topic: system design
  What happens if your file server must serve a large community (say MIT)?
  What if more clients than one server can handle?
  What if users have more files and directories than one server can store?
  How to use more servers to handle more clients?
  Idea: partition users across servers
    How to balance the load of users?  Statically? Partition name space?
    What if some user suddenly needs a lot more space?
    What if some files are much more popular than others?
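
A minimal sketch of static partitioning, assuming users are identified
by name and servers are numbered 0..nservers-1. The function names are
hypothetical, and hashing is only one possible static assignment.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Statically map a user to one of nservers file servers by hashing the name.
std::size_t server_for(const std::string& user, std::size_t nservers) {
    assert(nservers > 0);
    return std::hash<std::string>{}(user) % nservers;
}

// Count how many of the given users land on each server.
std::vector<std::size_t> users_per_server(const std::vector<std::string>& users,
                                          std::size_t nservers) {
    std::vector<std::size_t> load(nservers, 0);
    for (const auto& u : users) ++load[server_for(u, nservers)];
    return load;
}
```

This balances the number of users, not their storage or request rate:
one user who suddenly needs much more space, or one very popular file,
still lands on a single server. Changing nservers also remaps most
users, which means moving their files.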

Topic: fault tolerance
  Can I get to my files when some servers are down or network fails?
  Yes: replicate the data.
  Problem: replica consistency. delete file, re-appears.
  Problem: physical independence vs communication latency
  Problem: partition vs availability.  airline reservations. [[???]]
  Tempting problem: can 2 servers yield 2x availability AND 2x performance?
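
The tempting problem can be checked with a little arithmetic. The
sketch assumes a deliberately simple model that is not from the
lecture: each server is up independently with probability a, a read
succeeds if any replica is up, and a write that must keep all replicas
identical succeeds only if every replica is up.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of n replicas is up
// (assumes independent failures, each replica up with probability a).
double avail_any(double a, int n) { return 1.0 - std::pow(1.0 - a, n); }

// Probability that all n replicas are up, as needed by an operation
// that must reach every replica.
double avail_all(double a, int n) { return std::pow(a, n); }
```

With a = 0.9 and two servers, avail_any gives 0.99 but avail_all gives
0.81: under this model, reads become more available while writes that
must update both replicas become less available than with one server.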

Topic: security
   Internet provides global exposure to random attacks from millions
     of bored students and serious hackers, e.g. intrusions for spam botnets
   How does the server know that a request is from me?
   How is the file server protected?
   How much do I have to trust the system administrators?
   Etc.

Topic: system design
   What if we want to provide an Internet file system?
     aggregate all computers in a gigantic file system
   What do we need to handle to do this?
     more failures, longer delays, multiple administrative domains

We want to understand the individual techniques, and how to assemble them.

COURSE STRUCTURE

All announcements and info at http://www.pdos.csail.mit.edu/6.824
 look at the first lab, due in a week

Meetings: 1/2 lectures, 1/2 paper discussions (or lab help)

Research papers on working systems, starting today
  must read papers before class; otherwise class is boring, and you
    can't pick it up by listening
  in future, we will post paper questions 24 hours in advance
  hand in answer on paper in class, one or two paragraphs

Two class-length quizzes (one in finals week, sorry!)

Labs: build a real cluster file system a la Frangipani
 Labs are due on Thursdays

Project: extend lab in any way you like.
 short paper
 demo in last class meeting

David Schultz is TA, office hours on Web.

LAB: YET ANOTHER FILE SYSTEM (YFS)

Lab is inspired by Frangipani, a scalable distributed file system (see
paper).  You will build a simplified version of Frangipani.
  
Frangipani goals
  aggregate many disks into a single shared file system
  capacity can be incrementally added
    without stopping the system
    bricks for storage
  good balance of load/users across disks
    no manual assignment
  tolerates and recovers from machine, network, and disk failures
    without operator intervention
  consistent backups while running the system

Frangipani design
  Each machine runs a Petal server, a Frangipani Server, and a lock service
    this is not strictly necessary, but simplifies thinking about the system
  Petal aggregates disks into one big virtual disk
    interface: put and get
    replicates blocks/extents
    add petal servers to increase storage capacity and throughput
  Frangipani file server
    serves file system requests and uses shared petal to store data
      inodes, directory, data blocks all stored on petal
      all servers serve same file system
    add servers to scale up Frangipani
  Lock server
    file servers use the lock server to provide consistency
      e.g., when creating a file, lock directory
      etc.
    locks are also replicated
  No security beyond traditional file system security
    intended for a cluster, not wide-area
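
How the three pieces fit together can be sketched in a single process.
ExtentStore stands in for Petal's put/get interface, LockService for
the lock server, and create_file for what a file server does on a
create. All three are hypothetical simplifications (no replication, no
network, directories stored as newline-separated names without
embedded newlines) and are not Frangipani's actual interfaces.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Stand-in for Petal: a shared store addressed by extent id.
class ExtentStore {
public:
    void put(int id, const std::string& data) {
        std::lock_guard<std::mutex> g(mu_);
        extents_[id] = data;
    }
    // Returns the stored data, or an empty string if the extent is absent.
    std::string get(int id) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = extents_.find(id);
        return it == extents_.end() ? std::string() : it->second;
    }
private:
    std::mutex mu_;                        // protects extents_
    std::map<int, std::string> extents_;   // extent id -> contents
};

// Stand-in for the lock service: one lock per extent id.
class LockService {
public:
    void acquire(int id) { lock_for(id).lock(); }
    void release(int id) { lock_for(id).unlock(); }
private:
    std::mutex& lock_for(int id) {
        std::lock_guard<std::mutex> g(mu_);
        return locks_[id];  // std::map keeps element references valid
    }
    std::mutex mu_;                     // protects locks_
    std::map<int, std::mutex> locks_;   // extent id -> its lock
};

// Create `name` in the directory stored in extent `dir`.
// Returns false if the name already exists.
bool create_file(ExtentStore& store, LockService& locks,
                 int dir, const std::string& name) {
    locks.acquire(dir);                      // lock the directory
    std::string entries = store.get(dir);    // read current directory
    bool exists =
        ("\n" + entries).find("\n" + name + "\n") != std::string::npos;
    if (!exists) store.put(dir, entries + name + "\n");  // write it back
    locks.release(dir);
    return !exists;
}
```

Without the directory lock, two file servers could both read the same
old directory contents and each write back a version containing only
its own new name, so one of the creates would be lost.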

Contrast: NFS (or AFS)
   Clients relay file system calls to server
     clients are simple; they don't implement a file system
     for performance client may implement cache (or use OS file system cache)
   Server implements the file system and exports its local disks
   How do you scale an NFS server?
     buy more disks
     partition file systems over servers
     or, buy file server "mainframe"
   How do you make an NFS server fault tolerant?
     buy RAID system or NAS

YFS
  Same basic structure as Frangipani, but single extent server
    i.e. you don't have to implement Petal.
  Draw picture
    yfs_client (interfaces with OS through fuse)
    extent_server
    lock_server
  Each server written in C++ and pthreads, our own RPC library
    Next two lectures cover infrastructure in detail
  Labs: build YFS incrementally
    L1: simple lock server
      RPC semantics
      Programming with threads
    L2: yfs_server
      basic file server  (no sharing, no storage)
    L3: yfsclient + extent_server
      reading/writing
    L4: yfsclient + extent_server + lock_server
      mkdir, unlink, and use lock server (sharing)
    L5: caching lock_server
      protocol design
    L6: caching extent_server
      consistency using lock_server
    L7: paxos library
      agreement protocol
    L8: fault tolerant lock server
      replicated state machine using paxos
    L9: your choice

Lab 1: simple lock server

- let's look at the handout (the tarball from the web site)