6.824 2007 Lecture 17:  Security: integrity and confidentiality

introduction
  start of a series on lecture involving security requirements
  today
    data integrity and confidentiality
    case study: shark

security:
  some of the parties in the system are malicious (adversaries)
    e.g., proxies in shark might try to steal data or modify it
    important in security to state what adversaries can do and not do
      the attack model
      are we protecting against the NSA stealing data with big budget
         or a 16-year hacker who wants to steal music with a small budget?
      do we assume the OS is secure?
         do we rule out some class of attacks?
  despite adversaries we may want to achieve some objectives
    common objectives: integrity, confidentiality, access control, etc.
  security is hard because designer must achieve negative goal

Example of a secure system design:
  Secure telephone between FBI and CIA
  Goal: only people in FBI and CIA buildings allowed to know
  Plan:
    Radiation-proof buildings
    Only one entrance/exit per building
    Armed guards at entrances
    Guards check ID cards and record all in/out
    Pressurized shielded cable between two buildings
    No other cables allowed to leave the buildings
    Pass laws to punish people who reveal government secrets
    Invite NSA to try to steal the messages
    Send fake information, spy on the KGB to see if they learn that info

Note that we're unlikely to achieve perfect security
  No individual technique is perfect
    Pressurized cable only raises attacker's cost
    You can't completely shield a building
    Anyone can be bribed or blackmailed
  We could meet our stated goal but goals could be inappropriate
    Maybe FBI and CIA have uncleared visitors in the building
    Maybe some employees leave the building and talk in their sleep
      Why don't we forbid employees from leaving the building?

General methodology for building secure systems
  Figure out what you are trying to protect and how much it's worth
  Figure out what attacks you're trying to defend against
  State goals or desired properties clearly
    Not "impossible to break"
    How about "attack X on resource Y would cost $Z dollars"
  Structure system so it has two kinds of components:
    Trusted and un-trusted
  Build secure perimeter around trusted components
    Perhaps using physical, legal, or cryptographic techniques
  Minimize the size of the trusted components
    "Trusted" means "could betray you"
    Maybe we should have built a secure *room*, not building
  Analyze the resulting system, and monitor its success

What really works?
  Avoid exposure: unplug that Internet connection.
  Audit: balance your check-book.
  Assume you will lose: be able to tolerate losses (insurance).
  Use legal or social mechanisms: much more mature+powerful than technical schemes.

What are we going to talk about in 6.824?
  Cryptography and cryptographic protocols, since it's well understood.
  System structure, since that's what makes systems secure.

What properties can cryptography help us achieve?
  1. Privacy / confidentiality
  2. Integrity / authenticity

What low-level crypto primitives are available?
  There are many, we will talk about five.
  Our goal is to understand system designs.
  Not to learn to build to crypto primitives,
    not to understand why they're secure.
  Even using crypto primitives correctly is very hard.
  So we just want to know what properties we can assume.

Symmetric-key ciphers.
  Examples: DES, Idea, RC4
  Plain text, cipher text, key.
  Encrypt(P, K) -> C
  Decrypt(C, K) -> P
  Properties:
    If all you know is C, you cannot easily derive P or K.
  Typically K is kept secret, known by two communicating parties.
  Typically used for privacy.
  Typically pretty fast.
  Problem: key distribution; how do two parties securely share a key?

Public-key ciphers.
  Example: RSA
  Two keys, typically called "public" and "private".
  Encrypt(M1, K1) -> M2
  Decrypt(M2, K2) -> M1
  Properties:
    If all you know is M2 and K1, you cannot derive M1.
    If all you know is M1 and K2, you cannot derive a matching M2.
    If all you know is K1, you cannot derive K2.
    If all you know is K2, you cannot derive K1.
  Two typical uses:
    Privacy: only I know K2, but I make K1 public.
      Only I can read M2; private to me.
    Integrity: only I know K1, but I make K2 public.
      Only I can produce an M2 that decrypts to M1.
  Typically very slow: 10s of milliseconds.
  Problem: key distribution; how did I securely learn your public key?

Cryptographic hashes.
  Examples: MD5, SHA-1
  Hash(X) -> Y
  X can be as large as you like, Y is typically 128 or 160 bits.
  Properties:
    If all you know is Y, you cannot find an X s.t. Hash(X) == Y.
    If you know Hash(X) = Y, you can't find X' s.t. Hash(X') = Y.
  Typical use:
    Integrity: hash from RedHat, RPM from mirror, hash checks if RPM is correct
    Integrity: part of a proof that I know a secret.
  Very fast.

Message Authentication Code
  Or "keyed hash"
  Examples: HMAC
  Suppose we both know a key, K.
  If I want to send you M1 so you'll know it's from me,
  I send you M2 = M1 + Hash(K + M1)

Certificate
  Suppose we want publically verifiable documents.
  Example: electronic MIT ID card.
  Registrar generates M1 = "Robert Morris works at MIT."
  Gives me M2 = M1 + RSAEncrypt(Hash(M1), MIT's private key)
  Anyone who knows MIT's public key can check this.
  So *I* can carry around M2 and use it to convince anyone that I work at MIT.
    They don't have to talk directly to MIT to verify this.
    No problem that data came via untrusted channel (me).
  MIT probably wants to include expiration date, picture, ID #, &c in M1.
  Nasty problems: staleness, revocation, expiration.

How does this apply to shark?
  Shark: Scaling File Servers via Cooperative Caching
   Annapureddy, Freedman, Mazieres; NSDI 2005

context
  planetlab experiments/services
  672 hosts!
  start the same program on all 672 hosts
  mount an NFS server? scp or rsync ahead of time?
  how long to send 40 MB to each of 672? 7.5 hours @ 1 MB/s
  Figure 10 says 2500 sec for 185 hosts, almost 45 minutes

very similar problem to web CDNs
  distribute from single source to flash crowd
  Akamai, Coral, BitTorrent
  getting more important: YouTube &c
  
why are we reading this paper?
  CDN ideas are the "same" as in Coral and Bittorrent
  But adds stronger security properties

what is the attack that Shark worries about?
  server is trusted
  clients/proxies can be malicious: modify data and disclose data
    yet, we want to use them to build a CDN    

what are the properties that Shark provides?
  only clients that are allowed to read a file can store that file
    even they get the file from another client
  clients can verify that they received the same data as the server stores
    even they get the file from another client

why chunks?
  reduces time until each client has something to serve
  at the limit of small chunks, full concurrency "soon"
  assume as many clients as chunks
  time to send the file once from the server,
    plus same amt of time for each client to serve chunk to each other
  so we expect 2x single file serving time!
  complicates security plan

how does Shark provide the security properties?
  clients retrieve medata from server
    server is authenticated using public-key crypto
      how does the client know that it has the right server's public key?
    if client is allowed to have access, it receives a set of tokens:
      file token Tf = tok(f) = HMACr(f)
        what is r?
      vector of chunk tokens (with chunk ID=<offset, and size>)
        Tc = tok(c) = HMACr(c)
  client computes indexing key If and Ic
        If = HMACTf(I)
	Ic = HMACTc(I)
    uses the keys for indexing into DHT (a la Coral)
    downloads data and recomputes toke for data downloaded
        Tc ?= HMACr(c)
  client/proxy interaction
    confidentiality: symmetric encryption key = Tc.
    walk through protocol in figure 3
  can any proxy (even ones that are not allowed to read F) serve F chunks?
     why not?
  can any client retrieve a chunk from a legit proxy?
     why not?
  can a proxy send modified chuncks to client without being detected?
     why not?
  can a proxy register itself under Ic?
     yes, but Authp makes sure that the client doesn't waste bw      
  why are all the fields in AuthPand AuthC necessary?
     what attacks are they protecting against?
  on what does Shark's security properties rely on?
     crypto works etc.
     protocol is correct and correctly implemented
     server is trustworthy and implemented correctly
     etc.
     what if an attacker compromises the proxy machine?