6.824 2007 Lecture 17: Security: integrity and confidentiality introduction start of a series on lecture involving security requirements today data integrity and confidentiality case study: shark security: some of the parties in the system are malicious (adversaries) e.g., proxies in shark might try to steal data or modify it important in security to state what adversaries can do and not do the attack model are we protecting against the NSA stealing data with big budget or a 16-year hacker who wants to steal music with a small budget? do we assume the OS is secure? do we rule out some class of attacks? despite adversaries we may want to achieve some objectives common objectives: integrity, confidentiality, access control, etc. security is hard because designer must achieve negative goal Example of a secure system design: Secure telephone between FBI and CIA Goal: only people in FBI and CIA buildings allowed to know Plan: Radiation-proof buildings Only one entrance/exit per building Armed guards at entrances Guards check ID cards and record all in/out Pressurized shielded cable between two buildings No other cables allowed to leave the buildings Pass laws to punish people who reveal government secrets Invite NSA to try to steal the messages Send fake information, spy on the KGB to see if they learn that info Note that we're unlikely to achieve perfect security No individual technique is perfect Pressurized cable only raises attacker's cost You can't completely shield a building Anyone can be bribed or blackmailed We could meet our stated goal but goals could be inappropriate Maybe FBI and CIA have uncleared visitors in the building Maybe some employees leave the building and talk in their sleep Why don't we forbid employees from leaving the building? General methodology for building secure systems Figure out what you are trying to protect and how much it's worth Figure out what attacks you're trying to defend against State goals or desired properties clearly Not "impossible to break" How about "attack X on resource Y would cost $Z dollars" Structure system so it has two kinds of components: Trusted and un-trusted Build secure perimeter around trusted components Perhaps using physical, legal, or cryptographic techniques Minimize the size of the trusted components "Trusted" means "could betray you" Maybe we should have built a secure *room*, not building Analyze the resulting system, and monitor its success What really works? Avoid exposure: unplug that Internet connection. Audit: balance your check-book. Assume you will lose: be able to tolerate losses (insurance). Use legal or social mechanisms: much more mature+powerful than technical schemes. What are we going to talk about in 6.824? Cryptography and cryptographic protocols, since it's well understood. System structure, since that's what makes systems secure. What properties can cryptography help us achieve? 1. Privacy / confidentiality 2. Integrity / authenticity What low-level crypto primitives are available? There are many, we will talk about five. Our goal is to understand system designs. Not to learn to build to crypto primitives, not to understand why they're secure. Even using crypto primitives correctly is very hard. So we just want to know what properties we can assume. Symmetric-key ciphers. Examples: DES, Idea, RC4 Plain text, cipher text, key. Encrypt(P, K) -> C Decrypt(C, K) -> P Properties: If all you know is C, you cannot easily derive P or K. Typically K is kept secret, known by two communicating parties. Typically used for privacy. Typically pretty fast. Problem: key distribution; how do two parties securely share a key? Public-key ciphers. Example: RSA Two keys, typically called "public" and "private". Encrypt(M1, K1) -> M2 Decrypt(M2, K2) -> M1 Properties: If all you know is M2 and K1, you cannot derive M1. If all you know is M1 and K2, you cannot derive a matching M2. If all you know is K1, you cannot derive K2. If all you know is K2, you cannot derive K1. Two typical uses: Privacy: only I know K2, but I make K1 public. Only I can read M2; private to me. Integrity: only I know K1, but I make K2 public. Only I can produce an M2 that decrypts to M1. Typically very slow: 10s of milliseconds. Problem: key distribution; how did I securely learn your public key? Cryptographic hashes. Examples: MD5, SHA-1 Hash(X) -> Y X can be as large as you like, Y is typically 128 or 160 bits. Properties: If all you know is Y, you cannot find an X s.t. Hash(X) == Y. If you know Hash(X) = Y, you can't find X' s.t. Hash(X') = Y. Typical use: Integrity: hash from RedHat, RPM from mirror, hash checks if RPM is correct Integrity: part of a proof that I know a secret. Very fast. Message Authentication Code Or "keyed hash" Examples: HMAC Suppose we both know a key, K. If I want to send you M1 so you'll know it's from me, I send you M2 = M1 + Hash(K + M1) Certificate Suppose we want publically verifiable documents. Example: electronic MIT ID card. Registrar generates M1 = "Robert Morris works at MIT." Gives me M2 = M1 + RSAEncrypt(Hash(M1), MIT's private key) Anyone who knows MIT's public key can check this. So *I* can carry around M2 and use it to convince anyone that I work at MIT. They don't have to talk directly to MIT to verify this. No problem that data came via untrusted channel (me). MIT probably wants to include expiration date, picture, ID #, &c in M1. Nasty problems: staleness, revocation, expiration. How does this apply to shark? Shark: Scaling File Servers via Cooperative Caching Annapureddy, Freedman, Mazieres; NSDI 2005 context planetlab experiments/services 672 hosts! start the same program on all 672 hosts mount an NFS server? scp or rsync ahead of time? how long to send 40 MB to each of 672? 7.5 hours @ 1 MB/s Figure 10 says 2500 sec for 185 hosts, almost 45 minutes very similar problem to web CDNs distribute from single source to flash crowd Akamai, Coral, BitTorrent getting more important: YouTube &c why are we reading this paper? CDN ideas are the "same" as in Coral and Bittorrent But adds stronger security properties what is the attack that Shark worries about? server is trusted clients/proxies can be malicious: modify data and disclose data yet, we want to use them to build a CDN what are the properties that Shark provides? only clients that are allowed to read a file can store that file even they get the file from another client clients can verify that they received the same data as the server stores even they get the file from another client why chunks? reduces time until each client has something to serve at the limit of small chunks, full concurrency "soon" assume as many clients as chunks time to send the file once from the server, plus same amt of time for each client to serve chunk to each other so we expect 2x single file serving time! complicates security plan how does Shark provide the security properties? clients retrieve medata from server server is authenticated using public-key crypto how does the client know that it has the right server's public key? if client is allowed to have access, it receives a set of tokens: file token Tf = tok(f) = HMACr(f) what is r? vector of chunk tokens (with chunk ID=) Tc = tok(c) = HMACr(c) client computes indexing key If and Ic If = HMACTf(I) Ic = HMACTc(I) uses the keys for indexing into DHT (a la Coral) downloads data and recomputes toke for data downloaded Tc ?= HMACr(c) client/proxy interaction confidentiality: symmetric encryption key = Tc. walk through protocol in figure 3 can any proxy (even ones that are not allowed to read F) serve F chunks? why not? can any client retrieve a chunk from a legit proxy? why not? can a proxy send modified chuncks to client without being detected? why not? can a proxy register itself under Ic? yes, but Authp makes sure that the client doesn't waste bw why are all the fields in AuthPand AuthC necessary? what attacks are they protecting against? on what does Shark's security properties rely on? crypto works etc. protocol is correct and correctly implemented server is trustworthy and implemented correctly etc. what if an attacker compromises the proxy machine?