Lec 22: Case study: Tor

Past few lectures: security in the context of storage systems
Today: case study of security in a communication system

Goal: anonymous communication
  hide your identity from web site you're using
    e.g. submitting anonymous complaint about 6.824
  hide your surfing habits from dorm-mates, MIT net admins, ISP, FBI, &c
  we'd like to hide even hints about source/destination
    your IP address, browser version,
    same person as yesterday,
    same person as just visited some other site
  this is a hard problem!

Could use a web proxy, like CoralCDN
  near source, looks like you're connecting to Coral, not target web site
  near target web site, connection came from Coral, not from you

What attacks might succeed against anonymity via a web proxy?
  Client must tell proxy the target URL
  Sniff URL from net near client
    We could solve this using HTTPS for client<->proxy
  Sniff content between proxy and target
    Maybe see user name &c
    Maybe link successive web interactions
  Break into the proxy -- or the attacker can set up a proxy!
  Sniff near client and near server, watch timing

What attacker powers are we assuming?
  Maybe can sniff packets near source. How?
  Maybe can sniff packets near destination
  Attacker might *be* the destination
  Maybe can break into some proxies/routers
  Can't break into client

Do we think anonymous Internet communication is possible?
  Why? What fundamental ideas/properties can we exploit?

Should anonymous Internet communication be allowed?
  Would be great for spammers, hackers

Idea: mixnet
  chain of proxy-like mixes
  layers of encryption so each mix only knows prev+next mixes
  thus: no one mix knows both src and dst
  old idea: 1980s
    Chaum Mixes, e-mail schemes, various predecessors to Tor

Simple mixnet scheme
  there are lots of mixes
  each mix has a well-known public key
  client wants to send data to DST
  client picks a sequence of e.g. 3 mixes: R1, R2, R3
    ({X}R means X encrypted with R's public key)
    M3 = {DST, data}R3
    M2 = {R3, M3}R2
    M1 = {R2, M2}R1
  client sends M1 to R1

What about replies from DST back to client?
  can DST use mix net in the same way?
    no -- would have to know client's IP address
    we want client to be anonymous to DST
  client prepares a return address, includes in the data
    R3, {R2, {R1, {client}R1}R2}R3

What if adversary observes timing of client->R1 and R3->dst?
  Or can modify the timing by delaying packets.
  Is there any way to defeat this attack?

What if adversary sees 700-byte msg go in, 700-byte msg later go out?
  Is there any way to defeat this attack?

Cover traffic and mixing may be too expensive for low-latency system
  Is it worth proceeding at all if we don't have cover and mixing?
  I.e. how likely is a true "global passive adversary" who can see all links?
  What can be done to reduce likelihood of global passive adversary?

What if adversary owns one of the mixes?
What if adversary owns e.g. 3 of 10 mixes?
What if adversary can DoS some/all of the good mixes?
What if adversary records traffic, later steals some private keys?
How can client reliably get a list of mixes?

Why doesn't Tor use this mixnet scheme?
  Latency is bad: public-key cryptography is slow
  Directory of routers

Tor: the Second-Generation Onion Router
  Dingledine, Mathewson, and Syverson

goals:
  low-latency anonymous TCP connections
  easy for users to use -- they want many users!

basic plan:
  client app, client OP, OR1, OR2, OR3, destination
  TCP from app to OP
  TCP from OR3 to destination
  probably supports any app that uses TCP

client sets up circuits through OR network
  client sets up shared symmetric key w/ each OR
    so client knows K1, K2, K3
  client sends to OR1: circID, {{{data}K3}K2}K1
    this is cheap symmetric cryptography, ensures privacy
  each hop has table mapping circID to key
  how does OR3 send data back towards client?
    OR3 sends circID,{reply}K3 to OR2
    OR2 sends circID,{{reply}K3}K2 to OR1
    OR1 sends circID,{{{reply}K3}K2}K1 to client
    client knows K1, K2, and K3

key setup?
  how can client agree w/ ORx about a "session" key?
    without other ORs/sniffers also seeing?
  one possibility:
    client creates key, encrypts w/ ORx's public key, sends over circuit
    one reason this scheme is bad is that it doesn't provide "forward secrecy"
  Tor uses Diffie-Hellman key exchange
    client and ORx each create half a key
    each sends a function of that key half that is hard to invert
    each can compute key using its half and the info it received from the other
    someone with neither half can't figure out the key

how to prevent each OR from learning more than the next hop?
  client sets up one hop to OR1
  client sets up OR1->OR2 through partial circuit to OR1
    setup pkts are encrypted w/ client/OR1 session key
    so sniffer can't see "OR2"
  client sets up OR2->OR3 through partial circuit
    OR1 can't see "OR3" since protected w/ client/OR2 session key

What if OR1 sends client's setup msg to OR7, which attacker controls?
  Will client realize it's OR7, not OR2?
  Yes: client encrypts its DH info with OR2's public key.
  And OR2 must send back a cryptographic hash of the DH key.

could circID be used to match up hops?
  each OR changes circID, has a mapping table
  inter-OR links are encrypted with TLS

Can a client safely re-use the same circuit for many TCP connections?

Can a corrupt OR modify the client's data?

What does the client need to know about the set of ORs in order to set up a circuit?
  Why does the client need to know each OR's public key?

How does a client learn the set of ORs?
  Set of directory servers
  Each OR periodically reports its public key to each directory server
  Client comes pre-configured with directory server IP addrs and public keys

Is it important that the directory servers agree on the set of ORs?
How should a directory server decide whether to accept an OR?
What if an attacker controls a directory server?
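
The per-hop key setup and layered circuit encryption described above can be sketched roughly as follows. This is a toy illustration and not Tor's actual protocol. The modulus P, generator G, the SHA-256-based XOR keystream, and all function names are assumptions made up for this sketch. It also leaves out the authentication step (client encrypting its DH info with the OR's public key, OR returning a hash of the key), circIDs, and TLS.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption for illustration; far too small to be secure).
P = 2**127 - 1   # a Mersenne prime
G = 3

def dh_keypair():
    """Create one 'half' of a key: private x and the hard-to-invert public value G^x mod P."""
    x = secrets.randbelow(P - 3) + 2
    return x, pow(G, x, P)

def dh_session_key(my_private, their_public):
    """Each side combines its own half with the value received from the other side."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def keystream(key, n):
    """Toy keystream: SHA-256 over key || counter (stand-in for a real stream cipher)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def crypt(key, data):
    """XOR with the keystream; the same call adds or removes one layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def setup_circuit(n_hops=3):
    """Client does one DH exchange per OR; returns (client's keys, each OR's key)."""
    client_keys, relay_keys = [], []
    for _ in range(n_hops):
        cx, cpub = dh_keypair()   # client's half
        rx, rpub = dh_keypair()   # OR's half
        client_keys.append(dh_session_key(cx, rpub))
        relay_keys.append(dh_session_key(rx, cpub))
    return client_keys, relay_keys

def client_wrap(client_keys, data):
    """Build {{{data}K3}K2}K1: innermost layer is K3, outermost is K1."""
    for k in reversed(client_keys):
        data = crypt(k, data)
    return data

def forward(relay_keys, cell):
    """OR1, OR2, OR3 each remove one layer on the way to the destination."""
    for k in relay_keys:
        cell = crypt(k, cell)
    return cell

def backward(relay_keys, reply):
    """OR3, OR2, OR1 each add one layer on the way back: {{{reply}K3}K2}K1."""
    for k in reversed(relay_keys):
        reply = crypt(k, reply)
    return reply

def client_unwrap(client_keys, cell):
    """Client knows K1, K2, K3 and removes all three layers."""
    for k in client_keys:
        cell = crypt(k, cell)
    return cell

client_keys, relay_keys = setup_circuit()
request = b"GET / HTTP/1.0"
cell = client_wrap(client_keys, request)
delivered = forward(relay_keys, cell)            # what OR3 hands to the destination
returned = client_unwrap(client_keys, backward(relay_keys, b"200 OK"))
print(delivered, returned)
```

Because the toy cipher is a plain XOR, the layers commute. A real design would use a proper cipher and integrity checks, so this only shows who holds which key and in what order layers are added and removed.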

What to do about clients that do DNS lookups?
  Why is this a danger?

Review: what attacks does Tor do well on?
  Sniffing at just one point
  Controlling just one OR
  Theft of a few OR private keys (forward secrecy)

Review: what attacks does Tor do badly on?
  Sniffing everywhere to find output timing that's same as input
    No cover traffic, no real delay+mixing
    Maybe you don't have to literally sniff, maybe you can "ping" ORs.
  Confirming a src/dst guess using timing at those two places
  Someone owning lots of routers or directory servers
    Or tricking directory servers into accepting bogus ORs
  Multiple TCP connections per circuit might have been a bad idea
  Application doing something silly (HTTP headers, DNS lookups, BitTorrent)
  DoS lots of good routers

What about legal/social attacks?
  Fetch lots of copyrighted movies via Tor to get it shut down
  Pass laws requiring back-doors for law enforcement

Retrospective
  Tor is very popular
    1000s of onion routers run by volunteers all over the world
  Reasonably secure
    There are known successful attacks
      By observing detailed timing of packets
  Most surprising: has not been shut down via non-technical attacks

http://en.wikipedia.org/wiki/Tor_(anonymity_network)
http://www.cl.cam.ac.uk/users/sjm217/papers/oakland05torta.pdf
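
The timing attack from the review ("find output timing that's same as input") can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the traffic is synthetic, the flow names, 1-second buckets, and 50 ms circuit latency are invented for the example, and real attacks in the literature are considerably more careful.

```python
import random

def bucket_counts(times, n_buckets, width=1.0):
    """Count packets per time window (window width in seconds)."""
    counts = [0] * n_buckets
    for t in times:
        i = int(t / width)
        if 0 <= i < n_buckets:
            counts[i] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length count vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_match(entry_times, exit_flows, n_buckets=60):
    """Guess which exit-side flow belongs to the client seen at the entry side."""
    entry = bucket_counts(entry_times, n_buckets)
    scores = {name: pearson(entry, bucket_counts(ts, n_buckets))
              for name, ts in exit_flows.items()}
    return max(scores, key=scores.get), scores

rng = random.Random(0)

# Synthetic client traffic near the source: 10 one-second bursts of 20 packets each.
burst_seconds = rng.sample(range(60), 10)
entry_times = [s + rng.random() for s in burst_seconds for _ in range(20)]

# Flows seen near the destination. flow_b is the client's traffic after a small,
# nearly constant delay (no cover traffic, no mixing); the others are unrelated.
exit_flows = {
    "flow_a": [rng.uniform(0, 60) for _ in range(200)],
    "flow_b": [t + 0.05 + rng.uniform(0, 0.01) for t in entry_times],
    "flow_c": [rng.uniform(0, 60) for _ in range(200)],
}

guess, scores = best_match(entry_times, exit_flows)
print(guess, scores)
```

The point of the sketch is that encryption alone does not hide the burst pattern: without cover traffic or delays, the shape of the traffic survives the trip through the circuit.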