Lec 22: Case study: Tor

Past few lectures: security in the context of storage systems
Today: case study of security in a communication system

Goal: anonymous communication
  hide your identity from web site you're using
    e.g. submitting anonymous complaint about 6.824
  hide your surfing habits from dorm-mates, MIT net admins, ISP, FBI, &c
  we'd like to hide even hints about source/destination
    your IP address, browser version, same person as yesterday,
    same person as just visited some other site
  this is a hard problem!

Could use a web proxy, like CoralCDN
  near source, looks like you're connecting to Coral, not target web site
  near target web site, connection came from Coral, not from you

What attacks might succeed against anonymity via a web proxy?
  Client must tell proxy the target URL
  Sniff URL from net near client
    We could solve this using HTTPS for client<->proxy
  Sniff content between proxy and target
    Maybe see user name &c
    Maybe link successive web interactions
  Break into the proxy -- or the attacker can set up a proxy!
  Sniff near client and near server, watch timing

What attacker powers are we assuming?
  Maybe can sniff packets near source. How?
  Maybe can sniff packets near destination
  Attacker might *be* the destination
  Maybe can break into some proxies/routers
  Can't break into client

Do we think anonymous Internet communication is possible?
  Why?
  What fundamental ideas/properties can we exploit?

Should anonymous Internet communication be allowed?
  Would be great for spammers, hackers

Idea: mixnet
  chain of proxy-like mixes
  layers of encryption so each mix only knows prev+next mixes
  thus: no one mix knows both src and dst
  old idea: 1980s Chaum Mixes, e-mail schemes, various predecessors to Tor

Simple mixnet scheme
  there are lots of mixes
  each mix has a well-known public key
  client wants to send data to DST
  client picks a sequence of e.g. 3 mixes: R1, R2, R3
  M3 = {DST, data}R3
  M2 = {R3, M3}R2
  M1 = {R2, M2}R1
  client sends M1 to R1
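
The nesting above can be sketched as a structural model. This is not real
cryptography: Sealed is a hypothetical stand-in for {..}R that only records
which mix is allowed to open a layer. build_onion, route, seal and unseal are
names made up for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sealed:
    """Stand-in for {payload}R: only the mix named `to` may open it (no real crypto)."""
    to: str
    payload: tuple

def seal(to, *payload):
    return Sealed(to, payload)

def unseal(me, box):
    if box.to != me:
        raise PermissionError(f"{me} cannot open a box sealed for {box.to}")
    return box.payload

def build_onion(path, dst, data):
    """M3 = {DST, data}R3, then wrap (next hop, inner onion) for each earlier mix."""
    onion = seal(path[-1], dst, data)
    for i in range(len(path) - 2, -1, -1):
        onion = seal(path[i], path[i + 1], onion)
    return onion

def route(first_hop, onion):
    """Simulate forwarding; record what each mix learns: (mix, next hop)."""
    learned = []
    hop = first_hop
    while isinstance(onion, Sealed):
        nxt, onion = unseal(hop, onion)
        learned.append((hop, nxt))
        hop = nxt
    return learned, onion

m1 = build_onion(["R1", "R2", "R3"], "DST", "hello")
learned, delivered = route("R1", m1)
print(learned, delivered)
```

Each mix sees only its successor: R1 learns R2, R2 learns R3, and only R3
learns DST and the data.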

What about replies from DST back to client?
  can DST use the mixnet in the same way?
  no -- would have to know client's IP address
    we want client to be anonymous to DST
  client prepares a return address, includes in the data
    R3, {R2, {R1, {client}R1}R2}R3

What if adversary observes timing of client->R1 and R3->dst?
  Or can modify the timing by delaying packets.
  Is there any way to defeat this attack?

What if adversary sees 700-byte msg go in, 700-byte msg later go out?
  Is there any way to defeat this attack?
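
One partial defense against size matching is to carry all traffic in
fixed-size cells (Tor's design uses 512-byte cells). A minimal sketch follows.
The 2-byte length prefix and zero padding are assumptions made for this
example, not Tor's actual cell format.

```python
CELL = 512  # fixed cell size in bytes

def to_cells(msg: bytes, cell: int = CELL):
    """Split msg into equal-size cells: 2-byte length prefix, payload, zero padding."""
    body = cell - 2
    chunks = [msg[i:i + body] for i in range(0, len(msg), body)] or [b""]
    return [len(c).to_bytes(2, "big") + c.ljust(body, b"\0") for c in chunks]

def from_cells(cells):
    """Strip the padding using each cell's length prefix and rejoin the payload."""
    return b"".join(c[2:2 + int.from_bytes(c[:2], "big")] for c in cells)

msg = b"x" * 700
cells = to_cells(msg)
print(len(cells), {len(c) for c in cells})
```

An observer no longer sees "700 bytes", but can still count cells, so this
hides exact sizes rather than traffic volume.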

Cover traffic and mixing may be too expensive for low-latency system
  Is it worth proceeding at all if we don't have cover and mixing?
  I.e. how likely is a true "global passive adversary" who can see all links?
  What can be done to reduce likelihood of global passive adversary?

What if adversary owns one of the mixes?

What if adversary owns e.g. 3 of 10 mixes?
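
A rough way to size up the 3-of-10 case, assuming (this is an assumption, not
something the scheme requires) that the client picks 3 distinct mixes
uniformly at random. The adversary links src and dst when it owns both the
first and last mix.

```python
from fractions import Fraction

def p_all_bad(total, bad, k):
    """Probability that k mixes drawn uniformly without replacement are all adversarial."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(bad - i, total - i)
    return p

# First and last hop both adversarial (the middle hop can be anything):
p_ends = p_all_bad(10, 3, 2)
# Entire 3-hop path adversarial:
p_path = p_all_bad(10, 3, 3)
print(p_ends, p_path)  # prints: 1/15 1/120
```

So under these assumptions roughly 1 in 15 paths exposes both ends, which adds
up quickly for a client that builds many paths.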

What if adversary can DoS some/all of the good mixes?

What if adversary records traffic, later steals some private keys?

How can client reliably get a list of mixes?

Why doesn't Tor use this mixnet scheme?
  Latency is bad: public-key cryptography is slow
  Directory of routers

Tor: the Second-Generation Onion Router
Dingledine, Mathewson, and Syverson

goals:
  low-latency anonymous TCP connections
  easy for users to use -- they want many users!

basic plan:
  client app, client OP, OR1, OR2, OR3, destination
    TCP from app to OP
    TCP from OR3 to destination
    probably supports any app that uses TCP
  client sets up circuits through OR network
    client sets up shared symmetric key w/ each OR
    so client knows K1, K2, K3
  client sends to OR1:
    circID, {{{data}K3}K2}K1
    this is cheap symmetric cryptography, ensures privacy
    each hop has table mapping circID to key

how does OR3 send data back towards client?
  OR3 sends circID,{reply}K3 to OR2
  OR2 sends circID,{{reply}K3}K2 to OR1
  OR1 sends circID,{{{reply}K3}K2}K1 to client
  client knows K1, K2, and K3
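
The forward and reply layering can be sketched with a toy stream cipher.
keystream_xor is a hypothetical helper built from SHA-256 for illustration
only. It is not secure and it is not the cipher Tor uses. Because XOR is its
own inverse, the same function adds and removes a layer.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || offset) blocks. Not for real use."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

# Session keys the client shares with OR1, OR2, OR3 (made-up values).
K1, K2, K3 = b"key-with-OR1", b"key-with-OR2", b"key-with-OR3"

# Forward: client builds {{{data}K3}K2}K1; each OR strips one layer.
data = b"GET / HTTP/1.0"
cell = keystream_xor(K1, keystream_xor(K2, keystream_xor(K3, data)))
at_or2 = keystream_xor(K1, cell)        # OR1 removes its layer
at_or3 = keystream_xor(K2, at_or2)      # OR2 removes its layer
exit_data = keystream_xor(K3, at_or3)   # OR3 recovers the plaintext

# Backward: each OR adds one layer; the client strips all three.
# (A real design would use separate keys or counters per direction.)
reply = b"HTTP/1.0 200 OK"
from_or3 = keystream_xor(K3, reply)
from_or2 = keystream_xor(K2, from_or3)
from_or1 = keystream_xor(K1, from_or2)
client_sees = keystream_xor(K3, keystream_xor(K2, keystream_xor(K1, from_or1)))
print(exit_data, client_sees)
```

Only the client holds all three keys, so only the client can read a reply that
has passed through all three ORs.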

key setup?
  how can client agree w/ ORx about a "session" key?
  without other ORs/sniffers also seeing?
  one possibility:
    client creates key, encrypts w/ ORx's public key, sends over circuit
    one reason this scheme is bad is that it doesn't provide "forward secrecy"
  Tor uses Diffie-Hellman key exchange
    client and ORx each create half a key
    each sends a function of that key half that is hard to invert
    each can compute key using its half and the info it received from the other
    someone with neither half can't figure out the key
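
A minimal Diffie-Hellman sketch of the exchange described above. The group
parameters P and G are toy values chosen for this example (a real deployment
would use a standardized, much larger group), and dh_keypair / dh_shared are
hypothetical helper names.

```python
import hashlib
import secrets

P = 2**127 - 1   # a Mersenne prime; toy-sized, for illustration only
G = 3

def dh_keypair():
    """Return (private half x, public value g^x mod p)."""
    x = secrets.randbelow(P - 3) + 2   # private half in [2, P-2]
    return x, pow(G, x, P)

def dh_shared(my_private, their_public):
    """Combine my private half with the other side's public value, then hash to a key."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

x, gx = dh_keypair()   # client's half
y, gy = dh_keypair()   # ORx's half
client_key = dh_shared(x, gy)
or_key = dh_shared(y, gx)
print(client_key == or_key)
```

If both sides discard x and y when the circuit closes, a later theft of ORx's
long-term private key does not reveal this session key, which is the forward
secrecy property mentioned above.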

how to prevent each OR from learning more than the next hop?
  client sets up one hop to OR1
  client sets up OR1->OR2 through partial circuit to OR1
    setup pkts are encrypted w/ client/OR1 session key
    so sniffer can't see "OR2"
  client sets up OR2->OR3 through partial circuit
    OR1 can't see "OR3" since protected w/ client/OR2 session key

What if OR1 sends client's setup msg to OR7, which attacker controls?
  Will client realize it's OR7, not OR2?
  Yes: client encrypts its DH info with OR2's public key.
    And OR2 must send back a cryptographic hash of the DH key.
  
could circID be used to match up hops?
  each OR changes circID, has a mapping table
  inter-OR links are encrypted with TLS
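
A sketch of the per-hop circID mapping table. Relay, extend and forward are
hypothetical names, and the counter-based ID allocation is an assumption made
for illustration. The point is only that a circID is local to one link, so the
IDs seen on different links don't match.

```python
import itertools

class Relay:
    """Toy OR: maps (incoming link, circID) to (outgoing link, new circID)."""
    def __init__(self, name, first_id):
        self.name = name
        self.table = {}
        self._ids = itertools.count(first_id)

    def extend(self, in_link, in_circ, out_link):
        """Allocate a fresh circID for the outgoing link and remember the mapping."""
        out_circ = next(self._ids)
        self.table[(in_link, in_circ)] = (out_link, out_circ)
        return out_circ

    def forward(self, in_link, in_circ):
        """Look up where a cell arriving on (in_link, in_circ) should go next."""
        return self.table[(in_link, in_circ)]

or1 = Relay("OR1", first_id=40)
or2 = Relay("OR2", first_id=7)
c1 = 5                                   # circID on the client->OR1 link
c2 = or1.extend("client", c1, "OR2")     # circID on the OR1->OR2 link
c3 = or2.extend("OR1", c2, "OR3")        # circID on the OR2->OR3 link
print(c1, c2, c3)
```

A sniffer comparing circIDs on two different links sees unrelated numbers, and
with TLS between ORs it can't see the circIDs at all.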

Can a client safely re-use the same circuit for many TCP connections?

Can a corrupt OR modify the client's data?

What does the client need to know about the set of ORs in order to set up a circuit?
  Why does the client need to know each OR's public key?

How does a client learn the set of ORs?
  Set of directory servers
  Each OR periodically reports its public key to each directory server
  Client comes pre-configured with directory server IP addrs and public keys

Is it important that the directory servers agree on the set of ORs?

How should a directory server decide whether to accept an OR?

What if an attacker controls a directory server?

What to do about clients that do DNS lookups?
  Why is this a danger?

Review: what attacks does Tor do well on?
  Sniffing at just one point
  Controlling just one OR
  Theft of a few OR private keys (forward secrecy)

Review: what attacks does Tor do badly on?
  Sniffing everywhere to find output timing that's same as input
    No cover traffic, no real delay+mixing
    Maybe you don't have to literally sniff, maybe you can "ping" ORs.
  Confirming a src/dst guess using timing at those two places
  Someone owning lots of routers or directory servers
    Or tricking directory servers into accepting bogus ORs
  Multiple TCP connections per circuit might have been a bad idea
  Application doing something silly (HTTP headers, DNS lookups, BitTorrent)
  DoS lots of good routers

What about legal/social attacks?
  Fetch lots of copyrighted movies via Tor to get it shut down
  Pass laws requiring back-doors for law enforcement

Retrospective
  Tor is very popular
  1000s of onion routers run by volunteers all over the world
  Reasonably secure
    There are known successful attacks
    By observing detailed timing of packets
  Most surprising: has not been shut down via non-technical attacks

http://en.wikipedia.org/wiki/Tor_(anonymity_network)
http://www.cl.cam.ac.uk/users/sjm217/papers/oakland05torta.pdf