6.824 Lecture 18: Anonymous e-mail with mix-nets

- The last two lectures discussed how to prove identity.  Today we talk
  about how to keep the identity of the sender confidential.  This
  problem appears difficult, since on today's Internet we can in
  principle trace a transmission to its origin.

- Why hide your identity?
   - privacy
   - to do something bad

Let's design a naive e-mail anonymizing proxy.
  Input:
    To: input@anon.net
    From: rtm@mit.edu
    Really-To: 6.824-staff@mit.edu
  Output:
    From: xxx
    To: 6.824-staff@mit.edu
  anonymizes source IP address and some mail headers
  but in what ways does it not provide anonymity?
  intended recipient may see identifying info in:
    other mail headers, and the body of the msg
    note we probably don't want even the intended recipient to know who we are.
  similarly for snoopers on the output side
  snooper on the input side also sees your IP address.
  snooper on both can link in and out messages at proxy.
    by timing. or by length.
  snooper on output side can see sequence of e-mails.
    perhaps can link them together as all from you.
    so revealing id in one breaks all of them.
  legal action against proxy, or theft of backup tapes, may reveal:
    logs linking input and output.
  Equivalently, perhaps the anonymizer is malicious!
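
  A minimal Python sketch of the rewriting step above, assuming the
  proxy handles mail as email.message.EmailMessage objects.  The name
  naive_anonymize and the sample body text are made up for
  illustration.  It shows why dropping headers is not enough: the body
  passes through untouched, so identifying text and the message length
  both survive.

```python
from email.message import EmailMessage

def naive_anonymize(msg: EmailMessage) -> EmailMessage:
    """Build the output message: keep only the real recipient and the body."""
    out = EmailMessage()
    out["From"] = "xxx"                    # placeholder sender, as in the notes
    out["To"] = str(msg["Really-To"])      # real recipient taken from the input
    out.set_content(msg.get_content())     # body is copied unchanged
    return out

# Input message from the example above (the body is a hypothetical sample).
incoming = EmailMessage()
incoming["To"] = "input@anon.net"
incoming["From"] = "rtm@mit.edu"
incoming["Really-To"] = "6.824-staff@mit.edu"
incoming.set_content("This is rtm; please extend the lab deadline.\n")

outgoing = naive_anonymize(incoming)

# The sender header is gone, but the body still names the sender, and a
# snooper on both sides can match input and output by length.
body_leaks_identity = "rtm" in outgoing.get_content()
same_length = len(outgoing.get_content()) == len(incoming.get_content())
```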

- One approach: mix-nets [Chaum 1981]

  - A message with data is forwarded through a series of independently
    operated nodes (called mixes), 1 through n.  Let's say s is the source
    and d is the destination.  The sender picks mixes 1 through n from a
    large collection of nodes.

  - Each message contains a set of instructions and data.  The
    instructions tell the mix which mix to forward the message to
    and which key to use to encrypt the remaining instructions and
    data.

  - Each mix has a public/private key pair (Kipub, Kipriv)

  - The instructions for each mix are encrypted with Kipub---thus only
    mix i can read them.

  - Assumption: Mixes don't collude.  In practice, pick mixes in
    different places in the world, run by different administrators
    under different laws.

  - The idea is to recursively encrypt instructions with Kipub:

      s encrypts with Knpub "Please forward message to d" = C.
      s encrypts with Kn-1pub "Please forward message to node n and
          encrypt with Knpub; C"
      etc.

      Thus, message to mix 1 is E(K1pub, "Please forward message to
      mix 2 encrypted with K2pub" + E(K2pub, "Please forward message to
      mix 3" + ....))

      Mix 1 decrypts with K1priv and finds "Please forward message to
      mix 2 encrypted with K2pub + ...".  Mix 1 cannot read "...",
      since it is encrypted with K2pub.  It encrypts the "..." as
      instructed, adds a new header that contains the address of
      mix 2, and outputs the resulting message.

  - To send message M from A -> M1 -> M2 -> B, A prepares this message:

    {M2,{B,{M}PubB}PubM2}PubM1
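
    A rough Python sketch of how A could build and how the mixes could
    peel {M2,{B,{M}PubB}PubM2}PubM1.  Everything here is an assumption
    made for illustration: toy_encrypt is a symmetric SHA-256 keystream
    XOR that merely stands in for public-key encryption (so KEYS[name]
    plays the role of both Kpub and Kpriv) and is NOT secure, and the
    JSON layer layout is invented.

```python
import hashlib
import json
import os

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh nonce, so equal plaintexts encrypt differently
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

# Hypothetical key table: one key per party, standing in for a key pair.
KEYS = {name: os.urandom(32) for name in ("M1", "M2", "B")}

def build_onion(message: bytes, route: list) -> bytes:
    """Wrap message for route, e.g. ["M1", "M2", "B"], innermost layer first."""
    onion = toy_encrypt(KEYS[route[-1]], message)             # {M}PubB
    for hop, nxt in zip(reversed(route[:-1]), reversed(route[1:])):
        layer = json.dumps({"next": nxt, "payload": onion.hex()}).encode()
        onion = toy_encrypt(KEYS[hop], layer)                 # {nxt, ...}PubHop
    return onion

def mix_process(name: str, onion: bytes):
    """A mix removes its own layer and learns only the next hop."""
    layer = json.loads(toy_decrypt(KEYS[name], onion))
    return layer["next"], bytes.fromhex(layer["payload"])

onion = build_onion(b"hello", ["M1", "M2", "B"])
hop1, inner1 = mix_process("M1", onion)    # M1 learns "M2", nothing else
hop2, inner2 = mix_process("M2", inner1)   # M2 learns "B", nothing else
plaintext = toy_decrypt(KEYS["B"], inner2) # only B recovers M
```

    Note that each layer here shrinks as it is peeled, which is exactly
    the size leak mentioned under "Why does this work?" below.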

  - Why does this work?
    1. Messages are encrypted, simple snooping doesn't work.
    2. Message looks different at each hop, can't easily match
       up input with output. (Well, timing and size...)
    3. Some mixes can be bad; only a problem if most/all are bad.
    4. What if first mix is bad? Knows sender, but not recipient
       or content.
    5. What if last mix is bad? Knows recipient, but not sender
       or content.
    It's a problem if most/all of the mixes collude.

  - The main purpose of a mix is to hide the correspondence between
    its sealed input and its unsealed output.  To make the above
    scheme better we want:
       - lots of messages from/to different people.
       - process a batch of input messages at the same time and
         reorder them.
       - make each message really different.  the mix could add a
         random number and encrypt it with the rest of the instruction
         and data.
       - messages should have a fixed length.  the sender should
         fragment data if it doesn't fit in a single message.
       - intermix cover traffic among regular traffic
       - remove duplicate messages
       - protect against abuse (e.g., hashcash)
       - encrypt data with a symmetric key each time it passes through
         a mix.
       - etc
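
    A few of the items above (fixed-length messages with
    fragmentation, batching with reordering, duplicate removal) can be
    sketched in Python as follows.  MSG_LEN, fragment, unpad, and
    BatchingMix are hypothetical names and parameters chosen for this
    sketch; a real mix would also decrypt each message and handle
    cover traffic, which is omitted here.

```python
import hashlib
import os
import random

MSG_LEN = 64  # assumed fixed message size in bytes (1 length byte + 63 data)

def fragment(data: bytes) -> list:
    """Sender side: split data into fixed-length messages, padded with noise."""
    chunk = MSG_LEN - 1
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]
    return [bytes([len(p)]) + p + os.urandom(chunk - len(p)) for p in pieces]

def unpad(msg: bytes) -> bytes:
    """Receiver side: the first byte says how many data bytes follow."""
    return msg[1:1 + msg[0]]

class BatchingMix:
    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending = []
        self.seen = set()  # digests of every message accepted so far

    def receive(self, msg: bytes) -> list:
        """Queue msg; return a shuffled batch once enough arrived, else []."""
        digest = hashlib.sha256(msg).digest()
        if len(msg) != MSG_LEN or digest in self.seen:
            return []  # drop wrong-sized and duplicate (replayed) messages
        self.seen.add(digest)
        self.pending.append(msg)
        if len(self.pending) < self.batch_size:
            return []
        batch, self.pending = self.pending, []
        random.SystemRandom().shuffle(batch)  # output order hides arrival order
        return batch

# Usage: nothing leaves the mix until a full batch has been collected.
demo_mix = BatchingMix(batch_size=2)
first = demo_mix.receive(fragment(b"from alice")[0])   # [] (still waiting)
second = demo_mix.receive(fragment(b"from bob")[0])    # both, in random order
```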

  - You can also build "reply" blocks, to allow replies to be sent
    without the replier knowing the identity of the final target.
    For A to allow B to reply, A would prepare:

    {M2,K3,{M1,K2,{A,K1}PubM1}PubM2}PubB

    K1, K2, K3 can be symmetric keys, just used this once,
      made up randomly by A.
    B encrypts message with K3, sends to M2.
    M2 encrypts with K2, sends to M1.
    M1 encrypts with K1, sends to A.
    A receives {{{Reply}K3}K2}K1, knows all keys.

    [Is this what nym.alias.net does?]

    Why the nested symmetric key encryption? Why not have B use
    public key encryption with A's public key?
      Want msg to look different at each hop.
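
    The reply path can be traced with a small Python sketch.  As
    assumptions for illustration only: toy_encrypt is an insecure
    SHA-256 keystream XOR used both for K1..K3 and as a stand-in for
    the public-key layers (PUB[name] plays both halves of a key pair),
    and the JSON block layout and the names seal and open_block are
    invented.

```python
import hashlib
import json
import os

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

# Hypothetical key table standing in for the key pairs of B, M2, and M1.
PUB = {name: os.urandom(32) for name in ("B", "M2", "M1")}

def seal(owner: str, next_hop: str, key: bytes, inner: bytes) -> bytes:
    """Build {next_hop, key, inner}PubOwner."""
    layer = json.dumps({"next": next_hop, "key": key.hex(), "inner": inner.hex()})
    return toy_encrypt(PUB[owner], layer.encode())

def open_block(owner: str, block: bytes):
    layer = json.loads(toy_decrypt(PUB[owner], block))
    return layer["next"], bytes.fromhex(layer["key"]), bytes.fromhex(layer["inner"])

# A makes up K1, K2, K3 and builds {M2,K3,{M1,K2,{A,K1}PubM1}PubM2}PubB.
k1, k2, k3 = (os.urandom(32) for _ in range(3))
block = seal("B", "M2", k3, seal("M2", "M1", k2, seal("M1", "A", k1, b"")))

# B, then M2, then M1 each open one layer, add one encryption, and forward.
body = b"Reply"
hops = []
for party in ("B", "M2", "M1"):
    next_hop, key, block = open_block(party, block)
    body = toy_encrypt(key, body)
    hops.append(next_hop)

# A receives {{{Reply}K3}K2}K1 and strips the layers outermost first.
reply = toy_decrypt(k3, toy_decrypt(k2, toy_decrypt(k1, body)))
```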

  - Implementation
      - special fields in email headers. for example, type-1
        remailer:
          Anon-To: ...
          Latent-Time:
          Encrypt-key:
        followed by a marker line

- Other approaches:
      - Onion routing (for IP routing)
      - Crowds (for web browsing)