Lec 21: Spamalytics: an empirical analysis of spam marketing conversion
Kanich et al., CCS (Oct 2008)

Economics of security
- Understanding economics helps us understand the adversary's motivation
    what attacks are economically viable for the adversary?
    what attacks won't the adversary launch?
    how to counter the adversary's revenue stream?
- An emerging research area
- This paper is a nice example

Four parts to the paper
- Understanding the Storm botnet
- Infiltrating the Storm botnet
- Setting up a SPAM business
- Analysis of the SPAM business

Understanding the Storm botnet
- Bot master, hosted in a "bullet-proof" hosting center
- Proxy bots, conduits between workers and the bot master
    globally accessible (not behind a firewall)
    why have a conduit?
- Workers, compromised machines under control of the bot master
    how do spammers compromise machines? (postcard site)
    how many? (75,869 connected to the authors' proxies)

Organization of Storm
- peer-to-peer. why?
    Overnet, based on the Kademlia DHT, to find other nodes
    connect, search, publicize, and publish
    keys are rendezvous keys for the C&C service (see the key sketch after these notes)
      k = time (one per hour)
      v = IP address + port
- custom TCP-based protocol for C&C (command and control)
    bot sends through a proxy to an associated master process
    master responds through the proxy with a spam workload task:
      one or more spam templates
      a delivery list of email addresses
      a set of named dictionaries

Infiltrating Storm
- run Storm malware on globally reachable machines
- run 8 proxy bots inside VMs
- traffic to the 8 proxies goes through a centralized gateway
    parses C&C messages and rewrites them
      replace intended site links in spam with links to the authors' sites
        careful with links (made to look like real spam links)
        add an identifier to the end of the URL (makes the authors' spam identifiable)
      append addresses for many author-created email accounts (SMTP sinks) to the delivery lists
        remove them from the bots' completion reports

Setting up a SPAM business
- postcard site to propagate Storm malware (but modified)
    download software to view the postcard
    benign executable (posts "data = 1")
- pharmacy site (with a fake shopping cart)
- filter out "crawlers" (see the filter sketch after these notes)
    URLs without an ID
    blacklist hosts that access robots.txt
    blacklist hosts that disable javascript
    blacklist hosts with more than one unique ID and the same user-agent field
    inject new IP addresses (for anti-malware researchers)
- ethics: "strictly reduce harm"
- Campaign data sets (table 1, figs 4 and 5, table 2)

Results
- SPAM conversion pipeline (fig 6 and table 3)
    final conversion rate: 28 purchases out of ~350M emails
    what can we conclude from the 28 about revenue? (see the back-of-the-envelope calculation after these notes)
      average purchase was ~$100
      estimate revenue using all Storm nodes: ~$7,000 for the campaign? ~$3.5M per year?
      repeat business?
      maybe not much money in spam?
      number of new bots per day: between 3,500 and 8,000
- time to click (fig 7)
- effects of blacklisting (fig 8)
- geographic location of hosts that convert (fig 9)
- who converts (figs 10 and 11)
- conversion rates

Discussion
- did the authors deliver on the goal of the paper?
    their model suggests that it is unprofitable to send spam
    some reasons why the model might be incorrect
- did the authors intercept legitimate purchase attempts by interested customers?
- did the authors make life hard for anti-spammers?
- have the authors made it hard for others to do similar studies?
    will spammers fix the server-in-the-middle attack?
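
Aside: a minimal Python sketch of the hourly rendezvous-key idea above. The MD5
hashing and the dht.publish/dht.search calls are placeholders for illustration,
not Storm's or Overnet's actual encoding.

    import hashlib
    import time

    def rendezvous_key(t=None):
        # One key per hour: hash the current hour so that every bot that knows
        # the scheme computes the same 128-bit DHT key.
        hour = int((t if t is not None else time.time()) // 3600)
        return hashlib.md5(str(hour).encode()).digest()   # 16-byte key

    def publicize_proxy(dht, ip, port):
        # A proxy bot publishes where the C&C service can be reached.
        dht.publish(key=rendezvous_key(), value=(ip, port))   # hypothetical DHT API

    def find_proxy(dht):
        # A worker bot searches the same key to find a proxy to talk to.
        return dht.search(key=rendezvous_key())               # -> (ip, port) or None

Since keys change every hour, a worker only needs the shared key-derivation rule
(plus some bootstrap peers) to keep rediscovering the C&C service.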
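Aside: a sketch of the crawler-filtering heuristics listed above, applied to one
incoming web request. The function signature, the blacklist, and the bookkeeping
structures are hypothetical; the paper describes the rules, not this code.

    from collections import defaultdict

    blacklist = set()                    # hosts judged to be crawlers/honeyclients
    ids_seen = defaultdict(set)          # (host, user_agent) -> set of spam-URL IDs

    def is_crawler(host, path, user_agent, url_id, javascript_enabled):
        # Return True if this request should be excluded from the measurements.
        if url_id is None:               # URL without the per-spam identifier
            return True
        if path == "/robots.txt":        # real users never fetch robots.txt
            blacklist.add(host)
            return True
        if not javascript_enabled:       # disabled javascript -> likely a bot
            blacklist.add(host)
            return True
        ids_seen[(host, user_agent)].add(url_id)
        if len(ids_seen[(host, user_agent)]) > 1:   # same host + user-agent
            blacklist.add(host)                     # following many distinct IDs
            return True
        return host in blacklist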
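Aside: a back-of-the-envelope reconstruction of the revenue reasoning above,
using the figures in these notes (28 purchases, ~350M emails, ~$100/order). The
~26-day measurement window and the ~1.5% proxy share of the botnet's workers are
assumptions used only to show how the botnet-wide extrapolation could be done;
they land in the same ballpark as the $7,000 and multi-million-per-year figures.

    emails_sent      = 350_000_000       # pharmacy-campaign emails (approx.)
    conversions      = 28                # purchases at the authors' fake site
    avg_purchase_usd = 100               # rough average order value

    conversion_rate  = conversions / emails_sent        # roughly 1 in 12.5 million
    observed_revenue = conversions * avg_purchase_usd   # ~ $2,800 observed

    campaign_days = 26                   # ASSUMED length of the measurement window
    proxy_share   = 0.015                # ASSUMED fraction of workers behind the proxies

    daily_full_botnet  = observed_revenue / campaign_days / proxy_share
    yearly_full_botnet = daily_full_botnet * 365

    print(f"conversion rate       {conversion_rate:.1e}")
    print(f"observed revenue      ${observed_revenue:,}")
    print(f"full-botnet estimate  ${daily_full_botnet:,.0f}/day,"
          f" ${yearly_full_botnet / 1e6:.1f}M/year")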
-----

2007 lecture

Lec 21: Case study: Dynamo
Paper published at the last SOSP (Oct 2007)
An experience paper on Amazon's Dynamo

Dynamo: key/value store
  always writeable
  conflict resolution
  SLA: 300 msec

interface
  put(key, context, object)
  get(key, context) -> object
  caller picks keys and objects
  ID = MD5(key)

partitioning
  consistent hashing (see the preference-list/quorum sketch at the end of these notes)

replication
  N nodes
  preference list

versioning
  version vectors (see the version-vector sketch at the end of these notes)
  what are the IDs of the writers?
  context includes the version vector
  how large can the vectors get?

put & get
  sloppy quorum:
    R nodes perform a get
    W nodes perform a put
    R + W > N
    N counts the healthy nodes in the preference list
  hinted handoff
  usually both R < N and W < N, because you don't want to wait on the slowest replica

permanent failures
  replica synchronization through Merkle trees

ring membership: manual
  what risk does this avoid?
  add node: transfer keys

experience
  transfer and synchronization are expensive, because of the key scan
  --> independent schemes for partitioning and placement
      Q partitions, Q/S tokens per server, Q >> N, Q >> S*T
      tokens are randomly redistributed
      copy complete partitions
      background tasks
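
Aside: a sketch of consistent hashing, the N-node preference list, and the
R/W quorum check, under simplifying assumptions (MD5 ring positions, no virtual
nodes, placeholder node names). Dynamo's real implementation differs in details.

    import hashlib
    from bisect import bisect_right

    N, R, W = 3, 2, 2            # a typical setting; note R + W > N

    def ring_pos(name):
        return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

    def preference_list(key, nodes, n=N):
        # First n nodes encountered walking clockwise from MD5(key) on the ring.
        ring = sorted((ring_pos(node), node) for node in nodes)
        start = bisect_right([pos for pos, _ in ring], ring_pos(key))
        return [ring[(start + i) % len(ring)][1] for i in range(n)]

    def quorum_ok(acks_needed, acks_received):
        return acks_received >= acks_needed

    nodes = ["A", "B", "C", "D", "E"]
    print(preference_list("shopping-cart-42", nodes))   # the N replicas for this key
    print(quorum_ok(W, 2), quorum_ok(R, 1))              # put ok with 2 acks; get needs 2

The coordinator can send requests to all N replicas but return as soon as R (or
W) of them answer, which is why R and W are each kept below N.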
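Aside: a minimal version-vector sketch for Dynamo-style versioning. The assumed
representation (a dict from writer/coordinator node ID to a counter) is an
illustration, not Amazon's actual encoding; the vector grows by one entry per
distinct coordinator that has handled a write to the key.

    def update(vv, node_id):
        # Called by the coordinating node on put(): bump its own counter.
        vv = dict(vv)
        vv[node_id] = vv.get(node_id, 0) + 1
        return vv

    def descends(a, b):
        # True if vector a is newer than or equal to b (a supersedes b).
        return all(a.get(node, 0) >= count for node, count in b.items())

    def conflicting(a, b):
        # Neither descends from the other: concurrent writes, so get()'s
        # caller (or the application) must reconcile them.
        return not descends(a, b) and not descends(b, a)

    # Example: two clients update the same key through different coordinators.
    v1 = update({}, "Sx")          # {"Sx": 1}
    v2 = update(v1, "Sy")          # {"Sx": 1, "Sy": 1} -- descends from v1
    v3 = update(v1, "Sz")          # {"Sx": 1, "Sz": 1} -- concurrent with v2
    assert descends(v2, v1) and conflicting(v2, v3)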