Lec 21: Spamalytics: an empirical analysis of spam marketing conversion
Kanich et al., CCS (Oct 2008)

Economics of security
- Understanding economics helps us understand the adversary's motivation
    what attacks are economically viable for the adversary?
    what attacks won't the adversary launch?
    how to counter the adversary's revenue stream?
- An emerging research area
- This paper is a nice example

Four parts to the paper
- Understanding the Storm botnet
- Infiltrating the Storm botnet
- Setting up a SPAM business
- Analysis of the SPAM business

Understanding the Storm botnet
- Bot master, hosted in a "bullet-proof" hosting center
- Proxy bots, conduits between workers and the bot master
    globally accessible (not behind a firewall)
    why have a conduit?
- Workers, compromised machines under control of the bot master
    how do spammers compromise machines? (postcard site)
    how many? (75,869 connected to the authors' proxies)

Organization of Storm
- peer-to-peer. why?
    Overnet, based on the Kademlia DHT, to find other nodes
    connect, search, publicize, and publish
    keys are rendezvous keys for the C&C service (see the key sketch after these notes)
      k = time (one per hour)
      v = IP address + port
- custom TCP-based protocol for C&C (command and control)
    bot sends through a proxy to an associated master process
    master responds through the proxy with a spam workload task:
      one or more spam templates
      a delivery list of email addresses
      a set of named dictionaries

Infiltrating Storm
- run Storm malware on globally reachable machines
- run 8 proxy bots inside VMs
- traffic to the 8 proxies goes through a centralized gateway
    parses C&C messages and rewrites them
      replace intended site links in spam with links to the authors' sites
        careful with links (made to look like real spam links)
        add an identifier to the end of the URL (makes the authors' spam identifiable)
      append addresses for many author-created email accounts (SMTP sinks) to the delivery lists
        remove them from the bots' completion reports

Setting up a SPAM business
- postcard site to propagate Storm malware (but modified)
    download software to view the postcard
    benign executable (posts "data = 1")
- pharmacy site (with a fake shopping cart)
- filter out "crawlers" (see the filter sketch after these notes)
    URLs without an ID
    blacklist hosts that access robots.txt
    blacklist hosts that disable javascript
    blacklist hosts with more than one unique ID and the same user-agent field
    inject new IP addresses (for anti-malware researchers)
- ethics: "strictly reduce harm"
- Campaign data sets (table 1, figs 4 and 5, table 2)

Results
- SPAM conversion pipeline (fig 6 and table 3)
    final conversion rate: 28 purchases out of ~350M emails
    what can we conclude from the 28 about revenue? (see the back-of-the-envelope calculation after these notes)
      average purchase was ~$100
      estimate revenue using all Storm nodes: ~$7,000 for the campaign? ~$3.5M per year?
      repeat business?
      maybe not much money in spam?
      number of new bots per day: between 3,500 and 8,000
- time to click (fig 7)
- effects of blacklisting (fig 8)
- geographic location of hosts that convert (fig 9)
- who converts (figs 10 and 11)
- conversion rates

Discussion
- did the authors deliver on the goal of the paper?
    their model suggests that it is unprofitable to send spam
    some reasons why the model might be incorrect
- did the authors intercept legitimate purchase attempts by interested customers?
- did the authors make life hard for anti-spammers?
- have the authors made it hard for others to do similar studies?
    will spammers fix the server-in-the-middle attack?
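
Aside: a minimal Python sketch of the hourly rendezvous-key idea above. The MD5
hashing and the dht.publish/dht.search calls are placeholders for illustration,
not Storm's or Overnet's actual encoding.

    import hashlib
    import time

    def rendezvous_key(t=None):
        # One key per hour: hash the current hour so that every bot that knows
        # the scheme computes the same 128-bit DHT key.
        hour = int((t if t is not None else time.time()) // 3600)
        return hashlib.md5(str(hour).encode()).digest()   # 16-byte key

    def publicize_proxy(dht, ip, port):
        # A proxy bot publishes where the C&C service can be reached.
        dht.publish(key=rendezvous_key(), value=(ip, port))   # hypothetical DHT API

    def find_proxy(dht):
        # A worker bot searches the same key to find a proxy to talk to.
        return dht.search(key=rendezvous_key())               # -> (ip, port) or None

Since keys change every hour, a worker only needs the shared key-derivation rule
(plus some bootstrap peers) to keep rediscovering the C&C service.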
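Aside: a sketch of the crawler-filtering heuristics listed above, applied to one
incoming web request. The function signature, the blacklist, and the bookkeeping
structures are hypothetical; the paper describes the rules, not this code.

    from collections import defaultdict

    blacklist = set()                    # hosts judged to be crawlers/honeyclients
    ids_seen = defaultdict(set)          # (host, user_agent) -> set of spam-URL IDs

    def is_crawler(host, path, user_agent, url_id, javascript_enabled):
        # Return True if this request should be excluded from the measurements.
        if url_id is None:               # URL without the per-spam identifier
            return True
        if path == "/robots.txt":        # real users never fetch robots.txt
            blacklist.add(host)
            return True
        if not javascript_enabled:       # disabled javascript -> likely a bot
            blacklist.add(host)
            return True
        ids_seen[(host, user_agent)].add(url_id)
        if len(ids_seen[(host, user_agent)]) > 1:   # same host + user-agent
            blacklist.add(host)                     # following many distinct IDs
            return True
        return host in blacklist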
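Aside: a back-of-the-envelope reconstruction of the revenue reasoning above,
using the figures in these notes (28 purchases, ~350M emails, ~$100/order). The
~26-day measurement window and the ~1.5% proxy share of the botnet's workers are
assumptions used only to show how the botnet-wide extrapolation could be done;
they land in the same ballpark as the $7,000 and multi-million-per-year figures.

    emails_sent      = 350_000_000       # pharmacy-campaign emails (approx.)
    conversions      = 28                # purchases at the authors' fake site
    avg_purchase_usd = 100               # rough average order value

    conversion_rate  = conversions / emails_sent        # roughly 1 in 12.5 million
    observed_revenue = conversions * avg_purchase_usd   # ~ $2,800 observed

    campaign_days = 26                   # ASSUMED length of the measurement window
    proxy_share   = 0.015                # ASSUMED fraction of workers behind the proxies

    daily_full_botnet  = observed_revenue / campaign_days / proxy_share
    yearly_full_botnet = daily_full_botnet * 365

    print(f"conversion rate       {conversion_rate:.1e}")
    print(f"observed revenue      ${observed_revenue:,}")
    print(f"full-botnet estimate  ${daily_full_botnet:,.0f}/day,"
          f" ${yearly_full_botnet / 1e6:.1f}M/year")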
-----

2007 lecture

Lec 21: Case study: Dynamo
Paper published at the last SOSP (Oct 2007)
An experience paper on Amazon's Dynamo

Dynamo: key/value store
  always writeable
  conflict resolution
  SLA: 300 msec

interface
  put(key, context, object)
  get(key, context) -> object
  caller picks keys and objects
  ID = MD5(key)

partitioning
  consistent hashing (see the preference-list/quorum sketch at the end of these notes)

replication
  N nodes
  preference list

versioning
  version vectors (see the version-vector sketch at the end of these notes)
  what are the IDs of the writers?
  context includes the version vector
  how large can the vectors get?

put & get
  sloppy quorum:
    R nodes perform a get
    W nodes perform a put
    R + W > N
    N counts the healthy nodes in the preference list
  hinted handoff
  usually both R < N and W < N, because you don't want to wait on the slowest replica

permanent failures
  replica synchronization through Merkle trees

ring membership: manual
  what risk does this avoid?
  add node: transfer keys

experience
  transfer and synchronization are expensive, because of the key scan
  --> independent schemes for partitioning and placement
      Q partitions, Q/S tokens per server, Q >> N, Q >> S*T
      tokens are randomly redistributed
      copy complete partitions
      background tasks
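
Aside: a sketch of consistent hashing, the N-node preference list, and the
R/W quorum check, under simplifying assumptions (MD5 ring positions, no virtual
nodes, placeholder node names). Dynamo's real implementation differs in details.

    import hashlib
    from bisect import bisect_right

    N, R, W = 3, 2, 2            # a typical setting; note R + W > N

    def ring_pos(name):
        return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

    def preference_list(key, nodes, n=N):
        # First n nodes encountered walking clockwise from MD5(key) on the ring.
        ring = sorted((ring_pos(node), node) for node in nodes)
        start = bisect_right([pos for pos, _ in ring], ring_pos(key))
        return [ring[(start + i) % len(ring)][1] for i in range(n)]

    def quorum_ok(acks_needed, acks_received):
        return acks_received >= acks_needed

    nodes = ["A", "B", "C", "D", "E"]
    print(preference_list("shopping-cart-42", nodes))   # the N replicas for this key
    print(quorum_ok(W, 2), quorum_ok(R, 1))              # put ok with 2 acks; get needs 2

The coordinator can send requests to all N replicas but return as soon as R (or
W) of them answer, which is why R and W are each kept below N.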
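Aside: a minimal version-vector sketch for Dynamo-style versioning. The assumed
representation (a dict from writer/coordinator node ID to a counter) is an
illustration, not Amazon's actual encoding; the vector grows by one entry per
distinct coordinator that has handled a write to the key.

    def update(vv, node_id):
        # Called by the coordinating node on put(): bump its own counter.
        vv = dict(vv)
        vv[node_id] = vv.get(node_id, 0) + 1
        return vv

    def descends(a, b):
        # True if vector a is newer than or equal to b (a supersedes b).
        return all(a.get(node, 0) >= count for node, count in b.items())

    def conflicting(a, b):
        # Neither descends from the other: concurrent writes, so get()'s
        # caller (or the application) must reconcile them.
        return not descends(a, b) and not descends(b, a)

    # Example: two clients update the same key through different coordinators.
    v1 = update({}, "Sx")          # {"Sx": 1}
    v2 = update(v1, "Sy")          # {"Sx": 1, "Sy": 1} -- descends from v1
    v3 = update(v1, "Sz")          # {"Sx": 1, "Sz": 1} -- concurrent with v2
    assert descends(v2, v1) and conflicting(v2, v3)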