6.824 Lecture 16: Peer-to-peer systems: Content distribution

Why are we reading this paper?
  Dynamo was built in a peer-to-peer style -- decentralized/symmetric
    Scatters data widely w/ hashing, then finds it
  CoralCDN uses similar ideas
    More sloppy, to tolerate faults and hot spots
    Focus on finding *nearby* copies of data
  CoralCDN is also a deployed system
    You can directly use it
    NSDI 2010 paper says terabytes/day, 10s of millions of GETs/day
    On 100s of PlanetLab servers all over the world

Quick tutorial on DHTs
  You want to store small keys and values
    e.g. key=url, val=IP addr of cache server
  1000s of servers -- to absorb load and give fault tolerance
  put(key, value, ttl)
  get(key) -> values
  No delete() -- just ttl
  I'll describe Chord; the details differ a bit from Coral's
  As in Dynamo: ID circle, 0..2^128-1
    node's ID = random()
    key's ID = hash(key)
    key belongs at its successor: the node with the next highest ID
  How to find a successor?
  Slow plan:
    each node knows the identity (IP address) of the next node on the ring
    a linked list of nodes
    forward put/get around the ring hop by hop
    slow, but it works
  Fast plan: finger table
    each hop cuts the ID-space distance in half, thus log(n) hops
    (see the sketch below)
  How does a new node join?
    choose an ID
    look up that ID to find its successor
    link into the linked list
    look up the IDs 1/2, 3/4, &c of the way around the ring to fill the finger table
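A minimal sketch of the finger-table lookup, in Go (this course's language). It
assumes a 64-bit ID space for brevity (the ring above is 2^128), uses in-process
pointers instead of RPCs, and all the names (Node, fingers, between) are
illustrative rather than Chord's or Coral's actual code:

  package chord

  type ID uint64

  type Node struct {
      id      ID
      succ    *Node     // next node on the ring -- the "slow plan" pointer
      fingers [64]*Node // fingers[i]: first node at or past id + 2^i
  }

  // between reports whether x is in the ring interval (a, b].
  func between(a, b, x ID) bool {
      if a < b {
          return a < x && x <= b
      }
      return a < x || x <= b // the interval wraps past 0
  }

  // betweenOpen reports whether x is in the open ring interval (a, b).
  func betweenOpen(a, b, x ID) bool {
      if a < b {
          return a < x && x < b
      }
      return a < x || x < b
  }

  // lookup returns the node responsible for key: key's successor.
  // Each hop roughly halves the remaining ID-space distance,
  // so it takes O(log n) hops.
  func (n *Node) lookup(key ID) *Node {
      if between(n.id, n.succ.id, key) {
          return n.succ
      }
      // Forward to the closest finger that precedes the key.
      for i := len(n.fingers) - 1; i >= 0; i-- {
          if f := n.fingers[i]; f != nil && betweenOpen(n.id, key, f.id) {
              return f.lookup(key)
          }
      }
      return n.succ.lookup(key) // fall back to the slow plan
  }

The two interval helpers exist because the ID space wraps around zero; a plain
a < x && x < b comparison is wrong whenever the interval straddles the origin.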
What does Coral store in its DHT?
  url -> IP addr of an http proxy that has the page
  IP net number -> IP addr of a dns/http server
  router IP -> IP addr of a dns/http server

What's a web cache?
  [clients, one cache, origin servers]
  Client(s) talk to the cache server instead of the real server
  If the cache has a copy of the page, it serves it directly to the client
  If no copy:
    cache fetches the page from the origin web server
    stores it on the cache
    serves it to the client
  Client benefits (potentially)?
    Lower latency
    Less access-link load
    Higher reliability (like the Google search "Cached" link)
    Avoid restrictions (China, ACM)
    Anonymity
  Origin server benefits (potentially)?
    Reduced load -- link b/w and server CPU
      This may just transfer costs; is there a net win?
    Reduced ISP b/w costs (fewer bit-miles)
    Lower load spikes from flash crowds

What's a *cooperative* web cache?
  [diagram of world w/ clients, origins, caching proxies]
  Cache servers can fetch from each other
  Cache servers can direct clients to another cache server
  Also called a CDN (content distribution network)

Why might a cooperative web cache be useful?
  Why better than a bunch of independent web caches?
  Avoid each cache hitting the origin once for each page
    Bad if there's a flash crowd
    Cooperative cache: one cache fetches from the origin, then shares
  Cache more distinct pages in a given total amount of cache space
    By reducing the total # of cached copies of each page
    CoralCDN doesn't actually do this
  Higher hit rate -- hit in other caches
    Not useful for popular pages: should just cache them everywhere
    Not useful for unpopular pages: no-one else has viewed them
    The "long tail" prevents high web hit rates
    Coral doesn't claim to be good at this

High-level technical problems that cooperative caches must solve?
  How to intercept browser requests? After all, the URL looks like http://origin.com
  How to find a nearby cache server?
    Why is this important? Latency, bit-miles, load balance
  How to find a nearby copy of the page?
  If there's a flash crowd, how to avoid all caches hitting the origin at once?
  If there's a flash crowd, how to avoid overloading internal mechanisms?

***

How to get the browser to talk to the cache system at all?

Idea 1: "Coralize" names like cnn.com.nyud.net
  Try http://cnn.com.nyud.net:8090
  Try dig cnn.com.nyud.net
  Why is this helpful?
    Directs the browser to Coral rather than to the origin server
    CoralCDN serves the nyud.net domain
    A Coral DNS server returns an A record for a Coral http proxy
  Why embed the "cnn.com" in the DNS name?
    After all, the Coral DNS server does not look at the "cnn.com"
    So the Coral proxy knows what the origin server is
    The browser will tell the Coral proxy "cnn.com.nyud.net"
  Who Coralizes URLs?
    Maybe the origin server's operator
    Maybe whoever posted the URL to Slashdot
    Probably not the client

***

How to find a Coral cache near the client?
  Did the browser likely ask a nearby nyud.net DNS server? No

Idea 2: build a mapping from IP net# to a nearby Coral cache
  Using a DHT
  Each Coral server inserts its IP net# as the key, its IP addr as the value
  The Coral DNS server does a DHT lookup on the client's IP net# to find a nearby Coral server
  What if the client is not on the same IP net# as any Coral server?
    It might still be "close" to a Coral server
    E.g. I'm on net 128.30.x.x, a Coral server is on net 18.x.x.x
    Both nets are at MIT

Idea 3: extend the net# mappings with traceroute
  Each Coral server uses traceroute to find nearby routers
    Registers itself under the IP of each nearby router
    [diagram: routes to the core converge]
  The Coral DNS server traceroutes to the client
    Looks up each router IP address in the mapping
  For example: a traceroute from a 128.30.x.x host sees 18.x.x.x as its 3rd router
  (See the first sketch after this section)
  Could one do better than this?
    I.e., if there's a Coral DNS server near a client, will Coral find it?
    You're at MIT, the nearest Coral is at BU, but BU uses a different ISP?
    Maybe: ping lots of random IP networks from lots of Coral servers
      Remember, for each IP net, the Coral server w/ the lowest ping time
    Maybe: analyze BGP topology

How to ensure that future client DNS lookups go to a nearby Coral DNS server?
  To further reduce latency
  The first Coral DNS server returns new NS entries
  But you can't tell the client's DNS library to replace existing nyud.net NS records
  So: the first Coral DNS server actually returns NS records for L0.nyucd.net
    And a CNAME from xxx.nyud.net to xxx.L2.L1.L0.nyucd.net
  So once a client's DNS library learns NS records for L0.nyucd.net
    they will override the original (non-local) NS records for nyucd.net

***

How does a Coral cache find a cached copy of a page?
  Look up the URL in a DHT
    key=URL, value=IP addr of a Coral cache that has the URL
    there can be more than one
  The Coral cache fetches the page from that other cache
  If the DHT had more than one value for the key, fetch the page from more than one
    In case one is down or slow

How does a Coral cache find a *nearby* copy of a page?
  Idea: a hierarchy of DHTs, w/ clustering at the lower levels
    [diagram of big/small rings]
    Nearby (< 20 ms) Coral nodes form an L2 DHT
    L1: 60 ms
    L0: global
  Search the L2 DHT first
    If a nearby copy exists, it will be found first
    Only search L1, L0 on a miss at the lower level
  At what levels in the DHT hierarchy does a Coral server *insert* info?
    Once per level
  (See the second sketch after this section)
  How does a Coral server know what clusters to join?
    It looks up its own net#/routers in the location hints in the L0 DHT
  Are real Internet hosts clustered? That is, why does clustering make sense?
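First, a rough sketch of Ideas 2 and 3 above (net# and router registration), in
Go. The DHT interface, the /24 bucketing in netNumber, and the TTLs are made-up
stand-ins for whatever Coral really uses:

  package nearby

  import (
      "strings"
      "time"
  )

  // DHT stands in for Coral's put/get interface.
  type DHT interface {
      Put(key, value string, ttl time.Duration)
      Get(key string) []string // a key may have several values
  }

  // netNumber buckets an address by its /24 prefix,
  // e.g. "18.26.4.9" -> "18.26.4". (An assumption, not Coral's rule.)
  func netNumber(ip string) string {
      if i := strings.LastIndex(ip, "."); i >= 0 {
          return ip[:i]
      }
      return ip
  }

  // advertise registers a server under its own net number (Idea 2)
  // and under each router its traceroutes pass through (Idea 3).
  func advertise(d DHT, serverIP string, nearbyRouters []string) {
      d.Put(netNumber(serverIP), serverIP, time.Hour)
      for _, r := range nearbyRouters {
          d.Put(r, serverIP, time.Hour)
      }
  }

  // findNearby is roughly what a Coral DNS server does for a client:
  // try the client's net number, then each router on a traceroute
  // toward the client. nil means "no hint -- return any server".
  func findNearby(d DHT, clientIP string, routeToClient []string) []string {
      if s := d.Get(netNumber(clientIP)); len(s) > 0 {
          return s
      }
      for _, r := range routeToClient {
          if s := d.Get(r); len(s) > 0 {
              return s
          }
      }
      return nil
  }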
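Second, a sketch of the level-by-level lookup and the once-per-level insert just
described. The Cluster type stands in for one level's DHT; it is not Coral's
real API:

  package hier

  // Cluster stands in for one level's DHT:
  // L2 (< 20 ms), L1 (< 60 ms), L0 (global).
  type Cluster struct {
      Name string
      Get  func(key string) []string
      Put  func(key, value string)
  }

  // lookup searches from the most local cluster outward, so a nearby
  // copy, if one exists, is found before any wide-area query is sent.
  func lookup(levels []Cluster, url string) []string {
      for _, c := range levels { // ordered L2, L1, L0
          if vals := c.Get(url); len(vals) > 0 {
              return vals
          }
      }
      return nil // not cached anywhere: fetch from the origin
  }

  // insert advertises a freshly fetched page once per level, so both
  // nearby and distant caches can later find this copy.
  func insert(levels []Cluster, url, myAddr string) {
      for _, c := range levels {
          c.Put(url, myAddr)
      }
  }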
***

Flash crowd: what if many clients ask for the same URL at the same time?
  Suppose it's not yet cached at all
  Many Coral caches might ask the origin server for the page at the same time
    Bad news for the origin server
  Idea: "lock" a fetch by inserting a token into the DHT
    key=URL, value=IP of the cache fetching that page
    Before fetching from the origin, check the lock
    If present:
      ask the fetching server to give you a copy as soon as it arrives
    Servers that have asked the fetching server also insert into the DHT
    So the fetching Coral caches form a distribution tree
      The tree limits the content-serving load on any one node
    (see the first sketch below)

Flash crowd: what if many Coral caches insert w/ the same key at the same time?
  Or fetch the same key at the same time?
  Won't the DHT node serving that key get overloaded?
  Idea: DHT put(k,v) might place the new k/v at a node that's not closest in ID space
    put(k,v) routes through the DHT, closer and closer to k
    it stops when it reaches a node that's already serving k a lot
      i.e. it might stop before the real successor in the DHT
    result: entries with a given k are spread over many nodes near k in ID space
      so the update load is spread
      and get()s can also finish w/o hitting the real successor in the DHT
    of course this means that get(k) won't see all the entries for k
    (see the second sketch below)
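First, a sketch of the "lock" idea, with an assumed DHT interface; the "fetch:"
key prefix, the TTL, and the transfer stubs are invented for illustration:

  package crowd

  import "time"

  type DHT interface {
      Put(key, value string, ttl time.Duration)
      Get(key string) []string
  }

  // fetchOnce returns the page for url while contacting the origin
  // server roughly once per flash crowd. The get-then-put is not
  // atomic, so a few caches may still race to the origin; that's
  // acceptable here.
  func fetchOnce(d DHT, url, myAddr string) []byte {
      fetchers := d.Get("fetch:" + url)
      // Advertise ourselves as well: later arrivals will fetch from
      // us, so the caches form a distribution tree that spreads the
      // serving load.
      d.Put("fetch:"+url, myAddr, 10*time.Second)
      if len(fetchers) > 0 {
          // Someone is already fetching; ask them for a copy as soon
          // as it arrives instead of hitting the origin ourselves.
          return fetchFromPeer(fetchers[0], url)
      }
      return fetchFromOrigin(url)
  }

  // Stubs standing in for the actual HTTP transfers.
  func fetchFromPeer(peer, url string) []byte { return nil }
  func fetchFromOrigin(url string) []byte     { return nil }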
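Second, a sketch of the load-limited put. The fullness threshold and the
routing stub are assumptions, and this version stops just before an overloaded
node; Coral's real stopping rule is rate-based and differs in detail:

  package sloppy

  type Node struct {
      store map[string][]string
  }

  const maxPerKey = 4 // made-up threshold; Coral's policy is rate-based

  // overloadedFor reports whether a node already serves key heavily.
  func (n *Node) overloadedFor(key string) bool {
      return len(n.store[key]) >= maxPerKey
  }

  // nextHopToward would consult the finger table for the next node
  // closer to key's successor; nil means n *is* the successor. (Stub.)
  func nextHopToward(n *Node, key string) *Node { return nil }

  // sloppyPut routes toward key's successor but stops early if the
  // next hop is already overloaded for the key. Entries for a hot key
  // thus spread over several nodes just before it in ID space, and the
  // true successor never sees most of the puts -- at the price that
  // get(key) won't see all the entries.
  func sloppyPut(n *Node, key, value string) {
      for {
          next := nextHopToward(n, key)
          if next == nil || next.overloadedFor(key) {
              if n.store == nil {
                  n.store = map[string][]string{}
              }
              n.store[key] = append(n.store[key], value)
              return
          }
          n = next
      }
  }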
***

How to cope with failed Coral servers?
  A Coral DNS server is down?
  DNS sends the client to a Coral cache that is down?
  A DHT node is down?
  Can't find a DHT record at all?
  The cache server found via the DHT is down?

The Question: Why does Coral's DHT store multiple values per key?
  Fault tolerance: gives a choice of many cached copies per URL
  Fault tolerance: a choice of many dns/proxies per net # / router
  Part of the defense against many puts/gets on the same key during a flash crowd

Performance: does CoralCDN reduce server load?
  Figure 4
    166 clients
    166 Coral caches
    12 distinct files on a single origin server
    total client request rate is 100/sec, i.e. 6000/min
  Why the initial peak?
  Why Level 2 > Level 0?
  Why do the lines die away towards zero?
  Is this performance good or bad?

Does the Coral *DHT* handle flash crowds?
  Figure 4 showed us the effect of flash crowds on the origin server
  What about the effect on the Coral DHT?
  Figure 8
    Lots of servers doing put/get for the same DHT key
    total: 12 million per minute
  Is the performance good? Is there a hot spot?
  Why are the graph #s so much lower than 12 million?
    get()s satisfied locally (so they don't show on the graph)
    most put()s stopped locally by the load rules

Does CoralCDN have good client latency?
  Figure 5
  The latencies are a significant fraction of a second
    Is that good or bad?
    Why are the latencies so high?
  Why do the single-level Asia latencies have a minimum of one second?
  Why are the multi-level Asia latencies so much lower?
  Why does traceroute help?
  Why does the non-Asia graph look different?

Is the global (L0) level a good idea? Why not e.g. independent L1/L2?
  L0 certainly costs latency
    E.g. maybe the nearest copy is half-way around the world
  Not clear why it's useful for cached pages
  Might be pretty useful for location hints
    I.e. a client from far away does a Coral DNS lookup

Would it make sense to let anyone run a CoralCDN server?
  I.e. are the CoralCDN servers trusted?
  I.e. could Coral servers run on slow links?

Coral is specialized, not a general-purpose storage system
  it doesn't store the data, only various location hints
  not reliable: node failure -> keys lost
  not consistent: I may not see your updates
  really about "soft state" and caching, not durable storage
  this seems like an appropriate use of DHTs

2010 NSDI paper retrospective
  CoralCDN is pretty heavily used
    as a general web cache -- but not very low latency!
    for popular content -- but non-cooperative would be as good!
    for unpopular content -- cooperation isn't useful, since there's little overlap
    for flash crowds, to reduce the "slashdot effect" on the origin server
      the most justifiable use
      but measurements suggest that crowds build up slowly
      so much of CoralCDN's special handling may not be needed