Democratizing content publication with Coral
Freedman, Freudenthal, Mazieres
NSDI 2004

What's the high-level problem to be solved?
  You have cooperating caching proxies scattered over the Internet.
  Direct browser to nearest cached copy.
  If not cached nearby, fetch from real server into a nearby cache.

Why is this helpful?
  Might reduce server load.
  Might reduce delay visible to user.

Doesn't Akamai already solve this problem?

What are the constraints that make it hard?
  No support from browser.
  No support from final server.

What tools are available?
  We only get to see DNS and HTTP requests.
  Assuming "Coralized" names like www.cnn.com.nyucd.edu

What can we achieve with just a bunch of DNS servers for nyucd.edu?
  Browser probably chooses a random DNS server.
  That DNS server can send the browser an A record for one of the proxies?
  But which one?
  Idea 1: if DNS server is close (low ping time) to browser,
    then DNS server can return any proxy close to the DNS server.
    So we'd want to somehow cause browser to use nearby Coral DNS server.
  Idea 2: build a database mapping IP net numbers to nearby proxies,
    each proxy registers its net number,
    then DNS server looks up browser's IP net number to find proxy.
    What about browsers not on the same net as a proxy?
    Might still be nearby proxy.

How does Coral cause browser to use a nearby Coral DNS server?
  L2.L1.L0 trick to have one chance per hierarchy level
  nodes(level,count,target) to find good "next" DNS server
  traceroute and hints in DHT to implement nodes()

How does Coral find a nearby cached copy of a URL?

What does Coral store in the DHT?
  router IP addresses (found w/ traceroute) -> nearby proxy
  24-bit IP prefixes -> nearby proxy
  URL -> proxy

If browser is at MIT, and nearest proxy is at BU, will we find it?
  5 hops to www.cnn.com takes us to BBN planet.

Does Coral handle flash crowds (very popular URLs) well?
  What might go wrong?
    Every proxy fetches the URL direct from server.
    DHT hot-spots.
  What does Coral do about it?
What DHT techniques did they use?
  Hierarchy for locality.
  Why don't they just cache along the path?
  How do they choose clusters?
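
To make the DHT mappings and the "hierarchy for locality" idea concrete, here is a minimal sketch, not Coral's actual code. It assumes three cluster levels (2 = most local, 0 = global, matching the L2.L1.L0 naming), uses an in-memory dict as a stand-in for each cluster's DHT, and assumes a lookup simply tries the most local level first. The names ClusterIndex, prefix24, and find_proxies are hypothetical.

```python
import ipaddress

# Assumed ordering: level 2 is the most local cluster, level 0 the global one.
LEVELS = (2, 1, 0)


class ClusterIndex:
    """Toy stand-in for one cluster's DHT: key -> list of proxy addresses."""

    def __init__(self):
        self._table = {}

    def put(self, key, proxy):
        # Append rather than overwrite, so one key can name several proxies.
        self._table.setdefault(key, []).append(proxy)

    def get(self, key):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._table.get(key, []))


def prefix24(ip):
    """Return the 24-bit network prefix of an IPv4 address as a string key."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def find_proxies(indexes, key):
    """Search cluster indexes from most local to global.

    Returns (level, proxies) for the first level with a hit,
    or (None, []) if no level knows the key.
    """
    for level in LEVELS:
        proxies = indexes[level].get(key)
        if proxies:
            return level, proxies
    return None, []
```

The same lookup serves both kinds of key from the notes: a URL (to find a proxy holding a cached copy) and a 24-bit prefix from prefix24() (to find a proxy near a client). A miss at every level corresponds to the case where a proxy has to fetch from the real server.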