TITLE: Comprehend the Planet
AUTHOR: Timothy Roscoe
AFFILIATION: Intel Research at Berkeley
DATE: August 2005
DISCLAIMER: This is a personal statement and does not necessarily reflect the views of Intel Corporation

It's hard to formulate a grand challenge in distributed systems compared with, say, genomics, for three reasons.

Visibility: distributed systems are invisible to the public; it's hard to give a systems demo. They matter, but from an application standpoint. We are not natural application designers, but toolbuilders. The world-changing applications that came out of this community happened because the tools we built opened up unforeseen possibilities.

Performance: distributed systems usually do something that can always be done in a centralised way, but works better when distributed cleverly (better scalability, lower cost, \ldots). Hence any grand challenge looks a little less grand from the user's standpoint.

Relevance: in thinking of big, societally beneficial challenges, one asks: what does most of the world need right now? My answer is clean water, food, health, education, and freedom from violence. What conceivable place could distributed systems work have here?

This last problem is interesting, though, because of the perspective it brings. Personally, I end up not caring about service providers, equipment manufacturers, ``road warriors'', or the burgeoning market segment of teenage girls. I worry about the planet.

My suggested challenge is understanding the planet, and our effect on it, in real time, from many vantage points, in many dimensions.

The world is a large system. Its climate, politics, tectonics, populations, ecology, and economy are things whose dynamics we do not understand. There is a wealth of academic scientific work in instrumenting and understanding the planet, but it's confined to a set of academic communities -- often vertical silos. Computing in such communities is widespread, but typically not closely informed by the CS community.
A success proof-point would be demonstrating continuous, real-time fusion of lots of distributed data of different forms, e.g. combining local temperature, air quality, traffic information, personal health monitoring and satellite imagery at global scale -- there is increasing evidence that these are closely linked. Like all grand challenges, this one has many potential spin-off benefits: for healthcare, transport systems, etc.

Large computations are required, but the challenge goes beyond the (rather limited) vision of academic Grid computing (basically batch jobs): it calls for continuous processing of data by long-running services, something this community has always regarded as more interesting, more useful, and a superset of the batch-processing problem.

This vision is also complementary to, but separate from and not dependent on, sensor networks. Sensors provide data. Our job is computing on this data at global scale, delivering meaningful results to useful places.

The call mentions clean-slate redesigns of the Internet. I argue the Internet has no requirement for redesign based on its current uses. Security bandaids to the network will happen with or without our help, and are therefore unworthy of the name ``grand challenge''. Redesigns of the Internet that are ``cleaner'' without enabling new applications are poorly motivated.

However, understanding the planet in real time does require building a different kind of network. Computation must be embedded in the network: in-network processing of data is essential both to route data correctly and to reduce it, so that the system can scale to planetary size. Hence most communication will be overlay-based (trees, hypercubes, etc.). TCP-like protocols designed for pairwise communication are inadequate, and research is needed into more appropriate ways to transmit data while sharing network resources.

Security: Who controls this information?
In this vision, protecting ownership and privacy of information is arguably less important than authenticity or fidelity: lots of players have vested interests in influencing this data and its interpretation to their advantage.

This challenge requires deep research in many sub-fields of distributed systems (resource management, security, routing, discovery, etc.) and building a substantial artifact: a network of processing elements connected to today's Internet, and probably bootstrapped from it. It's plumbing, but that's what we're good at, and this is on a grand scale.

It's naive to think we can improve the world by throwing more distributed systems at it, but they are the hammer we have, and this seems the best-looking nail. I'm also convinced that access to more information will not help by itself, without global change in culture and thinking. The optimistic view is that a project like this facilitates such change.