Tracking, Managing, and Preserving Our Digital Lives
The amount and variety of digital content that human beings generate, publish, store, and query, both publicly and privately, is growing at an astonishing rate. It has been estimated that the world generates about five exabytes of digital information annually. What do we do with these massive amounts of data? To address this question, I pose the Grand Challenge of tracking, managing, and preserving our digital lives.
Tracking: Data about us is continuously generated, both by us and by others, as we write and sign documents, purchase products, send email, and browse and query public web sites and databases. This data is captured, stored, and processed in ways that neither the capturer nor the capturee currently tracks well. Capturers find it difficult to trace the provenance of the data they capture and store. Capturees find it difficult to track what data about them is being stored and processed, by whom, and where.
Managing: Data is generated so quickly that its context is easily lost. We need mechanisms to organize, index, and search distributed data sets, and to express relationships among them, in ways that make access to the data efficient, convenient, and secure. Moreover, people work and communicate using a variety of devices, including laptops, desktops, cell phones, and PDAs. Each of these devices is a data repository with its own platform, operating system, applications, and network capabilities. It is therefore important to build a usable system that lets users track and manage their personal data collections across all of these devices.
Preserving: The need to preserve large amounts of data indefinitely is urgent, both in the public and business domains and in the private domain, as families pass down digital heirlooms to future generations.
While librarians have millennia of experience preserving paper artifacts, the computer systems community has yet to fully understand how to build extremely long-lasting distributed archival systems that can evolve over time despite aging hardware, faulty software, changing administration, format migration, economic failures, and attacks.
Below is a sample of the research tasks that arise from this Grand Challenge.
Research task: Build a distributed "Watch-Dog" of web content. There are many examples of deliberate "web-scrubbing" of government web sites, aimed at controlling public opinion when politically unflattering events occur. The Wayback Machine provides periodic snapshots of the Web, but it is centralized and is therefore a single point of trust and administration. Build a distributed web crawler that tracks, detects, and presents subtle changes and provides better coverage of online content than current crawlers.
Research task: Build a personal distributed data repository system that automatically tracks and manages a user's personal data collection across heterogeneous devices. The system synchronizes data across devices, creates context by defining relationships between data sets, and quickly retrieves the latest version of a given data item. It lets third parties access and search a user's data; access is seamless and easy, yet controlled by the user. Finally, the system allows the user to access data "by name" from any device anywhere in the network, without needing to know where the data resides or the details of accessing specific data servers. The data follows the user, rather than the user following the data.
Research task: Build a War-Zone testbed where the motto is "anything goes". PlanetLab has proven extremely useful for testing the performance of distributed applications. In a War-Zone testbed, researchers would invite third parties to attack applications and thereby test their resilience.
Research task: Use the War-Zone to simulate an aging distributed archival system and subject it to attacks sustained over long periods of time. Empirically analyze and measure the preservation abilities of state-of-the-art distributed archival systems under such sustained attack.
We have a long way to go before we can claim control over our digital lives, but pursuing this sample of research tasks will produce software artifacts with immediate social, economic, and even political impact.
Mema Roussopoulos
Harvard University