Community Information Management - A Grand Challenge in Distributed Systems

Doug Terry
Microsoft Research Silicon Valley
terry@microsoft.com

Problem

While the Internet has allowed unprecedented sharing of information via electronic mail and the Web, little support exists for loosely structured communities with common information needs. In this context, I define a "community" as a collection of people with minimal organizational structure who wish to share electronic documents. Consider, for example, the parents of a youth soccer team who need to share team rosters, phone numbers, game and practice schedules, action photos, party plans, and other information that may be contributed and updated by various community members. Changes to a shared document should be readily visible to all of the relevant community members. A given person may belong to many distinct communities, which change over time with the person's changing activities, interests, location, etc. Documents should remain available for as long as they are useful but no longer. Ideally, documents and their associated communities should be managed in a seamless way while maintaining accessibility, consistency, and privacy.

This is naturally a distributed systems problem since, with the emergence of large-capacity portable storage devices, users will want to carry much of their daily information with them, e.g. on their terabyte key chains, and yet share select information freely within their communities, probably wirelessly.

Challenges

Solving the problem of community information management requires fundamental research in a number of areas that go beyond storage and communication, including:

- Identity: How are communities established and named? How do users refer to shared information?
- Replication: Where does information reside? Is information replicated to facilitate sharing, to avoid centralized servers, to increase availability, to provide reliable backup, or all of the above?
How are updates tracked and propagated between loosely connected personal and public storage devices?
- Trust: Who is trusted to update the information? How is protection enforced? How does a community member recover if their trust is misplaced?
- Versioning: How does one access old versions? Can updates be easily undone?
- Searching: How is information retrieved and integrated from decentralized repositories of various types: public (e.g. the Web), private (e.g. one's desktop), and community storage?
- Lifetime: How does one avoid seeing obsolete or irrelevant information?
- Context: How does one's current situation (place, time, meetings, deadlines, companions, etc.) affect their view of and operations on shared information?
- Usability: How can users readily form communities and create shared documents without being aware of the underlying distributed nature of the system or its inherent unreliability and vulnerabilities?

Approach

Expanding on the notion of personal information management (PIM) requires a fresh, clean-sheet, long-term look at information sharing within loosely structured, semi-trustful communities. The goal should be to produce an architecture for community information sharing and demonstrate it through a prototype implementation. This research can be conducted within the context of the current Internet and wireless communication networks. It can build upon the Web, e-mail, and modern storage systems (such as the WinFS platform from Microsoft).