Critical Enterprise 2020: A Research Agenda for Distributed IT Infrastructures

Karsten Schwan
CERCS Research Center/College of Computing
Georgia Institute of Technology

Today's enterprise systems and applications implement functionality that is critical to society's ability to function. One example is the operational infrastructure used by large corporations such as our nation's airlines. There, 24/7 uptime requirements are coupled with the need to respond to surges in demand, to perform live updates or upgrades of distributed programs and systems, and to protect data from disasters. The complexity of the applications, technologies, and systems inherent in such large distributed systems leads us to propose the `Critical Enterprise 2020' Grand Challenge.

`Critical Enterprise 2020' will methodically address criticality with the full set of technologies Computer Science can bring to bear. These range from program verification and testing; to hardware-based methods for component protection or isolation; to operating systems research that seeks to better protect or manage critical system assets; to middleware that combines the flexibility and agility of today's enterprise software infrastructures with lessons learned from years of development in high-performance and grid computing; to trust models and mechanisms, applied to information management, that let systems focus on critical rather than non-critical actions and data.

Academic research should explore topics that are not open to industry, which must remain standards-compliant and compatible with existing hardware and applications:

(1) `2020' should focus on `vertical' transparency rather than horizontal layering. That is, future critical distributed systems should not merely permit but encourage the integrated, application-specific use of hardware, operating systems, and middleware.
Sample functionality includes the dynamic extension of network infrastructures with enterprise knowledge (e.g., business rules for data routing and distribution) and middleware that is `aware' of underlying resources and their current capabilities and constraints. The desired outcome is application-specific management of system resources.

(2) `2020' also signals that humans will not be able to cope with the inherent complexity of future distributed systems. Self-management, self-explanation, and similar system properties are therefore key elements of future solution approaches.

(3) `Criticality' provides the metrics by which research should be evaluated, focusing on the ability to meet changing end-user constraints rather than merely improving system-level parameters.

To summarize, the challenge is to make methodical improvements across the entire `vertical' stack, from hardware to applications.

To these ends, research should be complemented by community efforts to create an open-source ecosystem that future researchers can use to study realistic, complex applications and systems. This `Critical Enterprise Facility' (CEF) could leverage proven technologies such as Emulab and PlanetLab to provide external access and use, but it should additionally focus on applications that must meet strong quality-of-service or end-user utility requirements while running on systems representative of future enterprise-scale hardware. CEF should also be distinctive in its end-to-end scope, reaching from server systems all the way to the mobile end devices that will be routine in future complex IT infrastructures.