INSPECTION-ENFORCED INTERNET SAFETY

M. Satyanarayanan
School of Computer Science
Carnegie Mellon University

Problem statement

Technology is a double-edged sword. Along with the benefits of a new technology come new risks and new dangers to society. These affect direct beneficiaries as well as innocent bystanders. For a good part of the 20th century, society grappled with risk management of Industrial Revolution inventions: railways, automobiles, aircraft, elevators, chemical plants, and so on. Reducing risk without stifling growth or innovation is a tricky balancing act. Today, one indicator of first-world nation status is successful balancing of these tradeoffs.

The Internet of today is an unregulated free-for-all. There is virtually no constraint or supervision on what you connect. In particular, you can connect a machine running any operating system and application suite, including ones with known vulnerabilities. Security warnings and patches are only advisory: there is no enforcement mechanism to ensure that they are heeded, and no validation mechanism to verify continued compliance. The consequences of this total lack of regulation are plain for all to see. The recent plague of worm and virus attacks is only a foretaste of the Internet-enabled mischief that we can anticipate in the future.

The challenge is enforceable yet benign regulation of this space; in other words, to regulate without sacrificing the openness of the Internet, or the freedom and flexibility that we enjoy today.

Inspection-enforced Safety

It is instructive to examine how society copes with the risks of older technologies. Consider automobiles: most states require annual or semi-annual safety and emissions testing of every automobile, conducted by an authorized inspection station. Enforcement is simple: police officers check for inspection stickers. A driver can thus have reasonable assurance that his safety will not be compromised by defects in other cars sharing the road with him.
In aviation, regular inspections of each aircraft are performed by FAA employees. For elevators, local or state government employees perform regular inspections.

Notice that none of these mechanisms guarantees safety. A recently inspected car, plane or elevator may still fail, and a forged inspection sticker may pass a cursory check. What these mechanisms provide is an easily understood and workable framework within which society can specify policies that balance risk and exposure against cost and inconvenience. More frequent and more comprehensive inspections can improve safety, but at greater cost and inconvenience to society. A similar tradeoff exists in better enforcement via more frequent and thorough checks.

Can such a framework be created for the Internet? In principle, the metaphor of periodic inspection of computing state by a trusted authority could be used to improve safety on the Internet. In practice, there are many challenges.

First, software environments degrade many orders of magnitude faster than mechanical systems degrade through wear and tear. Virus and worm attacks happen on a time scale of minutes and hours rather than weeks or months. To be effective, the frequency of inspections has to match this time scale.

Second, physical transport of computing hardware to inspection sites (or inspectors to computing sites) is unworkable, especially if it has to be repeated every few hours. What is needed is a mechanism that logically transports computing state, without physical transport of hardware or people.

The Opportunity

Efficient hardware virtualization is the critical new building block that makes this approach feasible. VMware is a widely used proprietary virtual machine monitor (VMM) today, the open-source Xen VMM is growing in popularity, and Intel recently announced virtualization extensions to the x86 architecture.
When combined with a distributed storage mechanism, these can provide the crucial elements of state capture and transport. To ensure integrity, state would need to be timestamped, encrypted and digitally signed at capture.

It is now possible to conceive of a future in which EVERY computing device connected to the Internet MUST support complete state capture and the ability to transport that captured state to an inspection site. Combined with an appropriate legal and regulatory framework, these provide the rudiments of an inspection-enforced approach to Internet safety.

Although virtualization can occur at many layers (Java VMs, for example), its use at the hardware level has a very important societal benefit. It avoids the need to re-educate users because it can be made completely transparent to mass-market personal computing software. Requiring users to give up their current knowledge and experience base, and having them learn new applications or a new computing environment, is not viable. Such re-learning may come easily for sophisticated users but is not productive for society at large. For better or worse, the Windows family of operating systems and the Office suite of applications from Microsoft dominate personal computing today. Work at Carnegie Mellon (Internet Suspend/Resume) and Stanford (the Collective) confirms the feasibility of state capture and transport in this space.

The Challenge

The grand challenge is for the CS research community to create a compelling multi-institution deployment of inspection-enforced Internet safety. Its scale and realism should be sufficient to inspire legislative and regulatory follow-through. Deep and immersive use of this infrastructure is essential. Lab-scale proofs of concept, with occasional non-critical use or demos, will not be convincing enough. As they say at Microsoft, we need to "eat our own dog food."
The creation, use and evolution of such open-source infrastructure will rejuvenate and revitalize the team spirit and exciting collaborative software development efforts that existed across universities in the early days of the Arpanet.

Open Problems

To keep this document short, I have skipped discussion of the many research problems exposed by this vision. These are the sub-challenges of the above grand challenge. Here are just a few, to stimulate discussion:

- Secure state capture: Without a trusted agent, it is impossible to be confident of the integrity of captured state. How should this agent be implemented? Some tamper-proof hardware (such as IBM's TPM) at the virtualization site appears essential. What OS support, if any, is needed?

- Scalability: Unless one is clever, this approach will result in huge volumes of data being shipped to inspection sites. Substantial research is needed on optimizations to reduce this volume without compromising the rigor of the inspection process.

- Frequency tuning and completeness: What is the right frequency of state capture? Should it be done at periodic intervals or in a randomized manner? How should inspection parameters be adjusted as observed threat levels change? Should snapshots be proactively transmitted to an inspection site, or should they be stored at the client and transmitted only on demand? Should inspection be exhaustive, or probabilistic (much as the IRS audits tax returns)?

- Poor connectivity: How should the inspection procedure be modified when network connectivity is poor? Connectivity may be good enough for a compromised machine to spread viruses, but not good enough for frequent state inspection. How does one proceed here? Can the inspection site download code to the trusted local agent to perform the more frequent inspections? What are the tradeoffs here?

- Usability: How does periodic VM snapshot generation impact usability?
Can snapshots be structured so that they occur in the background, with low impact on user productivity? Can idle CPU cycles and unused physical memory be effectively utilized for this purpose? What changes, if any, need to be made to the operating system scheduler? What is the smallest CPU and memory footprint achievable for effective snapshotting? How do we deal with resource-poor embedded devices that are Internet-connected?
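
Two of the open problems above, secure state capture and scalability, can be made a little more concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a proposed design: captured state is treated as a byte string, split into fixed-size chunks, and summarized in a timestamped manifest of chunk hashes. The manifest is authenticated with an HMAC under a shared key, which is a simpler stand-in for the digital signature mentioned earlier, and encryption is omitted entirely. Only chunks that the inspection site has not already seen are selected for shipping. All names here (CHUNK_SIZE, build_manifest, verify_manifest, chunks_to_send) are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Dict, List, Optional, Set

CHUNK_SIZE = 4096  # assumed chunk size in bytes; the text does not specify one


def split_chunks(state: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split captured state into fixed-size chunks (the last may be shorter)."""
    return [state[i:i + chunk_size] for i in range(0, len(state), chunk_size)]


def digest(chunk: bytes) -> str:
    """Content hash used to identify a chunk."""
    return hashlib.sha256(chunk).hexdigest()


def build_manifest(state: bytes, key: bytes,
                   timestamp: Optional[float] = None) -> Dict:
    """Build a timestamped list of chunk hashes and attach an HMAC tag.

    Assumption: HMAC with a shared key stands in for a real digital
    signature produced by a trusted agent.
    """
    body = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "chunks": [digest(c) for c in split_chunks(state)],
    }
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_manifest(manifest: Dict, key: bytes) -> bool:
    """Recompute the tag at the inspection site and compare in constant time."""
    payload = json.dumps(manifest["body"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["tag"])


def chunks_to_send(state: bytes, known: Set[str]) -> Dict[str, bytes]:
    """Select only the chunks whose hash the inspection site has not seen."""
    return {
        digest(c): c for c in split_chunks(state) if digest(c) not in known
    }


if __name__ == "__main__":
    key = b"demo-key-held-by-trusted-agent"  # hypothetical key
    first = bytes(CHUNK_SIZE) + b"a" * CHUNK_SIZE
    second = bytes(CHUNK_SIZE) + b"b" * CHUNK_SIZE  # one chunk changed

    manifest = build_manifest(first, key)
    known = set(manifest["body"]["chunks"])
    delta = chunks_to_send(second, known)
    print(verify_manifest(manifest, key), len(delta))
```

Fixed-size chunking is the simplest choice and is used here only for brevity. Whether hashing at this granularity keeps inspection rigorous, and how the key would be protected by tamper-proof hardware, are exactly the research questions listed above.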