Grand Challenge: A Distributed System That Automates The Planning, Configuration, Control, And Operation Of Computer Networks

Douglas Comer
Purdue University and Cisco Systems

Abstract

As networks increase in size and complexity, the problems of network planning, configuration, monitoring, troubleshooting, and security enforcement have become critical. Unfortunately, network management remains the least understood aspect of networking. Current tools focus on the management of individual network devices and services or on specific problems (e.g., detecting misconfigured routing or analyzing flow data) without treating a network as a single, large system. Consequently, human managers must configure individual network entities in such a way that the resulting network achieves the desired behavior. Because the consequences of individual decisions are difficult to understand, minor changes in one entity can have a major, unintended impact on other parts of the network; the complexity means that human managers cannot resolve all problems. Indeed, customers of network equipment vendors assert that the complexity of managing their networks is limiting network size. Automated network management must be considered a crucial aspect of any future Internet.

New Abstractions

Network management is analogous to operating systems in the 1960s: vendors have built piecemeal solutions and found various ways to automate individual tasks, but no consensus exists about underlying abstractions. Furthermore, because future management will be automated, a management infrastructure must supply abstractions that allow programmers to create new management software instead of relying on the human interfaces found in most existing commercial systems. The challenge lies in finding a set of abstractions for management analogous to the process, file, and device abstractions invented for operating systems.

Why A Distributed System?
The largest networks comprise multiple sites with many human managers and multiple instances of management application software. A management system must provide coordination among sites to enforce global policy, while allowing each site enough autonomy to keep operating during a period of disconnection and to make local changes that do not affect global policies; such functionality requires a distributed system.

Research Problems Include

* Find a small, elegant set of fundamental abstractions for network management that are sufficient for known management tasks, are extensible for new tasks, and accommodate management applications.
* Devise a language that can be used to express management policies.
* Create a distributed systems architecture for network management that can scale to arbitrary size; also consider issues of fault-tolerance, persistence, and temporary disconnection.
* Formulate a role-based access control mechanism and associated roles sufficient to coordinate multiple managers and management applications.
* Investigate the interface between a management system and underlying network elements to find a vendor-independent API that network elements can use to accommodate the new management architecture.
* Define efficient and secure protocols for communication among components of the management system and network elements.

Resources Required

Assuming 20 PIs and graduate students over 5 years, plus distributed lab facilities for experimentation, the estimated cost is $15M to $20M.
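
Illustrative Sketch

To make the role-based access control problem listed under Research Problems more concrete, the following is a minimal sketch in Python of how roles scoped to sites might coordinate human managers and management applications. It is not a proposed design: the names Permission, Role, Principal, and is_allowed, the actions "configure" and "monitor", and the sites "site-a" and "site-b" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    action: str  # hypothetical action name, e.g. "configure" or "monitor"
    scope: str   # a site name, or "*" meaning every site


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset  # set of Permission objects granted by the role


@dataclass
class Principal:
    name: str    # a human manager or a management application
    roles: list  # Role objects assigned to this principal


def is_allowed(principal, action, site):
    """Return True if any role of the principal grants the action at the site."""
    for role in principal.roles:
        for perm in role.permissions:
            if perm.action == action and perm.scope in ("*", site):
                return True
    return False


# A site-local operator may configure and monitor only its own site.
site_a_operator = Role(
    "site-a-operator",
    frozenset({Permission("configure", "site-a"),
               Permission("monitor", "site-a")}),
)

# An auditing role may observe every site but change none of them.
auditor = Role("auditor", frozenset({Permission("monitor", "*")}))

alice = Principal("alice", [site_a_operator])            # human manager
flow_analyzer = Principal("flow-analyzer", [auditor])    # management application
```

Under these assumptions, is_allowed(alice, "configure", "site-a") holds while is_allowed(alice, "configure", "site-b") does not, and the flow_analyzer application can monitor any site without being able to configure one. Treating human managers and applications as the same kind of principal is one possible design choice; the sketch deliberately leaves open how roles would be distributed among sites or enforced during disconnection.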