System call interface: exokernels

Required reading: Exokernel paper.

We are moving into a different mode of teaching. By studying UNIX v6 we now have a good understanding of operating systems as of roughly 1975. A lot has changed since then. For example, v6 had no support for networking, yet today we cannot imagine computing without the Internet and the Web. In the remainder of the term, we study operating system concepts that were not in v6, and we do so by reading papers from the OS research literature.

Overview

A central theme in operating system design is the kernel interface. The kernel interface is the API of the operating system, and therefore the choice of API determines the structure, features and limitations of an operating system. This and the next lecture will look at the system call interface in more detail.

The v6 design is typically called a monolithic design. Its system call interface is the programming interface for application programmers, and the programmer must live with the interface that Thompson and Ritchie (T&R) defined. The interface provides the process, interprocess communication, file, tty, and user abstractions.

The interface and its implementation are determined by the kernel implementors. Applications that need a different interface, or a different implementation of the interface, cannot run on v6 (unless, of course, you convince T&R to change v6, which was the model at Bell Labs).

What applications cannot run on v6, other than network applications? Another way of asking this question is "What are the intended applications of v6?". This question is surprisingly hard to answer, because v6 didn't have a set of precise requirements. v6 and its successors are fixed points of what the UNIX developers needed to develop UNIX.

So, what are examples of applications that don't run on v6? Databases, because they require transactions, which are impossible to implement on the UNIX file system because of its weak reliability semantics. (One may be able to use the raw I/O interface by writing blocks directly to /dev/rk0, but then every write is synchronous.) Multithreaded programs, because multiple threads within a single address space are difficult to get right: if one thread performs a blocking system call, the whole process blocks. A process whose image is slightly larger than physical memory, because v6 swaps whole processes in and out rather than paging. Local servers, because there is no way to communicate with an arbitrary process (other than through the file system): pipes require a common ancestor.
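Why do pipes only connect related processes? A minimal sketch using the modern POSIX `pipe`, `fork`, `read`, and `write` calls (not v6 source code) makes the reason visible: the two descriptors returned by `pipe` exist only in the creating process's file table, so the only way another process can obtain them is to inherit them across `fork`. The helper name `send_through_pipe` is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process writes msg into a pipe and the
 * parent reads it back into buf (bufsize must be at least 1).
 * Returns the number of bytes read, or -1 on error. */
ssize_t send_through_pipe(const char *msg, char *buf, size_t bufsize)
{
    int fds[2];                      /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();              /* child inherits copies of both descriptors */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: it can use the pipe only because it inherited fds[1]. */
        close(fds[0]);
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: an unrelated process would have no way to name this pipe. */
    close(fds[1]);
    ssize_t total = 0, n;
    while (total < (ssize_t)bufsize - 1 &&
           (n = read(fds[0], buf + total, bufsize - 1 - total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

A server that wants to accept requests from arbitrary clients cannot use this mechanism, since its clients are generally not its descendants.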

If you want to write these kinds of applications, there is only one way out and that is to add system calls. This approach is taken by the modern versions of UNIX; FreeBSD 4.5 has 364 system calls. If an application programmer desires a different interface or different implementation, the programmer is stuck; he has to wait until the next release (assuming he could convince the kernel developers).

Recently, monolithic operating systems added support for downloadable kernel modules, which allow a programmer to add his own code to a running kernel (assuming he has superuser privileges). Of course, if the programmer's code has a bug, all other programs on the machine may suffer too. If the programmer wants his code to run on other machines, he has to convince the owners of those machines to load his code into their kernels as well.

Although monolithic operating systems are the dominant operating system architecture for desktop and server machines, it is worthwhile to consider alternative architectures, even if only to understand operating systems better. This lecture looks at exokernels; the next one at microkernels.

Exokernels

The exokernel architecture takes an end-to-end approach to operating system design. In this design, the kernel just securely multiplexes physical resources; any programmer can decide what the operating system interface and its implementation are for his application. One would expect a couple of popular APIs (e.g., v6) that most applications will link against, but a programmer is always free to replace that API, partially or completely.

To get the exokernel model straight, it is helpful to define (or refresh) a couple of terms:

Kernel: the program that runs in kernel mode.
Library: user-level code against which applications link.
Application: user-level code that runs in an address space.

The central challenge in an exokernel design is to provide flexibility while still providing fault isolation. This challenge breaks down into three problems:

tracking ownership of resources;
ensuring fault isolation between applications;
revoking access to resources.
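The three problems can be made concrete with a toy model of a single resource type, physical pages. This is a hypothetical sketch, not code from any real exokernel (names such as `kern_page_alloc` and `kern_page_revoke` are made up): the kernel keeps an ownership table, checks every operation against it, and revokes a page by first asking the owner through an upcall and then reclaiming the page regardless of the answer.

```c
#include <stdbool.h>

#define NPAGES   8
#define NO_OWNER (-1)

/* Hypothetical toy kernel state: owner[p] is the id of the environment
 * that owns physical page p, or NO_OWNER if the page is free. */
static int owner[NPAGES];

void kern_init(void)
{
    for (int p = 0; p < NPAGES; p++)
        owner[p] = NO_OWNER;
}

/* Tracking ownership: bind a specific free page to env. */
bool kern_page_alloc(int env, int page)
{
    if (page < 0 || page >= NPAGES || owner[page] != NO_OWNER)
        return false;
    owner[page] = env;
    return true;
}

/* Fault isolation: every use of a page is checked against the table,
 * so one environment cannot touch a page owned by another. */
bool kern_page_access(int env, int page)
{
    return page >= 0 && page < NPAGES && owner[page] == env;
}

/* Revocation: release_cb models an upcall that politely asks the owner
 * to give the page back; if it refuses, the kernel takes it anyway.
 * Returns true if the owner cooperated. */
bool kern_page_revoke(int page, bool (*release_cb)(int env, int page))
{
    if (page < 0 || page >= NPAGES || owner[page] == NO_OWNER)
        return false;
    bool cooperated = release_cb(owner[page], page);
    owner[page] = NO_OWNER;   /* the page is free afterwards either way */
    return cooperated;
}
```

The point of the design is that the kernel never decides how a page is used; it only records who may use it and enforces that record.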

To understand these challenges and their solutions in more detail, it is easiest to first look at a couple of examples:

User-level thread package that deals well with blocking system calls, page faults, etc. (Start with v6 and see where you run into problems.)
High-performance web server performing optimizations across module boundaries (e.g., file system and network stack). (Start with v6 and see where you run into problems.)
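The context-switching half of a user-level thread package is easy to write. A minimal sketch, assuming a modern Unix whose C library provides `<ucontext.h>` (`getcontext`, `makecontext`, `swapcontext`; these were removed from POSIX.1-2008 but glibc still provides them). The names `thread_yield` and `run_threads` are hypothetical. What the sketch cannot fix is the problem the first example points at: the kernel sees only one process, so a blocking system call in any thread stops all of them.

```c
#include <ucontext.h>

#define NTHREADS   2
#define ROUNDS     3
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;                 /* where threads yield back to */
static ucontext_t thread_ctx[NTHREADS];
static char stacks[NTHREADS][STACK_SIZE];    /* one private stack per thread */
static int finished[NTHREADS];
static int current;                          /* index of the running thread */

/* Record of which thread ran at each step, to inspect the interleaving. */
static int trace[NTHREADS * ROUNDS];
static int ntrace;

/* Voluntarily hand the processor back to the user-level scheduler. */
static void thread_yield(void)
{
    swapcontext(&thread_ctx[current], &sched_ctx);
}

static void thread_body(void)
{
    for (int i = 0; i < ROUNDS; i++) {
        trace[ntrace++] = current;
        /* A blocking system call here (say, a read from a terminal) would
         * block every thread, because the kernel sees only one process. */
        thread_yield();
    }
    finished[current] = 1;   /* returning resumes sched_ctx via uc_link */
}

/* Create NTHREADS threads and run them round-robin until all finish. */
void run_threads(void)
{
    ntrace = 0;
    for (int t = 0; t < NTHREADS; t++) {
        getcontext(&thread_ctx[t]);
        thread_ctx[t].uc_stack.ss_sp = stacks[t];
        thread_ctx[t].uc_stack.ss_size = sizeof stacks[t];
        thread_ctx[t].uc_link = &sched_ctx;
        makecontext(&thread_ctx[t], thread_body, 0);
        finished[t] = 0;
    }

    int live = NTHREADS;
    while (live > 0) {
        for (int t = 0; t < NTHREADS; t++) {
            if (finished[t])
                continue;
            current = t;
            /* Run thread t until it yields or returns. */
            swapcontext(&sched_ctx, &thread_ctx[t]);
            if (finished[t])
                live--;
        }
    }
}
```

The two threads alternate (0, 1, 0, 1, ...) without any kernel involvement; an exokernel lets the library also learn about page faults and blocking events so it can switch threads instead of losing the processor.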

Exokernel paper discussion

How is physical memory multiplexed? The kernel tracks the owner of each physical page.
How is the processor multiplexed? Time slices.
How is the network multiplexed? Packet filters.
What is the plan for revoking resources?
- Expose information so that applications can do the right thing.
- Ask applications politely to release resources of a given type.
- Force applications to release resources.
What is an environment? The processor environment, which stores the information needed to deliver events to an application: exception context, interrupt context, protected entry context, and addressing context. This structure is processor-specific.
How does one implement a minimal protected control transfer on the x86? Lab 4's approach (see exercise 3) has some shortcomings: what are they? (It is essentially a polling-based solution.) What is the better way? Set up a specific handler to be called when another environment wants to call this environment. How does this impact scheduling of environments (i.e., does the caller give up its time slice or not)?
How does one dispatch exceptions (e.g., page fault) to user space on the x86? Give each environment a separate exception stack in user space, and propagate exceptions on that stack. See exercise 5, lab 4.
How does one implement processes in user space? The thread part of a process is easy. The difficult part is to copy the address space efficiently; one would like to share memory between parent and child. This can be achieved using copy-on-write. The child should, however, have its own exception stack; see exercise 6, lab 4.
What are the examples of extensibility in this paper? (An RPC system in which the server saves and restores registers, a different page table, and a stride scheduler.)
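The packet-filter answer to network multiplexing can be sketched as a demultiplexer. This is a deliberately simplified, hypothetical model: each filter matches exactly on a protocol number and destination port, whereas real packet filters are general predicates over packet contents. In this toy, the kernel refuses a filter that overlaps an existing one, so one environment cannot claim packets already bound to another, and it hands each incoming packet to the environment whose filter accepts it.

```c
#include <stdint.h>

/* Hypothetical, simplified packet: only the fields the filters inspect. */
struct packet {
    uint8_t  proto;      /* e.g., 6 for TCP, 17 for UDP */
    uint16_t dst_port;
};

/* A filter binds a (protocol, port) pattern to the environment owning it. */
struct filter {
    int      env;
    uint8_t  proto;
    uint16_t dst_port;
};

#define MAX_FILTERS 16
static struct filter filters[MAX_FILTERS];
static int nfilters;

/* Install a filter for env; reject it if the table is full or the pattern
 * is already claimed. Returns 0 on success, -1 on failure. */
int filter_install(int env, uint8_t proto, uint16_t dst_port)
{
    if (nfilters == MAX_FILTERS)
        return -1;
    for (int i = 0; i < nfilters; i++)
        if (filters[i].proto == proto && filters[i].dst_port == dst_port)
            return -1;
    filters[nfilters++] = (struct filter){env, proto, dst_port};
    return 0;
}

/* Demultiplex: return the env whose filter accepts pkt, or -1 to drop it. */
int filter_demux(const struct packet *pkt)
{
    for (int i = 0; i < nfilters; i++)
        if (filters[i].proto == pkt->proto &&
            filters[i].dst_port == pkt->dst_port)
            return filters[i].env;
    return -1;
}
```

The kernel only decides which environment receives a packet; protocol processing stays in the application's own network library.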
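The copy-on-write part of a user-level fork can be simulated in a few dozen lines. The sketch below is a hypothetical toy, not the lab 4 code: physical frames carry reference counts, `cow_fork` shares every mapped page and marks both mappings read-only and copy-on-write, and `cow_fault` plays the role of the user-level page fault handler that gives the faulting environment a private copy. It does not model the exception stack; a real fork must give the child a fresh page for that stack rather than sharing it.

```c
#include <stdbool.h>
#include <string.h>

#define NFRAMES   16
#define NVPAGES   4
#define PAGE_SIZE 64            /* tiny pages keep the toy small */

/* Hypothetical toy physical memory with a reference count per frame. */
static char frames[NFRAMES][PAGE_SIZE];
static int refcnt[NFRAMES];

struct pte {
    int  frame;                 /* physical frame, or -1 if unmapped */
    bool writable;
    bool cow;                   /* shared copy-on-write mapping */
};

struct env {
    struct pte pt[NVPAGES];
};

static int frame_alloc(void)
{
    for (int f = 0; f < NFRAMES; f++)
        if (refcnt[f] == 0) { refcnt[f] = 1; return f; }
    return -1;
}

/* fork: share every mapped page; writable pages become read-only + COW
 * in both parent and child. */
void cow_fork(struct env *parent, struct env *child)
{
    for (int v = 0; v < NVPAGES; v++) {
        struct pte *p = &parent->pt[v];
        child->pt[v] = *p;
        if (p->frame < 0)
            continue;
        refcnt[p->frame]++;
        if (p->writable || p->cow) {
            p->writable = false;
            p->cow = true;
            child->pt[v].writable = false;
            child->pt[v].cow = true;
        }
    }
}

/* Write-fault handler: give the faulting env a private, writable page. */
bool cow_fault(struct env *e, int v)
{
    struct pte *p = &e->pt[v];
    if (p->frame < 0 || !p->cow)
        return false;                    /* a genuine protection fault */
    if (refcnt[p->frame] > 1) {          /* still shared: copy it */
        int f = frame_alloc();
        if (f < 0)
            return false;
        memcpy(frames[f], frames[p->frame], PAGE_SIZE);
        refcnt[p->frame]--;
        p->frame = f;
    }
    p->writable = true;                  /* env is now the sole user */
    p->cow = false;
    return true;
}

/* Store one byte, taking a COW fault first if the page is read-only. */
bool env_write(struct env *e, int v, int off, char c)
{
    if (!e->pt[v].writable && !cow_fault(e, v))
        return false;
    frames[e->pt[v].frame][off] = c;
    return true;
}
```

After the child's first write it owns a private copy, and the parent's later write needs no copy at all because the reference count has dropped back to one.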