Frequently Asked Questions about micro-kernels and L4

Q: How do people use micro-kernels today?

A: Microkernels are sometimes used in embedded situations, e.g.
controlling the radio hardware in a cell phone, or moving packets
around in a network router. You can read about how L4 is used here:
https://en.wikipedia.org/wiki/L4_microkernel_family

Q: Why aren't micro-kernels more commonly used?

A: Why some promising technologies become popular and others don't is
not always clear! But here are some guesses.

One reason is that traditional operating systems are mature and have
lots of useful software available for them: editors, window systems,
databases, network protocols, web servers, image-processing libraries,
&c. Much of this software would require significant effort to port to
a micro-kernel.

Another reason is that the advantages of micro-kernels may not be
compelling enough to motivate people to switch and go to the expense
of moving to a new environment.

Another reason is that traditional operating systems adopted many of
the good ideas from micro-kernels, such as IPC and flexible virtual
memory. The fact that many micro-kernel-based systems provided UNIX
compatibility by running a complete UNIX kernel, as in the L4+Linux
paper, was probably viewed by some as undermining much of the original
justification for micro-kernels.

Q: Why did people pursue micro-kernel projects like L4?

A: L4 is the result of a research project that started in the early
1990s, one of many micro-kernel projects at that time. The people who
worked on these projects felt that traditional kernels (e.g. Linux)
were too complex, and hoped to design much simpler kernels. The
general approach was to try to eliminate all but the absolutely
necessary functions from the kernel, moving code that didn't need to
be there into user-space server processes (e.g. for device drivers,
file systems, network protocols, &c), which would interact via IPC
messages. The hoped-for benefits:

* fewer kernel bugs, since the kernel would be simpler.
* better security, because it might be easier to reason about the
  security of a smaller kernel, and because bugs in user-level
  services might be less damaging than bugs in the kernel.
* better performance, because it might be possible to optimize a
  simple kernel more effectively than a complex kernel.
* easier to modify and extend the operating system, since perhaps one
  could replace or modify user-level services more easily than kernel
  code.
* more robust in the face of software failures: since most of the
  operating system would exist in user-level services, if one failed,
  it could be restarted without rebooting the whole computer.
* a more elegant design.

Q: Why does the paper focus on IPC performance?

A: This paper was part of a larger debate in the academic computer
science world about how kernels should be designed. Initial work on
micro-kernels seemed compelling due to their simplicity; then people
realized that the initial designs were slow due to lots of IPC, and
started to think microkernels were a bad idea; then the L4 authors
made IPC a lot faster and published this paper (among others) to try
to persuade people not to ignore micro-kernels on account of poor
performance.
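To get a rough feel for why IPC cost mattered so much, here is a
minimal sketch (not from the paper, and using ordinary Unix pipes
rather than L4's IPC primitive) that measures a cross-process round
trip. Each iteration pays for several kernel entries/exits and at
least two context switches, which is roughly the kind of per-message
overhead a micro-kernel design incurs when servers exchange many small
IPC messages, and which the L4 authors worked to shrink.

    /* A rough sketch, not L4's IPC: time a user-to-user round trip on
       an ordinary Unix system using two pipes. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>
    #include <sys/wait.h>

    int main(void) {
      int ptoc[2], ctop[2];            /* parent-to-child, child-to-parent */
      char b = 'x';
      if (pipe(ptoc) < 0 || pipe(ctop) < 0) { perror("pipe"); exit(1); }

      if (fork() == 0) {
        close(ptoc[1]); close(ctop[0]);    /* close unused ends so EOF works */
        while (read(ptoc[0], &b, 1) == 1)  /* echo each byte back */
          write(ctop[1], &b, 1);
        exit(0);
      }
      close(ptoc[0]); close(ctop[1]);

      struct timespec t0, t1;
      int n = 100000;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (int i = 0; i < n; i++) {
        write(ptoc[1], &b, 1);           /* "send" a one-byte message */
        read(ctop[0], &b, 1);            /* wait for the "reply" */
      }
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
      printf("%.0f ns per round trip\n", ns / n);
      close(ptoc[1]);                    /* child sees EOF and exits */
      wait(0);
      return 0;
    }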
Q: Would a micro-kernel based system likely be more secure than one
based on a monolithic kernel?

A: Perhaps. It's true that typical monolithic kernels contain large
amounts of code that doesn't fundamentally need to execute with full
hardware permissions, and thus doesn't really need to be in the
kernel. Moving such code to user space might reduce potential security
and reliability problems if the code has bugs (as, inevitably, it
often does). There's been a fair amount of research into moving
functionality out of the kernel, e.g. into user-level device drivers.
Here's an example based on a microkernel:
https://www.minix3.org/docs/jorrit-herder/osr-jul06.pdf
This approach seems promising for device drivers, file systems, and
network protocols, since they tend to be self-contained and have
well-defined interfaces. But it has proved more difficult for other
parts of the kernel.

A limitation here is that many bugs are equally a problem whether they
occur in user space or in the kernel. If your disk driver has a bug
that causes it to sometimes read or write the wrong block on the disk,
moving the driver to user space is not likely to help. Or if your file
system sometimes doesn't enforce file permissions correctly, that's a
security problem that's equally threatening whether the file system
code is in the kernel or in user space. If your TCP implementation has
a buffer overflow bug that allows an attacker to inject and execute
their own code remotely, then the attacker may be able to read or
modify your network traffic even if the TCP code runs in user space.

Q: Is it easier to develop operating system code for a micro-kernel
than for a monolithic kernel?

A: It seems easier to implement things in a monolithic kernel than in
a micro-kernel-based system in which the functionality is split among
many user-level servers. Different modules in a monolithic kernel can
easily look at each others' data and call each others' functions. Such
interaction is more painful when the modules are in different user
processes interacting via IPC. As a result, for example, even though
today's reading involves a microkernel, most of the paper's O/S
functionality is in a single monolithic user-level server, the Linux
server.

Q: What is the small-address-space optimization?

A: L4 uses the following clever trick to avoid having to switch page
tables, and thus flush the TLB, when it context-switches from one task
to another (or switches between the L4 kernel and a user-level task).
Instead of giving each task a separate page table, L4 has just one
page table, and maps each task to a different range of virtual
addresses in that page table. Thus all the tasks (and the L4 kernel)
are in the page table at the same time. This only works on Intel x86
processors, which had (in addition to page tables) a feature called
"segments" that allows the kernel to change the base offset in the
virtual address space that address zero refers to, and also to
restrict the maximum virtual address that a user process can refer to.
So when switching to a task, L4 would adjust the segment registers to
cause address zero to refer to that task's virtual address range in
the single page table.

This trick only works if processes are smallish (e.g. <= hundreds of
megabytes) because the authors were using a 32-bit computer with
32-bit virtual addresses. 32 bits gets you only 4 gigabytes of virtual
address space, so the trick relies on individual tasks being much
smaller than 4 gigabytes, so that they can all fit into 4 gigabytes at
the same time.
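Here is a toy model of the idea, for illustration only: it is plain C,
not the real x86 segment mechanism or L4's code, and the task layout
constants are made up. The point is that a "context switch" just
changes the current base and limit, while the single shared page table
(and therefore the TLB contents) stays the same.

    /* A simplified model of the small-address-space trick.  All tasks
       share one page table; each task gets a small window of the 4 GB
       virtual space, described by a base and a limit. */
    #include <stdio.h>
    #include <stdint.h>

    struct task {
      uint32_t seg_base;   /* where this task's address 0 lands in the shared space */
      uint32_t seg_limit;  /* largest address the task may use */
    };

    struct task *current;

    /* "Context switch": change base/limit only; no page-table switch,
       so no TLB flush is needed. */
    void switch_to(struct task *t) { current = t; }

    /* Map a task-relative address into the single shared address space. */
    int translate(uint32_t addr, uint32_t *out) {
      if (addr >= current->seg_limit)
        return -1;                       /* would fault on real hardware */
      *out = current->seg_base + addr;   /* then the one page table applies */
      return 0;
    }

    int main(void) {
      struct task a = { 0x10000000, 0x08000000 };  /* 128 MB window at 256 MB */
      struct task b = { 0x20000000, 0x08000000 };  /* 128 MB window at 512 MB */
      uint32_t lin;
      switch_to(&a);
      if (translate(0x1000, &lin) == 0)
        printf("task a: 0x1000 -> 0x%08x\n", (unsigned)lin);
      switch_to(&b);
      if (translate(0x1000, &lin) == 0)
        printf("task b: 0x1000 -> 0x%08x\n", (unsigned)lin);
      return 0;
    }

On the real hardware the base and limit live in segment registers, so
the bounds check and the addition happen in hardware on every memory
reference; only those registers change at a context switch.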
Q: What's a tagged TLB?

A: A non-tagged TLB is indexed by virtual address; when you switch
address spaces, which changes the meaning of virtual addresses, the
TLB content is no longer correct and must be discarded. A tagged TLB
associates an address space identifier (ASID) with each entry, as well
as a virtual address. The operating system allocates a different ASID
to each process. The result is that the TLB can keep multiple mappings
for any given virtual address, a separate mapping for each process. So
you don't have to flush the TLB when switching address spaces.

Q: Why does L4+Linux perform better in the paper's benchmarks than the
architecturally similar Mach+Linux?

A: The paper does not explain. One possibility is that L4's
small-address-space optimization may greatly decrease context-switch
costs.

Q: What are micro and macro benchmarks?

A: People use "micro-benchmark" to refer to a benchmark that attempts
to isolate and measure the performance of a single simple operation,
for example the cost of entering or leaving the kernel. Such
benchmarks are nice because it's often easy to understand exactly what
they are measuring, and to attribute the results to specific design
and implementation decisions. However, real-world applications are
complex mixes of lots of different operations, so it is hard to
predict real-world performance from micro-benchmarks. As a result,
people also use "macro-benchmarks" consisting of entire applications,
e.g. compiling the Linux kernel. These are more representative of the
performance you'd really observe, but it's also often hard to
understand why macro-benchmarks get the performance they do.
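As a concrete example of a micro-benchmark, the sketch below
(Linux-specific, and not from the paper) isolates one simple
operation, the cost of entering and leaving the kernel, by timing many
repetitions of a trivial system call; syscall(SYS_getpid) is used so
the C library cannot answer from a cached value. A macro-benchmark
would instead time a complete workload, e.g. running make over the
Linux kernel source, which mixes computation, system calls, and I/O.

    /* A minimal micro-benchmark sketch: time the cost of one kernel
       entry + exit by repeating a trivial system call. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <time.h>
    #include <sys/syscall.h>

    int main(void) {
      struct timespec t0, t1;
      int n = 1000000;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (int i = 0; i < n; i++)
        syscall(SYS_getpid);             /* one kernel entry + exit per call */
      clock_gettime(CLOCK_MONOTONIC, &t1);
      double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
      printf("%.1f ns per system call\n", ns / n);
      return 0;
    }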