Coordination

Required reading: Fast mutual exclusion for uniprocessors

Overview

We revisit the topic of mutual-exclusion coordination: the techniques to protect variables that are shared among multiple threads. These techniques allow programmers to implement atomic sections so that one thread can safely update the shared variables without having to worry that another thread might intervene.

We focus today on uniprocessors. Even on a uniprocessor one has to provide mutual-exclusion coordination, because the scheduler might schedule another thread in response to a hardware interrupt (e.g., end of a time slice or a page fault).

The technique for mutual-exclusion coordination used in v6 is disabling/enabling interrupts in the kernel, through spln() calls. In v6, the only program with multiple threads is the kernel, so that is the only program that needs a mutual-exclusion mechanism. User-level processes don't share data directly.
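In kernel code this pattern looks roughly as follows (a sketch only; the names are illustrative stand-ins for the v6 spln() priority calls):

```
s = spl6();    // raise processor priority: device interrupts are masked
...            // update shared kernel data
splx(s);       // restore the previous priority level
```

Because no interrupt can arrive between the two calls, no other kernel thread can run, and the update in between is atomic on a uniprocessor.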

As we discussed in the last lecture, microkernels force server programs to deal with many of the same issues as monolithic kernels, and they therefore require concurrent handling of events. This organizational requirement raises the question "How do we handle multiple events concurrently at user level?"

  1. Use a thread package that doesn't preempt threads, an approach sometimes called cooperative threading. The scheduler always resumes the thread that was running before the interruption; a thread loses the processor only when it releases it explicitly. A number of systems use this approach, but it is not without challenges either: if a thread calls printf in a section of code that must be executed atomically, how do I know that printf doesn't release the processor?
  2. Use a thread package that provides primitives to mark critical sections (e.g., acquire and release a lock). These primitives must themselves be implemented atomically, which can be done with, for example, a hardware test-and-set-lock (TSL) instruction.
  3. Don't use a thread package; organize the application around events. In this approach, the programmer structures the program as a single loop that processes events (e.g., network packet arrival, disk read completion). Each event is handled atomically.
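As a sketch of option 3, here is a minimal event loop in C. The event types and the canned event source are made up for illustration; a real server would block in select() or the like waiting for the next event:

```c
#include <stddef.h>

/* Hypothetical event types; illustrative only. */
enum event_type { PACKET_ARRIVED, DISK_DONE, QUIT };
struct event { enum event_type type; };

/* In a real server this would block waiting for I/O; here we replay a
   canned sequence so the sketch is self-contained. */
static struct event next_event(void) {
    static int i = 0;
    static const enum event_type script[] = { PACKET_ARRIVED, DISK_DONE, QUIT };
    struct event e = { script[i++] };
    return e;
}

/* The single loop: each handler runs to completion before the next
   event is dequeued, so handlers are atomic with respect to one
   another -- no locks needed on a uniprocessor. */
int run_event_loop(void) {
    int handled = 0;
    for (;;) {
        struct event e = next_event();
        if (e.type == QUIT)
            break;
        handled++;          /* "handle" the event */
    }
    return handled;
}
```

The atomicity comes for free from the structure: there is only one thread of control, and it never switches work in the middle of a handler.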

Solutions 1 and 3 aren't good enough for multiprocessors.

To implement any of these three options in a user-level thread package, we need to avoid blocking system calls. The kernel must support asynchronous system calls or scheduler activations.

Can we use a thread package that uses the v6 kernel approach? That is, can we allow a user-level thread to disable and reenable interrupts? (Answer: no, if we care about fault isolation.)

List and insert example:


struct List {
  int data;
  struct List *next;
};

List *list = 0;

void insert(int data) {
  List *l = new List;
  l->data = data;
  l->next = list;  // A
  list = l;        // B
}

void fn() {
  insert(100);
}

int main() {
  thread_create(..., fn);
  thread_create(..., fn);
  thread_schedule();
}

What needs to be atomic? The two statements labeled A and B should always be executed together, as an indivisible fragment of code. If two threads execute A and B interleaved, then we end up with an incorrect list. To see that this is the case, draw out the list after the sequence A1 (statement A executed by thread 1), A2 (statement A executed by thread 2), B2, and B1.
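To make the erroneous sequence concrete, this C sketch (using malloc in place of new) executes A1, A2, B2, B1 by hand and then measures the list:

```c
#include <stdlib.h>

struct List { int data; struct List *next; };
static struct List *list = 0;

/* Replay the interleaving A1, A2, B2, B1: both threads execute
   statement A (read `list` into their node's next field) before
   either executes statement B (write `list`). */
int simulate_bad_interleaving(void) {
    struct List *l1 = malloc(sizeof *l1), *l2 = malloc(sizeof *l2);
    l1->data = 100; l2->data = 200;

    l1->next = list;      /* A1: thread 1 reads list (0) */
    l2->next = list;      /* A2: thread 2 reads list (still 0) */
    list = l2;            /* B2: thread 2 publishes its node */
    list = l1;            /* B1: thread 1 overwrites it; l2 is lost */

    int len = 0;
    for (struct List *p = list; p; p = p->next)
        len++;
    return len;           /* 1, even though two inserts ran */
}
```

Both inserts "completed", yet the list holds only one node: thread 2's node is unreachable, because thread 1's write at B1 was based on a stale read from A1.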

How could this erroneous sequence happen? If threads 1 and 2 are running on a multiprocessor, the variable list lives in physical memory shared among multiple processors, connected by a bus. The accesses to the shared memory will be put in some total order by the bus/memory system. If the programmer doesn't coordinate the execution of statements A and B, any order can happen, including the erroneous one.

How else could it happen? Thread 1's time slice runs out after executing A, and the kernel switches to thread 2, which then executes A and B. Then thread 2 is done, and the kernel may schedule thread 1 again, which executes B. This sequence of events produces the erroneous order.

We need to extend the thread package so that the programmer can express that A and B should be executed as a single atomic unit. We generally use a concept like locks to mark an atomic section, acquiring the lock at the beginning of the section and releasing it at the end:

void acquire(int *lock) {
  while (TSL(lock) != 0) ;
}

void release(int *lock) {
  *lock = 0;
}

Acquire and release, of course, need to be atomic too, which can, for example, be done with a hardware atomic TSL (test-and-set-lock) instruction:

The semantics of TSL are:

   R <- [mem]   // load the old content of mem into register R
   [mem] <- 1   // store 1 into mem

In a hardware implementation, the bus arbiter guarantees that both the load and the store are executed without any other loads/stores coming in between.

We can use locks to implement an atomic insert, or we can use TSL directly:

int insert_lock = 0;

insert(int data) {

  /* acquire the lock: */
  while(TSL(&insert_lock) != 0)
    ;

  /* critical section: */
  List *l = new List;
  l->data = data;
  l->next = list;
  list = l;

  /* release the lock: */
  insert_lock = 0;
}
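A sketch of the locked insert driven from two POSIX threads, using the GCC/Clang atomic builtins as a stand-in for the hardware TSL. With the lock in place, no insertion is lost:

```c
#include <pthread.h>
#include <stdlib.h>

struct List { int data; struct List *next; };
static struct List *list = 0;
static int insert_lock = 0;

static void insert(int data) {
    /* acquire the lock (spin on an atomic test-and-set) */
    while (__sync_lock_test_and_set(&insert_lock, 1) != 0)
        ;
    /* critical section */
    struct List *l = malloc(sizeof *l);
    l->data = data;
    l->next = list;
    list = l;
    /* release the lock */
    __sync_lock_release(&insert_lock);
}

static void *worker(void *arg) {
    for (int i = 0; i < 1000; i++)
        insert(i);
    return 0;
}

/* Run two inserting threads and return the final list length. */
int run_two_threads(void) {
    pthread_t t1, t2;
    pthread_create(&t1, 0, worker, 0);
    pthread_create(&t2, 0, worker, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    int len = 0;
    for (struct List *p = list; p; p = p->next)
        len++;
    return len;
}
```

Every A/B pair now executes under the lock, so the stale-read interleaving from the earlier example cannot occur, and all 2000 nodes end up on the list.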

The paper

The paper addresses how to provide mutual-exclusion primitives that are completely implemented in software. The motivation is that some uniprocessors don't provide hardware-level atomic instructions. It turns out, however, that the approach advocated in the paper is in general a good one, because software implementations are often more efficient than hardware implementations.

The paper describes several ways to implement test-and-set in software.

Paper discussion

  • Should we use RAS in the lab? If so, where would it be useful? We have carefully constructed the kernel and library so far so that you don't have to worry about coordination. (The best way of dealing with concurrency is to structure the system in such a way that you don't have to worry about it!) As soon as environments start sharing variables that they can write, though, you have to. When you do, you could use RAS to implement a software TSL, or use the hardware atomic instructions provided by the x86.
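A rough sketch of how a restartable atomic sequence (RAS) yields a software test-and-set (pseudocode; the registration interface is schematic): the thread registers the region with the kernel, and if the thread is preempted while its PC is inside the region, the kernel resets the PC to the beginning before resuming it.

```
ras_begin:              # registered start of the sequence
    r <- [lock]         # load the old value
    [lock] <- 1         # store 1
ras_end:                # registered end; past here the sequence has committed
    return r            # old value: 0 means we acquired the lock
```

If a context switch lands between the load and the store, the sequence simply reruns from ras_begin, so no other thread ever observes a half-completed test-and-set -- atomicity without any atomic hardware instruction.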