6.1810 2023 Lecture 12: Coordination (sleep&wakeup)

# plan
  review two points about xv6 thread switching
  sequence coordination
    sleep & wakeup
    lost wakeup problem
  termination

review #1: why hold p->lock across swtch()?
   this requirement affects many situations in xv6
   yield()                     scheduler()
   -------                     -----------
   acquire(&p->lock);
   p->state = RUNNABLE;
   swtch();
                               swtch();
                               release(&p->lock);

two reasons to hold p->lock across swtch():
  1. prevent another core's scheduler from seeing p->state == RUNNABLE
     until after the original core has stopped executing the thread
     and after the original core has stopped using the thread's stack
  2. prevent timer interrupt from calling yield() during swtch()
     (remember: acquire() turns off interrupts)
     2nd swtch() would overwrite already-saved registers in context

review #2: why does xv6 never hold a spinlock when yielding the CPU?
   (other than p->lock)
   on a single-core machine, imagine this:
     P1           P2
     acq(x)
     sched()
                  acq(y)
                  acq(x)
   P2 will wait forever:
     P2 will spin waiting for P1 to release x
     P2 holds y, so it must keep interrupts off
     No clock interrupts, so P2 won't yield() and P1 won't run
     So P1 won't ever release x
   possible even on multi-core, with more locks/threads
   solution: forbid holding lock when yielding the CPU!
     (other than p->lock)

# topic: sequence coordination
  threads often wait for specific conditions:
    a disk read is complete (signaled by an interrupt)
    a pipe has data to read (written by another process)
    a child has exited

# coordination is a fundamental building-block for thread programming.
  but subject to rules that can present puzzles.

# why not just have a while-loop that spins until condition is true?
  pipe read:
    while buffer is empty {
    }
  pipe write:
    put data in buffer

# better solution: coordination primitives that yield the CPU
  there are a bunch e.g. barriers, semaphores, event queues.
  xv6 uses sleep & wakeup
    used by many systems
    similar to "condition variables"

# example: uartwrite() and uartintr() in kernel/uart.c
  recall: the UART is the device hardware that connects to the console
  I've simplified my copy of uart.c to focus on sleep/wakeup story 
    see "code" link on schedule page
  the basic idea:
    the UART h/w accepts one byte of output at a time
      h/w takes a long time to send each byte, perhaps a millisecond
    processes writing the console must wait until UART sends prev char
    the UART h/w interrupts after it has sent each character
      writing thread should give up the CPU while waiting for interrupt
  write() calls uartwrite() with a buffer of output
    uartwrite() writes first byte (if it can)
    uartwrite() calls sleep() to wait for the UART's interrupt
    uartintr() calls wakeup()
  tx_busy == 0 is the "condition" uartwrite() is waiting for
    set by uartwrite(); cleared by uartintr()
  the "&tx_chan" argument serves to link the sleep and wakeup
    "wait channel"
    just a number, tells wakeup which sleeping threads to wake
  simple and flexible:
    sleep/wakeup don't need to understand what you're waiting for

# why the lock argument to sleep()?
  sadly you cannot design sleep() as cleanly as you might hope
  sleep() cannot simply be "wait for wakeup() to be called"
  the problem is called "lost wakeups"
  here's the story

# suppose just sleep(chan); how would we implement?
  here's a BROKEN sleep/wakeup
  [on board, not in vi]
  broken_sleep(chan)
    p->state = SLEEPING;
    p->chan = chan;
    sched();
    (now scheduler() won't run it -- not RUNNABLE)
  wakeup(chan)
    for each p:
      if p->state == SLEEPING && p->chan == chan:
        p->state = RUNNABLE
  wakeup wakes up all threads sleeping on chan
    may wake up more than one thread
  sleep and wakeup treat wait channel as just a 64-bit number
    they know nothing of the actual condition the thread is waiting for
    chan usually the address of a convenient shared object
  (I've omitted p->lock, which both need)

# how would uart code use this (broken) sleep/wakeup?
  int busy
  int chan
  uartwrite(buf):
    for each char c:
      while busy:
        broken_sleep(&chan)
      send c
      busy = 1
  uartintr():
    busy = 0
    wakeup(&chan)
  busy==0 is the condition we're waiting for
  &chan is the wait channel (a dummy variable)

# but what about locking?
  - driver's data structures e.g. busy
  - UART hardware 
  both uartwrite() and uartintr() need to lock
  should uartwrite() hold a lock for its whole lifetime?
    no: then uartintr() can't get lock and clear busy
  maybe uartwrite() could release the lock before sleep()?
  let's try it -- modify uart.c to call broken_sleep()
    release(&uart_tx_lock);
    broken_sleep(&tx_chan);
    acquire(&uart_tx_lock);
  make qemu ; cat README

what goes wrong when uartwrite() releases the lock before broken_sleep()?
  uartwrite() saw tx_busy = 1 (UART not done with previous byte)
  interrupt occurred after release(), before broken_sleep()
    tx_busy = 0
  then uartwrite() calls sleep()
  uartwrite() went to sleep EVEN THOUGH UART TX INTERRUPT WAS DONE
    and tx_busy is zero...
  now there is nothing to wake up uartwrite(), it will sleep forever
    really, until next UART interrupt, due to input

this is the "lost wakeup" problem.

we need to eliminate the window between uartwrite()'s check of
  the condition, and sleep() marking the process as asleep.
  we'll use locks to prevent wakeup() from running during the entire window.

we'll change the sleep() interface and the way it's used.
  sleep() will require that there be a "condition lock" that prevents
    the condition from changing, and require that the callers of
    both sleep() and wakeup() hold this lock
  sleep(chan, lock)
    caller must hold lock
    sleep releases lock, re-acquires before returning
  wakeup(chan)
    caller must hold lock

sleep does not know what the condition is, or how the lock relates to it
  BUT the lock must prevent changes to the condition being waited for!
  AND the lock must be passed to sleep()

(repair uart.c)

let's look at wakeup(chan) in proc.c
  it scans the process table, looking for SLEEPING and chan
  it grabs each p->lock
  remember also that caller acquired condition lock before calling wakeup()
  so wakeup() holds BOTH the condition lock and each p->lock

Q: why is it enough to just set p->state = RUNNABLE?
   why does that cause the thread to (eventually) run?

let's look at sleep() in proc.c
  hmm: sleep *must* release the condition lock
    since we can't hold locks when calling swtch(), other than p->lock
    and waker will need it to modify the condition
  Q: how to prevent wakeup() from running after sleep() releases
     the condition lock?
  A: acquire p->lock *before* releasing condition lock
     so, throughout, sleep() holds either lk, p->lock, or both
  since wakeup() holds *both* locks, it's enough for sleep() to hold *either*
    to force wakeup() to spin rather than look at process calling sleep()
  now wakeup() can't proceed until after sleep()'s swtch() completes
    so wakeup() is guaranteed to see p->state==SLEEPING and p->chan==chan
  thus: no lost wakeups!
    wakeup() cannot sneak in between check of condition and sleep()

[time diagram, or narrate, lines for locks]
  uartwrite()         uartintr()
  -----------         ----------
  acq tx_lock         acq tx_lock
  check tx_busy       tx_busy = 0
  sleep()             wakeup
  acq p->lock         acq p->lock
  rel tx_lock         check SLEEPING / chan
  SLEEPING            rel p->lock
  swtch               rel tx_lock
  rel p->lock

note that uartwrite() wraps the sleep() in a loop
  i.e. re-checks the condition after sleep() returns, may sleep again
  two reasons:
    1. maybe multiple waiters, another thread might have set tx_busy
    2. kill() wakes up processes even when condition isn't true
  all uses of sleep are wrapped in a loop, so they re-check

# Another example: piperead() in pipe.c
  the condition is data waiting to be read (nread != nwrite)
  the condition lock is the pipe's pi->lock
  pipewrite() calls wakeup() at the end
  what is the race if piperead() used broken_sleep()?
    lost wakeups are not just about interrupts!
  note the the loop around sleep()
    multiple processes may be reading the same pipe
  why the wakeup() at the end of piperead()?

# the sleep/wakeup interface/rules are a little complex
  in particular, sleep() needs the condition lock
  but, sleep() doesn't need to understand the condition
  sleep/wakeup is pretty flexible, though low-level
  there are other schemes that are cleaner but perhaps less general-purpose
    e.g. the counting semaphore in today's reading
  all have to cope with lost wakeups, one way or another

*************

# another coordination challenge -- how to shut down a thread?

# xv6 has two ways to get rid of processes: exit() and kill()

# exit() puzzle: a thread cannot free all of its own resources
  e.g. its own stack, which it is still using
  e.g. its struct context, which it may need to call swtch()

# exit():
  where we want to end up:
    p->state = UNUSED, so a future fork() can use this proc[] slot
    and all resources free
  the strategy:
    exit() frees some things, but not struct proc or kernel stack
    parent's wait() does the final frees

# exit() in proc.c:
    some cleanup
    wake up wait()ing parent
    p->state = ZOMBIE
      can't be UNUSED yet!
        still using kernel stack and proc[] entry
      ZOMBIE means ready for parent's wait()
        but fork() won't allocate it (not UNUSED)
        and scheduler won't run it (not RUNNABLE)
    swtch() to scheduler

# wait() in proc.c (parent will eventually call):
  scans proc[] table for any child with p->state==ZOMBIE
  calls freeproc()
    (p->lock held...)
    trapframe, pagetable, ..., p->state=UNUSED
  could child still be executing, just after exit() set p->state to ZOMBIE?
    no: exit() held p->lock during that time
    so it's safe to free child's stack and struct proc
  thus: wait() is not just for app convenience, but for O/S as well

#  what avoids a lost exit()->wait() wakeup?
   if child exit()s between parents check for ZOMBIEs, and parent sleep()
   what's the condition?
     some child is a ZOMBIE
   what's the condition lock?
   can't be child's p->lock, since parent wait() doesn't know
     which child it is waiting for!
   so: wait_lock is the condition lock
     it's a global variable, not per-process
   wait():
     holds wait_lock while scanning proc[] for a ZOMBIE child
     wait_lock is the condition lock wait() passes to sleep()
   exit():
     holds wait_lock around p->state=ZOMBIE and wakeup(p->parent)
   so: child exit() must wait for parent to finish ZOMBIE check and sleep
   so: parent can't miss wakeup between checking for ZOMBIE and sleeping
   a better design might have a separate wait_lock per parent

# kill() puzzle: thread X cannot just destroy thread Y
  what if Y is executing on another core?
    can't just free its kernel stack or user memory!
  what if Y holds locks?
  what if Y is in the middle of a complex update to important data structures?
    must let it finish!
  so: kill() can't directly destroy the target.

# kill() solution: 
  kill() sets p->killed flag, nothing else
  the target process itself checks for p->killed
    and calls exit() itself
  look for "if(p->killed) exit(-1);" in usertrap()
    not in the middle of anything here, and no locks are held
    so it's safe to exit()

# what if kill() target is sleep()ing?
  in that case it doesn't hold locks, and isn't executing!
  is it OK for kill() to destroy the target right away?
  no: what if waiting for disk midway through file creation?

# xv6 solution to kill() of sleep()ing process
  see kill() in proc.c
    changes SLEEPING to RUNNABLE -- like wakeup()
    so sleep() will return, probably before condition is true
  some sleep loops check for p->killed
    e.g. piperead(), consoleread()
    otherwise read could hang indefinitely for a killed process
  some sleep loops don't check p->killed
    e.g. virtio_disk.c
    can't just quit since maybe midway through a file system update
    not so bad since disk reads are pretty quick
  so a kill()ed process may continue for a while
    but usertrap() will exit() after the system call finishes

# xv6 spec for kill
  if target is in user space
    will die next time it makes a system call or takes a timer interrupt
  if target is in the kernel
    target will never execute another user instruction
    but may spend a while yet in the kernel

# Next week:
  File system
  Midterm! during class, in this room, look at previous midterms.