6.S081/6.828 2019 Lecture 10: Coordination (sleep&wakeup)

# plan
  finish context switch
  sequence coordination
    sleep & wakeup
    lost wakeup problem
  termination

*********

let's pick up where we left off in thread context switch.
two "spin" processes, but only one CPU
pre-emptive switching via timer interrupts
diagram:
  kernel thread per process, to execute system calls and interrupts
    trapframe; stack; context
  scheduler thread (stack and context)
the kernel is itself a parallel program,
  with threads, shared memory, and locks
on each CPU, exactly one thread is running
  either that CPU's scheduler, or a process (kernel or user)

process-to-process switch occurs via intermediate scheduler thread
why? a yielding thread needs to
    p->state = RUNNABLE
    swtch() -- and stop using the stack
  neither simple order works!
    can't set RUNNABLE first:
      another CPU may start running the thread while its stack is still in use
    can't swtch() first:
      the thread stops executing before it can mark itself RUNNABLE
  so yielding thread acquires p->lock, sets p->state = RUNNABLE, swtch()
    scheduler releases p->lock
  other CPUs must lock first, so they can't see RUNNABLE
    until after yielding thread is done with swtch()

let's fast-forward into the scheduler thread
  make qemu-gdb
  $ spin
  control-B
  (gdb) tb swtch
  (gdb) c
  (gdb) where
  (gdb) stepi 28
  (gdb) stepi
scheduler()'s loop looks at all processes, finds a RUNNABLE one
  will likely be the *other* spin process
  locks the new process, sets state to RUNNING, calls swtch()
  one reason for lock is to prevent other CPUs' schedulers from running it
  (gdb) tb swtch
  (gdb) c
now we're swtch()ing from the scheduler() thread to the other "spin" process
  look at kernel/swtch.S (again)
  (gdb) stepi 28 -- now just about to execute swtch()'s ret
  (gdb) print $ra
  (gdb) where
now we're resuming in a timer interrupt in the *other* spin process
  at some point in the past, it yielded the CPU and called swtch()
  but now it will run again, and return to user space

on a single-CPU machine, all but one thread are in swtch()
stack diagram:
    Th1     Sch     Th2
     |       |       |
     |       |       |
   swtch   swtch
when the running thread calls swtch, a *different* swtch returns

Q: is there pre-emptive scheduling of kernel threads?
yes -- timer interrupt and yield() can occur while in kernel.
  usertrap() calls intr_on() to enable interrupts in the kernel
  kerneltrap() calls yield(), i.e. timer causes pre-emption
where to save registers of interrupted kernel code?
  kernelvec.S pushes them on the kernel stack (since already in kernel).
is pre-emption in the kernel useful?
  valuable if some system calls have lots of compute.
  or if we need a strict notion of thread priority (this is hard).

Q: what if timer interrupt while in scheduler's swtch()?
   and then kerneltrap() calls yield()?
  1. may save registers in the wrong context!
     since cpus[i]->proc is set even though still running the scheduler
  2. will deadlock!
     since p->lock is held during swtch(), but yield() needs it
So: xv6 always turns off interrupts while a thread holds any spinlock
  kernel/spinlock.c
    acquire() disables
    release() enables (if it's the outermost)
    -- see the push_off()/pop_off() sketch below

important for most device drivers
  consoleread():
    acquire(cons.lock)
    look at cons.buf
    release(cons.lock)
  consoleintr():
    acquire(cons.lock)
    append to cons.buf
    release(cons.lock)
  what if console interrupt while consoleread() holds lock?
    deadlock!
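to see how "acquire() disables / release() enables" nests correctly,
here is the interrupt bookkeeping from xv6's kernel/spinlock.c,
condensed (the panic checks are omitted); acquire() calls push_off()
before touching the lock, and release() calls pop_off() after
dropping it:

  // like intr_off()/intr_on(), but matched pairs nest:
  // interrupts come back on only at the outermost pop_off().
  void
  push_off(void)
  {
    int old = intr_get();        // were interrupts enabled?

    intr_off();
    if(mycpu()->noff == 0)
      mycpu()->intena = old;     // remember state at outermost level
    mycpu()->noff += 1;          // count nested push_off()s
  }

  void
  pop_off(void)
  {
    struct cpu *c = mycpu();
    c->noff -= 1;
    if(c->noff == 0 && c->intena)
      intr_on();                 // outermost: restore interrupts
  }

so a thread holding several spinlocks keeps interrupts off until it
releases the last one -- which is why the consoleread()/consoleintr()
deadlock above can't happen on the CPU that holds cons.lock.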
interrupts *can* still occur on other CPUs

***********

# new thread topic: sequence coordination
threads need to wait for specific events or conditions:
  wait for disk read to complete (event is from an interrupt)
  wait for pipe writer to produce data (event is from a thread)
  wait for any child to exit

# why not just have a while-loop that spins until event happens?
  spinning wastes CPU time that other threads could be using

# better solution: coordination primitives that yield the CPU
  xv6 uses sleep & wakeup
  there are others, e.g. barriers, semaphores, event queues.

# here's a BROKEN sleep/wakeup
  sleep(chan)
    sleeps on a "channel", a number/address
      names the condition/event we are sleeping on
    p->state = SLEEPING;
    p->chan = chan;
    sched();
  wakeup(chan)
    wakes up all threads sleeping on chan
    may wake up more than one thread
    for each p:
      if p->state == SLEEPING && p->chan == chan:
        p->state = RUNNABLE
  flexible: sleep/wakeup don't need to know what you're waiting for
  no need to allocate explicit coordination objects

# how would we use this (broken) sleep/wakeup?
  disk_read():
    buf->done = false
    start(buf, block)
    if(buf->done == false)
      sleep(buf)
  disk_intr():
    buf->done = true
    wakeup(buf)
  buf is the sleep channel
  buf->done==true is the "condition" we're waiting for.
  (this is a classic top half / bottom half driver)

# but what about locking?
  what needs protection?
    - driver's data structures, e.g. buf.done
    - disk hardware
  both disk_read() and disk_intr() need to hold a lock
  should disk_read() hold a lock for the whole sequence?
    no: then disk_intr() can't get the lock and set done
  maybe disk_read could release the lock before sleep()?
    let's try it!
    virtio_disk.c
    stressfs

what went wrong when disk_read released the lock before sleep()?
  disk_read saw that the request wasn't yet done
  interrupt occurred after release, before sleep()
  disk_read went to sleep EVEN THOUGH DISK WAS DONE
  now there is nothing to wake up disk_read, it will block forever
this is the "lost wakeup" problem.

we need to eliminate that window between disk_read's check of the
  condition, and sleep() marking the process as asleep.
we'll use locks to prevent wakeup() from running during the entire window.
we need to change the sleep() interface and the way it's used.

sleep(chan, lock)
  caller must hold lock
  sleep releases lock, re-acquires before returning
wakeup(chan)
  caller must hold lock

(edit virtio_disk.c)

// caller must hold lock!
sleep(chan, lock)
  acquire(p->lock)
  release(lock)
  p->chan = chan
  p->state = SLEEPING
  sched()
  release(p->lock)
  acquire(lock)

// caller must hold lock!
wakeup(chan)
  for p in procs:
    acquire(p->lock)
    if p->state == SLEEPING && p->chan == chan:
      p->state = RUNNABLE
    release(p->lock)

why does this work?
  sleeper holds condition lock (cl) OR p->lock OR both
    from before check of condition until after marked as sleeping
  waker holds BOTH condition lock and p->lock
  diagram:
    sleeper:  |----cl----|
                     |--- p->lock ---|
    waker:    |--------cl-------|
                   |- p->lock -|
thus:
  either complete sleep() sequence runs, then wakeup(),
    and the sleeper will be woken up
  or wakeup() runs first, before potential sleeper checks condition,
    so sleeper will see that the condition is true
no lost wakeup!
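to make the fix concrete, here is the broken disk example from above,
rewritten against the new interface -- a sketch, not xv6's actual
virtio driver: disk_read(), disk_intr(), start(), and buf->done are
the made-up names from the example, and vdisk_lock is an assumed
driver lock protecting buf->done and the hardware:

  struct spinlock vdisk_lock;     // protects buf->done + disk hw

  void
  disk_read(struct buf *b, int block)
  {
    acquire(&vdisk_lock);
    b->done = 0;
    start(b, block);              // tell hardware to start the read
    while(b->done == 0)           // loop: re-check after each wakeup
      sleep(b, &vdisk_lock);      // atomically: release lock + sleep
    release(&vdisk_lock);         // sleep() re-acquired it for us
  }

  void
  disk_intr(struct buf *b)
  {
    acquire(&vdisk_lock);         // wakeup()'s caller must hold lock
    b->done = 1;
    wakeup(b);                    // sleeper is either still holding
                                  // vdisk_lock (and will see done==1),
                                  // or fully marked SLEEPING
    release(&vdisk_lock);
  }

the window is gone: disk_intr() can't run between disk_read()'s check
of b->done and sleep()'s marking of the process as SLEEPING, because
one of vdisk_lock or p->lock is held throughout.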
note that vdisk_read() wraps the sleep() in a loop, i.e.
  re-checks the condition after sleep() returns, may sleep again
multiple reasons:
  maybe multiple waiters, another thread might have consumed the event
  channels aren't guaranteed to be unique, they are just numbers
  we'll see that kill() wakes up processes even when condition isn't true
all uses of sleep are wrapped in a loop, so they re-check

# Another example: piperead()
  the condition is data waiting to be read (nread != nwrite)
  pipewrite() calls wakeup() at the end
  what is the race if piperead() used broken_sleep()?
  do we really need the loop around sleep()? (yes!)
  why the wakeup() at the end of piperead()?
  (see the condensed piperead() sketch at the end of these notes)

# the sleep/wakeup interface/rules are a little complex
  there are other schemes that may be cleaner, faster, specialized, &c
    semaphores, condition variables, wait queues
  they all have to cope with lost wakeups, one way or another

*************

# another coordination challenge -- how to terminate a thread?
  a puzzle in many threading systems,
    since we may need to free resources that might still be in use
  two cases: exit() and kill()

# ordinary case: process voluntarily quits with exit() system call
  danger: after p->state = UNUSED, another CPU may allocate it!
    while it's still executing in exit(), on its stack
  this is the same problem as yield faces with p->state = RUNNABLE
  same solution: exit() acquires p->lock, sets p->state=UNUSED, calls swtch()
    scheduler releases p->lock, after process stops executing
  danger: can't free p->kstack while still in use!
  solution: xv6 doesn't free the kernel stack: permanently associated with proc[i]
    this would be a puzzle if we wanted to completely free processes
    could have some other thread do it, e.g. parent or scheduler

# how does kill(target_pid) work?
  problem: may not be safe to forcibly terminate a process
    it might be executing in the kernel
      using its kernel stack, page table, proc[] entry
    it might hold locks
      and must finish to restore invariants
      e.g. in the middle of fork()ing a new process
  so: kill() can't directly destroy the target!
  solution: kill() sets p->killed flag, nothing else
    usertrap() checks p->killed before return, and exit()s
    no locks are held at that point
    so kill() doesn't disturb e.g. critical sections

# what if kill() target is sleep()ing?
  in that case it doesn't hold locks, and isn't executing!
  in this case can kill() destroy it right away?
    might be OK: sleeping for console input
    might not be OK: might be waiting for disk midway through file creation

# xv6 solution to kill() of sleep()ing process
  see kill() in proc.c
    changes SLEEPING to RUNNABLE -- like wakeup()
    so sleep() will return, probably before condition is true
  some sleep loops check for p->killed
    e.g. piperead(), consoleread()
    otherwise read could hang indefinitely for a killed process
  some sleep loops don't check p->killed
    e.g. virtio_disk.c
    OK not to check p->killed since disk reads are pretty quick
  so a kill()ed process may continue for a while
    but usertrap() will exit() after the system call finishes

# xv6 spec for kill
  if target is in user space
    will die next time it makes a system call or takes a timer interrupt
  if target is in the kernel
    target will never execute another user instruction
    but may spend quite a while yet in the kernel

# Summary
  sleep/wakeup let threads wait for specific events
  concurrency and interrupts mean we have to worry about lost wakeups
  termination is a pain in threading systems
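here is the condensed piperead() sketch promised above, adapted from
xv6's kernel/pipe.c with the byte-copying loop elided; it shows both
the sleep loop re-checking its condition and the p->killed check:

  int
  piperead(struct pipe *pi, uint64 addr, int n)
  {
    int i;
    struct proc *pr = myproc();

    acquire(&pi->lock);             // the condition lock
    while(pi->nread == pi->nwrite && pi->writeopen){  // no data yet
      if(pr->killed){               // kill() may be why we woke up
        release(&pi->lock);
        return -1;
      }
      sleep(&pi->nread, &pi->lock); // loop re-checks on return
    }
    for(i = 0; i < n && pi->nread != pi->nwrite; i++){
      // ... copy pi->data[pi->nread++ % PIPESIZE] out to addr ...
    }
    wakeup(&pi->nwrite);            // a full pipe's pipewrite() may be
                                    // sleeping, waiting for space
    release(&pi->lock);
    return i;
  }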