6.1810 2023 Lecture 11: Thread switching

Topic: more "under the hood" with xv6
  Previously: system calls, interrupts, page tables, locks
  Today: process/thread switching

Why support concurrent tasks?
  time-sharing: many users and/or many running programs.
  program structure: prime number sieve.
  parallel programming: exploit multi-core hardware.

xv6 involves two related forms of concurrency:
  * processes
  * threads inside the kernel

what's a thread?
  the state required for an independent serial execution
    registers, pc, stack
  so each thread, taken alone, executes in the ordinary way
  typically multiple threads share memory
    see each other's changes
    so they need locks when they interact
  usually more threads than CPUs
    threading system multiplexes CPUs among threads
    today's main topic

threads versus processes?
  in xv6, a process has a single user-level thread,
    and a single kernel thread to execute its system calls
  so there's sharing of data among xv6 kernel threads, but not at user level
    thus the locks in the xv6 kernel
  can one have multiple threads in one process at user level?
    not in xv6
    but yes in most operating systems, e.g. Linux,
    to support parallel programming at user level
  there are techniques other than threads for interleaving multiple tasks
    look up event-driven programming, or state machines
    threads are not the most efficient, but they are convenient

a thread might be executing, or not
  if executing
    the thread is using the CPU and registers
    and it has a stack and memory
  if not executing ("suspended", "blocked", "waiting")
    registers must be saved somewhere in memory
    stack and memory need to be preserved
  when a thread is suspended, and then resumed
    must save state, and then restore it

xv6 process saved state [diagram]
  user memory
  user stack (C function call chain &c)
  user saved registers (in trapframe (TF))
  kernel stack
  kernel saved registers ("context" (CTX))
  kernel p->state

"kernel thread" is a process's thread in the kernel
  each process has its own kernel thread
  if an xv6 process is not executing
    it's typically waiting for some event -- console, pipe, disk, &c
    because it made a system call into the kernel, e.g. read()
    may be waiting midway through a complex kernel system call implementation
    needs to resume in kernel system call code where it left off
    thus the saved kernel registers (and stack) as well as saved user registers
  it may look like each xv6 process has two independent threads
    one user, one kernel
    but really there's just one
    if executing in kernel, user thread is waiting for trap return
    if executing in user space, kernel thread is empty
  I'll use "process" and "thread" interchangeably

p->state values
  RUNNING -- executing on a CPU right now (either in user or kernel)
  RUNNABLE -- wants to execute, but isn't (suspended, in kernel)
  SLEEPING -- waiting for an event (suspended, in kernel read(), wait(), &c)
  p->state tells the scheduler how to handle the process
  guards against mistakes like executing on two CPUs at the same time

how xv6 switches from one process to another -- "context switch"
  e.g. P1 calls read() and waits, P2 is RUNNABLE
  [P1, TF1, KSTACK1, CTX1, swtch(), CTXs, STACKs, swtch(), CTX2, KSTACK2, TF2, P2]
  TF = trapframe = saved user registers
  CTX = context = saved kernel registers
  getting from one user process to another involves multiple transitions:
    user -> kernel; saves user registers in trapframe
    kernel thread -> scheduler thread; saves kernel thread registers in context
    scheduler thread -> kernel thread; restores kernel thread registers from ctx
    kernel -> user; restores user registers from trapframe

struct proc in proc.h
  p->trapframe holds saved user thread's registers
  p->context holds saved kernel thread's registers
  p->kstack points to the thread's kernel stack
  p->state is RUNNING, RUNNABLE, SLEEPING, &c
  p->lock protects p->state (and other things...)
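for reference, a sketch of the declarations these notes refer to, abridged
from xv6's kernel/proc.h (several fields omitted; comments paraphrased):

    // Saved registers for kernel context switches, used by swtch().
    // ra and sp plus the callee-saved registers s0..s11 -- 14 in all.
    struct context {
      uint64 ra;
      uint64 sp;
      uint64 s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11;
    };

    enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

    // Per-process state (abridged).
    struct proc {
      struct spinlock lock;        // protects state, &c
      enum procstate state;        // RUNNING, RUNNABLE, SLEEPING, &c
      int pid;                     // process ID
      uint64 kstack;               // virtual address of kernel stack
      struct trapframe *trapframe; // saved user registers
      struct context context;      // saved kernel registers; swtch() here to run
      char name[16];               // process name (debugging)
    };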
why a separate scheduler thread?
  so that there's always a stack on which to run the scheduler loop
  e.g. switching away from an exiting process
  e.g. fewer processes than CPUs

scheduler thread details
  one per CPU; each has a stack and a struct context
  a kernel thread always switches to this CPU's scheduler thread
    which switches to another kernel thread, if one is RUNNABLE
    there are never direct process to process switches
  each CPU's scheduler thread keeps scanning the process table
    until it finds a RUNNABLE thread
    if there is no RUNNABLE thread, the scheduler is "idle"

what if user code computes without ever making a system call?
  will that stop other processes from running?
  CPU's timer hardware forces periodic interrupt
  xv6 timer handler switches ("yields")
  "pre-emption" or "involuntary context switch"

# Code

pre-emptive switch demonstration

  vi user/spin.c
    two CPU-bound processes
    my qemu has only one CPU
    we'll watch xv6 switch between them

  make qemu-gdb
  gdb
  (gdb) c
  show user/spin.c
  spin

you can see that they alternate execution.
  xv6 is switching its one CPU between the two processes.
  driven by timer interrupts
  we can see how often the timer ticks

how does the switching work?
  I'm going to put a break-point at the timer interrupt.

  (gdb) b trap.c:81
  (gdb) c
  (gdb) tui enable
  (gdb) where

we're in usertrap(), handling a device interrupt from the timer,
  that occurred in user space.
  (timerinit() in kernel/start.c configures the RISC-V timer hardware).

what was running when the timer interrupt happened?
  (gdb) p p->name
  (gdb) p p->pid
  (gdb) p/x p->trapframe->epc
  let's look for the saved epc in user/spin.asm
  timer interrupted user code in the increment loop, no surprise

(gdb) step
  into yield() in proc.c
(gdb) next
(gdb) p p->state
  change p->state from RUNNING to RUNNABLE -> give up CPU but want to run again.
  note: yield() acquires p->lock
    to prevent another CPU from running this RUNNABLE thread!
    we're still using its kernel stack
    and we have not yet saved its kernel registers in its context
(gdb) next ...
(gdb) step
  into sched()
  sched() makes some sanity checks, then calls swtch()
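for reference while stepping, yield() and sched() look roughly like this,
abridged from kernel/proc.c (the intena bookkeeping around swtch() and one
of the sanity checks are trimmed here):

    // Give up the CPU for one scheduling round.
    void
    yield(void)
    {
      struct proc *p = myproc();
      acquire(&p->lock);   // released by scheduler(), after the switch
      p->state = RUNNABLE;
      sched();             // switch to this CPU's scheduler thread
      release(&p->lock);   // releases the lock scheduler() acquired before switching back
    }

    // Switch to the scheduler thread.  Must hold only p->lock
    // and have already changed p->state.
    void
    sched(void)
    {
      struct proc *p = myproc();

      if(!holding(&p->lock))
        panic("sched p->lock");
      if(p->state == RUNNING)
        panic("sched running");
      if(intr_get())
        panic("sched interruptible");

      // save this kernel thread's registers in p->context,
      // load the scheduler thread's registers from the cpu's context
      swtch(&p->context, &mycpu()->context);
    }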
(gdb) next
  (until about to call swtch())
  about to context switch to this CPU's scheduler thread
  swtch will save the current RISC-V registers in the first argument (p->context)
    and restore previously-saved registers from the 2nd argument (c->context)
  c->context holds the saved registers of this CPU's scheduler thread
  c == cpus[0], but c has been optimized away...

let's see what register values swtch() will restore
  (gdb) p/x cpus[0]->context
  where is cpus[0]->context.ra? i.e. where will swtch() return to?
  (gdb) x/i cpus[0]->context.ra
  swtch() will return to the scheduler() function in proc.c

(gdb) tbreak swtch
(gdb) c
(gdb) tui disable

we're in kernel/swtch.S
  swtch() saves current registers in xx(a0)
    a0 is the first argument, p->context
  swtch() then restores registers from xx(a1)
    a1 is the second argument, c->context
  then swtch returns

Q: swtch() neither saves nor restores $pc (program counter)!
   so how does it know where to start executing in the target thread?

Q: why does swtch() save only 14 registers (ra, sp, s0..s11)?
   the RISC-V has 32 registers -- what about the other 18?
     zero, gp, tp, t0-t6, a0-a7
   note we're talking about kernel thread registers
   all 32 user registers have already been saved in the trapframe

registers at start of swtch:
  (gdb) where
  (gdb) p $pc -- swtch
  (gdb) p $ra -- sched
  (gdb) p $sp

registers at end of swtch:
  (gdb) stepi 28 -- until ret
  (gdb) p $pc -- swtch
  (gdb) p $ra -- scheduler
  (gdb) p $sp -- stack0+??? -- entry.S set this up at boot
  (gdb) where

(gdb) stepi
(gdb) tui enable

we're in scheduler() now, in the "scheduler thread", on the scheduler's stack
  scheduler() just returned from a call to swtch()
  it made that call a while ago, to switch to our process's kernel thread
  that previous swtch() saved scheduler()'s registers
  our process's call to swtch() restored scheduler()'s saved registers

p here refers to the interrupted process
  (gdb) p p->name
  (gdb) p p->pid
  (gdb) p p->state

remember yield() acquired the process's lock
  now scheduler releases it
  the scheduler() code *looks* like an ordinary acquire/release pair
  but in fact scheduler acquires, yield releases
    then yield acquires, scheduler releases
  unusual: the lock is released by a different thread than the one that acquired it!

Q: why hold p->lock across swtch()?
   yield() acquires, scheduler() releases
   could sched() release p->lock just before calling swtch()?
   p->lock makes these steps atomic:
     * p->state=RUNNABLE
     * save registers in p->context
     * stop using p's kernel stack
   so another CPU's scheduler won't start running p until all steps complete

scheduler()'s loop
  looks at all processes, finds one that's RUNNABLE
  keeps looping until it finds something -- may be idle for a while
  in this demo, will find the other spin process

Q: does scheduler() give proc[0] an unfair advantage?
   since its for loop always starts by looking at proc[0]?
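roughly, the loop in question, abridged from kernel/proc.c (exact details,
e.g. idling with wfi when nothing is RUNNABLE, vary a bit across xv6
versions):

    void
    scheduler(void)
    {
      struct proc *p;
      struct cpu *c = mycpu();

      c->proc = 0;
      for(;;){
        intr_on();  // let devices interrupt, so SLEEPING threads can be woken

        for(p = proc; p < &proc[NPROC]; p++) {
          acquire(&p->lock);
          if(p->state == RUNNABLE) {
            // Switch to the chosen process.  It is the process's job
            // to release its lock and then reacquire it
            // before jumping back to us.
            p->state = RUNNING;
            c->proc = p;
            swtch(&c->context, &p->context);

            // Process is done running for now.
            c->proc = 0;
          }
          release(&p->lock);
        }
      }
    }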
let's fast-forward to when scheduler() finds a RUNNABLE process
  (gdb) tbreak proc.c:463
  (gdb) c

scheduler() has locked the new process and set its state to RUNNING
  now another CPU's scheduler won't run it

p is the other "spin" process:
  (gdb) p p->name
  (gdb) p p->pid
  (gdb) p p->state

let's see where the new thread will start executing after swtch()
  by looking at $ra (return address) in its context
  (gdb) p/x p->context.ra
  (gdb) x/i p->context.ra
  new thread will return into sched()

look at kernel/swtch.S (again)
  (gdb) tbreak swtch
  (gdb) c
  (gdb) stepi 28 -- now just about to execute swtch()'s ret
  (gdb) p $ra
  (gdb) where

now we're in a timer interrupt in the *other* spin process
  in the past it was interrupted, called yield() / sched() / swtch()
  but now it is resuming, and will return to user space

Q: what is the "scheduling policy"?
   i.e. how does xv6 decide what to run next if multiple threads are RUNNABLE?
   is it a good policy?

Q: is there pre-emptive scheduling of kernel threads?
   as distinct from user threads, which we know can be pre-empted.
   yes -- timer interrupt and yield() can occur while in the kernel.
   yield() is called by kerneltrap() in kernel/trap.c
   where to save registers of interrupted kernel code?
     not in p->trapframe, since it already holds user registers.
     not in p->context, since we're about to call yield() and swtch()
     kernelvec.S pushes them on the kernel stack (since already in the kernel).
   is pre-emption in the kernel necessary?
     maybe not.
     valuable if some system calls have lots of compute.
     or if we need a strict notion of thread priority.

Q: why does scheduler() enable interrupts, with intr_on()?
   there may be no RUNNABLE threads
     they may all be waiting for I/O, e.g. disk or console
   enable interrupts so a device has a chance to signal completion
     and thus wake up a thread
   otherwise, the system will freeze

Q: why does sched()'s comment say only p->lock can be held?
   suppose process P1, holding lock L1, yields the CPU
   process P2 runs, calls acquire(L1)
   P2's acquire spins with interrupts turned off
     so timer interrupts won't occur
     so P2 won't yield the CPU
     so P1 can't execute
     so P1 won't release L1, ever

Q: can we get rid of the separate per-cpu scheduler thread?
   could sched() directly swtch() to a new thread?
     that would be faster -- avoids one of the swtch() calls
     we'd move scheduler()'s loop into sched()
   maybe -- but:
     the scheduling loop would run in sched(), on a thread's kernel stack
     what if that thread is exiting?
     what if another CPU wants to run the thread?
     what if there are fewer threads than CPUs -- i.e. too few stacks?
   can be dealt with -- give it a try!

Summary
  xv6 provides a convenient thread model for kernel code
    pre-emptive via timer interrupts
    transparent via switching registers and stack
  multi-core requires careful handling of stacks, locks
  next lecture: mechanisms for threads to wait for each other