6.1810 2024 Lecture 11: Thread switching

Topic: more "under the hood" with xv6
  Previously: system calls, interrupts, page tables, locks
  Today: process/thread switching

Why support concurrent tasks?
  time-sharing: many users and/or many running programs.
  program structure: prime number sieve.
  parallel programming: exploit multi-core hardware.

what's a thread?
  think about code that's executing in the ordinary serial way
  suppose we wanted to suspend it, put it to one side,
    do other things, and then later resume that code?
  what would we need to preserve?
    memory, stack, CPU's registers, CPU's program counter
  a "thread" alternates between executing code, and, when suspended,
    the saved state needed to resume it
  [simple two-thread alternation diagram]
  a threading system manages the switching among threads

how does xv6 use threads?
  [simple diagram]
  each xv6 process has a single user-level thread, and a single kernel thread
  a process's kernel thread executes system call implementations
    and may have to suspend itself to wait for input &c
    or to let other processes' threads run
  xv6 kernel threads share memory -- they all execute in the kernel address
    space, they all use kernel data structures, thus they need to lock
  a process's user and kernel threads are not independent!
    if executing in the kernel, the user thread is waiting for trap return
    if executing in user space, the kernel thread is empty
    I'll use "process" and "thread" interchangeably

if an xv6 process is not executing
  it's often waiting for some event -- console, pipe, disk, &c
    because it made a system call into the kernel, e.g. read()
  it may be waiting midway through a complex kernel system call implementation
  it needs to resume in the kernel system call code where it left off
  thus its kernel registers (and stack) must be saved

xv6 process/thread saved state
  user memory
  user saved registers (in trapframe (TF))
  kernel stack
  kernel saved registers ("context" (CTX))
  kernel p->state

p->state tells the scheduler how to handle the process
  RUNNING  -- executing on a CPU right now (either in user or kernel)
  RUNNABLE -- wants to execute, but isn't (suspended, in kernel)
  SLEEPING -- waiting for an event (suspended, in kernel read(), wait(), &c)
  helps avoid mistakes like executing on two CPUs at the same time

how xv6 switches from one process to another -- "context switch"
  e.g. P1 calls read() which waits, P2 is RUNNABLE
  [P1, TF1, KSTACK1, CTX1, swtch(), CTXs, STACKs, swtch(), CTX2, KSTACK2, TF2, P2]
  TF = trapframe = saved user registers
  CTX = context = saved kernel registers
  getting from one user process to another involves multiple transitions:
    user -> kernel; saves user registers in trapframe
    kernel thread -> scheduler thread; saves kernel thread registers in context
    scheduler thread -> kernel thread; restores kernel thread registers from ctx
    kernel -> user; restores user registers from trapframe

struct proc in proc.h
  p->trapframe holds the saved user thread's registers
  p->context holds the saved kernel thread's registers
  p->kstack points to the thread's kernel stack
  p->state is RUNNING, RUNNABLE, SLEEPING, &c
  p->lock protects p->state (and other things...)
  proc[NPROC] array in proc.c -- one per process
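for reference, the corresponding declarations, abridged from kernel/proc.h
  (many fields omitted; see the source for the full definitions):

    // Saved registers for kernel context switches.
    struct context {
      uint64 ra;       // return address: where swtch()'s ret will go
      uint64 sp;       // kernel stack pointer
      // callee-saved registers
      uint64 s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11;
    };

    enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

    // Per-process state (abridged).
    struct proc {
      struct spinlock lock;         // protects p->state (and more)
      enum procstate state;         // RUNNING, RUNNABLE, SLEEPING, ...
      uint64 kstack;                // virtual address of kernel stack
      struct trapframe *trapframe;  // saved user registers (TF)
      struct context context;       // saved kernel registers (CTX); swtch() here to run process
      char name[16];                // process name (debugging)
      // ... pid, parent, page table, open files, etc.
    };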
why a separate scheduler thread?
  so that there's always a stack on which to run the scheduler loop
    e.g. when switching away from an exiting process
    e.g. when there are fewer processes than CPUs

scheduler thread details
  one per CPU; each has a stack and a struct context
  a kernel thread always switches to this CPU's scheduler thread,
    which switches to another kernel thread, if one is RUNNABLE
  there are never direct process-to-process switches
  each CPU's scheduler thread keeps scanning the proc[] process table
    until it finds a RUNNABLE thread
  if there is no RUNNABLE thread, the scheduler is "idle"

what if user code computes without ever making a system call?
  will that stop other processes from running?
  the CPU's timer hardware forces a periodic interrupt
  xv6's timer handler switches ("yields")
  "pre-emption" or "involuntary context switch"

# Code

pre-emptive switch demonstration
  vi user/aaa.c, bbb.c
    two CPU-bound processes
  my qemu has only one CPU
  we'll watch xv6 switch between them
  make qemu-gdb
  gdb
  (gdb) c
  aaa & bbb
  you can see that they alternate execution.
    xv6 is switching its one CPU between the two processes.
    driven by timer interrupts
    we can see about how often the timer ticks

how does the switching work?
  I'm going to put a break-point at the timer interrupt.
  (gdb) tb trap.c:81
  (gdb) c
  (gdb) tui enable
  (gdb) where
  we're in usertrap(), handling a device interrupt from the timer,
    which occurred in user space.
  what was running when the timer interrupt happened?
  (gdb) p p->name
  (gdb) p/x p->trapframe->epc
  let's look for the saved epc in user/aaa.asm
    the timer interrupted user code in the increment loop
  (gdb) step
    into yield() in proc.c
  (gdb) next
  (gdb) p p->state
  yield() changes p->state from RUNNING to RUNNABLE
    -> give up the CPU but want to run again.
  note: yield() acquires p->lock
    to prevent another CPU from running this RUNNABLE thread!
    yield() is still using its kernel stack!
    and we have not yet saved its kernel registers in its context
  (gdb) next
  ...
  (gdb) step
    into sched()
  sched() makes some sanity checks, then calls swtch()
    (see the abridged C for yield()/sched() after this walkthrough)
  (gdb) next (until about to call swtch())
  vi kernel/swtch.S
    swtch() performs a context switch from one kernel thread to another
  I'll first run through quickly, then look at details
  (gdb) tb swtch
  (gdb) c
  (gdb) where
    the caller is yield()/sched()
  (gdb) si 28
  (gdb) where
    suddenly we are in main()/scheduler(), not yield()/sched()!
  (gdb) si
  it's as if we've just returned from a swtch() call in scheduler()
    even though it was sched() that called swtch()
  how did that happen?

let's do this again, and look at details.
  (gdb) tb sched
  (gdb) c
  (gdb) tb swtch
  (gdb) c
  (gdb) where
  vi kernel/swtch.S
  swtch saves registers into xx(a0)
    i.e. p->context.xx
  and then loads registers from xx(a1)
    i.e. c->context.xx
    i.e. cpus[0].context.xx
  Q: what governs where swtch's ret returns to?
  Q: where did that value come from?
  first, let's look at the current ra
  (gdb) x/i $ra
  swtch() (after saving the current ra) will load a new ra from 0(a1) at line 25
    0(a1) is cpus[0].context.ra
  (gdb) x/i cpus[0].context.ra
  where is that ra address? let's look in kernel/kernel.asm
    it's just after scheduler's call (jal) to swtch
    as if we were now returning from scheduler's previous swtch() to this process
  line 26 also loads a new sp -- stack pointer -- what is it?
  (gdb) x cpus[0].context.sp
  loading the sp effectively switches stacks,
    as well as the current position on the stack
    each thread has its own stack containing info about callers
      (arguments, variables, return addresses)
  (gdb) si 28
  (gdb) x/i $ra
  (gdb) where
  now the thread switch to scheduler() is complete!
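for reference, the C side of the path we just stepped through -- yield() and
  sched(), abridged from kernel/proc.c (sched()'s sanity-check panics and
  interrupt-enable bookkeeping are omitted):

    // Give up the CPU for one scheduling round.
    void
    yield(void)
    {
      struct proc *p = myproc();
      acquire(&p->lock);   // keep other CPUs from running p once it's RUNNABLE
      p->state = RUNNABLE;
      sched();
      release(&p->lock);   // released only after p has been switched back in
    }

    // Switch to this CPU's scheduler thread.
    void
    sched(void)
    {
      struct proc *p = myproc();
      // save this kernel thread's registers in p->context,
      // load the scheduler thread's registers from mycpu()->context.
      swtch(&p->context, &mycpu()->context);
    }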
saving/restoring registers is the heart of it
  (gdb) si
  swtch() saved p's registers so that later we can swtch() back to p
  (gdb) p/x p->context
  (gdb) x/i p->context.ra
  (gdb) p/x p->context.sp
  (gdb) p/x p->kstack
  so if scheduler() later swtch's to p, swtch() will return to sched(),
    on p's kernel stack, as if p's original call to swtch() returned
  Q: why doesn't swtch need to save/restore $pc?
  Q: why does swtch() save only 14 registers (ra, sp, s0..s11)?
     the RISC-V has 32 registers -- what about the other 18?
     zero, gp, tp, t0-t6, a0-a7

ok, we're in scheduler() now, in the "scheduler thread", on the scheduler's stack
  [first half of this diagram]

        aaa                         bbb
      usertrap                    usertrap
       yield          main         yield
       sched       scheduler       sched
       swtch    ->   swtch
                     swtch    ->   swtch

  scheduler() just returned from a call to swtch()
    it made that call a while ago, to switch to our process's kernel thread
    that previous swtch() saved scheduler()'s registers
    our process's call to swtch() restored scheduler()'s saved registers
  p here refers to the interrupted process
  (gdb) p p->name
  (gdb) p p->state
  remember that yield() acquired the process's lock; now scheduler releases it
    the scheduler() code *looks* like an ordinary acquire/release pair
    but yield acquires, scheduler releases
        scheduler acquires, yield releases
    unusual: the lock is released by a different thread than acquired it!
  Q: why hold p->lock across swtch()?
     p->lock prevents interference during these steps:
       * p->state=RUNNABLE
       * save registers in p->context
       * stop using p's kernel stack
     so another CPU's scheduler won't start running p until all steps complete

scheduler()'s loop looks at all processes, and finds one that's RUNNABLE
  (abridged code below)
  it keeps looping until it finds something -- may be idle for a while
  in this demo, it will find the other aaa/bbb process

let's fast-forward to when scheduler() finds a RUNNABLE process,
  and is about to swtch() to it.
  (gdb) tb proc.c:463
  (gdb) c
  scheduler() locks the new process, and sets its state to RUNNING
    RUNNING means another CPU's scheduler won't run it
  p is the other aaa/bbb process:
  (gdb) p p->name
  (gdb) p p->state
  let's see where the new thread will start executing after swtch()
    by looking at $ra (return address) in its context
  (gdb) x/i p->context.ra
  look in kernel/kernel.asm for that address...
    it's just after the process's earlier call to swtch()
    the new thread will return into sched()
  [complete the diagram]

Q: does scheduler() give proc[0] an unfair advantage,
   since its for loop starts at proc[0]?

Q: what is the "scheduling policy"?
   i.e. how does xv6 decide what to run next if multiple threads are RUNNABLE?
   is it a good policy?
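for reference, scheduler()'s loop, abridged from kernel/proc.c
  (idle-CPU bookkeeping omitted):

    // Per-CPU scheduler loop; each CPU's scheduler thread runs this
    // forever, on its own stack, and never returns.
    void
    scheduler(void)
    {
      struct proc *p;
      struct cpu *c = mycpu();

      c->proc = 0;
      for(;;){
        // enable interrupts so devices can wake up SLEEPING threads
        // even when no thread is RUNNABLE (see intr_on() discussion below).
        intr_on();

        for(p = proc; p < &proc[NPROC]; p++){
          acquire(&p->lock);
          if(p->state == RUNNABLE){
            // RUNNING keeps other CPUs' schedulers from also picking p.
            p->state = RUNNING;
            c->proc = p;
            swtch(&c->context, &p->context);  // run p until it swtch()es back

            // p is done running for now; it changed p->state before returning.
            c->proc = 0;
          }
          release(&p->lock);
        }
      }
    }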
look at kernel/swtch.S (again)
  (gdb) tb swtch
  (gdb) c
  (gdb) si 28 -- now just about to execute swtch()'s ret
  (gdb) x/i $ra
  (gdb) where
  now we're in a timer interrupt in the *other* aaa/bbb process
    in the past it was interrupted, and called yield() / sched() / swtch()
    but now it is resuming, and will return to user space
  and swtch() set up cpus[0].context so this process can
    eventually swtch back to the scheduler.
  (gdb) x/i cpus[0].context.ra

will a timer interrupt while in the kernel cause a yield?
  i.e. can kernel code be pre-empted, as well as user code?
  yes -- kerneltrap() calls yield()
  where are the process's saved registers? three sets of saved registers:
    * user registers in the trapframe, via trampoline
    * interrupted kernel code's registers on the stack, via kernelvec
    * interrupt handler's registers in p->context, via swtch()

is pre-emption in the kernel necessary?
  no, since the kernel has no infinite CPU-bound loops.
  but it is valuable if some system calls involve lots of compute,
    or if we need a strict notion of thread priority.

why isn't code allowed to hold a spinlock across a context switch?
  (other than p->lock)
  P1: acquire(L1)
      yield()
  P2: acquire(L2)
      acquire(L1)
  P2 holds L2, so interrupts are off while P2 is spinning in acquire(L1)
    -> timer interrupts won't occur
    -> P2 won't yield the CPU
    -> P1 can't execute
    -> P1 won't release L1, ever

can we get rid of the separate per-CPU scheduler thread?
  could sched() directly swtch() to a new thread?
    that would be faster -- it avoids one of the swtch() calls
    we'd move scheduler()'s loop into sched()
  maybe -- but:
    the scheduling loop would run in sched() on a thread's kernel stack
    what if that thread is exiting?
    what if there are fewer threads than CPUs -- i.e. too few stacks?
  so it may be tricky

why does scheduler() enable interrupts, with intr_on()?
  (demo: change intr_on() to intr_off(). note CPUS=1.)
  there may be no RUNNABLE threads
    they may all be waiting for I/O, e.g. disk or console
  enable interrupts so a device has a chance to signal completion
    and thus wake up a thread
  otherwise, the system will freeze

are threads efficient?
  memory cost: a stack per thread
  CPU time cost: swtch()
  usually efficient enough -- and certainly convenient
  a problem if you have lots of them (memory), or switch often (CPU)
  there are techniques other than threads for interleaving tasks
    look up event-driven programming, or state machines
  threads are not the most efficient, but they are convenient

can a user-space process have multiple user-space threads?
  xv6 does not support this, but others do, e.g. Linux, MacOS
    (a small pthread sketch follows at the end of these notes)
  often a kernel thread per user thread

Next week: sleep() and wakeup() -- waiting for events
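aside, for comparison with xv6: a minimal sketch of a Linux process with two
  user-space threads sharing memory, using POSIX pthreads as mentioned above
  (not xv6 code; compile with gcc -pthread):

    #include <pthread.h>
    #include <stdio.h>

    // each thread gets its own stack and registers, but both share the
    // process's memory -- e.g. this counter.
    void *
    worker(void *arg)
    {
      long *counter = arg;
      for(int i = 0; i < 1000000; i++)
        __sync_fetch_and_add(counter, 1);  // atomic: both threads update it
      return 0;
    }

    int
    main()
    {
      long counter = 0;
      pthread_t t1, t2;
      pthread_create(&t1, 0, worker, &counter);
      pthread_create(&t2, 0, worker, &counter);
      pthread_join(t1, 0);
      pthread_join(t2, 0);
      printf("counter = %ld\n", counter);   // 2000000
      return 0;
    }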