6.1810 2025 Lecture 11: Device drivers, interrupts

Topic: device drivers
  a CPU needs attached devices: storage, communication, display, &c
  OS device drivers control these devices
  device handling can be hard:
    devices often have rigid and complex interfaces
    devices and CPU run in parallel -- concurrency
    interrupts
      hardware wants attention now! e.g., pkt arrived
      software must set aside current work and respond
      on RISC-V use same trap mechanism as for syscalls and exceptions
      interrupts can arrive at awkward times
  most code in production kernels is device drivers
  you will write one for a network card

Where are devices?
  [CPU, bus, RAM, disk, net, uart]

Programming devices: memory-mapped I/O
  device hardware has some control and status registers
  device registers live at a physical "memory" address
  ld/st to these addresses read/write device control registers
  platform designer decides devices' addresses

example device: UART
  Universal Asynchronous Receiver Transmitter
  serial interface, input and output
  "RS232 port", e.g. qemu console
  a uart is hardware -- transistors
  qemu emulates the common 16550 uart chip
  data sheet: 16550.pdf link on schedule page, or web search
    data sheet details physical, electrical, and programming
  [rx wire, receive shift register, receive FIFO]
  [transmit FIFO, transmit shift register, tx wire]
  16-byte FIFOs
  memory-mapped 8-bit registers at physical address UART0=0x10000000:
    (page 9 of 16550.pdf)
    0: RHR / THR -- receive/transmit holding register
    1: IER -- interrupt enable register, 0x1 is receive enable, 0x2 transmit
    ...
    5: LSR -- line status register, 0x1 is receive data ready

how does a kernel device driver use these registers?
  simple example: uartgetc() in kernel/uart.c
  ReadReg(RHR) turns into *(char*)(0x10000000 + 0)

why does the UART have FIFO buffers?

device driver must cope with times when device is not ready
  read() but rx FIFO is empty
  write() but tx FIFO is full
  LSR bits: Data Ready, Transmitter Empty

how should device drivers wait?
  perhaps a "busy loop":
    while((LSR & 1) == 0)
      ;
    return RHR
  OK if waiting is unlikely -- if input is sure to arrive soon
  but too wasteful for the console!
    often no input (keystrokes) is waiting in FIFO
  many devices are like this -- may need to wait a long time for I/O

a solution: interrupts
  when device needs driver attention, device raises an interrupt
  UART interrupts if:
    rx FIFO goes from empty to not-empty, or
    tx FIFO goes from full to not-full

how does kernel see interrupts?
  [add PLIC to diagram, connected to address bus and devices]
  device -> PLIC -> CPU -> trap -> usertrap()/kerneltrap() -> devintr()
  PLIC chooses src device and dst core, since more than one of both
  trap.c devintr()
    scause high bit indicates the trap is from a device interrupt
    a PLIC register indicates which device interrupted
      the "IRQ" -- UART's IRQ is 10
    IRQs are defined by the platform -- qemu in this case

an interrupt is usually just a hint that device state might have changed
  the real truth is in the device's status registers
  device driver must read them to decide action, if any
  for UART, check LSR to see if rx FIFO non-empty, tx FIFO non-full
    as in uartgetc()
  one interrupt may signal multiple actions needed

xv6 must ask both the device and the RISC-V for interrupts:
  uartinit() uart.c:71
    WriteReg(IER, IER_TX_ENABLE | IER_RX_ENABLE);
  intr_on() / intr_off() riscv.h:289
    w_sstatus(r_sstatus() | SSTATUS_SIE);
    for places where an interrupt would break kernel code

Let's look at the shell reading input from the console/UART.
  Example of thread / interrupt cooperation.
  % make qemu-gdb
  % gdb
  (gdb) c
  (gdb) tbreak sys_read
  (gdb) c
  (gdb) tui enable
  (gdb) where
  sys_read()
  fileread()
  consoleread()
    (gdb) ptype cons
    "producer/consumer buffer"
    [diagram: buf, r, w]
    (gdb) print cons
    there's nothing to read yet...
    sleep()

sh is now waiting for the uart to interrupt.

now let's look at uart interrupt handling.
  I'm going to press return.
  Q: where should I tell gdb to put a breakpoint to see the interrupt?
  (gdb) print/x $stvec
  (gdb) print kernelvec
  (gdb) tb kernelvec
  (gdb) c
  what happened? UART -> PLIC -> stvec -> kernelvec
  (gdb) where
    in kernel; no process was running; scheduler()

kernelvec.S:
  if a process had been executing in user space,
    trap would have gone to trampoline and usertrap(), which we've seen
  kernelvec is like trampoline, but for traps while kernel is executing
  saves registers on current stack; which stack?
    in this case, special scheduler stack
    if executing system call in kernel, some proc's kernel stack
    if in kernel, and interrupts enabled, $sp and stack guaranteed valid
  kernelvec then calls kerneltrap() -- C code

(gdb) tb kerneltrap
(gdb) c

all kinds of traps arrive at kerneltrap(), code needs to decide cause.

(gdb) next ... into devintr()

devintr()
  (gdb) p/x $scause
  scause high bit means it's an interrupt
    p. 96 / Table 22 in riscv privileged manual
  plic_claim() to find IRQ (which device)
  (gdb) p irq
  the PLIC generates IRQ 10 for the UART

uartintr()
  uartgetc()
    what's in the LSR?
    (gdb) x/1bx 0x10000005
    16550.pdf page 9 says low bit is Data Ready
    if LSR says data ready, fetch from RHR
    (gdb) x/1bx 0x10000005 -- note low bit no longer set
  consoleintr()
    backspace/newline/&c processing
    (gdb) print cons
    (gdb) x/3b cons.buf
    wakeup()

return through devintr, plic_complete(), kerneltrap

scheduler will now resume sh's read() system call
  since woken up
  let's break in sh's consoleread() -- just after sleep()
  (gdb) tb console.c:105
  (gdb) c
  (gdb) where
  consoleread() sees our character in cons.buf[cons.r]
  sh's read returns, with my typed newline character

General device-driver pattern: bottom-half and top-half
  [diagram: system call calls bottom-half, interrupt is top-half]
  bottom half:
    called by a process's system call, e.g.
    write() or read()
    may tell the device to start output or input
    may wait for input to be ready, or output to complete
  shared information (buffer)
  top half: the interrupt handler
    reads input, or sends more output, from/to device hardware
    interacts with "bottom half" process
      put input where bottom half can find it
      tell bottom half that input has arrived
      or that more output can be sent
    does *not* run in context of bottom-half process
      maybe on different core
      maybe interrupting some other process
      so interactions must be arm's-length -- buffers, sleep()/wakeup()

What if multiple devices ask to interrupt at the same time?
  The PLIC distributes interrupts among cores
    Different interrupts can be handled in parallel on different cores
    Each interrupt is claimed by first core to call plic_claim()
  Each individual device has at most one interrupt in play
    PLIC learns the handler is done via plic_complete()

If enabled, a device interrupt can occur between any two instructions
  Example:
    suppose the kernel is counting something in a global variable n
    bottom half: n = n + 1
    interrupt top half: n = n + 1
    the machine code for n=n+1 looks like this:
      lw a4, n
      add a4, a4, 1
      sw a4, n
    what if an interrupt occurs between lw and add?
      and interrupt handler also says n = n + 1?
      then the handler's increment is lost: bottom half stores its stale a4 + 1
  One solution: briefly disable interrupts in bottom half
    intr_off()
    n = n + 1
    intr_on()
    intr_off(): w_sstatus(r_sstatus() & ~SSTATUS_SIE);
  Good, but not enough: interrupt could arrive on a different CPU
    More on this when we look at locking

What happens to interrupts while SSTATUS_SIE is zero?
  PLIC/CPU remember pending interrupts
  deliver when kernel re-enables interrupts

Production and consumption are usually decoupled -- concurrent
  Input from device:
    Can arrive at time when reader not waiting
    Can arrive faster, or slower, than reader can read
    Buffering and batching help match speeds, increase efficiency
  Output to device:
    If device is slow, want to buffer output so process can continue
    If device is fast, want to send in batches for efficiency
  So producer/consumer buffers are common
  We've seen this at two levels:
    UART internal FIFOs, for device and driver -- plus interrupts
    cons.buf, for bottom-half and top-half -- plus sleep/wakeup
  We'll see this again:
    pipes
    net lab

Interrupts incur overhead
  around a microsecond
  "overhead" == cost *excluding* useful device driver work
  the time required for CPU trap, save registers, maybe switch page table,
    decide which device, and later restore everything and return
  pipelines, large register sets, cache/TLB misses, slow RAM

What if interrupt rate is high?
  Example: ethernet can deliver millions of packets / second
  At that rate, big fraction of CPU time in interrupt *overhead*

Polling: an event notification strategy for high rates
  Tell device (or PLIC) not to generate interrupts for the device
  Check for input periodically, e.g.
    in scheduler() or timer interrupt
  Then process everything accumulated since last poll
  More efficient than interrupts at high rates
  Perhaps switch strategies based on measured rate

DMA (direct memory access) can move data efficiently
  the xv6 uart driver reads bytes one at a time in software
    CPUs are not efficient for this: off chip, not cacheable, 8 bits at a time
    OK only for low-speed devices
  most fast devices automatically copy batches of input to RAM -- DMA
    then interrupt
    input is already in ordinary RAM
    CPUs read/write RAM efficiently

Interrupt evolution over time
  Decades ago:
    Interrupt overhead was a few cycles
    so: simple h/w, smart s/w, lots of interrupts
  Now:
    Overhead is 1000s of cycles
    so: smart h/w, does lots of work for each interrupt

Interrupts and device handling a continuing area of concern
  Special fast interrupt paths (also for page faults, sys calls)
  Spread device work over CPUs
  User-space device drivers -- avoid kernel altogether

Next week: networking
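
Sketch: the console input path as a user-space simulation
  A rough sketch, not xv6's code. It models the two producer/consumer
  levels from these notes (the UART's 16-byte rx FIFO, and cons.buf with
  its r and w indices) as plain C structs, so the flow
  device -> interrupt handler -> buffer -> read() can be compiled and
  stepped through outside a kernel.
  Assumptions: every sim_* name is hypothetical; the "registers" are
  struct fields rather than memory-mapped addresses; sleep()/wakeup() are
  reduced to a counter; line editing (backspace &c) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE    16     /* 16550 rx FIFO depth */
#define LSR_RX_READY 0x01   /* LSR low bit: Data Ready */
#define INPUT_BUF    128    /* size of the simulated cons.buf */

/* Simulated UART receive side (hypothetical stand-in for real registers). */
struct sim_uart {
  uint8_t fifo[FIFO_SIZE];
  unsigned head;            /* index of the oldest byte */
  unsigned count;           /* bytes currently in the FIFO */
};

/* Simulated console buffer: r = next byte to read, w = next slot to write. */
struct sim_cons {
  char buf[INPUT_BUF];
  unsigned r, w;
  int wakeups;              /* how often the handler would call wakeup() */
};

/* "Hardware": a byte arrives on the rx wire.
   Returns 1 if the FIFO went from empty to non-empty (the UART would
   interrupt), 0 if it was already non-empty, -1 if full (byte dropped). */
int sim_uart_rx(struct sim_uart *u, uint8_t c) {
  if (u->count == FIFO_SIZE)
    return -1;
  u->fifo[(u->head + u->count) % FIFO_SIZE] = c;
  u->count++;
  return u->count == 1;
}

/* Reading LSR: the status register holds the truth about device state. */
uint8_t sim_read_lsr(const struct sim_uart *u) {
  return u->count > 0 ? LSR_RX_READY : 0;
}

/* Reading RHR pops the oldest byte; caller must check LSR first. */
uint8_t sim_read_rhr(struct sim_uart *u) {
  uint8_t c = u->fifo[u->head];
  u->head = (u->head + 1) % FIFO_SIZE;
  u->count--;
  return c;
}

/* Same shape as uartgetc(): return a byte, or -1 if none is waiting. */
int sim_uartgetc(struct sim_uart *u) {
  if (sim_read_lsr(u) & LSR_RX_READY)
    return sim_read_rhr(u);
  return -1;
}

/* Interrupt handler (the "top half" in these notes): one interrupt may
   cover several bytes, so drain until LSR says the FIFO is empty. */
void sim_uartintr(struct sim_uart *u, struct sim_cons *c) {
  int ch, stored = 0;
  while ((ch = sim_uartgetc(u)) != -1) {
    if (c->w - c->r < INPUT_BUF) {      /* drop input if buffer is full */
      c->buf[c->w % INPUT_BUF] = (char)ch;
      c->w++;
      stored = 1;
    }
  }
  if (stored)
    c->wakeups++;                       /* stands in for wakeup() */
}

/* System-call side (the "bottom half" in these notes): copy up to n bytes,
   stopping after a newline. Returns 0 where a real driver would sleep(). */
int sim_consoleread(struct sim_cons *c, char *dst, int n) {
  int got = 0;
  while (got < n && c->r != c->w) {
    char ch = c->buf[c->r % INPUT_BUF];
    c->r++;
    dst[got++] = ch;
    if (ch == '\n')
      break;
  }
  return got;
}
```

  To try it: from a main(), feed bytes with sim_uart_rx(), call
  sim_uartintr() once, then call sim_consoleread().
  Note the handler and the reader share only the buffer and the counter,
  matching the arm's-length rule above.
  A real kernel would also need locking around r and w.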