6.1810 2023 Lecture 9: Device drivers

Topic: device drivers
  a CPU needs external devices: storage, communication, &c
    OS is in charge of programming the devices
  new issues/complications:
    devices often have rigid and complex interfaces
    devices and CPU run in parallel -- concurrency
    interrupts
      hardware wants attention now!
        e.g., pkt arrived
      software must set aside current work and respond
        on RISC-V use same trap mechanism as for syscalls and exceptions
      interrupts can arrive at awkward times
  most code in production OSes is device drivers
    you will write one for a network card
   
Where are devices?
  [CPU, bus, RAM, uart/disk/net/&c, PLIC, interrupts]

Programming devices: memory-mapped I/O
  device controller has address range
  ld/st to these addresses read/write control registers
  platform designer decides where devices live in physical memory space

example device: UART
  Universal Asynchronous Receiver/Transmitter
  serial interface, input and output
  "RS232 port"
  qemu emulates the very common 16550 chip
    many vendors, many "data sheets" on the web
    16550.pdf
    data sheet details physical, electrical, and programming
  [rx wire, receive shift register, FIFO]
  [transmit FIFO, transmit shift register, tx wire]
  16-byte FIFOs
  memory-mapped 8-bit registers at physical address UART0=0x10000000:
    (page 9 of 16550.pdf)
    0: RHR / THR -- receive/transmit holding register
    1: IER -- interrupt enable register, RX_ENABLE=0x1 and TX_ENABLE=0x2
    ...
    5: LSR -- line status register, RX_READY=0x1 and TX_IDLE=0x20

how does a kernel device driver use these registers?
  simple example: uartgetc() in kernel/uart.c
  ReadReg(RHR) turns into *(volatile unsigned char *)(0x10000000 + 0)
    volatile: the compiler must perform the load every time, not cache it
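
A minimal sketch of that kind of register access, not xv6's actual code. So that it can run as an ordinary user program, the base pointer refers to a hypothetical array fake_uart instead of physical address 0x10000000; reg_read and reg_write are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define RHR   0   // receive holding register (read side of offset 0)
#define IER   1   // interrupt enable register
#define LSR   5   // line status register
#define NREGS 8

// Stand-in for the device's address range (assumption: on real
// hardware the base would be the physical address, e.g. 0x10000000).
static uint8_t fake_uart[NREGS];
static volatile uint8_t *const uart_base = fake_uart;

// The volatile qualifier forces a real load on every call.
static uint8_t reg_read(int reg)
{
  assert(reg >= 0 && reg < NREGS);
  return uart_base[reg];
}

// Likewise, every call performs a real store.
static void reg_write(int reg, uint8_t v)
{
  assert(reg >= 0 && reg < NREGS);
  uart_base[reg] = v;
}
```

For example, reg_write(IER, 0x3) would set both RX_ENABLE and TX_ENABLE in this fake register file.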

why does the UART have FIFOs?

device driver must cope with times when device is not ready
  read() but rx FIFO is empty
  write() but tx FIFO is full

how should device drivers wait?

perhaps a "busy loop":
  while((ReadReg(LSR) & 0x1) == 0)   // spin until RX_READY is set
    ;
  return ReadReg(RHR);
OK if waiting is unlikely -- if input nearly always available
thus not OK for console!
  often no input (keystrokes) is waiting in FIFO
  most devices are like this -- may need to wait a long time for I/O

the solution: interrupts
  when device needs driver attention, device raises an interrupt
  UART interrupts if:
    rx FIFO goes from empty to not-empty, or
    tx FIFO goes from full to not-full

how does kernel see interrupts?
  device -> PLIC -> trap -> usertrap()/kerneltrap() -> devintr()
  trap.c devintr()
  scause high bit indicates the trap is an interrupt
    low bits say which kind; 9 = supervisor external, i.e. from a device via the PLIC
  a PLIC register indicates which device interrupted
    the "IRQ" -- UART's IRQ is 10
    IRQs are defined by the "board" -- qemu in this case
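
A rough sketch of the decision a function like devintr() has to make, under the assumption that scause and the claimed IRQ are passed in as plain arguments. The enum and function names are hypothetical; cause code 9 (supervisor external interrupt) is from the RISC-V privileged spec.

```c
#include <assert.h>
#include <stdint.h>

#define SCAUSE_INTERRUPT    (1ULL << 63)  // high bit: interrupt, not exception
#define SCAUSE_CODE_MASK    0xffULL
#define SUPERVISOR_EXTERNAL 9             // external (PLIC) interrupt
#define UART0_IRQ           10            // UART's IRQ on the qemu board

enum source { SRC_NOT_DEVICE, SRC_UART, SRC_OTHER_DEVICE };

// Returns 1 if scause describes an external device interrupt.
static int is_external_interrupt(uint64_t scause)
{
  return (scause & SCAUSE_INTERRUPT) != 0 &&
         (scause & SCAUSE_CODE_MASK) == SUPERVISOR_EXTERNAL;
}

// irq is the value a PLIC claim would have returned (passed in here
// because this sketch has no real PLIC to ask).
static enum source classify(uint64_t scause, int irq)
{
  if (!is_external_interrupt(scause))
    return SRC_NOT_DEVICE;
  return irq == UART0_IRQ ? SRC_UART : SRC_OTHER_DEVICE;
}
```

A system call (scause 8, high bit clear) and a timer interrupt (high bit set, code 5) both classify as SRC_NOT_DEVICE here.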

an interrupt is usually just a hint that device state might have changed
  the real truth is in the device's status registers
    device driver must check them to decide action, if any
  for UART, check LSR to see if rx FIFO non-empty, tx FIFO non-full
  uart.c uartintr()
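
A sketch of "the interrupt is only a hint", using a simulated device rather than real registers. The handler loops on the status register and drains whatever is there, so a spurious interrupt does no harm. All names are illustrative, and the FIFO model is an assumption, not the 16550's real behavior in detail.

```c
#include <assert.h>
#include <stdint.h>

#define LSR_RX_READY 0x01
#define FIFO_SIZE    16

// Simulated device state (stands in for the UART's rx FIFO).
static uint8_t rx_fifo[FIFO_SIZE];
static int rx_count;

// "Hardware" side: a byte arrives on the wire; dropped if FIFO is full.
static void dev_receive(uint8_t c)
{
  if (rx_count < FIFO_SIZE)
    rx_fifo[rx_count++] = c;
}

// Status register: low bit set iff the rx FIFO is non-empty.
static uint8_t read_lsr(void)
{
  return rx_count > 0 ? LSR_RX_READY : 0;
}

// Reading RHR pops the oldest byte (returns 0 if the FIFO is empty).
static uint8_t read_rhr(void)
{
  if (rx_count == 0)
    return 0;
  uint8_t c = rx_fifo[0];
  for (int i = 1; i < rx_count; i++)
    rx_fifo[i - 1] = rx_fifo[i];
  rx_count--;
  return c;
}

// Handler: trust the status register, not the interrupt itself.
// Copies up to max bytes into out and returns how many were copied.
static int handle_interrupt(uint8_t *out, int max)
{
  int n = 0;
  while (n < max && (read_lsr() & LSR_RX_READY))
    out[n++] = read_rhr();
  return n;
}
```

Calling handle_interrupt() with an empty FIFO returns 0 and changes nothing, which is the behavior wanted for an interrupt that turns out to have no work attached.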

Typical device driver structure
  [diagram: top-half/bottom-half]
  top half:
     executing a process's system call, e.g. write() or read()
     starts I/O, perhaps waits
  bottom half:
     the interrupt handler
     reads input, or sends more output, from/to device hardware
     needs to interact with "top half" process
       put input where top half can find it
       tell it more input has arrived
       or that more output can be sent
     does *not* run in context of top-half process
       maybe on different core
       maybe interrupting some other process
     so interactions must be arms-length -- buffers, sleep/wakeup

Let's look at how xv6 sets up the interrupt machinery

registers that control interrupts
  sie --- supervisor interrupt enable register
    one bit per software interrupt, external interrupt, timer interrupt
  UART IER
  PLIC claim --- get next IRQ; acknowledge IRQ
  sstatus --- supervisor status register, SIE bit enables interrupts
  
xv6's interrupt setup code:
  start():
    w_sie(r_sie() | SIE_SEIE | SIE_STIE | SIE_SSIE);
  main():
    consoleinit();
      uartinit()
    plicinit();
    scheduler();
      intr_on();
         w_sstatus(r_sstatus() | SSTATUS_SIE);

Let's look at the shell reading input from the console/UART
% make qemu-gdb
% gdb
(gdb) c
(gdb) break sys_read
(gdb) c
(gdb) tui enable
sys_read()
  fileread()
    consoleread()
      this is the top half
      look at cons.buf, cons.r, cons.w -- "producer/consumer buffer"
      [diagram: buf, r, w]
      sleep()

now for the bottom half
(gdb) break kerneltrap
(gdb) c
<press return>

how did we get here?
  (gdb) where
  in kernel; no process wanted to run; scheduler()
  UART -> PLIC -> stvec -> kernelvec
  (gdb) p/x $stvec
  (gdb) p kernelvec

kernelvec.S:
  kernelvec like trampoline, but for traps from kernel
    simpler: stack is valid, page table already kernel
    so can just save registers on stack, jump to kerneltrap
  save registers on current stack;  which stack?
    in this case, special scheduler stack
    if executing system call in kernel, some proc's stack

  kerneltrap()
    devintr()
      (gdb) p/x $scause
      scause high bit means it's an interrupt
        p. 85 / Table 4.2 in riscv manual
      plic_claim() to find IRQ (which device)
      (gdb) p irq
      uartintr()
        uartgetc()
        x/1bx 0x10000005
        check LSR for rx, copy from RHR to buffer, wake up
        consoleintr()
          print cons
          print cons.buf[cons.r]
          wakeup()
        x/1bx 0x10000005 -- note low bit no longer set
      plic_complete(irq)
  return through devintr, kerneltrap, kernelvec
  (gdb) b *0x80005b92 -- the end of kernelvec
  ...
  scheduler will now run the top half -- sh's read()
    since woken up
    let's break in top half -- consoleread()
    (gdb) b console.c:99
    (gdb) c
    (gdb) where
    consoleread()'s sleep() returns
    consoleread() sees our character in cons.buf[cons.r]
    sh's read returns, with my typed character

What if multiple devices want to interrupt at the same time?
  The PLIC distributes interrupts among cores
    Interrupts can be handled in parallel on different cores
  If no CPU claims the interrupt, the interrupt stays pending
    Eventually each interrupt is delivered to some CPU

What if kernel has disabled interrupts when a device asks for one?
  by clearing SIE in sstatus, with intr_off()
  PLIC/CPU remember pending interrupts
  deliver when kernel re-enables interrupts
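
A simplified model of how the enable and pending bits interact, assuming the CSRs are plain variables and only supervisor mode matters. Bit positions follow the RISC-V privileged spec; deliver_if_possible, device_raises, and handled are hypothetical names. The point: a request made while SIE is clear is remembered in a pending bit and taken once interrupts are re-enabled.

```c
#include <assert.h>
#include <stdint.h>

#define SSTATUS_SIE (1ULL << 1)  // global enable for supervisor mode
#define SIE_SEIE    (1ULL << 9)  // enable bit for external interrupts
#define SIP_SEIP    (1ULL << 9)  // matching pending bit

// Simulated CSRs (assumption: variables, not real registers).
static uint64_t sstatus, sie, sip;
static int handled;              // interrupts the "CPU" has taken so far

// Take the interrupt only if globally enabled, individually enabled,
// and pending.
static void deliver_if_possible(void)
{
  if ((sstatus & SSTATUS_SIE) && (sie & sip & SIE_SEIE)) {
    sip &= ~SIP_SEIP;   // on real hardware this clears via PLIC claim
    handled++;
  }
}

static void intr_on(void)       { sstatus |= SSTATUS_SIE; deliver_if_possible(); }
static void intr_off(void)      { sstatus &= ~SSTATUS_SIE; }
static void device_raises(void) { sip |= SIP_SEIP; deliver_if_possible(); }
```

With sie's SEIE bit set, the sequence intr_off(); device_raises(); intr_on(); leaves handled at 0 until the final call, then bumps it to 1.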
    
Interrupts involve several forms of concurrency
  1. Between device and CPU
     Producer/consumer parallelism
  2. If enabled, interrupts can occur between any two instructions!
     Disable interrupts when code must be atomic
  3. Interrupt may run on different CPU in parallel with top half
     Locks: next lecture

Producer/consumer parallelism
  For input:
    Can arrive at time when reader not waiting
    Can arrive faster, or slower, than reader can read
    Want to accumulate input, and read(), in batches for efficiency
  For output:
    If device is slow, want to buffer output so process can continue
    If device is fast, want to send in batches for efficiency
  A common solution pattern
    producer/consumer buffer
    notification
  We've seen this at two levels:
    UART internal FIFOs, for device and driver -- plus interrupts
    cons.buf, for top-half and bottom-half -- plus sleep/wakeup
  We'll see this again when we look at pipes
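
A minimal sketch of a producer/consumer buffer in the spirit of cons.buf with its r and w indices. This is not xv6's code: the struct and function names are made up, there is no locking, and where a real driver would sleep() or wakeup() the sketch just returns a status.

```c
#include <assert.h>

#define BUF_SIZE 128   // power of two, so index wraparound stays consistent

struct ring {
  char buf[BUF_SIZE];
  unsigned r;   // next index to read  (consumer, e.g. top half)
  unsigned w;   // next index to write (producer, e.g. bottom half)
};

// Producer: returns 0 if the buffer is full and the byte was dropped.
static int ring_put(struct ring *q, char c)
{
  if (q->w - q->r == BUF_SIZE)
    return 0;
  q->buf[q->w++ % BUF_SIZE] = c;
  return 1;   // a real bottom half would notify the consumer here
}

// Consumer: returns 0 if the buffer is empty.
// A real top half would wait for a notification instead of returning.
static int ring_get(struct ring *q, char *c)
{
  if (q->r == q->w)
    return 0;
  *c = q->buf[q->r++ % BUF_SIZE];
  return 1;
}
```

The indices only ever increase and are reduced modulo BUF_SIZE on use, so w - r is always the number of bytes waiting.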
    
If enabled, an interrupt can occur between any two instructions
  Example:
    suppose the kernel is counting something in a global variable n
    top half: n = n + 1
    interrupt bottom half: n = n + 1
    the machine code for n=n+1 looks like this:
      a4 = &n
      lw   a5,0(a4)
      addw a5,a5,1
      sw   a5,0(a4)
    what if an interrupt occurs between lw and addw?
      and interrupt handler also says n = n + 1?
  One solution: briefly disable interrupts in top half
    intr_off()
    n = n + 1
    intr_on()
    intr_off(): w_sstatus(r_sstatus() & ~SSTATUS_SIE);
  RISC-V automatically turns off interrupts on a trap (interrupt/exception/syscall)
    trampoline cannot tolerate a second interrupt to trampoline!
    e.g. would overwrite registers saved in trapframe
  More on this when we look at locking
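
The lost update can be reproduced deterministically in a user program by modeling the interrupt as a function call placed between the load and the store. This is only a simulation under that assumption; maybe_interrupt, sim_intr_off, and sim_intr_on are hypothetical stand-ins for the hardware and for intr_off()/intr_on().

```c
#include <assert.h>

static int n;                       // the shared counter
static int interrupts_enabled = 1;
static int interrupt_pending;

// Bottom half: also does n = n + 1.
static void handler(void) { n = n + 1; }

// Models a point between two instructions where an interrupt may land.
static void maybe_interrupt(void)
{
  if (interrupts_enabled && interrupt_pending) {
    interrupt_pending = 0;
    handler();
  }
}

static void sim_intr_off(void) { interrupts_enabled = 0; }
static void sim_intr_on(void)  { interrupts_enabled = 1; maybe_interrupt(); }

// n = n + 1 split into its load / add / store steps.
static void increment_unsafe(void)
{
  int a5 = n;          // lw
  maybe_interrupt();   // interrupt lands here; handler updates n
  a5 = a5 + 1;         // addw
  n = a5;              // sw: overwrites the handler's update
}

static void increment_safe(void)
{
  sim_intr_off();
  int a5 = n;
  maybe_interrupt();   // no effect: interrupts are off
  a5 = a5 + 1;
  n = a5;
  sim_intr_on();       // the pending interrupt is taken only now
}
```

Starting from n = 0 with one interrupt pending, increment_unsafe() leaves n at 1 (one increment lost), while increment_safe() leaves n at 2.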

Interrupt evolution
  Interrupts used to be cheap in CPU cycles; now they take many cycles
    due to pipelines, large register sets, cache misses
    interrupt overhead is around a microsecond -- excluding actual device driver code!
  So:
    old approach: simple h/w, smart s/w, lots of interrupts
    new approach: smart h/w, does lots of work for each interrupt

What if interrupt rate is very high? 
  Example: modern ethernet can deliver millions of packets/second
  At that rate, big fraction of CPU time in interrupt *overhead*
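
A back-of-the-envelope version of that claim, assuming one interrupt per packet and the roughly one microsecond of overhead per interrupt mentioned earlier. overhead_percent is an illustrative helper, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

// Percent of one core spent on interrupt overhead alone, given an
// interrupt rate and a per-interrupt cost in nanoseconds.
static uint64_t overhead_percent(uint64_t interrupts_per_sec,
                                 uint64_t ns_per_interrupt)
{
  // Nanoseconds of overhead incurred per second of wall-clock time.
  uint64_t busy_ns = interrupts_per_sec * ns_per_interrupt;
  return busy_ns * 100 / 1000000000ULL;
}
```

overhead_percent(1000000, 1000) is 100: at a million interrupts per second and 1000 ns each, overhead alone would consume an entire core before any driver code runs.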
  
Polling: an event notification strategy for high rates
  Top-half loops until device says it is ready
    e.g. uartputc_sync()
    Or perhaps check in some frequently executed kernel code e.g. scheduler()
  Then process everything accumulated since last poll
  More efficient than interrupts if device is usually ready quickly
  Perhaps switch strategies based on measured rate
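
One way the last two points might look in code, as a sketch only. The threshold value, the names, and the simulated pending count are all assumptions made for illustration; a real driver would read device state and tune the cutoff by measurement.

```c
#include <assert.h>

enum strategy { USE_INTERRUPTS, USE_POLLING };

// Hypothetical cutoff: above this many events per second, poll.
#define POLL_THRESHOLD 100000UL

// Pick a notification strategy from a measured event rate.
static enum strategy choose_strategy(unsigned long events_per_sec)
{
  return events_per_sec > POLL_THRESHOLD ? USE_POLLING : USE_INTERRUPTS;
}

// One poll: process everything that accumulated since the last poll.
// *pending is a simulated count of items waiting in the device.
// Returns the number of items processed.
static int poll_once(int *pending)
{
  int done = 0;
  while (*pending > 0) {
    (*pending)--;    // stand-in for handling one packet or byte
    done++;
  }
  return done;
}
```

The cost of checking the device is paid once per poll and amortized over however many items are waiting, which is why polling wins at high rates.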

DMA (direct memory access) can move data efficiently
  the xv6 uart driver reads bytes one at a time in software
    CPUs are not very efficient for this: off chip, not cacheable, 8 bits at a time
  but OK for low-speed devices
  most devices automatically copy input to RAM -- DMA
    then interrupt
    input is already in ordinary memory
    CPU memory operations usually much more efficient

Interrupts and device handling are a continuing area of concern
  Special fast paths
  Clever spreading of work over CPUs
  Forwarding of interrupts to user space
    for page faults and user-handled devices
    h/w delivers directly to user, w/o kernel intervention?
    faster forwarding path through kernel?
  We will be seeing these topics later in the course