6.828 2017 Lecture 8: System calls, Interrupts, and Exceptions

Let's start with the homework

alarmtest.c
  alarm(10, periodic)
  asks kernel to call periodic() every 10 "ticks" in this process
    that is, every 10 ticks of CPU time that this process consumes
  three pieces:
    add a new system call
    count ticks as the program runs (timer interrupt)
    kernel "upcall" to periodic()
  the call to periodic() is a simplified UNIX signal

glue for a new system call
  syscall.h: #define SYS_alarm 22
  usys.S: SYSCALL(alarm)
  alarmtest.asm -- mov $0x16,%eax -- 0x16 is SYS_alarm
  syscall.c syscalls[] table
  sysproc.c sys_alarm()

why all this machinery?
  at a high level, alarmtest just wants to make a function call to sys_alarm
  it has to be indirect (via INT, SYS_alarm) to maintain isolation

break sys_alarm
  where
  how did syscall know which system call?
    trapframe, on kernel stack, has saved user eax
    print myproc()->tf->eax
  where does sys_alarm find the arguments, ticks and handler?
    on the user stack
    x/4x myproc()->tf->esp
  does the handler value make sense?
    look in alarmtest.asm

now we need to take some action whenever the timer h/w interrupts
  decrement ticksleft
  if expired
    upcall to handler (periodic())
    reset ticksleft

device interrupts arrive just like INT and pagefault
  h/w pushes esp and eip on kernel stack
  s/w saves other registers, into a trapframe
  vector, alltraps, trap()

timer interrupts served by IRQ_TIMER case in trap()
  original IRQ_TIMER task is to keep track of wall-clock time, in ticks

execute to trap without an implementation
  break vector32
  where
  print/x tf->eip
  print/x tf->esp
  x/4x tf->esp
  what was the user program doing at this point?
    tf->eip in alarmtest.asm
  user code could have been interrupted anywhere
    so we can't rely on anything about the user stack
    and we need to restore registers exactly, since program didn't save anything

Q: how to arrange for upcall to alarm handler?
   call myproc()->alarmhandler() ?
   tf->eip = myproc()->alarmhandler ?
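
One way to picture the second option (redirecting tf->eip) is the user-space
simulation below. It is a sketch, not xv6 code: struct trapframe_sim,
struct proc_sim, user_mem, user_store32(), user_load32(), and alarm_tick() are
all hypothetical stand-ins, and a real kernel would have to validate the user
stack address instead of asserting on it. On each simulated tick it counts down
ticksleft; on expiry it pushes the interrupted eip onto the user stack and
points eip at the handler, so that the handler's ret would resume the
interrupted code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for xv6's struct trapframe and struct proc. */
struct trapframe_sim {
  uint32_t eip;  /* saved user program counter */
  uint32_t esp;  /* saved user stack pointer */
  uint16_t cs;   /* low two bits hold the privilege level at trap time */
};

struct proc_sim {
  int alarmticks;        /* interval passed to alarm(); 0 means no alarm set */
  int ticksleft;         /* ticks remaining until the next upcall */
  uint32_t alarmhandler; /* user address of the handler, e.g. periodic() */
};

/* Simulated user memory, so the sketch runs as an ordinary program. */
#define USER_MEM_SIZE 4096
uint8_t user_mem[USER_MEM_SIZE];

void user_store32(uint32_t addr, uint32_t val)
{
  assert(addr <= USER_MEM_SIZE - 4); /* a real kernel must check, not assert */
  memcpy(&user_mem[addr], &val, sizeof val);
}

uint32_t user_load32(uint32_t addr)
{
  uint32_t val;
  assert(addr <= USER_MEM_SIZE - 4);
  memcpy(&val, &user_mem[addr], sizeof val);
  return val;
}

/* Called once per timer tick. Returns 1 if an upcall was arranged. */
int alarm_tick(struct proc_sim *p, struct trapframe_sim *tf)
{
  if (p->alarmticks == 0)
    return 0;                      /* no alarm requested */
  if ((tf->cs & 3) != 3)
    return 0;                      /* interrupted kernel code: tf->esp is not valid */
  if (--p->ticksleft > 0)
    return 0;                      /* not expired yet */

  p->ticksleft = p->alarmticks;    /* reset for the next period */
  tf->esp -= 4;                    /* make room on the user stack */
  user_store32(tf->esp, tf->eip);  /* fake return address: the interrupted eip */
  tf->eip = p->alarmhandler;       /* "return" from the trap into the handler */
  return 1;
}
```

With alarmticks = 3, the third call to alarm_tick() leaves the handler address
in eip and the interrupted eip in the word at the new esp. The other saved
registers are untouched, so this sketch does nothing to protect the interrupted
code from a handler that modifies registers.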

Q: how to ensure handler returns to interrupted user code?

add our code...
  run alarmtest without gdb

let's run with gdb
  list trap to find breakpoint
  print/x tf->eip before assignment
  print/x tf->eip after assignment
  break *0x74
  c
  info reg
  will it return somewhere reasonable in alarmtest.asm?
    x/4x $esp

Q: what's the security problem in my new trap() code?

Q: what if trap() directly called alarmhandler()?
   it's a bad idea but what exactly would go wrong?
   let's try it
   it doesn't crash! but it doesn't print alarm! either.
   why not? fetchint...
   apparently it gets back to user space (to print .) -- how?
     program, timer trap, alarmhandler(), INT, sys_write("alarm!"), return...
     stack diagram
   it is disturbing how close this came to working!
     why can kernel code directly jump to user instructions?
     why can user instructions modify the kernel stack?
     why do system calls (INT) work from the kernel?
     none of these are intended properties of xv6!
   the x86 h/w does *not* directly provide isolation
     x86 has many separate features (page table, INT, &c)
     it's possible to configure these features to enforce isolation
     but isolation is not the default!

Q: what happens if just tf->eip = alarmhandler, but don't push old eip?
   let's try it
   user stack diagram

Q: what if trap() didn't check for CPL 3?
   let's try it -- seems to work!
   how could tf->cs&3 == 0 ever arise from alarmtest?
   let's force the situation with (tf->cs&3)==0 and making alarmtest run forever
   unexpected trap 14 from cpu 0 eip 801067cb (cr2=0x801050cf)
   what is eip 0x801067cb in kernel.asm?
     tf->esp = tf->eip in trap().
   what happened?
     it was a CPL=0 to CPL=0 interrupt
     so the h/w didn't switch stacks
     so it didn't save %esp
     so tf->esp contains garbage
     (see comment at end of trapframe in x86.h)
   the larger point is that interrupts can occur while in the kernel
     (in xv6, not JOS)

Q: what will happen if user-supplied alarm handler fn points into the kernel?
   (with the correct trap() code)

Q: what if another timer interrupt goes off while in user handler?
   works, but confusing, and will eventually run out of user stack
   maybe kernel shouldn't re-start timer until handler function finishes

Q: is it a problem if periodic() modifies registers?
   how could we arrange to restore registers before returning?

let's step back and talk about interrupts a bit more generally

the general topic: h/w wants attention now!
  s/w must set aside current work and respond

where do traps come from?
  (I use "trap" as a general term)
  device -- data ready, or completed an action, ready for more
  exception/fault -- page fault, divide by zero, &c
  INT -- system call
  IPI -- kernel CPU-to-CPU communication, e.g. to flush TLB

where do device interrupts come from?
  diagram:
    CPUs, LAPICs, IOAPIC, devices
    data bus
    interrupt bus
  the interrupt tells the kernel the device hardware wants attention
  the driver (in the kernel) knows how to tell the device to do things
  often the interrupt handler calls the relevant driver
    but other arrangements are possible (schedule a thread; poll)

how does trap() know which device interrupted?
  i.e. where did tf->trapno == T_IRQ0 + IRQ_TIMER come from?
  kernel tells LAPIC/IOAPIC what vector number to use, e.g.
    timer is vector 32
  page faults &c also have vectors
  LAPIC / IOAPIC are standard pieces of PC hardware
    one LAPIC per CPU
  IDT associates an instruction address with each vector number
    IDT format is defined by Intel, configured by kernel
  each vector jumps to alltraps
  CPU sends many kinds of traps through IDT
    low 32 IDT entries have special fixed meaning
    xv6 sets up system calls (INT) to use IDT entry 64 (0x40)
  the point: the vector number reveals the source of the interrupt

diagram:
  IRQ or trap, IDT table, vectors, alltraps
  IDT:
    0: divide by zero
    13: general protection
    14: page fault
    32-255: device IRQs
    32: timer
    33: keyboard
    46: IDE
    64: INT

let's look at how xv6 sets up the interrupt vector machinery
  lapic.c / lapicinit() -- tells LAPIC hardware to use vector 32 for timer
  trap.c / tvinit() -- initializes IDT, so entry i points to code at vectors[i]
    this is mostly purely mechanical,
      IDT entries correspond blindly to vectors
    BUT T_SYSCALL's istrap argument of 1 (vs 0) to SETGATE tells CPU to
      leave interrupts enabled during system calls
      but not during device interrupts
    Q: why allow interrupts during system calls?
    Q: why disable interrupts during interrupt handling?
  vectors.S (generated by vectors.pl)
    first push fakes "error" slot in trapframe, since h/w doesn't push for some traps
    second push is just the vector number
      this shows up in trapframe as tf->trapno

how does the hardware know what stack to use for an interrupt?
  when it switches from user space to the kernel
  hardware-defined TSS (task state segment) lets kernel configure CPU
    one per CPU
    so each CPU can run a different process, take traps on different stacks
  proc.c / scheduler()
    one per CPU
  vm.c / switchuvm()
    tells CPU what kernel stack to use
    tells CPU what page table to use

Q: what eip should the CPU save when trapping to the kernel?
   eip of the instruction that was executing?
   eip of the next instruction?
   suppose the trap is a page fault?
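
The vector setup described above (tvinit() filling the IDT, then trap()
dispatching on tf->trapno) can be sketched as a small simulation. The vector
and IRQ numbers match xv6's traps.h; struct gate_sim, idt_sim, tvinit_sim(),
and trap_source() are hypothetical names for illustration, and the gate
structure is far simpler than the real x86 descriptor. The sketch also assumes
the system call gate is given user privilege (DPL 3) so that user code may
invoke it with INT.

```c
#include <stdint.h>

/* Vector and IRQ numbers as in xv6's traps.h. */
#define T_DIVIDE   0
#define T_GPFLT   13
#define T_PGFLT   14
#define T_IRQ0    32
#define T_SYSCALL 64
#define IRQ_TIMER  0
#define IRQ_KBD    1
#define IRQ_IDE   14

#define NVEC 256

/* Hypothetical, simplified gate: the real x86 descriptor packs more fields. */
struct gate_sim {
  uint32_t handler;  /* address of the vector stub for this entry */
  int istrap;        /* 1: trap gate, interrupts stay enabled on entry */
  int dpl;           /* highest CPL (least privilege) allowed to use INT on it */
};

struct gate_sim idt_sim[NVEC];

/* Mirrors the shape of tvinit(): every entry is filled in mechanically,
   then the system call entry is overridden. */
void tvinit_sim(const uint32_t vectors[NVEC])
{
  for (int i = 0; i < NVEC; i++) {
    idt_sim[i].handler = vectors[i];
    idt_sim[i].istrap = 0;   /* interrupt gate: interrupts disabled on entry */
    idt_sim[i].dpl = 0;      /* kernel only */
  }
  idt_sim[T_SYSCALL].istrap = 1;  /* leave interrupts enabled in system calls */
  idt_sim[T_SYSCALL].dpl = 3;     /* user code may execute INT 64 */
}

/* Classify a trap by its vector number, as trap() does with tf->trapno. */
const char *trap_source(uint32_t trapno)
{
  switch (trapno) {
  case T_DIVIDE:            return "divide by zero";
  case T_GPFLT:             return "general protection";
  case T_PGFLT:             return "page fault";
  case T_IRQ0 + IRQ_TIMER:  return "timer";
  case T_IRQ0 + IRQ_KBD:    return "keyboard";
  case T_IRQ0 + IRQ_IDE:    return "IDE disk";
  case T_SYSCALL:           return "system call";
  default:                  return "other";
  }
}
```

Because every stub pushes its own vector number before jumping to alltraps,
a single switch on the number is enough to recover the source of the trap.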

some design notes

* interrupts used to be relatively fast; now they are slow
  old approach: every event causes an interrupt, simple h/w, smart s/w
  new approach: h/w completes lots of work before interrupting
* an interrupt takes on the order of a microsecond
  save/restore state
  cache misses
* some devices generate events faster than one per microsecond
  e.g. gigabit ethernet can deliver 1.5 million small packets / second
* polling rather than interrupting, for high-rate devices
  if events are always waiting, no need to keep alerting the software
* interrupt for low-rate devices, e.g. keyboard
  constant polling would waste CPU
* switch between polling and interrupting automatically
  interrupt when rate is low (and polling would waste CPU cycles)
  poll when rate is high (and interrupting would waste CPU cycles)
* faster forwarding of interrupts to user space
  for page faults and user-handled devices
  h/w delivers directly to user, w/o kernel intervention?
  faster forwarding path through kernel?

we will be seeing many of these topics later in the course
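
A rough sketch of two of the design notes above. The first function is just
the arithmetic: 1.5 million events per second at about one microsecond per
interrupt costs about 1.5 CPU-seconds every second, more than one whole CPU.
The rest shows one possible way to switch between interrupting and polling:
the interrupt handler only disables further interrupts, and a poll routine
drains the queue in bounded batches and re-enables interrupts once it is
empty. struct dev_sim, dev_interrupt(), and dev_poll() are hypothetical names;
a real driver would talk to device registers instead of a counter.

```c
/* Fraction of one CPU consumed by interrupt overhead alone. */
double interrupt_cpu_load(double events_per_sec, double sec_per_interrupt)
{
  return events_per_sec * sec_per_interrupt;
}

/* Hypothetical device model: a count of queued events plus an
   interrupt-enable bit. */
struct dev_sim {
  int pending;      /* events waiting to be handled */
  int irq_enabled;  /* 1 if the device may raise interrupts */
};

/* Interrupt handler: do no work here, just switch to polled mode. */
void dev_interrupt(struct dev_sim *d)
{
  d->irq_enabled = 0;
}

/* Poll: handle up to budget events. Once the queue is empty the rate is
   low again, so go back to interrupts. Returns the number handled. */
int dev_poll(struct dev_sim *d, int budget)
{
  int n = d->pending < budget ? d->pending : budget;
  d->pending -= n;
  if (d->pending == 0)
    d->irq_enabled = 1;
  return n;
}
```

With 5 queued events and a budget of 2, one interrupt is followed by three
polls (handling 2, 2, and 1 events) before interrupts are turned back on.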