Interrupts & Exceptions

Required reading: xv6 trapasm.S, trap.c, syscall.c, initcode.S, usys.S. Skim vectors.S, lapic.c, ioapic.c, picirq.c.
You will need to consult IA32 System Programming Guide chapter 5 (skip 5.7.1, 5.8.2, 5.12.2).

Unrelated to lecture: Lab2 due tomorrow at 11:59pm, Lab3 is out.

Introduction

  last week we transferred from kernel to user
  today: how to get from user to kernel
  three reasons for transitions:
    system calls
    program faults (div by zero, page fault)
    external device interrupts
  why do we need to take special care for user -> kernel?
    security/isolation
    only kernel can touch devices, MMU, FS, other processes' state, etc.
    think of user program as a potential malicious adversary
  what has to happen?
    save user state for future transparent resume
    set up for execution in kernel (stack, segments)
    choose a place to execute in kernel
    get at system call arguments
    do it all securely
  it's neat that interrupts, faults, and system calls use the same mechanism!

Calling a System Call from User Space

Start with the open("console", O_RDWR) in init.c (load-symbols "kernel.sym", vb 0x1b:0x21)
It's an ordinary call to the user-library open() function, defined in usys.S.
The kernel system call code expects the system call number in eax; in this case it is SYS_open, or 10.
Now the stack has the return EIP, a pointer to "console", and the constant 2 (print-stack 5).
What does the int 0x30 instruction do?

Execute the int. Now where are we? How did we get here?

The INT instruction

The x86 CPU supports 256 interrupt vectors. Different hardware conditions produce interrupts through different vectors. The kernel can tell why the interrupt occurred by noting the vector. The vector refers to a descriptor in the IDT. Each descriptor contains a segment selector, an offset in that segment, and a DPL.
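To make the descriptor contents concrete, here is a minimal C sketch of one 8-byte IDT gate holding the three fields named above. The names struct gate, make_gate, gate_offset and gate_dpl are hypothetical, not xv6's; the bit layout follows the 32-bit gate format in the IA-32 manual, and the selector and offset values in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of one 8-byte IDT gate descriptor (hypothetical names). */
struct gate {
    uint16_t off_lo;  /* handler offset, bits 15..0 */
    uint16_t sel;     /* code segment selector */
    uint8_t  zero;    /* reserved, must be 0 */
    uint8_t  attr;    /* P (bit 7) | DPL (bits 6..5) | 0 | type (bits 3..0) */
    uint16_t off_hi;  /* handler offset, bits 31..16 */
};

_Static_assert(sizeof(struct gate) == 8, "a gate descriptor is 8 bytes");

enum { GATE_INT32 = 0xE, GATE_TRAP32 = 0xF };  /* 32-bit gate types */

/* Pack a present gate from its offset, selector, type and DPL. */
struct gate make_gate(uint32_t off, uint16_t sel, int type, int dpl) {
    struct gate g;
    assert(dpl >= 0 && dpl <= 3);
    g.off_lo = (uint16_t)(off & 0xFFFF);
    g.sel = sel;
    g.zero = 0;
    g.attr = (uint8_t)(0x80 | (dpl << 5) | (type & 0xF));
    g.off_hi = (uint16_t)(off >> 16);
    return g;
}

/* Reassemble the 32-bit handler offset from its two halves. */
uint32_t gate_offset(struct gate g) {
    return ((uint32_t)g.off_hi << 16) | g.off_lo;
}

/* Extract the descriptor privilege level. */
int gate_dpl(struct gate g) {
    return (g.attr >> 5) & 3;
}
```

For example, make_gate(0x00101234, 0x08, GATE_TRAP32, 3) describes a gate that user code (CPL 3) may invoke with INT and that enters the segment selected by 0x08 at offset 0x00101234.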

The INT instruction takes the following steps (these will be similar to all interrupts and faults, though there are slight differences):

decide the vector number, in this case it's the 0x30 in int 0x30.
fetch the interrupt descriptor for vector 0x30 from the IDT. the CPU finds it by taking the 0x30'th 8-byte entry starting at the linear address that the IDTR CPU register points to (identical to the physical address when paging is disabled).
check that CPL <= DPL in the descriptor (but only if INT instruction).
save ESP and SS in a CPU-internal register (but only if target segment selector's PL < CPL).
load SS and ESP from TSS (same condition)
push user SS (same condition)
push user ESP (same condition)
push user EFLAGS
push user CS
push user EIP
clear some EFLAGS bits
set CS and EIP from IDT descriptor's segment selector and offset
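The steps above can be sketched as a small C model of the CPU. This is an illustration under stated assumptions, not real hardware behavior in full: all names (struct cpu, struct idt_entry, do_int) are hypothetical, memory is a tiny word array, only the IF flag is modeled among the cleared EFLAGS bits, and the target privilege level is read from the selector's low two bits (real hardware consults the target code segment's descriptor).

```c
#include <assert.h>
#include <stdint.h>

#define FL_IF 0x200u  /* interrupt-enable flag in EFLAGS */

struct cpu {
    uint16_t cs, ss;
    uint32_t eip, esp, eflags;
    uint16_t tss_ss0;    /* kernel SS stored in the TSS */
    uint32_t tss_esp0;   /* kernel ESP stored in the TSS */
    uint32_t mem[64];    /* toy RAM: byte address = index * 4 */
};

struct idt_entry {
    uint16_t sel;        /* target code segment selector */
    uint32_t off;        /* handler offset */
    int dpl;             /* descriptor privilege level */
    int clears_if;       /* 1 models an interrupt gate, 0 a trap gate */
};

static void push(struct cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    assert(c->esp / 4 < 64);
    c->mem[c->esp / 4] = v;
}

/* Returns 0 on success, -1 where real hardware would raise a protection fault. */
int do_int(struct cpu *c, const struct idt_entry *d, int from_int_insn) {
    int cpl = c->cs & 3;
    int target_pl = d->sel & 3;          /* simplification, see lead-in */

    if (from_int_insn && cpl > d->dpl)   /* CPL <= DPL check, INT only */
        return -1;

    if (target_pl < cpl) {               /* privilege change: switch stacks */
        uint16_t old_ss = c->ss;         /* held in "CPU-internal registers" */
        uint32_t old_esp = c->esp;
        c->ss = c->tss_ss0;
        c->esp = c->tss_esp0;
        push(c, old_ss);
        push(c, old_esp);
    }
    push(c, c->eflags);
    push(c, c->cs);
    push(c, c->eip);
    if (d->clears_if)
        c->eflags &= ~FL_IF;
    c->cs = d->sel;
    c->eip = d->off;
    return 0;
}
```

Running do_int from user mode leaves five words on the kernel stack (EIP on top, then CS, EFLAGS, ESP, SS); running it from kernel mode leaves only three, on the stack already in use.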

INT is a complex instruction. Does it really need to take all those steps? Why not let the kernel interrupt handler do some of them? For example, why does INT need to save the SS and ESP?

xv6 set up the IDT in tvinit(), set the IDTR in idtinit(), and set the SS and ESP in the TSS in setupsegs(). Run info idt 48 to see how the IDT is set up to handle vector 0x30.

Trap Handling

int 0x30 entered the kernel at vector48, generated by vectors.pl.

What is the current CPL? How was it set? Could the user abuse the INT instruction to exercise privilege or break the kernel?

print-stack 5 in order to see what int put on the stack. Compare to Figure 5-4. What stack is being used?

vector48 pushes a few items on the stack and then jumps to alltraps. Why not have vector 48 in the IDT point directly to alltraps?

Single-step until the call to trap. print-stack 18. Compare with struct trapframe.

At the start of trap(), what is tf->trapno? How was it set?

System call dispatch, arguments and return value

trap() calls syscall(), since trapno in this case is T_SYSCALL (0x30).

syscall() dispatches to a function it finds by indexing into the syscalls array. It uses the eax from the trap frame as the index. What is in that eax? Where was it set?
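A minimal sketch of table-driven dispatch, assuming a cut-down trap frame with only an eax field and two made-up system calls. SYS_demo_a, SYS_demo_b, their handlers, and dispatch() are hypothetical names; xv6's real syscall() and table differ in detail.

```c
#include <assert.h>
#include <stddef.h>

struct trapframe { int eax; };  /* only the field this sketch needs */

enum { SYS_demo_a = 1, SYS_demo_b = 2 };  /* hypothetical call numbers */

static int sys_demo_a(void) { return 7; }
static int sys_demo_b(void) { return 42; }

/* Table indexed by system call number; unlisted slots are NULL. */
static int (*syscalls[])(void) = {
    [SYS_demo_a] = sys_demo_a,
    [SYS_demo_b] = sys_demo_b,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Read the call number from the saved eax, run the handler, and store
   the result back in the saved eax so it is restored on return to user. */
void dispatch(struct trapframe *tf) {
    assert(tf != NULL);
    int num = tf->eax;
    if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num] != NULL)
        tf->eax = syscalls[num]();
    else
        tf->eax = -1;  /* unknown call number supplied by user code */
}
```

The range check matters because the number comes from user code: an unchecked index would let a program make the kernel jump through an arbitrary word of kernel memory.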

Now we're in sys_open(). Where are the arguments the user program originally passed to open()? How can the kernel get at them?

sys_open() calls argint() to get its 2nd argument. argint() calculates the value cp->tf->esp + 4 + 4*n. What is this? Why the first 4? Why the 4*n?
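The address arithmetic can be tried on a toy user stack. In this sketch arg_addr() and read_arg() are hypothetical helpers, the stack is an array of 32-bit words, and the base address and word values in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* User-stack address of the n-th (0-based) 32-bit argument, given the
   user ESP saved in the trap frame. */
uint32_t arg_addr(uint32_t user_esp, int n) {
    assert(n >= 0);
    return user_esp + 4 + 4 * (uint32_t)n;
}

/* Read argument n from a toy stack whose first word lives at address `base`. */
uint32_t read_arg(const uint32_t *words, uint32_t base,
                  uint32_t user_esp, int n) {
    uint32_t addr = arg_addr(user_esp, n);
    return words[(addr - base) / 4];
}
```

With a toy stack of { return EIP, pointer to "console", 2 } and user ESP pointing at its first word, read_arg(..., 0) yields the pointer and read_arg(..., 1) yields 2, matching the print-stack output described earlier.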

fetchint() checks that the address is not beyond the end of user memory. But addr was just calculated by kernel code (in argint()); since the kernel code is trustworthy, is this check really necessary?

Why do we do seemingly redundant checks for addr and then addr+4? Can't we just check addr+4?

Why does fetchint() add p->mem to addr?
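A sketch for experimenting with these questions, assuming a process is just a byte array mem of length sz and that user addresses are offsets into it. struct proc here is cut down to two fields, and fetchint_sketch() and fetchint_only_end_check() are hypothetical names; the second is deliberately broken to show what the single addr+4 test misses when 32-bit arithmetic wraps around.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct proc {
    uint8_t *mem;   /* start of the process's memory, as the kernel sees it */
    uint32_t sz;    /* size of that memory in bytes */
};

/* Copy the 32-bit value at user address addr into *ip; -1 if out of bounds. */
int fetchint_sketch(const struct proc *p, uint32_t addr, int32_t *ip) {
    assert(p != NULL && ip != NULL);
    if (addr >= p->sz || (uint32_t)(addr + 4) > p->sz)
        return -1;
    /* User addresses are relative to the start of the process's memory. */
    memcpy(ip, p->mem + addr, sizeof *ip);
    return 0;
}

/* Broken variant: checks only the end address, which can wrap past zero. */
int fetchint_only_end_check(const struct proc *p, uint32_t addr) {
    return ((uint32_t)(addr + 4) > p->sz) ? -1 : 0;
}
```

Try addr = 0xFFFFFFFE with sz = 16: addr + 4 wraps to 2, so the broken variant accepts it while the two-part check rejects it. The value of addr derives from the user's ESP, which the user program controls.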

Back to sys_open(). It does its job (which we will talk about later) and finally returns a file descriptor using the ordinary C return statement. syscall() puts that return value in cp->tf->eax. Why?

Trap Return

syscall() returns to trap(), and trap() returns to alltraps. b "trap"+0x195, single-step until alltraps. print-stack 18 to see the trap frame again. What is different and why?

single-step until iret, print-stack 5, single-step into user space. Print the registers and stack. What will the return value to the original call to open() be?

What would happen if a user program divided by zero? What if kernel code divided by zero?

In Unix, traps often get translated into signals to the process. Some traps, though, are (partially) handled internally by the kernel -- which ones?

Some traps push an extra error code onto the stack (typically containing the segment selector that caused a fault). But this error code isn't pushed by the INT instruction. Can the user confuse the kernel by invoking INT 0xc (or any other vector that usually pushes an error code)? Why not?

Device Interrupts

Like system calls, except: devices generate them at any time, there are no arguments in CPU registers, nothing to return to, usually can't ignore them. There is hardware on the motherboard to signal the CPU when a device needs attention (e.g. the user has typed a character on the keyboard). There's usually a separate vector for each device.

Let's look at the timer interrupt. The timer hardware generates an interrupt 100 times per second so that the kernel can track the passage of time and time-slice among multiple running processes. The timer interrupts through vector 32.

info idt 32, then set a breakpoint at vector32 (vb 0x8:...)

print-stack 5. What was the CPU doing at the time of the interrupt? What stack is being used?

The interrupt will have pushed different numbers of words on the stack depending on whether the CPU was in user-space or the kernel; how does iret know how many words to pop?
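One way to explore this is a toy model of iret that compares the privilege bits of the saved CS with the current CPL. struct regs and iret_model() are hypothetical names, and the model ignores everything iret does beyond restoring these registers; the frame values in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

struct regs {
    uint16_t cs, ss;
    uint32_t eip, esp, eflags;
};

/* frame[0..2] = saved EIP, CS, EFLAGS; frame[3..4] = saved ESP, SS,
   present only if the interrupt came from a less privileged level.
   Returns the number of words consumed from the frame. */
int iret_model(struct regs *r, const uint32_t *frame) {
    int cpl = r->cs & 3;
    uint16_t new_cs = (uint16_t)frame[1];
    int words = 3;

    /* iret cannot return to a more privileged level than the current one. */
    assert((new_cs & 3) >= cpl);

    if ((new_cs & 3) > cpl) {
        /* Returning to an outer level: also restore the old stack. */
        r->esp = frame[3];
        r->ss = (uint16_t)frame[4];
        words = 5;
    } else {
        /* Same level: keep the current stack, just drop the three words. */
        r->esp += 3 * 4;
    }
    r->eip = frame[0];
    r->cs = new_cs;
    r->eflags = frame[2];
    return words;
}
```

With the kernel running at CS 0x08, a frame whose saved CS is 0x1b makes the model consume five words and switch stacks, while a saved CS of 0x08 makes it consume three.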

What prevents lots of interrupts from coming in all at once and overflowing the kernel stack? Print the registers; IF=0x200. info idt 32, info idt 48.

trap(), when it's called for a timer interrupt, does just two things: increment the ticks variable, and call yield(). The latter, as we will see, may cause the interrupt to return in a different process!

Turns out our kernel had a subtle security bug in the way it handled traps... vb 0x1b:0x11, run movdsgs, step over breakpoints that aren't mov ax, ds, dump_cpu and single-step. dump_cpu after mov gs, then vb 0x1b:0x21 to break after sbrk returns, dump_cpu again.

JOS

JOS has a rather different structure from xv6.

Since JOS does not use segmentation, where do traps vector in JOS?

JOS also has a very different kernel architecture: only one kernel stack, as opposed to one per process in xv6. The kernel is not re-entrant (cannot be interrupted), so all IDT entries are interrupt gates in JOS.